train/dev/test split #20

ghost · 2018-11-14T07:22:53Z

Why are your dev triples included in the training data?

code/data/preprocessing_scripts/nell.py:
out_file.write(e1+'\t'+r+'\t'+e2+'\n')
if np.random.normal() > 0.2:
    dev.write(e1+'\t'+r+'\t'+e2+'\n')

In principle, the data should be split into two sets (train/test) or three (train/dev/test) with no overlap between them. Could you please explain the reasoning behind this? Thank you.
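To make the concern concrete, here is a minimal sketch of how the overlap could be checked. It assumes both files hold one tab-separated triple (e1, r, e2) per line, as the script above writes them; the function names and the file paths in the usage note are placeholders, not names from the repository.

```python
def parse_triples(lines):
    """Parse tab-separated lines into a set of (e1, r, e2) tuples, skipping blanks."""
    return {
        tuple(line.rstrip("\r\n").split("\t"))
        for line in lines
        if line.strip()
    }


def find_overlap(train_lines, dev_lines):
    """Return the triples that occur in both the train and the dev data."""
    return parse_triples(train_lines) & parse_triples(dev_lines)
```

With real files this could be called as `find_overlap(open("train.txt"), open("dev.txt"))`, where both paths are placeholders. A non-empty result means some dev triples are also seen during training.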

shehzaadzd · 2018-11-14T15:00:58Z

The NELL dataset originally came with only a train/test split. The dev set was created for hyperparameter tuning. The preprocessing script in the repository is not complete: a separate script was used to remove duplicates and inverse duplicates. You can use the dev set we created (https://github.com/shehzaadzd/MINERVA/blob/master/datasets/data_preprocessed/nell/dev.txt).

ghost · 2018-11-14T15:12:33Z

Thank you for your response. I understand that you split the NELL train data into train and dev sets. Could you please tell me what train/dev proportion you used in your paper? I am trying to reproduce its experimental results, and I noticed the proportion is not mentioned there. Thank you.

shehzaadzd · 2018-11-14T15:23:46Z

We tried to extract 20%, but after removing duplicates (and inverse duplicates) and removing triples that contained the only occurrence of an entity, we were left with ~500 triples. You can use https://github.com/shehzaadzd/MINERVA/blob/master/datasets/data_preprocessed/nell/dev.txt to reproduce our results.
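For readers who want to build a comparable split themselves, the filtering described above might be sketched as follows. This is not the script that was actually used. It rests on several assumptions: an "inverse duplicate" is taken to be a dev triple whose reversed form (e2, r, e1) appears in the training data, rejected dev candidates are simply discarded (the thread does not say what happened to them), and the function name `split_train_dev` and its parameters are hypothetical.

```python
import random


def split_train_dev(triples, dev_fraction=0.2, seed=0):
    """Split (e1, r, e2) triples into disjoint train and dev lists."""
    # Drop exact duplicates while keeping the first occurrence of each triple.
    unique = list(dict.fromkeys(triples))

    # Shuffle reproducibly and set aside a fraction as dev candidates.
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_dev = int(len(unique) * dev_fraction)
    candidates, train = unique[:n_dev], unique[n_dev:]

    train_set = set(train)
    train_entities = {e for e1, _, e2 in train for e in (e1, e2)}

    dev = []
    for e1, r, e2 in candidates:
        # Assumption: an inverse duplicate is the same relation with the
        # entities swapped, already present in the training data.
        is_inverse_dup = (e2, r, e1) in train_set
        # The candidate holds the only occurrence of an entity if that
        # entity never appears in the training data.
        has_unseen_entity = (
            e1 not in train_entities or e2 not in train_entities
        )
        # Assumption: rejected candidates are discarded, not returned to train.
        if not (is_inverse_dup or has_unseen_entity):
            dev.append((e1, r, e2))
    return train, dev
```

Because candidates are filtered after sampling, the resulting dev set is usually smaller than `dev_fraction` of the data, which is consistent with a 20% target shrinking to a few hundred triples.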

ghost · 2018-11-14T15:25:25Z

I see, appreciate it.

shehzaadzd closed this as completed Nov 14, 2018
