train/dev/test split #20

Closed
ghost opened this issue Nov 14, 2018 · 4 comments

@ghost

ghost commented Nov 14, 2018

Why are your dev triples included in the training data?

code/data/preprocessing_scripts/nell.py:
out_file.write(e1+'\t'+r+'\t'+e2+'\n')
if np.random.normal() > 0.2:
    dev.write(e1+'\t'+r+'\t'+e2+'\n')

In principle, the data should be split into two datasets (train/test) or three (train/dev/test) with no overlap between them. Could you explain the reasoning behind this? Thank you.
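For illustration, a non-overlapping split could look roughly like the sketch below. The function name `split_triples`, the `dev_fraction` parameter, and the fixed seed are hypothetical and not taken from the repository. It draws from a uniform distribution, since `np.random.normal() > 0.2` is true roughly 42% of the time rather than 20%.

```python
import random


def split_triples(triples, dev_fraction=0.2, seed=0):
    """Send each triple to exactly one of train or dev (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev = [], []
    for triple in triples:
        # A uniform draw in [0, 1) falls below dev_fraction with that probability.
        if rng.random() < dev_fraction:
            dev.append(triple)
        else:
            train.append(triple)  # written to train only when not chosen for dev
    return train, dev
```

Because every triple goes down exactly one branch, the two lists cannot overlap as long as the input itself has no repeated triples.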

@shehzaadzd
Owner

The NELL dataset came with only a train/test split; the dev set was created for hyperparameter tuning. The preprocessing script is incomplete: a separate script was used to remove duplicates and inverse duplicates. You can use the dev set we created (https://github.com/shehzaadzd/MINERVA/blob/master/datasets/data_preprocessed/nell/dev.txt).

@ghost
Author

ghost commented Nov 14, 2018

Thank you for your response. I understand that you split the NELL train data into train and dev sets. Could you let me know what train/dev proportion you used in your paper? I am trying to reproduce the experimental results from the paper, and I noticed the proportion isn't mentioned there. Thank you.

@shehzaadzd
Owner

We tried to extract 20%, but after removing duplicates (and inverse duplicates) and triples that contained the only occurrence of an entity, we were left with ~500 triples. You could use https://github.com/shehzaadzd/MINERVA/blob/master/datasets/data_preprocessed/nell/dev.txt to reproduce our results.
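The actual cleanup script is not in the repository, so the following is only a guess at what such filtering might look like. The name `filter_dev` is hypothetical, and two definitions are assumptions: an "inverse duplicate" is taken to be a dev triple whose entity pair appears reversed in train, and "the only occurrence of an entity" is taken to mean an entity that never appears in train.

```python
def filter_dev(train, dev_candidates):
    """Drop dev candidates that leak from, or are disconnected from, train."""
    train_set = set(train)
    # Ordered (head, tail) pairs seen in train, ignoring the relation.
    train_pairs = {(e1, e2) for e1, _, e2 in train}
    train_entities = {e for e1, _, e2 in train for e in (e1, e2)}

    kept, seen = [], set()
    for e1, r, e2 in dev_candidates:
        triple = (e1, r, e2)
        if triple in train_set or triple in seen:
            continue  # exact duplicate of a train triple or an earlier dev triple
        if (e2, e1) in train_pairs:
            continue  # assumed definition of an inverse duplicate
        if e1 not in train_entities or e2 not in train_entities:
            continue  # entity has no occurrence in train (assumed reading)
        seen.add(triple)
        kept.append(triple)
    return kept
```

Filters of this kind can shrink a sampled dev set considerably, which would be consistent with a 20% sample ending up at ~500 triples.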

@ghost
Author

ghost commented Nov 14, 2018

I see, appreciate it.
