Reproduction of experimental results #2
Hi, the F1 scores vary by about 2% (we could probably do better with some more hyperparameter tuning) because of several factors such as random initialization and negative sampling. That's why we report the average of 5 runs with random seeds in the paper. Also, after hyperparameter tuning we retrained the model on the combined train and dev set ('datasets/conll04/conll04_train_dev.json'). Could you please report the average of 5 runs with random seeds, trained on train+dev?
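A minimal sketch of such a protocol is given below. The `python spert.py train --config ...` call follows the README; the config keys rewritten here (`seed`, `train_path`) and the log-based F1 collection are assumptions about this particular setup, not part of the repo's documented interface.

```python
# Hypothetical driver: repeat training with 5 random seeds on the combined
# train+dev split, then average the per-run test F1 scores.
import re
import statistics
import subprocess
from pathlib import Path

BASE_CONFIG = Path("configs/example_train.conf")
TRAIN_DEV = "datasets/conll04/conll04_train_dev.json"
SEEDS = [13, 21, 42, 77, 123]  # arbitrary example seeds

for seed in SEEDS:
    cfg = BASE_CONFIG.read_text()
    # Rewrite the 'seed = ...' line (the config key mentioned in this thread)
    # and point training at the combined train+dev split (assumed key name).
    cfg = re.sub(r"(?m)^seed\s*=.*$", f"seed = {seed}", cfg)
    cfg = re.sub(r"(?m)^train_path\s*=.*$", f"train_path = {TRAIN_DEV}", cfg)
    run_cfg = BASE_CONFIG.with_name(f"train_dev_seed{seed}.conf")
    run_cfg.write_text(cfg)
    subprocess.run(["python", "spert.py", "train", "--config", str(run_cfg)], check=True)

# Read the test F1 of each run off its evaluation output and average it:
f1_scores = []  # fill in the five per-run test F1 scores here
if f1_scores:
    print(f"mean F1 over {len(f1_scores)} runs: {statistics.mean(f1_scores):.2f}")
```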
I have retrained on train+dev with the same random seed I used before, and the test result is much better now, close to or even better than what the paper reports.
I think the average of 5 runs with random seeds, trained on train+dev, will match the paper's results. Thanks again!
I am trying to reproduce this work and have some questions about it.
I'm not sure what you mean by "role of seeds". By using a random seed, we ensure that the weights are initialized differently in each run (things like random sampling also depend on the seed). Yes, we train the final model on the train+dev dataset. This is a common thing to do after hyperparameter tuning.
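To illustrate the seed point with plain PyTorch (independent of this repo): the seed fixes how the weights are initialized, so the same seed reproduces the same starting point, while different seeds give different ones.

```python
# Minimal PyTorch illustration: the random seed fixes weight initialization.
import torch

def init_linear(seed: int) -> torch.Tensor:
    torch.manual_seed(seed)        # seeds the global RNG (init, dropout, sampling, ...)
    layer = torch.nn.Linear(4, 2)  # weights are drawn from the seeded RNG
    return layer.weight.detach().clone()

w_a = init_linear(42)
w_b = init_linear(42)
w_c = init_linear(7)

print(torch.equal(w_a, w_b))  # True  -> same seed, identical initialization
print(torch.equal(w_a, w_c))  # False -> different seed, different initialization
```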
Thanks for your reply. PS: This is excellent work and the code is very well written. I have studied it for weeks!
Yes, you should evaluate the provided model on the test set. However, the provided model is the best out of 5 runs, whereas we report the average of 5 runs in our paper (and, due to random weight initialization and sampling, performance varies between runs). That's why you get better performance than the results reported in our paper. Thanks :)!
I understand. Thanks a lot.
First of all, thanks for sharing this clean and object-oriented code! I have learned a lot from this repo. I even want to say: wow, you can really code! ^_^

I have trained the model on the CoNLL04 dataset with the default configuration, following the README, and the test results are as follows:

The test result is worse than in the original paper, especially for the macro-average metrics. Is it possible that the random seed is different? I just set `seed=42` in `example_train.conf`.

Thanks!
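As an aside on the macro-average point: macro-F1 is the unweighted mean of per-class F1, so a handful of errors on a rare relation type moves it much more than micro-F1, which pools all decisions. A quick sketch with scikit-learn, using made-up label counts purely for illustration (not actual CoNLL04 results):

```python
# Toy comparison of micro- vs macro-averaged F1 on an imbalanced label set.
from sklearn.metrics import f1_score

y_true = ["Work_For"] * 50 + ["Live_In"] * 50 + ["OrgBased_In"] * 4
y_pred = ["Work_For"] * 50 + ["Live_In"] * 50 + ["Live_In"] * 4  # rare class missed

print(f1_score(y_true, y_pred, average="micro"))                   # ~0.96: only 4/104 wrong
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.65: rare class has F1 = 0
```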