
Why can't I reach your baseline performance? #6

Closed
luckysheep861 opened this issue Nov 13, 2018 · 11 comments

@luckysheep861

Why can't I reach your baseline performance?

@kimiyoung
Collaborator

What results did you get? I would suggest deleting the cache files and rerunning everything from scratch (make sure you follow the instructions closely). I ran through the process once and found the results reproducible.

@vanzytay

Hi @kimiyoung,

I also ran the entire codebase following the instructions, starting from a clean clone and a fresh build of the dataset.

I only got a best dev F1 of 56.462817452313814.

I ran this a couple more times, and the best dev F1 stays around 56+, with EM around 42.3+.

Any ideas on what might be the cause?

Thanks!

@Arjunsankarlal

Arjunsankarlal commented Nov 14, 2018

I got a slightly better result: best_dev_F1 56.881756072546665. I too ran it from scratch.

@kimiyoung
Collaborator

I believe it is a matter of variance. AFAIK, there are three factors that could lead to this:

  1. The current implementation of our model might have high variance.
  2. According to my previous experiments, results vary across different types of GPUs even with the same random seed. I used an old Titan X to get these results.
  3. In v1.1, I removed 100 low-quality examples from the training set. This causes some differences, e.g. in data batching, which is controlled by the random seed.

I would suggest trying different random seeds to study the effect of model variance. Some random seeds might work better.
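
For readers who want to try a seed sweep, here is a minimal sketch (not taken from the baseline repository) of how one might pin the usual random-number sources in a PyTorch script before rerunning training; the `--seed` flag and its default value are placeholders, not the baseline's actual CLI.

```python
# Minimal seeding sketch (illustrative, not the baseline's code).
import argparse
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and all visible GPUs)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for cuDNN determinism; note that even with this,
    # results can still differ across GPU models and CUDA versions.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=13)  # placeholder default
    args = parser.parse_args()
    set_seed(args.seed)
```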

@vanzytay

@kimiyoung Thanks for your reply!

Actually, I tried both versions (with and without the 100 removed examples). I'm guessing it may be an issue with the system or dependencies. I'll try different seeds.

I have one question though: in your early experiments, did you try different optimizers, or did you default to SGD right from the start?

Thanks!

@kimiyoung
Collaborator

@vanzytay I did not try other optimizers.
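
For anyone who does want to compare optimizers, here is a minimal sketch (hypothetical, not the baseline's code) of selecting between SGD and Adam in PyTorch; the `build_optimizer` helper and the learning rates are placeholders. Swapping only the optimizer while keeping the seed and the rest of the configuration fixed makes any change in dev F1 easier to attribute.

```python
# Optimizer-swap sketch (illustrative, not the baseline's configuration).
import torch


def build_optimizer(model: torch.nn.Module, name: str = "sgd") -> torch.optim.Optimizer:
    """Return an optimizer over the model's trainable parameters (placeholder LRs)."""
    params = [p for p in model.parameters() if p.requires_grad]
    if name == "sgd":
        return torch.optim.SGD(params, lr=0.1)
    if name == "adam":
        return torch.optim.Adam(params, lr=1e-3)
    raise ValueError(f"unknown optimizer: {name}")
```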

@woshiyyya

I got an even worse result:
best_dev_F1 56.075717925566316, EM about 42. (I ran the code on a 1080 Ti.)

@Vimos

Vimos commented Jan 9, 2019

1080 Ti: best_dev_F1 57.83286201117724

@ag1988

ag1988 commented Jan 10, 2019

@kimiyoung Thanks for your work. Sure, will try to use other random seeds.

P.S. The following are the results from the default run:
GPU: Titan Xp , Random Seed: default , Setting: distractor
Training (end): best_dev_F1 56.841881121141064

Evaluation: {'sp_em': 0.1950033760972316, 'joint_recall': 0.3910371172630571, 'f1': 0.5661927280885037, 'recall': 0.5830912961848933, 'joint_f1': 0.36950188461400907, 'sp_f1': 0.6090896536879039, 'joint_prec': 0.4142776503762301, 'em': 0.42822417285617825, 'sp_recall': 0.624765441625671, 'prec': 0.589656389075701, 'sp_prec': 0.664514002765185, 'joint_em': 0.09790681971640783}

@luckysheep861
Author

After the update (v1.1), I got an acceptable result:
best_dev_F1 57.81 (on a Tesla K40m)

@YeDeming

I got best_dev_F1 56.37454881285825 (on a 2080 Ti).
