
Why can't I reach your baseline performance? #6

Closed
luckysheep861 opened this issue Nov 13, 2018 · 11 comments

@luckysheep861

Why can't I reach your baseline performance?

@kimiyoung
Collaborator

What results did you get? I would suggest deleting the cache files and rerunning everything from scratch (make sure you follow the instructions closely). I ran through the process once and found the results reproducible.

@vanzytay

Hi @kimiyoung,

I also ran the entire codebase following the instructions, starting from a clean clone and a fresh build of the dataset.

I only got a best dev F1 of 56.462817452313814.

I ran this a couple more times, and the best dev F1 stays around 56+, with EM around 42.3+.

Any ideas on what might be the cause?

Thanks!

@Arjunsankarlal

Arjunsankarlal commented Nov 14, 2018

I got a slightly better result: best_dev_F1 56.881756072546665. I too ran it from scratch.

@kimiyoung
Collaborator

I believe it is a matter of variance. AFAIK, there are three factors that could lead to this:

  1. The current implementation of our model might have high variance.
  2. According to my previous experiments, results vary across different types of GPUs even with the same random seed. I used an old Titan X to get these results.
  3. In v1.1, I removed 100 low-quality examples from the training set. This causes some differences, e.g. in data batching, which is controlled by the random seed.

I would suggest trying different random seeds to study the effect of model variance. Some random seeds might work better.
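
For readers who want to try a seed sweep, here is a minimal sketch (not taken from the baseline repository) of how one might pin the usual random-number sources in a PyTorch script before rerunning training; the `--seed` flag and its default value are placeholders, not the baseline's actual CLI.

```python
# Minimal seeding sketch (illustrative, not the baseline's code).
import argparse
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and all visible GPUs)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for cuDNN determinism; note that even with this,
    # results can still differ across GPU models and CUDA versions.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=13)  # placeholder default
    args = parser.parse_args()
    set_seed(args.seed)
```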

@vanzytay

@kimiyoung Thanks for your reply!

Actually, I tried both versions (with and without the 100 removed examples). I'm guessing it may be an issue with the system or dependencies. I'll try different seeds.

I have one question though: in your early experiments, did you try different optimizers, or did you default to SGD right from the start?

Thanks!

@kimiyoung
Collaborator

@vanzytay I did not try other optimizers.
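
For anyone who does want to compare optimizers, here is a minimal sketch (hypothetical, not the baseline's code) of selecting between SGD and Adam in PyTorch; the `build_optimizer` helper and the learning rates are placeholders. Swapping only the optimizer while keeping the seed and the rest of the configuration fixed makes any change in dev F1 easier to attribute.

```python
# Optimizer-swap sketch (illustrative, not the baseline's configuration).
import torch


def build_optimizer(model: torch.nn.Module, name: str = "sgd") -> torch.optim.Optimizer:
    """Return an optimizer over the model's trainable parameters (placeholder LRs)."""
    params = [p for p in model.parameters() if p.requires_grad]
    if name == "sgd":
        return torch.optim.SGD(params, lr=0.1)
    if name == "adam":
        return torch.optim.Adam(params, lr=1e-3)
    raise ValueError(f"unknown optimizer: {name}")
```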

@woshiyyya

I got an even worse result:
best_dev_F1 56.075717925566316, EM about 42. (I ran the code on a 1080 Ti.)

@Vimos

Vimos commented Jan 9, 2019

1080 Ti: best_dev_F1 57.83286201117724

@ag1988

ag1988 commented Jan 10, 2019

@kimiyoung Thanks for your work. Sure, will try to use other random seeds.

P.S. The following are the results from the default run:
GPU: Titan Xp , Random Seed: default , Setting: distractor
Training (end): best_dev_F1 56.841881121141064

Evaluation: {'sp_em': 0.1950033760972316, 'joint_recall': 0.3910371172630571, 'f1': 0.5661927280885037, 'recall': 0.5830912961848933, 'joint_f1': 0.36950188461400907, 'sp_f1': 0.6090896536879039, 'joint_prec': 0.4142776503762301, 'em': 0.42822417285617825, 'sp_recall': 0.624765441625671, 'prec': 0.589656389075701, 'sp_prec': 0.664514002765185, 'joint_em': 0.09790681971640783}

@luckysheep861
Author

After the update (v1.1), I got an acceptable result:
best_dev_F1 57.81 (on a Tesla K40m)

@YeDeming

I got best_dev_F1 56.37454881285825 (on a 2080 Ti).
