Predictions written to disk don't match metrics #343

sleepinyourhat · 2018-08-10T17:33:47Z

I just tried evaluating Alex's 72.9 run. The numbers reported at the end of the eval run match what's on the sheet, so I was able to reproduce those, but the actual outputs written to disk for the dev set don't make sense.

WNLI: the log reports an accuracy of 12.7%, but comparing the outputs to ground truth yields 38/71 = 53.5%.
CoLA: similar, but not as extreme. The log reports an accuracy of 74.4%, but the model outputs give me ~73%.

The ground truth label column is the same in the output files and in the original input TSVs, so it's not a loading issue. It may be relevant that WNLI is so short: if something is wrong in the first or last batch, that would be magnified for WNLI.

sleepinyourhat · 2018-08-10T17:43:38Z

Very likely related to #341.
Some chance this is related to #342. Maaaybe related to #185.

sleepinyourhat · 2018-08-10T17:43:59Z

I'll try reproducing this, esp w/o ELMo.

sleepinyourhat · 2018-08-10T17:56:12Z

Failed to reproduce without save/restore, even with ELMo. Will try with save/restore.

sleepinyourhat · 2018-08-10T18:40:59Z

Checked that Alex's predictions and mine are different (both after restore), even though the reported accuracy scores are the same.
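
One way to confirm that two runs' predictions differ is to compare the `prediction` column example by example. This is a minimal sketch, assuming both files use the tab-separated layout of the prediction file pasted later in this thread (`index` and `prediction` columns). The function names and the inline sample data are hypothetical and are not part of jiant.

```python
import csv
import io


def load_predictions(handle):
    """Map each example index to its predicted label."""
    # QUOTE_NONE: sentence text may contain bare quote characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["index"]: row["prediction"] for row in reader}


def disagreements(preds_a, preds_b):
    """Return the indices where the two runs differ or where one run has no entry."""
    keys = set(preds_a) | set(preds_b)
    # Assumes indices are integer strings, as in the prediction files.
    return sorted((k for k in keys if preds_a.get(k) != preds_b.get(k)), key=int)


# Hypothetical stand-ins for two runs' prediction files.
run_a = io.StringIO("index\tprediction\n0\t0\n1\t1\n2\t1\n")
run_b = io.StringIO("index\tprediction\n0\t0\n1\t0\n2\t1\n")

diff = disagreements(load_predictions(run_a), load_predictions(run_b))
print(f"{len(diff)} differing example(s): {diff}")
```

For real files, replace the `io.StringIO` objects with handles from `open(path, newline="", encoding="utf-8")`.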

sleepinyourhat · 2018-08-10T18:44:28Z

Couldn't reproduce even with save/restore. I suspect this is an index-related bug.

W4ngatang · 2018-08-10T18:54:24Z

What's the path to your predictions on NYU? And these are dev predictions?

sleepinyourhat · 2018-08-10T18:56:45Z

I just overwrote them on the server. One sec.

sleepinyourhat · 2018-08-10T19:01:47Z

/nfs/jsalt/exp/sam-worker2/final/mtl-glue-elmo/wnli_val.tsv

I rebuilt all pickles from scratch this time to be sure. Do you have stale pickles? That's my only guess at this point for this particular run mismatch...

index	prediction	sentence_1	sentence_2	true_label
0	0	The drain is clogged with hair . It has to be cleaned .	The hair has to be cleaned .	0
1	1	Jane knocked on Susan &apos;s door but she did not answer .	Susan did not answer .	1
2	1	Beth didn &apos;t get angry with Sally , who had cut her off , because she stopped and counted to ten .	Sally stopped and counted to ten .	0
3	0	No one joins Facebook to be sad and lonely . But a new study from the University of Wisconsin psychologist George Lincoln argues that that &apos;s exactly how it makes us feel .	That &apos;s exactly how Facebook makes us feel .	1
4	1	The man couldn &apos;t lift his son because he was so heavy .	The son was so heavy .	1
5	1	Susan knew that Ann &apos;s son had been in a car accident , so she told her about it .	Ann told her about it .	0
6	1	When Tommy dropped his ice cream , Timmy giggled , so father gave him a stern look .	Father gave Timmy a stern look .	1
7	0	There is a pillar between me and the stage , and I can &apos;t see around it .	I can &apos;t see around the pillar .	1
8	0	Look ! There is a minnow swimming right below that duck ! It had better get away to safety fast !	The duck had better get away to safety fast !	0
9	0	Bernard , who had not told the government official that he was less than 21 when he filed for a homestead claim , did not consider that he had done anything dishonest . Still , anyone who knew	Anyone who knew that he was 19 years old could take his claim away from anyone .	0
10	0	When the sponsors of the bill got to the town hall , they were surprised to find that the room was full of opponents . They were very much in the majority .	The sponsors were very much in the majority .	0
11	1	I can &apos;t cut that tree down with that axe ; it is too thick .	The tree is too thick .	1
12	0	The large ball crashed right through the table because it was made of styrofoam .	The large ball was made of styrofoam .	0
13	0	I tried to paint a picture of an orchard , with lemons in the lemon trees , but they came out looking more like light bulbs .	The lemon trees came out looking more like light bulbs .	0
14	1	Madonna fired her trainer because she slept with her boyfriend .	Madonna slept with her boyfriend .	0
15	1	If the con artist has succeeded in fooling Sam , he would have gotten a lot of money .	Sam would have gotten a lot of money .	0
16	1	The lawyer asked the witness a question , but he was reluctant to repeat it .	The lawyer was reluctant to repeat it .	1
17	0	Everyone really loved the oatmeal cookies ; only a few people liked the chocolate chip cookies . Next time , we should make fewer of them .	We should make fewer of the oatmeal cookies .	0
18	0	Bob collapsed on the sidewalk . Soon he saw Carl coming to help . He was very ill .	Carl was very ill .	0
19	0	Mr. Moncrieff visited Chester &apos;s luxurious New York apartment , thinking that it belonged to his son Edward . The result was that Mr. Moncrieff has decided to cancel Edward &apos;s allowance on the ground that he no	He no longer requires Chester &apos;s financial support .	0
20	0	Tatyana knew that Grandma always enjoyed serving an abundance of food to her guests . Now Tatyana watched as Grandma gathered Tatyana &apos;s small mother into a wide , scrawny embrace and then propelled her to the table	Grandma gathered Tatyana &apos;s small mother into a wide , scrawny embrace and then propelled Tatyana to the table .	0
21	0	Ann asked Mary what time the library closes , because she had forgotten .	Mary had forgotten .	0
22	1	George got free tickets to the play , but he gave them to Eric , even though he was particularly eager to see it .	Eric was particularly eager to see it .	0
23	0	The delivery truck zoomed by the school bus because it was going so slow .	The school bus was going so slow .	1
24	1	Jane gave Joan candy because she was hungry .	Jane was hungry .	0
25	0	Mark heard Steve &apos;s feet going down the ladder . The door of the shop closed after him . He ran to look out the window .	Mark ran to look out the window .	1
26	1	Although they ran at about the same speed , Sue beat Sally because she had such a good start .	Sally had such a good start .	0
27	1	Fred is the only man still alive who remembers my great-grandfather . He is a remarkable man .	Fred is a remarkable man .	1
28	0	Always before , Larry had helped Dad with his work . But he could not help him now , for Dad said that his boss at the railroad company would not want anyone but him to work in	Larry could not help him now .	1
29	0	George got free tickets to the play , but he gave them to Eric , because he was particularly eager to see it .	Eric was particularly eager to see it .	1
30	0	They broadcast an announcement , but a subway came into the station and I couldn &apos;t hear over it .	I couldn &apos;t hear the subway .	1
31	0	Grant worked hard to harvest his beans so he and his family would have enough to eat that winter , His friend Henry let him stack them in his barn where they would dry . Later , he	Later , he and Tatyana would shell them and cook them for the beans &apos; Sunday dinners .	0
32	1	Beth didn &apos;t get angry with Sally , who had cut her off , because she stopped and apologized .	Sally stopped and apologized .	1
33	0	There is a pillar between me and the stage , and I can &apos;t see it .	I can &apos;t see around the stage .	1
34	1	The older students were bullying the younger ones , so we rescued them .	We rescued the older students .	0
35	1	I &apos;m sure that my map will show this building ; it is very famous .	The building is very famous .	1
36	0	I tried to paint a picture of an orchard , with lemons in the lemon trees , but they came out looking more like telephone poles .	The lemons came out looking more like telephone poles .	0
37	0	Always before , Larry had helped Dad with his work . But he could not help him now , for Dad said that his boss at the railroad company would not want anyone but him to work in	He could not help Larry now .	0
38	0	Bernard , who had not told the government official that he was less than 21 when he filed for a homestead claim , did not consider that he had done anything dishonest . Still , anyone who knew	Bernard was less than 21 when he filed for a homestead claim .	1
39	0	The drain is clogged with hair . It has to be cleaned .	The drain has to be cleaned .	1
40	0	The drain is clogged with hair . It has to be removed .	The drain has to be removed .	0
41	0	Emma did not pass the ball to Janie although she was open .	She saw that Emma was open .	0
42	1	John was doing research in the library when he heard a man humming and whistling . He was very annoying .	John was very annoying .	0
43	0	The city councilmen refused the demonstrators a permit because they advocated violence .	The demonstrators advocated violence .	1
44	1	I couldn &apos;t put the pot on the shelf because it was too high .	The pot was too high .	0
45	0	The police arrested all of the gang members . They were trying to run the drug trade in the neighborhood .	The police were trying to run the drug trade in the neighborhood .	0
46	0	The cat was lying by the mouse hole waiting for the mouse , but it was too cautious .	The mouse was too cautious .	1
47	1	Sam tried to paint a picture of shepherds with sheep , but they ended up looking more like golfers .	The sheep ended up looking more like golfers .	0
48	1	Fred covered his eyes with his hands , because the wind was blowing sand around . He opened them when the wind stopped .	He opened his eyes when the wind stopped .	1
49	0	When they had eventually calmed down a bit , and had gotten home , Mr. Farley put the magic pebble in an iron safe . Some day they might want to use it , but really for now	Some day they might want to use the safe .	0
50	1	The cat was lying by the mouse hole waiting for the mouse , but it was too cautious .	The cat was too cautious .	0
51	0	By rolling over in her upper berth , Tatyana could look over the edge of it and see her mother plainly . How very small and straight and rigid she lay in the bunk below ! Her eyes	How very small and straight and rigid her mother lay in the bunk below !	1
52	0	Fred was supposed to run the dishwasher , but he put it off , because he wanted to watch TV . But the show turned out to be boring , so he changed his mind and turned it	He changed his mind and turned the dishwasher off .	0
53	0	Sam &apos;s drawing was hung just above Tina &apos;s and it did look much better with another one below it .	Tina &apos;s drawing did look much better with another one below it .	0
54	0	Papa looked down at the children &apos;s faces , so puzzled and sad now . It was bad enough that they had to be denied so many things because he couldn &apos;t afford them .	He couldn &apos;t afford the things .	1
55	1	Sam took French classes from Adam , because he was known to speak it fluently .	Sam was known to speak it fluently .	0
56	1	The journalists interviewed the stars of the new movie . They were very persistent , so the interview lasted for a long time .	The journalists were very persistent , so the interview lasted for a long time .	1
57	0	They broadcast an announcement , but a subway came into the station and I couldn &apos;t hear it .	I couldn &apos;t hear the subway .	0
58	0	Fred and Alice had very warm down coats , but they were not enough for the cold in Alaska .	coats were not enough for the cold in Alaska .	1
59	0	The father carried the sleeping boy in his arms .	The father carried the sleeping boy in the father &apos;s arms .	1
60	0	Madonna fired her trainer because she slept with her boyfriend .	She slept with the trainer &apos;s boyfriend .	0
61	0	Papa looked down at the children &apos;s faces , so puzzled and sad now . It was bad enough that they had to be denied so many things because he couldn &apos;t afford them .	He couldn &apos;t afford the children .	0
62	1	If the con artist has succeeded in fooling Sam , he would have gotten a lot of money .	The con artist would have gotten a lot of money .	1
63	0	Mr. Taylor was a man of uncertain temper and his general tendency was to think that David was a poor chump and that whatever step he took in any direction on his own account was just another proof	Any direction on his own account was just another proof of David &apos;s innate idiocy .	1
64	0	This morning , Joey built a sand castle on the beach , and put a toy flag in the highest tower , but this afternoon the wind knocked it down .	This afternoon the wind knocked The sand castle down .	0
65	0	In the storm , the tree fell down and crashed through the roof of my house . Now , I have to get it repaired .	Now I have to get The roof repaired .	1
66	0	I poured water from the bottle into the cup until it was empty .	The bottle was empty .	1
67	1	Alice looked for her friend Jade in the crowd . Since she always wears a red turban , Alice spotted her quickly .	Since Alice always wears a red turban , Alice spotted her quickly .	0
68	1	The dog chased the cat , which ran up a tree . It waited at the top .	The cat waited at the top .	1
69	0	It was a summer afternoon , and the dog was sitting in the middle of the lawn . After a while , it got up and moved to a spot under the tree , because it was hot	The spot under the tree was hot .	0
70	0	George got free tickets to the play , but he gave them to Eric , because he was not particularly eager to see it .	Eric was not particularly eager to see it .	0
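
A file like the one above can be scored offline by comparing its `prediction` and `true_label` columns. This is a minimal sketch, assuming a tab-separated file with the header shown above. The function name is hypothetical, and the sample reuses the first five rows above with the sentences replaced by placeholders.

```python
import csv
import io


def accuracy_from_tsv(handle):
    """Return (correct, total) for a TSV with `prediction` and `true_label` columns."""
    # QUOTE_NONE: the sentences contain bare quote characters that are not CSV quoting.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    correct = total = 0
    for row in reader:
        total += 1
        if row["prediction"].strip() == row["true_label"].strip():
            correct += 1
    return correct, total


# Predictions and labels from rows 0-4 above; sentence text is a placeholder.
sample = (
    "index\tprediction\tsentence_1\tsentence_2\ttrue_label\n"
    "0\t0\ts1\ts2\t0\n"
    "1\t1\ts1\ts2\t1\n"
    "2\t1\ts1\ts2\t0\n"
    "3\t0\ts1\ts2\t1\n"
    "4\t1\ts1\ts2\t1\n"
)

correct, total = accuracy_from_tsv(io.StringIO(sample))
print(f"{correct}/{total} = {correct / total:.1%}")  # prints "3/5 = 60.0%"
```

To score a file on disk, pass a handle from `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.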

sleepinyourhat · 2018-08-10T19:06:32Z

Aaagh, never mind. There was a filename mismatch, and the predictions I was checking were from the other run.

sleepinyourhat · 2018-08-10T19:07:08Z

(The predictions and metrics do, in fact, match.)

W4ngatang · 2018-08-10T19:07:26Z

Cool, I was just evaluating my CoLA predictions offline and they seemed to match.

sleepinyourhat · 2018-08-10T19:09:55Z

Tx. Sorry for the scare. Now I have very little idea what's wrong, but this nondeterminism thing is worrying.


sleepinyourhat added the bug, help wanted, and fix-before-release labels Aug 10, 2018

sleepinyourhat closed this as completed Aug 10, 2018

jeswan mentioned this issue Sep 17, 2020

[CLOSED] Predictions written to disk don't match metrics nyu-mll/jiant-v1-legacy#343

Closed

jeswan added the jiant-v1-legacy label (relevant to versions <= v1.3.2) Sep 17, 2020
