QQP test evaluation is extremely slow #209

sleepinyourhat · 2018-07-20T18:26:20Z

One run has been on QQP Test for 2.5 hours without signs of progress. GPU usage is non-zero but low. This seems to have changed since #121.

@Jan21 @iftenney, any guesses? Did you verify that QQP test works?

Dev also appears to be quite slow, but I don't have numbers yet.

sleepinyourhat · 2018-07-20T18:26:37Z

(This is why we couldn't do the test set evaluation this week.)

Jan21 · 2018-07-20T18:50:22Z

investigating

sleepinyourhat · 2018-07-20T20:34:56Z

QQP test is quite big, so that's got to be most of it. A 100% slowdown in evaluation is okay, but would be pretty conspicuous here.

See also: #145

sleepinyourhat · 2018-07-20T20:35:04Z

I can confirm that QQP test eval works, though, so it may be safe to ignore.

Jan21 · 2018-07-20T20:46:03Z

ok

W4ngatang · 2018-07-23T20:13:26Z

Wow this is excruciatingly slow, 1hr+ for me on P100.

sleepinyourhat · 2018-07-24T19:25:08Z

This is pretty bad: it seems to be a problem even on dev, and it's almost certainly a result of #121.
@W4ngatang, do you have any bandwidth to see if there's an easy fix? We can get by without one, but Jan, Ian, and I are all booked for today/tomorrow.

W4ngatang · 2018-07-25T21:03:52Z

Just occurred to me: we should use a smarter batcher / iterator during eval (and validation). IIRC we got pretty decent speed ups.

sleepinyourhat · 2018-07-25T21:07:27Z

How easy would it be? Would it break streaming? If *very* and *no*, give it a try!


W4ngatang · 2018-07-25T21:18:28Z

I think it would not be that bad; we already use it during training. Wouldn't break anything.

W4ngatang · 2018-07-26T01:40:10Z

Smart batching doesn't seem to help because the batch utilization was already pretty high, I guess because the QQP examples are pretty similar in length.

What we could do is jack up the batch size during evaluation only?
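For reference, "batch utilization" above can be read as the fraction of non-padding tokens in a padded batch. A minimal sketch under that assumption; the function names are hypothetical and this is not the project's batcher code:

```python
def batch_utilization(lengths):
    """Fraction of real (non-padding) tokens when a batch is padded to its longest sequence."""
    if not lengths:
        raise ValueError("empty batch")
    return sum(lengths) / (len(lengths) * max(lengths))


def mean_utilization(lengths, batch_size, sort_by_length=False):
    """Average utilization over consecutive batches, optionally sorting by length first."""
    if sort_by_length:
        # Assumption: "smart batching" is approximated here by a plain length sort.
        lengths = sorted(lengths)
    batches = [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]
    return sum(batch_utilization(b) for b in batches) / len(batches)
```

If the sequences are already close in length, the unsorted value is near 1.0 and sorting has little left to gain, which would be consistent with the observation above.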

sleepinyourhat · 2018-07-26T01:41:29Z

Worth a try, if it's easy to set up.


W4ngatang · 2018-07-26T02:17:21Z

What GPUs were you running on previously?

For some reason I'm getting relatively fast eval times (~10m) for QQP test on 1080s...
Running an untrained encoder (2 layers, 1024d, attn, batch size 128).

sleepinyourhat · 2018-07-26T02:30:29Z

Odd, this was also on NYU 1080s. I don't recall seeing any evaluation speed fixes since Friday... Much of the time was spent on sorting. Is that process cached somehow?


Jan21 · 2018-07-26T02:36:30Z

Sorting is not cached. I was also waiting 2.5 hours on a P100. I'm trying to debug it on CPU.

sleepinyourhat · 2018-07-26T02:47:59Z

Alex, was that definitely test? QQP dev is much smaller than test.


W4ngatang · 2018-07-26T02:51:47Z

Yes, on dev it's also very fast (~1m) and test is ~12m

Profiling now.

Maybe it's due to having an untrained encoder? But I'm running the same script on the P100 and it's much slower (though less than an hour).

pitrack · 2018-07-26T20:10:15Z

Is this really slow? It seems to go through at a rate of ~60 batches per 30 seconds, which is the same rate as QNLI or MNLI.

QQP is just a big dataset: test is ~390K examples, while the rest of the test sets are ~20K or less; val is 40K, while the rest of the val sets are also <20K.

The logging helps a lot.

EDIT: Not going to close because others might also wonder why QQP takes forever.
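A rough back-of-the-envelope version of the estimate above, as a sketch. The batch size of 128 is an assumption borrowed from the untrained-encoder run mentioned earlier in the thread, not a confirmed setting for this run, and `eval_minutes` is a hypothetical helper:

```python
import math


def eval_minutes(num_examples, batch_size, batches_per_second):
    """Estimated wall-clock minutes to evaluate a dataset at a fixed batch rate."""
    num_batches = math.ceil(num_examples / batch_size)
    return num_batches / batches_per_second / 60


# ~60 batches per 30 seconds is 2 batches per second.
rate = 60 / 30

# Assumed batch size of 128; dataset sizes are the approximate figures quoted above.
qqp_test = eval_minutes(390_000, 128, rate)
typical_test = eval_minutes(20_000, 128, rate)
print(f"QQP test: ~{qqp_test:.0f} min, 20K-example test set: ~{typical_test:.1f} min")
```

Under these assumptions QQP test takes roughly 25 minutes against a bit over a minute for a 20K-example set. The absolute numbers depend entirely on the assumed batch size and rate; the point is only that at a constant batch rate the cost scales linearly with dataset size.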

sleepinyourhat · 2018-07-26T20:13:58Z

Fair enough. Feel free to close if there's nothing more to be done.


sleepinyourhat added the low-priority label ("Only if you're bored. Ask Sam/Ian/Alex before starting.") on Jul 20, 2018

W4ngatang mentioned this issue Jul 24, 2018

Log eval #234

Merged

pitrack added the wontfix label ("This will not be worked on") and removed the low-priority label on Jul 26, 2018

W4ngatang closed this as completed Apr 26, 2019

jeswan mentioned this issue Sep 17, 2020

[CLOSED] QQP test evaluation is extremely slow nyu-mll/jiant-v1-legacy#209

Closed

jeswan added the jiant-v1-legacy label ("Relevant to versions <= v1.3.2") on Sep 17, 2020
