add tests for CKY #53

zhaoyanpeng · 2020-03-07T23:10:30Z

This PR fixes several bugs in k-best parsing with dist.topk() and adds a simple test for it.

I made the changes incrementally so that existing modules that rely on the CKY are not affected.
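For readers who want a feel for what such a test can check, here is a minimal sketch in plain Python. It is not the test added in this PR and does not use the library's API: `kbest_cky`, `all_tree_scores`, and `check_kbest` are hypothetical names, and the model is assumed to be an unlabeled one in which a tree's score is the sum of its span scores. The idea is to compare a k-best CKY against brute-force enumeration of every binary bracketing.

```python
import heapq
import itertools
import random


def kbest_cky(span_score, n, k):
    """Return the k highest tree scores for a sentence of n words.

    span_score[i, j] scores the span covering words i..j (inclusive);
    a tree's score is the sum of the scores of all its spans.
    """
    chart = {(i, i): [span_score[i, i]] for i in range(n)}
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            candidates = []
            for split in range(i, j):
                pairs = itertools.product(chart[i, split], chart[split + 1, j])
                for left, right in pairs:
                    candidates.append(left + right + span_score[i, j])
            # Keep only the k best scores per cell, sorted high to low.
            chart[i, j] = heapq.nlargest(k, candidates)
    return chart[0, n - 1]


def all_tree_scores(span_score, i, j):
    """Brute force: the score of every binary bracketing of words i..j."""
    if i == j:
        return [span_score[i, i]]
    scores = []
    for split in range(i, j):
        for left in all_tree_scores(span_score, i, split):
            for right in all_tree_scores(span_score, split + 1, j):
                scores.append(left + right + span_score[i, j])
    return scores


def check_kbest(n=6, k=5, seed=0):
    """True if k-best CKY agrees with brute force on random integer scores."""
    rng = random.Random(seed)
    span_score = {
        (i, j): rng.randint(-10, 10) for i in range(n) for j in range(i, n)
    }
    expected = sorted(all_tree_scores(span_score, 0, n - 1), reverse=True)[:k]
    return kbest_cky(span_score, n, k) == expected
```

Integer scores keep the comparison exact, and the brute-force side only scales to short sentences, which is all a unit test of this kind needs.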

srush · 2020-03-07T23:24:24Z

Hi Yanpeng,

This is great, thanks.

Mind running black (https://github.com/psf/black) on your new test file? That should fix the test failure.
Is TopK fast enough for your use case? I was thinking of trying a different approach.

srush · 2020-03-07T23:25:03Z

Could that be the issue here as well?

#50

zhaoyanpeng · 2020-03-07T23:37:32Z

> Hi Yanpeng,
>
> This is great, thanks.
>
> Mind running black (https://github.com/psf/black) on your new test file? That should fix the test failure.

Working on it now.

> Is TopK fast enough for your use case? I was thinking of trying a different approach.

Yes, and speed is not a big concern. The problem is that TopK consumes quite a lot of GPU memory when K is large. I will give you more details later.

zhaoyanpeng · 2020-03-07T23:53:09Z

> Could that be the issue here as well?
>
> #50

Yes, it could be. I set cache=False when initializing Chart and fixed the issue in CKY. Could you double-check whether this resolves the issue once and for all?

srush · 2020-03-08T21:21:51Z

Gotcha. Maybe we can add a GPU topk.

zhaoyanpeng · 2020-03-08T22:25:49Z

> Gotcha. Maybe we can add a GPU topk.

I tested topk on a GTX 1080 with 11 GB of GPU memory. With k = 15, parsing sentences of length above 25 runs out of memory.

This may be far from practically usable, since reranking-based parsers need more than the 15 best parses (e.g., Richard Socher et al., 2013 require the top 200 parses), and most sentences are longer than 25.

What is the GPU topk? The current topk can run on GPUs but is memory-hungry.
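To illustrate where the memory can go, here is a rough sketch in plain Python of one common way to define a k-best semiring. It is not the library's implementation, and `kmax_plus` and `kmax_times` are hypothetical names. Every chart entry holds k scores instead of one, and combining two entries scores all k × k pairings before truncating back to k.

```python
import heapq


def kmax_plus(a, b, k):
    """Semiring 'sum': merge two k-best lists and keep the k largest."""
    return heapq.nlargest(k, a + b)


def kmax_times(a, b, k):
    """Semiring 'product': score every pairing, then keep the k largest."""
    pairwise = [x + y for x in a for y in b]  # up to k * k temporaries
    return heapq.nlargest(k, pairwise)
```

Under this formulation, k = 15 means up to 225 temporary scores for every pair of chart entries that gets combined, and k = 200 means up to 40,000, so a batched tensor version would be expected to grow quickly with k.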

srush · 2020-03-08T22:49:52Z

Gotcha. Let me know if you personally need topk for research. I can take a look at some memory reduction ideas. I mostly included it because it was fun.

For some semirings I wrote custom CUDA implementations to save memory. We could do that for topk.

zhaoyanpeng · 2020-03-09T00:13:52Z

> Gotcha. Let me know if you personally need topk for research. I can take a look at some memory reduction ideas. I mostly included it because it was fun.

Yes. My project involves analyzing the top-k parses of a parser. I can run k passes to get the k best parses without memory issues, but I find this repo amazing, so it would be great to include a memory-efficient implementation of topk.

zhaoyanpeng added 2 commits March 7, 2020 22:12

minimize the CKY for debugging

01d0b97

add tests for the CKY

51ba7c5

srush mentioned this pull request Mar 7, 2020

Bug DependencyCRF function topk #50

Closed

fix formatting issues

b0d3f42

zhaoyanpeng force-pushed the test_cky branch 2 times, most recently from ce8123e to b0d3f42 on March 8, 2020 00:16

srush merged commit 67aa60d into harvardnlp:master Mar 8, 2020

zhaoyanpeng deleted the test_cky branch March 10, 2020 19:49
