
Another 28% decrease in inference time in NER on GPU when CRF is used #1053

Merged
9 commits merged into flairNLP:master on Sep 2, 2019

Conversation

pommedeterresautee
Contributor

Like PR #1038, this is an optimization of the Viterbi decoding.
On my French dataset, using the same setup as in #1038, I went from 63 seconds to 45 seconds (with all_tag_prob set to False).

This time there is no easy trick, but a careful rewrite of the Viterbi decoding using NumPy only, with lots of vectorization. It turned out that PyTorch tensors are both slightly slower on CPU and offer fewer opportunities for vectorization and broadcasting.
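To make the idea concrete, here is a minimal NumPy sketch of vectorized Viterbi decoding (not the actual code from this PR): the inner loop over tags is replaced by broadcasting against the transition matrix. The function name `viterbi_decode` and the `emissions`/`transitions` arrays are illustrative only.

```python
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray):
    """Sketch of vectorized Viterbi decoding.

    emissions:   (seq_len, num_tags) emission scores for one sentence
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    backpointers = np.empty((seq_len, num_tags), dtype=np.int64)

    # best score of any path ending in each tag at the current position
    score = emissions[0].copy()

    for t in range(1, seq_len):
        # broadcast over all tag pairs: candidate[i, j] = score[i] + transitions[i, j]
        candidate = score[:, None] + transitions
        backpointers[t] = candidate.argmax(axis=0)          # best previous tag for each tag j
        score = candidate.max(axis=0) + emissions[t]

    # follow the backpointers from the best final tag
    best_path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best_path.append(int(backpointers[t, best_path[-1]]))
    best_path.reverse()
    return best_path, float(score.max())
```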

Two interesting points:

  • this optimization also benefits users running inference on CPU only
  • the decrease in processing time is much larger when all_tag_prob is set to True

FYI, I also tried to parallelize with Ray and Numba.
Ray adds too much overhead compared to the time needed to process one Sentence, and splitting the mini-batch introduces too much complexity for very small gains, plus lots of GPU memory issues.
Numba's support for NumPy is still too limited: most axis parameters are not supported yet, so in the end it cannot optimize the process.

If you see any other optimization opportunities in NER, let me know.

@pommedeterresautee pommedeterresautee changed the title Another 28% decrease in inference time in NER when CRF is used Another 28% decrease in inference time in NER on GPU when CRF is used Sep 1, 2019
@alanakbik
Collaborator

This is great, thanks very much for improving this! Also thanks for sharing your experience with Ray for parallelization - we haven't yet had a chance to take a deeper look at CPU parallelization to improve speed, but it's been on our list for a long while.

@alanakbik
Collaborator

👍

1 similar comment
@yosipk
Collaborator

yosipk commented Sep 2, 2019

👍

@yosipk yosipk merged commit 005ae90 into flairNLP:master Sep 2, 2019
@pommedeterresautee
Contributor Author

@alanakbik it's a pleasure to work on this project, the codebase is quite clean :-)
If you have run tests on your V100, can you tell me what the gain in time was?

@alanakbik
Collaborator

@pommedeterresautee one evaluation run on the CoNLL-03 test split with the current 'ner' model on a V100 is now done in ~14 seconds (was ~15 before this PR and ~20 before your last PR), and predicting over train+test is now done in ~41 seconds (was ~60 before this PR and ~95 before your last PR), so we are seeing great speed improvements :)

(We get these numbers when setting in_memory=True for the dataset and use embeddings_storage_mode='gpu'.)
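For reference, a rough timing sketch of such a prediction run, assuming the flair API of that era (SequenceTagger.load, Sentence, predict); the example sentences and the mini_batch_size value are placeholders, keyword arguments may differ slightly between versions, and the in_memory / embeddings_storage_mode settings mentioned above apply to corpus loading and evaluation rather than to this snippet.

```python
import time

from flair.data import Sentence
from flair.models import SequenceTagger

# pre-trained English NER model (the 'ner' model mentioned above)
tagger = SequenceTagger.load('ner')

# placeholder sentences; in the numbers above this would be the CoNLL-03 splits
sentences = [Sentence('George Washington went to Washington .') for _ in range(1000)]

start = time.time()
tagger.predict(sentences, mini_batch_size=32)  # mini_batch_size is illustrative
print(f'predicted {len(sentences)} sentences in {time.time() - start:.1f} s')
```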
