
Another 28% decrease in inference time in NER on GPU when CRF is used #1053

Merged
9 commits merged into flairNLP:master on Sep 2, 2019

Conversation

pommedeterresautee
Contributor

Like PR #1038, this is an optimization of the Viterbi decoding.
On my French dataset, using the same setup as in #1038, I went from 63 seconds to 45 seconds (with all_tag_prob set to False).

This time there is no easy trick, but a careful rewrite of the Viterbi decoding using NumPy only, with lots of vectorization. It turned out that PyTorch tensors are both slightly slower on CPU and offer fewer opportunities for vectorization and broadcasting.
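To make the idea concrete, here is a minimal NumPy sketch of vectorized Viterbi decoding (not the actual code from this PR): the inner loop over tags is replaced by broadcasting against the transition matrix. The function name `viterbi_decode` and the `emissions`/`transitions` arrays are illustrative only.

```python
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray):
    """Sketch of vectorized Viterbi decoding.

    emissions:   (seq_len, num_tags) emission scores for one sentence
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    backpointers = np.empty((seq_len, num_tags), dtype=np.int64)

    # best score of any path ending in each tag at the current position
    score = emissions[0].copy()

    for t in range(1, seq_len):
        # broadcast over all tag pairs: candidate[i, j] = score[i] + transitions[i, j]
        candidate = score[:, None] + transitions
        backpointers[t] = candidate.argmax(axis=0)          # best previous tag for each tag j
        score = candidate.max(axis=0) + emissions[t]

    # follow the backpointers from the best final tag
    best_path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best_path.append(int(backpointers[t, best_path[-1]]))
    best_path.reverse()
    return best_path, float(score.max())
```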

Two interesting points:

  • this optimization also benefits users running inference on CPU only
  • the decrease in processing time is much larger when all_tag_prob is set to True

FYI, I also tried to parallelize with Ray and Numba.
Ray adds too much overhead compared to the time needed to process one Sentence, and splitting the mini-batch introduces too much complexity for very small gains, plus lots of GPU memory issues.
Numba's support for NumPy is still too limited: most axis parameters are not supported yet, so in the end it cannot optimize the process.

If you see any other optimization opportunities in NER, let me know.

@pommedeterresautee pommedeterresautee changed the title Another 28% decrease in inference time in NER when CRF is used Another 28% decrease in inference time in NER on GPU when CRF is used Sep 1, 2019
@alanakbik
Collaborator

This is great, thanks very much for improving this! Also thanks for sharing your experience with Ray for parallelization - we haven't yet had a chance to take a deeper look at CPU parallelization to improve speed, but it's been on our list for a long while.

@alanakbik
Collaborator

👍

1 similar comment
@yosipk
Collaborator

yosipk commented Sep 2, 2019

👍

@yosipk yosipk merged commit 005ae90 into flairNLP:master Sep 2, 2019
@pommedeterresautee
Contributor Author

@alanakbik it's a pleasure to work on this project, the codebase is quite clean :-)
If you have run tests on your V100, can you tell me what the gain in time was?

@alanakbik
Collaborator

@pommedeterresautee one evaluation run on the CoNLL-03 test split with the current 'ner' model on a V100 is now done in ~14 seconds (was ~15 before this PR and ~20 before your last PR), and predicting over train+test is now done in ~41 seconds (was ~60 before this PR and ~95 before your last PR), so we are seeing great speed improvements :)

(We get these numbers when setting in_memory=True for the dataset and use embeddings_storage_mode='gpu'.)
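For reference, a rough timing sketch of such a prediction run, assuming the flair API of that era (SequenceTagger.load, Sentence, predict); the example sentences and the mini_batch_size value are placeholders, keyword arguments may differ slightly between versions, and the in_memory / embeddings_storage_mode settings mentioned above apply to corpus loading and evaluation rather than to this snippet.

```python
import time

from flair.data import Sentence
from flair.models import SequenceTagger

# pre-trained English NER model (the 'ner' model mentioned above)
tagger = SequenceTagger.load('ner')

# placeholder sentences; in the numbers above this would be the CoNLL-03 splits
sentences = [Sentence('George Washington went to Washington .') for _ in range(1000)]

start = time.time()
tagger.predict(sentences, mini_batch_size=32)  # mini_batch_size is illustrative
print(f'predicted {len(sentences)} sentences in {time.time() - start:.1f} s')
```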
