CPU storing option is slower than None storing during inference on GPU #1070

Closed
pommedeterresautee opened this issue Sep 5, 2019 · 2 comments · Fixed by #1074
Labels: question (Further information is requested)

Comments

@pommedeterresautee (Contributor)

Regarding the storing of embeddings, my understanding is the following:

  • gpu: dynamic embeddings (Flair LM) are deleted after each batch, static embeddings are kept on GPU
  • cpu: dynamic embeddings (Flair LM) are deleted after each batch, static embeddings are moved to RAM and, if required later, moved back to the GPU
  • none: dynamic and static embeddings are deleted after each batch
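
To make this concrete, here is a rough sketch of what the per-batch logic amounts to (the function name and the dict-based representation are my own simplification for illustration, not Flair's actual internals):

```python
import torch

def apply_storage_mode(embeddings: dict, mode: str) -> dict:
    """Illustrative per-batch handling of one token's static embeddings."""
    if mode == "none":
        # Drop everything; embeddings are recomputed on the next pass.
        return {}
    if mode == "cpu":
        # Park tensors in RAM; using them again requires a
        # host-to-device copy back onto the GPU.
        return {name: t.to("cpu") for name, t in embeddings.items()}
    # mode == "gpu": keep tensors where they are.
    return embeddings
```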

Expected inference time, from slowest to fastest:
none > cpu > gpu
However, after taking some measurements, I saw:
cpu (43s) > none (35s) > gpu (34s)

So I applied a profiler to the very same code and got the following graphs:

[Profiler call graph, storage mode none: anonymisation16_none]

[Profiler call graph, storage mode cpu: anonymisation15_cpu]

[Profiler call graph, storage mode gpu: anonymisation17_gpu]

Note: reported times are higher than the real times because of profiler overhead.
What appears is that moving tensors back from the CPU to the GPU is costly (on my configuration, for my dataset); that's why, for me, none is the better option during inference.
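
To quantify this on a given machine, a minimal round-trip timing sketch like the one below can be used (the tensor shape and iteration count are arbitrary; this is not Flair code, just a stand-in for what the cpu mode implies per batch):

```python
import time
import torch

assert torch.cuda.is_available(), "needs a CUDA device"

x = torch.randn(64, 128, 2048, device="cuda")  # stand-in for a batch of embeddings

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(10):
    y = x.to("cpu")      # what 'cpu' storage does after the batch
    z = y.to("cuda")     # what must be undone before the next use
torch.cuda.synchronize()
print(f"GPU->CPU->GPU round trip: {(time.perf_counter() - start) / 10:.4f} s/iter")
```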

I expect that my situation is not isolated and that most Flair users will see the same results.

So I am wondering whether a warning should be raised when the cpu storage option is used during inference on GPU.
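
A minimal sketch of what such a warning could look like (the function name and where exactly it would be called inside Flair's predict() are my assumptions):

```python
import warnings

import torch

def check_embedding_storage_mode(mode: str) -> None:
    # Hypothetical guard that a predict() implementation could call.
    if mode == "cpu" and torch.cuda.is_available():
        warnings.warn(
            "Embedding storage mode 'cpu' forces GPU->CPU->GPU copies during "
            "inference and is usually slower than 'none'; use it only if you "
            "need the embeddings after prediction."
        )
```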

My questions are:

  • can you check if you find the same pattern on your computers?
  • do you want me to push a PR for that? (plus a short explanation in the documentation, Python docstrings, and tutorial)
@pommedeterresautee added the question label on Sep 5, 2019
@pommedeterresautee changed the title from "CPU storing is slower than None storing" to "CPU storing option is slower than None storing during inference on GPU" on Sep 5, 2019
@alanakbik (Collaborator)

Hi @pommedeterresautee, thanks for sharing this analysis. What profiler are you using?

Yes, moving tensors to/from the GPU is costly, which is why we set 'none' as the default storage mode for the predict() method. The only reason to change this would be if we want to use not only the predictions but also the embeddings after prediction.
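
To make the trade-off concrete, a usage sketch (the embedding_storage_mode parameter name on predict() is an assumption on my side for the Flair version discussed here):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")
sentence = Sentence("George Washington went to Washington.")

# Default: embeddings are discarded after prediction; fastest for pure inference.
tagger.predict(sentence)

# Keep embeddings only if you need them afterwards, e.g. for analysis.
tagger.predict(sentence, embedding_storage_mode="gpu")  # parameter name assumed
for token in sentence:
    print(token.text, token.embedding.shape)
```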

A PR adding a warning would be great. We should probably also point out in the docs that during inference you almost always want storage mode 'none'.

@pommedeterresautee (Contributor, Author)

The profiler is cProfile, and the graphs are produced by PyCharm (paid version; I don't know whether the Community edition has this feature).
