How embeddings clearing works on embedding storage set to GPU #1076

Closed
pommedeterresautee opened this issue Sep 6, 2019 · 4 comments
Labels: question (Further information is requested)

Comments

@pommedeterresautee (Contributor) commented Sep 6, 2019

My understanding is that after each batch, all dynamic embeddings are cleared from GPU memory. That makes sense, as they are specific to each sentence and there is no obvious reason to keep them in memory.

However, when I run the GPU storage mode on my machine I get an OOM exception on the 72nd batch. My understanding is that the longest sentences are processed first (for padding optimization).
I use FastText + Flair embeddings.

Do I get this error only because more and more FastText word embeddings are kept in GPU memory?
That would be quite surprising, as the full FastText matrix takes 1.2 GB when serialized on disk, so I wonder whether I am missing something regarding the Flair embeddings.

Looking at memory consumption, it increases linearly with the number of batches; because of Zipf's law (new word types should appear less and less often), I was not expecting that.

I suspect this line is the cause of what I see: https://github.com/zalandoresearch/flair/blob/ddba219c1deea9c7d12725741cf8d041b68ae738/flair/training_utils.py#L354 (during inference there is no gradient, if I am right), so the dynamic embeddings stay in memory.
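To illustrate what I mean, here is a minimal, hypothetical sketch (the `Sentence` class, the `clear_embeddings` helper and the embedding names are illustrative stand-ins, not Flair's actual implementation): wrapping inference in `torch.no_grad()` means the per-sentence tensors carry no autograd graph, so dropping them after the batch actually releases GPU memory.

```python
import torch

class Sentence:
    """Toy stand-in for a sentence object holding named embedding tensors."""
    def __init__(self, text):
        self.text = text
        self.embeddings = {}  # name -> tensor

    def clear_embeddings(self, names=None):
        # Delete all embeddings, or only the named ("dynamic") ones.
        if names is None:
            self.embeddings.clear()
        else:
            for name in names:
                self.embeddings.pop(name, None)

def embed_batch(sentences, device="cpu"):
    # Stand-in for a character-LM forward pass producing per-sentence embeddings.
    with torch.no_grad():  # inference: no gradients, so the tensors hold no graph
        for s in sentences:
            s.embeddings["flair-lm"] = torch.randn(len(s.text.split()), 2048, device=device)

batch = [Sentence("a rather long sentence"), Sentence("short one")]
embed_batch(batch)
for s in batch:
    s.clear_embeddings(names=["flair-lm"])  # drop the dynamic tensors after the batch
```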

pommedeterresautee added the question label on Sep 6, 2019
@alanakbik (Collaborator) commented:

Hello @pommedeterresautee, yes, that is correct. However, the Flair embeddings are static by default, so they are kept in GPU memory as well. The reason is that fine-tuning is disabled by default for FlairEmbeddings, since across our experiments it worked much better to freeze the weights of the LM. This means all models that we distribute have non-dynamic Flair embeddings. So if you select 'gpu', this will slowly fill up your GPU memory, since Flair embeddings are quite large.
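As a rough back-of-envelope illustration of why 'gpu' storage fills up memory (the numbers below are assumptions, not measurements: a 4096-dimensional stacked forward/backward Flair embedding stored as float32, about 25 tokens per sentence, 100k sentences kept on the GPU):

```python
# Assumed numbers; adjust to your corpus and embedding setup.
dim = 4096                   # stacked forward + backward Flair embedding
bytes_per_value = 4          # float32
tokens_per_sentence = 25
num_sentences = 100_000

gib = dim * bytes_per_value * tokens_per_sentence * num_sentences / 1024**3
print(f"~{gib:.1f} GiB just to store the embeddings")  # ~38.1 GiB
```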

@pommedeterresautee (Contributor, Author) commented:

OK, I understand now. I had read "dynamic" (from the comment `# else delete only dynamic embeddings (otherwise autograd will keep everything in memory)`) as referring to the embeddings that are "dynamically" generated for each sentence (something the Flair LM does, but not FastText).

So, follow-up question: Flair embeddings are specific to each sentence, and keeping them on the GPU should not bring any speed improvement. Why would someone want to keep the Flair embeddings on the GPU? Same reason as #1070?

Can you tell me where the FastText embedding matrix is stored with the "none" option? I think it is in the machine's main RAM, and I wonder whether it would make sense to keep this kind of embedding in GPU RAM (because it never changes) while clearing the Flair embeddings after each batch.
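To make the idea concrete, here is an illustrative PyTorch sketch (not Flair's actual mechanism; the sizes are assumptions): a static lookup table like FastText has a fixed footprint determined by the vocabulary, so it could live on the GPU permanently, while per-sentence LM outputs grow with the corpus and can be dropped after each batch.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Static lookup table (FastText-like): size fixed by the vocabulary.
# 1,000,000 words x 300 dims x 4 bytes ≈ 1.2 GB, which could stay on the GPU.
vocab_size, word_dim = 1_000_000, 300
static_table = nn.Embedding(vocab_size, word_dim).to(device)

with torch.no_grad():
    token_ids = torch.randint(0, vocab_size, (32, 40), device=device)   # a batch of 32 sentences
    word_vecs = static_table(token_ids)                    # cheap lookup, constant memory
    contextual = torch.randn(32, 40, 4096, device=device)  # stand-in for per-sentence Flair output
    # ... run the model on the batch ...
    del word_vecs, contextual                              # per-batch tensors freed; the table stays
```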

@alanakbik (Collaborator) commented:

It makes sense to keep them in memory during training, since you do many epochs over the same training dataset. If the embeddings for all sentences are already generated, you can reuse them in the next epoch. This is why 'gpu' is the fastest option for training, and why 'cpu' is in most cases faster than 'none': generating embeddings is often slower than moving the tensors from CPU memory to the GPU.
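A simplified sketch of that trade-off (illustrative only, not Flair internals; the cache, tensor shapes and function name are assumptions): 'gpu' keeps computed embeddings on the GPU for reuse across epochs, 'cpu' keeps them in CPU RAM and copies them to the GPU per batch, 'none' recomputes them every epoch.

```python
import torch

cache = {}  # sentence_id -> stored embedding tensor

def get_embedding(sentence_id, storage_mode, device):
    if storage_mode != "none" and sentence_id in cache:
        return cache[sentence_id].to(device)   # reuse; .to() is a no-op if already on that device
    emb = torch.randn(30, 4096)                # stand-in for an expensive LM forward pass
    if storage_mode == "gpu":
        cache[sentence_id] = emb.to(device)    # fastest reuse, but slowly fills GPU memory
    elif storage_mode == "cpu":
        cache[sentence_id] = emb               # stays in CPU RAM, copied to the GPU per batch
    return emb.to(device)
```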

@pommedeterresautee (Contributor, Author) commented:

Thanks a lot @alanakbik, I focused so much on inference that I forgot to take training into account!
I will probably come up with new questions soon :-)
