Description
Hey,
I encountered a bug while using GloVe embeddings (`WordEmbeddings`) in a featurizer class.
Describe the bug
Repeatedly initializing and using `WordEmbeddings`, `DocumentEmbeddings` and their embedding methods on a sentence causes memory usage to accumulate: the memory is not released after the function returns. This leads to an OOM error in my application.
A reproducible example is given below.
To Reproduce
After each function call, memory usage grows and is not released, even though the objects are deleted and the embeddings are cleared.
The `clear_embeddings()` method deletes the entries in `Sentence._embeddings` as expected, but no memory is released nonetheless.
```python
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings
from flair.data import Sentence
from memory_profiler import profile


@profile
def featurizer_memory_test():
    # Load the German word embeddings and wrap them in a pooled document embedding
    glove_embeddings = WordEmbeddings("de")
    doc_embeddings = DocumentPoolEmbeddings([glove_embeddings])

    sentence = Sentence("Hello World!")
    glove_embeddings.embed(sentence)

    # Drop the embeddings stored on the sentence, then delete the embedding objects
    sentence.clear_embeddings()
    del glove_embeddings
    del doc_embeddings


if __name__ == "__main__":
    # Memory grows with every call instead of being released
    for _ in range(2):
        featurizer_memory_test()
```
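To tell whether the embedding object itself leaks, or whether only the process memory stays high, one can check with a weak reference whether the object is actually collected once the function returns. This is a minimal sketch under assumptions: `HypotheticalEmbeddings` is a placeholder, not a flair class. In the real test, one would take `weakref.ref(glove_embeddings)` inside `featurizer_memory_test` and return it. Also note that memory_profiler reports the resident memory of the whole process, which may stay high even after objects are freed, because the allocator does not necessarily return freed memory to the operating system.

```python
import gc
import weakref


class HypotheticalEmbeddings:
    """Hypothetical stand-in for a large embedding object (not a flair class)."""

    def __init__(self, size: int = 1_000_000):
        # A large list simulates the memory held by loaded word vectors.
        self.vectors = [0.0] * size


def is_freed_after_scope() -> bool:
    """Create an object inside a function scope and report whether it was collected."""

    def make_and_drop():
        emb = HypotheticalEmbeddings()
        # Return only a weak reference, so no strong reference escapes the scope.
        return weakref.ref(emb)

    ref = make_and_drop()
    gc.collect()  # also collect reference cycles, which refcounting alone misses
    # A dead weak reference returns None, meaning the object was deallocated.
    return ref() is None


print(is_freed_after_scope())  # prints True on CPython
```

If the weak reference is still alive for the real `WordEmbeddings` object after `gc.collect()`, something is holding a strong reference to it; if it is dead but RSS stays high, the objects are freed and the memory is simply not handed back to the OS.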
The problem seems to be tied to the initialization of `WordEmbeddings("de")`. When I initialize the embeddings once in a class constructor and reuse them, I don't run into the issue:
```python
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings
from flair.data import Sentence
from memory_profiler import profile


class Embedder:
    def __init__(self):
        # Load the word embeddings once and keep them on the instance
        self.glove_embeddings = WordEmbeddings("de")

    @profile
    def featurizer_memory_test(self):
        # Reuse the already loaded word embeddings on every call
        doc_embeddings = DocumentPoolEmbeddings([self.glove_embeddings])
        sentence = Sentence("Hello World!")
        doc_embeddings.embed(sentence)
        sentence.clear_embeddings()


if __name__ == "__main__":
    embedder = Embedder()
    for _ in range(2):
        embedder.featurizer_memory_test()
```
However, this is not a fix for me: I would have to write a singleton in my application, because even though the embedding is a local variable of the function, its memory is not released when the function finishes executing.
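If reusing a single instance is the workaround, a module-level cache can stand in for a hand-written singleton class. A minimal sketch, assuming one embeddings object per name is acceptable for the application: `PlaceholderEmbeddings` and `get_word_embeddings` are hypothetical names, and in real code the cached function would construct `flair.embeddings.WordEmbeddings(name)` instead.

```python
from functools import lru_cache


class PlaceholderEmbeddings:
    """Hypothetical stand-in for flair's WordEmbeddings so the sketch runs without flair."""

    load_count = 0  # counts how often the expensive "load" actually happens

    def __init__(self, name: str):
        PlaceholderEmbeddings.load_count += 1
        self.name = name


@lru_cache(maxsize=None)
def get_word_embeddings(name: str) -> PlaceholderEmbeddings:
    # In real code this body would construct flair.embeddings.WordEmbeddings(name).
    # lru_cache keeps one instance per name for the lifetime of the process.
    return PlaceholderEmbeddings(name)


def featurize(text: str) -> str:
    # Every call reuses the same cached embeddings object instead of reloading it.
    embeddings = get_word_embeddings("de")
    return f"{embeddings.name}:{text}"
```

This does not fix the underlying leak; it only bounds memory to one load per embedding name. Calling `get_word_embeddings.cache_clear()` drops the cached reference if the embeddings are no longer needed.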
Expected behavior
I would expect memory to be released properly after each function call. That is, when I run a memory profiler on this example, the initial memory in the second run should not be higher than in the first run.
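The expectation above (no higher baseline in the second run) can be turned into a repeatable check with the standard library's `tracemalloc`. A minimal sketch, assuming `hypothetical_featurizer` is a placeholder for `featurizer_memory_test`: `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries (for example PyTorch tensors) may not show up, and memory_profiler's RSS numbers remain the relevant measure in that case.

```python
import gc
import tracemalloc
from typing import Callable, List


def traced_memory_after_runs(func: Callable[[], object], runs: int = 2) -> List[int]:
    """Call func several times; return Python-traced memory (bytes) still held after each call."""
    tracemalloc.start()
    retained: List[int] = []
    try:
        for _ in range(runs):
            func()
            gc.collect()  # collect reference cycles before measuring
            current, _peak = tracemalloc.get_traced_memory()
            retained.append(current)
    finally:
        tracemalloc.stop()
    return retained


def hypothetical_featurizer() -> int:
    # Placeholder for featurizer_memory_test: allocates a large temporary list.
    data = [0.0] * 1_000_000
    return len(data)


first, second = traced_memory_after_runs(hypothetical_featurizer)
# If memory is released, the second value should not grow past the first.
print(first, second)
```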
Environment
- Ubuntu 18.04
- Python 3.7.8
- flair 0.6.1.post1, flair 0.6.1, flair 0.7