Can't use nltk functions in parallel programs #1576
Try importing nltk inside the worker function. In your case it should be …
Thanks @dimazest.
Hey, I am having a similar issue. Is there any way to solve this error? I have tried importing nltk locally and tried the solution on this page. Thanks in advance!
Any news here? Is NLTK thread-safe?
It would help to have a minimal reproducible test case for this issue.
@purificant Simply running WordNet synsets lookups in multiple threads generates these annoying problems.
Is there any progress on this? I am running into this when using pytest-xdist to parallelize a series of pytest unit tests. In a module, I import wordnet, and that module is used for several parallelized tests. At the pytest-xdist level, I have no control over sequestering this into an isolated function to be called in each worker thread.
I solved it by wrapping the `wn.synsets(word)` call with a lock.
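A minimal sketch of the lock approach, assuming a single module-level lock shared by every thread. `locked` and `fake_synsets` are hypothetical names: `fake_synsets` stands in for `wn.synsets` so the example runs without the WordNet data, and it counts how many threads are inside it at once to show that the lock serializes the calls.

```python
import functools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def locked(func):
    """Return a wrapper that lets only one thread at a time run func."""
    lock = threading.Lock()  # one lock shared by every caller of the wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)

    return wrapper


# Hypothetical stand-in for wn.synsets so the sketch runs without NLTK data.
# It records the largest number of threads that were inside it at once.
_active = 0
max_active = 0


def fake_synsets(word):
    global _active, max_active
    _active += 1
    max_active = max(max_active, _active)
    time.sleep(0.01)  # widen the window in which unlocked threads could overlap
    _active -= 1
    return [f"{word}.n.01"]


safe_synsets = locked(fake_synsets)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(safe_synsets, ["dog", "cat", "bank", "run"]))

print(results)
print("max threads inside at once:", max_active)
```

With NLTK, the same wrapper would be applied as `safe_synsets = locked(wn.synsets)` after `from nltk.corpus import wordnet as wn`. Looking up `wn.synsets` to build the wrapper already forces the lazy corpus load in the calling thread, which by itself may be part of why this works. The trade-off is that every lookup is serialized, so the threads get no parallel speedup on that call.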
@ndvbd In the context of parallel unit test execution with pytest-xdist, this does not appear to work. I placed the import of wordnet into a session-level pytest fixture and then made all the relevant functions accept the module itself as an argument. A test file then uses the fixture.
I found that if I don't wrap the import of wordnet inside a pytest fixture like this, I can't even get pytest past the test collection phase. It seems to choke immediately at the first attempt to import my module, where I import wordnet and call … Even after getting it past module imports, this fixture approach creates all kinds of crazy issues with parallel test execution, ranging from unexplainable deadlocks to crashing test workers and errant attempts to re-download wordnet (even though it is verifiably installed in the nltk data folder used by all the tests). I am at a loss as to what else to try.
When I use the nltk stemmer and lemmatizer in a multithreaded program, only one thread works fine and the others throw this exception:
```
Exception in thread 2:
Traceback (most recent call last):
  File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "D:/iaun/0_Tez/141095-TFATOwithdiscriminative/01/11_MultiThreadedDoctTermInfo.py", line 21, in myfunc
    lemma = lemmatizer.lemmatize(stem)
  File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\corpus\util.py", line 99, in __getattr__
    self.__load()
  File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\corpus\util.py", line 73, in __load
    args, kwargs = self.__args, self.__kwargs
AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
```
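The last frames show `LazyCorpusLoader.__load` in `nltk/corpus/util.py` reading its own `__args` and failing because the object already reports itself as a `WordNetCorpusReader`. That pattern suggests two threads both started the first lazy load, and the first to finish replaced the loader with the reader (apparently by swapping its `__class__` and `__dict__`) while the second was still inside `__load`. A common workaround, assuming that diagnosis, is to force the load once in the main thread before starting any workers. The sketch below uses hypothetical stand-ins (`LazyLoader`, `_Reader`) that imitate this self-replacing trick so it runs without NLTK installed; it is not NLTK's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


class _Reader:
    """Hypothetical stand-in for the real corpus reader (e.g. WordNetCorpusReader)."""

    def lemmatize(self, word):
        return word.rstrip("s")


class LazyLoader:
    """Hypothetical stand-in that turns itself into a _Reader on first use."""

    def __init__(self):
        self.__args = ()  # stored as _LazyLoader__args

    def __load(self):
        # If another thread already swapped the class, this lookup raises
        # AttributeError: '_Reader' object has no attribute '_LazyLoader__args'
        args = self.__args
        reader = _Reader(*args)
        # Become the reader in place by taking over its state and class.
        self.__dict__ = reader.__dict__
        self.__class__ = reader.__class__

    def __getattr__(self, attr):
        # Only reached for attributes not found normally, i.e. before loading.
        self.__load()
        return getattr(self, attr)


corpus = LazyLoader()

# Workaround: trigger the one-time load in the main thread, before any workers start.
corpus.lemmatize("warmup")

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(corpus.lemmatize, ["dogs", "cats", "trees"]))

print(results)
```

With NLTK itself, the equivalent warm-up would be a single call such as `wordnet.synsets("dog")` (or one `lemmatize` call) in the main thread before the threads are started, so no worker ever enters the lazy-load path.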