
Can't use nltk functions in parallel programs #1576

Closed
ghost opened this issue Jan 3, 2017 · 9 comments

ghost commented Jan 3, 2017

When I use the NLTK stemmer and lemmatizer in a multithreaded program, only one thread works fine and the other threads throw this exception:

Exception in thread 2:
Traceback (most recent call last):
File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "D:/iaun/0_Tez/141095-TFATOwithdiscriminative/01/11_MultiThreadedDoctTermInfo.py", line 21, in myfunc
lemma = lemmatizer.lemmatize(stem)
File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
lemmas = wordnet._morphy(word, pos)
File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\corpus\util.py", line 99, in getattr
self.__load()
File "C:\Users\Danial\AppData\Local\Programs\Python\Python35\lib\site-packages\nltk\corpus\util.py", line 73, in __load
args, kwargs = self.__args, self.__kwargs
AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

Contributor

dimazest commented Jan 4, 2017

Try importing nltk inside the worker function. In your case that would be myfunc.
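
For reference, a minimal sketch of that suggestion (the worker name myfunc and the sample word list are just placeholders, and it assumes the wordnet data is already downloaded):

import threading

def myfunc(words):
    # Import nltk inside the worker, as suggested above, so the corpus
    # machinery is set up from the thread that actually uses it.
    import nltk
    lemmatizer = nltk.stem.WordNetLemmatizer()
    return [lemmatizer.lemmatize(w) for w in words]

threads = [threading.Thread(target=myfunc, args=(["running", "cats"],)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()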

stevenbird (Member) commented

Thanks @dimazest.
Is this now resolved @danitiger33?


lasit19 commented May 30, 2018

Hey, I am having a similar issue. Is there any way to solve this error? I have tried importing nltk locally and tried the solution on this page.

Thanks in advance!


ndvbd commented May 5, 2019

Any news here? Is NLTK thread safe?

purificant (Member) commented

It would help to have a minimal reproducible test case for this issue.


ndvbd commented Jul 12, 2019

@purificant Simply running wordnet synsets in multiple threads generates these annoying problems.
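
For anyone who needs that minimal test case, something along these lines should reproduce it intermittently (it is a race on the first corpus access, so it may take a few runs in a fresh interpreter):

import threading
from nltk.corpus import wordnet as wn

def worker():
    # The first call to wn.synsets triggers the lazy corpus load; several
    # threads hitting it at once is what races.
    print(wn.synsets("dog")[:3])

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()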

spearsem commented

Is there any progress on this? I am running into this when using pytest-xdist to parallelize a series of pytest unit tests. In a module, I import wordnet, and that module is used for several parallelized tests. At the pytest-xdist level, I have no control over sequestering this into an isolated function to be called in each worker thread.


ndvbd commented Jul 26, 2019

I solved it by wrapping the wn.synsets(word) with a lock:


from threading import Lock
from nltk.corpus import wordnet as wn

synset_lock = Lock()

# Only one thread at a time touches the lazily loaded corpus.
with synset_lock:
    all_synsets = list(wn.synsets(word))
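
A related approach, for what it's worth: force the lazy loader to resolve once in the main thread before any workers start (a sketch, assuming the wordnet data is already downloaded):

import threading
from nltk.corpus import wordnet as wn

# Resolve the LazyCorpusLoader before spawning threads.
wn.ensure_loaded()

def worker(word):
    print(len(wn.synsets(word)))

threads = [threading.Thread(target=worker, args=("dog",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()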


spearsem commented Jul 26, 2019

@ndvbd in the context of parallel unit test execution with pytest-xdist, this does not appear to work. I placed the import of wordnet into a session-level pytest fixture, and then made all the relevant functions accept the module itself as an argument, such as:

from multiprocessing import Lock

synset_lock = Lock()  # multiprocessing.Lock for pytest-xdist

def get_synsets(word, pos, wordnet):
    # Serialize access to the lazily loaded corpus.
    with synset_lock:
        all_synsets = list(wordnet.synsets(word, pos=pos))
    return all_synsets

and then a test file will use the fixture

import nltk
import pytest

@pytest.fixture(scope='session')
def threadsafe_wordnet():
    nltk.download('wordnet')
    from nltk.corpus import wordnet
    wordnet.ensure_loaded()  # force the lazy corpus to load up front
    return wordnet

such as

@pytest.mark.parametrize('word', ['dog', 'cat', 'bat'])
def test_wordnet(threadsafe_wordnet, word):
    assert len(get_synsets(word, threadsafe_wordnet.NOUN, threadsafe_wordnet)) >= 1

I found that if I don't wrap the import of wordnet inside a pytest fixture like this, I can't even get pytest to proceed past the test collection phase. It seems to choke immediately at the first attempt to import my module where I am importing wordnet and calling nltk.download.

Even after getting it to proceed past module imports, this fixture approach creates all kinds of crazy issues with parallel test execution, ranging from unexplainable deadlocks to crashing test workers and errant attempts to re-download wordnet (even though it is verifiably installed in the nltk data folder used by all the tests).

I feel very much at a loss of what else to try.
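
One thing that might help with the errant re-download attempts specifically: only call nltk.download when the data is actually missing (a sketch; it does not address the deadlocks):

import nltk

# Skip the download entirely if wordnet is already in the nltk data path.
try:
    nltk.data.find("corpora/wordnet")
except LookupError:
    nltk.download("wordnet")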
