CoherenceModel does not finish with computing #3368

Closed
PrimozGodec opened this issue Jul 25, 2022 · 5 comments · Fixed by #3406
Labels
bug (Issue described a bug) · difficulty easy (Easy issue: required small fix) · impact HIGH (Show-stopper for affected users) · reach LOW (Affects only niche use-case users)

@PrimozGodec
Contributor

Problem description

When computing coherence scores on a somewhat larger dataset, the computation never finishes. Run the code below (with the provided dataset) to reproduce.

Steps/code/corpus to reproduce

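    import pickle
    import time
    from datetime import datetime

    from gensim.models.coherencemodel import CoherenceModel

    if __name__ == "__main__":  # needed for multiprocessing with the "spawn" start method (macOS default)
        with open("coherence-bug.pkl", "rb") as f:
            model, tokens = pickle.load(f)

        print("coherence")
        print(datetime.now())
        t = time.time()
        cm = CoherenceModel(model=model, texts=tokens, coherence="c_v")
        coherence = cm.get_coherence()
        print(time.time() - t)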

coherence-bug.pkl.zip
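When reproducing a hang like this one, it can help to have the call fail after a deadline instead of blocking forever. Below is a minimal sketch with a hypothetical helper, call_with_timeout (not part of Gensim), which runs the call in a daemon thread and raises TimeoutError if it has not returned in time. Caveat: the stuck call keeps running in the background, and any worker processes it started are not stopped, so they may still keep the script from exiting.

```python
import threading


def call_with_timeout(fn, seconds):
    """Run fn() in a daemon thread; raise TimeoutError if it has not returned in time.

    The thread itself cannot be stopped: it keeps running in the background, and
    because it is a daemon thread it will not keep the interpreter alive at exit.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised in the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    thread.join(seconds)
    if thread.is_alive():
        raise TimeoutError(f"call did not finish within {seconds} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Hypothetical usage with the reproducer: coherence = call_with_timeout(cm.get_coherence, 600).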

Versions

The bug appears in Gensim 4.2 but does not happen in 4.1.2.

macOS-10.16-x86_64-i386-64bit
Python 3.8.12 (default, Oct 12 2021, 06:23:56)
[Clang 10.0.0 ]
Bits 64
NumPy 1.22.3
SciPy 1.8.1
gensim 4.2.1.dev0
FAST_VERSION 0

@piskvorky
Owner

piskvorky commented Jul 25, 2022

@silviatti could you check this one please? #3197 was the only change in CoherenceModel, although I don't see how it's related.

@PrimozGodec could you interrupt your stuck computation with ctrl-c and post the traceback? Thanks.
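As an alternative to interrupting with Ctrl-C, Python's standard-library faulthandler module can periodically dump the stacks of all threads in the current process while a call is stuck. A minimal sketch, assuming a hypothetical helper run_with_stack_dumps (not part of Gensim). Note that faulthandler only covers the process it is enabled in, so the stacks of worker child processes (such as the AccumulatingWorker processes) are not included.

```python
import faulthandler
import sys


def run_with_stack_dumps(fn, every_seconds, file=None):
    """Call fn() and, while it runs, dump the stacks of all threads in this
    process every `every_seconds` seconds (to stderr by default)."""
    out = file if file is not None else sys.stderr  # must have a real fileno()
    faulthandler.dump_traceback_later(every_seconds, repeat=True, file=out)
    try:
        return fn()
    finally:
        # Stop the watchdog so no further dumps happen after fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

Hypothetical usage: coherence = run_with_stack_dumps(cm.get_coherence, 60).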

@piskvorky added the labels bug, difficulty easy, impact HIGH, and reach LOW on Jul 25, 2022
@piskvorky added this to the Next release milestone on Jul 25, 2022
@PrimozGodec
Contributor Author

Thank you for your fast response. Here is the traceback.

    Process AccumulatingWorker-1:
    Process AccumulatingWorker-2:
    Process AccumulatingWorker-3:
    Traceback (most recent call last):
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
        util._exit_function()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
        _run_finalizers()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
        finalizer()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
        thread.join()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1011, in join
        self._wait_for_tstate_lock()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):
    KeyboardInterrupt
    Traceback (most recent call last):
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
        util._exit_function()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
        _run_finalizers()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
        finalizer()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
        thread.join()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1011, in join
        self._wait_for_tstate_lock()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):
    KeyboardInterrupt
    Traceback (most recent call last):
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
        util._exit_function()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
        _run_finalizers()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
        finalizer()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
        res = self._callback(*self._args, **self._kwargs)
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
        thread.join()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1011, in join
        self._wait_for_tstate_lock()
      File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):
    KeyboardInterrupt

@nadiaelen

This comment was marked as abuse.

@mpenkov changed the title from "ConherenceModel does not finish with computing" to "CoherenceModel does not finish with computing" on Aug 22, 2022
@felixrech

I also had the issue that the cm.get_coherence() call would not terminate for larger lists of texts. Here's how I "fixed" it:

The underlying issue was not actually in Gensim but a problem on my end. It's just that (I presume) because of the multiprocessing, Gensim did not raise the error but simply never terminated. You can find out whether you have the same problem with the following steps:

  1. Disable multiprocessing: cm = CoherenceModel(model=model, texts=tokens, coherence="c_v", processes=1)
  2. Rerun the code and see if there is an error.

For me it was an IndexError: a bug in my upstream code let some empty texts (i.e. empty lists) sneak into the final tokens list. If you also get an error, you might be able to fix it like this:
  3. Identify the underlying upstream bug and fix it.
  4. Rerun the code. cm.get_coherence() should now return a coherence score!
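Before rerunning, the input can also be checked directly for the empty texts described above. A minimal sketch with two hypothetical helpers (not part of Gensim) that report and drop documents without tokens. Dropping them only hides the symptom, so step 3 (fixing the upstream bug) still applies.

```python
def find_empty_texts(texts):
    """Return the indices of documents (token lists) that contain no tokens."""
    return [i for i, doc in enumerate(texts) if len(doc) == 0]


def drop_empty_texts(texts):
    """Return a new list without the empty documents."""
    return [doc for doc in texts if len(doc) > 0]


# Hypothetical usage with the `tokens` list from the report above:
#   empty = find_empty_texts(tokens)
#   if empty:
#       print(f"{len(empty)} empty texts, e.g. at indices {empty[:10]}")
```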

I hope this helps! I'm not sure whether @PrimozGodec and @nadiaelen are facing the same issue, but the root cause might still lie somewhere in the multiprocessing.
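The hang-instead-of-error behaviour is a common failure mode of worker pools: if a worker dies on an exception, the parent can block forever waiting for a result that never arrives. A minimal sketch of the usual remedy, using threads and queue.Queue for brevity (the same idea applies to multiprocessing.Queue); this illustrates the general pattern and is not Gensim's implementation. Each worker catches its own exceptions and sends them back, so the parent can re-raise them. The compute function is a hypothetical stand-in that raises IndexError on an empty text.

```python
import queue
import threading
import traceback


def compute(tokens):
    # Hypothetical per-document work: reading the first token raises
    # IndexError on an empty text, like the upstream bug described above.
    return len(tokens[0])


def worker(jobs, results):
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        try:
            results.put(("ok", compute(job), None))
        except Exception as exc:
            # Report the failure instead of letting the worker die silently,
            # which would leave the parent waiting on results.get() forever.
            results.put(("error", exc, traceback.format_exc()))


def run(texts, n_workers=3):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for text in texts:
        jobs.put(text)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker

    # Exactly one message arrives per job, whether it succeeded or failed.
    outcomes = [results.get() for _ in texts]
    for t in threads:
        t.join()

    for status, payload, tb in outcomes:
        if status == "error":
            raise RuntimeError(f"worker failed:\n{tb}") from payload
    return [payload for _, payload, _ in outcomes]  # in completion order
```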

@PrimozGodec
Contributor Author

@felixrech thank you for the suggestion. When switching to processes=1, I found the error.
