CoherenceModel does not finish with computing #3368

PrimozGodec · 2022-07-25T07:23:04Z

Problem description

When computing coherence scores, the computation never finishes on a somewhat larger dataset. Run the code below (with the provided dataset) to reproduce.

Steps/code/corpus to reproduce

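    import pickle
    import time
    from datetime import datetime

    from gensim.models import CoherenceModel

    # Load the pickled (model, tokens) pair from the attached archive.
    with open("coherence-bug.pkl", "rb") as f:
        model, tokens = pickle.load(f)

    print("coherence")
    print(datetime.now())
    t = time.time()
    cm = CoherenceModel(model=model, texts=tokens, coherence="c_v")
    coherence = cm.get_coherence()  # never returns on Gensim 4.2
    print(time.time() - t)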

coherence-bug.pkl.zip

Versions

The bug appears on Gensim version 4.2, but it does not happen on 4.1.2

macOS-10.16-x86_64-i386-64bit
Python 3.8.12 (default, Oct 12 2021, 06:23:56)
[Clang 10.0.0 ]
Bits 64
NumPy 1.22.3
SciPy 1.8.1
gensim 4.2.1.dev0
FAST_VERSION 0

piskvorky · 2022-07-25T08:43:40Z

@silviatti could you check this one please? #3197 was the only change in CoherenceModel, although I don't see how it's related.

@PrimozGodec could you interrupt your stuck computation with ctrl-c and post the traceback? Thanks.

PrimozGodec · 2022-07-25T09:07:37Z

Thank you for your fast response. Here is the traceback.

Process AccumulatingWorker-1:
Process AccumulatingWorker-2:
Process AccumulatingWorker-3:
Traceback (most recent call last):
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
    thread.join()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
Traceback (most recent call last):
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
    thread.join()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
Traceback (most recent call last):
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
    thread.join()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Users/primoz/miniconda3/envs/orange3/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

felixrech · 2022-10-29T11:43:02Z

I also had the issue that the cm.get_coherence() call would not terminate for larger lists of texts. Here's how I "fixed" it:

The base issue was not actually due to Gensim, but a problem on my end. It's just that (I presume) due to multiprocessing Gensim did not properly raise an error but simply never terminated. You can find out if you have the same problem using the following steps:

1. Disable multiprocessing: cm = CoherenceModel(model=model, texts=tokens, coherence="c_v", processes=1)
2. Rerun the code and see if there is an error.

For me it was an IndexError: there was a bug in my upstream code, and some empty texts, i.e. empty lists, snuck into the final tokens list. If you also get an error, you might be able to fix it like this:

3. Identify the underlying upstream bug and fix it.
4. Rerun the code. cm.get_coherence() should now return a coherence score!

I hope this helps! Though I'm not sure if @PrimozGodec and @nadiaelen are facing the same issue, the root cause might still lie somewhere with multiprocessing.
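A minimal sketch of the check described above, assuming the hang is caused by empty documents in the tokens list. The helper name split_empty_documents is hypothetical and not part of Gensim; it only reports where the empty documents are so the upstream bug can be traced, and returns a cleaned list as a stopgap.

```python
from typing import List, Sequence, Tuple


def split_empty_documents(
    tokens: Sequence[Sequence[str]],
) -> Tuple[List[Sequence[str]], List[int]]:
    """Hypothetical helper (not a Gensim API).

    Returns the non-empty documents and the indices of the empty ones.
    """
    kept: List[Sequence[str]] = []
    empty_indices: List[int] = []
    for index, document in enumerate(tokens):
        if len(document) == 0:
            empty_indices.append(index)
        else:
            kept.append(document)
    return kept, empty_indices


# Toy data standing in for the real tokens list.
tokens = [["topic", "model"], [], ["coherence", "score"], []]
cleaned, empty_indices = split_empty_documents(tokens)
print(empty_indices)  # prints [1, 3]
```

If empty_indices is non-empty, those positions point at the documents to investigate upstream. Passing cleaned instead of tokens to CoherenceModel(model=model, texts=cleaned, coherence="c_v", processes=1) is one way to confirm whether the empty documents were the cause, though it may hide the real upstream problem.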

PrimozGodec · 2022-11-18T13:30:08Z

@felixrech thank you for the suggestion. When switching to processes=1 I found the error.

piskvorky added the labels bug (Issue described a bug), difficulty easy (Easy issue: required small fix), impact HIGH (Show-stopper for affected users), reach LOW (Affects only niche use-case users) Jul 25, 2022

piskvorky added this to the Next release milestone Jul 25, 2022

PrimozGodec mentioned this issue Aug 1, 2022

Topic Modeling Crash biolab/orange3-text#820

Closed

This comment was marked as abuse.

mpenkov changed the title ~~ConherenceModel does not finish with computing~~ CoherenceModel does not finish with computing Aug 22, 2022

PrimozGodec mentioned this issue Nov 18, 2022

Patch Coherence Model to correctly handle empty documents #3406

Merged

mpenkov closed this as completed in #3406 Dec 6, 2022
