Bug OOV words in WECoherenceCentroid #45

Closed
rsimd opened this issue Nov 28, 2021 · 1 comment

rsimd commented Nov 28, 2021

OCTIS version: 1.10.0
Python version: 3.9.7
Operating System:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"

Description

On this line (https://github.com/MIND-Lab/OCTIS/blob/master/octis/evaluation_metrics/coherence_metrics.py#L180), topic[0] is a word. If that word is not in the vocabulary of self._wv, the lookup raises a KeyError.

Gensim's KeyedVectors class exposes a vector_size attribute, so I think this line should build the zero vector from vector_size instead of looking up a word:

#t = [0] * len(self._wv.__getitem__(topic[0]))
t = np.zeros(self._wv.vector_size)
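
To illustrate the idea, here is a minimal sketch in plain Python. It is not the OCTIS implementation: TinyVectors is a hypothetical stand-in for gensim's KeyedVectors that only mimics vector_size, membership tests, and the KeyError on a missing key, and topic_centroid is a made-up helper name. Skipping OOV words while averaging is my own assumption about a reasonable follow-up; the proposal above only changes how the zero vector is created.

```python
class TinyVectors:
    """Hypothetical stand-in for gensim's KeyedVectors (only what this sketch needs)."""

    def __init__(self, table, vector_size):
        self._table = table
        self.vector_size = vector_size

    def __contains__(self, key):
        return key in self._table

    def __getitem__(self, key):
        # Mimics the KeyError shown in the traceback for unknown words.
        if key not in self._table:
            raise KeyError(f"Key '{key}' not present")
        return self._table[key]


def topic_centroid(wv, topic):
    """Average the vectors of in-vocabulary words; OOV words are skipped (assumption)."""
    # Sized from vector_size, so no word lookup is needed to learn the dimension.
    centroid = [0.0] * wv.vector_size
    known = [word for word in topic if word in wv]
    for word in known:
        for i, value in enumerate(wv[word]):
            centroid[i] += value
    if known:
        centroid = [value / len(known) for value in centroid]
    return centroid, len(known)


wv = TinyVectors({"topic": [1.0, 0.0], "model": [0.0, 1.0]}, vector_size=2)
# "elsevi" is out of vocabulary, as in the reported traceback.
centroid, n_known = topic_centroid(wv, ["elsevi", "topic", "model"])
print(centroid, n_known)
```

With the OOV word in first position, the original pattern (looking up topic[0] just to get the vector length) would raise, while the sketch returns a centroid built from the two known words.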

Example error message

  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/octis-1.10.0-py3.9.egg/octis/evaluation_metrics/coherence_metrics.py", line 180, in score
    t = [0] * len(self._wv.__getitem__(topic[0]))
  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 395, in __getitem__
    return self.get_vector(key_or_keys)
  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 438, in get_vector
    index = self.get_index(key)
  File "/root/.cache/pypoetry/virtualenvs/sktopic-L2WRRFYm-py3.9/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 412, in get_index
    raise KeyError(f"Key '{key}' not present")
KeyError: "Key 'elsevi' not present"
silviatti changed the title from "Please fix the code that is likely to cause KeyError in WECoherenceCentroid" to "Bug OOV words in WECoherenceCentroid" on Dec 7, 2021
@silviatti
Collaborator

Hello,
thanks for reporting this issue and for your patience. I'm working on it and I will fix this by tomorrow.

Silvia
