lemmatize: StopIteration error in Python 3.7 #2438

Closed
ajdapretnar opened this issue Apr 8, 2019 · 15 comments

Comments

@ajdapretnar

Problem description

Trying to run simple lemmatization as described in the documentation. Getting:
RuntimeError: generator raised StopIteration

Steps/code/corpus to reproduce

from gensim.utils import lemmatize
lemmatize('Hello World! How is it going?! Nonexistentword, 21')

Versions

Darwin-17.7.0-x86_64-i386-64bit
Python 3.7.2 (default, Dec 29 2018, 00:00:04) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.16.2
SciPy 1.2.1
gensim 3.7.1
FAST_VERSION 1
@piskvorky
Owner

piskvorky commented Apr 8, 2019

@ajdapretnar what's your version of Pattern? (import pattern; print(pattern.__version__))

Also, please include the full stack trace.

@ajdapretnar
Author

'3.6'

@piskvorky
Owner

And what's the stack trace?

@ajdapretnar
Author

Traceback (most recent call last):
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/gensim/utils.py", line 1692, in lemmatize
    parsed = parse(content, lemmata=True, collapse=False)
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/en/__init__.py", line 169, in parse
    return parser.parse(s, *args, **kwargs)
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/__init__.py", line 1183, in parse
    s[i] = self.find_lemmata(s[i], **kwargs)
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/en/__init__.py", line 107, in find_lemmata
    return find_lemmata(tokens)
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/en/__init__.py", line 99, in find_lemmata
    lemma = conjugate(word, INFINITIVE) or word
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/__init__.py", line 2208, in conjugate
    b = self.lemma(verb, parse=kwargs.get("parse", True))
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/__init__.py", line 2172, in lemma
    self.load()
  File "/Users/ajda/miniconda3/envs/o3/lib/python3.7/site-packages/pattern/text/__init__.py", line 2127, in load
    for v in _read(self._path):
RuntimeError: generator raised StopIteration

@piskvorky
Owner

piskvorky commented Apr 8, 2019

Seems unrelated to Gensim; try contacting the Pattern maintainers. FWIW, I have Pattern 2.6 and lemmatize works without problems there.

@ajdapretnar
Author

Strangely, it works when I am using just Pattern itself.

from pattern.en import lemma
s = 'Hello World! How is it going?! Nonexistentword, 21'  # same sentence as in the report above
[lemma(w) for w in s.split(' ')]

I will investigate this a bit further, but it seems like a common Python 3.7 problem.

@piskvorky
Owner

piskvorky commented Apr 8, 2019

Great, thanks. Let us know if Pattern changed its APIs recently, and the issue is somehow connected to Gensim after all.

Note that Gensim is using pattern.en.parse(content, lemmata=True, collapse=False) to get the lemmata & POS tags.

@zeyger

zeyger commented Jul 24, 2019

Actually, it's related to new Python 3.7 behavior:

PEP 479 is enabled for all code in Python 3.7, meaning that StopIteration exceptions raised directly or indirectly in coroutines and generators are transformed into RuntimeError exceptions. (Contributed by Yury Selivanov in bpo-32670.)

https://stackoverflow.com/questions/51700960/runtimeerror-generator-raised-stopiteration-every-time-i-try-to-run-app

Switching to Python 3.6 should solve the issue.
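
The PEP 479 behavior quoted above can be reproduced without pattern or gensim. This is a minimal sketch, assuming Python 3.7 or later; the names old_style, new_style and drain are made up for illustration:

```python
def old_style():
    yield 1
    raise StopIteration  # pre-PEP 479 way of ending a generator


def new_style():
    yield 1
    return  # the supported way to end a generator early


def drain(gen_func):
    """Consume the generator and report a RuntimeError instead of crashing."""
    try:
        return list(gen_func())
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


old_result = drain(old_style)
new_result = drain(new_style)
print(old_result)
print(new_result)
```

On Python 3.7+, the first generator fails with the same message as in the traceback above, while the second simply ends.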

@mpenkov
Collaborator

mpenkov commented Jul 25, 2019

Sounds like something the pattern guys should fix, right?

@NicolasBizzozzero

I tried to open a PR fixing this issue, but pattern seems to have been abandoned since August 2018. I hope it will be fixed one day, because it makes gensim's lemmatizer unusable on every Python version >= 3.7.

@mpenkov
Collaborator

mpenkov commented Dec 21, 2019

@NicolasBizzozzero Can you please elaborate? I thought gensim has only a soft dependency on pattern, and if that library is not available, then things still work?

@vquilon

vquilon commented Jun 16, 2020

There is no need to raise StopIteration in Python generators. Remove every raise StopIteration, or add a try/except around every function that returns a generator. In fact, you can simply comment out the raise StopIteration in the _read function (pattern\text\__init__.py, line 609, in _read):

BEFORE

...
raise StopIteration
#return

AFTER

...
#raise StopIteration
return

TL;DR
When pattern lemmatizes, it relies on data files that are loaded lazily: nothing is read until you first call the lemma function.

The StopIteration is raised while populating an instance of the Verbs class, which is a lazy dictionary, i.e. it loads its contents the first time it is used.

This is the docstring of the Verbs class in pattern:

"""
    A dictionary of verb infinitives, each linked to a list of conjugated forms.
    Each line in the file at the given path is one verb, with the tenses separated by a comma.
    The format defines the order of tenses (see TENSES).
    The default dictionary defines default tenses for omitted tenses.
"""

The real problem is in the _read function, whose generator is poorly implemented, as the code shows:

if path:
    if isinstance(path, str) and os.path.exists(path):
        # From file path.
        f = open(path, "r", encoding="utf-8")
    elif isinstance(path, str):
        # From string.
        f = path.splitlines()
    else:
        # From file or buffer.
        f = path
    for i, line in enumerate(f):
        line = line.strip(BOM_UTF8) if i == 0 and isinstance(line, str) else line
        line = line.strip()
        line = decode_utf8(line, encoding)
        if not line or (comment and line.startswith(comment)):
            continue
        yield line
raise StopIteration

The trailing raise StopIteration runs every time the generator finishes, whether or not path is set, and on Python 3.7+ (PEP 479) it is converted into a RuntimeError. If the intent is to signal an empty or missing path, add an else to the if path and move the raise into it, so that the generator ends normally once the lines are exhausted. Because of PEP 479 the caller sees that error as a RuntimeError rather than a StopIteration, so that is what load has to catch:

if path:
    if isinstance(path, str) and os.path.exists(path):
        # From file path.
        f = open(path, "r", encoding="utf-8")
    elif isinstance(path, str):
        # From string.
        f = path.splitlines()
    else:
        # From file or buffer.
        f = path
    for i, line in enumerate(f):
        line = line.strip(BOM_UTF8) if i == 0 and isinstance(line, str) else line
        line = line.strip()
        line = decode_utf8(line, encoding)
        if not line or (comment and line.startswith(comment)):
            continue
        yield line
else:
    # Only reached when path is empty or None.
    raise StopIteration

...
class Verbs(lazydict):
    def __init__
    ...
    def load(self):
        # have,,,has,,having,,,,,had,had,haven't,,,hasn't,,,,,,,hadn't,hadn't
        id = self._format[TENSES_ID[INFINITIVE]]
        try:
            for v in _read(self._path):
                v = v.split(",")
                dict.__setitem__(self, v[id], v)
                for x in (x for x in v if x):
                    self._inverse[x] = v[id]
        except RuntimeError as no_path:
            # On Python 3.7+ the generator's StopIteration arrives here as RuntimeError.
            raise ValueError("The path is empty or False") from no_path
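
A simpler alternative is to raise nothing at all and let the generator return, in line with the BEFORE/AFTER change earlier in this comment. The following is a minimal sketch, not pattern's actual code: read_lines is a hypothetical stand-in for _read, the file is read eagerly, and the BOM handling and decode_utf8 call are simplified away.

```python
import os


def read_lines(path, comment=";;;", encoding="utf-8"):
    """Yield non-empty, non-comment lines from a file path, a string, or an iterable."""
    if not path:
        return  # empty generator; no StopIteration needed
    if isinstance(path, str) and os.path.exists(path):
        # From file path: read everything so the file is closed promptly.
        with open(path, "r", encoding=encoding) as f:
            lines = f.read().splitlines()
    elif isinstance(path, str):
        # From string.
        lines = path.splitlines()
    else:
        # From an open file, buffer, or list of strings.
        lines = path
    for i, line in enumerate(lines):
        if i == 0 and isinstance(line, str):
            line = line.lstrip("\ufeff")  # drop a leading byte order mark
        line = line.strip()
        if not line or (comment and line.startswith(comment)):
            continue
        yield line
    # Falling off the end finishes the generator cleanly.


verbs = list(read_lines("be,am,is\n;;; a comment\n\nhave,has"))
print(verbs)
```

With this shape, an empty path produces an empty iteration instead of an exception, so a caller such as load needs no try/except for it.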

@Pikamander2

This appears to be an easier workaround:

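from pattern.en import lexeme

def pattern_stopiteration_workaround():
    # The first call into pattern fails on Python 3.7+; trigger that failure and swallow it.
    try:
        print(lexeme('gave'))
    except RuntimeError:
        pass

def main():
    pattern_stopiteration_workaround()
    # Add your other code here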

Basically, the pattern code will fail the first time you run it, so you first need to run it once and catch the Exception it throws.

It's worked well enough for my own scripts, but I don't know if it fixes every possible issue.

Ideally though, somebody should fork the clips/pattern project since it's no longer maintained.

@mpenkov
Collaborator

mpenkov commented May 15, 2021

We removed the pattern dependency, so this problem is no longer relevant to gensim.

@brenorb

brenorb commented Jun 16, 2021

> We removed the pattern dependency, so this problem is no longer relevant to gensim.

Sure, gensim did not ask for pattern as a dependency, but gensim.utils.lemmatize("The quick brown fox jumps over the lazy dog!") won't run unless I have pattern installed. And then I ended up here with the same problem...

i-be-snek added a commit to i-be-snek/pattern-StopIteration-fix that referenced this issue Jan 10, 2024