Debugger causes stack overflow because the property 'sent_start' is infinitely recursive #1640

Closed
cmckain opened this issue Nov 26, 2017 · 8 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@cmckain

cmckain commented Nov 26, 2017

"Unhandled exception at 0x00007FFC12181517 (token.cp36-win_amd64.pyd) in python.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000000402603FF8)."
Iterating through a sentence causes a crash when PyCharm's debugger attempts to break after the first word (first word->second word->crash). Attached is the memory dump from Python after it crashed. If the dump with the heap would be useful, I can send it but it is over 2 GB.

Sample Code

import spacy
nlp = spacy.load('en_core_web_lg')
sentence = "Ashley graded Bob's term paper after he completed his assignment and she finished her group project with June."
text = nlp(sentence)
for token in text:
    print(token.text, token.ent_id, token.lemma_, token.pos_, token.tag_, token.dep_) #break here

Your Environment

  • spaCy version: 2.0.3
  • Platform: Windows-10-10.0.17025-SP0
  • Python version: 3.6.2
    python2.zip
@cmckain cmckain changed the title PyCharm's debugger causes stack overflow during iteration through sentence Debugger causes stack overflow during iteration through sentence Nov 28, 2017
@honnibal
Member

@cmckain Do you have any insight about the likely implications of this?

I don't regularly use debuggers, and I've never used PyCharm, so I'm not sure whether this points towards a deeper issue. If the crash is exposing a memory error in spaCy (e.g. a use after free or out-of-bounds access), obviously we're very interested in that! But if it's just that we hit some tight stack-size limit in this tool that we don't hit in regular execution, I don't think that's a problem we'd work on.

@ruiEnca

ruiEnca commented Nov 29, 2017

I have another problem with non-English models of Spacy 2.0 and PyCharm's debugger.
With the English model, I run and debug my project as usual.
With other models (I tried Portuguese and Spanish) I can run the project normally but when I try to use the debugger it halts on the line where I load the model:
nlp = spacy.load('pt')
or even:
nlp = spacy.load('pt', disable=['parser', 'tagger', 'ner'])

For some days I thought it was a PyCharm problem, but after reinstalling and even downgrading PyCharm I found that the source of the problem is the loading of the spaCy model. By chance, I left the debugger stopped on that line and discovered that it resumes and continues to the next line after 10-15 minutes!!!
Is there any difference in the loading process of these models that can explain this incredible delay?
It can't be a memory problem since the models' size is similar and I have enough memory (I run the project with the three models loaded without any problem).
This is really puzzling me...
Any tip would be welcome!

My Environment

spaCy version      2.0.3
Platform           Windows-10-10.0.17046-SP0
Python version     3.6.3
Models             en, es, pt

TYPE        NAME                  MODEL                 VERSION
package     pt-core-news-sm       pt_core_news_sm       2.0.0
package     es-core-news-sm       es_core_news_sm       2.0.0
package     en-core-web-sm        en_core_web_sm        2.0.0
link        en                    en_core_web_sm        2.0.0
link        es                    es_core_news_sm       2.0.0
link        pt                    pt_core_news_sm       2.0.0

@cmckain
Author

cmckain commented Nov 30, 2017

Although I haven't stepped through the assembly line-by-line yet, I would infer that the debugger is somehow forcing the program into an unbreakable loop or, perhaps, some recursion which calls the same function over and over again. I first noticed this problem when a deprecation warning kept printing until the program crashed. Oddly enough, the debugger works the first time you open a spaCy data structure, but the second attempt (whether on a child structure or on something else) causes a crash. Perhaps the debugger is holding a lock on some of the data, and the program keeps failing to get it back and, as a result, crashes?

@cmckain
Author

cmckain commented Nov 30, 2017

I can confirm @ruiEnca's problem. Using the above code, here are my timed results with the debugger on and off:
en_core_web_lg: 0:00:18.781000, 0:00:26.921000
en_core_web_sm: 0:00:03.859000, 0:00:04.188000
es_core_news_sm: 0:03:12.625000, 0:00:03.812000
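For anyone wanting to reproduce timings like these, here is a minimal sketch of one way to measure a call; it is not the script used above. The helper name `timed` is hypothetical, and a stand-in call is used so the snippet runs without spaCy installed. Printing the resulting `timedelta` gives the same `H:MM:SS.ffffff` shape as the figures above.

```python
from datetime import datetime


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed), where elapsed is a timedelta."""
    start = datetime.now()
    result = fn(*args, **kwargs)
    return result, datetime.now() - start


# In this thread's setting the call would be something like
# timed(spacy.load, "es_core_news_sm"). A trivial stand-in is used here
# (assumption) so the sketch runs anywhere.
result, elapsed = timed(sum, [1, 2, 3])
print(elapsed)
```

Running the same call once normally and once under the debugger, then comparing the two values, gives the on/off comparison.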

@cmckain
Author

cmckain commented Dec 3, 2017

I think I figured it out, @honnibal. The file "spaCy/spacy/tokens/token.pyx" sets up the 'get' and 'set' functions of the property "sent_start". If the word is literally the first word of the sentence, the getter returns False; if not, it returns the value of "sent_start", which calls the 'get' of "sent_start", which returns the value of "sent_start", and on and on. The stack overflow happens because the function never stops calling itself for any word after the first. For most people this isn't a problem, because they don't need to call a deprecated property, but the debugger does (since it lists all possible properties). My earlier observation that the crash only occurred on the second listing was because I always tested the first word first and the second word second; when I removed the logic and had "return self.sent_start" always run, the program failed regardless of which word I chose to debug. My temporary solution was to change line 356 to "return True", although I'm not sure what it returned in the past. I would recommend removing that property sooner rather than later. For @ruiEnca's issue, I would guess that the debugger forces some extra code to run that isn't normally run during startup, as it tries to load everything it can.
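The pattern described above can be reproduced in plain Python. This is only an analogue under stated assumptions: the `Token` class and the `inspect_like_debugger` helper are hypothetical, not spaCy's Cython code and not PyCharm's implementation. In pure Python the runaway getter ends in a `RecursionError`; in the compiled extension it surfaced as a native stack overflow instead.

```python
class Token:
    """Hypothetical stand-in for a token; not spaCy's actual class."""

    def __init__(self, i):
        self.i = i

    @property
    def sent_start(self):
        if self.i == 0:
            return False
        # Bug: this re-enters the same getter instead of reading stored data.
        return self.sent_start


def inspect_like_debugger(obj):
    """Read every public attribute, roughly as a variables pane might."""
    results = {}
    for name in dir(obj):
        if name.startswith("__"):
            continue
        try:
            results[name] = getattr(obj, name)
        except RecursionError as err:
            results[name] = type(err).__name__
    return results


first = inspect_like_debugger(Token(0))
second = inspect_like_debugger(Token(1))
print(first)
print(second)
```

The first token is harmless because its getter returns early; any later token recurses as soon as something reads the property, which matches the "first word, second word, crash" behaviour reported at the top of the thread.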

@cmckain cmckain changed the title Debugger causes stack overflow during iteration through sentence Debugger causes stack overflow because the property 'sent_start' is infinitely recursive Dec 3, 2017
@ruiEnca

ruiEnca commented Dec 7, 2017

@honnibal @cmckain I found the place where the debugger stops for some minutes while loading a non-English model. It is in the import_file function of compat.py in line 119:
spec.loader.exec_module(module)
This function is called from load_model and load_model_from_link in util.py.
With spacy.load('en') everything is ok. With other languages (I tried pt and es) it halts.
I can't spot the difference between loading the English model and the others.
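For context, the quoted line is part of the standard `importlib` recipe for loading a module from a file path. The sketch below shows that recipe using only the standard library; it is an assumption about the general shape of such a helper, not a copy of spaCy's `compat.py`, and the temporary `fake_model.py` file is purely illustrative.

```python
import importlib.util
import tempfile
from pathlib import Path


def import_file(name, loc):
    """Load the Python file at `loc` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, str(loc))
    module = importlib.util.module_from_spec(spec)
    # Runs the file's top-level code; everything the file imports or
    # builds at import time happens inside this call.
    spec.loader.exec_module(module)
    return module


# Hypothetical stand-in for a model package's entry file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fake_model.py"
    path.write_text("LANG = 'pt'\n")
    module = import_file("fake_model", path)

print(module.LANG)
```

Because `exec_module` executes the whole file, a long pause there under a debugger would point at whatever that module runs at import time rather than at the loader itself.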

My Environment

spaCy version      2.0.4
Location           C:\Anaconda\lib\site-packages\spacy-2.0.4-py3.6-win-amd64.egg\spacy
Platform           Windows-10-10.0.17046-SP0
Python version     3.6.3
Models             en, en_core_web_lg, en_core_web_md, es, pt

TYPE        NAME                  MODEL                 VERSION
package     pt-core-news-sm       pt_core_news_sm       2.0.0
package     es-core-news-sm       es_core_news_sm       2.0.0
package     en-core-web-sm        en_core_web_sm        2.0.0
package     en-core-web-md        en_core_web_md        2.0.0
package     en-core-web-lg        en_core_web_lg        2.0.0
link        en                    en_core_web_sm        2.0.0
link        en_core_web_lg        en_core_web_lg        2.0.0
link        en_core_web_md        en_core_web_md        2.0.0
link        es                    es_core_news_sm       2.0.0
link        pt                    pt_core_news_sm       2.0.0

@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Dec 13, 2017
@honnibal
Member

@cmckain Thanks!! Was on holidays for most of December, so just getting back to this now. I've fixed the infinite loop -- I meant to write self.c.sent_start, but wrote self.sent_start...
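A plain-Python sketch of the difference between the two spellings, assuming a `c` attribute that holds the stored value. `SimpleNamespace` is a hypothetical stand-in for the underlying C struct, and this is not the actual patch.

```python
from types import SimpleNamespace


class Token:
    """Hypothetical stand-in for a token; not spaCy's actual class."""

    def __init__(self, sent_start):
        # Stand-in (assumption) for the struct exposed as `self.c`.
        self.c = SimpleNamespace(sent_start=sent_start)

    @property
    def sent_start(self):
        # Reads the stored field, so the getter never calls itself.
        return self.c.sent_start


token = Token(sent_start=True)
print(token.sent_start)
```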

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018