Entity recognition is inconsistent across runs #1336

Closed
jaju opened this Issue Sep 19, 2017 · 8 comments

3 participants

jaju commented Sep 19, 2017

I'm attempting some entity recognition, and the results keep changing for the same sentence when I reload the same model.

Sample run snapshot (verbatim)

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> d = nlp('The company that IBM bought had rejected Apple and Google bids')
>>> d.ents
()
>>> nlp = spacy.load('en_core_web_sm')
>>> d = nlp('The company that IBM bought had rejected Apple and Google bids')
>>> d.ents
()
>>> nlp = spacy.load('en_core_web_sm')
>>> d = nlp('The company that IBM bought had rejected Apple and Google bids')
>>> d.ents
(IBM,)
>>> nlp = spacy.load('en_core_web_sm')
>>> d = nlp('The company that IBM bought had rejected Apple and Google bids')
>>> d.ents
(IBM,)
>>> nlp = spacy.load('en_core_web_sm')
>>> d = nlp('The company that IBM bought had rejected Apple and Google bids')
>>> d.ents
(IBM, Apple)

I'm not sure what's causing this, but it is unexpected, given that neither the model nor the input changes between runs.
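The manual reload-and-compare loop in the snapshot above can be automated. The following is only a minimal sketch: the helper names `entity_variants` and `is_deterministic` are made up for illustration, and the loader is passed in as a callable so the check does not depend on any particular spaCy version.

```python
from collections import Counter


def entity_variants(load_pipeline, text, runs=5):
    """Reload the pipeline `runs` times and count each distinct entity tuple.

    `load_pipeline` is any zero-argument callable that returns a pipeline
    object; calling that object on `text` must return something with an
    `.ents` sequence whose items have a `.text` attribute.
    """
    seen = Counter()
    for _ in range(runs):
        nlp = load_pipeline()  # fresh load on every iteration, as in the report
        doc = nlp(text)
        seen[tuple(ent.text for ent in doc.ents)] += 1
    return seen


def is_deterministic(load_pipeline, text, runs=5):
    """True if every reload produced the same entities."""
    return len(entity_variants(load_pipeline, text, runs)) <= 1


# Possible usage with spaCy (assumes the model from the report is installed):
#   import spacy
#   text = 'The company that IBM bought had rejected Apple and Google bids'
#   print(entity_variants(lambda: spacy.load('en_core_web_sm'), text))
```

A result with more than one key means the same sentence produced different entities across reloads, which is the behaviour reported here.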
Output of spacy info --markdown

Info about spaCy

  • spaCy version: 2.0.0a13
  • Platform: Darwin-16.7.0-x86_64-i386-64bit
  • Python version: 3.6.2
  • Models: en, en_core_web_sm, en_vectors_web_lg

@honnibal honnibal added the bug label Sep 19, 2017

Member

honnibal commented Sep 19, 2017

Thanks for the report -- definitely something wrong here.

Author

jaju commented Sep 19, 2017

Please let me know if I can provide any more information, or run additional tests.
Thanks!

Member

honnibal commented Sep 19, 2017

I have it reproduced now, so it shouldn't be long to get the fix sorted :)

socialglass commented Sep 19, 2017

I'm seeing something similar:

>>> doc = nlp(u"She Was among first investors to get approval for Biotech Fund.")
>>> doc.ents
(first, Biotech Fund)
>>> doc = nlp(u"She Was among the first investors to get approval for Biotech Fund.")
>>> doc.ents
()
>>> doc = nlp(u"I Was among first investors to get approval for Biotech Fund.")
>>> doc.ents
(Biotech Fund,)

Member

honnibal commented Sep 19, 2017

The current version on develop seems to have this fixed already. Hopefully I can get it pushed to spacy-nightly tonight. (The new model also has better parse accuracy, which is nice...)

There are two possible explanations for the inconsistency:

  1. Some part of the model keeps its random initialization even after loading (e.g. the loaded weights are added to the initial values instead of replacing them)

  2. Somewhere there's an out-of-bounds read. The eventual calculations would then depend on values from neighbouring memory locations, which would vary between runs.

I think 2) is probably more likely. The good news is that the instability also occurs in the tensor values, not just in the parser or tagger. This means there are only a few places to look. It's probably the maxout or convolution functions.
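To make the two hypotheses concrete, here is a toy simulation in plain Python. It is not spaCy code and does not claim to show where the actual bug was. The names `run_once`, `SAVED` and `INPUTS` and all values are invented for illustration, and the per-run seed simply stands in for whatever differs between process runs (random initialization or leftover memory contents).

```python
import random

SAVED = [0.5, -1.0, 2.0]   # weights "stored on disk" (made-up values)
INPUTS = [1.0, 2.0, 3.0]   # a fixed input vector


def run_once(seed, bug=None):
    """Compute a dot product after 'loading' SAVED, optionally with a bug."""
    rng = random.Random(seed)  # stands in for run-to-run differences

    # Hypothesis 1: a new layer starts with random weights, and loading
    # adds the saved values to them instead of replacing them.
    init = [rng.uniform(-1, 1) for _ in SAVED]
    if bug == "add_on_load":
        weights = [a + b for a, b in zip(init, SAVED)]
    else:
        weights = list(SAVED)  # correct behaviour: replace

    # Hypothesis 2: the weights sit next to unrelated values in memory,
    # and the loop reads one element past the end of the weights.
    memory = weights + [rng.uniform(-1, 1)]  # one neighbouring "garbage" cell
    n = len(weights) + (1 if bug == "out_of_bounds" else 0)
    padded_inputs = INPUTS + [1.0]
    return sum(memory[i] * padded_inputs[i] for i in range(n))


# Without a bug, every run gives 4.5 regardless of the seed.
# With either bug, the result depends on the seed, i.e. it varies between runs.
```

In this sketch both bugs produce the same symptom (run-dependent output from identical weights and input), which is why narrowing the search to where the values first diverge is useful.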

Author

jaju commented Sep 20, 2017

I updated to the latest nightly build, and the issue has disappeared.
Thanks a lot! That's admirably quick! :)

Member

honnibal commented Sep 21, 2017

No worries!

@honnibal honnibal closed this Sep 21, 2017

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018
