
Online version finds the right antecedents, but actual version does not #16

Closed
kleinias opened this issue Feb 19, 2018 · 5 comments

@kleinias

kleinias commented Feb 19, 2018

The online version works fine on the following text:
"I know that Barbara and Sandy are here. I see Barbara watching TV. I hear Sandy breathing."

https://huggingface.co/coref/?text=I%20know%20that%20Barbara%20and%20Sandy%20are%20here.%20I%20see%20Barbara%20watching%20TV.%20I%20hear%20Sandy%20breathing.

But the open-source version doesn't find enough antecedents; it only resolves Barbara. Running this code:

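```python
from neuralcoref import Coref

# In this version, Coref() loads the default en_core_web_sm spaCy model
coref = Coref()
clusters = coref.one_shot_coref(utterances=u"I know that Barbara and Sandy are here. I see Barbara watching TV. I hear Sandy breathing.")
print(clusters)
print(coref.get_most_representative())
mentions = coref.get_mentions()
print(mentions)
```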

Output:

```
Loading spacy model

Info about model en_core_web_sm

lang               en             
pipeline           ['tagger', 'parser', 'ner']
accuracy           {'token_acc': 99.8698372794, 'ents_p': 84.9664503965, 'ents_r': 85.6312524451, 'uas': 91.7237657538, 'tags_acc': 97.0403350292, 'ents_f': 85.2975560875, 'las': 89.800872413}
name               core_web_sm    
license            CC BY-SA 3.0   
author             Explosion AI   
url                https://explosion.ai
vectors            {'keys': 0, 'width': 0, 'vectors': 0}
sources            ['OntoNotes 5', 'Common Crawl']
version            2.0.0          
spacy_version      >=2.0.0a18     
parent_package     spacy          
speed              {'gpu': None, 'nwords': 291344, 'cpu': 5122.3040471407}
email              contact@explosion.ai
description        English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
source             /usr/local/lib/python3.6/dist-packages/en_core_web_sm

loading model from /usr/local/lib/python3.6/dist-packages/neuralcoref/weights/
{3: [3, 0]}
{}
[Barbara, Barbara and Sandy, Sandy, Barbara, TV, Sandy, Sandy breathing]
```
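To read this output: assuming the integers in the cluster dict are positions in the list returned by get_mentions() (which is how this version appears to behave), {3: [3, 0]} links mention 3 (the "Barbara" in the second sentence) to mention 0 (the first "Barbara"), and neither "Sandy" mention is linked. The helpers below, describe_clusters and unclustered, are hypothetical names for illustration and not part of neuralcoref; they just make that mapping explicit using the values printed above.

```python
# Mentions as printed by coref.get_mentions() in the issue, in order.
mentions = ["Barbara", "Barbara and Sandy", "Sandy", "Barbara", "TV", "Sandy", "Sandy breathing"]
# Cluster dict as printed by one_shot_coref(); values assumed to be mention indices.
clusters = {3: [3, 0]}


def describe_clusters(clusters, mentions):
    """Map each cluster's mention indices to (index, text) pairs."""
    return {key: [(i, mentions[i]) for i in idxs] for key, idxs in clusters.items()}


def unclustered(clusters, mentions):
    """Return the indices of mentions that appear in no cluster."""
    linked = {i for idxs in clusters.values() for i in idxs}
    return [i for i in range(len(mentions)) if i not in linked]


print(describe_clusters(clusters, mentions))
# {3: [(3, 'Barbara'), (0, 'Barbara')]}
print([mentions[i] for i in unclustered(clusters, mentions)])
# ['Barbara and Sandy', 'Sandy', 'TV', 'Sandy', 'Sandy breathing']
```

Both "Sandy" mentions (indices 2 and 5) come out unlinked, which is the missing antecedent reported here.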

@bea-alex

I'd be interested in resolving this issue too. I got the same result as the previous commenter; here are the underlying scores:

{u'pair_scores': {0: {}, 1: {0: -1.8137825597308108}, 2: {0: -1.738390801732288, 1: -1.6511597972712726}, 3: {0: 6.5473994047601911, 1: -0.57869067045464151, 2: -1.6598056098030169}, 4: {0: -1.8103805461400377, 1: -1.5256500224140488, 2: -1.5399936662599227, 3: -1.6966305608918302}, 5: {0: -2.2999057893775179, 1: -1.7149788666508408, 2: 0.68513795195160965, 3: -2.0966374729906301, 4: -1.8540071211764726}, 6: {0: -1.9504528157206593, 1: -1.8210641784028945, 2: -1.8300293314203429, 3: -1.8767248759882404, 4: -1.6296482123249305, 5: -1.9901970079817037}}, u'single_scores': {0: None, 1: 1.5870258214636452, 2: 1.6899656734067761, 3: 1.5896249109319895, 4: 1.8004470287030618, 5: 1.5748515581318938, 6: 1.6261232857954271}}
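These scores seem to account for the output above. The sketch below assumes the decision rule is: a mention is linked to its highest-scoring antecedent only when that pair score exceeds the mention's single score (read here as the score for having no antecedent). The function best_antecedents and its margin parameter are hypothetical and not neuralcoref's actual code; the scores are copied from the comment and rounded to three decimals.

```python
# Scores from the comment above, rounded; keys are mention indices.
pair_scores = {
    0: {},
    1: {0: -1.814},
    2: {0: -1.738, 1: -1.651},
    3: {0: 6.547, 1: -0.579, 2: -1.660},
    4: {0: -1.810, 1: -1.526, 2: -1.540, 3: -1.697},
    5: {0: -2.300, 1: -1.715, 2: 0.685, 3: -2.097, 4: -1.854},
    6: {0: -1.950, 1: -1.821, 2: -1.830, 3: -1.877, 4: -1.630, 5: -1.990},
}
single_scores = {0: None, 1: 1.587, 2: 1.690, 3: 1.590, 4: 1.800, 5: 1.575, 6: 1.626}


def best_antecedents(pair_scores, single_scores, margin=0.0):
    """Assumed rule: link a mention to its best-scoring antecedent only if that
    pair score beats the mention's single ("no antecedent") score plus margin."""
    links = {}
    for mention, candidates in pair_scores.items():
        if not candidates or single_scores.get(mention) is None:
            continue  # first mention: there is nothing to link to
        antecedent, score = max(candidates.items(), key=lambda kv: kv[1])
        if score > single_scores[mention] + margin:
            links[mention] = antecedent
    return links


print(best_antecedents(pair_scores, single_scores))  # {3: 0}
print(pair_scores[5][2], single_scores[5])  # 0.685 1.575
print(best_antecedents(pair_scores, single_scores, margin=-1.0))  # {3: 0, 5: 2}
```

Under this assumed rule, the second "Sandy" (mention 5) scores 0.685 against the first "Sandy" (mention 2) but 1.575 for having no antecedent, so it stays unlinked, while "Barbara" (mention 3) clears its threshold by a wide margin. Shifting the threshold by 1.0 would link the Sandy pair, which shows how close that decision is.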

@thomwolf
Member

thomwolf commented May 15, 2018

Hi @bea-alex, @kleinias, after trying the new version, the results still seem to differ between our production setup (online demo) and the open-sourced version.
My guess is that this comes from a difference in spaCy models: in our production setup we selected a large spaCy 1 model with higher parsing accuracy.
I will investigate further and keep you updated.

@thomwolf
Member

thomwolf commented May 16, 2018

OK, further investigation indicates that it is indeed an issue with the accuracy of the spaCy model you use.

Parsing the sentence "I hear Sandy breathing." with the default spaCy 2 en_core_web_sm model incorrectly labels Sandy as an ADJ.

The simplest solution is to use the larger en_core_web_lg model, which labels this example correctly.

Currently the spaCy model is hard-coded to en_core_web_sm; you can work around this by passing an nlp object to the Coref constructor. In the next version, which I will release in a few days, I will make it easier to use any spaCy model.

@thomwolf
Member

We are now on release v3.0, so I am closing this old issue.
Feel free to reopen it (or open a new one) if you experience any issues with the new release.

@boehm-e

boehm-e commented Oct 4, 2018

Hi,

I am using the en_core_web_lg model but still have the issue with this text:

Once upon a time a lion lived in a forest. One day after a heavy meal it was sleeping. After a while, a mouse came and it started to play. Suddenly the lion got up with anger and looked for those who disturbed it's nice sleep. Then it saw a small mouse standing trembling with fear. The lion jumped on it. The mouse begged the lion to forgive it. The lion felt pity and left. The mouse ran away. On another day, the lion was caught in a net.

In "One day after a heavy meal IT was sleeping", IT is replaced, producing:
One day after a heavy meal ONE DAY AFTER A HEAVY MEAL was sleeping

whereas the web version replaces it correctly.

Do you have any idea how to solve this issue?

Thank you very much for open-sourcing this project!

https://huggingface.co/coref/my_story
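The garbled sentence looks like a clustering error rather than a text-substitution bug: if "it" ends up in the same cluster as the span "One day after a heavy meal", and that span serves as the cluster's representative, a mechanical replacement yields exactly the reported output. A minimal sketch, assuming resolution simply replaces every later mention with the first mention of its cluster; resolve, span and the (start, end) cluster format are hypothetical and not neuralcoref's API.

```python
def span(text, phrase, start=0):
    """(start, end) character offsets of the first `phrase` at or after `start`."""
    i = text.index(phrase, start)
    return (i, i + len(phrase))


def resolve(text, clusters):
    """Replace each non-first mention of a cluster with the cluster's first mention.

    clusters: list of clusters, each a list of (start, end) spans, representative first.
    """
    replacements = []
    for cluster in clusters:
        rep_start, rep_end = cluster[0]
        rep_text = text[rep_start:rep_end]
        for start, end in cluster[1:]:
            replacements.append((start, end, rep_text))
    # Apply right-to-left so earlier character offsets stay valid.
    for start, end, rep in sorted(replacements, reverse=True):
        text = text[:start] + rep + text[end:]
    return text


story = "Once upon a time a lion lived in a forest. One day after a heavy meal it was sleeping."
lion = span(story, "a lion")
meal = span(story, "One day after a heavy meal")
it = span(story, "it", meal[1])  # the "it" right after "meal"

print(resolve(story, [[meal, it]]))  # wrong cluster: "...a heavy meal One day after a heavy meal was sleeping."
print(resolve(story, [[lion, it]]))  # right cluster: "...a heavy meal a lion was sleeping."
```

If this reading is right, the fix has to come from the mention detection or pair scoring, not from the replacement step.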
