Merge_noun_chunk trying to merge disjoint spans #5458

Closed
Fourthought opened this issue May 19, 2020 · 2 comments · Fixed by #5470
Labels
bug (Bugs and behaviour differing from documentation)
duplicate (Issues that have been reported before)
feat / pipeline (Feature: Processing pipeline and components)
lang / en (English language data and models)

Comments


Fourthought commented May 19, 2020

How to reproduce the behaviour

There appears to be a problem where overlapping noun_chunks are created for a particular text, which looks like a bug in the built-in merge_noun_chunks functionality. To reproduce:

import spacy

nlp = spacy.load("en_core_web_md")
nlp2 = spacy.load("en_core_web_md")  # second copy without the merge components

# Add the built-in merge components to the first pipeline only
merge_nps = nlp.create_pipe("merge_noun_chunks")
nlp.add_pipe(merge_nps)

merge_ents = nlp.create_pipe("merge_entities")
nlp.add_pipe(merge_ents)

# text taken from:
# https://www.americanrhetoric.com/speeches/gwbushcubaindependence100th.htm

doc = nlp(text)  # raises the ValueError below

Error message:

ValueError: [E102] Can't merge non-disjoint spans. 'markets' is already part of tokens to merge. If you want to find the longest non-overlapping spans, you can use the util.filter_spans helper: https://spacy.io/api/top-level#util.filter_spans

Listing the noun chunks that contain "markets" shows the overlap:

# nlp2 has no merge components, so this loop runs without hitting E102
for chunk in nlp2(text).noun_chunks:
    if "markets" in chunk.text:
        print(chunk.start, '|', chunk)

Output:

1232 | markets      # <= this chunk overlaps with the next one
1231 | where markets have brought prosperity

I should be able to work around the problem for now with a custom pipeline component that uses util.filter_spans, but I thought you would want to know about it.
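For anyone hitting the same error, here is a minimal sketch of what such a workaround might look like. It assumes spaCy 2.2.x as in the environment below, and the names `merge_noun_chunks_filtered` and `longest_non_overlapping` are my own, not part of spaCy. The component mirrors what the built-in merge_noun_chunks appears to do, with the chunks passed through `spacy.util.filter_spans` first. The small helper only illustrates, on plain (start, end) offsets, the longest-span-first rule that filter_spans is documented to apply.

```python
def merge_noun_chunks_filtered(doc):
    """Hypothetical replacement for merge_noun_chunks that drops overlapping chunks."""
    from spacy.util import filter_spans  # imported lazily; needs spaCy >= 2.1.4

    if not doc.is_parsed:  # v2 attribute; noun_chunks require a dependency parse
        return doc
    with doc.retokenize() as retokenizer:
        # Keep only the longest non-overlapping chunks before merging
        for np in filter_spans(list(doc.noun_chunks)):
            attrs = {"tag": np.root.tag, "dep": np.root.dep}
            retokenizer.merge(np, attrs=attrs)
    return doc


def longest_non_overlapping(spans):
    """Illustration only: pick the longest non-overlapping (start, end) spans.

    Offsets are token indices with an exclusive end. Longer spans win, and
    ties go to the span that starts earlier.
    """
    by_length = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept, seen = [], set()
    for start, end in by_length:
        # Spans are visited longest first, so checking both ends is enough
        if start not in seen and end - 1 not in seen:
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


# Start offsets are from the output above; end offsets are inferred from the chunk text
chunks = [(1232, 1233), (1231, 1236)]
filtered = longest_non_overlapping(chunks)
```

In a v2 pipeline the component could be added with `nlp.add_pipe(merge_noun_chunks_filtered, after="parser")` in place of the built-in one. Note that this keeps the longer chunk ("where markets have brought prosperity") and drops "markets", which may or may not be the chunk you actually want.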

Your Environment

spaCy version: 2.2.4
Platform: Windows-10-10.0.18362-SP0
Python version: 3.7.6

adrianeboyd added the bug, duplicate, feat / pipeline, and lang / en labels May 19, 2020
@adrianeboyd
Contributor

I think this is more or less a duplicate of #5393. Thanks for the additional example!


github-actions bot commented Nov 5, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2021