append the noun_chunk generator object #3856

Closed
Fourthought opened this issue Jun 17, 2019 · 3 comments


Fourthought commented Jun 17, 2019

Feature description

Is there a way to append to the doc.noun_chunks generator object in the way it's possible to append to the doc.ents tuple?

I can see that doc.ents can be extended with doc.ents += (new_entity,), but I've been unable to recreate this with itertools.chain() for doc.noun_chunks.
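
For reference, a minimal sketch of the doc.ents pattern referred to above (the OUTGROUP label and span indices are illustrative only):

import spacy
from spacy.tokens import Span

nlp = spacy.load('en_core_web_sm')
doc = nlp('the enemy of America is')

# doc.ents has a setter, so a new Span can be appended to the tuple
new_entity = Span(doc, 0, 2, label='OUTGROUP')  # the span "the enemy"
doc.ents += (new_entity,)

# doc.noun_chunks has no setter; itertools.chain() only builds a new
# iterator and does not modify anything on the Doc itself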

The new noun_chunks are based on patterns identified by the pattern matcher.

Reproducing example code

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
doc = nlp('the enemy of America is')

print(*doc.noun_chunks)
# >> the enemy
# >> America

pattern = [
    [{'POS': 'DET', 'OP': '?'}, {'LEMMA': 'enemy'}, {'LEMMA': 'of'},
     {'POS': 'DET', 'OP': '?'}, {'POS': 'PROPN', 'OP': '+'}]
]

matcher = Matcher(nlp.vocab)
matcher.add('OUTGROUP', None, *pattern)

for match_id, start, end in matcher(doc):
    print(doc[start:end])
# >> the enemy of America
# >> enemy of America

In this case, would it be possible to add "the enemy of America" to doc.noun_chunks?


ines commented Jun 18, 2019

The doc.noun_chunks iterator is read-only, because it's computed by a getter function that uses the tokens' dependencies and part-of-speech tags. See lang/en/syntax_iterators.py for an example of this.
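
Roughly speaking (a greatly simplified sketch, not the actual contents of that file), such a getter walks the parse and yields one span per noun-like token acting as a nominal argument:

def simple_noun_chunks(doc):
    # Simplified illustration of a syntax iterator: yield a span for each
    # noun/proper noun/pronoun whose dependency label marks it as nominal.
    nominal_deps = ('nsubj', 'dobj', 'pobj', 'attr', 'conj', 'ROOT')
    for token in doc:
        if token.pos_ in ('NOUN', 'PROPN', 'PRON') and token.dep_ in nominal_deps:
            yield doc[token.left_edge.i : token.i + 1]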

However, you could use a custom extension attribute to create your own custom noun chunks property on the Doc, and then make it return the default noun chunks plus your own spans:

from spacy.tokens import Doc

def get_custom_noun_chunks(doc):
    default_noun_chunks = list(doc.noun_chunks)
    # Add your logic with the matcher etc. here
    custom_noun_chunks = get_your_custom_chunks(doc)
    return default_noun_chunks + custom_noun_chunks

Doc.set_extension("custom_noun_chunks", getter=get_custom_noun_chunks)

You can then access doc._.custom_noun_chunks and it should return a list of the combined spans.
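
As a rough sketch of what get_your_custom_chunks could look like, reusing the OUTGROUP pattern from the question (the function name is just the placeholder used in the snippet above):

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)
matcher.add('OUTGROUP', None,
            [{'POS': 'DET', 'OP': '?'}, {'LEMMA': 'enemy'}, {'LEMMA': 'of'},
             {'POS': 'DET', 'OP': '?'}, {'POS': 'PROPN', 'OP': '+'}])

def get_your_custom_chunks(doc):
    # One Span per matcher hit, so the getter above can concatenate them
    # with the default noun chunks
    return [doc[start:end] for match_id, start, end in matcher(doc)]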

ines added the feat / doc (Feature: Doc, Span and Token objects) and usage (General spaCy usage) labels on Jun 18, 2019
ines closed this as completed on Jun 18, 2019
Fourthought (Author)

Great, thank you Ines
