In [1]:
%%html

<h1>Coreference</h1>

<ol>
    <li>neuralcoref (spaCy extension) - https://github.com/huggingface/neuralcoref; seems to handle anaphora well.</li>
    <li>Wikipedia - https://en.wikipedia.org/wiki/Coreference</li>
</ol>

In [2]:
import spacy
import neuralcoref

from spacy import displacy

In [3]:
nlp = spacy.load('en_core_web_sm')

## add neuralcoref to the spaCy pipeline
neuralcoref.add_to_pipe(nlp)

<spacy.lang.en.English at 0x129076da0>

In [4]:
%%html

<h3>Anaphora</h3>

<p>"... expression that depends specifically upon an antecedent expression ..."</p>

In [5]:
doc = nlp("Carol and Bod saw a movie together. They did not enjoy it.")

print(doc._.has_coref)
print(doc._.coref_clusters)
print([(ent.label_, ent.text) for ent in doc.ents])
print()
print()

## expect [Carol and Bob, they]
print(doc._.coref_resolved)

True
[Carol and Bod: [Carol and Bod, They], a movie: [a movie, it]]
[('PERSON', 'Bod')]


Carol and Bod saw a movie together. Carol and Bod did not enjoy a movie.
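
The resolved string above reads as if every later mention had been replaced by the first mention of its cluster. The following is a minimal plain-Python sketch of that substitution, assuming the first span in each cluster is the main mention. It is not neuralcoref's implementation: `tokenize`, `text_of`, and `resolve` are hypothetical helpers, and the token spans are written by hand from the cluster output above.

```python
import re

def tokenize(text):
    """Split text into (token, trailing_whitespace) pairs; a rough stand-in for a real tokenizer."""
    tokens = []
    for m in re.finditer(r"\w+|[^\w\s]", text):
        trailing = " " if text[m.end():m.end() + 1].isspace() else ""
        tokens.append((m.group(), trailing))
    return tokens

def text_of(tokens, start, end):
    """Surface text of the half-open token span [start, end)."""
    return "".join(tok + ws for tok, ws in tokens[start:end]).rstrip()

def resolve(tokens, clusters):
    """Replace every non-main mention with the text of its cluster's main (first) mention."""
    replacements = {}
    for main, *others in clusters:
        main_text = text_of(tokens, *main)
        for start, end in others:
            replacements[start] = (end, main_text)

    out, i = [], 0
    while i < len(tokens):
        if i in replacements:
            end, main_text = replacements[i]
            # keep the whitespace that followed the replaced mention
            out.append(main_text + tokens[end - 1][1])
            i = end
        else:
            out.append(tokens[i][0] + tokens[i][1])
            i += 1
    return "".join(out)

tokens = tokenize("Carol and Bod saw a movie together. They did not enjoy it.")
# Hand-written token spans for [Carol and Bod, They] and [a movie, it].
clusters = [[(0, 3), (8, 9)], [(4, 6), (12, 13)]]
resolved = resolve(tokens, clusters)
print(resolved)
```

Because the main mention is copied verbatim, a wrong cluster produces an ungrammatical or misleading sentence, so the resolved text is a quick way to eyeball cluster quality.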


In [6]:
%%html

<h3>Cataphora</h3>

<p>"... preceding expression, whose meaning is determined or specified by the later expression ..."</p>

In [7]:
doc = nlp("He was angry about the music, so Bob called the cops.")

print(doc._.has_coref)
print(doc._.coref_clusters)
print([(ent.label_, ent.text) for ent in doc.ents])
print()
print()

## expect [He, Bob]
print(doc._.coref_resolved)

False
[]
[('PERSON', 'Bob')]


He was angry about the music, so Bob called the cops.
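
The only difference between anaphora and cataphora in the two definitions quoted above is whether the referring expression comes after or before the expression that gives it meaning. A small sketch of that positional test, using plain string offsets on hand-picked mentions; `reference_direction` is a hypothetical helper, not something neuralcoref provides, and it assumes each mention occurs once in the text.

```python
def reference_direction(text: str, pronoun: str, referent: str) -> str:
    """Classify a pronoun/referent pair by which one appears first in the text."""
    pronoun_at = text.index(pronoun)    # raises ValueError if the mention is missing
    referent_at = text.index(referent)
    # referent first -> the pronoun looks backwards (anaphora); otherwise forwards (cataphora)
    return "anaphora" if referent_at < pronoun_at else "cataphora"

first = reference_direction(
    "Carol and Bod saw a movie together. They did not enjoy it.", "They", "Carol and Bod"
)
second = reference_direction(
    "He was angry about the music, so Bob called the cops.", "He", "Bob"
)
print(first)   # anaphora
print(second)  # cataphora
```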


In [8]:
%%html

<h3>Split antecedents</h3>

<p>An anaphor or cataphor that refers to multiple antecedents, e.g. "They" referring to both Carol and Bob.</p>

In [9]:
doc = nlp('Carol told Bob to attend the party. They arrived together.')

print(doc._.has_coref)
print(doc._.coref_clusters)
print([(ent.label_, ent.text) for ent in doc.ents])
print()
print()

## expect [Carol, Bob, They]
print(doc._.coref_resolved)

True
[the party: [the party, They]]
[('PERSON', 'Bob')]


Carol told Bob to attend the party. the party arrived together.


In [10]:
%%html

<h3>Coreferring noun phrases</h3>

<p>Coreferring noun phrases, whereby the second noun phrase is a predication over the first.</p>

In [11]:
doc = nlp('The project leader is refusing to help. The jerk thinks only of himself.')

print(doc._.has_coref)
print(doc._.coref_clusters)
print([(ent.label_, ent.text) for ent in doc.ents])
print()
print()

## expect [The project leader, The jerk]
print(doc._.coref_resolved)

True
[The jerk: [The jerk, himself]]
[]


The project leader is refusing to help. The jerk thinks only of The jerk.
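
To compare the four experiments at a glance, the sketch below checks whether the mentions each sentence was meant to link ended up together in one cluster. The clusters are transcribed by hand from the printed outputs above, the expected mentions are read off the example sentences, and `is_linked` is a hypothetical helper that only compares mention text case-insensitively.

```python
# (expected mentions, clusters printed by the notebook) per experiment
cases = {
    "anaphora": (["Carol and Bod", "They"],
                 [["Carol and Bod", "They"], ["a movie", "it"]]),
    "cataphora": (["He", "Bob"], []),
    "split antecedents": (["Carol", "Bob", "They"], [["the party", "They"]]),
    "coreferring noun phrases": (["The project leader", "The jerk"],
                                 [["The jerk", "himself"]]),
}

def is_linked(expected, clusters):
    """True when a single cluster contains every expected mention."""
    want = {m.lower() for m in expected}
    return any(want <= {m.lower() for m in cluster} for cluster in clusters)

results = {name: is_linked(exp, got) for name, (exp, got) in cases.items()}
for name, ok in results.items():
    print(f"{name:26s} {'linked' if ok else 'missed'}")
```

On these four sentences only the plain anaphora case is linked as intended, which matches the note at the top that the model seems strongest on anaphora.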
