## **Import the necessary libraries**

In [None]:
!python --version
!pip install -q gwpy
!pip install spacy-experimental
!pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl
!python3 -m pip install coreferee
!python3 -m coreferee install en
!python -m spacy download en_core_web_trf
!python -m spacy download en_core_web_lg
# !pip install spacy-transformers
# !pip uninstall spacy
# !pip install spacy==3.5.0
# !pip install crosslingual-coreference

Python 3.10.12
Collecting en-coreference-web-trf==3.4.0a2
  Downloading https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl (490.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m490.3/490.3 MB[0m [31m1.3 MB/s[0m eta [36m0:00:00[0m
2024-12-02 23:50:15.800238: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-02 23:50:15.851544: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-02 23:50:15.870134: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Collecting https://github.com/richardpa

In [None]:
import spacy

## **Exercise**

In [None]:
alice = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?'"

Note: we also tried crosslingual-coreference, but we could not get it to work.

## **Coreference in spaCy Experimental**

In [None]:
nlp = spacy.load("en_coreference_web_trf")
doc = nlp(alice)
doc.spans

{'coref_clusters_1': [Alice, her, she, her, Alice], 'coref_clusters_2': [her sister, her sister], 'coref_clusters_3': [the book her sister was reading, it, it]}

## **Coreference in spaCy Plugins**

### **Coreferee**

In [None]:
nlp = spacy.load('en_core_web_trf')
nlp.add_pipe('coreferee')
doc = nlp(alice)
doc._.coref_chains.print()


0: Alice(0), her(10), she(26), her(32), Alice(59)
1: bank(14), it(45)
2: book(31), it(38)
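coreferee's printed output lists each chain as `text(token index)` pairs. To compare it with other systems without re-running the transformer model, a minimal sketch can parse that printed text back into Python data. The helper `parse_chains` and the names `PRINTED` and `MENTION` are hypothetical (not part of coreferee), and the sketch assumes single-token mentions like the ones above; in a live session, working with `doc._.coref_chains` directly is the more robust route.

```python
import re

# Printed output of doc._.coref_chains.print() from the cell above
PRINTED = """
0: Alice(0), her(10), she(26), her(32), Alice(59)
1: bank(14), it(45)
2: book(31), it(38)
"""

# Matches one single-token mention such as "her(10)"; coordinated or
# multi-token mentions are NOT handled by this hypothetical helper.
MENTION = re.compile(r"(\w+)\((\d+)\)")


def parse_chains(printed: str) -> dict[int, list[tuple[str, int]]]:
    """Turn printed chains into {chain id: [(text, token index), ...]}."""
    chains: dict[int, list[tuple[str, int]]] = {}
    for line in printed.strip().splitlines():
        chain_id, _, mentions = line.partition(":")
        chains[int(chain_id)] = [(text, int(idx)) for text, idx in MENTION.findall(mentions)]
    return chains


chains = parse_chains(PRINTED)
for chain_id, mentions in chains.items():
    print(chain_id, mentions)
```

Each chain becomes a list of `(text, index)` tuples, so repeated words such as the two "her" mentions stay distinguishable by position.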


### **Crosslingual-coreference**

In [None]:
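# crosslingual-coreference must be installed first (see the commented-out
# `pip install crosslingual-coreference` line in the install cell)
import crosslingual_coreference  # registers the "xx_coref" pipeline factory

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "xx_coref",
    config={"chunk_size": 2500, "chunk_overlap": 2, "device": 0},  # 0 = first GPU; -1 = CPU
)
doc = nlp(alice)
print(doc._.coref_clusters)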

## **Analysis and conclusions**

In [None]:
# Tokenize the text with spaCy to show the index of each token
nlp = spacy.load("en_core_web_sm")
doc = nlp(alice)

print(", ".join(f"{token.i}: {token.text}" for token in doc))

0: Alice, 1: was, 2: beginning, 3: to, 4: get, 5: very, 6: tired, 7: of, 8: sitting, 9: by, 10: her, 11: sister, 12: on, 13: the, 14: bank, 15: ,, 16: and, 17: of, 18: having, 19: nothing, 20: to, 21: do, 22: :, 23: once, 24: or, 25: twice, 26: she, 27: had, 28: peeped, 29: into, 30: the, 31: book, 32: her, 33: sister, 34: was, 35: reading, 36: ,, 37: but, 38: it, 39: had, 40: no, 41: pictures, 42: or, 43: conversations, 44: in, 45: it, 46: ,, 47: ', 48: and, 49: what, 50: is, 51: the, 52: use, 53: of, 54: a, 55: book, 56: ,, 57: ', 58: thought, 59: Alice, 60: ', 61: without, 62: pictures, 63: or, 64: conversations, 65: ?, 66: '


### **spaCy Experimental**
It produced the following clusters:
- 1 -> [Alice, her, she, her, Alice] - "Alice" is the main subject of the passage, and "her" and "she" are pronouns referring back to her. The cluster correctly links every one of these pronouns to Alice, and we checked that no mention of Alice is missing.
- 2 -> [her sister, her sister] - "her sister" refers to Alice's sister. Both mentions are correctly grouped together, since they refer to the same person.
- 3 -> [the book her sister was reading, it, it] - "the book her sister was reading" is the specific book Alice's sister is reading, and both instances of "it" are pronouns referring back to that book. The cluster correctly links the description of the book to the later pronouns, which keeps it clear what is being discussed.
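The experimental model prints clusters as span texts only, so the two "her" mentions and the nested "her sister" spans can only be told apart by position. A minimal sketch, assuming the half-open token offsets below (read off by hand from the token listing above), renders each cluster together with its positions. The regex split is a stand-in for spaCy's tokenizer: it happens to reproduce the spaCy indices for this passage, but it is not spaCy and would diverge on contractions, URLs and similar text. `experimental` and `span_text` are hypothetical names.

```python
import re

alice = (
    "Alice was beginning to get very tired of sitting by her sister on the bank, "
    "and of having nothing to do: once or twice she had peeped into the book her "
    "sister was reading, but it had no pictures or conversations in it, 'and what "
    "is the use of a book,' thought Alice 'without pictures or conversations?'"
)

# Stand-in tokenizer: words and single punctuation marks. For this passage it
# yields the same 67 tokens (indices 0-66) as the spaCy listing above.
tokens = re.findall(r"\w+|[^\w\s]", alice)

# spaCy Experimental clusters as half-open (start, end) token offsets
experimental = {
    "coref_clusters_1": [(0, 1), (10, 11), (26, 27), (32, 33), (59, 60)],
    "coref_clusters_2": [(10, 12), (32, 34)],
    "coref_clusters_3": [(30, 36), (38, 39), (45, 46)],
}


def span_text(start: int, end: int) -> str:
    """Join the tokens of a half-open span back into readable text."""
    return " ".join(tokens[start:end])


for key, spans in experimental.items():
    print(key, [f"{span_text(s, e)} [{s}:{e}]" for s, e in spans])
```

Written this way, it is visible that "her sister" (10-12 and 32-34) overlaps the "her" mentions of cluster 1 and that the second "her sister" sits inside the book span (30-36).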

### **spaCy Coreferee**
It produces the following results:
- 0: Alice(0), her(10), she(26), her(32), Alice(59) - This is the same cluster as in the spaCy Experimental output: every reference to Alice is captured, and "her" and "she" are correctly linked back to her.
- 1: bank(14), it(45) - "bank" is where Alice is sitting, and coreferee links the "it" at position 45 (the final word of "it had no pictures or conversations in it") to "bank". In context, however, that "it" most logically refers to "the book", not "the bank", so this looks like a misresolution, possibly caused by the sentence structure.
- 2: book(31), it(38) - The "it" at position 38 correctly refers to "the book", but, as noted above, we think the "it" at position 45 belongs in this chain as well.
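The disagreement about "it" at position 45 can be made concrete by resolving each pronoun to the first non-pronoun mention in its chain, roughly what a "resolved text" view does. A minimal sketch, assuming the chains are transcribed by hand from the two outputs above (coreferee mentions by head index, experimental spans by their start index); `resolve`, `PRONOUNS` and the chain variables are hypothetical, not part of either library, and the pronoun set only covers this passage.

```python
# Small pronoun set for this passage only (an assumption, not a full lexicon)
PRONOUNS = {"she", "her", "it", "he", "him", "his", "they", "them"}

# Chains as (mention text, token index); indices follow the token listing above
coreferee_chains = [
    [("Alice", 0), ("her", 10), ("she", 26), ("her", 32), ("Alice", 59)],
    [("bank", 14), ("it", 45)],
    [("book", 31), ("it", 38)],
]
experimental_chains = [
    [("Alice", 0), ("her", 10), ("she", 26), ("her", 32), ("Alice", 59)],
    [("her sister", 10), ("her sister", 32)],
    [("the book her sister was reading", 30), ("it", 38), ("it", 45)],
]


def resolve(chains, index):
    """Return the first non-pronoun mention of the chain containing `index`.

    Nested mentions can share a start index (e.g. "her" and "her sister" at 10);
    this sketch simply uses the first chain that matches.
    """
    for chain in chains:
        if any(i == index for _, i in chain):
            for text, _ in chain:
                if text.lower() not in PRONOUNS:
                    return text
            return None  # chain made only of pronouns
    return None  # token is not part of any chain


for index in (38, 45):
    print(index, "coreferee:", resolve(coreferee_chains, index),
          "| experimental:", resolve(experimental_chains, index))
```

Both systems resolve the "it" at 38 to the book, while the "it" at 45 resolves to "bank" under coreferee and to "the book her sister was reading" under the experimental model.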

In contrast with the spaCy Experimental output, coreferee produces no cluster for "her sister". Since "her sister" is only repeated and never referred back to by a pronoun, this absence is understandable: coreference resolution typically centres on linking pronouns to their antecedents, and repeated noun phrases without pronouns do not always form a cluster of their own.
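One way to quantify how far the two outputs agree is to compare the sets of coreferent mention pairs (links) each system produces, in the spirit of pairwise coreference scoring. A minimal sketch, assuming each mention is reduced to a single head token index chosen by hand (for example "sister" at 11 and 33 for the two "her sister" mentions, and "book" at 31 for the long book span); `links` and the cluster variables are hypothetical names, and neither output is treated as gold, so the counts only describe agreement.

```python
from itertools import combinations

# Head token indices per cluster (heads picked by hand from the token listing)
experimental = [{0, 10, 26, 32, 59}, {11, 33}, {31, 38, 45}]
coreferee = [{0, 10, 26, 32, 59}, {14, 45}, {31, 38}]


def links(clusters):
    """All unordered mention pairs that a system places in the same cluster."""
    return {pair for cluster in clusters for pair in combinations(sorted(cluster), 2)}


exp_links, coref_links = links(experimental), links(coreferee)
shared = exp_links & coref_links

print("shared links:", len(shared))
print("only experimental:", sorted(exp_links - coref_links))
print("only coreferee:", sorted(coref_links - exp_links))
```

The two systems share 11 links. The three experimental-only links are the "her sister" pair (11, 33) and the two links attaching "it" (45) to the book, (31, 45) and (38, 45); the single coreferee-only link is (14, 45), "it" to "bank".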



