                                                                            virginia Uhi & Han Zhang


# Assignment 2 starter code


This notebook contains code to run [coreferee](https://github.com/explosion/coreferee), a coreference system running under spaCy to extract coreference chains (or clusters) from text.
To run the notebook, you first have to install coreferee. See the instructions at https://spacy.io/universe/project/coreferee; in most cases, all you need to do is run the following from a command prompt:

`$ python -m pip install coreferee`

`$ python -m coreferee install en`
    
You'll also need to download the spaCy large language model and the transformer model for English. spaCy has recently released new versions that coreferee is not yet compatible with, so you need to download specific versions of each model:

`$ python -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.4.0/en_core_web_lg-3.4.0.tar.gz`

`$ python -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.4.0/en_core_web_trf-3.4.0.tar.gz`

You can also install these things by running the next two cells instead. You'll only need to do this once.

## Part 1: Installation

You only need to do this **once**, no matter how many times you modify this notebook. Alternatively, you can do it from the command prompt with the commands above. There is no harm in doing it more than once; it just takes a very long time. 

The first cell installs coreferee and the English model for it (coreferee works for other languages too). The second cell installs the right versions of the English models for spaCy. You need these specific versions because newer versions of the English models don't work with coreferee. 

In [None]:
!python -m pip install coreferee
!python -m coreferee install en

In [None]:
!python -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.4.0/en_core_web_lg-3.4.0.tar.gz
!python -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.4.0/en_core_web_trf-3.4.0.tar.gz

## Part 2: Test coreferee

In [1]:
# import what we need, load the transformer model, 
# add coreferee to the spacy nlp pipeline

import coreferee, spacy
nlp = spacy.load('en_core_web_trf')
nlp.add_pipe('coreferee')

<coreferee.manager.CorefereeBroker at 0x29be4fcd0>

In [2]:
# this is just a test, so that you can see what the coreference chains look like
# you may get a CUDA warning here. As long as it's only a warning, things should run just fine

doc = nlp('Although he was very busy with his work, Peter had had enough of it. He and his wife decided they needed a holiday. They travelled to Spain because they loved the country very much.')

In [3]:
# now we print the coreference chains found

doc._.coref_chains.print()

0: he(1), his(6), Peter(9), He(16), his(18)
1: work(7), it(14)
2: [He(16); wife(19)], they(21), They(26), they(31)
3: Spain(29), country(34)


A few things to note about the output:

* We have 4 coreference chains, relating to: *Peter, work, wife(+Peter), Spain*
* Coreferee is able to deal with cataphora, where the pronoun (*he*) appears before the referent (*Peter*)
* Coreferee can deal with groups: *\[he+wife\], they*
* The wife does not appear as an entity with a chain, because there is no referring expression to that entity. It only appears as part of *he and his wife*
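
To make the chain structure concrete, here is a toy, pure-Python model of the four chains above. It is only a sketch of the idea, not coreferee's implementation: the token indexes are copied from the printed output, and the `MOST_SPECIFIC` list (which mention in each chain counts as the referent) is hand-picked for this illustration.

```python
# Token texts by index, copied from the printed chains above.
TOKENS = {1: "he", 6: "his", 7: "work", 9: "Peter", 14: "it", 16: "He",
          18: "his", 19: "wife", 21: "they", 26: "They", 29: "Spain",
          31: "they", 34: "country"}

# Each mention is a tuple of token indexes; a tuple with more than one
# index is a coordinated group such as [He(16); wife(19)].
CHAINS = [
    [(1,), (6,), (9,), (16,), (18,)],
    [(7,), (14,)],
    [(16, 19), (21,), (26,), (31,)],
    [(29,), (34,)],
]

# Assumption for this sketch: position of the most specific mention
# in each chain, chosen by hand (Peter, work, [He; wife], Spain).
MOST_SPECIFIC = [2, 0, 0, 0]


def resolve(token_index):
    """Return the token indexes of the referent(s) of token_index."""
    for chain, best in zip(CHAINS, MOST_SPECIFIC):
        if (token_index,) in chain:
            referents = []
            for idx in chain[best]:
                if idx == token_index:
                    referents.append(idx)
                else:
                    # A group member may itself belong to another chain.
                    referents.extend(resolve(idx))
            return referents
    # Tokens outside every chain stand for themselves in this toy model.
    return [token_index]


print([TOKENS[i] for i in resolve(31)])
```

In this model, resolving *they* goes through the group *\[He; wife\]*, and *He* is in turn followed back to *Peter* through chain 0, while *wife* is in no chain of its own and is kept as is.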

In [4]:
# once we have an index for a particular referring expression, 
# we can ask coreferee to resolve it. For instance, printing
# the following expression gives us the referent for 
# the referring expression 31 (they)

print(doc._.coref_chains.resolve(doc[31]))

[Peter, wife]


## Part 3: Run coreferee on local files

In [5]:
# compute coreference chains for 5 documents in the A1_data/ directory
# below is a sample for the first text

with open ("A1_data/5c1dbe1d1e67d78e2797d611.txt", "r", encoding='utf-8') as f:
    text0 = f.read()
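
The same open-and-read step is repeated for every file in this notebook. As a sketch, it could be wrapped in a small helper; the function name `load_texts` is hypothetical and not part of the starter code.

```python
from pathlib import Path


def load_texts(directory, filenames):
    """Read each file as UTF-8 and return a {filename: text} dictionary."""
    texts = {}
    for name in filenames:
        texts[name] = (Path(directory) / name).read_text(encoding="utf-8")
    return texts
```

In the notebook this might be called as `load_texts("A1_data", ["5c1dbe1d1e67d78e2797d611.txt"])`, and each returned text could then be passed to `nlp` as in the cells that follow.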

In [6]:
doc0 = nlp(text0)

In [7]:
doc0._.coref_chains.print()

0: couple(9), couple(76)
1: years(16), their(19)
2: letter(48), them(59), they(78)
3: Ayo(72), Ayo(150)
4: custody(87), it(96)
5: News(112), News(136)
6: fact(117), It(165)
7: orphanage(161), orphanage(206)
8: letter(171), letter(219)
9: Kim(174), Kim(211)
10: children(254), their(266)
11: adoption(300), adoption(333)
12: headlines(317), they(325)
13: Kim(338), Kim(401), Kim(425), her(437), she(452)
14: Nigeria(345), country(359)
15: papers(372), them(376)
16: Clark(395), Clark(432), He(444), him(478), him(486), him(496), he(501)
17: [Kim(401); son(404)], their(403)
18: Canada(474), They(484)
19: Nigeria(489), They(494)
20: Morans(561), Morans(602)
21: family(578), family(636), they(658), their(660), they(684)
22: government(589), it(598)
23: Kim(633), she(652), she(695)
24: agency(648), it(672)
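
If you save the printed chains as text, a short parser can turn each line back into data. This is a minimal sketch that assumes the line format shown above (`chain number: mention, mention, ...`, with groups in square brackets separated by `;`); the function name `parse_chain_line` is made up for this example.

```python
import re

# One mention token looks like text(index), e.g. Kim(401).
MENTION = re.compile(r"(\S+?)\((\d+)\)")


def parse_chain_line(line):
    """Parse one printed chain line into (chain_number, list_of_mentions).

    Each mention is a list of (token_text, token_index) pairs; a group
    such as [Kim(401); son(404)] becomes a list with two pairs.
    """
    chain_id, _, rest = line.partition(": ")
    mentions = []
    # Mentions are separated by ", "; group members use "; " instead.
    for part in rest.split(", "):
        pairs = [(text, int(idx)) for text, idx in MENTION.findall(part.strip("[]"))]
        mentions.append(pairs)
    return int(chain_id), mentions


print(parse_chain_line("17: [Kim(401); son(404)], their(403)"))
```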


In [8]:
# example: who does she(452) refer to?

print(doc0._.coref_chains.resolve(doc0[452]))

[Kim]


In [9]:
# print referring expressions that are people
# we are interested in those because they are the sources of quotes
for ent in doc0.ents:
    if ent.label_ in ["PERSON"]:
        print(ent.text, ent.label_)

Kim PERSON
Clark Moran PERSON
Ayo PERSON
Kim PERSON
Ayo PERSON
Kim PERSON
Clark PERSON
Kim PERSON
Ayo PERSON
Morans PERSON
Kim PERSON
Ayo PERSON
Clark PERSON
Kim PERSON
Kim PERSON
Clark PERSON
Ayo PERSON
Kim PERSON
Morans PERSON
Ayo PERSON
Kim PERSON
Ben Miljure PERSON
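
The list above contains many repeats. A quick way to see who is mentioned most often is to count the names. The sketch below uses the names copied from the output above; in the notebook, the same `Counter` could be built from `ent.text for ent in doc0.ents if ent.label_ == "PERSON"`.

```python
from collections import Counter

# PERSON entity texts copied from the output above, in order.
person_mentions = [
    "Kim", "Clark Moran", "Ayo", "Kim", "Ayo", "Kim", "Clark", "Kim",
    "Ayo", "Morans", "Kim", "Ayo", "Clark", "Kim", "Kim", "Clark",
    "Ayo", "Kim", "Morans", "Ayo", "Kim", "Ben Miljure",
]

counts = Counter(person_mentions)

# Most frequently mentioned names first.
for name, n in counts.most_common():
    print(name, n)
```

Note that string counting treats *Clark* and *Clark Moran* as different names; linking them is part of what the coreference chains are for.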


## TEXT1

In [10]:
#1 Text file
with open ("A1_data/5c1548a31e67d78e2771624f.txt", "r", encoding='utf-8') as f:
    text1 = f.read()

doc1 = nlp(text1)

In [11]:
doc1._.coref_chains.print()

0: people(30), their(33)
1: Action(106), Action(174)
2: Some(145), they(158)
3: Canadians(205), Canadians(236)
4: rate(220), rate(246)
5: society(300), it(308)
6: Palvetzian(305), Palvetzian(336)
7: study(339), it(355)
8: their(351), million(364)
9: employers(402), their(407)
10: people(444), their(447)
11: study(468), study(511)
12: communication(565), it(569)
13: stereotype(581), it(600)
14: skills(609), they(613)
15: Palvetzian(625), Palvetzian(642)


In [12]:
# What does 'their' refer to?

print(doc1._.coref_chains.resolve(doc1[33]))

[people]


## TEXT2

In [13]:
#2 Text file
with open ("A1_data/5c182ac21e67d78e277944ad.txt", "r", encoding='utf-8') as f:
    text2 = f.read()

doc2 = nlp(text2)

In [14]:
doc2._.coref_chains.print()

0: Lonechild(1), herself(12), her(18), she(42), She(75), She(87), her(91), Lonechild(111), she(113), her(115), her(123), she(158), Lonechild(162), She(181), she(214), she(221), Lonechild(225), she(227), she(274), Lonechild(278), she(280), she(327), She(360), she(373), Lonechild(388), Lonechild(403), Lonechild(407), her(411), She(417), she(419), Her(439), she(453), she(497), Lonechild(555), she(592), her(640), Lonechild(645), she(694)
1: future(197), future(232)
2: letter(305), they(315), they(337)
3: youth(343), it(345)
4: treatment(503), it(520), It(560)
5: people(531), they(536)
6: Council(665), it(669)


In [15]:
# what does 'her' (token 18, in chain 0) refer to?
print(doc2._.coref_chains.resolve(doc2[18]))

[Lonechild]


## TEXT3

In [16]:
#3 Text file
with open ("A1_data/5c28972a795bd2fac69fa974.txt", "r", encoding='utf-8') as f:
    text3 = f.read()

doc3 = nlp(text3)

In [17]:
doc3._.coref_chains.print()

0: party(3), its(13)
1: candidate(53), She(65)
2: Wang(111), she(118), Wang(149), she(152), she(154), she(165), she(202)
3: poem(117), poem(145), it(157)
4: Singh(162), He(184), He(192)
5: Wang(244), her(246)
6: Trudeau(271), Trudeau(301)
7: South(283), South(310)
8: Singh(353), he(362), Singh(391), he(394), he(409), his(416)
9: fears(368), they(383)
10: byelection(467), byelection(496)
11: Singh(472), he(477), he(481)
12: candidate(527), candidate(557)
13: Burnaby(616), country(621)


In [18]:
# what does "its" refer to
print(doc3._.coref_chains.resolve(doc3[13]))

[party]


## TEXT4

In [19]:
#4 Text file
with open ("A1_data/5c29beda1e67d78e27b74939.txt", "r", encoding='utf-8') as f:
    text4 = f.read()

doc4 = nlp(text4)

In [20]:
doc4._.coref_chains.print()

0: Australia(0), Australia(55)
1: Canadians(9), Canadians(86), Canadians(94)
2: China(12), country(16), China(83), China(97), China(143)
3: Payne(43), men(65)
4: Meng(201), She(226), her(310), she(317), She(337)
5: government(276), its(279)
6: China(290), China(371)
7: Canada(298), Canada(328)
8: concerns(301), them(304)
9: arrest(401), it(459)
10: Canada(446), Canada(465)
11: societies(539), their(571), their(596)
12: statement(560), it(626)
13: EU(563), EU(609)


In [21]:
# what does "their" refer to

print(doc4._.coref_chains.resolve(doc4[596]))

[societies]


## TEXT5

In [22]:
#5 Text file
with open ("A1_data/5c489df91e67d78e271d66c5.txt", "r", encoding='utf-8') as f:
    text5 = f.read()

doc5 = nlp(text5)

In [23]:
doc5._.coref_chains.print()

0: stretch(4), it(16)
1: Flames(10), Flames(33)
2: Backlund(22), Backlund(117)
3: break(58), break(144)
4: Rittich(202), his(215)
5: Calgary(225), Calgary(256)
6: Carolina(240), Carolina(275)
7: 1:54(263), it(273)
8: Hurricanes(289), their(303)
9: East(313), It(387)
10: Mrazek(317), He(332)
11: game(348), game(382)
12: teams(371), teams(394)
13: Jankowski(437), Jankowski(458), he(469), he(488)
14: corner(462), corner(505)
15: Mrazek(474), Mrazek(508), Mrazek(557)
16: Hurricanes(480), Hurricanes(522)
17: That(617), it(637)
18: opposite(640), they(645)
19: break(658), break(691)
20: Peters(684), Peters(707)
21: looks(749), them(758)


In [24]:
# What does 'it' refer to

print(doc5._.coref_chains.resolve(doc5[16]))

[stretch]


## Side note: visualizations using another five texts
If you want to see this all in a much prettier format, you can use [displacy](https://spacy.io/usage/visualizers). 

In [25]:
from spacy import displacy

## TEXT6

In [26]:
# TEXT6

with open ("A1_data/5c286d031e67d78e27b3f17b.txt", "r", encoding='utf-8') as f:
    text6 = f.read()
    
doc6 = nlp(text6)

options = {"ents": ["PERSON"],
          "colors": {"PERSON": "lightsteelblue"}}

displacy.render(doc6, style="ent", options=options, jupyter=True)

In [27]:
# Render TEXT6 again, this time highlighting several entity types (PERSON, ORG, GPE) at once


with open ("A1_data/5c286d031e67d78e27b3f17b.txt", "r", encoding='utf-8') as f:
    text6 = f.read()
    
doc6 = nlp(text6)

options = {"ents": ["PERSON", "ORG", "GPE"],  
           "colors": { "PERSON": "lightblue",
                        "ORG":  "lightyellow",
                        "GPE":  "lightgreen", }}



displacy.render(doc6, style="ent", options=options, jupyter=True)

# note: the entity labels are not always right; some entities in the rendered output are mislabelled
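
The `options` dictionary passed to `displacy.render` has two keys here: `ents` (which entity labels to show) and `colors` (a colour per label). Since the same dictionary is used for several texts, it could be built once from a single label-to-colour mapping. The helper name `make_ent_options` is hypothetical.

```python
def make_ent_options(colors):
    """Build a displaCy "ent" options dict from a {label: color} mapping."""
    return {"ents": list(colors), "colors": dict(colors)}


# Same labels and colours as in the cell above.
ENT_COLORS = {"PERSON": "lightblue", "ORG": "lightyellow", "GPE": "lightgreen"}
options = make_ent_options(ENT_COLORS)
print(options)
```

The resulting `options` can be passed to `displacy.render(doc, style="ent", options=options, jupyter=True)` for any of the documents.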

## TEXT7

In [28]:
# let's continue with TEXT7

with open ("A1_data/5c287b841e67d78e27b4163e.txt", "r", encoding='utf-8') as f:
    text7 = f.read()
    
doc7 = nlp(text7)

options = {"ents": ["PERSON", "ORG", "GPE"],  
           "colors": { "PERSON": "lightblue",
                        "ORG":  "lightyellow",
                        "GPE":  "lightgreen", }}

displacy.render(doc7, style="ent", options=options, jupyter=True)

## TEXT8

In [29]:
# TEXT8

with open ("A1_data/5c49e1261e67d78e2721712b.txt", "r", encoding='utf-8') as f:
    text8 = f.read()
    
doc8 = nlp(text8)

options = {"ents": ["PERSON", "ORG", "GPE"],  
           "colors": { "PERSON": "lightblue",
                        "ORG":  "lightyellow",
                        "GPE":  "lightgreen", }}

displacy.render(doc8, style="ent", options=options, jupyter=True)

## TEXT9

In [30]:
# TEXT9

with open ("A1_data/5c483b26795bd2b724e92a68.txt", "r", encoding='utf-8') as f:
    text9 = f.read()
    
doc9 = nlp(text9)

options = {"ents": ["PERSON", "ORG", "GPE"],  
           "colors": { "PERSON": "lightblue",
                        "ORG":  "lightyellow",
                        "GPE":  "lightgreen", }}

displacy.render(doc9, style="ent", options=options, jupyter=True)

## TEXT10

In [31]:
with open ("A1_data/5c52b36a1e67d78e273d029c.txt", "r", encoding='utf-8') as f:
    text10 = f.read()
    
doc10 = nlp(text10)

options = {"ents": ["PERSON", "ORG", "GPE"],  
           "colors": { "PERSON": "lightblue",
                        "ORG":  "lightyellow",
                        "GPE":  "lightgreen", }}

displacy.render(doc10, style="ent", options=options, jupyter=True)

## Part 4: Run the quote extraction from Assignment 1
I suggest using the Matcher quote extraction system from A1, but, if you implemented your own version, or improved on this one, feel free to use that instead.

In [33]:
#import what we need for this
from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)

In [35]:
# run the matcher for text1

matcher = Matcher(nlp.vocab)

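# Pattern A: a quote followed by a pronoun speaker, e.g. "...," she said.
pattern_S_Q_A = [{'ORTH': '"'},                   # opening quotation mark
                 {'IS_ALPHA': True, "OP": "+"},   # one or more alphabetic tokens
                 {'IS_PUNCT': True, "OP": "*"},   # optional punctuation
                 {'ORTH': '"'},                   # closing quotation mark
                 {'POS': "PRON", "OP": "+"},      # speaker as a pronoun
                 {'IS_ALPHA': True, "OP": "+"},   # one or more alphabetic tokens (e.g. the verb)
                 {'IS_PUNCT': True, "OP": "*"},   # optional final punctuation
                 ]

# Pattern B: a quotation mark, alphabetic tokens, then a proper-noun speaker and a verb
pattern_S_Q_B = [{'ORTH': '"'},                   # quotation mark
                 {'IS_ALPHA': True, "OP": "+"},   # one or more alphabetic tokens
                 {'POS': "PROPN", "OP": "+"},     # speaker as one or more proper nouns
                 {'POS': "VERB"},                 # reporting verb
                 ]

# Pattern C: like pattern A, but requires at least two punctuation tokens before the closing quote
pattern_S_Q_C = [{'ORTH': '"'},                   # opening quotation mark
                 {'IS_ALPHA': True, "OP": "*"},   # zero or more alphabetic tokens
                 {'IS_PUNCT': True, "OP": "+"},   # one or more punctuation tokens
                 {'IS_PUNCT': True, "OP": "+"},   # one or more further punctuation tokens
                 {'ORTH': '"'},                   # closing quotation mark
                 {'POS': "PRON", "OP": "+"},      # speaker as a pronoun
                 {'IS_ALPHA': True, "OP": "+"},   # one or more alphabetic tokens (e.g. the verb)
                 {'IS_PUNCT': True, "OP": "*"},   # optional final punctuation
                 ]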

matcher.add("QUOTES_S_Q", [pattern_S_Q_A, pattern_S_Q_B, pattern_S_Q_C], greedy = 'LONGEST')
doc = nlp(text1)
matches = matcher(doc)
#matches.sort (key = lambda x:x[1])
print(len(matches))

for match in matches[:10]:
    print(match, doc[match[1]:match[2]])
    print("\n") #blank space between outputs  



0
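
When the matcher finds 0 matches, as it does here, one thing worth checking is which quotation characters the text actually contains. The patterns only match the straight quote `"`, so they cannot match typographic (curly) quotes. Whether that is the cause for this file is an assumption, not something verified here. The sketch below counts both kinds and optionally normalizes them; the helper names and the sample sentence are made up.

```python
# Map typographic double quotes to the straight quote used in the patterns.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})


def count_quote_chars(text):
    """Count straight and curly double quotes in a text."""
    return {
        "straight": text.count('"'),
        "curly": text.count("\u201c") + text.count("\u201d"),
    }


def normalize_quotes(text):
    """Replace curly double quotes with straight ones."""
    return text.translate(QUOTE_MAP)


# Made-up sample sentence, not taken from the data files.
sample = "\u201cWe are ready,\u201d she said."
print(count_quote_chars(sample))
print(normalize_quotes(sample))
```

If `count_quote_chars(text1)` reports only curly quotes, running the matcher on `nlp(normalize_quotes(text1))` may be worth trying. Keep in mind that normalizing before parsing can shift token indexes relative to the original document.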


In [36]:
# run the matcher for text2

matcher = Matcher(nlp.vocab)

pattern_S_Q_A = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "*"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

pattern_S_Q_B = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 # {"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 # {'IS_PUNCT': True, "OP": "*"},
                 # {'ORTH': '"'},
                 #{'POS': "VERB"},
                 {'POS': "PROPN", "OP": "+" },
                 {'POS': "VERB"},
                 #{'POS': "VERB"}
                 #{'IS_ALPHA': True, "OP": "+"},
                 # {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]

pattern_S_Q_C = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "*"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

matcher.add("QUOTES_S_Q", [pattern_S_Q_A, pattern_S_Q_B, pattern_S_Q_C], greedy = 'LONGEST')
doc = nlp(text2)
matches = matcher(doc)
#matches.sort (key = lambda x:x[1])
print(len(matches))

for match in matches[:10]:
    print(match, doc[match[1]:match[2]])
    print("\n") #blank space between outputs  
    


1
(14911067671772931984, 677, 697) "As young people we understand that this land is foundational to who we are," she said.
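
The match span and the coreference chains use the same token indexes, so they can be combined to find out who *she* is. The sketch below uses numbers copied from the outputs for text2: the match covers tokens 677 to 696, and chain 0 contains she(694). The chain data is abbreviated, and taking the first mention of a chain as its name is a simplification made for this example; in the notebook, `doc2._.coref_chains.resolve(doc2[694])` is the call to use instead.

```python
# Match span copied from the matcher output above: (start, end), end exclusive.
match_start, match_end = 677, 697

# Abbreviated copies of two chains printed for text2, as (text, token_index).
chains = {
    0: [("Lonechild", 1), ("her", 640), ("Lonechild", 645), ("she", 694)],
    6: [("Council", 665), ("it", 669)],
}


def mentions_in_span(chains, start, end):
    """Return (mention_text, token_index, first_mention_of_chain) for
    every chain mention whose token index falls inside the span."""
    found = []
    for mentions in chains.values():
        first_text = mentions[0][0]
        for text, idx in mentions:
            if start <= idx < end:
                found.append((text, idx, first_text))
    return found


print(mentions_in_span(chains, match_start, match_end))
```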




In [37]:
# run the matcher for text3

matcher = Matcher(nlp.vocab)

pattern_S_Q_A = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "*"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

pattern_S_Q_B = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 # {"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 # {'IS_PUNCT': True, "OP": "*"},
                 # {'ORTH': '"'},
                 #{'POS': "VERB"},
                 {'POS': "PROPN", "OP": "+" },
                 {'POS': "VERB"},
                 #{'POS': "VERB"}
                 #{'IS_ALPHA': True, "OP": "+"},
                 # {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]

pattern_S_Q_C = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "*"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

matcher.add("QUOTES_S_Q", [pattern_S_Q_A, pattern_S_Q_B, pattern_S_Q_C], greedy = 'LONGEST')
doc = nlp(text3)
matches = matcher(doc)
#matches.sort (key = lambda x:x[1])
print(len(matches))

for match in matches[:10]:
    print(match, doc[match[1]:match[2]])
    print("\n") #blank space between outputs  
    

0


In [38]:
# run the matcher for text4

matcher = Matcher(nlp.vocab)

pattern_S_Q_A = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "*"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

pattern_S_Q_B = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 # {"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 # {'IS_PUNCT': True, "OP": "*"},
                 # {'ORTH': '"'},
                 #{'POS': "VERB"},
                 {'POS': "PROPN", "OP": "+" },
                 {'POS': "VERB"},
                 #{'POS': "VERB"}
                 #{'IS_ALPHA': True, "OP": "+"},
                 # {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]

pattern_S_Q_C = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "*"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

matcher.add("QUOTES_S_Q", [pattern_S_Q_A, pattern_S_Q_B, pattern_S_Q_C], greedy = 'LONGEST')
doc = nlp(text4)
matches = matcher(doc)
#matches.sort (key = lambda x:x[1])
print(len(matches))

for match in matches[:10]:
    print(match, doc[match[1]:match[2]])
    print("\n") #blank space between outputs  
    


0


In [39]:
# run the matcher for text5

matcher = Matcher(nlp.vocab)

pattern_S_Q_A = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "*"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

pattern_S_Q_B = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "+"}, 
                 # {"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 # {'IS_PUNCT': True, "OP": "*"},
                 # {'ORTH': '"'},
                 #{'POS': "VERB"},
                 {'POS': "PROPN", "OP": "+" },
                 {'POS': "VERB"},
                 #{'POS': "VERB"}
                 #{'IS_ALPHA': True, "OP": "+"},
                 # {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]
pattern_S_Q_C = [{'ORTH': '"'},
                 {'IS_ALPHA': True, "OP": "*"}, 
                 #{"ORTH":{"IN": [".","!", "?", ",", ";"]}},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "+"},
                 {'ORTH': '"'},
                 {'POS': "PRON", "OP": "+" }, 
                 # {'POS': "VERB"},
                 {'IS_ALPHA': True, "OP": "+"},
                 {'IS_PUNCT': True, "OP": "*"},
                 #{'ORTH': '"'}
                 ]  

matcher.add("QUOTES_S_Q", [pattern_S_Q_A, pattern_S_Q_B, pattern_S_Q_C], greedy = 'LONGEST')
doc = nlp(text5)
matches = matcher(doc)
#matches.sort (key = lambda x:x[1])
print(len(matches))

for match in matches[:10]:
    print(match, doc[match[1]:match[2]])
    print("\n") #blank space between outputs  
    



3
(14911067671772931984, 727, 754) "We played the best when the game was on the line in the third and I thought we had some great looks that made Mrazek make


(14911067671772931984, 706, 726) " Peters said. "I like the direction which we are trending and the contributions throughout the lineup.


(14911067671772931984, 350, 356) " Carolina coach Rod Brind'Amour said




## Your turn

Check instructions on Canvas for what to do and what to submit. 