# spaCy Assessment

We'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [1]:
# Import modules
import spacy

In [2]:
# Load spaCy's small English pipeline
nlp = spacy.load('en_core_web_sm')

In [3]:
# Create a Doc object from the contents of the text file
with open('owlcreek.txt') as file:
    doc = nlp(file.read())

In [4]:
# Check the Doc object by viewing its first 36 tokens
doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

In [5]:
# Number of tokens in the document
len(doc)

4835

In [6]:
# Number of sentences in the document
sents = list(doc.sents)
len(sents)

319

In [7]:
# Print the second sentence in the document (indexing starts at 0)
print(sents[1].text)

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.


In [8]:
# Print each token's text, part-of-speech tag, dependency label, and lemma
# dep_: syntactic dependency, i.e. the relation between a token and its head
for token in sents[1]:
    print(token.text, token.pos_, token.dep_, token.lemma_)

A DET det a
man NOUN nsubj man
stood VERB ROOT stand
upon SCONJ prep upon
a DET det a
railroad NOUN compound railroad
bridge NOUN pobj bridge
in ADP prep in
northern ADJ amod northern
Alabama PROPN pobj Alabama
, PUNCT punct ,
looking VERB advcl look
down ADV prep down

 SPACE npadvmod 

into ADP prep into
the DET det the
swift ADJ amod swift
water NOUN pobj water
twenty NUM nummod twenty
feet NOUN npadvmod foot
below ADV advmod below
. PUNCT punct .
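The tag and label abbreviations in the output above can be looked up with `spacy.explain`, which returns a short description from spaCy's built-in glossary. The following is a minimal sketch; the particular labels chosen here are just a sample taken from the output, not part of the original exercise.

```python
import spacy

# A sample of POS tags and dependency labels seen in the output above.
labels = ["SCONJ", "PROPN", "nsubj", "pobj", "npadvmod", "advcl"]

# spacy.explain looks each label up in spaCy's glossary.
explanations = {label: spacy.explain(label) for label in labels}

for label, description in explanations.items():
    print(f"{label:10} {description}")
```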


In [9]:
# Same output as the previous cell, aligned in fixed-width columns
for token in sents[1]:
    print(f'{token.text:15} {token.pos_:5} {token.dep_:10} {token.lemma_:15}')

A               DET   det        a              
man             NOUN  nsubj      man            
stood           VERB  ROOT       stand          
upon            SCONJ prep       upon           
a               DET   det        a              
railroad        NOUN  compound   railroad       
bridge          NOUN  pobj       bridge         
in              ADP   prep       in             
northern        ADJ   amod       northern       
Alabama         PROPN pobj       Alabama        
,               PUNCT punct      ,              
looking         VERB  advcl      look           
down            ADV   prep       down           

               SPACE npadvmod   
              
into            ADP   prep       into           
the             DET   det        the            
swift           ADJ   amod       swift          
water           NOUN  pobj       water          
twenty          NUM   nummod     twenty         
feet            NOUN  npadvmod   foot           
below           ADV 
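The numbers in the format specs above are minimum field widths: each string is left-aligned and padded with spaces up to that width. The sketch below shows the same idea on a few hand-typed rows copied from the output, so it runs without spaCy; `format_row` is a hypothetical helper, not something from the notebook.

```python
# A few (text, pos, dep, lemma) rows copied from the output above.
rows = [
    ("A", "DET", "det", "a"),
    ("railroad", "NOUN", "compound", "railroad"),
    ("Alabama", "PROPN", "pobj", "Alabama"),
]


def format_row(text, pos, dep, lemma):
    """Pad each field to a fixed minimum width (strings are left-aligned by default)."""
    return f"{text:15} {pos:5} {dep:10} {lemma:15}"


lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

Note that a field longer than its width is not truncated, so an unusually long token would push the later columns to the right.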

In [10]:
# Print the text of every lexeme currently stored in the pipeline's vocabulary (nlp.vocab)
for name in nlp.vocab:
    print(name.text)

nuthin
ü.
succeeded
Kan
softly
if
When's
c.
strength
descent
:-}
subordinates
From
full
lifted
Why's
comes
next
arms
distinctly
sergeant
those’s
brain
Who’s
words
bodies
use
feared
pointing
inclining
n.
temporary
thought
martinet
fetching
interminable
>.<
ears
it’s
crowning
stream
rode
:()
)-:
X
make
instantly
foot
undulations
insects
4a.m.
5p.m.
co.
shot
effort!--what
lost
drink
━
sand
(-_-)
measured
Ariz
had
Objects
vs.
x.
toward
news
When’s
percussion
forcing
Calif
does
nothin’
’S
attached
gaze
effaced
mile
sure
Would
do
’s
N.M.
view
b
swirl
South
O.o
dear
lesson
lady
received
never
’’
explosion
etiquette
hanged
ought
good
vertical
northward
observe
3p.m
faintly
scout
h.
forms
where's
outcome
enchanting
relate
black
What's
villainous
former
civilian
course
men
ä
That's
sound
throat
approached
11a.m
flashed
loud
leap
strokes
rose
breath
;D
Miss
Ga
Must
said
sense
other
between
9p.m
revelation
captain
sunshine
By
Nev.
What
these's
z.
by
again
rising
His
Aim
get
r
9
step
an
ran
v.s
rea
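The listing above mixes words from the story (`martinet`, `sergeant`) with abbreviations and emoticons. That is consistent with `nlp.vocab` holding lexemes for the tokenizer's special cases plus every string the pipeline has processed so far. The sketch below illustrates the growth on a blank English pipeline (an assumption, chosen so that no trained model is required); the word `owlcreekish` is made up for the demonstration.

```python
import spacy

# Assumption: a blank English pipeline instead of en_core_web_sm.
nlp = spacy.blank("en")

word = "owlcreekish"  # hypothetical word that the vocabulary has not seen

before = word in nlp.vocab
size_before = len(nlp.vocab)

# Tokenizing text creates a lexeme for every new token string.
doc = nlp(f"A rather {word} afternoon.")

after = word in nlp.vocab
size_after = len(nlp.vocab)

print(before, after, size_before, size_after)
```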

In [11]:
# Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text
# HINT: Include an `'IS_SPACE': True` pattern between the two words!

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [12]:
# Create a pattern and add it to the matcher:
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True, 'OP': '*'}, {'LOWER': 'vigorously'}]

# 'OP': '*' is a quantifier: the whitespace token may occur zero or more times between the two words
matcher.add('Swimming', [pattern])
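To see why the optional whitespace token matters, the sketch below runs two patterns over a short made-up sentence in which the phrase appears once across a line break and once on a single line. It assumes a blank English pipeline (no trained model), and the rule names `WithSpace` and `WithoutSpace` are hypothetical.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: tokenizer only, no trained model
matcher = Matcher(nlp.vocab)

with_space = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
without_space = [{"LOWER": "swimming"}, {"LOWER": "vigorously"}]
matcher.add("WithSpace", [with_space])
matcher.add("WithoutSpace", [without_space])

# The newline becomes its own whitespace token between the two words.
doc = nlp("He was swimming\nvigorously, then swimming vigorously again.")

hits = {}
for match_id, start, end in matcher(doc):
    name = nlp.vocab.strings[match_id]
    hits.setdefault(name, []).append(doc[start:end].text)

print(hits)
```

Only the pattern with the optional `IS_SPACE` token should find the occurrence that is split by the line break.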

In [13]:
doc

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  The man's hands were behind
his back, the wrists bound with a cord.  A rope closely encircled his
neck.  It was attached to a stout cross-timber above his head and the
slack fell to the level of his knees.  Some loose boards laid upon the
ties supporting the rails of the railway supplied a footing for him
and his executioners--two private soldiers of the Federal army,
directed by a sergeant who in civil life may have been a deputy
sheriff.  At a short remove upon the same temporary platform was an
officer in the uniform of his rank, armed.  He was a captain.  A
sentinel at each end of the bridge stood with his rifle in the
position known as "support," that is to say, vertical in front of the
left shoulder, the hammer resting on the forearm thrown straight
across the chest--a formal and unnatural position, enforcing an ere




In [14]:
# Create a list of matches called "found_matches" and print the list:
# Each match is a (match_id, start, end) tuple; 1274 is the start token index of the first match
found_matches = matcher(doc)
print(found_matches)

[(12881893835109366681, 1274, 1277), (12881893835109366681, 3609, 3612)]
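Each tuple above is `(match_id, start, end)`, where `match_id` is the hash of the rule name and `start`/`end` are token indices. The sketch below decodes such tuples into readable values; it uses a blank English pipeline and a one-sentence stand-in text (both assumptions), so the indices differ from those in the notebook.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: tokenizer only, no trained model
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
matcher.add("Swimming", [pattern])

doc = nlp("He was now swimming\nvigorously with the current.")

decoded = []
for match_id, start, end in matcher(doc):
    rule_name = nlp.vocab.strings[match_id]  # hash -> rule name
    decoded.append((rule_name, start, end, doc[start:end].text))

print(decoded)
```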


In [15]:
# Print the text surrounding each found match
print(doc[1265:1290])

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home


In [16]:
print(doc[3600:3615])

all this over his shoulder; he was now swimming
vigorously with the current
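The two cells above use hand-picked slice bounds around each match. As a sketch of a more general approach, the window can be computed from the match itself and clipped to the document's bounds. The `window` size, the stand-in text, and the blank pipeline are all assumptions made for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: tokenizer only, no trained model
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
matcher.add("Swimming", [pattern])

doc = nlp("He looked back and he was now swimming\nvigorously with the current.")

window = 3  # hypothetical number of context tokens on each side
contexts = []
for span in matcher(doc, as_spans=True):
    # Clip the window so it never runs past either end of the document.
    left = max(span.start - window, 0)
    right = min(span.end + window, len(doc))
    contexts.append(doc[left:right].text)

for context in contexts:
    print(context)
```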


In [17]:
# Print the sentence that contains the first found match
# (the first sentence whose end index lies beyond the match's start token)
for sent in sents:
    if found_matches[0][1] < sent.end:
        print(sent)
        break

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.
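The loop above handles one match at a time. A sketch of an alternative that covers every match is to ask each matched span for its enclosing sentence through `Span.sent`. This example assumes a blank English pipeline with the rule-based `sentencizer` providing sentence boundaries (the notebook's `en_core_web_sm` sets them with its own components), and the three-sentence text is a made-up stand-in.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")      # assumption: no trained model
nlp.add_pipe("sentencizer")  # rule-based sentence boundaries

matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "swimming"}, {"IS_SPACE": True, "OP": "*"}, {"LOWER": "vigorously"}]
matcher.add("Swimming", [pattern])

doc = nlp("He dived. He was now swimming\nvigorously with the current. He escaped.")

# as_spans=True returns Span objects; Span.sent is the sentence containing the span.
sentences = [span.sent.text for span in matcher(doc, as_spans=True)]

for sentence in sentences:
    print(sentence)
```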


In [18]:
found_matches[0][1]

1274