# NLP Basics Assessment

For this assessment we'll be using the short story [_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create a Doc object from the file `owlcreek.txt`**<br>
> HINT: Use `with open('../TextFiles/owlcreek.txt') as f:`

In [2]:
# Enter your code here:
with open('../TextFiles/owlcreek.txt') as f:
    doc = nlp(f.read())

In [3]:
# Run this cell to verify it worked:

doc[:36]

AN OCCURRENCE AT OWL CREEK BRIDGE

by Ambrose Bierce

I

A man stood upon a railroad bridge in northern Alabama, looking down
into the swift water twenty feet below.  

**2. How many tokens are contained in the file?**

In [4]:
len(doc)

4835

**3. How many sentences are contained in the file?**<br>HINT: You'll want to build a list first!

In [6]:
# doc.sents is a generator, so materialize it as a list before counting or indexing:
sents = list(doc.sents)
len(sents)

212

**4. Print the second sentence in the document**<br> HINT: Indexing starts at zero, and the title counts as the first sentence.

In [7]:
print(sents[1].text)

 The man's hands were behind
his back, the wrists bound with a cord.


**5. For each token in the sentence above, print its `text`, `POS` tag, `dep` tag and `lemma`<br>
CHALLENGE: Have values line up in columns in the print output.**

In [9]:
# NORMAL SOLUTION:
for token in sents[1]:
    print(token.text, token.pos_, token.dep_, token.lemma_)

  SPACE dep  
The DET det the
man NOUN poss man
's PART case 's
hands NOUN nsubj hand
were AUX ROOT be
behind ADP prep behind

 SPACE dep 

his PRON poss his
back NOUN attr back
, PUNCT punct ,
the DET det the
wrists NOUN appos wrist
bound VERB acl bind
with ADP prep with
a DET det a
cord NOUN pobj cord
. PUNCT punct .


In [14]:
# CHALLENGE SOLUTION:
for token in sents[1]:
    print(f'{token.text:{15}} {token.pos_:{5}} {token.dep_:{10}} {token.lemma_:{15}}')

                SPACE dep                       
The             DET   det        the            
man             NOUN  poss       man            
's              PART  case       's             
hands           NOUN  nsubj      hand           
were            AUX   ROOT       be             
behind          ADP   prep       behind         

               SPACE dep        
              
his             PRON  poss       his            
back            NOUN  attr       back           
,               PUNCT punct      ,              
the             DET   det        the            
wrists          NOUN  appos      wrist          
bound           VERB  acl        bind           
with            ADP   prep       with           
a               DET   det        a              
cord            NOUN  pobj       cord           
.               PUNCT punct      .              
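The column alignment in the challenge solution comes from Python's format specification, not from spaCy. The sketch below reproduces it on a few rows copied from the output above, so it runs without a language model. The helper names `format_row` and `format_row_auto` are hypothetical, and the second helper is an optional variation that derives the column widths from the data instead of hard-coding them.

```python
# Rows copied from the printed output above: (text, POS, dep, lemma)
rows = [
    ("hands", "NOUN", "nsubj", "hand"),
    ("were", "AUX", "ROOT", "be"),
    ("behind", "ADP", "prep", "behind"),
]

def format_row(text, pos, dep, lemma):
    # A bare width pads a string on the right, so the value is left-aligned
    return f"{text:15} {pos:5} {dep:10} {lemma:15}"

def format_row_auto(row, widths):
    # Same idea, but each width is supplied at run time via a nested field
    return " ".join(f"{value:<{width}}" for value, width in zip(row, widths))

# Widest entry in each of the four columns
widths = [max(len(row[i]) for row in rows) for i in range(4)]

for row in rows:
    print(format_row(*row))
for row in rows:
    print(format_row_auto(row, widths))
```

Fixed widths are simpler and match the solution above; computed widths avoid truncated-looking columns when a token is longer than expected.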


**6. Write a matcher called 'Swimming' that finds both occurrences of the phrase "swimming vigorously" in the text**<br>
HINT: You should include an `'IS_SPACE': True` pattern between the two words!

In [15]:
# Import the Matcher library:

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [16]:
# Create a pattern and add it to matcher:
# 'OP': '*' allows zero or more whitespace tokens (such as a line break) between the two words:
pattern = [{'LOWER': 'swimming'}, {'IS_SPACE': True, 'OP': '*'}, {'LOWER': 'vigorously'}]

matcher.add('Swimming', [pattern])

In [17]:
# Create a list of matches called "found_matches" and print the list:
found_matches = matcher(doc)
found_matches

[(12881893835109366681, 1274, 1277), (12881893835109366681, 3609, 3612)]
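Each match spans three tokens (for example 1274 to 1277), which suggests a whitespace token sits between the two words. As a cross-check that does not depend on spaCy, the sketch below uses a regular expression, a different technique from the Matcher, on two excerpts from the story. The excerpts and their line breaks are assumed to mirror the source file.

```python
import re

# Two excerpts from the story; note the line break right after "swimming"
sample = (
    "By diving I could evade the bullets and, swimming\n"
    "vigorously, reach the bank, take to the woods and get away home.\n"
    "The hunted man saw all this over his shoulder; he was now swimming\n"
    "vigorously with the current."
)

# A literal search with a single space misses both occurrences
literal_hits = sample.lower().count("swimming vigorously")

# Allowing any run of whitespace between the words finds them
phrase = re.compile(r"swimming\s+vigorously", re.IGNORECASE)
flexible_hits = [m.span() for m in phrase.finditer(sample)]

print(literal_hits, len(flexible_hits))
```

This is the same reason the hint asks for an `'IS_SPACE': True` entry in the token pattern: the two words are separated by a newline rather than an ordinary space.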

**7. Print the text surrounding each found match**

In [18]:
# Tokens surrounding the first match (token indexes 1274 to 1277):
doc[1265:1290]

By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home

In [19]:
# Tokens surrounding the second match (token indexes 3609 to 3612):
doc[3600:3615]

all this over his shoulder; he was now swimming
vigorously with the current

**EXTRA CREDIT:<br>Print the *sentence* that contains each found match**

In [21]:
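# Sentences are in document order, so the first one that ends after the match start contains the match:
for sent in sents: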
    if found_matches[0][1] < sent.end:
        print(sent)
        break

 By diving I could evade the bullets and, swimming
vigorously, reach the bank, take to the woods and get away home.


In [22]:
# Same search for the second match:
for sent in sents:
    if found_matches[1][1] < sent.end:
        print(sent)
        break



The hunted man saw all this over his shoulder; he was now swimming
vigorously with the current.
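The two loops above walk the sentence list until they reach the first sentence whose end index lies past the match start. With many matches, the same lookup can be done with a binary search over the sentence end indexes. The sketch below shows the idea in plain Python; the function name `containing_sentence` and the boundary values are hypothetical and are not taken from the story.

```python
from bisect import bisect_right

def containing_sentence(sent_ends, match_start):
    """Return the index of the sentence that contains token `match_start`.

    `sent_ends` holds the exclusive end token index of each sentence,
    in document order (what `sent.end` would give for each sentence).
    """
    # Index of the first sentence whose end is strictly greater than match_start
    index = bisect_right(sent_ends, match_start)
    if index == len(sent_ends):
        raise ValueError("match_start lies beyond the last sentence")
    return index

# Hypothetical boundaries: four sentences covering tokens 0-6, 7-19, 20-41, 42-59
sent_ends = [7, 20, 42, 60]

print(containing_sentence(sent_ends, 19))  # token 19 is in sentence 1
print(containing_sentence(sent_ends, 20))  # token 20 starts sentence 2
```

In the notebook, `sent_ends` could be built as `[sent.end for sent in sents]`. spaCy spans also expose a `.sent` attribute, so `doc[start:end].sent` is another way to reach the enclosing sentence.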


### Great Job!