# spaCy Library

- en_core_web_sm: English multi-task CNN trained on OntoNotes. Size – 11 MB
- en_core_web_md: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size – 91 MB
- User manual: https://spacy.io/usage/spacy-101

In [25]:
#spacy.load("en_core_web_sm/en_core_web_sm-3.4.1/")
#spacy.load('en_core_web_md/en_core_web_md-3.4.1/')

In [24]:
import spacy

nlp = spacy.load('en_core_web_sm/en_core_web_sm-3.4.1/')
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Print the main linguistic attributes of each token
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

Apple Apple PROPN NNP nsubj Xxxxx True False
is be AUX VBZ aux xx True True
looking look VERB VBG ROOT xxxx True False
at at ADP IN prep xx True True
buying buy VERB VBG pcomp xxxx True False
U.K. U.K. PROPN NNP dobj X.X. False False
startup startup NOUN NN dobj xxxx True False
for for ADP IN prep xxx True True
$ $ SYM $ quantmod $ False False
1 1 NUM CD compound d False False
billion billion NUM CD pobj xxxx True False
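
The `token.shape_` column in the output above abstracts each token's spelling: uppercase letters appear as `X`, lowercase letters as `x`, digits as `d`, and other characters are kept as they are. The sketch below is a plain-Python approximation written only to reproduce the shapes printed above. The function name `word_shape` and the cap of four on runs of identical shape characters are assumptions inferred from that output, not a copy of spaCy's implementation.

```python
def word_shape(text: str) -> str:
    """Approximate a token shape: X = uppercase, x = lowercase, d = digit."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            # Punctuation and symbols are kept unchanged
            shape_char = char
        if shape_char == last:
            run += 1
        else:
            run = 1
            last = shape_char
        # Assumed rule: keep at most four identical shape characters in a row
        if run <= 4:
            shape.append(shape_char)
    return "".join(shape)


for word in ["Apple", "looking", "U.K.", "$", "1", "billion"]:
    print(word, word_shape(word))
```

Capping the runs is why `looking` and `billion` both map to `xxxx`: words of different lengths share one shape, which keeps the number of distinct shapes small.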


## 1. Part-of-Speech (POS) Tagging

### 1. Universal POS Tags
 <img src = "figures/universalpostag.png" width="400">

**Note:** Read more at https://universaldependencies.org/u/pos/

### 2. Detailed POS Tags

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm/en_core_web_sm-3.4.1/')

text = 'It took me more than two hours to translate a few pages of English.'

# token.pos_ is the coarse universal tag, token.tag_ the detailed tag
for token in nlp(text):
    print(token.text, '=>', token.pos_, '=>', token.tag_)

It => PRON => PRP
took => VERB => VBD
me => PRON => PRP
more => ADJ => JJR
than => ADP => IN
two => NUM => CD
hours => NOUN => NNS
to => PART => TO
translate => VERB => VB
a => DET => DT
few => ADJ => JJ
pages => NOUN => NNS
of => ADP => IN
English => PROPN => NNP
. => PUNCT => .
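
The output above pairs each coarse universal tag with a detailed tag, but the detailed tags are not explained. The sketch below groups the detailed tags under their universal tag and looks them up in a small glossary. The triples are copied from the output above. The names `TAG_GLOSSARY`, `describe` and `fine_by_coarse` are illustrative, and the glossary is hand-written for the Penn Treebank style tags in this one sentence, not an exhaustive list.

```python
from collections import defaultdict

# (token, universal POS, detailed tag) triples copied from the output above
tagged = [
    ("It", "PRON", "PRP"), ("took", "VERB", "VBD"), ("me", "PRON", "PRP"),
    ("more", "ADJ", "JJR"), ("than", "ADP", "IN"), ("two", "NUM", "CD"),
    ("hours", "NOUN", "NNS"), ("to", "PART", "TO"), ("translate", "VERB", "VB"),
    ("a", "DET", "DT"), ("few", "ADJ", "JJ"), ("pages", "NOUN", "NNS"),
    ("of", "ADP", "IN"), ("English", "PROPN", "NNP"), (".", "PUNCT", "."),
]

# Hand-written glossary covering only the tags that occur in this sentence
TAG_GLOSSARY = {
    "PRP": "personal pronoun",
    "VBD": "verb, past tense",
    "JJR": "adjective, comparative",
    "IN": "preposition or subordinating conjunction",
    "CD": "cardinal number",
    "NNS": "noun, plural",
    "TO": "infinitival 'to'",
    "VB": "verb, base form",
    "DT": "determiner",
    "JJ": "adjective",
    "NNP": "proper noun, singular",
    ".": "sentence-final punctuation",
}


def describe(tag: str) -> str:
    """Return a short description of a detailed tag, if the glossary has one."""
    return TAG_GLOSSARY.get(tag, "unknown tag")


# Collect the detailed tags seen under each universal tag
fine_by_coarse = defaultdict(set)
for _, pos, tag in tagged:
    fine_by_coarse[pos].add(tag)

for pos, tags in fine_by_coarse.items():
    for tag in sorted(tags):
        print(pos, "=>", tag, "=>", describe(tag))
```

Here `VERB` covers both `VBD` and `VB`, and `ADJ` covers both `JJR` and `JJ`: the detailed tags carry tense and degree information that the universal tags drop. In a live spaCy session, `spacy.explain(tag)` gives similar descriptions without a hand-written table.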


## 2. Dependency Parsing

In [27]:
import spacy
nlp = spacy.load('en_core_web_sm/en_core_web_sm-3.4.1/')

text = 'It took me more than two hours to translate a few pages of English.'

# Each token points to its syntactic head; the root points to itself
for token in nlp(text):
    print(token.text, '=>', token.dep_, '=>', token.head.text)

It => nsubj => took
took => ROOT => took
me => dobj => took
more => amod => two
than => quantmod => two
two => nummod => hours
hours => dobj => took
to => aux => translate
translate => xcomp => took
a => quantmod => few
few => amod => pages
pages => dobj => translate
of => prep => pages
English => pobj => of
. => punct => took
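
Each line of the output above is one arc of the dependency tree: a token, its relation label, and its head. The sketch below rebuilds the tree from those arcs and prints it with indentation. The triples are copied from the output above, and `children` and `format_tree` are illustrative names. Keying on the token text is a shortcut that works only because no word repeats in this sentence; with a real `Doc`, token indices would be the safer key.

```python
from collections import defaultdict

# (token, dependency label, head) triples copied from the output above
arcs = [
    ("It", "nsubj", "took"), ("took", "ROOT", "took"), ("me", "dobj", "took"),
    ("more", "amod", "two"), ("than", "quantmod", "two"),
    ("two", "nummod", "hours"), ("hours", "dobj", "took"),
    ("to", "aux", "translate"), ("translate", "xcomp", "took"),
    ("a", "quantmod", "few"), ("few", "amod", "pages"),
    ("pages", "dobj", "translate"), ("of", "prep", "pages"),
    ("English", "pobj", "of"), (".", "punct", "took"),
]

# Map each head to its dependents; the ROOT token is its own head
children = defaultdict(list)
root = None
for token, dep, head in arcs:
    if dep == "ROOT":
        root = token
    else:
        children[head].append((token, dep))


def format_tree(token, dep="ROOT", depth=0):
    """Return the subtree under `token` as a list of indented lines."""
    lines = ["  " * depth + f"{token} ({dep})"]
    for child, child_dep in children[token]:
        lines.extend(format_tree(child, child_dep, depth + 1))
    return lines


print("\n".join(format_tree(root)))
```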


In [31]:
from spacy import displacy
# Draw the dependency arcs inline in the notebook
displacy.render(nlp(text), jupyter=True)

## 3. Constituency Parsing

- VP for verb phrase
- NP for noun phrases

In [8]:
import benepar

# 'Test' is not a published benepar model, so downloading it fails.
# 'benepar_en3' is the English model for benepar 0.2+, which runs on PyTorch,
# so no TensorFlow import is needed.
benepar.download('benepar_en3')

In [7]:
import spacy
import benepar

# Load spaCy's English model and add the benepar component to its pipeline.
# spaCy 3 adds components by their registered name, so BeneparComponent
# is not instantiated directly.
nlp = spacy.load('en_core_web_sm/en_core_web_sm-3.4.1/')
nlp.add_pipe('benepar', config={'model': 'benepar_en3'})

text = 'It took me more than two hours to translate a few pages of English.'

# Generate a bracketed parse tree for the first sentence
list(nlp(text).sents)[0]._.parse_string


Source: https://www.analyticsvidhya.com/blog/2020/07/part-of-speechpos-tagging-dependency-parsing-and-constituency-parsing-in-nlp/