# spaCy Library

- en_core_web_sm: English multi-task CNN trained on OntoNotes. Size – 11 MB
- en_core_web_md: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size – 91 MB
- en_core_web_lg: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size – 789 MB
- User manual: https://spacy.io/usage/spacy-101

In [9]:
import spacy
nlp = spacy.load("en_core_web_sm/en_core_web_sm-3.4.1/")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text, token.pos_, token.dep_)

Apple PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.K. PROPN dobj
startup NOUN dobj
for ADP prep
$ SYM quantmod
1 NUM compound
billion NUM pobj


In [None]:
#spacy.load("en_core_web_sm/en_core_web_sm-3.4.1/")
#spacy.load('en_core_web_md/en_core_web_md-3.4.1/')
#spacy.load('en_core_web_lg/en_core_web_lg-3.4.1/')

# 1. Part-of-Speech (POS) Tagging

In [10]:
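import spacy

# Load the large English pipeline from a local directory
nlp = spacy.load('en_core_web_lg/en_core_web_lg-3.4.1/')

text = 'It took me more than two hours to translate a few pages of English.'

# pos_ is the coarse Universal POS tag; tag_ is the fine-grained Penn Treebank tag
for token in nlp(text):
    print(token.text, '=>', token.pos_, '=>', token.tag_)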

It => PRON => PRP
took => VERB => VBD
me => PRON => PRP
more => ADJ => JJR
than => ADP => IN
two => NUM => CD
hours => NOUN => NNS
to => PART => TO
translate => VERB => VB
a => DET => DT
few => ADJ => JJ
pages => NOUN => NNS
of => ADP => IN
English => PROPN => NNP
. => PUNCT => .
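
The output above pairs each token with a coarse tag (`pos_`) and a fine-grained tag (`tag_`). As a minimal sketch of how the two levels relate, the snippet below hard-codes the rows printed above (so it runs without spaCy or a model) and groups the fine-grained tags under each coarse tag. The names `ROWS` and `fine_tags_by_pos` are illustrative and not part of spaCy.

```python
from collections import Counter

# (token, coarse POS, fine-grained tag) rows copied from the output above.
ROWS = [
    ("It", "PRON", "PRP"),
    ("took", "VERB", "VBD"),
    ("me", "PRON", "PRP"),
    ("more", "ADJ", "JJR"),
    ("than", "ADP", "IN"),
    ("two", "NUM", "CD"),
    ("hours", "NOUN", "NNS"),
    ("to", "PART", "TO"),
    ("translate", "VERB", "VB"),
    ("a", "DET", "DT"),
    ("few", "ADJ", "JJ"),
    ("pages", "NOUN", "NNS"),
    ("of", "ADP", "IN"),
    ("English", "PROPN", "NNP"),
    (".", "PUNCT", "."),
]


def fine_tags_by_pos(rows):
    """Map each coarse POS tag to the set of fine-grained tags seen under it."""
    mapping = {}
    for _text, pos, tag in rows:
        mapping.setdefault(pos, set()).add(tag)
    return mapping


# Count how often each coarse tag occurs in the sentence.
pos_counts = Counter(pos for _text, pos, _tag in ROWS)
tag_map = fine_tags_by_pos(ROWS)

for pos, count in pos_counts.most_common():
    print(pos, count, sorted(tag_map[pos]))
```

With a live pipeline, the same rows could be collected as `(token.text, token.pos_, token.tag_)` for each token instead of being typed in by hand.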


# 2. Dependency Parsing

In [13]:
import spacy

nlp = spacy.load('en_core_web_lg/en_core_web_lg-3.4.1/')

text = 'It took me more than two hours to translate a few pages of English.'

# dep_ is the relation label; head is the token this one attaches to
for token in nlp(text):
    print(token.text, '=>', token.dep_, '=>', token.head.text)

It => nsubj => took
took => ROOT => took
me => dative => took
more => amod => two
than => quantmod => two
two => nummod => hours
hours => dobj => took
to => aux => translate
translate => xcomp => took
a => quantmod => few
few => amod => pages
pages => dobj => translate
of => prep => pages
English => pobj => of
. => punct => took
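
Each row above is one edge of a tree: the token, its relation label, and the head it attaches to. As a rough sketch of how those rows form a tree, the snippet below rebuilds the hierarchy from the printed triples using only the standard library. It keys nodes by token text, which only works here because every word in this sentence is unique; with a real `Doc` one would key on `token.i` and `token.head.i` instead. The helper names `build_children` and `render` are hypothetical.

```python
from collections import defaultdict

# (token, dependency label, head) triples copied from the output above.
TRIPLES = [
    ("It", "nsubj", "took"),
    ("took", "ROOT", "took"),
    ("me", "dative", "took"),
    ("more", "amod", "two"),
    ("than", "quantmod", "two"),
    ("two", "nummod", "hours"),
    ("hours", "dobj", "took"),
    ("to", "aux", "translate"),
    ("translate", "xcomp", "took"),
    ("a", "quantmod", "few"),
    ("few", "amod", "pages"),
    ("pages", "dobj", "translate"),
    ("of", "prep", "pages"),
    ("English", "pobj", "of"),
    (".", "punct", "took"),
]


def build_children(triples):
    """Return the root token and a head -> [(child, label), ...] mapping."""
    children = defaultdict(list)
    root = None
    for text, dep, head in triples:
        if dep == "ROOT":
            root = text  # the root is printed as its own head
        else:
            children[head].append((text, dep))
    return root, children


def render(node, children, depth=0, label="ROOT"):
    """Return the subtree under `node` as a list of indented lines."""
    lines = ["  " * depth + f"{node} ({label})"]
    for child, dep in children.get(node, []):
        lines.extend(render(child, children, depth + 1, dep))
    return lines


root, children = build_children(TRIPLES)
tree_lines = render(root, children)
print("\n".join(tree_lines))
```

The indented listing carries the same information that `displacy` draws as arcs: every token appears once, nested under its head.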


In [14]:
from spacy import displacy

# Draw the dependency arcs inline in the notebook
displacy.render(nlp(text), jupyter=True)

# 3. Constituency Parsing

In [15]:
import benepar

# 'benepar_en2' is no longer in the download index (the call fails with
# "Package 'benepar_en2' not found in index"); 'benepar_en3' is the English
# model that works with spaCy v3.
benepar.download('benepar_en3')

In [16]:
import benepar
import spacy

# spaCy v3 no longer accepts the 'en' shortcut (error E941), so load a full pipeline
nlp = spacy.load('en_core_web_lg/en_core_web_lg-3.4.1/')
# Importing benepar registers the 'benepar' factory; spaCy v3 adds components by name.
# This needs the 'benepar_en3' model to have been downloaded first.
nlp.add_pipe('benepar', config={'model': 'benepar_en3'})

text = 'It took me more than two hours to translate a few pages of English.'

# Generating a parse tree for the text
list(nlp(text).sents)[0]._.parse_string
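
benepar's `parse_string` is a bracketed, Penn Treebank style string. As a minimal sketch of how such a string can be turned into a nested structure, the snippet below parses a short hand-written example. The string `EXAMPLE` is an illustration typed in for this sketch, not actual parser output for the sentence above, and `parse_brackets` and `leaves` are hypothetical helper names.

```python
def parse_brackets(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # constituent label or POS tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())  # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return read()


def leaves(tree):
    """Collect the words of a parsed tree in left-to-right order."""
    _label, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hand-written illustrative string, NOT real benepar output.
EXAMPLE = "(S (NP (PRP It)) (VP (VBD took) (NP (PRP me))))"

tree = parse_brackets(EXAMPLE)
print(tree[0], leaves(tree))
```

Unlike the dependency output, where every node is a word, the inner nodes here are phrase labels such as `NP` and `VP`, and the words appear only at the leaves.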

Source: https://www.analyticsvidhya.com/blog/2020/07/part-of-speechpos-tagging-dependency-parsing-and-constituency-parsing-in-nlp/