### spaCy’s Statistical Models

- These statistical models are what power spaCy. They enable it to perform several NLP tasks, such as part-of-speech tagging, named entity recognition, and dependency parsing.

- I’ve listed below the English statistical models in spaCy along with their specifications (download sizes vary between spaCy releases):

  - en_core_web_sm: English multi-task CNN trained on OntoNotes. Size: 11 MB
  - en_core_web_md: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size: 91 MB
  - en_core_web_lg: English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Size: 789 MB

In [8]:
import spacy
# spacy.cli.download("en_core_web_sm")
nlp = spacy.load("en_core_web_sm")

#### spaCy’s Processing Pipeline
<img src="6.webp">

In [11]:
# Process the text with the nlp object to create a Doc
doc = nlp("He went to play basketball")

# List the active pipeline components:
nlp.pipe_names

# Disable pipeline components (uncomment to run):
# nlp.disable_pipes('tagger', 'parser')

['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
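
As a hedged sketch of switching components off temporarily: in spaCy v3, `nlp.select_pipes` can be used as a context manager, and the disabled components are restored when the block exits. The example below assumes spaCy v3 is installed and uses a blank English pipeline with a single rule-based `sentencizer`, chosen only so that it runs without a downloaded model. With `en_core_web_sm` you would pass names such as `'tagger'` and `'parser'` instead.

```python
import spacy

# Blank English pipeline (tokenizer only); no trained model is required.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence splitter, used as a stand-in

active_before = list(nlp.pipe_names)

# Temporarily disable the component; it is restored on exit.
with nlp.select_pipes(disable=["sentencizer"]):
    active_inside = list(nlp.pipe_names)

active_after = list(nlp.pipe_names)

print("before:", active_before)
print("inside:", active_inside)
print("after: ", active_after)
```

Using the context manager keeps the change local, so later cells still see the full pipeline.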

## 1. Part-of-Speech (POS) Tagging using spaCy

In [12]:
import spacy 
nlp = spacy.load('en_core_web_sm')

# Process the text with the nlp object to create a Doc
doc = nlp("He went to play basketball")
 
# Iterate over the tokens
for token in doc:
    # Print the token and its part-of-speech tag
    print(token.text, "-->", token.pos_)

He --> PRON
went --> VERB
to --> PART
play --> VERB
basketball --> NOUN


In [15]:
# If you are not sure what a tag means, use spacy.explain() to look it up:
spacy.explain("PART")

'particle'

## 2. Dependency Parsing using spaCy

- Every sentence has a grammatical structure, and dependency parsing lets us extract it. The structure can be thought of as a directed graph, where the nodes are the words in the sentence and the edges are the dependencies between those words.

<img src="7.webp">

In [16]:
# dependency parsing
for token in doc:
    print(token.text, "-->", token.dep_)

He --> nsubj
went --> ROOT
to --> aux
play --> advcl
basketball --> dobj


The dependency tag ROOT denotes the main verb or action in the sentence. The other words are directly or indirectly connected to the ROOT word of the sentence. You can find out what other tags stand for by executing the code below:

In [17]:
spacy.explain("nsubj"), spacy.explain("ROOT"), spacy.explain("aux"), spacy.explain("advcl"), spacy.explain("dobj")

('nominal subject',
 None,
 'auxiliary',
 'adverbial clause modifier',
 'direct object')
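
The directed-graph view of a dependency parse can be sketched in plain Python. The labels below are the ones printed for "He went to play basketball"; the head of each word is filled in by hand as an assumption consistent with those labels, not copied from spaCy output. In spaCy itself, the head of a token is available as `token.head`, and the root token is its own head.

```python
from collections import defaultdict

# (word, dependency label, head word). Heads are hand-filled assumptions.
# Words are unique in this sentence, so they can serve as dictionary keys.
parse = [
    ("He", "nsubj", "went"),
    ("went", "ROOT", "went"),  # the root is treated as its own head
    ("to", "aux", "play"),
    ("play", "advcl", "went"),
    ("basketball", "dobj", "play"),
]

head_of = {word: head for word, _, head in parse}
root = next(word for word, dep, _ in parse if dep == "ROOT")

# Edges of the graph: head -> list of (dependent, label)
children = defaultdict(list)
for word, dep, head in parse:
    if word != head:
        children[head].append((word, dep))

def path_to_root(word):
    """Follow head links from a word until the root is reached."""
    path = [word]
    while head_of[path[-1]] != path[-1]:
        path.append(head_of[path[-1]])
    return path

for word, _, _ in parse:
    print(" -> ".join(path_to_root(word)))
```

Every path ends at the root, which is what "directly or indirectly connected to the ROOT word" means in practice.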

## 3. Named Entity Recognition using spaCy

- Let’s first understand what entities are. Entities are words or groups of words that refer to real-world things such as persons, locations, and organizations. These entities have proper names.

For example, consider the following sentence:
<img src="8.webp">

In this sentence, the entities are “Donald Trump”, “Google”, and “New York City”.

Let’s now see how spaCy recognizes named entities in a sentence.

In [22]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
 
for ent in doc.ents:
    print(ent.text, ent.label_)

Apple ORG
U.K. GPE
$1 billion MONEY


In [25]:
spacy.explain("GPE")

'Countries, cities, states'
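
Since an entity is a labelled span of the original text, a small post-processing step is often useful. The sketch below is plain Python and hard-codes the three entities printed above as an assumption, so it runs without spaCy; it locates each entity with `str.find` (which returns the first match only) and groups the entities by label. In spaCy, the same offsets are exposed as `ent.start_char` and `ent.end_char`.

```python
from collections import defaultdict

sentence = "Apple is looking at buying U.K. startup for $1 billion"

# (text, label) pairs, hard-coded from the output shown above
entities = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]

# Attach character offsets to each entity
spans = []
for text, label in entities:
    start = sentence.find(text)  # first occurrence; fine for this sentence
    spans.append((text, label, start, start + len(text)))

# Group entity texts by their label
by_label = defaultdict(list)
for text, label, start, end in spans:
    by_label[label].append(text)

for text, label, start, end in spans:
    print(f"{label:<6} [{start}:{end}] {text}")
```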