## Getting started with spaCy

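Installation instructions. As of spaCy v3 the `en` shortcut is deprecated, so download the English pipeline by its full package name, `en_core_web_sm`:

```
pip install spacy
python -m spacy download en_core_web_sm
pip install nltk
```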

In [1]:
%pip install spacy nltk


In [2]:
!python -m spacy download en

⚠ As of spaCy v3.0, shortcuts like 'en' are deprecated. Please use the
full pipeline package name 'en_core_web_sm' instead.
Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [3]:
import spacy



In [4]:
# Tokenization: sentence and word tokenization in spaCy

nlp = spacy.load("en_core_web_sm")
doc = nlp("British Airways will drop its flights to Beijing from October. It feels the impact of being banned from Russian airspace.")

In [5]:
for sentence in doc.sents:
    print(sentence)

British Airways will drop its flights to Beijing from October.
It feels the impact of being banned from Russian airspace.
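The loaded `en_core_web_sm` pipeline derives the sentence boundaries above from its dependency parser. As a lighter alternative (not used elsewhere in this notebook), spaCy's rule-based `sentencizer` splits on sentence-final punctuation and runs on a blank pipeline, so no model download is needed. A minimal sketch:

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components
nlp_rules = spacy.blank("en")
# Rule-based splitter: starts a new sentence after punctuation such as "."
nlp_rules.add_pipe("sentencizer")

text = ("British Airways will drop its flights to Beijing from October. "
        "It feels the impact of being banned from Russian airspace.")
rule_doc = nlp_rules(text)

sentences = [sent.text for sent in rule_doc.sents]
for sentence in sentences:
    print(sentence)
```

Rules are fast but naive: abbreviations like "Mr." can trigger false splits that the parser-based segmentation usually avoids.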


In [6]:
for sentence in doc.sents:
    for word in sentence:
        print(word)

British
Airways
will
drop
its
flights
to
Beijing
from
October
.
It
feels
the
impact
of
being
banned
from
Russian
airspace
.
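Besides its text, each token records its character offset into the original string (`token.idx`) and lexical flags such as `is_punct` and `is_alpha`. A sketch using only the tokenizer of a blank English pipeline, assuming it splits this sentence the same way as `en_core_web_sm` (both use spaCy's English tokenizer rules):

```python
import spacy

tokenizer_nlp = spacy.blank("en")  # tokenizer only; no model download needed
text = ("British Airways will drop its flights to Beijing from October. "
        "It feels the impact of being banned from Russian airspace.")
tok_doc = tokenizer_nlp(text)

for token in tok_doc:
    # position in the Doc, text, character offset, punctuation / alphabetic flags
    print(token.i, token.text, token.idx, token.is_punct, token.is_alpha)

# Words only, dropping punctuation tokens
words = [token.text for token in tok_doc if not token.is_punct]
print(words)
```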


## Pipeline

spaCy documentation on processing pipelines: https://spacy.io/usage/processing-pipelines#pipelines


In [7]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [8]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x12308f1c0>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x12308fc40>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x122eff9e0>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x1230eef00>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x123133240>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x122eff890>)]
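Components listed in `nlp.pipeline` can be switched off temporarily, for example to skip the parser and NER when only tags are needed. A sketch of `nlp.select_pipes`, using a blank pipeline where the rule-based `sentencizer` and an empty `entity_ruler` stand in for trained components so it runs without a model download:

```python
import spacy

demo_nlp = spacy.blank("en")
demo_nlp.add_pipe("sentencizer")
demo_nlp.add_pipe("entity_ruler")
before = list(demo_nlp.pipe_names)

# entity_ruler is disabled only inside the with-block and restored on exit
with demo_nlp.select_pipes(disable=["entity_ruler"]):
    during = list(demo_nlp.pipe_names)

after = list(demo_nlp.pipe_names)
print(before, during, after)
```

With the loaded `en_core_web_sm` pipeline the same pattern would be `with nlp.select_pipes(disable=["parser", "ner"]):`.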

In [9]:
# Build a blank pipeline and add a component from a trained pipeline at runtime
source_nlp = spacy.load("en_core_web_sm")

nlp = spacy.blank("en")
nlp.pipe_names

[]

In [10]:

nlp.add_pipe("ner", source=source_nlp)
nlp.pipe_names

['ner']
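`add_pipe` appends to the end by default; the keyword arguments `first`, `last`, `before` and `after` control where a component is placed, and `remove_pipe` takes one out again. A small sketch with built-in rule-based components:

```python
import spacy

ordered_nlp = spacy.blank("en")
ordered_nlp.add_pipe("entity_ruler")
# Place the sentencizer ahead of the existing entity_ruler
ordered_nlp.add_pipe("sentencizer", before="entity_ruler")
order = list(ordered_nlp.pipe_names)
print(order)

# remove_pipe returns a (name, component) tuple
removed_name, _ = ordered_nlp.remove_pipe("sentencizer")
print(removed_name, ordered_nlp.pipe_names)
```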

## POS Tagging

In [11]:
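# Note: `nlp` is still the blank pipeline holding only 'ner', so no tagger or
# lemmatizer runs: pos_ and lemma_ stay empty and spacy.explain('') returns None
doc = nlp("Captain america ate 100$ of Burger. Then he said I can do this all day.")

for token in doc:
    print(token, " | ", token.pos_, spacy.explain(token.pos_), " | ", token.lemma_)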

Captain  |   None  |  
america  |   None  |  
ate  |   None  |  
100  |   None  |  
$  |   None  |  
of  |   None  |  
Burger  |   None  |  
.  |   None  |  
Then  |   None  |  
he  |   None  |  
said  |   None  |  
I  |   None  |  
can  |   None  |  
do  |   None  |  
this  |   None  |  
all  |   None  |  
day  |   None  |  
.  |   None  |  
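Empty `pos_` values fail silently, so a guard can check that some component actually assigned POS tags before reading them. A sketch using `Doc.has_annotation` (available in spaCy v3); the helper name `pos_tags` is hypothetical, and the blank pipeline reproduces the no-tagger situation above:

```python
import spacy

def pos_tags(doc):
    """Return (text, pos_) pairs, or raise if no component assigned POS tags."""
    if not doc.has_annotation("POS"):
        raise ValueError("Doc has no POS annotation; use a pipeline with a tagger, "
                         "e.g. spacy.load('en_core_web_sm')")
    return [(token.text, token.pos_) for token in doc]

blank_nlp = spacy.blank("en")  # tokenizer only, like a pipeline without a tagger
untagged = blank_nlp("Captain america ate 100$ of Burger.")
print([token.pos_ for token in untagged])  # every entry is an empty string

raised = False
try:
    pos_tags(untagged)
except ValueError as err:
    raised = True
    print(err)
```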




In [12]:
nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon flew to mars yesterday. He carried biryani masala with him")

for token in doc:
    # token, coarse POS, its explanation, fine-grained tag, its explanation
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_), " | ",
          token.tag_, " | ", spacy.explain(token.tag_))

Elon  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
flew  |  VERB  |  verb  |  VBD  |  verb, past tense
to  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
mars  |  NOUN  |  noun  |  NNS  |  noun, plural
yesterday  |  NOUN  |  noun  |  NN  |  noun, singular or mass
.  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
He  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
carried  |  VERB  |  verb  |  VBD  |  verb, past tense
biryani  |  ADJ  |  adjective  |  JJ  |  adjective (English), other noun-modifier (Chinese)
masala  |  NOUN  |  noun  |  NN  |  noun, singular or mass
with  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
him  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
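`spacy.explain` is a plain glossary lookup and needs no loaded pipeline, so it can print a legend for the coarse (`pos_`) and fine-grained (`tag_`) labels seen above. A short sketch:

```python
import spacy

# Coarse Universal POS labels and fine-grained Penn Treebank tags from the output above
labels = ["PROPN", "VERB", "ADP", "NOUN", "PRON", "NNP", "VBD", "IN", "PRP"]
legend = {label: spacy.explain(label) for label in labels}

for label, description in legend.items():
    print(f"{label:6} {description}")
```

Unknown labels return `None` rather than raising.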


## NER

In [13]:
import spacy

In [14]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0-py3-none-any.whl (400.7 MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


In [15]:
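# The small model is used below; spacy.load("en_core_web_lg") loads the larger one downloaded above
nlp = spacy.load("en_core_web_sm")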
nlp

<spacy.lang.en.English at 0x126076d00>

In [16]:
doc = nlp("Donad Trump was President of USA")
doc

Donad Trump was President of USA

In [17]:
doc.ents

(Donad Trump,)
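The small model found the (misspelled) person but not `USA`. When specific entities must always be recognised, a rule-based `entity_ruler` can supply them. A minimal sketch on a blank pipeline; the `GPE` label and the single pattern are illustrative choices:

```python
import spacy

ruler_nlp = spacy.blank("en")
ruler = ruler_nlp.add_pipe("entity_ruler")
# Phrase pattern: match the exact token text "USA" and label it GPE
ruler.add_patterns([{"label": "GPE", "pattern": "USA"}])

ruled = ruler_nlp("Donad Trump was President of USA")
found = [(ent.text, ent.label_) for ent in ruled.ents]
print(found)  # [('USA', 'GPE')]
```

With the loaded model, adding the ruler via `nlp.add_pipe("entity_ruler", before="ner")` lets the statistical NER keep the rule-based spans and predict around them.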

In [18]:
doc = nlp("""
          On Thursday, Ramesh Chaudhari walked into a room of journalists gathered at his Mar-a-Lago estate for a news conference. He didn’t look particularly happy.

His remarks came after a week in which Kamala Harris and her new running mate Tim Walz have dominated media attention, raked in millions of dollars and enjoyed a bump in polling. Trump’s media event seemed more an attempt to win back the spotlight than announce anything new.

Just before Trump stepped up to the podium, one of his advisors texted me the wry assessment that Donald Trump is “never boring!!” (the exclamation marks were his).

The event included a couple of news items. Mr Trump announced that he’d agreed to join a TV debate with Vice President Harris on 10 September. ABC News, the debate host, confirmed that Ms Harris had agreed to participate as well. Trump also said he’d like to do two more debates. There’s no word from the Harris team yet on whether they’ve accepted those additional matchups.
          """)
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Thursday | DATE | Absolute or relative dates or periods
Chaudhari | ORG | Companies, agencies, institutions, etc.
a week | DATE | Absolute or relative dates or periods
Kamala Harris | PERSON | People, including fictional
Tim Walz | PERSON | People, including fictional
millions of dollars | MONEY | Monetary values, including unit
Trump | ORG | Companies, agencies, institutions, etc.
Trump | ORG | Companies, agencies, institutions, etc.
one | CARDINAL | Numerals that do not fall under another type
Donald Trump | PERSON | People, including fictional
Trump | PERSON | People, including fictional
Harris | PERSON | People, including fictional
10 September | DATE | Absolute or relative dates or periods
ABC News | ORG | Companies, agencies, institutions, etc.
Ms Harris | PERSON | People, including fictional
Trump | PERSON | People, including fictional
two | CARDINAL | Numerals that do not fall under another type
Harris | PERSON | People, including fictional
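With this many mentions it helps to group entities by label. A small helper sketch; the name `entities_by_label` is hypothetical, and it is exercised on a blank pipeline with an `entity_ruler`, so the labels come from illustrative patterns rather than the model:

```python
from collections import defaultdict

import spacy

def entities_by_label(doc):
    """Map each entity label to the entity texts carrying it, in document order."""
    groups = defaultdict(list)
    for ent in doc.ents:
        groups[ent.label_].append(ent.text)
    return dict(groups)

group_nlp = spacy.blank("en")
group_nlp.add_pipe("entity_ruler").add_patterns([
    {"label": "PERSON", "pattern": "Kamala Harris"},
    {"label": "PERSON", "pattern": "Tim Walz"},
    {"label": "ORG", "pattern": "ABC News"},
])
grouped = entities_by_label(group_nlp("Kamala Harris and Tim Walz spoke to ABC News."))
print(grouped)
```

Calling `entities_by_label(doc)` on the article above would collect every `Trump` and `Harris` mention under whichever labels the model assigned.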


In [19]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  0 | 9
Twitter Inc  |  PERSON  |  30 | 41
$45 billion  |  MONEY  |  46 | 57
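The character offsets slice the original string exactly, which also makes it possible to repair a wrong label by hand. The model tagged `Twitter Inc` as `PERSON`; a sketch that rebuilds the entities with `Doc.char_span` from the offsets printed above, using `ORG` for Twitter (the corrected label is a manual judgment, not model output):

```python
import spacy

text = "Tesla Inc is going to acquire Twitter Inc for $45 billion"
# Offsets copied from the output above; Twitter Inc relabelled from PERSON to ORG
annotations = [(0, 9, "ORG"), (30, 41, "ORG"), (46, 57, "MONEY")]

for start, end, label in annotations:
    print(repr(text[start:end]), label)  # slicing recovers each entity's text

fixed = spacy.blank("en").make_doc(text)
# char_span returns None if the offsets do not line up with token boundaries
spans = [fixed.char_span(start, end, label=label) for start, end, label in annotations]
assert all(span is not None for span in spans)
fixed.ents = spans
print([(ent.text, ent.label_) for ent in fixed.ents])
```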


In [20]:
from spacy import displacy
displacy.render(doc, style="ent", jupyter=True)
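Outside a notebook, `displacy.render` with `jupyter=False` returns the markup as a string, and `manual=True` accepts pre-computed character-offset spans as a dict instead of a `Doc`. A sketch that renders the hand-corrected Tesla/Twitter entities and saves them (the filename `entities.html` is arbitrary):

```python
from pathlib import Path

from spacy import displacy

# Manual input for style="ent": raw text plus character-offset entity spans
example = {
    "text": "Tesla Inc is going to acquire Twitter Inc for $45 billion",
    "ents": [
        {"start": 0, "end": 9, "label": "ORG"},
        {"start": 30, "end": 41, "label": "ORG"},
        {"start": 46, "end": 57, "label": "MONEY"},
    ],
    "title": None,
}
html = displacy.render(example, style="ent", manual=True, jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")
```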