## Getting started with spaCy

Installation instructions

```
pip install spacy
python -m spacy download en_core_web_sm
pip install nltk
```
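
A quick way to confirm the install worked is to check the spaCy version and whether the model package is visible to Python. This is only a minimal sketch: `is_package` comes from `spacy.util`, and the model name is an assumption based on the small English pipeline used in the rest of this notebook.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"  # assumed: the small English pipeline used below

# True only if the model is installed as a pip package in this environment
model_installed = is_package(MODEL_NAME)

print("spaCy version:", spacy.__version__)
print(f"{MODEL_NAME} installed:", model_installed)
```

If the second line prints `False`, the download step above probably needs to be run again in the same environment as the notebook kernel.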

In [1]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [18]:
import spacy

In [28]:
# Sentence and word tokenization in spaCy

nlp = spacy.load("en_core_web_sm")
doc = nlp("British Airways will drop its flights to Beijing from October. It feels the impact of being banned from Russian airspace.")

In [21]:
for sentence in doc.sents:
    print(sentence)

British Airways will drop its flights to Beijing from October.
It feels the impact of being banned from Russian airspace.


In [22]:
for sentence in doc.sents:
    for word in sentence:
        print(word)

British
Airways
will
drop
its
flights
to
Beijing
from
October
.
It
feels
the
impact
of
being
banned
from
Russian
airspace
.
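
Sentence and word splitting do not strictly need a trained model. As a rough sketch, a blank English pipeline with the rule-based `sentencizer` component can split the same text; the variable names here (`blank_nlp`, `blank_doc`) are made up for this example so they do not overwrite `nlp` and `doc` above.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components
blank_nlp = spacy.blank("en")
# Rule-based sentence splitter that breaks on sentence-final punctuation
blank_nlp.add_pipe("sentencizer")

text = ("British Airways will drop its flights to Beijing from October. "
        "It feels the impact of being banned from Russian airspace.")
blank_doc = blank_nlp(text)

sentences = [sent.text for sent in blank_doc.sents]
tokens = [token.text for token in blank_doc]

print(sentences)
print(len(tokens), "tokens")
```

The trained pipeline sets sentence boundaries from the dependency parse instead, which tends to cope better with text where punctuation is unreliable.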


## Pipeline

Reference: https://spacy.io/usage/processing-pipelines#pipelines


In [23]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [7]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x74c95c59dc10>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x74c95c59dd30>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x74c95c56f760>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x74c95c587c50>),
 ('lemmatizer',
  <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x74c95c595090>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x74c95c56f920>)]

In [24]:
# Build a blank pipeline and add components to it at run time
source_nlp = spacy.load("en_core_web_sm")

nlp = spacy.blank("en")
nlp.pipe_names

[]

In [25]:

nlp.add_pipe("ner", source=source_nlp)
nlp.pipe_names

['ner']
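
Components can also be written by hand and placed at a chosen position. The sketch below is illustrative only: `token_counter` is a hypothetical component name, and `demo_nlp` is a separate blank pipeline so the `nlp` object above is left untouched.

```python
import spacy
from spacy.language import Language


@Language.component("token_counter")  # hypothetical name, used only in this sketch
def token_counter(doc):
    # A component receives a Doc, may modify it, and must return it
    doc.user_data["n_tokens"] = len(doc)
    return doc


demo_nlp = spacy.blank("en")
demo_nlp.add_pipe("sentencizer")
demo_nlp.add_pipe("token_counter", first=True)  # run before the sentencizer
order = list(demo_nlp.pipe_names)

# Temporarily switch a component off; it is restored when the block exits
with demo_nlp.select_pipes(disable=["sentencizer"]):
    active = list(demo_nlp.pipe_names)

demo_doc = demo_nlp("Hello world. Bye.")
print(order)
print(active)
print(demo_doc.user_data["n_tokens"])
```

`add_pipe` also accepts `before`, `after` and `last` for positioning, which matters when one component relies on annotations set by another.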

## POS Tagging

In [32]:
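# Reload the full pipeline: the blank pipeline above contains only "ner",
# so it would not produce POS tags or lemmas
nlp = spacy.load("en_core_web_sm")
doc = nlp("Captain america ate 100$ of Burger. Then he said I can do this all day.")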

for token in doc:
    print(token, " | ", token.pos_, spacy.explain(token.pos_), " | ", token.lemma_)

Captain  |  PROPN proper noun  |  Captain
america  |  PROPN proper noun  |  america
ate  |  VERB verb  |  eat
100  |  NUM numeral  |  100
$  |  NUM numeral  |  $
of  |  ADP adposition  |  of
Burger  |  PROPN proper noun  |  Burger
.  |  PUNCT punctuation  |  .
Then  |  ADV adverb  |  then
he  |  PRON pronoun  |  he
said  |  VERB verb  |  say
I  |  PRON pronoun  |  I
can  |  AUX auxiliary  |  can
do  |  VERB verb  |  do
this  |  PRON pronoun  |  this
all  |  DET determiner  |  all
day  |  NOUN noun  |  day
.  |  PUNCT punctuation  |  .


In [33]:
nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon flew to mars yesterday. He carried biryani masala with him")

for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_),
          " | ", token.tag_, " | ", spacy.explain(token.tag_))

Elon  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
flew  |  VERB  |  verb  |  VBD  |  verb, past tense
to  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
mars  |  NOUN  |  noun  |  NNS  |  noun, plural
yesterday  |  NOUN  |  noun  |  NN  |  noun, singular or mass
.  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
He  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
carried  |  VERB  |  verb  |  VBD  |  verb, past tense
biryani  |  ADJ  |  adjective  |  JJ  |  adjective (English), other noun-modifier (Chinese)
masala  |  NOUN  |  noun  |  NN  |  noun, singular or mass
with  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
him  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
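
Once tokens carry POS tags, it is easy to summarise or filter by them. The sketch below assumes the coarse tags printed above and builds a `Doc` by hand from them, so it runs without the trained model; in the notebook itself the same two lines at the end would work on the `doc` returned by `nlp`.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

words = ["Elon", "flew", "to", "mars", "yesterday", ".",
         "He", "carried", "biryani", "masala", "with", "him"]
# Coarse tags copied from the output above, not re-predicted here
pos = ["PROPN", "VERB", "ADP", "NOUN", "NOUN", "PUNCT",
       "PRON", "VERB", "ADJ", "NOUN", "ADP", "PRON"]

tagged_doc = Doc(spacy.blank("en").vocab, words=words, pos=pos)

# Count tokens per coarse POS tag, then keep only the nouns
pos_counts = Counter(token.pos_ for token in tagged_doc)
nouns = [token.text for token in tagged_doc if token.pos_ == "NOUN"]

print(pos_counts.most_common())
print(nouns)
```

Note that "mars" lands among the nouns because the model tagged the lowercase form as a plural noun rather than a proper noun, so filters like this inherit any tagging mistakes.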


## NER

In [34]:
import spacy

In [35]:
# Optional: the large English model (about 588 MB). This download was cancelled,
# so the cells below keep using en_core_web_sm.
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
ERROR: Operation cancelled by user
Aborted.


In [42]:
nlp = spacy.load("en_core_web_sm")
nlp

<spacy.lang.en.English at 0x74c91fedfa70>

In [37]:
doc = nlp("Donald Trump was President of USA")
doc

Donald Trump was President of USA

In [38]:
doc.ents

(Donald Trump, USA)

In [43]:
doc = nlp("""
          On Thursday, Ramesh Chaudhari walked into a room of journalists gathered at his Mar-a-Lago estate for a news conference. He didn’t look particularly happy.

His remarks came after a week in which Kamala Harris and her new running mate Tim Walz have dominated media attention, raked in millions of dollars and enjoyed a bump in polling. Trump’s media event seemed more an attempt to win back the spotlight than announce anything new.

Just before Trump stepped up to the podium, one of his advisors texted me the wry assessment that Donald Trump is “never boring!!” (the exclamation marks were his).

The event included a couple of news items. Mr Trump announced that he’d agreed to join a TV debate with Vice President Harris on 10 September. ABC News, the debate host, confirmed that Ms Harris had agreed to participate as well. Trump also said he’d like to do two more debates. There’s no word from the Harris team yet on whether they’ve accepted those additional matchups.
          """)
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Thursday | DATE | Absolute or relative dates or periods
Chaudhari | ORG | Companies, agencies, institutions, etc.
a week | DATE | Absolute or relative dates or periods
Kamala Harris | PERSON | People, including fictional
Tim Walz | PERSON | People, including fictional
millions of dollars | MONEY | Monetary values, including unit
Trump | ORG | Companies, agencies, institutions, etc.
Trump | ORG | Companies, agencies, institutions, etc.
one | CARDINAL | Numerals that do not fall under another type
Donald Trump | PERSON | People, including fictional
Trump | PERSON | People, including fictional
Harris | PERSON | People, including fictional
10 September | DATE | Absolute or relative dates or periods
ABC News | ORG | Companies, agencies, institutions, etc.
Ms Harris | PERSON | People, including fictional
Trump | ORG | Companies, agencies, institutions, etc.
two | CARDINAL | Numerals that do not fall under another type
Harris | PERSON | People, including fictional
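
The small model mislabels some names here ("Chaudhari" and several mentions of "Trump" come out as ORG). One possible remedy, sketched below under the assumption of spaCy v3's `entity_ruler` component, is to add rule-based patterns for known names. The patterns and the shortened sample sentence are invented for illustration, and `ruler_nlp` is a separate blank pipeline.

```python
import spacy

ruler_nlp = spacy.blank("en")
ruler = ruler_nlp.add_pipe("entity_ruler")

# Exact-match phrase patterns for names we already know (illustrative only)
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Ramesh Chaudhari"},
    {"label": "PERSON", "pattern": "Trump"},
])

ruled_doc = ruler_nlp("On Thursday, Ramesh Chaudhari walked into a room. Trump spoke later.")
ruled_ents = [(ent.text, ent.label_) for ent in ruled_doc.ents]
print(ruled_ents)
```

In the full pipeline the ruler would typically be added with `nlp.add_pipe("entity_ruler", before="ner")`, so that the statistical recognizer works around the spans the rules have already set.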


In [16]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  0 | 9
Twitter Inc  |  ORG  |  30 | 41
$45 billion  |  MONEY  |  46 | 57
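
The `start_char` and `end_char` values are character offsets into the original string, so they can be used to slice the text directly or to build spans by hand. A small sketch, reusing the offsets printed above on a blank pipeline (`span_nlp` and `span_doc` are names made up for this example):

```python
import spacy

span_nlp = spacy.blank("en")
text = "Tesla Inc is going to acquire Twitter Inc for $45 billion"
span_doc = span_nlp(text)

# (start_char, end_char, label) triples taken from the output above
offsets = [(0, 9, "ORG"), (30, 41, "ORG"), (46, 57, "MONEY")]

# Plain string slicing recovers the entity text
sliced = [text[start:end] for start, end, _ in offsets]

# char_span maps character offsets to a token-aligned Span
# (it returns None if the offsets do not fall on token boundaries)
spans = [span_doc.char_span(start, end, label=label) for start, end, label in offsets]
span_doc.ents = spans

print(sliced)
print([(ent.text, ent.label_) for ent in span_doc.ents])
```

This pattern is handy when entity annotations come from an external source, such as a labelled dataset, rather than from the model.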


In [17]:
from spacy import displacy
displacy.render(doc, style="ent", jupyter=True)
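
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below assumes hand-set entities on a blank pipeline so that it runs without the trained model; `viz_nlp` and `viz_doc` are names invented for this example.

```python
import spacy
from spacy import displacy

viz_nlp = spacy.blank("en")
viz_doc = viz_nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")

# Entities set by hand from the character offsets shown earlier
viz_doc.ents = [
    viz_doc.char_span(0, 9, label="ORG"),
    viz_doc.char_span(46, 57, label="MONEY"),
]

# jupyter=False returns the HTML as a string; page=True wraps it in a full page
html = displacy.render(viz_doc, style="ent", jupyter=False, page=True)
print(html[:200])
```

The returned string can then be written to an `.html` file and opened in a browser.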