# spaCy Language Processing Pipelines Tutorial


<b>Blank NLP pipeline</b>


In [22]:
import spacy

nlp = spacy.blank("en")

doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

for token in doc:
    print(token)

Captain
america
ate
100
$
of
samosa
.
Then
he
said
I
can
do
this
all
day
.


In [23]:
nlp.pipe_names

[]
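
The empty list above means the blank pipeline contains only a tokenizer. As a minimal sketch of what that implies (the shortened sentence is an assumption made for illustration), lexical attributes such as `like_num` and `is_currency` are still available, while annotations that need trained components stay empty:

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
doc = nlp("Captain america ate 100$ of samosa.")

for token in doc:
    # Lexical attributes come from the tokenizer and vocabulary,
    # so they work without a trained model.
    print(token.text, "| like_num:", token.like_num, "| is_currency:", token.is_currency)

# Annotations that depend on trained components are left unset.
print([token.pos_ for token in doc])  # empty strings: no tagger in the pipeline
print(len(doc.ents))                  # no entities: no NER component
```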

<b>Download trained pipeline</b>

To download a trained pipeline, run a command such as:

python -m spacy download en_core_web_sm

This downloads the small (sm) pipeline for English.

Further instructions: https://spacy.io/usage/models#quickstart
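
If it is unclear whether the package is already installed, one option is to check before loading. The sketch below uses `spacy.util.is_package`; the helper name `load_pipeline` and the fallback to a blank pipeline are assumptions made for illustration, not part of the original notebook:

```python
import spacy
from spacy.util import is_package


def load_pipeline(name: str = "en_core_web_sm", lang: str = "en"):
    """Hypothetical helper: load an installed trained pipeline,
    or fall back to a blank (tokenizer-only) pipeline."""
    if is_package(name):
        return spacy.load(name)
    # Package not installed: avoid the OSError that spacy.load would raise.
    return spacy.blank(lang)


nlp = load_pipeline()
print(nlp.pipe_names)  # component names if the package is installed, else []
```

Falling back silently is convenient in a tutorial setting; in real projects it is usually safer to fail loudly when the expected pipeline is missing.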

In [24]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [25]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x14015ebd0>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x14015df70>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x14069bd80>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x14389b3d0>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x143e239d0>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x14069be60>)]

In [28]:
doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

for token in doc:
    print(token, " | ", spacy.explain(token.pos_), " | ", token.lemma_)

Captain  |  proper noun  |  Captain
america  |  proper noun  |  america
ate  |  verb  |  eat
100  |  numeral  |  100
$  |  numeral  |  $
of  |  adposition  |  of
samosa  |  proper noun  |  samosa
.  |  punctuation  |  .
Then  |  adverb  |  then
he  |  pronoun  |  he
said  |  verb  |  say
I  |  pronoun  |  I
can  |  auxiliary  |  can
do  |  verb  |  do
this  |  pronoun  |  this
all  |  determiner  |  all
day  |  noun  |  day
.  |  punctuation  |  .


<b>spacy.explain(token.pos_):</b> A human-readable explanation of the token's part-of-speech (POS) tag. The pos_ attribute holds the coarse-grained POS tag as a string, and spacy.explain() looks up a short description of that tag.

<b>token.lemma_:</b> The lemma of the token, that is, the base or dictionary form of the word. In the output above, "ate" is reduced to "eat" and "said" to "say".
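
spacy.explain() is a plain glossary lookup, so it can be called on tag strings directly without loading any pipeline. A small sketch, using tags chosen to match the descriptions printed above:

```python
import spacy

# Coarse-grained POS tags whose descriptions appear in the output above.
for tag in ["PROPN", "VERB", "NUM", "ADP", "AUX"]:
    print(tag, "->", spacy.explain(tag))
```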

# Named Entity Recognition


In [37]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(f'{ent.text}|| {ent.label_}')

Tesla Inc|| ORG
$45 billion|| MONEY



<b>Entity Iteration:</b> The for loop (for ent in doc.ents:) iterates over the named entities identified in the processed document (doc).

<b>Printing Entity Information:</b> For each entity, the code prints two attributes:

<b>ent.text:</b> The text of the entity span.

<b>ent.label_:</b> The label assigned to the entity. It represents the type or category of the entity, such as PERSON, ORG, DATE, or MONEY.
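
Each entity is a Span object, so it also carries character offsets into the original text. The sketch below illustrates this without a trained model: the two entities are set by hand (an assumption made for illustration, mirroring the output above) through a hypothetical helper `make_ent`, rather than predicted by an NER component:

```python
import spacy

nlp = spacy.blank("en")
text = "Tesla Inc is going to acquire twitter for $45 billion"
doc = nlp(text)


def make_ent(doc, phrase, label):
    """Hypothetical helper: build a labelled Span for the first occurrence of phrase."""
    start = doc.text.index(phrase)
    return doc.char_span(start, start + len(phrase), label=label)


# Assumption: entities assigned manually instead of by a trained "ner" component.
doc.ents = [make_ent(doc, "Tesla Inc", "ORG"), make_ent(doc, "$45 billion", "MONEY")]

for ent in doc.ents:
    # start_char / end_char are offsets into the original string.
    print(ent.text, "|", ent.label_, "|", ent.start_char, ent.end_char)
```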

In [41]:
from spacy import displacy

displacy.render(doc, style="ent")

# Adding a component to a blank pipeline


In [45]:
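# Load the trained pipeline whose "ner" component will be copied.
source_nlp = spacy.load("en_core_web_sm")
nlp = spacy.blank("en")
nlp.add_pipe("ner", source=source_nlp)
nlp.pipe_names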

['ner']

In [46]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)

Tesla Inc ORG
$45 billion MONEY
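
Components that need no trained weights can be added to a blank pipeline by name alone. As a minimal sketch, the rule-based "sentencizer" (chosen here only as an illustration; it is not used elsewhere in this tutorial) adds sentence boundaries without any source pipeline:

```python
import spacy

nlp = spacy.blank("en")
# "sentencizer" is a built-in rule-based component, so no source pipeline is needed.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")
for sent in doc.sents:
    print(sent.text)
```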
