## Introduction to spaCy
##### spaCy: spaCy is a free and open-source library for advanced Natural Language Processing in Python. It is written in Python and Cython (a superset of Python that compiles to C extensions), which gives it performance close to C. It is fast and provides a concise API for accessing its methods and properties.

#### Some of the main features of spaCy are given below:
##### (1) Tokenization  (2) Part-Of-Speech Tagging  (3) Named-Entity Recognition  (4) Text Classification  (5) Lemmatization  (6) Dependency Parsing  (7) Sentence Boundary Detection  (8) Similarity

In [1]:
# To install spaCy in anaconda
# conda install -c conda-forge spacy

# To install the spaCy model loaded below (vocabulary, syntax and named-entity components)
# python -m spacy download en_core_web_sm
# (the short name `en` is no longer accepted by spaCy v3)
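
Once spaCy is installed, it can be tried out before any model is downloaded. The sketch below is an illustration that is not part of the original notebook: it assumes spaCy v3, where `spacy.blank("en")` builds a tokenizer-only English pipeline and `"sentencizer"` is the built-in rule-based sentence splitter. It demonstrates feature (7), sentence boundary detection. The names `nlp_blank` and `sample` are chosen here so they do not clash with the variables used in later cells.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required
nlp_blank = spacy.blank("en")
# Rule-based sentence splitter that breaks on sentence-final punctuation
nlp_blank.add_pipe("sentencizer")

sample = nlp_blank(
    "Discover when, where, and how we collect 360 imagery. "
    "Meet Google's colorful Street View fleet."
)

# doc.sents yields one Span per detected sentence
sentences = [sent.text for sent in sample.sents]
for sentence in sentences:
    print(sentence)
```

With a trained pipeline such as `en_core_web_sm`, `doc.sents` is instead driven by the dependency parser, so the boundaries can differ from this purely punctuation-based split.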

##### Tokenization with spaCy

In [2]:
# To import spaCy and load the model
import spacy

nlp = spacy.load("en_core_web_sm")

In [3]:
# To process a string with the pipeline and get a Doc object
doc = nlp("""Discover when, where, and how we collect 360 imagery. Meet Google's colorful Street View fleet and learn how we collect 360 imagery to power the world map.""")


In [4]:
for token in doc:
  print(token.text)

Discover
when
,
where
,
and
how
we
collect
360
imagery
.
Meet
Google
's
colorful
Street
View
fleet
and
learn
how
we
collect
360
imagery
to
power
the
world
map
.
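
Each token is an object rather than a plain string, so it carries attributes beyond `text`. As a small sketch (using a blank English pipeline so that no model download is needed, which is an assumption of this example and not what the notebook loads), a few lexical attributes can be inspected like this:

```python
import spacy

# Tokenizer-only pipeline (assumes spaCy v3); the tokenizer rules for English are included
nlp_blank = spacy.blank("en")
sample = nlp_blank("Meet Google's fleet and learn how we collect 360 imagery.")

rows = []
for token in sample:
    # i: position in the Doc, is_alpha: letters only,
    # is_punct: punctuation, like_num: looks like a number
    row = (token.i, token.text, token.is_alpha, token.is_punct, token.like_num)
    rows.append(row)
    print(*row)
```

These attributes are lexical, so they are available without a tagger or parser; attributes such as `lemma_` and `pos_` used in the next sections need a trained pipeline.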


##### Lemmatization with spaCy

In [6]:
# To process a second string and get a Doc object
docs = nlp("""Because of factors outside our control (weather, road closures, etc), it is always possible that our cars may not be operating, or that slight changes may occur. Please also be aware that where the list specifies a particular city, this may include smaller cities and towns that are within driving distance.""")

# To print tokens with their respective lemmas
for token in docs:
  print(token.text, token.lemma_)

Because because
of of
factors factor
outside outside
our our
control control
( (
weather weather
, ,
road road
closures closure
, ,
etc etc
) )
, ,
it it
is be
always always
possible possible
that that
our our
cars car
may may
not not
be be
operating operate
, ,
or or
that that
slight slight
changes change
may may
occur occur
. .
Please please
also also
be be
aware aware
that that
where where
the the
list list
specifies specify
a a
particular particular
city city
, ,
this this
may may
include include
smaller small
cities city
and and
towns town
that that
are be
within within
driving driving
distance distance
. .
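
Lemmas are often used to normalise text before counting or matching words. The helper below is a hypothetical illustration (the name `content_lemmas` is not part of spaCy): it keeps the lemma of every token that is neither punctuation nor a stop word. To keep the sketch runnable without a trained model, the Doc is built by hand from a few token/lemma pairs printed above, assuming the keyword arguments of spaCy v3's `Doc` constructor; with the loaded model you would pass `docs` directly.

```python
import spacy
from spacy.tokens import Doc


def content_lemmas(doc):
    """Return lemmas of tokens that are neither punctuation nor stop words."""
    return [tok.lemma_ for tok in doc if not (tok.is_punct or tok.is_stop)]


# The blank English pipeline supplies a vocab with English stop-word flags
nlp_blank = spacy.blank("en")

# Hand-built Doc: words and lemmas copied from the output above
sample = Doc(
    nlp_blank.vocab,
    words=["our", "cars", "may", "not", "be", "operating", "."],
    lemmas=["our", "car", "may", "not", "be", "operate", "."],
)

lemmas = content_lemmas(sample)
print(lemmas)
```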


##### POS tagging with spaCy

In [8]:
# To print coarse-grained (pos_) and fine-grained (tag_) POS tags
for tok in docs:
  print(f"{tok.text} --- {tok.pos_} --- {tok.tag_}")

Because --- SCONJ --- IN
of --- ADP --- IN
factors --- NOUN --- NNS
outside --- ADP --- IN
our --- PRON --- PRP$
control --- NOUN --- NN
( --- PUNCT --- -LRB-
weather --- NOUN --- NN
, --- PUNCT --- ,
road --- NOUN --- NN
closures --- NOUN --- NNS
, --- PUNCT --- ,
etc --- X --- FW
) --- PUNCT --- -RRB-
, --- PUNCT --- ,
it --- PRON --- PRP
is --- AUX --- VBZ
always --- ADV --- RB
possible --- ADJ --- JJ
that --- SCONJ --- IN
our --- PRON --- PRP$
cars --- NOUN --- NNS
may --- AUX --- MD
not --- PART --- RB
be --- AUX --- VB
operating --- VERB --- VBG
, --- PUNCT --- ,
or --- CCONJ --- CC
that --- SCONJ --- IN
slight --- ADJ --- JJ
changes --- NOUN --- NNS
may --- AUX --- MD
occur --- VERB --- VB
. --- PUNCT --- .
Please --- INTJ --- UH
also --- ADV --- RB
be --- AUX --- VB
aware --- ADJ --- JJ
that --- SCONJ --- IN
where --- SCONJ --- WRB
the --- DET --- DT
list --- NOUN --- NN
specifies --- VERB --- VBZ
a --- DET --- DT
particular --- ADJ --- JJ
city --- NOUN --- NN
, --- PUNCT --- ,
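
The tag abbreviations can be looked up with `spacy.explain`, which returns a short description for a label it knows and `None` otherwise. A minimal sketch over a few of the tags printed above (the selection of labels is arbitrary):

```python
import spacy

# A few coarse-grained (pos_) and fine-grained (tag_) labels from the output above
labels = ["SCONJ", "ADP", "AUX", "PRP$", "VBZ", "-LRB-"]

# spacy.explain needs no loaded model; it reads from spaCy's built-in glossary
descriptions = {label: spacy.explain(label) for label in labels}
for label, description in descriptions.items():
    print(f"{label:6} {description}")
```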

In [11]:
# To visualize the dependency parse (the POS tag is shown under each word)
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

doc = nlp("Natural Language Processing is Fun!")

displacy.render(doc, style='dep')

[displaCy output: SVG markup of the dependency parse for "Natural Language Processing is Fun!", in which Natural and Language are tagged PROPN]
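
When `displacy.render` does not display inline, it returns the markup as a string, which can be written to a file and opened in a browser. The sketch below assumes spaCy v3 and builds a tiny Doc by hand so that it runs without a trained model; the three-word sentence and its head and dependency annotations are made up for illustration and are not parser output. The file name is arbitrary.

```python
import tempfile
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc

nlp_blank = spacy.blank("en")

# Hand-annotated example (hypothetical parse): heads are absolute token indices
sample = Doc(
    nlp_blank.vocab,
    words=["spaCy", "parses", "text"],
    pos=["PROPN", "VERB", "NOUN"],
    heads=[1, 1, 1],
    deps=["nsubj", "ROOT", "dobj"],
)

# jupyter=False forces a string return value instead of inline display;
# the "compact" option draws square arcs that take less space
svg = displacy.render(sample, style="dep", jupyter=False, options={"compact": True})

out_path = Path(tempfile.gettempdir()) / "dependency_parse.svg"
out_path.write_text(svg, encoding="utf-8")
print(f"Wrote {len(svg)} characters to {out_path}")
```

Inside a notebook, passing `jupyter=True` instead asks displaCy to display the figure inline rather than return the string.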

##### Named Entity Recognition with spaCy

In [12]:
# To print named entities
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("""Tokyo, Japan, is the largest city on Earth, with a population of 37.4 million people, which is over four times the population of New York City, USA. In total, the Japanese metropolis covers an area of 13,452km2.""")

for ent in doc.ents:
  print(ent.text+" ---", ent.label_)

Tokyo --- GPE
Japan --- GPE
Earth --- LOC
37.4 million --- CARDINAL
over four --- CARDINAL
New York City --- GPE
USA --- GPE
Japanese --- NORP
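
Each entity is a `Span`, so besides `text` and `label_` it also exposes character offsets into the original string. The sketch below groups entities by label together with their offsets. To stay runnable without a trained model it swaps the statistical recognizer for spaCy v3's rule-based `entity_ruler` on a blank pipeline; the patterns are written by hand from the entities printed above, so this is a stand-in for the model's predictions, not a reproduction of them.

```python
from collections import defaultdict

import spacy

# Rule-based stand-in for the trained NER component (assumes spaCy v3)
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "Tokyo"},
    {"label": "GPE", "pattern": "Japan"},
    {"label": "GPE", "pattern": "New York City"},
    {"label": "LOC", "pattern": "Earth"},
])

sample = nlp_rules(
    "Tokyo, Japan, is the largest city on Earth, with a population of "
    "37.4 million people, which is over four times the population of "
    "New York City, USA."
)

# Collect (text, start_char, end_char) for every entity, keyed by label
by_label = defaultdict(list)
for ent in sample.ents:
    by_label[ent.label_].append((ent.text, ent.start_char, ent.end_char))

for label, spans in by_label.items():
    print(label, spans)
```

The same loop works unchanged on a Doc produced by `en_core_web_sm`, since `doc.ents` has the same interface whichever component filled it.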


In [13]:
# To use visualizers for NER
import spacy
from spacy import displacy

doc = nlp("""Tokyo, Japan, is the largest city on Earth, with a population of 37.4 million people, which is over four times the population of New York City, USA. In total, the Japanese metropolis covers an area of 13,452km2.""")

displacy.render(doc, style="ent")

[displaCy output: HTML markup of the text with entities highlighted; Tokyo and Japan are labelled GPE and Earth is labelled LOC]
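
displaCy can also render entities that are supplied by hand, through its `manual=True` mode. As a sketch, the three entities below are copied from the printed output above, their offsets are computed with `str.find` on a prefix of the same sentence, and the resulting page is saved to a file (the file name is arbitrary).

```python
import tempfile
from pathlib import Path

from spacy import displacy

# Prefix of the sentence processed above
text = "Tokyo, Japan, is the largest city on Earth"
# (phrase, label) pairs taken from the printed entity list
labelled = [("Tokyo", "GPE"), ("Japan", "GPE"), ("Earth", "LOC")]

ents = []
for phrase, label in labelled:
    start = text.find(phrase)  # character offset of the first occurrence
    ents.append({"start": start, "end": start + len(phrase), "label": label})

# manual=True: displaCy renders the dict as given instead of reading a Doc
html = displacy.render(
    {"text": text, "ents": ents, "title": None},
    style="ent",
    manual=True,
    jupyter=False,
    page=True,
)

out_path = Path(tempfile.gettempdir()) / "entities.html"
out_path.write_text(html, encoding="utf-8")
print(f"Wrote {len(html)} characters to {out_path}")
```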