### spaCy Basics

In [1]:
import spacy

In [2]:
# After importing spaCy, load the small English pipeline so we can process text with it

nlp = spacy.load('en_core_web_sm')

In [7]:
doc = nlp(u'Microsoft is going to acquire a startup company at U.K. for $2 million to conquer the digital market')

In [8]:
# POS = part of speech (adjective, noun, proper noun, etc.)
# token.pos is spaCy's integer ID for the tag; token.pos_ is the readable string

for token in doc:
  print(token.text, token.pos, token.pos_)

Microsoft 96 PROPN
is 87 AUX
going 100 VERB
to 94 PART
acquire 100 VERB
a 90 DET
startup 84 ADJ
company 92 NOUN
at 85 ADP
U.K. 96 PROPN
for 85 ADP
$ 99 SYM
2 93 NUM
million 93 NUM
to 94 PART
conquer 100 VERB
the 90 DET
digital 84 ADJ
market 92 NOUN
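
The short tag names in the last column can be expanded with `spacy.explain`, the same helper this notebook uses further down for entity labels. A minimal sketch, assuming only that spaCy is installed (no trained model is needed for the lookup); `tags` and `descriptions` are names made up for this example.

```python
import spacy

# Coarse-grained tags copied from the output above
tags = ["PROPN", "AUX", "VERB", "PART", "DET", "ADJ", "NOUN", "ADP", "SYM", "NUM"]

# spacy.explain looks each tag up in spaCy's built-in glossary
descriptions = {tag: spacy.explain(tag) for tag in tags}

for tag, description in descriptions.items():
    print(f"{tag:<6} {description}")
```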


In [11]:
# spaCy can tell symbols, nouns, adjectives, company names, etc. apart because of its processing pipeline.
# The loaded pipeline contains the following components:

nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x7fc476d63440>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x7fc476d63c20>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x7fc476afa4d0>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x7fc476a4c8c0>),
 ('lemmatizer',
  <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x7fc476a53e10>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x7fc476afa7d0>)]
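
To see how components relate to what ends up on a `Doc`, here is a minimal sketch that starts from a blank English pipeline (tokenizer only) and adds a single component. It assumes spaCy v3, which the component list above suggests; `blank_nlp` is a name chosen for this example, and the rule-based `sentencizer` is used only because it needs no trained model.

```python
import spacy

# A blank pipeline has a tokenizer but no components yet
blank_nlp = spacy.blank("en")
print(blank_nlp.pipe_names)  # component names, in processing order

# Add spaCy's built-in rule-based sentence splitter
blank_nlp.add_pipe("sentencizer")
print(blank_nlp.pipe_names)

# The new component is what makes doc.sents available
doc = blank_nlp("Microsoft is buying a startup. The deal is worth $2 million.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

Each component adds its own annotations to the `Doc`, which is why the full `en_core_web_sm` pipeline can report tags, dependencies, lemmas, and entities.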

### Tokenization


The process of breaking the initial text into its component pieces (tokens) is known as tokenization.

In [14]:
# spaCy keeps the email address and the website URL as single tokens

mystring = '"If you have any query send mail to our_mail@gamil.com, or visit us at https://www.our_site.com"'

In [15]:
doc2 = nlp(mystring)
for token in doc2:
  print(token.text)

"
If
you
have
any
query
send
mail
to
our_mail@gamil.com
,
or
visit
us
at
https://www.our_site.com
"

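Tokens also carry lexical flags that mark email-like and URL-like text. The sketch below is an illustration under a few assumptions: it uses a blank English pipeline (tokenization is rule-based, so no trained model is required), and the sentence and the names `tokenizer_nlp`, `emails`, and `urls` are made up for this example.

```python
import spacy

# Tokenization is rule-based, so a blank English pipeline is enough here
tokenizer_nlp = spacy.blank("en")
doc = tokenizer_nlp("Write to team@example.com or visit https://example.com today.")

for token in doc:
    # like_email / like_url are boolean flags derived from the token's text
    print(f"{token.text:<22} email={token.like_email} url={token.like_url}")

# Filter the tokens by those flags
emails = [token.text for token in doc if token.like_email]
urls = [token.text for token in doc if token.like_url]
```
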

In [16]:
# spaCy can recognize named entities

doc3 = nlp(u"Apple is going to build a new office worth $6 million at United Kingdom!")

In [20]:
# spaCy identifies named entities and classifies them
# To get an entity's label, use "label_"
# To get a description of that label, use "spacy.explain"

for entity in doc3.ents:
  print(entity)
  print(entity.label_)
  print(str(spacy.explain(entity.label_)))
  print("\n")

Apple
ORG
Companies, agencies, institutions, etc.


$6 million
MONEY
Monetary values, including unit


United Kingdom
GPE
Countries, cities, states
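
Each item in `doc.ents` is a `Span`, so besides `label_` it also exposes its text and character offsets. The sketch below shows those attributes. So that it runs without a trained model, it swaps the statistical `ner` component for spaCy's rule-based `entity_ruler` with two hand-written patterns; that substitution, the sentence, and the names `rule_nlp` and `entities` are assumptions of this example, not what `en_core_web_sm` does internally.

```python
import spacy

rule_nlp = spacy.blank("en")

# Rule-based stand-in for the trained NER component (illustration only)
ruler = rule_nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "United Kingdom"},
])

doc = rule_nlp("Apple is going to build a new office in the United Kingdom!")

# Collect (text, label, start offset, end offset) for every entity
entities = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
for text, label, start, end in entities:
    print(f"{text!r:<18} {label:<4} chars {start}-{end}  {spacy.explain(label)}")
```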




#### Visualizing the Dependency Parse and Entities

In [24]:
doc5 =  nlp(u"I am going to get into Bloomberg or Natwest group with average salary of more than 50000 per year")

In [22]:
# displacy is spaCy's built-in visualization module

from spacy import displacy

In [27]:
displacy.render(doc5, style='dep', jupyter=True)

In [28]:
displacy.render(doc5, style='ent', jupyter=True)
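
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. A minimal sketch, assuming `jupyter=False` is passed explicitly; the shortened sentence, the hand-built `Span`, and the names `viz_nlp`, `html`, and `page` are placeholders for this example so that it runs without a trained model.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

viz_nlp = spacy.blank("en")
doc = viz_nlp("I am going to get into Bloomberg next year")

# Mark token 6 ("Bloomberg") as an ORG entity by hand, for illustration only
doc.ents = [Span(doc, 6, 7, label="ORG")]

# With jupyter=False, render returns the HTML markup as a string
html = displacy.render(doc, style="ent", jupyter=False)

# page=True wraps the same markup in a complete HTML page
page = displacy.render(doc, style="ent", page=True, jupyter=False)

print(html)
```

The `page` string could then be written to an `.html` file and opened in a browser.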

In [29]:
# To view the visualization in a browser instead, use "serve"

displacy.serve(doc5, style='dep')


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.
