# NLP Tutorial With SpaCy 
  
  * NLP is a form of AI, or Artificial Intelligence (building systems that can do intelligent things).
  * NLP, or Natural Language Processing, means building systems that can understand everyday language. It is a subset of AI.
  * SpaCy is an NLP library by Explosion.ai (Matthew Honnibal).

![Imgur](https://i.imgur.com/v55ZxW8.png)

## Basic Terms 
* Tokenization: Segmenting text into words, punctuation marks etc.
* Part-of-speech (POS) Tagging: Assigning word types to tokens, like verb or noun.
* Dependency Parsing: Assigning syntactic dependency labels, describing the relations between individual tokens, like subject or object.
* Lemmatization: Assigning the base forms of words. For example, the lemma of “was” is “be”, and the lemma of “rats” is “rat”.
* Sentence Boundary Detection (SBD): Finding and segmenting individual sentences.
* Named Entity Recognition (NER): Labelling named “real-world” objects, like persons, companies or locations.
* Similarity: Comparing words, text spans and documents and how similar they are to each other.
* Text Classification: Assigning categories or labels to a whole document, or parts of a document.
* Rule-based Matching: Finding sequences of tokens based on their texts and linguistic annotations, similar to regular expressions.
* Training: Updating and improving a statistical model’s predictions.
* Serialization: Saving objects to files or byte strings.

## Load the Package 

In [2]:
import spacy

In [3]:
nlp = spacy.load("en_core_web_sm")  # the "en" shortcut only works in spaCy 2.x
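
Model names depend on the installed spaCy version: short names such as "en" are a spaCy 2.x convenience, while spaCy 3.x expects the full package name. Here is a minimal sketch of a loader that also copes with a missing model. The helper name `load_english` is made up for this notebook, and falling back to a blank pipeline is an assumption about what you want (a blank pipeline only tokenizes, it has no tagger or parser).

```python
import spacy


def load_english(model_name="en_core_web_sm"):
    """Return a trained English pipeline if installed, else a blank one."""
    try:
        return spacy.load(model_name)
    except OSError:
        # spaCy raises OSError when the model package cannot be found.
        # A blank pipeline still tokenizes, but has no tagger or parser.
        return spacy.blank("en")


nlp_en = load_english()
print(nlp_en.lang, nlp_en.pipe_names)
```

If the model is missing, it can usually be installed with `python -m spacy download en_core_web_sm`.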

![Imgur](https://i.imgur.com/q4EfY8Z.jpg)

## Reading A Document or Text

In [4]:
docx = nlp("SpaCy is cool tool for nlp")

In [5]:
docx

SpaCy is cool tool for nlp

In [6]:
docx2=nlp(u"SpaCy is an amazing tool like nltk")

In [7]:
docx2

SpaCy is an amazing tool like nltk

# Sentence Tokens 
* Tokenization == splitting or segmenting the text into sentences or tokens
* .sents ==> iterates over the sentences of a document
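
A minimal sketch of sentence tokens with `.sents`. It assumes spaCy 3.x, where `add_pipe` takes the component name as a string, and it uses the rule-based `sentencizer` on a blank pipeline so that no trained model is needed. The example text and the variable names are made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp_sent = spacy.blank("en")
# Rule-based sentence splitter (spaCy 3.x call style).
nlp_sent.add_pipe("sentencizer")

doc_sent = nlp_sent("SpaCy is a cool tool. It is used for NLP.")

# doc.sents yields one Span per sentence.
sentences = [sent.text for sent in doc_sent.sents]
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

With a trained pipeline, the sentence boundaries typically come from the dependency parser instead, so no extra component is needed.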

#### Word Tokens  
* Splitting or segmenting the text into words
* .text ==> the text of each token

In [8]:
docx2

SpaCy is an amazing tool like nltk

In [9]:
#Word tokens
for token in docx2:
    print(token.text)

SpaCy
is
an
amazing
tool
like
nltk


In [10]:
[token.text for token in docx2]

['SpaCy', 'is', 'an', 'amazing', 'tool', 'like', 'nltk']

#### Similar to splitting on spaces (when the text has no punctuation)

In [11]:
docx2.text.split(" ")

['SpaCy', 'is', 'an', 'amazing', 'tool', 'like', 'nltk']
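
The two results above match only because the sentence has no punctuation. A small sketch of where they differ, using a blank English pipeline (tokenizer only) and a made-up example text:

```python
import spacy

nlp_tok = spacy.blank("en")  # tokenizer only, no trained model needed
text = "Hello, world!"

# spaCy separates punctuation into its own tokens.
spacy_tokens = [token.text for token in nlp_tok(text)]
# str.split leaves punctuation attached to the neighbouring word.
space_tokens = text.split(" ")

print(spacy_tokens)  # ['Hello', ',', 'world', '!']
print(space_tokens)  # ['Hello,', 'world!']
```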

## More about words 

* .shape_ ==> shape of the word, e.g. uppercase, lowercase, digits.
* .is_alpha ==> returns a boolean (True or False): whether the token consists of alphabetic characters.
* .is_stop ==> returns a boolean (True or False): whether the token is a stop word.

In [12]:
docx2 

SpaCy is an amazing tool like nltk

In [15]:
for word in docx2:
    print(word.text,word.shape_)

SpaCy XxxXx
is xx
an xx
amazing xxxx
tool xxxx
like xxxx
nltk xxxx


In [16]:
ex_doc=nlp("Hello hello HELLO HeLLo")

In [18]:
for word in ex_doc:
    print("Token =>",word.text,"  Shape:",word.shape_,"  Alpha =>",word.is_alpha,"  Stop Word =>",word.is_stop)

Token => Hello   Shape: Xxxxx   Alpha => True   Stop Word => False
Token => hello   Shape: xxxx   Alpha => True   Stop Word => False
Token => HELLO   Shape: XXXX   Alpha => True   Stop Word => False
Token => HeLLo   Shape: XxXXx   Alpha => True   Stop Word => False
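
Notice that "amazing" and "hello" both come out as `xxxx` above: runs of the same character class are cut off after four characters. Here is a pure-Python sketch of that rule as I understand it. It is not spaCy's own code, and the function name `word_shape` is made up for this notebook.

```python
def word_shape(text):
    """Approximate a token shape: X = uppercase, x = lowercase, d = digit."""
    shape = []
    last = ""
    run_length = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as they are
        if shape_char == last:
            run_length += 1
        else:
            run_length = 0
            last = shape_char
        # Keep at most four characters of any run of the same class.
        if run_length < 4:
            shape.append(shape_char)
    return "".join(shape)


for word in ["SpaCy", "amazing", "Hello", "HeLLo"]:
    print(word, word_shape(word))
```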


# Part Of Speech Tagging

* NB: attribute_ ==> returns the readable string representation of an attribute (the name without the underscore returns an integer ID).
* .pos
* .pos_ ==> exposes the Google Universal POS tags (simple, coarse-grained)
* .tag
* .tag_ ==> exposes the Treebank tags (detailed, fine-grained), useful for training your own model
* Uses
  * Sentiment analysis, homonym disambiguation, prediction

In [23]:
doc = nlp("He drinks a drink")

In [26]:
for word in doc:
    print("Word : " , word.text, "," "   Part of Speech : ", word.pos_ )

Word :  He ,   Part of Speech :  PRON
Word :  drinks ,   Part of Speech :  VERB
Word :  a ,   Part of Speech :  DET
Word :  drink ,   Part of Speech :  NOUN


In [27]:
doc1=nlp("I fish a fish")

In [29]:
for word in doc1:
    print("Word : " , word.text, "," "   Part of Speech : ", word.pos_ , "  ", "Tag : ", word.tag_)

Word :  I ,   Part of Speech :  PRON    Tag :  PRP
Word :  fish ,   Part of Speech :  VERB    Tag :  VBP
Word :  a ,   Part of Speech :  DET    Tag :  DT
Word :  fish ,   Part of Speech :  NOUN    Tag :  NN
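
A small sketch of homonym disambiguation with `.pos_`. To keep it runnable without downloading a model, the POS values are set by hand and copied from the output above, which assumes the spaCy 3.x `Doc` constructor. With a trained pipeline you would pass `nlp("I fish a fish")` instead. The helper name `pos_of` is made up.

```python
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-annotated Doc: POS values copied from the tagger output above.
fish_doc = Doc(
    vocab,
    words=["I", "fish", "a", "fish"],
    pos=["PRON", "VERB", "DET", "NOUN"],
)


def pos_of(doc, word):
    """Return the POS of every occurrence of `word` in `doc`."""
    return [token.pos_ for token in doc if token.lower_ == word.lower()]


fish_pos = pos_of(fish_doc, "fish")
print(fish_pos)  # ['VERB', 'NOUN']
```

The same spelling gets two different word types, which is what lets downstream code tell the action apart from the animal.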


### If you want to know the meaning of a POS abbreviation

* spacy.explain('NN')

In [30]:
spacy.explain('NN')

'noun, singular or mass'
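
`spacy.explain` is a glossary lookup, so it needs no loaded model. A quick sketch that builds a legend for the tags seen in the "I fish a fish" example; the variable names are made up.

```python
import spacy

# Tags taken from the tagging output earlier in this notebook.
tags = ["PRP", "VBP", "DT", "NN"]

# Map each abbreviation to its human-readable description.
legend = {tag: spacy.explain(tag) for tag in tags}

for tag, meaning in legend.items():
    print(tag, "=>", meaning)
```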

In [33]:
ex1=nlp(u"All the faith he had had had no effect on the outcome of his life")

In [35]:
for word in ex1:
    print(("Word : " , word.text , "Tag : ", word.tag_ , "Part of Speech : ",word.pos_))

('Word : ', 'All', 'Tag : ', 'PDT', 'Part of Speech : ', 'DET')
('Word : ', 'the', 'Tag : ', 'DT', 'Part of Speech : ', 'DET')
('Word : ', 'faith', 'Tag : ', 'NN', 'Part of Speech : ', 'NOUN')
('Word : ', 'he', 'Tag : ', 'PRP', 'Part of Speech : ', 'PRON')
('Word : ', 'had', 'Tag : ', 'VBD', 'Part of Speech : ', 'VERB')
('Word : ', 'had', 'Tag : ', 'VBN', 'Part of Speech : ', 'VERB')
('Word : ', 'had', 'Tag : ', 'VBN', 'Part of Speech : ', 'VERB')
('Word : ', 'no', 'Tag : ', 'DT', 'Part of Speech : ', 'DET')
('Word : ', 'effect', 'Tag : ', 'NN', 'Part of Speech : ', 'NOUN')
('Word : ', 'on', 'Tag : ', 'IN', 'Part of Speech : ', 'ADP')
('Word : ', 'the', 'Tag : ', 'DT', 'Part of Speech : ', 'DET')
('Word : ', 'outcome', 'Tag : ', 'NN', 'Part of Speech : ', 'NOUN')
('Word : ', 'of', 'Tag : ', 'IN', 'Part of Speech : ', 'ADP')
('Word : ', 'his', 'Tag : ', 'PRP$', 'Part of Speech : ', 'DET')
('Word : ', 'life', 'Tag : ', 'NN', 'Part of Speech : ', 'NOUN')


### Syntactic Dependency 

* It helps us to know the relation between tokens

In [48]:
ex3 = nlp("Sally likes Sam")

In [49]:
for word in ex3:
    print(("Word : " , word.text , "Tag : ", word.tag_ , "Part of Speech : ",word.pos_, " Dependency :",word.dep_))

('Word : ', 'Sally', 'Tag : ', 'NNP', 'Part of Speech : ', 'PROPN', ' Dependency :', 'nsubj')
('Word : ', 'likes', 'Tag : ', 'VBZ', 'Part of Speech : ', 'VERB', ' Dependency :', 'ROOT')
('Word : ', 'Sam', 'Tag : ', 'NNP', 'Part of Speech : ', 'PROPN', ' Dependency :', 'dobj')


In [50]:
spacy.explain('nsubj')

'nominal subject'
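
Besides `.dep_`, each token also has `.head` and `.children`, which let you walk the tree. Below is a sketch that pulls out a subject, verb, object triple. The heads and labels are set by hand from the output above so it runs without a model, which assumes the spaCy 3.x `Doc` constructor; the helper name `subject_verb_object` is made up.

```python
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-annotated parse: every token points at "likes" (index 1) as its head.
dep_doc = Doc(
    vocab,
    words=["Sally", "likes", "Sam"],
    heads=[1, 1, 1],
    deps=["nsubj", "ROOT", "dobj"],
)


def subject_verb_object(doc):
    """Return (subjects, root verb, objects) for a single-clause sentence."""
    root = next(token for token in doc if token.dep_ == "ROOT")
    subjects = [child.text for child in root.children if child.dep_ == "nsubj"]
    objects = [child.text for child in root.children if child.dep_ == "dobj"]
    return subjects, root.text, objects


triple = subject_verb_object(dep_doc)
print(triple)  # (['Sally'], 'likes', ['Sam'])
```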

# Visualizing Dependency using displaCy

* from spacy import displacy
* displacy.serve() ==> starts a local web server to view the visualization in a browser
* displacy.render(jupyter=True) ==> renders inline, for Jupyter notebooks

In [51]:
from spacy import displacy

In [52]:
displacy.render(ex3,style='dep')
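
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. A sketch under the same assumptions as before: spaCy 3.x, and a hand-annotated `Doc` so that no model is needed. The variable names are made up.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-annotated parse of "Sally likes Sam".
viz_doc = Doc(
    vocab,
    words=["Sally", "likes", "Sam"],
    heads=[1, 1, 1],
    deps=["nsubj", "ROOT", "dobj"],
)

# jupyter=False makes render() return the SVG markup as a string.
svg = displacy.render(viz_doc, style="dep", jupyter=False)
print(svg[:80])
```

The returned string can be written to an `.svg` file and opened in a browser.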

# Thanks for reading this notebook. Keep in touch with us. Like our page [Quantum.ai](https://www.facebook.com/Quantumaibd)