# Getting started with spaCy

spaCy is a production-grade, high-performance NLP library that integrates well with Hugging Face Transformers.

In this notebook, we introduce the basics of the spaCy library.

## Installation

Installing ``spacy`` is straightforward. After that, we load a pretrained English pipeline, ``en_core_web_lg``, which is installed as a separate package.


In [None]:
# Install spacy if it does not exist
! pip install -U spacy
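
Because `spacy.load` fails when the pipeline package is missing, it can help to check the environment before loading anything. The following is a minimal sketch using only the standard library. It assumes that `en_core_web_lg` is installed as an ordinary Python package, and `is_installed` is a helper name made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Report on the library and on the pipeline package used in this notebook.
for name in ("spacy", "en_core_web_lg"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If the pipeline is reported as missing, `python -m spacy download en_core_web_lg` fetches it.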

In [1]:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')

In [2]:
type(nlp)

spacy.lang.en.English

In [48]:
doc = nlp(u'I loved the ML400 workshop at SupportVectors! AND THE COW named Speedy JUMPED over the moon excitedly, and ran over the grass soon after.')

In [49]:
type(doc)

spacy.tokens.doc.Doc

In [50]:
print([w.text for w in doc])

['I', 'loved', 'the', 'ML400', 'workshop', 'at', 'SupportVectors', '!', 'AND', 'THE', 'COW', 'named', 'Speedy', 'JUMPED', 'over', 'the', 'moon', 'excitedly', ',', 'and', 'ran', 'over', 'the', 'grass', 'soon', 'after', '.']


## Lemmatization

In [51]:
# Print each token next to its lemma (base form)
for token in doc:
    print(f'{token.text:>15} : {token.lemma_}')

              I : -PRON-
          loved : love
            the : the
          ML400 : ml400
       workshop : workshop
             at : at
 SupportVectors : SupportVectors
              ! : !
            AND : and
            THE : the
            COW : cow
          named : name
         Speedy : Speedy
         JUMPED : jump
           over : over
            the : the
           moon : moon
      excitedly : excitedly
              , : ,
            and : and
            ran : run
           over : over
            the : the
          grass : grass
           soon : soon
          after : after
              . : .
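
One practical use of lemmas is counting words regardless of casing or inflection. The sketch below reuses the (text, lemma) pairs printed above rather than calling the model, so it runs on its own. With a live `Doc`, `Counter(token.lemma_ for token in doc)` should give the same lemma counts.

```python
from collections import Counter

# (text, lemma) pairs copied from the output above
pairs = [
    ("I", "-PRON-"), ("loved", "love"), ("the", "the"), ("ML400", "ml400"),
    ("workshop", "workshop"), ("at", "at"),
    ("SupportVectors", "SupportVectors"), ("!", "!"), ("AND", "and"),
    ("THE", "the"), ("COW", "cow"), ("named", "name"), ("Speedy", "Speedy"),
    ("JUMPED", "jump"), ("over", "over"), ("the", "the"), ("moon", "moon"),
    ("excitedly", "excitedly"), (",", ","), ("and", "and"), ("ran", "run"),
    ("over", "over"), ("the", "the"), ("grass", "grass"), ("soon", "soon"),
    ("after", "after"), (".", "."),
]

# Counting surface forms keeps "the" and "THE" apart ...
surface_counts = Counter(text for text, _ in pairs)
# ... while counting lemmas merges them.
lemma_counts = Counter(lemma for _, lemma in pairs)

print(surface_counts["the"], surface_counts["THE"])  # 3 1
print(lemma_counts["the"])                           # 4
print(lemma_counts["and"])                           # 2
```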


## Part-of-speech tagging

In [52]:
# Columns: text, lemma, coarse-grained POS (pos_), fine-grained tag (tag_)
for token in doc:
    print(f'{token.text:<15} : {token.lemma_:>15} : {token.pos_:>15} : {token.tag_:>15}')

I               :          -PRON- :            PRON :             PRP
loved           :            love :            VERB :             VBD
the             :             the :             DET :              DT
ML400           :           ml400 :            NOUN :              NN
workshop        :        workshop :            NOUN :              NN
at              :              at :             ADP :              IN
SupportVectors  :  SupportVectors :           PROPN :             NNP
!               :               ! :           PUNCT :               .
AND             :             and :           CCONJ :              CC
THE             :             the :             DET :              DT
COW             :             cow :            NOUN :              NN
named           :            name :            VERB :             VBN
Speedy          :          Speedy :           PROPN :             NNP
JUMPED          :            jump :            VERB :             VBD
over            :   

# Visualization

We can visualize the structure of a text with displaCy, spaCy's built-in visualizer.

## Dependency diagram

In [54]:
# text = u'Let us learn NLP deeply at SupportVectors.'
text = u'The cow jumped over the moon.'
doc  = nlp(text)
displacy.render(doc, 
                style='dep',
                jupyter=True,
                options={'distance':120})

In [59]:
text = u'We could never have loved the earth so well if we had had no childhood in it.'
options = {'compact': True, 'bg': 'black', 'font': 'Garamond', 'color': 'wheat', 'distance': 50}
doc  = nlp(text)
displacy.render(doc, 
                style='dep',
                jupyter=True,
                options=options)

## Named entity recognition

In [57]:
text = u'Let us learn natural language processing deeply at SupportVectors. This Silicon Valley workshop is a great place to start.'
doc  = nlp(text)
displacy.render(doc, style='ent')

# Using this for a simple classifier

1. First, gather all the nouns and remove the named entities.
2. Build a word vector from the list of nouns.
3. Then find the nearest neighboring subject.
4. Alternatively, train a full-blown classifier on the nouns' word vectors (either concatenated or averaged).
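
Steps 1 to 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names (`content_nouns`, `mean_vector`, `cosine`, `nearest_subject`) are made up for this example, "nearest" is taken to mean highest cosine similarity, the noun vectors are averaged rather than concatenated, and the three-dimensional vectors are invented toy values so that the block runs without a model.

```python
import math


def content_nouns(doc):
    """Step 1: keep common nouns that are not part of a named entity."""
    # pos_ and ent_type_ are spaCy Token attributes; ent_type_ is an
    # empty string for tokens outside any entity.
    return [t for t in doc if t.pos_ == "NOUN" and not t.ent_type_]


def mean_vector(vectors):
    """Step 2: average equal-length vectors component-wise."""
    vectors = [list(v) for v in vectors]
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_subject(vector, subject_vectors):
    """Step 3: return the label whose vector is most similar."""
    return max(subject_vectors,
               key=lambda label: cosine(vector, subject_vectors[label]))


# Toy data, invented purely for illustration (not real word vectors).
toy_subjects = {"farming": [1.0, 0.1, 0.0], "astronomy": [0.0, 0.2, 1.0]}
toy_noun_vectors = [[0.9, 0.0, 0.1], [0.7, 0.3, 0.0]]

print(nearest_subject(mean_vector(toy_noun_vectors), toy_subjects))  # farming
```

With a real `Doc`, the noun vectors would come from `mean_vector(t.vector for t in content_nouns(doc))`, and each subject vector could be built the same way from a few representative texts.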


In [5]:
doc = nlp(u'The cow named Speedy made a speedy jump over the fence.')
options = {'compact': True, 'bg': 'black', 'font': 'Garamond', 'color': 'wheat', 'distance': 60}
displacy.render(doc, style='dep', options=options)

In [9]:
doc = nlp(u' The cow named Speedy did a speedy jump over the fence.')
for token in doc:
    print(f'{token.text:<15} : {token.lemma_:>15} : {token.pos_:>15} : {token.tag_:>15}')

                :                 :           SPACE :             _SP
The             :             the :             DET :              DT
cow             :             cow :            NOUN :              NN
named           :            name :            VERB :             VBN
Speedy          :          Speedy :           PROPN :             NNP
did             :              do :             AUX :             VBD
a               :               a :             DET :              DT
speedy          :          speedy :             ADJ :              JJ
jump            :            jump :            NOUN :              NN
over            :            over :             ADP :              IN
the             :             the :             DET :              DT
fence           :           fence :            NOUN :              NN
.               :               . :           PUNCT :               .
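
In the table above, the same surface form receives different tags depending on its role: "Speedy" is a proper noun, while "speedy" is an adjective. Grouping tokens by their coarse tag makes such differences easy to inspect. The sketch below reuses the (text, POS) pairs from that output so it runs without a model. `group_by_pos` is a helper name made up for this example, and with a live `Doc` it could be called as `group_by_pos((t.text, t.pos_) for t in doc)`.

```python
from collections import defaultdict

# (text, coarse POS) pairs copied from the table printed above
rows = [
    (" ", "SPACE"), ("The", "DET"), ("cow", "NOUN"), ("named", "VERB"),
    ("Speedy", "PROPN"), ("did", "AUX"), ("a", "DET"), ("speedy", "ADJ"),
    ("jump", "NOUN"), ("over", "ADP"), ("the", "DET"), ("fence", "NOUN"),
    (".", "PUNCT"),
]


def group_by_pos(pairs):
    """Map each POS tag to the list of token texts carrying it, in order."""
    groups = defaultdict(list)
    for text, pos in pairs:
        groups[pos].append(text)
    return dict(groups)


by_pos = group_by_pos(rows)
print(by_pos["NOUN"])   # ['cow', 'jump', 'fence']
print(by_pos["PROPN"])  # ['Speedy']
print(by_pos["ADJ"])    # ['speedy']
```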


In [10]:
text = u'I have been working hard towards mastering NLP, having worked previously on Deep Learning fundamentals'

doc = nlp(text)
for token in doc:
    print(f'{token.text:<15} : {token.lemma_:>15} : {token.pos_:>15} : {token.tag_:>15}')




I               :          -PRON- :            PRON :             PRP
have            :            have :             AUX :             VBP
been            :              be :             AUX :             VBN
working         :            work :            VERB :             VBG
hard            :            hard :             ADV :              RB
towards         :         towards :             ADP :              IN
mastering       :          master :            VERB :             VBG
NLP             :             NLP :           PROPN :             NNP
,               :               , :           PUNCT :               ,
having          :            have :            VERB :             VBG
worked          :            work :            VERB :             VBN
previously      :      previously :             ADV :              RB
on              :              on :             ADP :              IN
Deep            :            Deep :           PROPN :             NNP
Learning        :   