# spaCy
-----------
- spaCy is an NLP library similar to gensim, but with different implementations
- Focuses on building NLP pipelines that generate models and corpora
- Open source, with extra libraries and tools, for example the displaCy entity recognition visualizer
- Easy pipeline creation
- Different entity types compared to NLTK
- Easily finds entities in tweets and chat messages
- Quickly growing package!
- Try the online demos:
    - https://demos.explosion.ai/displacy/
    - https://demos.explosion.ai/displacy-ent/



## Exercise 1: spaCy for NER

In [2]:
# First-time users may need to run the two commands below in the Anaconda prompt
# conda install -c conda-forge spacy
# python -m spacy download en_core_web_sm
import spacy
nlp = spacy.load('en_core_web_sm')

In [3]:
type(nlp)

spacy.lang.en.English

In [4]:
# nlp.entity
doc = nlp("""Ram Reddy is a mentor,working for Google and staying in India""")

In [5]:
type(doc)

spacy.tokens.doc.Doc

In [6]:
doc.ents

(Ram Reddy, Google, India)

In [7]:
print(doc.ents[0], doc.ents[0].label_)

Ram Reddy PERSON


In [8]:
print(doc.ents[1], doc.ents[1].label_)


Google ORG


In [9]:
print(doc.ents[2], doc.ents[2].label_)

India GPE


## Exercise 2: Comparing NLTK with spaCy NER

In [10]:
# import os 
# os.chdir("C:\\Users\\Hi\\Google Drive\\01 Data Science Lab Copy\\02 Lab Data\\Python")
# Load a scraped news article
with open('nlp_ner.txt', 'r') as f:
    article = f.read()

In [11]:
# Import spacy
import spacy
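# Instantiate the English model, disabling the components NER does not need: nlp
# (spaCy v2+ takes a `disable` list; the older tagger=False/parser=False keywords are not accepted by current versions)
nlp = spacy.load('en_core_web_sm', disable=['tagger', 'parser'])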
# Create a new document: doc
doc = nlp(article)
# Print all of the found entities and their labels
for ent in doc.ents:
    print(ent.label_, ent.text)


GPE Uber
PERSON Uber
ORG Apple
PERSON Uber
PERSON Uber
GPE Travis Kalanick
GPE Uber
PERSON Tim Cook
ORG Apple
ORG Uber’s
LOC Silicon Valley’s
ORG Yahoo
PERSON Marissa Mayer
MONEY 186
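
The output above tags the same string with different labels in different places. A minimal sketch for spotting such conflicts, using only the standard library: the `(label, text)` pairs are copied from the output above, and `labels_by_text` is a hypothetical helper name, not part of spaCy. With a live document, the pairs could be built as `[(ent.label_, ent.text) for ent in doc.ents]`.

```python
from collections import defaultdict

# (label, text) pairs copied from the printed output above
found = [
    ("GPE", "Uber"), ("PERSON", "Uber"), ("ORG", "Apple"), ("PERSON", "Uber"),
    ("PERSON", "Uber"), ("GPE", "Travis Kalanick"), ("GPE", "Uber"),
    ("PERSON", "Tim Cook"), ("ORG", "Apple"), ("ORG", "Uber’s"),
    ("LOC", "Silicon Valley’s"), ("ORG", "Yahoo"), ("PERSON", "Marissa Mayer"),
    ("MONEY", "186"),
]


def labels_by_text(pairs):
    """Group the set of labels seen for each entity text (hypothetical helper)."""
    grouped = defaultdict(set)
    for label, text in pairs:
        grouped[text].add(label)
    return grouped


grouped = labels_by_text(found)

# Keep only the entity texts that received more than one distinct label
conflicts = {text: sorted(labels) for text, labels in grouped.items() if len(labels) > 1}
print(conflicts)
```

Entity texts that show up in `conflicts` are good candidates for manual review when comparing spaCy's results with NLTK's.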


> Which extra categories does spaCy use compared to NLTK in its named-entity recognition?
    - NORP, CARDINAL, MONEY, WORK_OF_ART, LANGUAGE, EVENT
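
To check what these labels mean, `spacy.explain` looks up a short description in spaCy's built-in glossary. A small sketch, assuming spaCy is installed (this call does not need a downloaded model). The label is written `WORK_OF_ART`, with underscores, as it appears in `ent.label_`.

```python
import spacy

# Entity labels from the answer above
extra_labels = ["NORP", "CARDINAL", "MONEY", "WORK_OF_ART", "LANGUAGE", "EVENT"]

# spacy.explain returns a short description string for a known label
descriptions = {label: spacy.explain(label) for label in extra_labels}

for label, description in descriptions.items():
    print(f"{label}: {description}")
```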


> Homework: Work through more examples from the [spaCy website](https://spacy.io/). Some sample examples follow.

## Exercise 3: From the spaCy website

In [12]:
# pip install spacy
# python -m spacy download en_core_web_sm

import spacy

# Load English tokenizer, tagger, parser and NER (the small model ships without static word vectors)
nlp = spacy.load("en_core_web_sm")

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

In [13]:
type(nlp)

spacy.lang.en.English

In [14]:
type(doc)

spacy.tokens.doc.Doc

In [15]:
# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])

Noun phrases: ['Sebastian Thrun', 'self-driving cars', 'Google', 'few people', 'the company', 'him', 'I', 'you', 'very senior CEOs', 'major American car companies', 'my hand', 'I', 'Thrun', 'an interview', 'Recode']


In [16]:
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

Verbs: ['start', 'work', 'drive', 'take', 'can', 'tell', 'would', 'shake', 'turn', 'be', 'talk', 'say']


In [17]:
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

Sebastian Thrun PERSON
Google ORG
2007 DATE
American NORP
Thrun PERSON
Recode ORG
earlier this week DATE


## Example 4: Understand POS

In [19]:
doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')

print(" text","|", " lemma_", "|"," pos_", "|"," tag_","|", " dep_","|",
            " shape_","|", " is_alpha", "|"," is_stop")
for token in doc:
    print(token.text,"|", token.lemma_, "|",token.pos_, "|",token.tag_,"|", token.dep_,"|",
            token.shape_,"|", token.is_alpha, "|",token.is_stop)

 text |  lemma_ |  pos_ |  tag_ |  dep_ |  shape_ |  is_alpha |  is_stop
Apple | Apple | PROPN | NNP | nsubj | Xxxxx | True | False
is | be | VERB | VBZ | aux | xx | True | True
looking | look | VERB | VBG | ROOT | xxxx | True | False
at | at | ADP | IN | prep | xx | True | True
buying | buy | VERB | VBG | pcomp | xxxx | True | False
U.K. | U.K. | PROPN | NNP | compound | X.X. | False | False
startup | startup | NOUN | NN | dobj | xxxx | True | False
for | for | ADP | IN | prep | xxx | True | True
$ | $ | SYM | $ | quantmod | $ | False | False
1 | 1 | NUM | CD | compound | d | False | False
billion | billion | NUM | CD | pobj | xxxx | True | False
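
The `shape_` column can look cryptic. The sketch below is a plain-Python approximation of the rule that appears to produce the values in the table: letters become `x` or `X`, digits become `d`, other characters are kept, and a run of the same shape character is cut off after four. `approx_shape` is a hypothetical helper for illustration, not spaCy's API, and it may differ from spaCy on edge cases.

```python
def approx_shape(text):
    """Approximate token.shape_ for a single token string (illustrative only)."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as they are
        # Count how long the current run of identical shape characters is
        run = run + 1 if shape_char == last else 1
        last = shape_char
        if run <= 4:  # runs longer than four are truncated
            shape.append(shape_char)
    return "".join(shape)


for word in ["Apple", "looking", "U.K.", "$", "1"]:
    print(word, "|", approx_shape(word))
```

Truncating long runs is why both `looking` and `billion` map to the same shape, `xxxx`, in the table.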


## Example 5: Understand displaCy Visualization

In [4]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")
displacy.render(doc, style="dep")

## Example 6: Understand displaCy Visualization

In [5]:
from spacy import displacy

text = """But Google is starting from behind. The company made a late push
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s Alexa
software, which runs on its Echo and Dot devices, have clear leads in
consumer adoption."""

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style="ent")
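
`displacy.render` shows its output inline only inside Jupyter. Outside a notebook, passing `jupyter=False` returns the markup as a string, which can be saved and opened in a browser. A minimal sketch: to stay runnable without a model download, it assumes a blank English pipeline and sets one `ORG` entity by hand. With `en_core_web_sm` loaded, the `doc` from the cell above could be passed instead. The names `nlp_blank` and `demo_doc` and the file name `entities.html` are arbitrary choices for this example.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, so no trained model is required (assumption for this sketch)
nlp_blank = spacy.blank("en")
demo_doc = nlp_blank("But Google is starting from behind.")

# Mark token 1 ("Google") as an ORG entity by hand, since a blank pipeline has no NER
demo_doc.ents = [Span(demo_doc, 1, 2, label="ORG")]

# jupyter=False makes render() return the HTML instead of displaying it;
# page=True wraps the markup in a complete HTML page
html = displacy.render(demo_doc, style="ent", page=True, jupyter=False)

# Save the page so it can be opened in a browser
Path("entities.html").write_text(html, encoding="utf-8")
```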

In [6]:
from spacy import displacy


doc = nlp(u"Rats are various medium-sized, long-tailed rodents.")
displacy.render(doc, style="dep")


`Homework:` Today's homework, for those who are serious about learning:

    1. Complete all 4 chapters of the course, starting at https://course.spacy.io/chapter1

    2. Go to https://spacy.io/universe and explore any one project