<a href="https://colab.research.google.com/github/mgiardinelli/spacy-3.0-playground/blob/main/spacy_3_0_Playground.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Explore and better understand spaCy (3.0)


In [2]:
# Install dependencies
!pip install spacy
!pip install pandas



In [5]:
# Download the small English model
# Other models: https://spacy.io/models/en
!python -m spacy download en_core_web_sm

Collecting en_core_web_sm==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


spaCy NER (Named Entity Recognition): extract entities from text

In [27]:
sample_text = '''Family vacations are great. It allows you to spend time away from work, relax, and have fun. We really like playing pickleball, tennis, riding bikes, playing games, and just hanging
out with each other. We recently went to Suncadia resort, which is by Snoqualmie Pass and the Cascade Mountains.  It was so nice being in the trees, enjoying the cooler temperatures, and slower pace life. It was really nice sitting out at the firepit,
out by the big boulders. We saw a deer, lots of different birds, and of course some funny squirrles.'''

In [28]:
# NER: run the pipeline and print each entity with its character offsets and label
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(sample_text)

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Suncadia 223 231 GPE
Snoqualmie Pass 252 267 ORG
the Cascade Mountains 272 293 LOC
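
The two numbers printed for each entity are character offsets into the original string, so `sample_text[223:231]` is `Suncadia`. As a minimal sketch of how such offsets can be used, the hypothetical helper `mark_entities` below (not part of spaCy) wraps each span in a bracketed label. The offsets are computed with `str.find` on a shortened sentence purely for illustration; in the notebook they would come from `ent.start_char` and `ent.end_char`.

```python
def mark_entities(text, spans):
    """Wrap each (start_char, end_char, label) span in [text|LABEL] markers."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])               # text before the entity
        out.append(f"[{text[start:end]}|{label}]")   # the entity itself
        cursor = end
    out.append(text[cursor:])                        # trailing text
    return "".join(out)


text = "We went to Suncadia resort, which is by Snoqualmie Pass."

# Labels are copied from the output above; offsets are looked up with str.find.
spans = []
for phrase, label in [("Suncadia", "GPE"), ("Snoqualmie Pass", "ORG")]:
    start = text.find(phrase)
    spans.append((start, start + len(phrase), label))

marked = mark_entities(text, spans)
print(marked)
```

The helper assumes the spans do not overlap, which holds for `doc.ents`.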


In [29]:
# EntityRuler tutorial: http://ner.pythonhumanities.com/04_01_spaCy_Entity_Ruler.html
# NER annotations: https://spacy.io/api/annotation#named-entities
import spacy
from spacy.pipeline import EntityRuler

# Build upon the spaCy small model
nlp = spacy.load("en_core_web_sm")

# Create the ruler with the ability to overwrite entities.
# This is the spaCy 2.x API (the model downloaded above is 2.2.5). spaCy 3.x
# adds the ruler by name: nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})
ruler = EntityRuler(nlp, overwrite_ents=True)

# List of entities and patterns.
# "pickelball" is misspelled, so that pattern never matches "pickleball" in sample_text.
patterns = [{"label": "SPORTS", "pattern": [{"LOWER": "tennis"}]},
            {"label": "SPORTS", "pattern": [{"LOWER": "pickelball"}]},
            {"label": "SPORTS", "pattern": [{"LOWER": "riding"}, {"LOWER": "bikes"}]},
            {"label": "LOC", "pattern": [{"LOWER": "snoqualmie"}, {"LOWER": "pass"}]},
            {"label": "LOC", "pattern": [{"LOWER": "by"}, {"LOWER": "the"}, {"LOWER": "big"}, {"LOWER": "boulders"}]}]

# Add the patterns to the ruler
ruler.add_patterns(patterns)

# Add the ruler to the pipeline (appended last, after the model's NER component)
nlp.add_pipe(ruler)

# Create the doc
doc = nlp(sample_text)

# Extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)


tennis SPORTS
riding bikes SPORTS
Suncadia GPE
Snoqualmie Pass LOC
the Cascade Mountains LOC
by the big boulders LOC
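
The output has no `pickleball SPORTS` line because the pattern is spelled `pickelball`, and a `LOWER` pattern only matches when the lowercased token text is exactly equal to the pattern value. The following is a rough pure-Python sketch of that matching rule, not spaCy's implementation: `find_lower_matches` is a hypothetical helper, and whitespace splitting stands in for spaCy's tokenizer.

```python
def find_lower_matches(tokens, patterns):
    """Return sorted (start, end, label) token spans where a LOWER pattern matches."""
    lowered = [t.lower() for t in tokens]
    matches = []
    for entry in patterns:
        wanted = [spec["LOWER"] for spec in entry["pattern"]]
        n = len(wanted)
        # Slide a window of n tokens over the text and compare exactly.
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == wanted:
                matches.append((i, i + n, entry["label"]))
    return sorted(matches)


# Assumption: punctuation is already separated, as spaCy's tokenizer would do.
tokens = "We really like playing pickleball , tennis , riding bikes".split()

patterns = [
    {"label": "SPORTS", "pattern": [{"LOWER": "tennis"}]},
    {"label": "SPORTS", "pattern": [{"LOWER": "pickelball"}]},  # misspelled, as in the cell above
    {"label": "SPORTS", "pattern": [{"LOWER": "riding"}, {"LOWER": "bikes"}]},
]

for start, end, label in find_lower_matches(tokens, patterns):
    print(" ".join(tokens[start:end]), label)
```

Changing the pattern value to `pickleball` would make the sketch report that token as well.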


In [13]:
# POS: print part-of-speech tags and related attributes for each token
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(sample_text)

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)
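
The `shape_` column printed by this cell summarizes a token's orthography: uppercase letters become `X`, lowercase letters `x`, digits `d`, and other characters are kept as-is. The sketch below is an approximation written for illustration, not spaCy's own code. It assumes that runs of the same shape character are cut off after four, which is consistent with the cell's output (`Family` gives `Xxxxx`, `vacations` gives `xxxx`).

```python
def word_shape(text):
    """Approximate a spaCy-style word shape, e.g. 'Family' -> 'Xxxxx'."""
    shape = []
    last = ""
    run = 0  # how many times the current shape character has repeated
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:  # keep at most four of the same character in a row
            shape.append(shape_char)
    return "".join(shape)


for word in ["Family", "vacations", "It", "."]:
    print(word, word_shape(word))
```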

Family family NOUN NN compound Xxxxx True False
vacations vacation NOUN NNS nsubj xxxx True False
are be AUX VBP ROOT xxx True True
great great ADJ JJ acomp xxxx True False
. . PUNCT . punct . False False
It -PRON- PRON PRP nsubj Xx True True
allows allow VERB VBZ ROOT xxxx True False
you -PRON- PRON PRP nsubj xxx True True
to to PART TO aux xx True True
spend spend VERB VB ccomp xxxx True False
time time NOUN NN dobj xxxx True False
away away ADV RB advmod xxxx True False
from from ADP IN prep xxxx True True
work work NOUN NN pobj xxxx True False
, , PUNCT , punct , False False
relax relax VERB VB conj xxxx True False
, , PUNCT , punct , False False
and and CCONJ CC cc xxx True True
have have AUX VBP conj xxxx True True
fun fun NOUN NN dobj xxx True False
. . PUNCT . punct . False False
We -PRON- PRON PRP nsubj Xx True True
really really ADV RB advmod xxxx True True
like like VERB VBP ROOT xxxx True False
playing play VERB VBG xcomp xxxx True False
pickleball pickleball NOUN NN dobj x