<a href="https://colab.research.google.com/github/robinkm0610/NLP_dump/blob/main/Basics_of_spaCy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Basics of spaCy

In [8]:
import spacy
from spacy.cli import download



In [9]:
download('en_core_web_sm')


✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [10]:
nlp = spacy.load('en_core_web_sm')

In [11]:
#Reading a Text Document
doc1=nlp("We are learning spaCy")
doc1

We are learning spaCy

In [12]:
file1 = """

Hello Guys,

We are learning spaCy , a cool nlp library.

Some of its Features are:-

Easy deep learning integration.
Non-destructive tokenization.
Export to numpy data arrays.
Named entity recognition.
Support for 51+ languages.
Pre-trained word vectors.
State-of-the-art speed.
Part-of-speech tagging.
Robust, rigorously evaluated accuracy and many more
"""

Sentence Tokenization


In [13]:
file1 = nlp(file1)

In [14]:
for num,sentence in enumerate(file1.sents):
    print(f'{num}:{sentence}')

0:

Hello Guys,

We are learning spaCy , a cool nlp library.


1:Some of its Features are:-

Easy deep learning integration.

2:Non-destructive tokenization.

3:Export to numpy data arrays.

4:Named entity recognition.

5:Support for 51+ languages.

6:Pre-trained word vectors.

7:State-of-the-art speed.

8:Part-of-speech tagging.

9:Robust, rigorously evaluated accuracy and many more



Word Tokenization


In [15]:
doc1 = "We are learning spaCy"
doc1 = nlp(doc1)

In [16]:

for token in doc1:
    print(token.text)

We
are
learning
spaCy


In [17]:
# str.split() gives a plain list of words, but it only splits on whitespace
# and is not spaCy's tokenizer
doc1.text.split()

['We', 'are', 'learning', 'spaCy']
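
The two approaches agree on this sentence only because it contains no punctuation. Here is a small sketch of where they diverge, assuming spaCy v3 is installed. It uses `spacy.blank("en")`, a tokenizer-only English pipeline that needs no model download. The example sentence and the variable names are illustrative.

```python
import spacy

# Tokenizer-only English pipeline; no trained model is required for this.
nlp_blank = spacy.blank("en")

text = "We are learning spaCy, a cool NLP library."

# str.split() only breaks on whitespace, so punctuation stays glued to words.
whitespace_tokens = text.split()

# spaCy's tokenizer separates punctuation into tokens of their own.
spacy_tokens = [token.text for token in nlp_blank(text)]

print(whitespace_tokens)
print(spacy_tokens)
```

For anything beyond quick experiments, iterating over the `Doc` is usually the better choice, since each token also carries the attributes covered in the following sections.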

Word Properties


In [18]:
doc2=nlp("I have 3 coins and a 10 rupee note")
doc2

I have 3 coins and a 10 rupee note

In [19]:
for word in doc2:
  print(word.text, word.is_alpha, word.is_digit, word.is_currency)

I True False False
have True False False
3 False True False
coins True False False
and True False False
a True False False
10 False True False
rupee True False False
note True False False


In [20]:
## is_stop property
for word in doc2:
  print(word.text, word.is_stop)

I True
have True
3 False
coins False
and True
a True
10 False
rupee False
note False
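
Flags like these are typically used to filter tokens. Here is a minimal sketch, assuming spaCy v3. `is_stop` and `is_digit` are lexical attributes, so a blank English pipeline should be enough here. The variable names are illustrative.

```python
import spacy

# Blank English pipeline: tokenizer plus lexical attributes, no trained model.
nlp_blank = spacy.blank("en")
doc = nlp_blank("I have 3 coins and a 10 rupee note")

# Keep only the tokens that are not stop words.
content_words = [token.text for token in doc if not token.is_stop]

# Collect the tokens that consist solely of digits.
numbers = [token.text for token in doc if token.is_digit]

print(content_words)
print(numbers)
```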


In [21]:
## shape property
for word in doc1:
  print(word.text, word.shape_)

We Xx
are xxx
learning xxxx
spaCy xxxXx
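
The shape output follows a simple rule: uppercase letters become `X`, lowercase letters become `x`, digits become `d`, other characters are kept, and runs of the same shape character are cut off after four. The sketch below re-implements that rule in plain Python for illustration only. `sketch_shape` is a hypothetical helper, not part of spaCy, and it may not cover every edge case that spaCy handles.

```python
def sketch_shape(text):
    """Approximate a spaCy-style word shape (illustrative re-implementation)."""
    shape = []
    last = ""
    run_length = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        # Count how long the current run of identical shape characters is.
        if shape_char == last:
            run_length += 1
        else:
            run_length = 0
            last = shape_char
        # Keep at most four characters of any run.
        if run_length < 4:
            shape.append(shape_char)
    return "".join(shape)


for word in ["We", "are", "learning", "spaCy"]:
    print(word, sketch_shape(word))
```

The truncation explains why `learning` shows up as `xxxx` rather than eight `x` characters.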


Part-of-Speech Tagging


In [22]:
## .pos_ property: coarse-grained (Universal POS) tag
for word in doc1:
  print(word.text, word.pos_)

We PRON
are AUX
learning VERB
spaCy VERB


In [23]:
## .tag_ property: fine-grained tag (Penn Treebank style for English)
for word in doc1:
    print(word.text, word.pos_, word.tag_)

We PRON PRP
are AUX VBP
learning VERB VBG
spaCy VERB VBN


In [24]:

## look up the meaning of a tag abbreviation
spacy.explain('NN')

'noun, singular or mass'

In [25]:

spacy.explain('VBP')

'verb, non-3rd person singular present'
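
`spacy.explain` is a plain glossary lookup, so it works without a loaded model. As a small sketch, several labels from the outputs above can be explained in one pass. The `labels` list and the `explanations` dictionary are illustrative names.

```python
import spacy

# Labels that appear in the tagging output above.
labels = ["PRON", "AUX", "VBP", "VBG", "NN"]

# Map each label to its glossary description.
explanations = {label: spacy.explain(label) for label in labels}

for label, meaning in explanations.items():
    print(f"{label:>5}: {meaning}")
```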

Visualizing Dependencies with displaCy


In [26]:
from spacy import displacy


In [27]:
displacy.render(doc1,style='dep', jupyter=True)

Lemmatization

In [28]:
doc3=nlp("playing played player")


In [29]:
for word in doc3:
  print(word.text, word.lemma_)


playing play
played play
player player


In [30]:
doc4=nlp("walks walk walked")


In [31]:
for word in doc4:
  print(word.text, word.lemma_, word.pos_)

walks walk NOUN
walk walk NOUN
walked walk VERB


Named Entity Recognition or Detection


In [32]:
doc5=nlp("By 2025 , India will grow so much in Technical field and earn more than 5 million dollars")


In [33]:
for ent in doc5.ents:
  print(ent.text, ent.label_)

2025 DATE
India GPE
Technical GPE
more than 5 million dollars MONEY


In [34]:
spacy.explain('GPE')

'Countries, cities, states'

In [35]:
displacy.render(doc5,style='ent',jupyter=True)
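
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below assumes spaCy v3 and uses displaCy's manual mode, where the entities are supplied by hand as character offsets instead of being predicted by a model. The shortened sentence and the two entity spans are taken from the example above; the dictionary is hand-written for illustration.

```python
from spacy import displacy

text = "By 2025 , India will grow so much in Technical field"


def span(substring, label):
    """Build a manual entity entry from the first occurrence of substring."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}


# Hand-written entities for illustration, not model predictions.
manual_doc = {
    "text": text,
    "ents": [span("2025", "DATE"), span("India", "GPE")],
    "title": None,
}

# jupyter=False makes render() return the HTML markup as a string.
html = displacy.render(manual_doc, style="ent", manual=True, jupyter=False)
print(html[:80])
```

The returned string can then be written to an `.html` file or embedded in a web page.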


Semantic Similarity


In [36]:
word1=nlp("dog")
word2=nlp("cat")

In [37]:
word1.similarity(word2)



0.6847176149951816
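
By default, `similarity` is a cosine similarity between vectors. Note that `en_core_web_sm` ships without static word vectors, so spaCy warns that scores from this model may not be very meaningful; larger packages such as `en_core_web_md` include vectors. The following sketch shows the underlying computation in plain Python. The three toy vectors are made up for illustration and are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors (assumed values, not taken from any model).
dog = [1.0, 0.0]
cat = [1.0, 1.0]
fish = [0.0, 1.0]

print(cosine_similarity(dog, cat))
print(cosine_similarity(dog, fish))
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, which is why a token compared with itself always yields 1.0.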

In [38]:
doc5=nlp("cat dog bird fish")


In [39]:
## similarity between every pair of tokens in the doc

for w1 in doc5:
    for w2 in doc5:
        print((w1.text, w2.text), "Similarity :-", w1.similarity(w2))

('cat', 'cat') Similarity :- 1.0
('cat', 'dog') Similarity :- 0.6201360821723938
('cat', 'bird') Similarity :- 0.6236063241958618
('cat', 'fish') Similarity :- 0.14771202206611633
('dog', 'cat') Similarity :- 0.6201360821723938
('dog', 'dog') Similarity :- 1.0
('dog', 'bird') Similarity :- 0.6349874138832092
('dog', 'fish') Similarity :- 0.4316307306289673
('bird', 'cat') Similarity :- 0.6236063241958618
('bird', 'dog') Similarity :- 0.6349874138832092
('bird', 'bird') Similarity :- 1.0
('bird', 'fish') Similarity :- 0.320694237947464
('fish', 'cat') Similarity :- 0.14771202206611633
('fish', 'dog') Similarity :- 0.4316307306289673
('fish', 'bird') Similarity :- 0.320694237947464
('fish', 'fish') Similarity :- 1.0




Stopwords


In [40]:
from spacy.lang.en.stop_words import STOP_WORDS

In [41]:
print(STOP_WORDS)

{'just', 'herein', 'because', 'noone', 'nowhere', 'thence', 'alone', 'therefore', 'there', 'again', 'see', 'will', 'with', 'might', 'seems', 'used', 'thereupon', '‘ve', 'done', 'seemed', '’m', 'into', 'regarding', 'everything', 'after', 'enough', 'unless', 'so', 'where', 'several', 'doing', 'before', 'those', '‘ll', 'well', '’s', 'serious', 'do', 'one', 'cannot', 'than', 'using', 'others', 'somewhere', 'below', 'whenever', 'otherwise', 'whoever', 'move', 'via', 'itself', 'whole', 'five', 'beyond', "'ve", 'either', 'seeming', 'it', 'were', "'s", 'twelve', 'in', 'under', 'these', 'n‘t', 'thereafter', 'therein', 'meanwhile', 'four', 'not', 'everyone', 'now', 'further', 'on', 'or', 'third', 'whence', '’ve', 'last', 'would', 'was', 'thus', 'we', 'upon', 'almost', 'onto', 'becoming', 'its', 'hereafter', 'any', 'anything', 'among', 'formerly', 'how', 'whose', 'back', 'may', 'nobody', 'this', 'eleven', 'became', 'nor', 'becomes', 'neither', 'most', 'besides', 'own', 'along', 'did', 'two', 'non

In [42]:
STOP_WORDS.add('ohh')
nlp.vocab['ohh'].is_stop

True

Noun Chunks

In [43]:
doc15=nlp('the man playing football is a great player.')


In [44]:
for w in doc15.noun_chunks:
  print(w.text)

the man
football
a great player


In [45]:
## get root words

for w in doc15.noun_chunks:
  print(w.root.text)

man
football
player


In [48]:
for w in doc15.noun_chunks:
  print(w.root.text, "--connected by =", w.root.head.text)

man --connected by = is
football --connected by = playing
player --connected by = is


Sentence Segmentation and Boundary Detection

In [49]:
doc25=nlp("Hello friends ,we are learning spaCy. Are you all enjoying? keep learning")


In [50]:
for sent in doc25.sents:
    print(sent)

Hello friends ,we are learning spaCy.
Are you all enjoying?
keep learning


In [51]:
doc3=nlp("spaCy is an amazing library\nWe want to learn it in depth\nThat's why we are here :P")


In [52]:
for s in doc3.sents:
    print(s.text)

spaCy is an amazing library
We want to learn it in depth
That's why we are here :P
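
The sentence boundaries above come from the trained `en_core_web_sm` pipeline. Where splitting on punctuation is enough, spaCy also offers a rule-based `sentencizer` component that works without a trained model. Here is a minimal sketch, assuming spaCy v3 (where components are added by name). The sample text is a lightly cleaned-up version of the earlier example, and `nlp_rules` is an illustrative name.

```python
import spacy

# Blank English pipeline with only the rule-based sentence splitter added.
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")

doc = nlp_rules(
    "Hello friends, we are learning spaCy. Are you all enjoying? Keep learning"
)

# Each sentence is a Span; .text gives its string content.
sentences = [sent.text for sent in doc.sents]

for number, sentence in enumerate(sentences):
    print(f"{number}: {sentence}")
```

Because the sentencizer only looks at punctuation, it is fast but will not split text that separates sentences purely by line breaks, as in the last example above.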
