In [2]:
import spacy

In [3]:
nlp = spacy.load("en_core_web_sm")

In [7]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.1.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0-py3-none-any.whl (13.6 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.1.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [4]:
string = nlp('let us tokenize')
for token in string:
    print(token.text)

let
us
tokenize


In [5]:
string = nlp('let us tokenize')
for token in string:
    print(token.text,token.pos_,token.dep_,token.lemma_)

let VERB ROOT let
us PRON nsubj we
tokenize VERB ccomp tokenize


In [6]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x7fbc4e94af40>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x7fbc4e959cc0>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x7fbc4e6eedc0>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x7fbc4e573fc0>),
 ('lemmatizer',
  <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x7fbc4e9b0b00>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x7fbc4e6ee880>)]

In [7]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [8]:
string2 = nlp('let us learn how to use spacy')

In [9]:
for token in string2:
    print(token.text,token.pos_,token.dep_)

let VERB ROOT
us PRON nsubj
learn VERB ccomp
how ADV advmod
to PART aux
use VERB xcomp
spacy NOUN dobj


In [10]:
string2[0].pos_

'VERB'

In [11]:
doc = nlp('First Sentence.Second Sentence.Third Sentence.')

In [12]:
for sentence in doc.sents:
    print(sentence)

First Sentence.
Second Sentence.
Third Sentence.


In [13]:
doc2 = nlp('The ice cream costs $4.5')

In [14]:
for token in doc2:
    print(token.text)

The
ice
cream
costs
$
4.5


In [15]:
for token in doc2:
    print(token.text,end ='|')

The|ice|cream|costs|$|4.5|

In [18]:
for entity in doc2.ents:
    print(entity)
    print(entity.label_)
    print(str(spacy.explain(entity.label_)))

4.5
MONEY
Monetary values, including unit


In [19]:
for noun in doc2.noun_chunks:
    print(noun)

The ice cream


In [20]:
from spacy import displacy

In [22]:
displacy.render(doc2,style = 'dep',jupyter=True,options={'distance':100})

In [23]:
displacy.render(doc2,style = 'ent',jupyter=True)

# stemming


Stemming is the process of reducing a word to its stem by stripping affixes (suffixes and prefixes). The resulting stem is not necessarily a dictionary word, which is what distinguishes it from a lemma. Stemming is a common preprocessing step in natural language understanding (NLU) and natural language processing (NLP).

In [24]:
import nltk

In [25]:
from nltk.stem.porter import PorterStemmer

In [26]:
stemmer = PorterStemmer()

In [32]:
words = ['run','runner','ran','runs','easily','fairly','fairness']

In [33]:
for word in words:
    print(stemmer.stem(word))

run
runner
ran
run
easili
fairli
fair


In [34]:
from nltk.stem.snowball import SnowballStemmer

In [35]:
stemmer2 = SnowballStemmer(language='english')

In [36]:
for word in words:
    print(stemmer2.stem(word))

run
runner
ran
run
easili
fair
fair


# lemmatization

Lemmatization reduces a word to its base or dictionary form, known as the lemma. It relies on a vocabulary and morphological analysis of words, and normally aims to remove inflectional endings only.

Stemming and lemmatization both produce a base form of an inflected word. The difference is that a stem may not be an actual word, whereas a lemma is. Stemming applies a fixed sequence of rule-based steps to each word, which makes it faster.

Search engines and chatbots use stemming and lemmatization to normalize word forms before analyzing the meaning behind a word. Stemming works on the word alone, while lemmatization also considers the context in which the word is used. The paragraphs below explain each approach in more detail.

Stemming algorithms work by cutting off the end or the beginning of a word, using a list of common prefixes and suffixes found in inflected words. This indiscriminate cutting succeeds in some cases but not all, which is the main limitation of the approach.
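To make the suffix-cutting idea concrete, here is a minimal sketch of a naive stemmer in plain Python. It is not how `PorterStemmer` or `SnowballStemmer` are implemented; the `SUFFIXES` list, the `naive_stem` function, and the `min_stem_len` parameter are hypothetical names chosen only for illustration.

```python
# Hypothetical suffix list for illustration only; real stemmers such as
# Porter apply ordered rule sets with extra conditions on the remaining stem.
SUFFIXES = ("ness", "ly", "ing", "ed", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


words = ['run', 'runner', 'ran', 'runs', 'easily', 'fairly', 'fairness']
stems = [naive_stem(w) for w in words]
for word, stem in zip(words, stems):
    print(word, '->', stem)
# run -> run
# runner -> runner
# ran -> ran
# runs -> run
# easily -> easi
# fairly -> fair
# fairness -> fair
```

The limitation shows up directly: `ran` is never linked to `run` because no suffix rule applies, `easily` becomes the non-word `easi`, and a word such as `this` would be cut to `thi` by the same rule that handles `runs`.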

Lemmatization, on the other hand, performs a morphological analysis of each word. This requires detailed dictionaries that the algorithm can search to link an inflected form back to its lemma.
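The dictionary lookup described here can be sketched in a few lines of plain Python. This is only a toy illustration, not how spaCy's lemmatizer works internally; the `LEMMA_TABLE` dictionary and the `lookup_lemma` function are hypothetical, and a real lemmatizer would use a full vocabulary together with part-of-speech information.

```python
# Hypothetical mini-dictionary mapping inflected forms to their lemmas;
# a real lemmatizer relies on a full vocabulary plus morphological rules.
LEMMA_TABLE = {
    'ran': 'run',
    'runs': 'run',
    'running': 'run',
    'better': 'good',
    'mice': 'mouse',
}


def lookup_lemma(word):
    """Return the dictionary lemma, or the lowercased word if it is not listed."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


for word in ['ran', 'runs', 'mice', 'fairness']:
    print(word, '->', lookup_lemma(word))
# ran -> run
# runs -> run
# mice -> mouse
# fairness -> fairness
```

Because the mapping is looked up rather than derived by cutting characters, irregular forms such as `ran` and `mice` resolve to real words. The cost is coverage: any form missing from the dictionary, like `fairness` here, is returned unchanged.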