## Stemming 

Stemming reduces a word to its root form by stripping certain characters, usually suffixes.
- e.g. cooking → cook, ability → abil (the stem need not be a dictionary word)
- Stemming does not consider meaning; it applies fixed rules, such as removing -ing or -able.
- Stemming is commonly used in tasks such as sentiment analysis.
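To make the "fixed rules" idea concrete, here is a minimal sketch of a naive suffix-stripping stemmer. It is not the Porter algorithm used below; `naive_stem` and its suffix list are hypothetical. Because no rule checks meaning, it turns running into runn, which is not a word.

```python
# Hypothetical naive stemmer: strips the first matching suffix from a fixed list.
# It ignores meaning entirely, which is why it can produce non-words.
SUFFIXES = ("able", "ing", "s")  # checked in this order


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 2 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


for w in ["cooking", "running", "adjustable", "eats", "ate"]:
    print(w, "|", naive_stem(w))
```

Real stemmers such as Porter add extra conditions (for example, on the remaining stem) to avoid some of these errors, but they still work on spelling alone, so an irregular form like ate is left untouched.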

In [1]:
# spaCy does not support stemming, so we use NLTK instead
import nltk

In [2]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [3]:
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, '|', stemmer.stem(word))  # stem() is a PorterStemmer method that reduces a word to its stem

eating | eat
eats | eat
eat | eat
ate | ate
adjustable | adjust
rafting | raft
ability | abil
meeting | meet
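The output shows both the benefit and the limit of rule-based stemming. The sketch below groups word forms by stem, as one might before counting features; to run without NLTK it hard-codes the stems printed above in a dictionary (an assumption, not recomputed). Three forms of eat collapse into one group, but the irregular past tense ate stays on its own because no suffix rule links it to eat.

```python
from collections import defaultdict

# Stems copied from the PorterStemmer output above (hard-coded, not recomputed)
porter_stems = {
    "eating": "eat", "eats": "eat", "eat": "eat", "ate": "ate",
    "adjustable": "adjust", "rafting": "raft", "ability": "abil", "meeting": "meet",
}

# Group the original words by their stem
groups = defaultdict(list)
for word, stem in porter_stems.items():
    groups[stem].append(word)

for stem, forms in groups.items():
    print(stem, "->", forms)
```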


## Lemmatization 

Lemmatization is similar to stemming, but it returns a meaningful base word (the lemma) that exists in the language's vocabulary.
- e.g. ate → eat, running → run
- Lemmatization is slower than stemming because it has to find a meaningful dictionary form, often using the word's part of speech. Stemming only applies fixed rules to get a base form, so it is faster.
- Lemmatization is commonly used in applications such as chatbots.
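Lemmatization relies on a vocabulary and often on the part of speech. A minimal sketch, assuming a tiny hand-written table (`LEMMA_TABLE` and `lookup_lemma` are hypothetical), shows why: ate maps to eat only because the table lists it, and better maps to good or well depending on whether it is tagged as an adjective or an adverb.

```python
# Hypothetical lemma table keyed by (lowercased word, part of speech).
# Real lemmatizers typically combine much larger lookup tables with rules.
LEMMA_TABLE = {
    ("ate", "VERB"): "eat",
    ("eating", "VERB"): "eat",
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}


def lookup_lemma(word: str, pos: str) -> str:
    """Return the lemma for (word, pos), or the lowercased word if unknown."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


print(lookup_lemma("ate", "VERB"))
print(lookup_lemma("better", "ADV"))
print(lookup_lemma("better", "ADJ"))
print(lookup_lemma("meeting", "NOUN"))
print(lookup_lemma("ability", "NOUN"))  # not in the table, returned unchanged
```

The extra lookup, and the tagging step needed to supply `pos`, is part of why lemmatization costs more than stemming.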


In [4]:
# spaCy supports lemmatization through its lemmatizer pipeline component
import spacy

In [5]:
nlp = spacy.load("en_core_web_sm")

doc1 = nlp("Mando talked for 3 hours although talking isn't his thing")  # not printed below; only doc2 is
doc2 = nlp("eating eats eat ate adjustable rafting ability meeting better")

for token in doc2:
    print(token,'|',token.lemma_)

eating | eat
eats | eat
eat | eat
ate | eat
adjustable | adjustable
rafting | raft
ability | ability
meeting | meeting
better | well


### Customizing Lemmatizer

In [6]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [7]:
# The word "brother" appears here as the slang forms "Bro" and "Brah", which have no proper lemma, so we will map them to "Brother"
doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | bro
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brah
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust


In [11]:
# The attribute_ruler component can set token attributes such as LEMMA; here we add a rule mapping "Bro" and "Brah" to "Brother"
ar = nlp.get_pipe("attribute_ruler")
ar.add([[{"TEXT": "Bro"}], [{"TEXT": "Brah"}]], {"LEMMA": "Brother"})

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | Brother
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brother
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust
