<h3>Stemming in NLTK</h3>

In [1]:
from nltk.stem import PorterStemmer  # SnowballStemmer
stemmer = PorterStemmer()

In [2]:
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, "|", stemmer.stem(word))

eating | eat
eats | eat
eat | eat
ate | ate
adjustable | adjust
rafting | raft
ability | abil
meeting | meet
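
The import comment above mentions `SnowballStemmer` as an alternative to `PorterStemmer`. The following is a minimal sketch, assuming NLTK is installed, that runs both stemmers side by side on the same word list. The variable names `porter` and `snowball` are illustrative, and no output is shown because results can differ between NLTK versions.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
# Unlike PorterStemmer, SnowballStemmer requires a language name.
snowball = SnowballStemmer("english")

words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

# Print each word followed by its Porter stem and its Snowball stem.
for word in words:
    print(word, "|", porter.stem(word), "|", snowball.stem(word))
```

Neither stemmer needs any downloaded corpus for this, since both are purely rule-based.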


<h3>Lemmatization in spaCy</h3>

In [4]:
import spacy

In [5]:
nlp = spacy.load("en_core_web_sm")

doc = nlp("eating eats eat ate adjustable rafting ability meeting better")
for token in doc:
    print(token, " | ", token.lemma_)

eating  |  eat
eats  |  eat
eat  |  eat
ate  |  eat
adjustable  |  adjustable
rafting  |  raft
ability  |  ability
meeting  |  meeting
better  |  well


In [6]:
doc = nlp("Mando talked for 3 hours although talking isn't his thing he became talkative")

for token in doc:
    print(token, "|", token.lemma_)

Mando | Mando
talked | talk
for | for
3 | 3
hours | hour
although | although
talking | talk
is | be
n't | not
his | his
thing | thing
he | he
became | become
talkative | talkative


<h3>Customizing lemmatizer</h3>

In [7]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [29]:
ar = nlp.get_pipe('attribute_ruler')

ar.add([[{"TEXT": "Bro"}], [{"TEXT": "Brah"}]], {"LEMMA": "Brother"})

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, "|", token.lemma_)

Bro | Brother
, | ,
you | you
wanna | wanna
go | go
? | ?
Brah | Brother
, | ,
do | do
n't | not
say | say
no | no
! | !
I | I
am | be
exhausted | exhaust


In [35]:
doc[6]

Brah

In [36]:
doc[6].lemma_

'Brother'

### **Stemming and Lemmatization: Exercises**

#let's import the necessary libraries and create the objects

#for nltk
import nltk
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

#downloading all necessary packages related to nltk (a large download)
nltk.download('all')


#for spacy
import spacy
nlp = spacy.load("en_core_web_sm")

**Exercise 1:**
- Convert this list of words to base forms using both stemming and lemmatization, and observe the transformations.
- Write a short note on the words whose base forms differ between stemming and lemmatization.

In [None]:
#using stemming in nltk
lst_words = ['running', 'painting', 'walking', 'dressing', 'likely', 'children', 'whom', 'good', 'ate', 'fishing']



In [None]:
#using lemmatization in spacy

doc = nlp("running painting walking dressing likely children whom good ate fishing")



**Exercise 2:**

- Convert the given text into its base form using both stemming and lemmatization.

In [None]:
text = """Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi. she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too.
"""

#using stemming in nltk


#step1: Word tokenizing




#step2: getting the base form for each token using stemmer




#step3: joining all words in a list into string using 'join()'

#using lemmatization in spacy


#step1: Creating the object for the given text



#step2: getting the base form for each token using spacy 'lemma_'




#step3: joining all words in a list into string using 'join()'

### **Stemming and Lemmatization: Solutions**

In [None]:
#let's import the necessary libraries and create the objects

#for nltk
import nltk
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

#downloading all necessary packages related to nltk (a large download)
nltk.download('all')


#for spacy
import spacy
nlp = spacy.load("en_core_web_sm")

**Exercise 1:**
- Convert this list of words to base forms using both stemming and lemmatization, and observe the transformations.
- Write a short note on the words whose base forms differ between stemming and lemmatization.

In [None]:
#using stemming in nltk
lst_words = ['running', 'painting', 'walking', 'dressing', 'likely', 'children', 'whom', 'good', 'ate', 'fishing']

for word in lst_words:
    print(f"{word} | {stemmer.stem(word)}")

In [None]:
#using lemmatization in spacy

doc = nlp("running painting walking dressing likely children whom good ate fishing")
for token in doc:
    print(token, " | ", token.lemma_)

**Observations**

- Words whose stem and lemma differ:
    - painting
    - likely
    - children
    - ate
    - fishing

- Stemming derives the base word by stripping **suffixes** (ing, ly, etc.), so it reduces words like 'painting', 'likely', and 'fishing' to their base words, while lemmatization leaves some of these suffixed words unchanged here.

- Lemmatization uses **dictionary** knowledge of the language when converting to the base form, so irregular words like 'children' and 'ate' are transformed correctly (for example, 'ate' becomes 'eat'), while stemming fails on them.
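
The contrast in these observations can be sketched with a toy example. This is an illustration only: `toy_stem`, `toy_lemma`, `SUFFIXES`, and `LEMMA_TABLE` are hypothetical names with deliberately tiny rule sets, and they are not the Porter algorithm or spaCy's lemmatizer.

```python
# Toy illustration only: NOT the Porter algorithm and NOT spaCy's lemmatizer.
SUFFIXES = ("ing", "ly", "s")  # hypothetical, deliberately tiny rule set

# Hypothetical mini-dictionary mapping irregular forms to their lemmas.
LEMMA_TABLE = {"children": "child", "ate": "eat"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word: str) -> str:
    """Look the word up in the table; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)


for word in ["painting", "likely", "fishing", "children", "ate"]:
    print(f"{word} | stem: {toy_stem(word)} | lemma: {toy_lemma(word)}")
```

The suffix rules handle regular endings but cannot map 'children' or 'ate' to anything, while the lookup table handles exactly the irregular forms it lists and nothing else.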

**Exercise 2:**

- Convert the given text into its base form using both stemming and lemmatization.

In [None]:
text = """Latha is very multi talented girl.She is good at many skills like dancing, running, singing, playing.She also likes eating Pav Bhagi. she has a 
habit of fishing and swimming too.Besides all this, she is a wonderful at cooking too.
"""

In [None]:
#using stemming in nltk

#step1: Word tokenizing
all_word_tokens = nltk.word_tokenize(text)


#step2: getting the base form for each token using stemmer
all_base_words = []

for token in all_word_tokens:
    base_form = stemmer.stem(token)
    all_base_words.append(base_form)


#step3: joining all words in a list into string using 'join()'
final_base_text = ' '.join(all_base_words)
print(final_base_text)

In [None]:
#using lemmatization in spacy


#step1: Creating the object for the given text
doc = nlp(text)
all_base_words = []

#step2: getting the base form for each token using spacy 'lemma_'
for token in doc:
    base_word = token.lemma_
    all_base_words.append(base_word)


#step3: joining all words in a list into string using 'join()'
final_base_text = ' '.join(all_base_words)
print(final_base_text)