## Stemming using NLTK

Stemming is a text-processing technique that strips affixes (with the Porter stemmer used here, suffixes) from words to reduce them to a common base form, or stem. The main objective is to normalize word variants so that natural language processing tasks treat them as the same term. A stem does not have to be a valid word: in the output below, "ability" becomes "abil", and the irregular form "ate" is left unchanged.

In [1]:
import nltk
import spacy

In [2]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [6]:
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, " ------> ", stemmer.stem(word))

eating  ------>  eat
eats  ------>  eat
eat  ------>  eat
ate  ------>  ate
adjustable  ------>  adjust
rafting  ------>  raft
ability  ------>  abil
meeting  ------>  meet
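
To make the idea of suffix stripping concrete, here is a minimal sketch in plain Python. It is not the Porter algorithm, which applies several ordered rule phases; the suffix list and the minimum stem length of 3 are assumptions chosen only for illustration, and `toy_stem` is a hypothetical helper.

```python
# Toy suffix stripper: an illustration of rule-based stemming,
# NOT the Porter algorithm used by nltk.stem.PorterStemmer.
SUFFIXES = ("ing", "able", "s")  # assumed rule list, for illustration only
MIN_STEM_LENGTH = 3              # assumed guard against over-stripping


def toy_stem(word):
    """Strip the first matching suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
for word in words:
    print(word, " ------> ", toy_stem(word))
```

Because the rules only look at the surface form, an irregular form such as "ate" cannot be mapped to "eat"; that requires vocabulary knowledge, which is what lemmatization adds.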


## Lemmatization in spaCy

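Lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item, identified by the word's dictionary form (its lemma). It is similar to stemming, but it draws on vocabulary and on context such as the word's part of speech, so the result is a valid word rather than a truncated stem.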

In [9]:
nlp = spacy.load("en_core_web_sm")

# Alternative input to try:
# doc = nlp("Mando talked for 3 hours although talking isn't his thing")
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")

for token in doc:
    print(token," ----> ",token.lemma_)

eating  ---->  eat
eats  ---->  eat
eat  ---->  eat
ate  ---->  eat
adjustable  ---->  adjustable
rafting  ---->  raft
ability  ---->  ability
meeting  ---->  meet
better  ---->  well
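
The role of context can be sketched without spaCy as a lookup keyed on both the word and its part of speech. This is only an illustration under stated assumptions: `LEMMA_TABLE` and `toy_lemma` are hypothetical, the entries are hand-written, and spaCy's own lemmatizer relies on much larger tables and rules.

```python
# Toy lemmatizer: the lemma depends on the (word, part-of-speech) pair.
# The table is hand-written for illustration and is not spaCy's data.
LEMMA_TABLE = {
    ("ate", "VERB"): "eat",
    ("eats", "VERB"): "eat",
    ("meeting", "VERB"): "meet",     # "we are meeting today"
    ("meeting", "NOUN"): "meeting",  # "the meeting ran long"
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
}


def toy_lemma(word, pos):
    """Look up the lemma for a word/POS pair; fall back to the word itself."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


for word, pos in [("ate", "VERB"), ("meeting", "VERB"), ("meeting", "NOUN"),
                  ("better", "ADJ"), ("better", "ADV")]:
    print(word, pos, " ----> ", toy_lemma(word, pos))
```

The same surface form can therefore have different lemmas. A dependence on the predicted part of speech is one plausible reason why "better" is lemmatized as "well" rather than "good" in the output above.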


## Customizing the lemmatizer

In [10]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [12]:
doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, " ----> ", token.lemma_)

Bro  ---->  Bro
,  ---->  ,
you  ---->  you
wanna  ---->  wanna
go  ---->  go
?  ---->  ?
Brah  ---->  Brah
,  ---->  ,
do  ---->  do
n't  ---->  not
say  ---->  say
no  ---->  no
!  ---->  !
I  ---->  I
am  ---->  be
exhausted  ---->  exhaust


### attribute_ruler
The attribute ruler lets you set token attributes for tokens identified by Matcher patterns. It is typically used to handle exceptions for token attributes and to map values between attributes, such as mapping fine-grained POS tags to coarse-grained POS tags. Here it is used to override the lemma of the slang tokens "Bro" and "Brah", which the default pipeline leaves unchanged.
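
The two uses named above, per-token exceptions and attribute-to-attribute mapping, can be sketched in plain Python as pattern-triggered overrides. This is a simplified stand-in and not spaCy's implementation: `apply_rules` is a hypothetical helper, tokens are modeled as plain dictionaries, and only single-token exact-match patterns are supported.

```python
# Toy stand-in for the attribute-ruler idea: each rule pairs a single-token
# pattern with the attributes to set on every token the pattern matches.
# This is NOT spaCy's AttributeRuler, only a sketch of the concept.

def apply_rules(tokens, rules):
    """Return copies of the tokens with attributes overridden by matching rules."""
    result = []
    for token in tokens:
        updated = dict(token)
        for pattern, attrs in rules:
            # A pattern matches when every key/value pair equals the token's.
            if all(token.get(key) == value for key, value in pattern.items()):
                updated.update(attrs)
        result.append(updated)
    return result


tokens = [
    {"TEXT": "Bro", "TAG": "NNP", "LEMMA": "Bro"},
    {"TEXT": "ate", "TAG": "VBD", "LEMMA": "eat"},
]
rules = [
    ({"TEXT": "Bro"}, {"LEMMA": "Brother"}),  # exception for one specific token
    ({"TAG": "NNP"}, {"POS": "PROPN"}),       # fine-grained tag -> coarse-grained tag
    ({"TAG": "VBD"}, {"POS": "VERB"}),
]

patched = apply_rules(tokens, rules)
for token in patched:
    print(token["TEXT"], " ----> ", token["LEMMA"], token["POS"])
```

Keeping the overrides in a separate rule list leaves the base annotations untouched, so exceptions can be added or removed without retraining or editing the underlying components.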

In [13]:
ar = nlp.get_pipe("attribute_ruler")

ar.add([[{"TEXT":"Bro"}],[{"TEXT":"Brah"}]],{"LEMMA":"Brother"})

doc = nlp("Bro, you wanna go? Brah, don't say no! I am exhausted")
for token in doc:
    print(token.text, " ----> ", token.lemma_)

Bro  ---->  Brother
,  ---->  ,
you  ---->  you
wanna  ---->  wanna
go  ---->  go
?  ---->  ?
Brah  ---->  Brother
,  ---->  ,
do  ---->  do
n't  ---->  not
say  ---->  say
no  ---->  no
!  ---->  !
I  ---->  I
am  ---->  be
exhausted  ---->  exhaust
