### Stemming

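Stemming is the process of reducing a word to its word stem by removing affixes (suffixes and prefixes), so that different inflected forms of a word map to a common base. The stem does not have to be a valid word; reducing a word to its dictionary root, known as a lemma, is the job of lemmatization. Stemming is important in natural language understanding (NLU) and natural language processing (NLP).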
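To make the idea concrete, here is a toy suffix-stripping stemmer in plain Python. It is only a sketch: the suffix list and the minimum stem length of three characters are illustrative assumptions, and real stemmers such as Porter apply many more ordered rules and conditions.

```python
# Toy suffix-stripping stemmer (illustrative only, not the Porter algorithm).
# Suffixes are checked longest-first; the list itself is an assumption.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["eating", "eats", "played", "goes", "sing"]:
    print(w, "->", toy_stem(w))
```

Here "goes" becomes "goe" and "sing" is left alone because stripping "ing" would leave too short a stem. Like real stemmers, this one happily produces stems that are not words.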

In [1]:
# Use case: a classification problem, e.g. deciding whether a product review is positive or negative.
# Inflected forms should map to a common base: eating, eats, eaten -> eat; going, gone, goes -> go
words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalize"]
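Why does this matter for a review classifier? In a bag-of-words model each distinct token typically becomes its own feature, so collapsing inflections shrinks the vocabulary and pools the counts. A minimal sketch, assuming a plain bag-of-words representation built with collections.Counter; the porter_stems table is hard-coded from the PorterStemmer results printed in this notebook rather than computed, so it runs without NLTK.

```python
from collections import Counter

# Stems copied from the PorterStemmer output shown in this notebook
# (hard-coded as an assumption so the sketch has no NLTK dependency).
porter_stems = {
    "eating": "eat", "eats": "eat", "eaten": "eaten",
    "writing": "write", "writes": "write",
    "programming": "program", "programs": "program",
    "history": "histori", "finally": "final", "finalize": "final",
}

words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalize"]

# Bag-of-words feature counts before and after stemming.
raw_features = Counter(words)
stemmed_features = Counter(porter_stems[w] for w in words)

print(len(raw_features), len(stemmed_features))  # 10 6
print(stemmed_features["program"])               # 2
```

Ten surface forms collapse into six features, and related forms such as "programming" and "programs" now share one count.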

### PorterStemmer

In [2]:
from nltk.stem import PorterStemmer

In [3]:
stemming = PorterStemmer()

In [4]:
for word in words:
    print(word+"---->"+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalize---->final


In [5]:
stemming.stem('congratulations')

'congratul'

In [6]:
stemming.stem("sitting")

'sit'

### RegexpStemmer class

NLTK has a RegexpStemmer class that makes it easy to implement regular-expression stemmers. It takes a single regular expression and removes any part of a word that matches it, so anchoring the alternatives with $ strips suffixes and anchoring them with ^ strips prefixes. Words shorter than the min length are returned unchanged. Let us see an example:
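The behaviour described above can be approximated with Python's standard re module. This is a sketch rather than NLTK's source: SimpleRegexpStemmer and its min_len parameter are hypothetical names chosen to mirror RegexpStemmer and its min argument.

```python
import re


class SimpleRegexpStemmer:
    """Hypothetical regex stemmer: delete every match of `pattern`,
    but leave words shorter than `min_len` untouched."""

    def __init__(self, pattern: str, min_len: int = 0) -> None:
        self._regexp = re.compile(pattern)
        self._min_len = min_len

    def stem(self, word: str) -> str:
        if len(word) < self._min_len:
            return word
        # re.sub removes all non-overlapping matches in a single pass.
        return self._regexp.sub("", word)


stemmer = SimpleRegexpStemmer("ing$|s$|e$|able$", min_len=4)
print(stemmer.stem("eating"))     # eat
print(stemmer.stem("ingeating"))  # ingeat (only the final "ing" is anchored by $)
print(stemmer.stem("was"))        # was (shorter than min_len, left unchanged)
print(stemmer.stem("cars"))       # car
```

Because every alternative ends in $, only a trailing match is removed, which is why the leading "ing" in "ingeating" survives.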

In [7]:
from nltk.stem import RegexpStemmer

In [8]:
reg_stemmer=RegexpStemmer('ing$|s$|e$|able$', min=4)

In [9]:
reg_stemmer.stem('eating')

'eat'

In [10]:
reg_stemmer.stem('ingeating')

'ingeat'

### Snowball Stemmer

In [11]:
from nltk.stem import SnowballStemmer

In [12]:
snowballsstemmer=SnowballStemmer('english')

In [13]:
for word in words:
    print(word+"--->"+snowballsstemmer.stem(word))

eating--->eat
eats--->eat
eaten--->eaten
writing--->write
writes--->write
programming--->program
programs--->program
history--->histori
finally--->final
finalize--->final


In [14]:
stemming.stem("fairly"),stemming.stem("sportingly")

('fairli', 'sportingli')

In [15]:
snowballsstemmer.stem("fairly"),snowballsstemmer.stem("sportingly")

('fair', 'sport')

In [16]:
snowballsstemmer.stem("going")

'go'

In [17]:
snowballsstemmer.stem("goes")

'goe'

### Wordnet Lemmatizer
Lemmatization is similar to stemming, but its output, called a lemma, is a root word rather than a root stem (the output of stemming). After lemmatization we get a valid word that means the same thing.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma. Let us understand it with an example:
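Roughly, morphy() checks an exception list for irregular forms, then tries part-of-speech-specific suffix substitutions and keeps only candidates that exist in WordNet. The sketch below imitates that flow for verbs. The tiny LEXICON and EXCEPTIONS tables are hypothetical stand-ins for WordNet's data files, and the substitution rules are modeled on the verb rules NLTK uses, so treat it as an illustration rather than a reimplementation.

```python
# Hypothetical toy data standing in for WordNet's verb lexicon and exception list.
LEXICON = {"go", "eat", "write", "program"}
EXCEPTIONS = {"went": "go", "gone": "go", "ate": "eat", "eaten": "eat",
              "programming": "program"}

# (suffix, replacement) pairs modeled on NLTK's verb substitution rules.
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]


def toy_lemmatize_verb(word: str) -> str:
    """Return a verb lemma found in the toy lexicon, or the word unchanged."""
    # 1. Irregular forms come straight from the exception list.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # 2. A word already in the lexicon is its own lemma.
    if word in LEXICON:
        return word
    # 3. Apply each suffix substitution; keep only results found in the lexicon.
    candidates = [word[: -len(suffix)] + repl
                  for suffix, repl in VERB_RULES
                  if word.endswith(suffix) and word[: -len(suffix)] + repl in LEXICON]
    # 4. Prefer the shortest candidate; otherwise return the input word.
    return min(candidates, key=len) if candidates else word


for w in ["goes", "writing", "eating", "went", "programs", "history"]:
    print(w, "->", toy_lemmatize_verb(w))
```

Every result is either a lexicon word or the untouched input, which is why a dictionary-backed lemmatizer returns real words such as "write" where a stemmer may return fragments.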

In [21]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\swabh\AppData\Roaming\nltk_data...


True

In [22]:
from nltk.stem import WordNetLemmatizer

In [23]:
lemmatizer=WordNetLemmatizer()

In [24]:
# POS tags accepted by lemmatize():
# n = noun (the default), v = verb, a = adjective, r = adverb
lemmatizer.lemmatize("going",pos='v')

'go'
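In practice the POS tag usually comes from a tagger such as nltk.pos_tag, which returns Penn Treebank tags (NN, VBG, JJ, RB, ...) rather than the single letters lemmatize() expects. A small mapping helper is a common workaround; the function name penn_to_wordnet_pos and the noun fallback are assumptions, not part of NLTK.

```python
def penn_to_wordnet_pos(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS letter for lemmatize()."""
    if tag.startswith("J"):    # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("V"):    # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return "r"
    return "n"                 # nouns and everything else: noun default


# Hand-written (word, tag) pairs standing in for real tagger output.
tagged = [("going", "VBG"), ("better", "JJR"), ("fairly", "RB"), ("reviews", "NNS")]
print([(word, penn_to_wordnet_pos(tag)) for word, tag in tagged])
```

With the lemmatizer from this notebook the helper plugs in directly as lemmatizer.lemmatize(word, pos=penn_to_wordnet_pos(tag)). Running nltk.pos_tag itself requires downloading a tagger model first.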

In [26]:
for word in words:
    print(word+"--->"+lemmatizer.lemmatize(word,pos='v'))

eating--->eat
eats--->eat
eaten--->eat
writing--->write
writes--->write
programming--->program
programs--->program
history--->history
finally--->finally
finalize--->finalize


In [27]:
lemmatizer.lemmatize("goes",pos='v')

'go'

In [29]:
lemmatizer.lemmatize("fairly",pos='r'),lemmatizer.lemmatize("sportingly",pos='r')

('fairly', 'sportingly')