In [1]:
# Import libraries
import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

In [2]:
paragraph = '''It started before I was born. My biological mother was a young, unwed college graduate student, and she decided 
               to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything 
               was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided 
               at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the 
               middle of the night asking: “We have an unexpected baby boy; do you want him?” They said: “Of course.” My 
               biological mother later found out that my mother had never graduated from college and that my father had never 
               graduated from high school. She refused to sign the final adoption papers.She only relented a few months later 
               when my parents promised that I would someday go to college.'''

In [3]:
# Tokenizing sentences
sentences = nltk.sent_tokenize(paragraph)

In [4]:
sentences

['It started before I was born.',
 'My biological mother was a young, unwed college graduate student, and she decided \n               to put me up for adoption.',
 'She felt very strongly that I should be adopted by college graduates, so everything \n               was all set for me to be adopted at birth by a lawyer and his wife.',
 'Except that when I popped out they decided \n               at the last minute that they really wanted a girl.',
 'So my parents, who were on a waiting list, got a call in the \n               middle of the night asking: “We have an unexpected baby boy; do you want him?” They said: “Of course.” My \n               biological mother later found out that my mother had never graduated from college and that my father had never \n               graduated from high school.',
 'She refused to sign the final adoption papers.She only relented a few months later \n               when my parents promised that I would someday go to college.']

## STEMMING

Stemming is the process of reducing inflected or derived forms of a word to a common root, or stem. Ideally, a stemming algorithm maps “chocolates” and “chocolatey” to one stem, and “retrieval”, “retrieved”, and “retrieves” to another. The stem itself need not be a dictionary word: the Porter stemmer, for example, produces “retriev” rather than “retrieve”. The stemmer takes tokenized words as input.
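As a quick illustration, the sketch below runs NLTK's `PorterStemmer` on a few inflected forms that appear in the sample paragraph. The `words` list and the `stems` dictionary are names chosen here for the example only.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Inflected forms taken from the sample paragraph
words = ["graduate", "graduates", "graduated", "adopted", "adoption"]

# Map each word to the stem produced by the Porter stemmer
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

The three forms of "graduate" share one stem and the two forms of "adopt" share another, which is the conflation that stemming is meant to achieve.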

### Errors in Stemming:

- **Over-stemming**: Over-stemming occurs when two words with different meanings are reduced to the same stem even though they should not be. It can be regarded as a false positive.
- **Under-stemming**: Under-stemming occurs when two words that should share a stem are reduced to different stems. It can be regarded as a false negative.
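Both kinds of error can be observed with the Porter stemmer. The following is a minimal sketch; the helper `distinct_stems` and the two word lists are assumptions made for this example, using word groups commonly cited for these errors.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def distinct_stems(words):
    """Return the set of distinct stems produced for a list of words."""
    return {stemmer.stem(word) for word in words}

# Over-stemming: words with different meanings collapse to a single stem
unrelated = ["universal", "university", "universe"]
over = distinct_stems(unrelated)

# Under-stemming: forms of the same word end up with different stems
related = ["data", "datum"]
under = distinct_stems(related)

print("over-stemming:", over)
print("under-stemming:", under)
```

A single stem for the first group signals a false positive, while more than one stem for the second group signals a false negative.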

### Applications of Stemming:

- Stemming is used in information retrieval systems like search engines.
- It is used to determine domain vocabularies in domain analysis.
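To make the search-engine use case concrete, here is a toy sketch of stem-based matching. The `documents` collection and the `stem_tokens` and `search` helpers are hypothetical and exist only for this example; a real retrieval system would use a proper tokenizer and an inverted index.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_tokens(text):
    """Lowercase, split on whitespace, and return the set of stems."""
    return {stemmer.stem(token) for token in text.lower().split()}

# Hypothetical mini-collection; the texts are made up for this example
documents = {
    "doc1": "she refused to sign the adoption papers",
    "doc2": "my parents were on a waiting list",
}

def search(query):
    """Return ids of documents that share at least one stem with the query."""
    query_stems = stem_tokens(query)
    return [doc_id for doc_id, text in documents.items()
            if query_stems & stem_tokens(text)]

print(search("adopted"))
print(search("wait"))
```

Because the query and the documents are stemmed the same way, a query for "adopted" can match a document that only contains "adoption".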

In [5]:
stemmer = PorterStemmer()

In [6]:
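# Apply stemming and remove stop words
# Build the stop-word set once instead of once per token
stop_words = set(stopwords.words('english'))

for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    print(words)
    # The membership test is case-sensitive, so capitalized stop words
    # such as 'It' or 'My' are kept; the stemmer then lowercases them
    words_after_stemming = [stemmer.stem(word) for word in words if word not in stop_words]
    print(words_after_stemming)
    sentences[i] = ' '.join(words_after_stemming)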

['It', 'started', 'before', 'I', 'was', 'born', '.']
['it', 'start', 'i', 'born', '.']
['My', 'biological', 'mother', 'was', 'a', 'young', ',', 'unwed', 'college', 'graduate', 'student', ',', 'and', 'she', 'decided', 'to', 'put', 'me', 'up', 'for', 'adoption', '.']
['my', 'biolog', 'mother', 'young', ',', 'unw', 'colleg', 'graduat', 'student', ',', 'decid', 'put', 'adopt', '.']
['She', 'felt', 'very', 'strongly', 'that', 'I', 'should', 'be', 'adopted', 'by', 'college', 'graduates', ',', 'so', 'everything', 'was', 'all', 'set', 'for', 'me', 'to', 'be', 'adopted', 'at', 'birth', 'by', 'a', 'lawyer', 'and', 'his', 'wife', '.']
['she', 'felt', 'strongli', 'i', 'adopt', 'colleg', 'graduat', ',', 'everyth', 'set', 'adopt', 'birth', 'lawyer', 'wife', '.']
['Except', 'that', 'when', 'I', 'popped', 'out', 'they', 'decided', 'at', 'the', 'last', 'minute', 'that', 'they', 'really', 'wanted', 'a', 'girl', '.']
['except', 'i', 'pop', 'decid', 'last', 'minut', 'realli', 'want', 'girl', '.']
['So', '

In [7]:
# Sentences after stemming and removing stop words
sentences

['it start i born .',
 'my biolog mother young , unw colleg graduat student , decid put adopt .',
 'she felt strongli i adopt colleg graduat , everyth set adopt birth lawyer wife .',
 'except i pop decid last minut realli want girl .',
 'so parent , wait list , got call middl night ask : “ we unexpect babi boy ; want ? ” they said : “ of course. ” my biolog mother later found mother never graduat colleg father never graduat high school .',
 'she refus sign final adopt papers.sh relent month later parent promis i would someday go colleg .']
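Note that the output above still contains "it", "my", and "she": the stop-word check compares the original, capitalized tokens against a lowercase list. One possible variant, sketched below, lowercases each token before the check. The function `stem_without_stop_words` and the tiny `example_stop_words` set are assumptions for illustration; in the notebook the set would come from `stopwords.words('english')`.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_without_stop_words(tokens, stop_words):
    """Drop stop words (compared in lowercase), then stem what remains."""
    return [stemmer.stem(token) for token in tokens
            if token.lower() not in stop_words]

# Tiny illustrative stop-word set (an assumption for this sketch);
# the notebook would use set(stopwords.words('english')) instead
example_stop_words = {"it", "before", "i", "was"}

# Tokens of the first sentence, as printed by the cell above
tokens = ['It', 'started', 'before', 'I', 'was', 'born', '.']
print(stem_without_stop_words(tokens, example_stop_words))
```

With this change "It" and "I" are filtered out along with the other stop words, so only the content tokens and punctuation remain.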

In [None]:
**Problem with Stemming**
The stems produced are intermediate representations that may not be actual words with any meaning.
Examples from the output above: realli, strongli, colleg.