In [1]:
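# Import libraries
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

# One-time setup if the NLTK data is not installed yet (resource names can vary by NLTK version):
# nltk.download('punkt'); nltk.download('wordnet'); nltk.download('stopwords')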

In [2]:
paragraph = '''It started before I was born. My biological mother was a young, unwed college graduate student, and she decided 
               to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything 
               was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided 
               at the last minute that they really wanted a girl. So my parents, who were on a waiting list, got a call in the 
               middle of the night asking: “We have an unexpected baby boy; do you want him?” They said: “Of course.” My 
               biological mother later found out that my mother had never graduated from college and that my father had never 
               graduated from high school. She refused to sign the final adoption papers.She only relented a few months later 
               when my parents promised that I would someday go to college.'''

In [3]:
# Tokenizing sentences
sentences = nltk.sent_tokenize(paragraph)

In [4]:
# Display all the sentences found in the paragraph
sentences

['It started before I was born.',
 'My biological mother was a young, unwed college graduate student, and she decided \n               to put me up for adoption.',
 'She felt very strongly that I should be adopted by college graduates, so everything \n               was all set for me to be adopted at birth by a lawyer and his wife.',
 'Except that when I popped out they decided \n               at the last minute that they really wanted a girl.',
 'So my parents, who were on a waiting list, got a call in the \n               middle of the night asking: “We have an unexpected baby boy; do you want him?” They said: “Of course.” My \n               biological mother later found out that my mother had never graduated from college and that my father had never \n               graduated from high school.',
 'She refused to sign the final adoption papers.She only relented a few months later \n               when my parents promised that I would someday go to college.']
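
The sentences above keep the line breaks and indentation of the triple-quoted string. One possible way to avoid that, sketched here on a short stand-in string rather than the full paragraph, is to collapse runs of whitespace before tokenizing. The names `sample` and `cleaned` are illustrative and not part of the original notebook.

```python
import re

# Stand-in text (hypothetical): a triple-quoted string keeps its line breaks and indentation.
sample = '''She refused to sign the final
               adoption papers.'''

# Collapse every run of whitespace (spaces, tabs, newlines) into a single space.
cleaned = re.sub(r'\s+', ' ', sample).strip()
print(repr(cleaned))
```

Applying the same substitution to `paragraph` before calling `nltk.sent_tokenize` should remove the `\n` runs from the tokenized sentences.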

## LEMMATIZATION

Lemmatization is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. It is often preferred over stemming because it relies on a vocabulary and a morphological analysis of each word, so its output is a valid dictionary form rather than a truncated stem.

Lemmatization is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech.

Examples:
- The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a dictionary look-up.
- The word "walk" is the base form for the word "walking", and hence this is matched in both stemming and lemmatization.
- The word "meeting" can be either the base form of a noun or a form of a verb ("to meet") depending on the context; e.g., "in our last meeting" or "We are meeting again tomorrow". Unlike stemming, lemmatization attempts to select the correct lemma depending on the context.
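
The three examples above can be sketched with a toy comparison. This is an illustration only: a hand-written suffix stripper and a tiny lookup table stand in for a real stemmer and for WordNet, and the names `toy_stem`, `toy_lemmatize`, and `TOY_LEMMAS` are hypothetical, not NLTK functions.

```python
# Toy illustration only: these helpers are hypothetical stand-ins,
# not NLTK's stemmer or lemmatizer.

def toy_stem(word):
    """Strip a few common suffixes without looking at context."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# (word, part of speech) -> lemma; a real lemmatizer consults a full dictionary.
TOY_LEMMAS = {
    ("better", "adj"): "good",
    ("walking", "verb"): "walk",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}

def toy_lemmatize(word, pos):
    """Look up the lemma for a word given its part of speech."""
    return TOY_LEMMAS.get((word, pos), word)

examples = [("better", "adj"), ("walking", "verb"),
            ("meeting", "noun"), ("meeting", "verb")]
for word, pos in examples:
    print(f"{word:8} {pos:5} stem={toy_stem(word):8} lemma={toy_lemmatize(word, pos)}")
```

The stemmer returns the same result for "meeting" regardless of part of speech, while the lookup needs the part of speech to choose between "meeting" and "meet". NLTK's `WordNetLemmatizer.lemmatize` takes an optional `pos` argument that defaults to noun, which is why verb forms such as "started" and "decided" are left unchanged in the output further down.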

In [5]:
lemmatizer = WordNetLemmatizer()

In [6]:
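# Apply lemmatization and remove stop words
stop_words = set(stopwords.words('english'))  # build the stop-word set once
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    print(words)
    # The stop-word check is case-sensitive, and lemmatize() treats each word as a noun by default
    words_after_lemmatization = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
    print(words_after_lemmatization)
    # Store the processed tokens rather than the original ones
    sentences[i] = ' '.join(words_after_lemmatization)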

['It', 'started', 'before', 'I', 'was', 'born', '.']
['It', 'started', 'I', 'born', '.']
['My', 'biological', 'mother', 'was', 'a', 'young', ',', 'unwed', 'college', 'graduate', 'student', ',', 'and', 'she', 'decided', 'to', 'put', 'me', 'up', 'for', 'adoption', '.']
['My', 'biological', 'mother', 'young', ',', 'unwed', 'college', 'graduate', 'student', ',', 'decided', 'put', 'adoption', '.']
['She', 'felt', 'very', 'strongly', 'that', 'I', 'should', 'be', 'adopted', 'by', 'college', 'graduates', ',', 'so', 'everything', 'was', 'all', 'set', 'for', 'me', 'to', 'be', 'adopted', 'at', 'birth', 'by', 'a', 'lawyer', 'and', 'his', 'wife', '.']
['She', 'felt', 'strongly', 'I', 'adopted', 'college', 'graduate', ',', 'everything', 'set', 'adopted', 'birth', 'lawyer', 'wife', '.']
['Except', 'that', 'when', 'I', 'popped', 'out', 'they', 'decided', 'at', 'the', 'last', 'minute', 'that', 'they', 'really', 'wanted', 'a', 'girl', '.']
['Except', 'I', 'popped', 'decided', 'last', 'minute', 'really',