## Lemmatizing Words Using WordNet
* Part-of-speech constants:
  * ADJ: `a`
  * ADV: `r`
  * NOUN: `n`
  * VERB: `v`
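
The single-letter constants above are the values that `WordNetLemmatizer.lemmatize` accepts for its `pos` argument. Taggers such as `nltk.pos_tag` emit Penn Treebank tags by default, so a small mapping between the two is often useful. The following is a minimal sketch, not part of NLTK: `penn_to_wordnet` is a hypothetical helper name, and falling back to noun is an assumption that mirrors the lemmatizer's own default of `pos='n'`.

```python
# Hypothetical helper (not part of NLTK): map a Penn Treebank tag to the
# single-letter WordNet part-of-speech constant listed above.
def penn_to_wordnet(tag, default='n'):
    if tag.startswith('J'):    # JJ, JJR, JJS -> adjective
        return 'a'
    if tag.startswith('V'):    # VB, VBD, VBG, VBN, VBP, VBZ -> verb
        return 'v'
    if tag.startswith('RB'):   # RB, RBR, RBS -> adverb
        return 'r'
    if tag.startswith('N'):    # NN, NNS, NNP, NNPS -> noun
        return 'n'
    return default             # assumption: anything else is treated as a noun

for tag in ['JJ', 'VBG', 'RB', 'NNS', 'DT']:
    print(tag, '->', penn_to_wordnet(tag))
```

The returned letter can be passed straight to `lemmatize(word, pos=...)`.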

In [29]:
import nltk

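# Explicit imports instead of a wildcard, so it is clear where each name comes from
from nltk.stem import PorterStemmer, SnowballStemmer, WordNetLemmatizer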

import pandas as pd

### Stemming words

In [21]:
from nltk.stem import PorterStemmer 

stemmer = PorterStemmer() 
print(stemmer.stem('definitions'))

definit


In [22]:
# WordNetLemmatizer needs the WordNet corpus; this downloads it if it is missing
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/loonycorn/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

### Lemmatizing Words

In [35]:
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
print(wnl.lemmatize('definitions'))

definition


### Lemmatizing words by specifying parts-of-speech

In [53]:
print('Adjective: ', wnl.lemmatize('running', pos='a'))
print('Adverb: ', wnl.lemmatize('running', pos='r'))
print('Noun: ', wnl.lemmatize('running', pos='n'))
print('Verb: ', wnl.lemmatize('running', pos='v'))

Adjective:  running
Adverb:  running
Noun:  running
Verb:  run


In [72]:
input_tokens = ['dictionaries', 'dictionary', 
                'hushed', 'hush', 'hushing',
                'functional', 'functionally',
                'lying', 'lied', 'lies',
                'flawed', 'flaws', 'flawless', 
                'friendship', 'friendships', 'friendly', 'friendless', 
                'definitions', 'definition', 'definitely',  
                'the', 'these', 'those',
                'motivational', 'motivate', 'motivating']

In [73]:
ss = SnowballStemmer('english')

# Stem every token with the English Snowball stemmer
ss_stemmed_tokens = [ss.stem(token) for token in input_tokens]

In [74]:
# pos='v' treats every token as a verb, so noun plurals such as
# 'dictionaries' are returned unchanged
wnl_lemmatized_tokens = [wnl.lemmatize(token, pos='v') for token in input_tokens]

In [75]:
stems_lemmas_df = pd.DataFrame({
    'words': input_tokens,
    'Snowball Stemmer': ss_stemmed_tokens,
    'WordNet Lemmatizer': wnl_lemmatized_tokens
})

stems_lemmas_df

|   | words | Snowball Stemmer | WordNet Lemmatizer |
|---|---|---|---|
| 0 | dictionaries | dictionari | dictionaries |
| 1 | dictionary | dictionari | dictionary |
| 2 | hushed | hush | hush |
| 3 | hush | hush | hush |
| 4 | hushing | hush | hush |
| 5 | functional | function | functional |
| 6 | functionally | function | functionally |
| 7 | lying | lie | lie |
| 8 | lied | lie | lie |
| 9 | lies | lie | lie |


In [76]:
# word_tokenize relies on NLTK's Punkt tokenizer data being installed
from nltk.tokenize import word_tokenize

with open('./datasets/biography.txt', 'r') as f:
    file_contents = f.read()

print(file_contents)

Marie Curie was a Polish-born physicist and chemist and one of the most famous scientists of her time.
Together with her husband Pierre, she was awarded the Nobel Prize in 1903, and she went on to win another in 1911.
Marie Sklodowska was born in Warsaw on 7 November 1867, the daughter of a teacher.
In 1891, she went to Paris to study physics and mathematics at the Sorbonne where she met Pierre Curie, professor of the School of Physics.
They were married in 1895.
The Curies worked together investigating radioactivity, building on the work of the German physicist Roentgen and the French physicist Becquerel.
In July 1898, the Curies announced the discovery of a new chemical element, polonium.
At the end of the year, they announced the discovery of another, radium.
The Curies, along with Becquerel, were awarded the Nobel Prize for Physics in 1903.
Pierre's life was cut short in 1906 when he was knocked down and killed by a carriage.
Marie took over his teaching post, becoming the first wo

In [77]:
word_tokens = word_tokenize(file_contents)

In [78]:
wnl = WordNetLemmatizer()

# Lemmatize every token as a verb
lemmatized_words = [wnl.lemmatize(word, pos="v") for word in word_tokens]
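
Because every token is lemmatized as a verb here, words that are not verb forms pass through untouched (for example, `scientists` stays plural while `went` becomes `go`). One possible workaround, sketched below under stated assumptions, is to try several parts of speech in turn and keep the first lemma that differs from the input. `lemmatize_any_pos` is a hypothetical helper, and `toy_lemmatize` is a stand-in lookup used only so the sketch runs without the WordNet data; neither is part of NLTK.

```python
# Hypothetical helper: try each WordNet POS in turn and return the first
# lemma that differs from the input word; otherwise return the word as-is.
def lemmatize_any_pos(word, lemmatize, pos_order=('v', 'n', 'a', 'r')):
    for pos in pos_order:
        lemma = lemmatize(word, pos)
        if lemma != word:
            return lemma
    return word

# Toy stand-in for a real lemmatizer (illustration only): it knows two
# (word, pos) pairs and returns every other word unchanged.
TOY_LEMMAS = {('scientists', 'n'): 'scientist', ('went', 'v'): 'go'}

def toy_lemmatize(word, pos):
    return TOY_LEMMAS.get((word, pos), word)

print([lemmatize_any_pos(w, toy_lemmatize) for w in ['scientists', 'went', 'radium']])
# ['scientist', 'go', 'radium']
```

With the `wnl` object from the cell above, the call would be `lemmatize_any_pos(word, wnl.lemmatize)`. This is a heuristic: a word that is valid under more than one part of speech gets whichever lemma comes first in `pos_order`, so tagging each token first is generally more reliable.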

In [79]:
" ".join(lemmatized_words)

"Marie Curie be a Polish-born physicist and chemist and one of the most famous scientists of her time . Together with her husband Pierre , she be award the Nobel Prize in 1903 , and she go on to win another in 1911 . Marie Sklodowska be bear in Warsaw on 7 November 1867 , the daughter of a teacher . In 1891 , she go to Paris to study physics and mathematics at the Sorbonne where she meet Pierre Curie , professor of the School of Physics . They be marry in 1895 . The Curies work together investigate radioactivity , build on the work of the German physicist Roentgen and the French physicist Becquerel . In July 1898 , the Curies announce the discovery of a new chemical element , polonium . At the end of the year , they announce the discovery of another , radium . The Curies , along with Becquerel , be award the Nobel Prize for Physics in 1903 . Pierre 's life be cut short in 1906 when he be knock down and kill by a carriage . Marie take over his teach post , become the first woman to teac