Text Processing with NLTK3 Cookbook
## Chapter 2 Replacing and Correcting Words

### Stemming Words (p.30)


Think about the following code this way:

**```PorterStemmer```** is like **a stemming company** in **the town of nltk.stem** (i.e., a module). By executing **```PorterStemmer()```**, you ask the PorterStemmer company to send you an expert stemmer.

Let's look at how this is done step by step:

The following code asks Python for access to **```PorterStemmer```** (a stemming company) in the nltk.stem module (town).
```
from nltk.stem import PorterStemmer
```
It's like getting the company's phone number.

The following code calls the **```PorterStemmer```** business and asks it to send you one of its experts.
```
stemmer = PorterStemmer()                      
```
Here, ***```stemmer```*** is a variable, a nickname for the stemming expert. (It's like your plumber 'Jimmy'; stemmer is good at stemming words.)

The following line asks the expert ```stemmer``` to **```stem```** 'cooking'.
```
res = stemmer.stem('cooking')
```

In [16]:
# Stemming Words

from nltk.stem import PorterStemmer 
stemmer = PorterStemmer()                      
            
res = stemmer.stem('cooking')                  # Ask the 'stemmer' to 'stem' 'cooking'.
print(res)

res = stemmer.stem('cookery')
print(res)

res = stemmer.stem('book')
print(res)

res = stemmer.stem('universities')
print(res)

res = stemmer.stem('univers')
print(res)

res = stemmer.stem('universal')
print(res)

cook
cookeri
book
univers
univ
univers
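
The output above shows that different words can share a stem: both 'universities' and 'universal' become 'univers'. The sketch below makes that visible by grouping words by stem. Note that `group_by_stem` is a hypothetical helper, not part of NLTK, and `KNOWN_STEMS` is a stand-in lookup table copied from the output above so the example runs without NLTK installed.

```python
from collections import defaultdict


def group_by_stem(words, stem):
    """Group words by the stem that the callable `stem` returns for each one."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


# Stand-in for stemmer.stem: the word -> stem pairs printed in the cell above.
KNOWN_STEMS = {
    'cooking': 'cook',
    'cookery': 'cookeri',
    'book': 'book',
    'universities': 'univers',
    'univers': 'univ',
    'universal': 'univers',
}

# Iterating over the dict yields its keys, i.e. the original words.
groups = group_by_stem(KNOWN_STEMS, KNOWN_STEMS.get)
for stem_value, words in groups.items():
    print(stem_value, words)
```

With the real expert from the cell above, the call would be `group_by_stem(['universities', 'universal'], stemmer.stem)`.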


### Lemmatizing Words with WordNet (p.32)

Lemmatizing works like stemming. **```WordNetLemmatizer```** is a business you call to get a lemmatizing expert. Here the expert is stored in the variable 'lemmatizer', and you can ask it to lemmatize any word.

When you ask lemmatizer to lemmatize a word with the **```lemmatize```** method, you can indicate the word's part of speech (pos) if there are multiple possibilities. If you leave pos out, the word is treated as a noun.

```
res = lemmatizer.lemmatize('cooking', pos='n')    # assume that 'cooking' is a noun.
```


In [89]:
# Lemmatizing Words with WordNet

from nltk.stem import WordNetLemmatizer         # Import WordNetLemmatizer from the nltk.stem module.
lemmatizer = WordNetLemmatizer()                # Create an instance of WordNetLemmatizer.
res = lemmatizer.lemmatize('cooking')           # Ask 'lemmatizer' (expert) to lemmatize 'cooking'.
print(res)
res = lemmatizer.lemmatize('cooking', pos='v')
print(res)
res = lemmatizer.lemmatize('cooking', pos='n')
print(res)

cooking
cook
cooking
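
Because the result depends on `pos`, one option when the part of speech is unknown is to try several tags and keep the shortest lemma. This is only a sketch: `shortest_lemma` is a hypothetical helper, not an NLTK function, and the shortest-lemma heuristic is an assumption made for illustration, not something the book recommends. `fake_lemmatize` is a stand-in built from the output above so the example runs without the WordNet data.

```python
def shortest_lemma(word, lemmatize, pos_tags=('n', 'v')):
    """Lemmatize `word` under each tag in `pos_tags` and return the shortest result."""
    candidates = [lemmatize(word, pos=tag) for tag in pos_tags]
    return min(candidates, key=len)


# Stand-in data: the (word, pos) -> lemma pairs shown in the output above.
KNOWN_LEMMAS = {('cooking', 'n'): 'cooking', ('cooking', 'v'): 'cook'}


def fake_lemmatize(word, pos='n'):
    """Hypothetical stand-in for lemmatizer.lemmatize; unknown pairs come back unchanged."""
    return KNOWN_LEMMAS.get((word, pos), word)


print(shortest_lemma('cooking', fake_lemmatize))
```

Passing `lemmatizer.lemmatize` instead of `fake_lemmatize` would use the real WordNet lookup.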


### Stemming vs. Lemmatization (p.33 — under There's more...)

- Stems may not be valid words (for example, 'believ' and 'univers').
- Lemmas are actual words found in a dictionary (for example, 'belief' and 'university').
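
The difference in the two points above comes from how each result is produced: roughly, a stemmer strips suffixes by rule, while a lemmatizer looks the word up. The toy sketch below illustrates that contrast. Everything in it is an assumption for illustration: `toy_stem` uses a few made-up suffix rules and is not the Porter algorithm, and `toy_lemmatize` uses a hand-made two-entry dictionary rather than WordNet.

```python
# Made-up suffix rules, tried in order. NOT the Porter algorithm.
TOY_SUFFIXES = ['ies', 'es', 'ing', 's']

# Hand-made dictionary standing in for WordNet.
TOY_LEMMAS = {'believes': 'belief', 'universities': 'university'}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up; words missing from the dictionary come back unchanged."""
    return TOY_LEMMAS.get(word, word)


for word in ['believes', 'universities', 'leading']:
    print(word, '->', toy_stem(word), '|', toy_lemmatize(word))
```

Rule-based stripping can leave fragments such as 'believ' that are not words, whereas a lookup can only return entries that exist in its dictionary.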

In [90]:
# Lemmatizing Words with WordNet / Stemming vs. Lemmatization (p.33)
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

print("*** Stemming Examples ***")
stemmer = PorterStemmer()
print(stemmer.stem('believes'))
print(stemmer.stem('universities'))
print(stemmer.stem('leading'))

print("\n*** Lemmatization Examples ***")
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('believes'))
print(lemmatizer.lemmatize('universities'))
print(lemmatizer.lemmatize('leading'))


*** Stemming Examples ***
believ
univers
lead

*** Lemmatization Examples ***
belief
university
leading


### Combining Stemming with Lemmatization (p.34)

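- Lemmatizing before stemming can compress a word further than stemming alone: stemming 'buses' gives 'buse', while stemming its lemma 'bus' gives 'bu'.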

In [91]:
# Lemmatizing Words with WordNet / Combining Stemming with Lemmatization

from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()


stem1 = stemmer.stem('buses')
print(stem1)

lemma = lemmatizer.lemmatize('buses')
print(lemma)
stem2 = stemmer.stem(lemma)
print(stem2)


buse
bus
bu
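
The two-step process above can be wrapped in one function and the saving measured in characters. This is a minimal sketch: `lemmatize_then_stem` is a hypothetical helper, not part of NLTK, and the two lookup tables are stand-ins copied from the output above so the example runs without NLTK or the WordNet data.

```python
def lemmatize_then_stem(word, lemmatize, stem):
    """Lemmatize first, then stem the resulting lemma."""
    return stem(lemmatize(word))


# Stand-ins for lemmatizer.lemmatize and stemmer.stem, taken from the output above.
KNOWN_LEMMAS = {'buses': 'bus'}
KNOWN_STEMS = {'buses': 'buse', 'bus': 'bu'}

word = 'buses'
stem_only = KNOWN_STEMS[word]
combined = lemmatize_then_stem(word, KNOWN_LEMMAS.get, KNOWN_STEMS.get)

# Report how many characters each approach removes from the original word.
for label, result in [('stem only', stem_only), ('lemmatize + stem', combined)]:
    saved = len(word) - len(result)
    print(f"{label}: {result!r} saves {saved} of {len(word)} characters")
```

With the objects created in the cell above, the equivalent call would be `lemmatize_then_stem('buses', lemmatizer.lemmatize, stemmer.stem)`.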
