# Lemmatization

As we saw in the previous chapter, we can explain to the machine which words are similar and how different they are.

However, some "different" words are only variations of the same word and should not be treated as separate entries.

Let's take an example:

Imagine that you are asked to build a model that classifies books into two categories: _cooking_ and _cars_. You will use the most frequent words of each book to build your algorithm.

In that case, you don't really want to make a distinction between `apple` and `apples` or between `wheel` and `wheels`. You would rather count `apple` and `apples` as occurrences of the same word, `apple`.

To fix that, we will apply **lemmatization**. This approach reduces each word to its base form (called the **lemma**). The lemma corresponds to the headword of a dictionary entry:


**apple** (noun) : `a round fruit (usually with a green or red skin) which can be eaten (plural: apples)`
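
To make the idea concrete, here is a minimal sketch of what merging variations does to word counts. It is not how a real lemmatizer works internally: `LEMMA_TABLE` and `toy_lemmatize` are hypothetical names, and the table is written by hand for this example only.

```python
from collections import Counter

# Hypothetical, hand-written lookup table (inflected form -> lemma).
# A real lemmatizer relies on a full dictionary and grammatical rules.
LEMMA_TABLE = {
    "apples": "apple",
    "wheels": "wheel",
}


def toy_lemmatize(word: str) -> str:
    """Return the lemma from the table, or the lowercased word if unknown."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


words = ["apple", "Apples", "wheel", "wheels", "apples"]

# Without lemmatization, "apple" and "apples" are counted separately.
raw_counts = Counter(w.lower() for w in words)

# With lemmatization, both forms are counted under the same entry.
lemma_counts = Counter(toy_lemmatize(w) for w in words)

print(raw_counts)
print(lemma_counts)
```

After lemmatization, the counter holds only two entries, `apple` and `wheel`, which is the behavior we want for the book classifier.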

 


## Still confused?
Let's see how it works in a practical case.

First, read [this article](https://www.machinelearningplus.com/nlp/lemmatization-examples-python/).

Then, try to apply what you have learned by using spaCy or NLTK.

**Pro tip:** Most lemmatizers work on a single word, not on a whole sentence. Think about tokenizing your sentence first.

**Pro tip:** If you experience SSL issues when downloading `nltk` data, [check this](https://stackoverflow.com/questions/38916452/nltk-download-ssl-certificate-verify-failed).
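
The sketch below illustrates why tokenizing first matters. It uses only the standard library, so it does not replace the exercise: `toy_tokenize`, `toy_lemmatize` and `LEMMA_TABLE` are hypothetical helpers written for this illustration, and the regular expression is a deliberately naive tokenizer.

```python
import re

# Hypothetical, hand-written lookup table (inflected form -> lemma).
LEMMA_TABLE = {"apples": "apple", "wheels": "wheel"}


def toy_lemmatize(word: str) -> str:
    """Word-level lemmatizer: it expects ONE word, not a sentence."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


def toy_tokenize(sentence: str) -> list[str]:
    """Naive tokenizer: keep runs of letters and apostrophes only."""
    return re.findall(r"[A-Za-z']+", sentence)


sentence = "The apples and the wheels."

# Passing the whole sentence finds no match in the table: nothing is lemmatized.
whole = toy_lemmatize(sentence)

# Tokenizing first lets the lemmatizer handle each word on its own.
tokens = toy_tokenize(sentence)
lemmas = [toy_lemmatize(token) for token in tokens]

print(whole)
print(lemmas)
```

spaCy and NLTK ship their own tokenizers, which handle punctuation and contractions far better than this regular expression.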

```python
# Can you lemmatize this sentence with spaCy and/or NLTK?

my_sentence = "Those children are playing. this game, those games, I play he plays"
```


What are the differences between the two tools?

## Conclusion
There are multiple libraries that allow you to do lemmatization. Each of them has its own particularities.
There are also other techniques to "simplify" words, like [Stemming](https://medium.com/swlh/introduction-to-stemming-vs-lemmatization-nlp-8c69eb43ecfe). Feel free to explore those that seem relevant to your use case.

![stemming vs lemmatization](https://miro.medium.com/max/2050/1*ES5bt7IoInIq2YioQp2zcQ.png)
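
As a rough illustration of the difference, the sketch below compares a naive suffix-stripping stemmer with a dictionary lookup. Both are toy versions written for this example: `toy_stem`, `toy_lemmatize`, `SUFFIXES` and `LEMMA_TABLE` are hypothetical, and real stemmers (such as the Porter stemmer) use much more careful rules.

```python
# Suffixes the toy stemmer strips, checked in this order.
SUFFIXES = ("ing", "es", "s")

# Hypothetical, hand-written lookup table (inflected form -> lemma).
LEMMA_TABLE = {
    "playing": "play",
    "plays": "play",
    "games": "game",
    "studies": "study",
    "children": "child",
}


def toy_stem(word: str) -> str:
    """Chop off the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the table; fall back to the lowercased word."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for word in ["playing", "plays", "games", "studies", "children"]:
    print(f"{word:10} stem={toy_stem(word):10} lemma={toy_lemmatize(word)}")
```

The stemmer only cuts characters, so it can produce strings that are not real words (`studi`, `gam`) and it misses irregular forms (`children`). The lookup returns dictionary forms, but only for words it knows about.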
