# Lemmatization
In contrast to stemming, lemmatization goes beyond simple word reduction: it considers a language's full vocabulary and applies a *morphological analysis* to each word. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'. Further, the lemma of 'meeting' might be 'meet' or 'meeting' depending on its use in a sentence.

In [7]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [8]:
doc1 = nlp("I am a runner running in a race because I love to run since I ran today")

for token in doc1:
    print(token.text, '\t', token.pos_, '\t', token.lemma_)

I 	 PRON 	 I
am 	 AUX 	 be
a 	 DET 	 a
runner 	 NOUN 	 runner
running 	 VERB 	 run
in 	 ADP 	 in
a 	 DET 	 a
race 	 NOUN 	 race
because 	 SCONJ 	 because
I 	 PRON 	 I
love 	 VERB 	 love
to 	 PART 	 to
run 	 VERB 	 run
since 	 SCONJ 	 since
I 	 PRON 	 I
ran 	 VERB 	 run
today 	 NOUN 	 today


<font color=green>In the above sentence, `running`, `run` and `ran` all map to the same lemma, `run` (...11841), so the three inflected forms are treated as one word rather than duplicated.</font>

### Function to display lemmas
Since the display above is staggered and hard to read, let's write a function that displays the information we want more neatly.

In [9]:
def show_lemmas(text):
    for token in text:
        print(f'{token.text:{12}} {token.pos_:{6}}  {token.lemma_}')

Here we're using an **f-string** to format the printed text by setting minimum field widths for the token text and the POS tag. Strings are left-aligned within their field by default, so the three columns line up.
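To see the field-width formatting in isolation, here is a minimal sketch that applies the same format spec to plain strings, with no spaCy involved. The helper name `format_row` is hypothetical and introduced only for illustration; the sample values are taken from the outputs in this notebook.

```python
def format_row(text, pos, lemma):
    # {12} and {6} are nested replacement fields that supply minimum widths.
    # Strings are padded on the right (left-aligned) by default.
    return f'{text:{12}} {pos:{6}}  {lemma}'

row = format_row('mice', 'NOUN', 'mouse')
print(repr(row))

# An explicit '<' (left-align) produces the same result as the default.
explicit = f"{'mice':<12} {'NOUN':<6}  mouse"
print(row == explicit)

# The width is only a minimum: longer strings are not truncated.
long_row = format_row('automobiles!!', 'NOUN', 'automobile')
print(repr(long_row))
```

Because the width is a minimum rather than a maximum, a token longer than 12 characters would push the later columns to the right instead of being cut off.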

In [10]:
doc2 = nlp("I saw eighteen mice today!")

show_lemmas(doc2)

I            PRON    I
saw          VERB    see
eighteen     NUM     eighteen
mice         NOUN    mouse
today        NOUN    today
!            PUNCT   !


<font color=green>Notice that the lemma of `saw` is `see` and the lemma of `mice` is `mouse`, yet `eighteen` is its own lemma, *not* an expanded form of `eight`.</font>

In [11]:
doc3 = nlp("I am meeting him tomorrow at the meeting.")

show_lemmas(doc3)

I            PRON    I
am           AUX     be
meeting      VERB    meet
him          PRON    he
tomorrow     NOUN    tomorrow
at           ADP     at
the          DET     the
meeting      NOUN    meeting
.            PUNCT   .


<font color=green>Here the lemma of `meeting` is determined by its part-of-speech tag: the verb is reduced to `meet`, while the noun stays `meeting`.</font>
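The idea that the same surface form can have different lemmas depending on its tag can be sketched with a toy lookup keyed on the word and its part of speech. This is only an illustration under stated assumptions: `TOY_LEMMAS` and `toy_lemma` are hypothetical names, the table holds a few entries copied from the output above, and it is not a description of how spaCy's lemmatizer works internally.

```python
# Hypothetical toy table: (lowercased word, coarse POS tag) -> lemma.
# Entries are copied from the outputs shown in this notebook.
TOY_LEMMAS = {
    ('meeting', 'VERB'): 'meet',
    ('meeting', 'NOUN'): 'meeting',
    ('am', 'AUX'): 'be',
    ('him', 'PRON'): 'he',
}

def toy_lemma(word, pos):
    # Fall back to the word itself when the (word, POS) pair is unknown.
    return TOY_LEMMAS.get((word.lower(), pos), word)

for word, pos in [('meeting', 'VERB'), ('meeting', 'NOUN'), ('tomorrow', 'NOUN')]:
    print(f'{word:{12}} {pos:{6}}  {toy_lemma(word, pos)}')
```

Keying on the pair rather than on the word alone is what lets the two occurrences of `meeting` resolve differently.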

In [12]:
doc4 = nlp("That's an enormous automobile")

show_lemmas(doc4)

That         PRON    that
's           AUX     be
an           DET     an
enormous     ADJ     enormous
automobile   NOUN    automobile


<font color=green>Note that lemmatization does *not* reduce words to their most basic synonym - that is, `enormous` doesn't become `big` and `automobile` doesn't become `car`.</font>