## LEMMATIZATION

In contrast to stemming, lemmatization goes beyond simple word reduction: it draws on a language's
full vocabulary and applies morphological analysis to each word. The lemma of 'was' is 'be' and the lemma of 'mice' is 'mouse'.
The lemma of 'meeting' may be 'meet' or 'meeting', depending on whether it is used as a verb or a noun in the sentence.
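
The difference from stemming can be sketched in plain Python before loading any model. This is a toy illustration only: `naive_stem`, `toy_lemma`, and `LEMMA_TABLE` are hypothetical names invented for this example, and neither function reflects how a real stemmer or spaCy's lemmatizer is implemented.

```python
# Toy illustration only: real stemmers and lemmatizers are far more thorough.

def naive_stem(word):
    """Strip a few common suffixes by rule, with no knowledge of the vocabulary."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table keyed by (word, part of speech).
LEMMA_TABLE = {
    ("was", "VERB"): "be",
    ("mice", "NOUN"): "mouse",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}

def toy_lemma(word, pos):
    """Look the word up by part of speech; fall back to the word itself."""
    return LEMMA_TABLE.get((word, pos), word)

samples = [("was", "VERB"), ("mice", "NOUN"), ("meeting", "VERB"), ("meeting", "NOUN")]
for word, pos in samples:
    print(f"{word:10} {pos:6} stem={naive_stem(word):10} lemma={toy_lemma(word, pos)}")
```

The rule-based stem leaves 'was' and 'mice' unchanged and always cuts 'meeting' to 'meet', while the vocabulary lookup can return a different lemma for the same word depending on its part of speech.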

In [2]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')  # load the small English model

In [3]:
doc1 = nlp(u"I am a runner running in a race because I love to run since I ran today")

In [4]:
# token.lemma is the lemma's integer hash ID; token.lemma_ is the lemma as readable text
for token in doc1:
    print(token.text, '\t', token.pos_, '\t', token.lemma, '\t', token.lemma_)

I 	 PRON 	 561228191312463089 	 -PRON-
am 	 VERB 	 10382539506755952630 	 be
a 	 DET 	 11901859001352538922 	 a
runner 	 NOUN 	 12640964157389618806 	 runner
running 	 VERB 	 12767647472892411841 	 run
in 	 ADP 	 3002984154512732771 	 in
a 	 DET 	 11901859001352538922 	 a
race 	 NOUN 	 8048469955494714898 	 race
because 	 ADP 	 16950148841647037698 	 because
I 	 PRON 	 561228191312463089 	 -PRON-
love 	 VERB 	 3702023516439754181 	 love
to 	 PART 	 3791531372978436496 	 to
run 	 VERB 	 12767647472892411841 	 run
since 	 ADP 	 10066841407251338481 	 since
I 	 PRON 	 561228191312463089 	 -PRON-
ran 	 VERB 	 12767647472892411841 	 run
today 	 NOUN 	 11042482332948150395 	 today


## Function to display Lemmas

In [9]:
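def show_lemmas(text):
    """Print text, POS tag, lemma hash and lemma string for each token in a Doc."""
    for token in text:
        # Pad each column to a fixed width; '<' left-aligns the integer hash
        print(f'{token.text:12} {token.pos_:6} {token.lemma:<22} {token.lemma_}')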
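
The column alignment in `show_lemmas` comes from f-string format specifiers, and it can be tried without loading a model. The sketch below assumes a hypothetical `FakeToken` namedtuple as a stand-in for a spaCy token; `format_row` is likewise an illustrative helper, not part of spaCy.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token, used only to try out the formatting.
FakeToken = namedtuple("FakeToken", ["text", "pos_", "lemma", "lemma_"])

def format_row(token):
    # Strings are left-aligned by default; integers are right-aligned,
    # so the hash column needs an explicit '<' to start at the left edge.
    return f"{token.text:12} {token.pos_:6} {token.lemma:<22} {token.lemma_}"

row = format_row(FakeToken("ran", "VERB", 12767647472892411841, "run"))
print(row)

# A width written as a nested field, {12}, is equivalent to the literal 12.
same_width = f'{"ran":{12}}' == f'{"ran":12}'
print(same_width)
```

Writing the width as a nested field such as `{12}` is mainly useful when the width comes from a variable; for a fixed width the plain literal behaves the same.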

In [10]:
show_lemmas(doc1)

I            PRON   561228191312463089     -PRON-
am           VERB   10382539506755952630   be
a            DET    11901859001352538922   a
runner       NOUN   12640964157389618806   runner
running      VERB   12767647472892411841   run
in           ADP    3002984154512732771    in
a            DET    11901859001352538922   a
race         NOUN   8048469955494714898    race
because      ADP    16950148841647037698   because
I            PRON   561228191312463089     -PRON-
love         VERB   3702023516439754181    love
to           PART   3791531372978436496    to
run          VERB   12767647472892411841   run
since        ADP    10066841407251338481   since
I            PRON   561228191312463089     -PRON-
ran          VERB   12767647472892411841   run
today        NOUN   11042482332948150395   today


In [11]:
doc2 = nlp(u"I saw eighteen mice today!")
show_lemmas(doc2)

I            PRON   561228191312463089     -PRON-
saw          VERB   11925638236994514241   see
eighteen     NUM    9609336664675087640    eighteen
mice         NOUN   1384165645700560590    mouse
today        NOUN   11042482332948150395   today
!            PUNCT  17494803046312582752   !


In [12]:
doc3 = nlp(u"AB DE Villiers is the best batsman in the world.The Greatest ever!")

In [13]:
show_lemmas(doc3)

AB           PROPN  3916325639175504915    AB
DE           PROPN  7237117249260884669    DE
Villiers     PROPN  14717145686106484398   Villiers
is           VERB   10382539506755952630   be
the          DET    7425985699627899538    the
best         ADJ    5711639017775284443    good
batsman      NOUN   10958173730388585239   batsman
in           ADP    3002984154512732771    in
the          DET    7425985699627899538    the
world        NOUN   1703489418272052182    world
.            PUNCT  12646065887601541794   .
The          DET    7425985699627899538    the
Greatest     ADV    2478119202729520523    greatest
ever         ADV    6231102377460051108    ever
!            PUNCT  17494803046312582752   !
