# Lemmatization

Stemming and lemmatization have a [similar goal](https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html): 
> The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance: 

> "am, are, is $\Rightarrow$ be". 

> The result of this mapping of text will be something like:
> `the boy's cars are different colors` $\Rightarrow$ `the boy car be differ color`.

> However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.

spaCy does not include a stemmer; it provides lemmatization only.
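
To make the distinction in the quoted passage concrete, here is a toy sketch in plain Python. It is not how spaCy or any real stemmer works: `toy_stem`, `toy_lemma`, the suffix list, and the small lookup table are all hypothetical and chosen only for illustration. The stemmer blindly chops a suffix, while the lemmatizer looks the word up in a vocabulary.

```python
# Toy illustration only: real stemmers and lemmatizers use far richer
# rules and vocabularies than these hypothetical helpers.

SUFFIXES = ("ing", "er", "s")  # assumed, deliberately crude suffix list

# Assumed mini-vocabulary mapping inflected forms to their dictionary form.
LEMMA_TABLE = {
    "am": "be", "are": "be", "is": "be",
    "ran": "run", "running": "run", "runs": "run",
    "cars": "car",
}


def toy_stem(word: str) -> str:
    """Chop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemma(word: str) -> str:
    """Return the dictionary form if the word is in the table, else the word."""
    return LEMMA_TABLE.get(word, word)


for w in ["running", "ran", "are", "cars", "runner"]:
    print(f"{w:10}{toy_stem(w):10}{toy_lemma(w)}")
```

With these assumptions the stemmer turns `running` and `runner` into the non-word `runn` and leaves the irregular forms `ran` and `are` untouched, whereas the lookup maps `running` and `ran` to `run`, `are` to `be`, and keeps `runner` as its own dictionary entry.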


In [1]:
import spacy

In [3]:
nlp = spacy.load("en_core_web_sm")

In [4]:
doc1 = nlp("I am a runner running in a race because I love to run since I ran since the day I first ran.")

In [22]:
# Print each token with its coarse part-of-speech tag and its lemma in aligned columns
print(f"{'Text':12}\t{'POS':12}\tLemma\n{'-' * 36}")
for token in doc1:
    print(f"{token.text:12}\t{token.pos_:12}\t{token.lemma_:12}")

Text        	POS         	Lemma
------------------------------------
I           	PRON        	-PRON-      
am          	VERB        	be          
a           	DET         	a           
runner      	NOUN        	runner      
running     	VERB        	run         
in          	ADP         	in          
a           	DET         	a           
race        	NOUN        	race        
because     	ADP         	because     
I           	PRON        	-PRON-      
love        	VERB        	love        
to          	PART        	to          
run         	VERB        	run         
since       	ADP         	since       
I           	PRON        	-PRON-      
ran         	VERB        	run         
since       	ADP         	since       
the         	DET         	the         
day         	NOUN        	day         
I           	PRON        	-PRON-      
first       	ADV         	first       
ran         	VERB        	run         
.           	PUNCT       	.           
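
In the output above, the verb forms `running`, `run`, and `ran` all share the lemma `run`, while the noun `runner` keeps its own lemma, in line with the quoted aim of removing inflectional endings only. `-PRON-` is the placeholder lemma that the spaCy version used here assigns to pronouns; other versions may report different values. As a minimal sketch of how lemmas collapse surface forms, the snippet below groups a few `(text, lemma)` pairs by lemma. The pairs are copied by hand from the table rather than read from `doc1`, so it runs without spaCy installed.

```python
from collections import defaultdict

# (text, lemma) pairs copied by hand from the verb and noun rows of the table above.
rows = [
    ("am", "be"), ("runner", "runner"), ("running", "run"), ("race", "race"),
    ("love", "love"), ("run", "run"), ("ran", "run"), ("day", "day"), ("ran", "run"),
]

# Collect the distinct surface forms observed for each lemma.
forms_by_lemma = defaultdict(set)
for text, lemma in rows:
    forms_by_lemma[lemma].add(text)

# Show only the lemmas that stand for more than one surface form.
for lemma, forms in forms_by_lemma.items():
    if len(forms) > 1:
        print(lemma, sorted(forms))  # prints: run ['ran', 'run', 'running']
```

With a live `Doc`, the same grouping could be built from `token.text` and `token.lemma_` instead of the hand-copied list.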
