# Intro NLTK ⚗️
- NLTK, the Natural Language Toolkit, is a Python library for natural language processing (NLP). It provides tools for tasks such as tokenization, stemming, lemmatization, part-of-speech tagging, parsing, and semantic reasoning. NLTK also ships with corpora and lexical resources, which makes it a comprehensive platform for working with human language data. It is widely used in research, education, and industry for text analysis, sentiment analysis, and language modeling.

### Stemming method 🔍
- Stemming, in Natural Language Processing (NLP), is the process of reducing words to a root or base form (the stem). It normalizes text by stripping affixes, mostly suffixes and inflectional endings, so that related words are grouped under a common stem. The stem does not have to be a real word (`because` becomes `becaus` below). Stemming is used to simplify text analysis, improve matching in information retrieval, and reduce the dimensionality of text data.
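To make the idea of suffix stripping concrete before turning to NLTK, here is a minimal toy sketch. It is not the Porter algorithm: the names `SUFFIXES`, `MIN_STEM_LEN`, and `toy_stem` are hypothetical and chosen only for this illustration.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# SUFFIXES and MIN_STEM_LEN are hypothetical choices made for this sketch.
SUFFIXES = ("ing", "ly", "es", "s")  # checked in this order
MIN_STEM_LEN = 3                     # never cut a word shorter than this


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LEN:
            return stem
    return word


for w in ["runs", "packing", "packs", "fairly", "run"]:
    print(w, "----->", toy_stem(w))
```

A rule set this crude turns `running` into `runn`, which is one reason real stemmers such as Porter's add further steps (for example, handling doubled consonants).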

In [5]:
import nltk

In [7]:
from nltk.stem.porter import PorterStemmer

In [9]:
p_stemmer = PorterStemmer()

In [15]:
words = ['run', 'runner', 'ran', 'runs', 'running', 'because', 'cause', 'pack', 'packages', 'logger', 'log', 'login', 'easily', 'fairly']

In [17]:
for word in words:
    print(word + '----->' + p_stemmer.stem(word))

run----->run
runner----->runner
ran----->ran
runs----->run
running----->run
because----->becaus
cause----->caus
pack----->pack
packages----->packag
logger----->logger
log----->log
login----->login
easily----->easili
fairly----->fairli


In [24]:
from nltk.stem.snowball import SnowballStemmer

In [26]:
s_stemmer = SnowballStemmer(language='english')

In [28]:
for word in words:
    print(word + '----->' + s_stemmer.stem(word))

run----->run
runner----->runner
ran----->ran
runs----->run
running----->run
because----->becaus
cause----->caus
pack----->pack
packages----->packag
logger----->logger
log----->log
login----->login
easily----->easili
fairly----->fair
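Comparing two long output listings by eye is error-prone, so a small side-by-side check can help. The sketch below assumes NLTK is installed and reuses the same word list. It prints only the words on which the two stemmers disagree. The variable names are illustrative.

```python
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer(language="english")

words = ["run", "runner", "ran", "runs", "running", "because", "cause",
         "pack", "packages", "logger", "log", "login", "easily", "fairly"]

# Collect (word, porter_stem, snowball_stem) for words whose stems differ.
differences = [
    (w, porter.stem(w), snowball.stem(w))
    for w in words
    if porter.stem(w) != snowball.stem(w)
]

for word, p_stem, s_stem in differences:
    print(f"{word}: porter={p_stem} snowball={s_stem}")
```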


# Lemmatization 🗳️
- Lemmatization converts a word to its lemma, the dictionary form of the word. Unlike stemming, which strips affixes by rule, lemmatization uses the word's part of speech (and, in pipelines such as spaCy's, its context in the sentence) to return a valid, meaningful dictionary word. For example, the verbs `ran` and `running` both map to `run`.
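The role of part of speech can be sketched with a tiny lookup table. This is a toy illustration under stated assumptions, not how spaCy or NLTK implement lemmatization: `TOY_LEXICON` and `toy_lemmatize` are hypothetical names, and the handful of entries are hand-picked.

```python
# Hypothetical mini-lexicon keyed on (lowercased word, part of speech).
# Real lemmatizers use far larger tables plus rules; this only shows the idea.
TOY_LEXICON = {
    ("ran", "VERB"): "run",
    ("running", "VERB"): "run",
    ("running", "NOUN"): "running",  # as a noun, "running" is its own lemma
    ("am", "AUX"): "be",
    ("better", "ADJ"): "good",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the lexicon lemma for (word, pos), else the lowercased word."""
    key = (word.lower(), pos)
    return TOY_LEXICON.get(key, word.lower())


print(toy_lemmatize("running", "VERB"))  # run
print(toy_lemmatize("running", "NOUN"))  # running
print(toy_lemmatize("Ran", "VERB"))      # run
```

The same surface form yields different lemmas depending on its tag, which a pure suffix-stripping stemmer cannot do.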

In [33]:
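import spacy
nlp = spacy.load('en_core_web_sm')  # small English pipeline; must be installed separately

In [35]:
doc1 = nlp('I am a runner running in a race because i love to run since I ran today')

In [43]:
# token.lemma is the lemma's integer hash ID; token.lemma_ is the lemma string
for token in doc1:
    print(token.text, '\t', token.pos_, token.lemma, '\t', token.lemma_)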

I 	 PRON 4690420944186131903 	 I
am 	 AUX 10382539506755952630 	 be
a 	 DET 11901859001352538922 	 a
runner 	 NOUN 12640964157389618806 	 runner
running 	 VERB 12767647472892411841 	 run
in 	 ADP 3002984154512732771 	 in
a 	 DET 11901859001352538922 	 a
race 	 NOUN 8048469955494714898 	 race
because 	 SCONJ 16950148841647037698 	 because
i 	 PRON 4690420944186131903 	 I
love 	 VERB 3702023516439754181 	 love
to 	 PART 3791531372978436496 	 to
run 	 VERB 12767647472892411841 	 run
since 	 SCONJ 10066841407251338481 	 since
I 	 PRON 4690420944186131903 	 I
ran 	 VERB 12767647472892411841 	 run
today 	 NOUN 11042482332948150395 	 today


## 📘 What each column of the output shows:
- 🔤 Token text (`token.text`) | 🧬 Part-of-speech tag of the token (`token.pos_`) | 🧭 Lemma as an integer hash ID in the vocabulary of the loaded `en_core_web_sm` pipeline (`token.lemma`) | 📚 Lemma as a readable string, looked up from that hash (`token.lemma_`)
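The link between the integer `token.lemma` and the string `token.lemma_` in the output above can be sketched with spaCy's `StringStore`, the structure a pipeline's vocabulary uses to map strings to hash IDs and back. This sketch assumes spaCy is installed. It builds a standalone store instead of loading `en_core_web_sm`, so it only illustrates the mechanism.

```python
from spacy.strings import StringStore

store = StringStore()
lemma_id = store.add("run")      # returns the 64-bit hash used as the string's ID

print(lemma_id)                  # integer ID, analogous to token.lemma
print(store[lemma_id])           # string looked up from the ID, like token.lemma_
print(store["run"] == lemma_id)  # looking up the string gives back the same ID
```

Because the ID is a hash of the string itself, the same lemma gets the same number on every row, as with the repeated value for `run` in the output above.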