# Lemmatization

Lemmatization is a natural language processing (NLP) technique that reduces a word to its base or dictionary form, called the lemma. Unlike stemming, which strips suffixes by rule and can produce strings that are not real words, lemmatization looks the word up in a vocabulary and takes its part of speech into account, so the result is a valid word in the language. For example, "running" becomes "run" when treated as a verb, and "better" becomes "good" when treated as an adjective.
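To make the contrast with stemming concrete, here is a toy sketch. Neither function is NLTK's implementation: `naive_stem`, `toy_lemmatize`, and the small `LEMMA_TABLE` dictionary are hypothetical names made up for illustration, and a real lemmatizer relies on a full lexical database such as WordNet.

```python
# Toy illustration only: these helpers are hypothetical and are not part of NLTK.

def naive_stem(word):
    """Strip a common suffix by rule, without checking the result is a real word."""
    for suffix in ("ing", "ly", "es", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-dictionary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("children", "n"): "child",
}


def toy_lemmatize(word, pos="n"):
    """Look the word up for the given part of speech; fall back to the word itself."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


for word, pos in [("running", "v"), ("better", "a"), ("children", "n")]:
    print(f"{word}: stem={naive_stem(word)}, lemma={toy_lemmatize(word, pos)}")
```

The rule-based stemmer turns "running" into "runn" and leaves "better" untouched, while the dictionary lookup returns "run" and "good". The lookup also depends on the part of speech: with the default `pos="n"`, "better" is not found and comes back unchanged.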

In [9]:
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
lemmatizer = WordNetLemmatizer()

In [10]:
corpus = "The quick brown fox jumps over the lazy dog. It is running swiftly through the green grass, while the birds are singing beautifully in the trees. The sun is shining brightly, making everything look more vibrant. Children are playing and laughing, creating a joyful atmosphere in the park. Everyone seems to be enjoying the lovely day."
corpus

'The quick brown fox jumps over the lazy dog. It is running swiftly through the green grass, while the birds are singing beautifully in the trees. The sun is shining brightly, making everything look more vibrant. Children are playing and laughing, creating a joyful atmosphere in the park. Everyone seems to be enjoying the lovely day.'

In [13]:
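nltk.download('punkt')    # tokenizer models used by word_tokenize
nltk.download('wordnet')  # WordNet data used by WordNetLemmatizer

words = word_tokenize(corpus)

# WordNet part-of-speech codes accepted by lemmatize(pos=...):
#   n - noun (the default)
#   a - adjective
#   r - adverb
#   v - verb

# pos='v' treats every token as a verb, so results for non-verbs are
# unreliable, and 'shining' comes out as 'shin' in the output below.
words = [lemmatizer.lemmatize(word, pos='v') for word in words]

print(words)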

['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.', 'It', 'be', 'run', 'swiftly', 'through', 'the', 'green', 'grass', ',', 'while', 'the', 'bird', 'be', 'sing', 'beautifully', 'in', 'the', 'tree', '.', 'The', 'sun', 'be', 'shin', 'brightly', ',', 'make', 'everything', 'look', 'more', 'vibrant', '.', 'Children', 'be', 'play', 'and', 'laugh', ',', 'create', 'a', 'joyful', 'atmosphere', 'in', 'the', 'park', '.', 'Everyone', 'seem', 'to', 'be', 'enjoy', 'the', 'lovely', 'day', '.']


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
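The cell above passes `pos='v'` for every token, which is why adverbs and adjectives are left alone and "shining" ends up as "shin". One common alternative, sketched here under assumptions, is to tag each token first and pass a matching WordNet code. The helper names `penn_to_wordnet` and `lemmatize_tagged` are hypothetical, and the mapping assumes Penn Treebank style tags such as those returned by `nltk.pos_tag`.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code (hypothetical helper)."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize (word, tag) pairs with any object exposing lemmatize(word, pos=...)."""
    return [
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
        for word, tag in tagged_tokens
    ]


# Quick check of the mapping on four representative tags.
print([penn_to_wordnet(tag) for tag in ["NNS", "VBG", "JJ", "RB"]])
# → ['n', 'v', 'a', 'r']
```

With the objects defined earlier in this notebook, a call would look like `lemmatize_tagged(nltk.pos_tag(word_tokenize(corpus)), lemmatizer)`. Note that `nltk.pos_tag` needs its own tagger resource to be downloaded first, and the resource name can differ between NLTK versions.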
