Using the NLTK library for this task

In [55]:
# For NLTK
import nltk
nltk.download('punkt_tab')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')


from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk import pos_tag, ne_chunk

text = "I’m just someone who’s super curious about how tech works and how it can be used to make life easier or more meaningful. I really like exploring stuff like machine learning and AI, and I’ve been diving into things like music genre classification and even making a chatbot that talks like the Bhagavad Gita. It’s kinda crazy how much you can do with deep learning and all these tools like CNNs, transformers and other models I didn’t even knew existed few years ago. I’m also very into security – not just coding it but understanding how systems get attacked and how to protect them smartly. Ethics is a big thing for me too, cause like, what’s the point of building powerful tools if they’re used in wrong way, right? I also been trying to think more like a entrepreneur, trying to build stuff that people would actually use, like maintenance platforms and other ideas I’m still figuring out. I'm not perfect with everything I do, but I try, I learn, and I keep going. That’s kinda who I am."


[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!


In [56]:
import re
import string



In [57]:
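def clean_text(text):
    """Lowercase the text and strip digits and punctuation, keeping word boundaries."""
    text = text.lower()  # Lowercase
    text = re.sub(r'\d+', '', text)  # Remove numbers
    text = text.translate(str.maketrans('', '', string.punctuation))  # Remove ASCII punctuation
    text = re.sub(r'[^\w\s]', '', text)  # Remove remaining symbols (e.g. ’ and –) but keep whitespace
    text = re.sub(r'\s+', ' ', text).strip()  # Collapse repeated whitespace
    return text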
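
A quick way to sanity-check a cleaning function like this is to run it on one sentence. The sketch below is a standalone variant: the name `clean_text_keep_spaces` and the sample sentence are made up for illustration, and it assumes the goal is to keep words separated by single spaces. It also shows what happens when every `\W` character is stripped, since `\W` matches whitespace too.

```python
import re
import string

def clean_text_keep_spaces(text):
    """Hypothetical helper: lowercase, drop digits and punctuation, keep spaces."""
    text = text.lower()
    text = re.sub(r'\d+', '', text)                                    # remove numbers
    text = text.translate(str.maketrans('', '', string.punctuation))  # ASCII punctuation
    text = re.sub(r'[^\w\s]', '', text)                                # leftovers such as ’ and –
    text = re.sub(r'\s+', ' ', text).strip()                           # collapse whitespace
    return text

sample = "I’m also very into security – not just coding it, since 2020!"
cleaned = clean_text_keep_spaces(sample)
print(cleaned)

# For comparison: removing every \W character also removes the spaces,
# so the words run together into a single string.
collapsed = re.sub(r'\W', '', sample.lower())
print(collapsed)
```

The first print keeps the words apart; the second glues them into one long token, which would make tokenization pointless afterwards.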

In [58]:
# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)


# POS Tagging
pos_tags = pos_tag(tokens)
print("POS Tags:", pos_tags)




Tokens: ['I', '’', 'm', 'just', 'someone', 'who', '’', 's', 'super', 'curious', 'about', 'how', 'tech', 'works', 'and', 'how', 'it', 'can', 'be', 'used', 'to', 'make', 'life', 'easier', 'or', 'more', 'meaningful', '.', 'I', 'really', 'like', 'exploring', 'stuff', 'like', 'machine', 'learning', 'and', 'AI', ',', 'and', 'I', '’', 've', 'been', 'diving', 'into', 'things', 'like', 'music', 'genre', 'classification', 'and', 'even', 'making', 'a', 'chatbot', 'that', 'talks', 'like', 'the', 'Bhagavad', 'Gita', '.', 'It', '’', 's', 'kinda', 'crazy', 'how', 'much', 'you', 'can', 'do', 'with', 'deep', 'learning', 'and', 'all', 'these', 'tools', 'like', 'CNNs', ',', 'transformers', 'and', 'other', 'models', 'I', 'didn', '’', 't', 'even', 'knew', 'existed', 'few', 'years', 'ago', '.', 'I', '’', 'm', 'also', 'very', 'into', 'security', '–', 'not', 'just', 'coding', 'it', 'but', 'understanding', 'how', 'systems', 'get', 'attacked', 'and', 'how', 'to', 'protect', 'them', 'smartly', '.', 'Ethics', 'is
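
The tokens above show each contraction split into three pieces (`I`, `’`, `m`), most likely because the text uses the typographic apostrophe `’` (U+2019) instead of the ASCII `'` that NLTK's contraction rules look for. One possible fix, which the notebook itself does not do, is to normalize the quotes before calling `word_tokenize`. `QUOTE_MAP` and `normalize_quotes` are hypothetical names used only for this sketch.

```python
# Map typographic quotes to their ASCII equivalents before tokenizing.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})

def normalize_quotes(text):
    """Return text with curly quotes replaced by plain ASCII quotes."""
    return text.translate(QUOTE_MAP)

sample = "I’m just someone who’s super curious, and I didn’t know."
normalized = normalize_quotes(sample)
print(normalized)
```

Passing the normalized string to `word_tokenize` should then give contraction pieces like `'m` and `n't` instead of a stray `’` token, which also gives the POS tagger something more sensible to work with.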

In [59]:
# Stemming
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
print("Stemming:", stems)

Stemming: ['i', '’', 'm', 'just', 'someon', 'who', '’', 's', 'super', 'curiou', 'about', 'how', 'tech', 'work', 'and', 'how', 'it', 'can', 'be', 'use', 'to', 'make', 'life', 'easier', 'or', 'more', 'meaning', '.', 'i', 'realli', 'like', 'explor', 'stuff', 'like', 'machin', 'learn', 'and', 'ai', ',', 'and', 'i', '’', 've', 'been', 'dive', 'into', 'thing', 'like', 'music', 'genr', 'classif', 'and', 'even', 'make', 'a', 'chatbot', 'that', 'talk', 'like', 'the', 'bhagavad', 'gita', '.', 'it', '’', 's', 'kinda', 'crazi', 'how', 'much', 'you', 'can', 'do', 'with', 'deep', 'learn', 'and', 'all', 'these', 'tool', 'like', 'cnn', ',', 'transform', 'and', 'other', 'model', 'i', 'didn', '’', 't', 'even', 'knew', 'exist', 'few', 'year', 'ago', '.', 'i', '’', 'm', 'also', 'veri', 'into', 'secur', '–', 'not', 'just', 'code', 'it', 'but', 'understand', 'how', 'system', 'get', 'attack', 'and', 'how', 'to', 'protect', 'them', 'smartli', '.', 'ethic', 'is', 'a', 'big', 'thing', 'for', 'me', 'too', ',', '

In [60]:
# Lemmatization
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(token) for token in tokens]
print("Lemmatization:", lemmas)

Lemmatization: ['I', '’', 'm', 'just', 'someone', 'who', '’', 's', 'super', 'curious', 'about', 'how', 'tech', 'work', 'and', 'how', 'it', 'can', 'be', 'used', 'to', 'make', 'life', 'easier', 'or', 'more', 'meaningful', '.', 'I', 'really', 'like', 'exploring', 'stuff', 'like', 'machine', 'learning', 'and', 'AI', ',', 'and', 'I', '’', 've', 'been', 'diving', 'into', 'thing', 'like', 'music', 'genre', 'classification', 'and', 'even', 'making', 'a', 'chatbot', 'that', 'talk', 'like', 'the', 'Bhagavad', 'Gita', '.', 'It', '’', 's', 'kinda', 'crazy', 'how', 'much', 'you', 'can', 'do', 'with', 'deep', 'learning', 'and', 'all', 'these', 'tool', 'like', 'CNNs', ',', 'transformer', 'and', 'other', 'model', 'I', 'didn', '’', 't', 'even', 'knew', 'existed', 'few', 'year', 'ago', '.', 'I', '’', 'm', 'also', 'very', 'into', 'security', '–', 'not', 'just', 'coding', 'it', 'but', 'understanding', 'how', 'system', 'get', 'attacked', 'and', 'how', 'to', 'protect', 'them', 'smartly', '.', 'Ethics', 'is'

In [61]:
# NER
ner_tree = ne_chunk(pos_tags)
print("Named Entities (NLTK):")
print(ner_tree)


Named Entities (NLTK):
(S
  I/PRP
  ’/VBP
  m/RB
  just/RB
  someone/NN
  who/WP
  ’/VBP
  s/JJ
  super/JJ
  curious/JJ
  about/IN
  how/WRB
  tech/JJ
  works/NNS
  and/CC
  how/WRB
  it/PRP
  can/MD
  be/VB
  used/VBN
  to/TO
  make/VB
  life/NN
  easier/JJR
  or/CC
  more/RBR
  meaningful/JJ
  ./.
  I/PRP
  really/RB
  like/IN
  exploring/VBG
  stuff/NN
  like/IN
  machine/NN
  learning/NN
  and/CC
  (ORGANIZATION AI/NNP)
  ,/,
  and/CC
  I/PRP
  ’/VBP
  ve/RB
  been/VBN
  diving/VBG
  into/IN
  things/NNS
  like/IN
  music/NN
  genre/NN
  classification/NN
  and/CC
  even/RB
  making/VBG
  a/DT
  chatbot/NN
  that/WDT
  talks/NNS
  like/IN
  the/DT
  (ORGANIZATION Bhagavad/NNP Gita/NNP)
  ./.
  It/PRP
  ’/VBD
  s/JJ
  kinda/NN
  crazy/VB
  how/WRB
  much/JJ
  you/PRP
  can/MD
  do/VB
  with/IN
  deep/JJ
  learning/NN
  and/CC
  all/PDT
  these/DT
  tools/NNS
  like/IN
  (ORGANIZATION CNNs/NNP)
  ,/,
  transformers/NNS
  and/CC
  other/JJ
  models/NNS
  I/PRP
  didn/VBP
  ’/JJ
  t/NN
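
Reading entities off the printed tree by eye gets tedious, so here is a small sketch that pulls the labelled chunks out as `(text, label)` pairs. It runs on a hand-built fragment that copies a few leaves from the tree above (an assumption made so the snippet needs no downloaded models). `extract_entities` is a hypothetical helper name.

```python
from nltk import Tree

# A hand-built fragment shaped like the ne_chunk output above.
ner_fragment = Tree("S", [
    ("machine", "NN"),
    ("learning", "NN"),
    ("and", "CC"),
    Tree("ORGANIZATION", [("AI", "NNP")]),
    ("the", "DT"),
    Tree("ORGANIZATION", [("Bhagavad", "NNP"), ("Gita", "NNP")]),
])

def extract_entities(tree):
    """Return (entity text, label) pairs for every labelled chunk under the root."""
    entities = []
    for child in tree:
        if isinstance(child, Tree):  # plain (word, tag) tuples are not entities
            text = " ".join(word for word, _tag in child.leaves())
            entities.append((text, child.label()))
    return entities

entities = extract_entities(ner_fragment)
print(entities)
```

Calling `extract_entities(ner_tree)` in the notebook should work the same way. Note that the labels are worth double-checking: in the output above, `AI`, `Bhagavad Gita` and `CNNs` are all tagged ORGANIZATION, which none of them really is.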

Compare lemmatization and stemming with at least 10 examples and explain the
differences.

In [62]:

words = ["running", "flies", "better", "cities", "caring", "played", "mice", "feet", "geese", "wolves"]
print(f"{'Word':<10}{'Stem':<15}{'Lemma':<15}")
print("-" * 40)
for word in words:
    stem = stemmer.stem(word)
    lemma = lemmatizer.lemmatize(word)
    print(f"{word:<10}{stem:<15}{lemma:<15}")

Word      Stem           Lemma          
----------------------------------------
running   run            running        
flies     fli            fly            
better    better         better         
cities    citi           city           
caring    care           caring         
played    play           played         
mice      mice           mouse          
feet      feet           foot           
geese     gees           goose          
wolves    wolv           wolf           


**Let’s talk about the differences**

1. Stemming is like cutting corners
Stemming is more of a quick and dirty method. It just rewrites or chops off word endings by rule, without caring if the result is an actual word. Like with flies: the Porter stemmer turns the -ies ending into -i, so it becomes fli, which doesn’t really make sense on its own.

2. Lemmatization cares more about the meaning
Lemmatization, on the other hand, tries to find the dictionary form of the word (its lemma). So flies becomes fly, which is a proper word. Same with mice: it turns into mouse, which is the correct singular form.

3. Lemmatization needs more info
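Unlike stemming, lemmatization often needs the part of speech to do its job right. NLTK's WordNetLemmatizer treats every word as a noun unless told otherwise, which is why running, caring and played come back unchanged in the table. For example, better could be an adjective or a verb. As a noun it stays the same, but if you tell the lemmatizer it's an adjective (pos='a'), it becomes good.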

4. Stemming is faster but less accurate
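Because stemming is simple and rule-based, it's faster, but sometimes it produces words that aren't real (like citi, gees, wolv), and it misses irregular forms completely (mice and feet stay as they are). Lemmatization is more reliable but a bit slower, since it has to look words up in WordNet.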
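
To make the contrast concrete, here is a toy sketch of the two ideas: a few suffix rules on one side and a dictionary lookup on the other. This is not the Porter algorithm or WordNet. The rules in `SUFFIX_RULES` and the entries in `LEMMA_TABLE` are tiny stand-ins invented for illustration, picked so the results line up with a few rows of the table above.

```python
# Toy suffix rules, tried in order (NOT the real Porter algorithm).
SUFFIX_RULES = [("ies", "i"), ("ves", "v"), ("ing", ""), ("ed", ""), ("s", "")]

# Tiny stand-in for a dictionary such as WordNet.
LEMMA_TABLE = {
    "flies": "fly", "cities": "city", "mice": "mouse",
    "feet": "foot", "wolves": "wolf",
}

def toy_stem(word):
    """Apply the first matching suffix rule; never check if the result is a word."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word

def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMA_TABLE.get(word, word)

for w in ["flies", "cities", "wolves", "mice", "feet"]:
    print(f"{w:<10}{toy_stem(w):<10}{toy_lemmatize(w):<10}")
```

The rule-based side needs no data and handles any word it is given, but it has no way to know that mice and mouse are related. The lookup side gets irregular forms right, but only for words that are in its table.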
