A) Non-Negative Matrix Factorization (NMF) for Topic Modeling

📘 Theory (Short):

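NMF extracts topics from a collection of documents.

It decomposes a non-negative document-term matrix X into two non-negative matrices so that X ≈ W × H:

W (document-topic matrix: the weight of each topic in each document)

H (topic-term matrix: the weight of each word in each topic)

Useful for uncovering hidden themes in text.

Reconstruction Error measures how closely W × H approximates X. With scikit-learn's default settings it is the Frobenius norm of X − W × H, so a lower value means a better fit.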
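To make the factorization concrete, here is a minimal NumPy sketch of NMF using multiplicative update rules. It is an illustration only, not how scikit-learn's `NMF` is implemented. The function name `nmf_multiplicative`, its parameters, and the toy matrix are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(X, n_topics, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (docs x terms) into
    W (docs x topics) and H (topics x terms) so that X is close to W @ H."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, n_topics))
    H = rng.random((n_topics, n_terms))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates only rescale entries by non-negative
        # factors, so W and H stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Reconstruction error: Frobenius norm of X - W @ H
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Toy document-term matrix (made-up counts): 4 documents, 3 terms
X_toy = np.array([
    [3.0, 0.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.0, 4.0, 1.0],
    [0.0, 3.0, 0.0],
])

W_toy, H_toy, errors = nmf_multiplicative(X_toy, n_topics=2)
print("W shape:", W_toy.shape)  # one row per document, one column per topic
print("H shape:", H_toy.shape)  # one row per topic, one column per term
print("Error after first and last iteration:", errors[0], errors[-1])
```

The error should shrink as the iterations proceed, which is the same quantity the Reconstruction Error above summarizes for a fitted model.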

In [2]:
import nltk  # used by the WordNet cell in part B
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample documents
docs = [
    "I love programming in Python. Python is great for data science.",
    "The stock market crashed due to economic slowdown.",
    "Machine learning and AI are transforming technology.",
    "Investors are cautious about the global economy.",
    "Python libraries like Numpy and Pandas are powerful."
]

# TF-IDF vectorization
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)

# Apply NMF
nmf = NMF(n_components=2, random_state=42)
W = nmf.fit_transform(X)
H = nmf.components_

# Show the top 5 words per topic
# (argsort is ascending, so the highest-weighted word is printed last)
words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(H):
    top_words = [words[i] for i in topic.argsort()[-5:]]
    print(f"Topic {topic_idx+1}: {', '.join(top_words)}")

# Reconstruction error
print("Reconstruction Error:", nmf.reconstruction_err_)


Topic 1: numpy, like, powerful, pandas, python
Topic 2: market, economic, crashed, slowdown, stock
Reconstruction Error: 1.6737376756861646
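Since W holds the topic weights for each document, one simple way to read it is to take the largest weight in each row as that document's dominant topic. The sketch below uses made-up weights in `W_example` purely for illustration. They are not the values produced by the run above.

```python
import numpy as np

# Hypothetical document-topic weights (rows: documents, columns: topics).
# These numbers are invented for illustration only.
W_example = np.array([
    [0.9, 0.0],
    [0.0, 0.8],
    [0.1, 0.2],
    [0.0, 0.5],
    [0.7, 0.0],
])

# Index of the largest weight in each row = dominant topic of that document
dominant_topic = W_example.argmax(axis=1)
for doc_idx, topic_idx in enumerate(dominant_topic):
    print(f"Document {doc_idx + 1}: Topic {topic_idx + 1}")
```

The same `argmax(axis=1)` call can be applied to the `W` returned by `nmf.fit_transform(X)`.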


B) WordNet for Word Sense Disambiguation (WSD)

📘 Theory (Short):

WordNet is a large lexical database of English that groups words into sets of synonyms (synsets), each with a short definition.

WSD (Word Sense Disambiguation) selects the correct meaning of a word in context.

Example: "bank" → river bank or financial bank?

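Use the Lesk algorithm from nltk, which picks the sense whose dictionary definition shares the most words with the surrounding context.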
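The overlap idea behind Lesk can be sketched in a few lines of plain Python. This is a simplified illustration, not NLTK's implementation. The function `simplified_lesk` and the two glosses in `toy_glosses` are assumptions written for this example and are not taken from WordNet.

```python
def simplified_lesk(context_tokens, glosses):
    """Return the sense label whose gloss shares the most words with the context."""
    context = {token.lower() for token in context_tokens}
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        # Count distinct words that appear in both the context and the gloss
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical, hand-written glosses for two senses of "bank"
toy_glosses = {
    "bank (financial)": "an institution where people deposit and borrow money",
    "bank (river)": "sloping land beside a body of water such as a river",
}

toy_sentence = "He went to the bank to deposit money"
chosen = simplified_lesk(toy_sentence.split(), toy_glosses)
print(chosen)  # bank (financial): shares "deposit" and "money" with the context
```

Because the method only counts shared words, its answer depends heavily on how each definition happens to be worded.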

In [3]:
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet as wn

nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')

# Sentence with ambiguous word "bank"
sentence = "He went to the bank to deposit money"
word = "bank"

# Disambiguation using Lesk
context = word_tokenize(sentence)
sense = lesk(context, word)

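# lesk returns None when WordNet has no synsets for the word
if sense is None:
    print("No sense found for:", word)
else:
    print("Best Sense:", sense)
    print("Definition:", sense.definition())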


[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Gauri\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\Gauri\AppData\Roaming\nltk_data...
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Gauri\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Best Sense: Synset('savings_bank.n.02')
Definition: a container (usually with a slot in the top) for keeping money at home
