# NLP Legacy: Hands-on Laboratory

This lab explores how NLP worked before Transformers, using two classic techniques:
1.  **Bag of Words & TF-IDF**: Spam Detection.
2.  **Word Embeddings**: Basic Vector Arithmetic.

In [None]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## Part 1: Spam Detection with TF-IDF

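We will manually create a tiny dataset of six messages, each labeled 1 (spam) or 0 (ham), convert the text to TF-IDF vectors, and train a Naive Bayes classifier on them.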

In [None]:
# 1. Create Data
data = [
    ("Win a free iPhone now!", 1), # Spam
    ("Call me when you get home.", 0), # Ham
    ("Congratulations, you won the lottery.", 1),
    ("Meeting reschedule to 5 PM.", 0),
    ("Urgent! Claim your prize.", 1),
    ("Can we have lunch tomorrow?", 0)
]
df = pd.DataFrame(data, columns=['text', 'label'])

# 2. Convert Text to Numbers (TF-IDF)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['text'])
y = df['label']

print("Vocabulary:", vectorizer.get_feature_names_out())
print("Shape of Vectors:", X.shape)
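
To see what `TfidfVectorizer` does under the hood, the sketch below recomputes TF-IDF by hand and compares it with scikit-learn's output. It assumes the default settings (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row); with other parameters the formula would differ. The three messages are taken from the dataset above.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "Win a free iPhone now!",
    "Urgent! Claim your prize.",
    "Call me when you get home.",
]

# Term frequency: raw counts per document
counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

# Document frequency: number of documents containing each term
n_docs = counts.shape[0]
df_t = (counts > 0).sum(axis=0)

# Smoothed IDF, as used by scikit-learn's default settings
idf = np.log((1 + n_docs) / (1 + df_t)) + 1

# Weight the counts, then L2-normalize each document vector
tfidf_manual = counts * idf
tfidf_manual /= np.linalg.norm(tfidf_manual, axis=1, keepdims=True)

tfidf_sklearn = TfidfVectorizer().fit_transform(docs).toarray()
matches = np.allclose(tfidf_manual, tfidf_sklearn)
print("Manual TF-IDF matches scikit-learn:", matches)
```

Words that occur in fewer documents get a larger IDF and therefore a larger weight, which is why TF-IDF tends to highlight distinctive terms over common ones.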

In [None]:
# 3. Train Naive Bayes Classifier
model = MultinomialNB()
model.fit(X, y)

# 4. Test on two messages the model has not seen
test_msgs = ["Free money waiting for you", "Hey, are you busy?"]
X_test = vectorizer.transform(test_msgs)
predictions = model.predict(X_test)

for msg, label in zip(test_msgs, predictions):
    print(f"Message: '{msg}' Prediction: {'SPAM' if label == 1 else 'HAM'}")