# Week 1: ML Spam Classifier - Full Tutorial

### 🎯 Learning Objectives:
- Understand basic text classification
- Learn how Naive Bayes works for spam detection
- Practice data preprocessing and model evaluation

### 🔍 What is a Spam Classifier?
A spam classifier is a machine learning model trained to distinguish spam from non-spam (ham) messages. It learns word patterns from previously labeled messages and uses them to predict the label of new, unseen messages.
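
To make "learning patterns from labeled messages" concrete, here is a minimal sketch of the idea behind multinomial Naive Bayes in plain Python. The four `messages`, the whitespace tokenizer, and the function names are assumptions made up for illustration; this is not how scikit-learn implements it internally.

```python
import math
from collections import Counter

# Hypothetical labeled messages (1 = spam, 0 = ham), for illustration only.
messages = [
    ("win money now", 1),
    ("win a prize now", 1),
    ("how are you", 0),
    ("can we meet now", 0),
]

def train(messages):
    """Count words per class and messages per class."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def log_score(text, label, word_counts, class_counts, vocab):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_docs = sum(class_counts.values())
    score = math.log(class_counts[label] / total_docs)
    total_words = sum(word_counts[label].values())
    for word in text.split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
        score += math.log(smoothed)
    return score

def predict(text, model):
    """Return the label with the higher log score."""
    return max((0, 1), key=lambda label: log_score(text, label, *model))

toy_model = train(messages)
print(predict("win money", toy_model))    # words seen mostly in spam
print(predict("how are you", toy_model))  # words seen only in ham
```

The classifier simply asks which class makes the observed words more likely. Add-one smoothing keeps a word that appears in only one class from driving the other class's probability to zero.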

In [None]:
# ✅ Step 1: Install dependencies (if running locally or in a new Colab session)
!pip install scikit-learn pandas matplotlib

In [None]:
# ✅ Step 2: Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

### 📥 Step 3: Create a Sample Dataset
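This toy dataset of six hand-written messages simulates SMS traffic. It is only large enough to demonstrate the workflow, not to produce meaningful metrics.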

In [None]:
data = pd.DataFrame({
    'text': [
        'Win money now!!!',
        'Hi friend, how are you?',
        'Earn $1000 per day!!!',
        'Can we meet for coffee?',
        'You have won a prize!',
        'Let’s catch up tomorrow'
    ],
    'label': [1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = not spam
})
data

In [None]:
# ✅ Step 4: Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['label'], test_size=0.3, random_state=42,
    stratify=data['label'])  # keep both classes in each split

In [None]:
# ✅ Step 5: Convert Text to Features
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

In [None]:
# ✅ Step 6: Train a Naive Bayes Classifier
model = MultinomialNB()
model.fit(X_train_vec, y_train)
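
Once a model is trained, it can label messages it has never seen. The sketch below is self-contained and uses its own `demo_`-prefixed names so it does not overwrite the tutorial's `model`. The two new messages are invented for illustration, and the exact probabilities depend on the training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data mirroring the tutorial's dataset (1 = spam, 0 = ham).
demo_texts = [
    "Win money now!!!",
    "Hi friend, how are you?",
    "Earn $1000 per day!!!",
    "Can we meet for coffee?",
    "You have won a prize!",
    "Let's catch up tomorrow",
]
demo_labels = [1, 0, 1, 0, 1, 0]

demo_vectorizer = CountVectorizer()
demo_model = MultinomialNB()
demo_model.fit(demo_vectorizer.fit_transform(demo_texts), demo_labels)

# Hypothetical unseen messages.
new_messages = ["Win a prize now", "Coffee tomorrow?"]
new_vec = demo_vectorizer.transform(new_messages)

demo_preds = demo_model.predict(new_vec)        # hard labels
demo_probs = demo_model.predict_proba(new_vec)  # columns follow demo_model.classes_

for msg, label, prob in zip(new_messages, demo_preds, demo_probs):
    print(f"{msg!r}: label={label}, P(ham)={prob[0]:.2f}, P(spam)={prob[1]:.2f}")
```

`predict_proba` is useful when you want a confidence score and not only a hard label, for example to send only high-confidence spam to a junk folder.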

In [None]:
# ✅ Step 7: Evaluate the Model
y_pred = model.predict(X_test_vec)
print(classification_report(y_test, y_pred, labels=[0, 1],
                            target_names=['ham', 'spam'], zero_division=0))
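
The report's columns can be checked by hand. The sketch below computes precision, recall, and F1 for the spam class from a pair of made-up label lists. `demo_true` and `demo_pred` are assumptions for illustration, not results from this tutorial's model.

```python
# Hypothetical true labels and predictions (1 = spam, 0 = ham).
demo_true = [1, 0, 1, 1, 0, 0, 1, 0]
demo_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(demo_true, demo_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam correctly flagged
fp = sum(t == 0 and p == 1 for t, p in pairs)  # ham wrongly flagged as spam
fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam that slipped through

precision = tp / (tp + fp)  # of everything flagged as spam, how much was spam
recall = tp / (tp + fn)     # of all real spam, how much was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For a spam filter, low precision means legitimate messages land in the junk folder, while low recall means spam reaches the inbox.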

### 🧪 Exercise: Add Your Own Examples
Try adding more spam or ham messages to the dataset, then rerun Steps 4 through 7 to retrain and re-evaluate the model.
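
One possible way to approach the exercise is sketched below: append rows with `pd.concat`, then retrain with a scikit-learn `Pipeline` so that vectorizing and fitting happen in one call. The extra messages and the `demo_`-prefixed names are assumptions for illustration; substitute your own examples.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A small stand-in for the tutorial's dataset (1 = spam, 0 = ham).
demo_data = pd.DataFrame({
    "text": ["Win money now!!!", "Hi friend, how are you?"],
    "label": [1, 0],
})

# Hypothetical new examples to add.
demo_extra = pd.DataFrame({
    "text": ["Claim your free prize today", "Are we still on for lunch?"],
    "label": [1, 0],
})

# ignore_index=True renumbers the rows 0..n-1 after appending.
demo_data = pd.concat([demo_data, demo_extra], ignore_index=True)

# The pipeline vectorizes the text and fits the classifier in one step.
demo_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
demo_pipeline.fit(demo_data["text"], demo_data["label"])

print(demo_pipeline.predict(["free prize money"]))
```

Because the pipeline accepts raw strings, `predict` can be called directly on new text without a separate `transform` step.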

### ✅ Summary:
- You built a basic spam classifier using Naive Bayes.
- You learned how to convert text to features using `CountVectorizer`.
- You evaluated your model using precision, recall, and F1-score.