## 📘 **Lesson Plan: Text Classification Using Email Spam Dataset**

---

### 🔹 **1. Problem Statement**

Classify email messages as **Spam** or **Not Spam (Ham)** using machine learning and deep learning techniques.
We will use:

* **CountVectorizer** and **TF-IDF** for feature extraction
* ML models: **Naive Bayes**, **Logistic Regression**, **SVM**
* **ANN** using Keras

---

### 📦 **Dataset Used**

* Email spam dataset with two columns:

  * `label`: 'spam' or 'ham'
  * `text`: email content

> Example dataset: [SMS Spam Collection Dataset (Kaggle)](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset), which contains short messages labeled `ham` or `spam`.

---

## ✅ **Steps for All Approaches**

### **Step 1: Load & Preprocess Data**


```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Keep only the label column (v1) and the text column (v2), then rename them
df = pd.read_csv("class_11_spam.csv", encoding='latin-1')[['v1', 'v2']]
df.columns = ['label', 'text']

# Encode label
le = LabelEncoder()
df['label'] = le.fit_transform(df['label'])  # ham = 0, spam = 1

# Split data
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)
```

## 🔹 **2. CountVectorizer + ML Models**

### Vectorization

```python
from sklearn.feature_extraction.text import CountVectorizer

# Fit the vocabulary on the training split only, then reuse it for the test split
vectorizer = CountVectorizer()
X_train_cv = vectorizer.fit_transform(X_train)
X_test_cv = vectorizer.transform(X_test)
```

### Classification

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

model = MultinomialNB()
model.fit(X_train_cv, y_train)
y_pred = model.predict(X_test_cv)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```
Accuracy: 0.9838565022421525
              precision    recall  f1-score   support

           0       0.98      1.00      0.99       965
           1       0.99      0.89      0.94       150

    accuracy                           0.98      1115
   macro avg       0.98      0.95      0.96      1115
weighted avg       0.98      0.98      0.98      1115
```


> 📌 **Try Logistic Regression and SVM as well.**
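
One way to try both models is sketched below. The six training sentences are hypothetical stand-ins so that the snippet runs on its own; in the lesson you would pass `X_train_cv`, `X_test_cv`, `y_train`, and `y_test` instead. `LinearSVC` is used here as one possible SVM variant, which is an assumption rather than a choice made by the lesson.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Tiny illustrative data (hypothetical, not from the dataset)
train_texts = [
    "win a free prize now", "free cash offer click now",
    "claim your free reward", "are we meeting for lunch",
    "see you at the office tomorrow", "please send the report",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham
test_texts = ["free prize offer", "lunch at the office"]
test_labels = [1, 0]

vec = CountVectorizer()
X_tr = vec.fit_transform(train_texts)
X_te = vec.transform(test_texts)

# Fit each classifier on the same count features and record its accuracy
results = {}
for name, clf in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Linear SVM", LinearSVC()),
]:
    clf.fit(X_tr, train_labels)
    results[name] = accuracy_score(test_labels, clf.predict(X_te))
    print(f"{name}: accuracy = {results[name]:.2f}")
```

Both estimators accept the sparse matrix produced by `CountVectorizer` directly, so no `.toarray()` call is needed here.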


## 🔹 **3. TF-IDF + ML Models**

### TF-IDF Vectorization


```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```

### Classification (same as above)

```python
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
y_pred = model.predict(X_test_tfidf)

print("Accuracy:", accuracy_score(y_test, y_pred))
```

```
Accuracy: 0.9623318385650225
```

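
Because the vectorizer and classifier are always used together, they can also be wrapped in a single scikit-learn `Pipeline`. The sketch below uses hypothetical toy sentences so it runs standalone; the step names `"tfidf"` and `"nb"` are arbitrary labels chosen for this example.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative data (hypothetical); replace with X_train / y_train in the lesson
train_texts = [
    "win a free prize now", "free cash offer click now",
    "claim your free reward", "are we meeting for lunch",
    "see you at the office tomorrow", "please send the report",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

# The pipeline fits the vectorizer and the classifier in one call
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
pipe.fit(train_texts, train_labels)

# Raw strings go in; the pipeline applies tfidf.transform before predicting
pred = pipe.predict(["free prize waiting for you", "send me the report"])
print(pred)
```

A pipeline keeps the fitted vocabulary and the model in one object, which reduces the risk of transforming new text with a different vectorizer than the one used for training.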


## 🔹 **4. Text Classification using ANN (Keras)**

### Preprocess using TF-IDF and Convert to Array


```python
import numpy as np

# Convert the sparse TF-IDF matrices to dense NumPy arrays for Keras
X_train_arr = X_train_tfidf.toarray()
X_test_arr = X_test_tfidf.toarray()
```

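
Calling `.toarray()` stores every zero explicitly, so memory grows with the number of documents times the vocabulary size. The sketch below compares the two representations on a hypothetical four-sentence corpus and shows `max_features` as one option for capping the number of columns; the cap of 5 is an arbitrary value for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini corpus, for illustration only
docs = [
    "win a free prize now",
    "are we meeting for lunch",
    "please send the project report",
    "claim your free reward today",
]

X_sparse = TfidfVectorizer().fit_transform(docs)
X_dense = X_sparse.toarray()

# A CSR matrix stores only non-zero values plus two index arrays
sparse_bytes = X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
print("shape:", X_dense.shape)
print("sparse bytes:", sparse_bytes, "| dense bytes:", X_dense.nbytes)

# max_features keeps only the most frequent terms, which bounds the dense width
capped = TfidfVectorizer(max_features=5).fit_transform(docs)
print("capped shape:", capped.shape)
```

On a corpus this small the byte counts are close, but the gap widens quickly as the vocabulary grows, since most entries in each row are zero.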
### Define ANN Model

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

model = Sequential()
# Declare the input shape with an Input layer; passing input_dim to Dense triggers a Keras warning
model.add(Input(shape=(X_train_arr.shape[1],)))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Note: the test split is reused as validation data here
model.fit(X_train_arr, y_train, epochs=5, batch_size=64, validation_data=(X_test_arr, y_test))
```

```
Epoch 1/5
70/70 ━━━━━━━━━━━━━━━━━━━━ 6s 39ms/step - accuracy: 0.8342 - loss: 0.5842 - val_accuracy: 0.8987 - val_loss: 0.2784
Epoch 2/5
70/70 ━━━━━━━━━━━━━━━━━━━━ 2s 26ms/step - accuracy: 0.9269 - loss: 0.2271 - val_accuracy: 0.9722 - val_loss: 0.1204
Epoch 3/5
70/70 ━━━━━━━━━━━━━━━━━━━━ 2s 29ms/step - accuracy: 0.9823 - loss: 0.0837 - val_accuracy: 0.9794 - val_loss: 0.0793
Epoch 4/5
70/70 ━━━━━━━━━━━━━━━━━━━━ 3s 32ms/step - accuracy: 0.9938 - loss: 0.0448 - val_accuracy: 0.9785 - val_loss: 0.0677
Epoch 5/5
70/70 ━━━━━━━━━━━━━━━━━━━━ 2s 32ms/step - accuracy: 0.9969 - loss: 0.0247 - val_accuracy: 0.9803 - val_loss: 0.0629
```

## Step 2: Predict with Sample Input

```python
# Sample email text
sample_email = ["Congratulations! You've won a free iPhone. Click the link to claim now!"]

# Vectorize using the same TF-IDF vectorizer
sample_tfidf = tfidf.transform(sample_email).toarray()

# Predict
prediction = model.predict(sample_tfidf)

# Interpret result
label = "Spam" if prediction[0][0] >= 0.5 else "Ham"
print(f"Prediction: {label} ({prediction[0][0]:.4f})")
```

```
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 180ms/step
Prediction: Spam (0.9854)
```


```python
# Try with Multiple Samples
sample_emails = [
    "Congratulations! You have won a lottery worth $1,000,000.",
    "Hey, are we still meeting for lunch today?",
    "Limited-time offer just for you! Get 90% off on your next purchase.",
    "Please find the attached report for the project."
]

sample_tfidf = tfidf.transform(sample_emails).toarray()
predictions = model.predict(sample_tfidf)

for i, pred in enumerate(predictions):
    label = "Spam" if pred[0] >= 0.5 else "Ham"
    print(f"Email: {sample_emails[i]}\nPrediction: {label} ({pred[0]:.4f})\n")
```

```
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 212ms/step
Email: Congratulations! You have won a lottery worth $1,000,000.
Prediction: Spam (0.9313)

Email: Hey, are we still meeting for lunch today?
Prediction: Ham (0.0016)

Email: Limited-time offer just for you! Get 90% off on your next purchase.
Prediction: Ham (0.1808)

Email: Please find the attached report for the project.
Prediction: Ham (0.1389)
```
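
The number in parentheses is the sigmoid output, that is, the model's estimated probability of spam. The helper below is a small sketch of how that score maps to a label and a confidence value; the function name `interpret_probability` and its `threshold` parameter are hypothetical names introduced for this example.

```python
from typing import Tuple

def interpret_probability(p_spam: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map a spam probability to (label, confidence in that label)."""
    if p_spam >= threshold:
        return "Spam", p_spam
    # For Ham, confidence is the complement of the spam probability
    return "Ham", 1.0 - p_spam

# Scores taken from the four sample predictions above
for p in (0.9313, 0.0016, 0.1808, 0.1389):
    label, conf = interpret_probability(p)
    print(f"{p:.4f} -> {label} ({conf:.2%} confident)")

# A lower threshold flags more messages as spam, at the cost of more false alarms
print(interpret_probability(0.1808, threshold=0.15))
```

With the default threshold of 0.5, the "Limited-time offer" message scores 0.1808 and is labeled Ham. Lowering the threshold is one way to trade precision for recall on messages like that one.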



### ✅ How to Save and Prepare the Model

```python
# Save model
model.save("ann_spam_model.h5")

# Save TF-IDF vectorizer
import pickle
with open("tfidf_vectorizer.pkl", "wb") as f:
    pickle.dump(tfidf, f)
```
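
Before relying on the saved vectorizer in an app, it can be worth checking that it survives a save and load cycle. The sketch below does this with a small stand-in vectorizer (`tfidf_demo`, a hypothetical name) fitted on two made-up sentences and written to a temporary directory, so it does not touch the real `tfidf_vectorizer.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in vectorizer fitted on made-up text (the lesson's `tfidf` would be used instead)
tfidf_demo = TfidfVectorizer().fit(["free prize now", "lunch at noon"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tfidf_vectorizer.pkl")
    with open(path, "wb") as f:
        pickle.dump(tfidf_demo, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored vectorizer should produce identical features for the same input
sample = ["free lunch"]
same = np.allclose(
    tfidf_demo.transform(sample).toarray(),
    restored.transform(sample).toarray(),
)
print("Round trip preserved features:", same)
```

Pickle files should only be loaded from sources you trust, and they are generally expected to be loaded with the same scikit-learn version that wrote them.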


### ✅ Streamlit App: spam_detector_app.py

Save the code below as `spam_detector_app.py`, then start the app with:

```bash
streamlit run spam_detector_app.py
```

```python
import streamlit as st
import pickle
from tensorflow.keras.models import load_model

# Load trained model
model = load_model("ann_spam_model.h5")

# Load saved TF-IDF vectorizer
with open("tfidf_vectorizer.pkl", "rb") as f:
    tfidf = pickle.load(f)

# Streamlit UI
st.title("📧 Email Spam Classifier")
st.write("Enter an email below to check if it's **Spam** or **Ham**")

email_text = st.text_area("✉️ Email Content")

if st.button("Predict"):
    if email_text.strip() == "":
        st.warning("Please enter some email text.")
    else:
        # Transform text
        email_vector = tfidf.transform([email_text]).toarray()

        # Predict
        prediction = model.predict(email_vector)
        label = "🛑 Spam" if prediction[0][0] >= 0.5 else "✅ Ham"
        # Confidence is the probability of whichever class was chosen
        confidence = prediction[0][0] if label == "🛑 Spam" else 1 - prediction[0][0]

        st.subheader("📊 Prediction Result")
        st.write(f"**Prediction:** {label}")
        st.write(f"**Confidence:** {confidence:.2%}")
```

## Flask API: app.py

```python
from flask import Flask, request, jsonify
import pickle
from tensorflow.keras.models import load_model

app = Flask(__name__)

# Load model and vectorizer once at startup
model = load_model('ann_spam_model.h5')

with open('tfidf_vectorizer.pkl', 'rb') as f:
    tfidf = pickle.load(f)

@app.route('/', methods=['GET'])
def index():
    return jsonify({'message': 'Email Spam Detection API is running!'})

@app.route('/predict', methods=['POST'])
def predict():
    # silent=True returns None instead of raising when the body is not valid JSON
    data = request.get_json(silent=True) or {}

    if 'email' not in data:
        return jsonify({'error': 'No email content provided'}), 400

    email_text = data['email']

    # Preprocess and predict
    email_vector = tfidf.transform([email_text]).toarray()
    prediction = model.predict(email_vector)

    label = 'Spam' if prediction[0][0] >= 0.5 else 'Ham'
    # Cast to float so the NumPy value is JSON serializable
    confidence = float(prediction[0][0]) if label == "Spam" else float(1 - prediction[0][0])

    return jsonify({
        'prediction': label,
        'confidence': round(confidence, 4)
    })

if __name__ == '__main__':
    app.run(debug=True)
```


## ✅ **3. Run the Flask API**

```bash
pip install flask tensorflow scikit-learn
python app.py
```


## ✅ **4. Test the API (Using curl or Postman)**

### Example `POST` request using **curl**:

```bash
curl -X POST http://127.0.0.1:5000/predict \
     -H "Content-Type: application/json" \
     -d '{"email":"Congratulations! You have won a free ticket. Click here!"}'
```
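
The same request can be built from Python using only the standard library. This is a sketch: the helper name `build_predict_request` is hypothetical, and the base URL assumes the Flask development server is running locally on its default port 5000.

```python
import json
import urllib.request

def build_predict_request(base_url: str, email_text: str) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    payload = json.dumps({"email": email_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request(
    "http://127.0.0.1:5000",
    "Congratulations! You have won a free ticket. Click here!",
)
print(req.get_method(), req.full_url)

# Sending the request requires the Flask server to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The send step is left commented out so the snippet can be run without a live server.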

### Sample Response:

```json
{
  "prediction": "Spam",
  "confidence": 0.9813
}
```

The next sections deploy the **Flask API** for email spam detection to a free cloud platform.

---

## ✅ Recommended Hosting Platforms

| Platform                | Free Tier | Supports Flask            | Custom Domain        | Setup Effort |
| ----------------------- | --------- | ------------------------- | -------------------- | ------------ |
| **Render**              | ✅ Yes     | ✅ Yes                     | ✅                    | ⭐⭐ Easy      |
| **Railway**             | ✅ Yes     | ✅ Yes                     | ✅                    | ⭐⭐ Easy      |
| **Hugging Face Spaces** | ✅ Yes     | ✅ (via Gradio or FastAPI) | ❌ (no custom domain) | ⭐ Easy       |

---

## ✅ Let’s Use **Render** (Free & Flask-Friendly)



---

### 🔹 **Step-by-Step Deployment on Render**

---

### **1. Prepare Folder Structure**

```
email-spam-api/
├── app.py
├── requirements.txt
├── ann_spam_model.h5
└── tfidf_vectorizer.pkl
```

---

### **2. `requirements.txt`**

```txt
Flask
tensorflow
scikit-learn
numpy
```

---

### **3. Create a GitHub Repo**

1. Create a GitHub repository: e.g., `email-spam-api`
2. Push your files (`app.py`, `requirements.txt`, `.h5`, `.pkl`) to GitHub

---

### **4. Deploy on Render**

1. Visit [https://render.com](https://render.com)
2. Sign in → Click **“New +”** → **“Web Service”**
3. Connect your GitHub → Select your repo
4. Fill out:

   * **Name**: `email-spam-api`
   * **Runtime**: Python
   * **Build Command**: `pip install -r requirements.txt`
   * **Start Command**: `python app.py`
5. Click **Deploy**

---

### 🔁 Modify `app.py` for Production on Render

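Replace the final `if __name__ == '__main__':` block so the app listens on all interfaces and reads the port from the `PORT` environment variable when the platform sets one:

```python
import os

if __name__ == '__main__':
    # Fall back to 8080 when PORT is not set (for example, when running locally)
    port = int(os.environ.get("PORT", 8080))
    app.run(host='0.0.0.0', port=port)
```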

---

### ✅ After Deployment

* You’ll get a public URL like:
  `https://email-spam-api.onrender.com/predict`

---

### 📦 Example `POST` Request

```bash
curl -X POST https://email-spam-api.onrender.com/predict \
     -H "Content-Type: application/json" \
     -d '{"email":"You won a lottery! Click to claim"}'
```


---

## 🚀 **Deploy Flask API to Railway**

---

### ✅ **Step 1: Prepare Your Project Folder**

Folder name: `email-spam-api`

```
email-spam-api/
├── app.py
├── requirements.txt
├── ann_spam_model.h5
└── tfidf_vectorizer.pkl
```

---

### ✅ **Step 2: Modify `app.py` for Production**

Update the `app.py` bottom section for Railway:

```python
import os

if __name__ == '__main__':
    port = int(os.environ.get("PORT", 5000))
    app.run(host='0.0.0.0', port=port)
```

---

### ✅ **Step 3: Create `requirements.txt`**

```txt
Flask
tensorflow
scikit-learn
numpy
```

> Add `gunicorn` if you want a production WSGI server. The Flask development server is usually enough for a small demo, but it is not intended for production use.

---

### ✅ **Step 4: Push Code to GitHub**

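If the project is not yet a Git repository (the `gh` command requires the GitHub CLI):

```bash
git init
git add .
git commit -m "Initial commit"
# Make sure the branch is named main before pushing
git branch -M main
gh repo create email-spam-api --public --source=. --remote=origin
git push -u origin main
```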

Make sure your repo contains all 4 files: `app.py`, `requirements.txt`, `.h5`, `.pkl`.

---

### ✅ **Step 5: Deploy to Railway**

1. Go to [https://railway.app](https://railway.app)
2. Login with GitHub
3. Click **“New Project” → “Deploy from GitHub Repo”**
4. Select your `email-spam-api` repo
5. Railway will auto-detect and install dependencies
6. Once deployed, you’ll get a public URL like:

```
https://email-spam-api.up.railway.app
```

✅ Try `POST /predict`:

```bash
curl -X POST https://email-spam-api.up.railway.app/predict \
     -H "Content-Type: application/json" \
     -d '{"email":"You have won a free iPhone. Click here to claim."}'
```

