# Email Spam Detection using Machine Learning

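This notebook walks through an end-to-end **email spam detection** workflow: loading the dataset, encoding the labels, loading a trained model, evaluating its predictions, and classifying a sample email.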


## 1. Import Required Libraries

Libraries for data processing, evaluation, and visualization are imported below.


In [None]:
import pickle

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


## 2. Load Dataset

Load the email spam dataset from `../data/spam.csv` and preview the first rows.


In [None]:
df = pd.read_csv("../data/spam.csv")
df.head()


## 3. Inspect Dataset

Inspect column names, data types, and label distribution.


In [None]:
print(df.columns)
df.info()  # info() prints its summary directly and returns None, so it is not wrapped in print()
print(df['Category'].value_counts())
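
Raw counts can hide how imbalanced the classes are. The sketch below shows the same inspection as proportions, plus quick checks for missing values and duplicate rows. It uses a hypothetical four-row `toy` DataFrame with the same column names as the real dataset, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (same column names, made-up rows).
toy = pd.DataFrame({
    "Category": ["ham", "ham", "ham", "spam"],
    "Message": ["See you at 5", "Lunch?", "Call me later", "WIN a free prize now"],
})

# normalize=True returns each class as a fraction of all rows.
class_share = toy["Category"].value_counts(normalize=True)
print(class_share)

# Missing values and exact duplicate rows are worth knowing about before evaluation.
n_missing = int(toy.isna().sum().sum())
n_duplicates = int(toy.duplicated().sum())
print("missing:", n_missing, "duplicates:", n_duplicates)
```

On the real data, replacing `toy` with `df` should give the same kind of summary.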


## 4. Preprocessing

Rename the columns to `label` and `text`, then encode the labels as integers: `ham` becomes 0 and `spam` becomes 1.


In [None]:
df = df.rename(columns={'Category': 'label', 'Message': 'text'})
# Values other than 'ham' or 'spam' are not in the mapping and become NaN
df['label'] = df['label'].map({'ham': 0, 'spam': 1})
df.head()
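
`Series.map` with a dictionary silently turns any value missing from the dictionary into `NaN`. A minimal sketch of a defensive version follows, assuming the labels might contain stray whitespace or capitalization. The `toy` rows are hypothetical and the normalization step is an illustration, not something the original dataset is known to need.

```python
import pandas as pd

# Hypothetical rows; 'Spam ' simulates a label with odd case and trailing space.
toy = pd.DataFrame({
    "Category": ["ham", "spam", "Spam ", "ham"],
    "Message": ["a", "b", "c", "d"],
})
toy = toy.rename(columns={"Category": "label", "Message": "text"})

# Normalize case and whitespace so 'Spam ' maps the same way as 'spam'.
cleaned = toy["label"].str.strip().str.lower()
encoded = cleaned.map({"ham": 0, "spam": 1})

# Anything still outside the mapping is NaN; fail early rather than evaluate on it.
unmapped = toy.loc[encoded.isna(), "label"].unique()
assert len(unmapped) == 0, f"Unmapped labels: {unmapped}"

toy["label"] = encoded.astype(int)
print(toy)
```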


## 5. Load Trained Model

Load the trained Naive Bayes model and the fitted TF-IDF vectorizer. Both are stored together in one pickle file as a `(model, vectorizer)` tuple.


In [None]:
with open('../src/spam_model.pkl', 'rb') as f:
    model, vectorizer = pickle.load(f)
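
For context, here is a minimal sketch of how a `(model, vectorizer)` pair of this shape could be trained and pickled. The training script is not part of this notebook, so everything here is an assumption: `MultinomialNB` is one common Naive Bayes variant for text, the toy corpus is made up, and the round trip uses in-memory bytes instead of `spam_model.pkl`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (1 = spam, 0 = ham).
texts = [
    "Win a free prize now",
    "Claim your cash reward today",
    "Are we meeting for lunch",
    "Please send me the report",
]
labels = [1, 1, 0, 0]

# Fit the vectorizer and the classifier on the same training texts.
vectorizer_demo = TfidfVectorizer()
X_demo = vectorizer_demo.fit_transform(texts)
model_demo = MultinomialNB().fit(X_demo, labels)

# Serialize both objects as one tuple so they always travel together.
blob = pickle.dumps((model_demo, vectorizer_demo))
loaded_model, loaded_vectorizer = pickle.loads(blob)

# The restored pair should reproduce the original predictions.
roundtrip_pred = loaded_model.predict(loaded_vectorizer.transform(texts))
print(roundtrip_pred)
```

Storing the vectorizer next to the model matters because the model only understands feature columns produced by that exact fitted vocabulary.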


## 6. Prediction

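Transform the text with the already-fitted TF-IDF vectorizer (`transform`, not `fit_transform`, so the vocabulary matches the one the model was trained on) and generate a prediction for every row.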


In [None]:
X = vectorizer.transform(df['text'])
y = df['label']
y_pred = model.predict(X)
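
The cell above scores every row in the dataset. If a held-out estimate is wanted, one option is to split the data before fitting and evaluate only on the unseen part. The following is a sketch under assumptions: the toy corpus is made up, the model is retrained here (the notebook itself only loads a pickled one), and names such as `holdout_model` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (1 = spam, 0 = ham).
texts = [
    "Win a free prize now",
    "Claim your cash reward today",
    "Free entry in a weekly contest",
    "Urgent: your account has won",
    "Are we meeting for lunch",
    "Please send me the report",
    "See you at the station at five",
    "Thanks for the notes from class",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# stratify keeps the ham/spam ratio the same in both splits.
train_txt, test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit TF-IDF on the training texts only, then reuse it for the test texts.
holdout_vectorizer = TfidfVectorizer()
X_train = holdout_vectorizer.fit_transform(train_txt)
X_test = holdout_vectorizer.transform(test_txt)

holdout_model = MultinomialNB().fit(X_train, y_train)
y_test_pred = holdout_model.predict(X_test)
print(len(train_txt), "training rows,", len(test_txt), "test rows")
```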


## 7. Evaluation Metrics

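Compute accuracy, the confusion matrix, and per-class precision, recall, and F1. Because the predictions cover the whole dataset, these scores will be optimistic if the model was trained on any of these rows.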


In [None]:
print('Accuracy:', accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))
print(classification_report(y, y_pred, target_names=['Ham', 'Spam']))
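
To make the link between the confusion matrix and the report explicit, the sketch below derives spam precision and recall by hand from the four cells and compares them with scikit-learn's own functions. The label vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = spam, 0 = ham).
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

# For binary labels {0, 1}, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Precision: of everything flagged as spam, how much really was spam.
precision = tp / (tp + fp)
# Recall: of all real spam, how much was caught.
recall = tp / (tp + fn)

print("precision:", precision, "recall:", recall)
print("sklearn  :", precision_score(y_true_demo, y_pred_demo),
      recall_score(y_true_demo, y_pred_demo))
```

For a spam filter, a false positive (ham flagged as spam) is often the costlier error, which is why precision on the spam class deserves as much attention as overall accuracy.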


## 8. Confusion Matrix Visualization

Plot the confusion matrix as a heatmap with the count written in each cell. Rows are the actual labels and columns are the predicted labels.


In [None]:
cm = confusion_matrix(y, y_pred)
plt.figure()
plt.imshow(cm)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.xticks([0,1], ['Ham','Spam'])
plt.yticks([0,1], ['Ham','Spam'])
for i in range(2):
    for j in range(2):
        plt.text(j, i, cm[i, j], ha='center', va='center')
plt.colorbar()
plt.show()
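
When ham greatly outnumbers spam, the raw-count heatmap is dominated by the ham row. One alternative is to normalize each row so every cell shows the fraction of that actual class. The sketch below uses `confusion_matrix(..., normalize="true")` on hypothetical label vectors; the resulting array could be passed to `plt.imshow` in place of `cm`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = spam, 0 = ham).
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 1, 0, 0, 1, 0, 1]

# normalize="true" divides each row by the number of samples of that actual class.
cm_norm = confusion_matrix(y_true_demo, y_pred_demo, normalize="true")
print(np.round(cm_norm, 2))
```

Each row of `cm_norm` sums to 1, so the diagonal entries are the per-class recall values.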


## 9. Real Email Test

Test the trained model on a custom email example.


In [None]:
email_text = '''\
Congratulations! You have been selected for a free prize.
Click the link below to claim immediately.
'''
prediction = model.predict(vectorizer.transform([email_text]))[0]
print('SPAM' if prediction == 1 else 'NOT SPAM')
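
A hard SPAM / NOT SPAM label hides how confident the model is. If the loaded classifier exposes `predict_proba` (scikit-learn's Naive Bayes classifiers do), the spam probability can be reported as well. The sketch below assumes a `MultinomialNB` model and trains a hypothetical toy one so that it runs on its own; with the notebook's objects, `model` and `vectorizer` would take the place of `demo_model` and `demo_vectorizer`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; the real model comes from the pickle file.
train_texts = [
    "Free prize, click the link to claim now",
    "You have won a free cash reward",
    "Are we still meeting for lunch today",
    "Please review the attached report",
]
train_labels = [1, 1, 0, 0]

demo_vectorizer = TfidfVectorizer()
demo_model = MultinomialNB().fit(
    demo_vectorizer.fit_transform(train_texts), train_labels
)

email_demo = "Congratulations! You have been selected for a free prize."
proba = demo_model.predict_proba(demo_vectorizer.transform([email_demo]))[0]

# Look up the column for label 1 (spam) instead of assuming its position.
spam_index = list(demo_model.classes_).index(1)
spam_probability = proba[spam_index]
print(f"P(spam) = {spam_probability:.2f}")
```

Reporting the probability also makes it possible to choose a decision threshold other than 0.5, for example a stricter one if false positives are costly.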
