
# Task 1: Spam Detection using Machine Learning
In this notebook, we build a text classifier that labels SMS messages as **Spam** or **Not Spam** (ham), using TF-IDF features and a Multinomial Naive Bayes model.

In [None]:

# Import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load dataset
url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
df = pd.read_table(url, header=None, names=['label', 'message'])

# Display first 5 rows
df.head()

| | label | message |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


### Preprocessing the Data

In [None]:

# Convert labels to binary (spam=1, ham=0)
df['label'] = df['label'].map({'ham':0, 'spam':1})

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(df['message'], df['label'], test_size=0.2, random_state=42)

# Convert text to numerical features using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

### Training the Classifier

In [None]:

# Train a Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

# Predictions
y_pred = model.predict(X_test_tfidf)
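
The vectorizer and the classifier can also be chained so that raw text goes in and labels come out in a single call. This is an alternative arrangement, not what the notebook does; the sketch below assumes scikit-learn's `make_pipeline` and uses made-up messages (`toy_messages`, `toy_labels`, and `spam_pipeline` are hypothetical names):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up messages for illustration; 1 = spam, 0 = ham
toy_messages = [
    "win free cash prize",
    "free prize claim now",
    "cash prize winner",
    "lunch tomorrow at noon",
    "see you at lunch",
    "meeting tomorrow morning",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# The pipeline applies TF-IDF and then Naive Bayes in one fit/predict call
spam_pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
spam_pipeline.fit(toy_messages, toy_labels)

toy_preds = spam_pipeline.predict(["free cash prize", "lunch tomorrow"])
print(toy_preds)
```

Keeping both steps in one object makes it harder to accidentally fit the vectorizer on test data.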

### Evaluating the Model

In [None]:

# Evaluate performance using different metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

Accuracy: 0.97847533632287
Precision: 1.0
Recall: 0.8389261744966443
F1 Score: 0.9124087591240876
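
To see how the four scores relate, they can be recomputed by hand from confusion-matrix counts. The counts below are inferred from the printed scores, assuming a 1,115-message test set (20% of the data); the notebook itself does not print them, so treat them as an assumption:

```python
# Counts inferred from the printed scores (an assumption, not notebook output)
tp, fn, fp, tn = 125, 24, 0, 966

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all messages labelled correctly
precision = tp / (tp + fp)                   # share of predicted spam that is really spam
recall = tp / (tp + fn)                      # share of real spam that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
```

Under these counts, a precision of 1.0 means no ham message was flagged as spam, while a recall of about 0.84 means roughly one spam message in six was missed.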


### Testing with User Input

In [None]:

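# Function to test custom messages
def classify_message(msg):
    """Return "Spam" or "Not Spam" for a single raw text message."""
    # Reuse the fitted vectorizer; transform expects a list of documents
    msg_tfidf = vectorizer.transform([msg])
    pred = model.predict(msg_tfidf)[0]
    return "Spam" if pred == 1 else "Not Spam"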

# Example test
print(classify_message("Congratulations! You have won $1000."))
print(classify_message("Let's meet for lunch tomorrow."))

Spam
Not Spam
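
A hard label hides how confident the model is. A possible variation, sketched here on made-up training data, uses `predict_proba` and an adjustable cutoff; the function `classify_with_threshold` and its `threshold` parameter are hypothetical and not part of the notebook:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up messages for illustration; 1 = spam, 0 = ham
toy_messages = [
    "win free cash prize",
    "free prize claim now",
    "cash prize winner",
    "lunch tomorrow at noon",
    "see you at lunch",
    "meeting tomorrow morning",
]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_model = MultinomialNB()
toy_model.fit(toy_vectorizer.fit_transform(toy_messages), toy_labels)

def classify_with_threshold(msg, threshold=0.5):
    """Label msg as spam only if its estimated spam probability reaches threshold."""
    features = toy_vectorizer.transform([msg])
    # Columns of predict_proba follow toy_model.classes_, which is [0, 1] here
    spam_prob = toy_model.predict_proba(features)[0][1]
    label = "Spam" if spam_prob >= threshold else "Not Spam"
    return label, spam_prob

print(classify_with_threshold("free cash prize"))
print(classify_with_threshold("lunch tomorrow"))
```

Lowering the threshold would catch more spam at the cost of more false alarms; raising it does the opposite.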
