# Problem Statement:

We've all received spam emails before. Spam mail, or junk mail, is email sent to a massive number of users at once, frequently containing cryptic messages, scams, or, most dangerously, phishing content.
In this project, we use Python to build an email spam detector, then use machine learning to train it to recognize emails and classify them as spam or non-spam. Let's get started!

# Import necessary libraries

In [1]:
import pandas as pd # Data processing
import numpy as np # Scientific computing
from sklearn.feature_extraction.text import CountVectorizer # Feature extraction from text data
from sklearn.naive_bayes import MultinomialNB # Naive Bayes classifier for multinomial models

# Load Data

In [2]:
data = pd.read_csv("spam.csv", encoding="latin-1") # Read the data 
data.drop(["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"], axis=1, inplace=True) # Drop the unnecessary columns
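
Rather than reading every column and dropping the empty ones afterwards, `pd.read_csv` can keep only the two useful columns through its `usecols` parameter. The sketch below assumes, as the `drop` call above implies, that `spam.csv` has the header `v1,v2` followed by three unnamed columns; the two-row CSV is a made-up stand-in so the snippet runs without the real file. With the real file you would still pass `encoding="latin-1"` as above.

```python
import io

import pandas as pd

# Hypothetical stand-in for spam.csv: header v1,v2 plus three blank header
# fields, which pandas names "Unnamed: 2", "Unnamed: 3" and "Unnamed: 4".
raw_csv = io.StringIO(
    "v1,v2,,,\n"
    "ham,See you at lunch tomorrow,,,\n"
    "spam,Win a cash prize now,,,\n"
)

full = pd.read_csv(raw_csv)
print(list(full.columns))
# ['v1', 'v2', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4']

raw_csv.seek(0)  # rewind the buffer so it can be read a second time
slim = pd.read_csv(raw_csv, usecols=["v1", "v2"])  # keep only label and text
print(list(slim.columns))
# ['v1', 'v2']
```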

# Rename columns

In [3]:
data.rename(columns={"v1":"label", "v2":"text"}, inplace=True) # Renaming the columns

# Convert label column to binary

In [4]:
data["label"] = np.where(data["label"]=="spam", 1, 0) # Replacing the labels with 0 and 1 for easy computation
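
`np.where` turns every label that is exactly `"spam"` into 1 and everything else into 0, so an unexpected value (a typo or a different capitalization) would silently be counted as non-spam. A stricter alternative, sketched here on a made-up series, is `Series.map` with an explicit dictionary, which leaves unknown labels as `NaN` so they can be caught. The name `"ham"` for the non-spam class is an assumption in this sketch.

```python
import numpy as np
import pandas as pd

labels = pd.Series(["ham", "spam", "Spam"])  # toy labels; "Spam" is a deliberate typo

# np.where: anything that is not exactly "spam" becomes 0, including "Spam".
loose = np.where(labels == "spam", 1, 0)
print(loose)  # [0 1 0]

# Series.map: labels missing from the dictionary become NaN instead.
strict = labels.map({"ham": 0, "spam": 1})
print(strict.isna().sum())  # 1 unexpected label found
```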

# Create count vectorizer

In [5]:
cv = CountVectorizer(stop_words="english") # Initializing the CountVectorizer and removing common English stop words

# Transform text column into numerical data

In [6]:
X = cv.fit_transform(data["text"]) # Learning the vocabulary from all messages and building the count matrix X (this runs before the train/test split, so test messages also shape the vocabulary)

# Create target variable and split the data

In [7]:
from sklearn.model_selection import train_test_split # Importing the train_test_split function
y = data["label"] # Target variable: 1 = spam, 0 = not spam
# Train/test split of the data: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) # Splitting the data into train and test sets
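
Spam is usually the minority class, so a purely random split can give the test set a noticeably different spam share than the training set. Passing `stratify=y` to `train_test_split` keeps the class proportions approximately equal in both parts. The sketch uses an invented label vector with 4 spam messages out of 20; applying the same option in this notebook would change which messages land in the test set, so the accuracy printed at the end would change too.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented labels: 16 non-spam (0) and 4 spam (1), plus dummy features.
y_demo = np.array([0] * 16 + [1] * 4)
X_demo = np.arange(20).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# 20% spam overall; the stratified split keeps each part close to that share.
print(len(y_tr), y_tr.sum())  # 14 3
print(len(y_te), y_te.sum())  # 6 1
```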

# Train model

In [8]:
model = MultinomialNB() # Initializing the Multinomial Naive Bayes model 
model.fit(X_train, y_train) # Fitting the model on the training data
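
Because the vectorizer in this notebook is fitted on all messages before the split, its vocabulary already includes words that appear only in the test set, a mild form of data leakage. A scikit-learn `Pipeline` avoids this by fitting `CountVectorizer` and `MultinomialNB` together on the training text only. This is a sketch of that alternative, not the notebook's own workflow, trained on six invented messages; with the real data you would split `data["text"]` and `data["label"]` rather than `X`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training messages; 1 = spam, 0 = non-spam (same encoding as above).
train_text = [
    "win cash prize now",
    "claim your free prize",
    "win a free cash prize",
    "are we meeting for lunch",
    "see you at lunch tomorrow",
    "call me when you get home",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The vectorizer is fitted inside the pipeline, so it only sees training text.
spam_pipeline = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
spam_pipeline.fit(train_text, train_labels)

# Raw strings go straight in; the pipeline vectorizes them with the fitted vocabulary.
print(spam_pipeline.predict(["cash prize", "lunch tomorrow"]))  # [1 0]
```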

# Predict on test set

In [9]:
y_pred = model.predict(X_test) # Predicting the labels on the test data
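
`predict` returns hard 0/1 labels. `MultinomialNB` also offers `predict_proba`, which gives each message's estimated probability of being spam; with two classes, `predict` agrees with picking whichever class has the higher probability, and raising the cut-off above 0.5 is one way to flag fewer legitimate messages as spam. The count matrix and the 0.9 threshold below are illustrative assumptions, and Naive Bayes probabilities tend to be poorly calibrated, so any threshold should be chosen on held-out data.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Invented word-count matrix (rows = messages, columns = vocabulary terms).
counts = np.array([
    [3, 0, 1],  # spam-like
    [2, 1, 0],  # spam-like
    [0, 3, 1],  # non-spam-like
    [0, 2, 2],  # non-spam-like
])
toy_labels = np.array([1, 1, 0, 0])

nb = MultinomialNB().fit(counts, toy_labels)

new_counts = np.array([[2, 0, 0], [0, 1, 1], [1, 1, 0]])
spam_prob = nb.predict_proba(new_counts)[:, 1]  # column order follows nb.classes_ = [0, 1]
print(np.round(spam_prob, 3))

# Default decision vs a hypothetical stricter 0.9 threshold.
print(nb.predict(new_counts))          # [1 0 1]
print((spam_prob >= 0.9).astype(int))  # [1 0 0]
```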

# Calculate accuracy

In [10]:
from sklearn.metrics import accuracy_score # Importing the accuracy_score function 
accuracy = accuracy_score(y_test, y_pred) # Calculating the accuracy of the model
print("Accuracy of the model is:",accuracy*100,"%") # Printing the accuracy of the model in percentage

Accuracy of the model is: 97.72727272727273 %
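
Accuracy alone can be misleading when spam is the minority class: a model that never flags anything can still score well. A confusion matrix together with precision (how many flagged messages really are spam) and recall (how much of the spam gets caught) gives a fuller picture. The labels below are invented to make the point; in this notebook the same functions would be called with `y_test` and `y_pred`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Invented example: 8 non-spam and 2 spam messages, and a "model" that never predicts spam.
y_true_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred_demo = [0] * 10

print(accuracy_score(y_true_demo, y_pred_demo))  # 0.8
print(confusion_matrix(y_true_demo, y_pred_demo))
# [[8 0]
#  [2 0]]   rows = true class (0, 1), columns = predicted class (0, 1)
print(precision_score(y_true_demo, y_pred_demo, zero_division=0))  # 0.0 (nothing was flagged)
print(recall_score(y_true_demo, y_pred_demo))  # 0.0 (no spam was caught)
```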
