# <center><u>EMAIL SPAM DETECTION WITH MACHINE LEARNING</u></center>

### We've all received spam emails. Spam, or junk mail, is email sent to a massive number of users at one time, frequently containing cryptic messages, scams, or, most dangerously, phishing content. In this project, we will use Python to build an email spam detector and use machine learning to train it to classify emails as spam or non-spam. Let's get started!

### Let's start by importing the necessary libraries:

In [2]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

### Next, we'll load the spam dataset. Save the "spam.csv" file in the same directory as the Jupyter Notebook so that it can be read by file name alone.

In [7]:
# Load the spam dataset with 'latin-1' encoding
data = pd.read_csv('spam.csv', encoding='latin-1')

# Split the data into features (email text, column 'v2') and labels (spam or not spam, column 'v1')
X = data['v2']
y = data['v1']
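
Before going further, it can help to check how many messages carry each label, since a heavily skewed dataset changes how an accuracy figure should be read. The sketch below uses a small invented DataFrame in place of "spam.csv"; the label strings "ham" and "spam" and the sample messages are assumptions for illustration, not values taken from the dataset.

```python
import pandas as pd

# Tiny stand-in for spam.csv; rows and label strings are invented for illustration.
sample = pd.DataFrame({
    "v1": ["ham", "spam", "ham", "ham"],
    "v2": [
        "Are we still meeting at noon?",
        "WINNER! Claim your free prize now",
        "Can you send me the report?",
        "See you at dinner tonight",
    ],
})

# Count how many messages carry each label.
label_counts = sample["v1"].value_counts()
print(label_counts)

# Share of each label, useful for spotting class imbalance.
label_share = sample["v1"].value_counts(normalize=True)
print(label_share)
```

On the real data, the same two calls on `data['v1']` would show the actual balance between the classes.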


### Now, let's split the data into training and testing sets:

In [8]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
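
If one class is much rarer than the other, a purely random split can leave the test set with very few examples of it. One possible variation, not used in this notebook, is to pass `stratify` to `train_test_split` so that both subsets keep the same class ratio. The toy data below is invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented toy data: 8 ham and 2 spam messages.
texts = pd.Series([f"message {i}" for i in range(10)])
labels = pd.Series(["ham"] * 8 + ["spam"] * 2)

# stratify=labels keeps the spam/ham ratio the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

print(y_tr.value_counts())
print(y_te.value_counts())
```

Applied to the split above, this would mean adding `stratify=y` to the existing call.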


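### Next, we'll convert the email text into numerical features using CountVectorizer, which counts how often each word appears in each email. The vocabulary is learned from the training set only and then reused to transform the test set: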

In [9]:
# Convert email text into numerical features
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)


### Now, we can train a multinomial Naive Bayes classifier on the vectorized training data:

In [10]:
# Train the Naive Bayes classifier
naive_bayes = MultinomialNB()
naive_bayes.fit(X_train_vectorized, y_train)
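
As a rough illustration of what the classifier learns, the sketch below fits `MultinomialNB` on four invented messages and recomputes its smoothed per-class word probabilities by hand from the stored word counts. The toy texts and labels are assumptions for illustration; `alpha=1.0` is the library default and matches the unparameterized `MultinomialNB()` used above.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy messages and labels.
toy_texts = ["free prize now", "win free cash", "lunch at noon", "see you at the meeting"]
toy_labels = ["spam", "spam", "ham", "ham"]

toy_vectorizer = CountVectorizer()
toy_counts = toy_vectorizer.fit_transform(toy_texts)

toy_model = MultinomialNB(alpha=1.0)  # alpha=1.0 is the default (add-one smoothing)
toy_model.fit(toy_counts, toy_labels)

# Recompute the smoothed word probabilities by hand:
# (count of word in class + alpha) / (total words in class + alpha * vocabulary size)
word_counts = toy_model.feature_count_  # shape: (n_classes, n_words)
smoothed = word_counts + toy_model.alpha
manual_log_prob = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

print(toy_model.classes_)
print(np.allclose(manual_log_prob, toy_model.feature_log_prob_))
```

The smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.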


### After training the classifier, we can use it to make predictions on the testing data:

In [11]:
# Make predictions on the testing data
y_pred = naive_bayes.predict(X_test_vectorized)
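
The same two steps, vectorize then predict, apply to any new raw message. One way to keep them together is a scikit-learn pipeline, sketched below on invented training messages. The helper name `classify_messages` and the toy data are hypothetical and not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training data.
train_texts = ["free prize now", "win free cash", "lunch at noon", "see you at the meeting"]
train_labels = ["spam", "spam", "ham", "ham"]

# The pipeline vectorizes raw text and then feeds the counts to the classifier.
spam_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
spam_pipeline.fit(train_texts, train_labels)


def classify_messages(messages):
    """Hypothetical helper: return one predicted label per raw message."""
    return list(spam_pipeline.predict(messages))


predictions = classify_messages(["free cash prize", "meeting at noon"])
print(predictions)
```

With the notebook's own objects, the equivalent would be calling `vectorizer.transform` on the new messages and passing the result to `naive_bayes.predict`.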


### Finally, we can evaluate the accuracy of our model by comparing the predicted labels with the actual labels:

In [12]:
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 0.9838565022421525
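
Accuracy alone can hide how the model treats each class, especially when spam is the minority. A minimal sketch of a fuller evaluation, using invented true and predicted labels rather than this notebook's results, might look like this (the "ham" and "spam" label strings are assumptions):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Invented labels for illustration; not the notebook's actual predictions.
true_labels = ["ham", "ham", "ham", "ham", "spam", "spam"]
pred_labels = ["ham", "ham", "ham", "ham", "ham", "spam"]

toy_accuracy = accuracy_score(true_labels, pred_labels)

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(true_labels, pred_labels, labels=["ham", "spam"])

# Precision: of the messages flagged as spam, how many really were spam.
spam_precision = precision_score(true_labels, pred_labels, pos_label="spam")
# Recall: of the real spam messages, how many were caught.
spam_recall = recall_score(true_labels, pred_labels, pos_label="spam")

print("Accuracy:", toy_accuracy)
print(cm)
print("Spam precision:", spam_precision)
print("Spam recall:", spam_recall)
```

In this toy case the accuracy is above 80% even though half of the spam messages are missed, which is the kind of gap the extra metrics expose. The same calls can be run on `y_test` and `y_pred` from the cells above.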
