# EMAIL SPAM DETECTION WITH MACHINE LEARNING

We have all received spam emails at some point. Spam mail, or junk mail, is email sent in bulk to a large number of users at once; it frequently contains cryptic messages, scams or, most dangerously, phishing content.

In this project, we'll use Python and scikit-learn to build an email spam detector. Each message is converted into TF-IDF features, and a logistic regression model is trained on them to classify messages as spam or non-spam (ham).

In [103]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

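# Read the labelled messages with Latin-1 (ISO-8859-1) encoding
dataset = pd.read_csv("spam.csv", encoding='ISO-8859-1')
# Drop the extra columns at positions 2-4 so only v1 (label) and v2 (message text) remain
dataset.drop(dataset.columns[[2, 3, 4]], axis=1, inplace=True)
# Remove repeated rows, keeping the first occurrence of each
dataset = dataset.drop_duplicates(keep='first')
# Seed NumPy's global RNG; train_test_split uses it later because no random_state is passed
np.random.seed(55)
dataset.head(10)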

|   | v1   | v2 |
|---|------|----|
| 0 | ham  | Go until jurong point, crazy.. Available only ... |
| 1 | ham  | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham  | U dun say so early hor... U c already then say... |
| 4 | ham  | Nah I don't think he goes to usf, he lives aro... |
| 5 | spam | FreeMsg Hey there darling it's been 3 week's n... |
| 6 | ham  | Even my brother is not like to speak with me. ... |
| 7 | ham  | As per your request 'Melle Melle (Oru Minnamin... |
| 8 | spam | WINNER!! As a valued network customer you have... |
| 9 | spam | Had your mobile 11 months or more? U R entitle... |
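Selecting columns by position (`dataset.columns[[2,3,4]]`) depends on the exact file layout. A minimal sketch of an alternative, assuming only `v1` and `v2` are needed, is to read just those columns by name with `usecols`. The sketch also shows why the printed Series below ends at index 5571 while holding only 5169 entries: `drop_duplicates` keeps the original index labels of the rows it retains. The CSV content and the extra column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for spam.csv (contents and extra column names invented):
# v1 = label, v2 = message text, plus three columns we do not need.
csv_text = """v1,v2,extra_a,extra_b,extra_c
ham,Ok lar... Joking wif u oni...,,,
ham,Ok lar... Joking wif u oni...,,,
spam,WINNER!! Claim your prize now,,,
"""

# usecols picks the label and text columns by name, so the result does not
# depend on where the extra columns sit in the file.
# (For the real file you would also pass encoding='ISO-8859-1'.)
df = pd.read_csv(io.StringIO(csv_text), usecols=["v1", "v2"])

# keep='first' keeps the first copy of each repeated row; surviving rows keep
# their original index labels, which leaves gaps in the index.
df = df.drop_duplicates(keep="first")
print(df)
print(df.index.tolist())  # [0, 2]
```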


In [104]:
# Features: the raw message text
x = dataset.v2
print(x)

0       Go until jurong point, crazy.. Available only ...
1                           Ok lar... Joking wif u oni...
2       Free entry in 2 a wkly comp to win FA Cup fina...
3       U dun say so early hor... U c already then say...
4       Nah I don't think he goes to usf, he lives aro...
                              ...                        
5567    This is the 2nd time we have tried 2 contact u...
5568                Will Ì_ b going to esplanade fr home?
5569    Pity, * was in mood for that. So...any other s...
5570    The guy did some bitching but I acted like i'd...
5571                           Rofl. Its true to its name
Name: v2, Length: 5169, dtype: object


In [105]:
# Target: the ham/spam label of each message
y = dataset.v1
print(y)

0        ham
1        ham
2       spam
3        ham
4        ham
        ... 
5567    spam
5568     ham
5569     ham
5570     ham
5571     ham
Name: v1, Length: 5169, dtype: object
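Before training, it is worth checking how imbalanced the labels are: the test split evaluated at the end of the notebook contains 193 spam messages out of 1551 (about 12%). A minimal sketch using `value_counts` on an invented label Series standing in for `dataset.v1`:

```python
import pandas as pd

# Invented labels standing in for dataset.v1
labels = pd.Series(["ham", "ham", "spam", "ham", "ham", "ham", "spam", "ham"])

# Absolute count and share of each class
counts = labels.value_counts()
shares = labels.value_counts(normalize=True)
print(counts)
print(shares)
```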


In [106]:
from sklearn.preprocessing import LabelEncoder

# Encode the string labels as integers (sorted order: ham -> 0, spam -> 1)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
print(y)

[0 0 1 ... 0 0 0]
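`LabelEncoder` assigns integer codes in sorted order of the labels, so `ham` becomes 0 and `spam` becomes 1; this matches the output above, where the third message (a spam message) is encoded as 1. A minimal sketch on a few invented labels:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoded = encoder.fit_transform(["ham", "spam", "ham", "spam", "ham"])

# classes_ holds the sorted unique labels; each label's code is its position in classes_
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)           # {'ham': 0, 'spam': 1}
print(encoded.tolist())  # [0, 1, 0, 1, 0]

# inverse_transform maps integer codes back to the original strings
print(encoder.inverse_transform([1, 0]).tolist())  # ['spam', 'ham']
```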


In [107]:
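from sklearn.model_selection import train_test_split

# Hold out 30% of the messages as a test set (random split, seeded by np.random.seed(55) in the first cell)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)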
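With a fractional `test_size`, scikit-learn rounds the test share up, which explains the sizes seen later in the notebook (3618 training messages, 1551 test messages). A minimal sketch of that arithmetic, cross-checked on a dummy index array; `random_state=0` is an arbitrary choice for the sketch. Passing `stratify=y` to `train_test_split` would additionally keep the ham/spam ratio the same in both parts; the notebook does not do this.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 5169  # rows left after drop_duplicates
test_size = 0.3

# The test share is rounded up: ceil(0.3 * 5169) = ceil(1550.7) = 1551
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 3618 1551

# Cross-check with train_test_split on a dummy array of row indices
idx = np.arange(n_samples)
idx_train, idx_test = train_test_split(idx, test_size=test_size, random_state=0)
print(len(idx_train), len(idx_test))  # 3618 1551
```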

In [108]:
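from sklearn.feature_extraction.text import TfidfVectorizer

# Turn each message into a TF-IDF vector: lowercase the text and drop English stop words
feat_vect = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
# Learn the vocabulary and IDF weights from the training messages only...
x_train_vec = feat_vect.fit_transform(x_train)
# ...then apply the same transformation to the test messages
x_test_vec = feat_vect.transform(x_test)
print(x_train)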

3614    I enjoy watching and playing football and bask...
4289    For you information, IKEA is spelled with all ...
1729                   Lol yeah at this point I guess not
4388                            K I'm ready,  &lt;#&gt; ?
1385    That's ok. I popped in to ask bout something a...
                              ...                        
2098    No dice, art class 6 thru 9 :( thanks though. ...
990                                          26th OF JULY
4703                     Yar but they say got some error.
5111                          I've reached sch already...
4885                               Or just do that 6times
Name: v2, Length: 3618, dtype: object
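With its default settings (`smooth_idf=True`, `norm='l2'`, raw counts as term frequency), `TfidfVectorizer` weights each term by `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, where `n` is the number of documents and `df(t)` the number containing `t`, and then scales every row to unit length. A minimal sketch that recomputes the matrix by hand on an invented three-message corpus; stop-word removal is left out here so every token is kept.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["free prize now", "meet me now", "free free entry"]  # invented corpus

# scikit-learn's result with default settings
vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()

# Recompute by hand, using the same vocabulary (and therefore column order)
counts = CountVectorizer(vocabulary=vec.vocabulary_).fit_transform(docs).toarray()
n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)                 # document frequency of each term
idf = np.log((1 + n_docs) / (1 + df)) + 1     # smoothed IDF
tfidf = counts * idf                          # raw term count times IDF
tfidf = tfidf / np.linalg.norm(tfidf, axis=1, keepdims=True)  # L2-normalise each row

print(np.allclose(vec.idf_, idf))  # True
print(np.allclose(X, tfidf))       # True
```

Because `fit_transform` is called only on the training messages, the IDF weights never see the test set; `transform` simply reuses them.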


In [109]:
from sklearn.linear_model import LogisticRegression

# Fit a logistic regression classifier on the TF-IDF training features
logmodel = LogisticRegression()
logmodel.fit(x_train_vec, y_train)
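Logistic regression scores each sample with a linear function of its features and maps that score to a probability with the sigmoid; `predict()` labels a sample as spam when the score is positive, which is the same as a spam probability above 0.5. A minimal sketch on an invented two-feature dataset (not the TF-IDF features above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented toy data: two numeric features, label 1 = spam
X = np.array([[0.0, 1.0], [0.2, 0.9], [0.9, 0.1], [1.0, 0.0], [0.8, 0.3], [0.1, 0.7]])
y = np.array([0, 0, 1, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# Linear score w . x + b for each sample
scores = model.decision_function(X)
assert np.allclose(scores, X @ model.coef_.ravel() + model.intercept_[0])

# The sigmoid of the score is the predicted spam probability
p_spam = 1 / (1 + np.exp(-scores))
assert np.allclose(p_spam, model.predict_proba(X)[:, 1])

# predict() thresholds the score at 0, i.e. the probability at 0.5
assert np.array_equal(model.predict(X), (scores > 0).astype(int))
```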

In [110]:
# Predict labels (0 = ham, 1 = spam) for the held-out test messages
y_pred = logmodel.predict(x_test_vec)
y_pred

array([0, 0, 0, ..., 0, 0, 0])
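To classify new raw messages, the vectorizer and the classifier can be chained with scikit-learn's `make_pipeline`, so a string goes in and a spam probability comes out. A minimal sketch, assuming the same `TfidfVectorizer` settings as above; the training messages and labels are invented, so the exact probabilities carry no meaning beyond the toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training messages; 1 = spam, 0 = ham
train_texts = [
    "WINNER claim your cash prize now",
    "Free entry win cash prize",
    "Urgent claim free prize today",
    "Are we meeting for lunch later",
    "Running late see you at home",
    "Lunch at home sounds good",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Vectorizer and classifier in one object: fit on raw text, predict on raw text
spam_clf = make_pipeline(
    TfidfVectorizer(stop_words="english", lowercase=True),
    LogisticRegression(),
)
spam_clf.fit(train_texts, train_labels)

new_messages = ["Claim your free cash prize", "See you at lunch"]
probs = spam_clf.predict_proba(new_messages)[:, 1]  # spam probability per message
print(probs)
```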

In [111]:
from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns the predicted classes (0 = ham, 1 = spam)
results = confusion_matrix(y_test, y_pred)
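`confusion_matrix` puts the true classes on the rows and the predicted classes on the columns, ordered by label value (0 = ham, 1 = spam). For two classes, `ravel()` unpacks it as true negatives, false positives, false negatives and true positives. A small sketch on invented labels:

```python
from sklearn.metrics import confusion_matrix

# Invented true and predicted labels (1 = spam)
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_hat = [0, 0, 1, 1, 0, 1, 0, 0]

cm = confusion_matrix(y_true, y_hat)
print(cm)
# [[3 1]
#  [2 2]]

# Row by row: tn, fp (true ham), then fn, tp (true spam)
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 1 2 2
```

Read the same way, the notebook's matrix `[[1357 1] [81 112]]` means 1357 ham messages correctly kept, 1 ham message flagged as spam, 81 spam messages missed and 112 spam messages caught.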

In [112]:
from sklearn.metrics import accuracy_score, classification_report

print("Confusion Matrix")
print(results)
print("Accuracy Score :")
print(accuracy_score(y_test, y_pred))
print("Report:")
print(classification_report(y_test, y_pred))

Confusion Matrix
[[1357    1]
 [  81  112]]
Accuracy Score :
0.9471308833010961
Report:
              precision    recall  f1-score   support

           0       0.94      1.00      0.97      1358
           1       0.99      0.58      0.73       193

    accuracy                           0.95      1551
   macro avg       0.97      0.79      0.85      1551
weighted avg       0.95      0.95      0.94      1551
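The report follows directly from the four counts in the confusion matrix. The model almost never flags a legitimate message (1 false positive, spam precision 0.99) but misses 81 of the 193 spam messages, so spam recall is only 0.58 even though overall accuracy is 0.95. A minimal sketch recomputing the main figures from those counts:

```python
# Counts copied from the confusion matrix above (rows: true, columns: predicted)
tn, fp = 1357, 1
fn, tp = 81, 112

accuracy = (tn + tp) / (tn + fp + fn + tp)   # 1469 / 1551
precision_spam = tp / (tp + fp)              # 112 / 113
recall_spam = tp / (tp + fn)                 # 112 / 193
f1_spam = 2 * precision_spam * recall_spam / (precision_spam + recall_spam)

precision_ham = tn / (tn + fn)               # 1357 / 1438
recall_ham = tn / (tn + fp)                  # 1357 / 1358
macro_recall = (recall_ham + recall_spam) / 2

print(f"accuracy       {accuracy:.4f}")
print(f"spam precision {precision_spam:.2f}  recall {recall_spam:.2f}  f1 {f1_spam:.2f}")
print(f"ham  precision {precision_ham:.2f}  recall {recall_ham:.2f}")
print(f"macro recall   {macro_recall:.2f}")
```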



#### Prakriti Mukherjee