# ***Email Spam Prediction Model - User Interactive***


*   Some of the emails we receive are useless at best and misleading or dangerous at worst, and we often cannot tell what harm they carry. This project screens messages to help us avoid potentially hazardous ones.

---





***Importing Libraries***

---



In [48]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

***Importing Dataset***

---



In [49]:
mail = pd.read_csv('spam.csv', encoding='ISO-8859-1')

In [50]:
print(mail)

        v1                                                 v2 Unnamed: 2  \
0      ham  Go until jurong point, crazy.. Available only ...        NaN   
1      ham                      Ok lar... Joking wif u oni...        NaN   
2     spam  Free entry in 2 a wkly comp to win FA Cup fina...        NaN   
3      ham  U dun say so early hor... U c already then say...        NaN   
4      ham  Nah I don't think he goes to usf, he lives aro...        NaN   
...    ...                                                ...        ...   
5567  spam  This is the 2nd time we have tried 2 contact u...        NaN   
5568   ham              Will Ì_ b going to esplanade fr home?        NaN   
5569   ham  Pity, * was in mood for that. So...any other s...        NaN   
5570   ham  The guy did some bitching but I acted like i'd...        NaN   
5571   ham                         Rofl. Its true to its name        NaN   

     Unnamed: 3 Unnamed: 4  
0           NaN        NaN  
1           NaN        NaN  


In [51]:
df = pd.DataFrame(mail)
df

Unnamed: 0,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,
...,...,...,...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...,,,
5568,ham,Will Ì_ b going to esplanade fr home?,,,
5569,ham,"Pity, * was in mood for that. So...any other s...",,,
5570,ham,The guy did some bitching but I acted like i'd...,,,


In [52]:
list(df.columns)

['v1', 'v2', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4']

***Dropping the Useless Columns***

---



In [53]:
df = df.drop(['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], axis=1)

***Renaming the Columns***

---



In [54]:
df = df.rename(columns={'v1': 'Category', 'v2': 'Message'})

In [55]:
df

Unnamed: 0,Category,Message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...
5568,ham,Will Ì_ b going to esplanade fr home?
5569,ham,"Pity, * was in mood for that. So...any other s..."
5570,ham,The guy did some bitching but I acted like i'd...


***Replacing the Null Values in the Dataset with Empty Strings***

---



In [56]:
mail_df = df.where((pd.notnull(df)),'')
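The `where` call keeps every non-null cell and substitutes an empty string for the rest; no rows are dropped. As a minimal sketch on a made-up two-row frame (the `toy` data is an assumption, not taken from `spam.csv`), the same result can be reached with the more common `fillna`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing message
toy = pd.DataFrame({"Category": ["ham", "spam"], "Message": ["hello", np.nan]})

# Same pattern as the cell above: keep non-null cells, replace the rest with ''
via_where = toy.where(pd.notnull(toy), "")

# A more common spelling of the same operation
via_fillna = toy.fillna("")

print(via_where["Message"].tolist())
print(via_fillna["Message"].tolist())
```

Both frames keep two rows, and the missing message becomes `''` in each.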

In [57]:
mail_df

Unnamed: 0,Category,Message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...
5568,ham,Will Ì_ b going to esplanade fr home?
5569,ham,"Pity, * was in mood for that. So...any other s..."
5570,ham,The guy did some bitching but I acted like i'd...


In [58]:
mail_df.head()

Unnamed: 0,Category,Message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."


***Labels:***
*   0 is for Spam Emails
*   1 is for Ham Emails


---




In [59]:
mail_df.loc[mail_df['Category'] == 'spam', 'Category'] = 0
mail_df.loc[mail_df['Category'] == 'ham', 'Category'] = 1
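Assigning through `.loc` leaves the column with `dtype: object`, which is why the labels are cast with `astype('int')` later on. A possible alternative, sketched here on a hypothetical toy frame, maps the strings and casts to integers in one step (`label_map` and the `Label` column are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical toy data using the same category strings as the dataset
toy = pd.DataFrame({"Category": ["ham", "spam", "ham"]})

# Same encoding as above: 0 = spam, 1 = ham
label_map = {"spam": 0, "ham": 1}
toy["Label"] = toy["Category"].map(label_map).astype(int)

print(toy["Label"].tolist())  # [1, 0, 1]
```

Keeping the encoded labels in a separate column also preserves the original strings for inspection.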

In [60]:
mail_df

Unnamed: 0,Category,Message
0,1,"Go until jurong point, crazy.. Available only ..."
1,1,Ok lar... Joking wif u oni...
2,0,Free entry in 2 a wkly comp to win FA Cup fina...
3,1,U dun say so early hor... U c already then say...
4,1,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,0,This is the 2nd time we have tried 2 contact u...
5568,1,Will Ì_ b going to esplanade fr home?
5569,1,"Pity, * was in mood for that. So...any other s..."
5570,1,The guy did some bitching but I acted like i'd...


***Separating data as Texts and Labels***

---



In [61]:
x = mail_df['Message']
y = mail_df['Category']

In [62]:
x

0       Go until jurong point, crazy.. Available only ...
1                           Ok lar... Joking wif u oni...
2       Free entry in 2 a wkly comp to win FA Cup fina...
3       U dun say so early hor... U c already then say...
4       Nah I don't think he goes to usf, he lives aro...
                              ...                        
5567    This is the 2nd time we have tried 2 contact u...
5568                Will Ì_ b going to esplanade fr home?
5569    Pity, * was in mood for that. So...any other s...
5570    The guy did some bitching but I acted like i'd...
5571                           Rofl. Its true to its name
Name: Message, Length: 5572, dtype: object

In [63]:
y

0       1
1       1
2       0
3       1
4       1
       ..
5567    0
5568    1
5569    1
5570    1
5571    1
Name: Category, Length: 5572, dtype: object

***Splitting the Dataset into two Parts: Training and Test Data (80% and 20%)***

---



In [64]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state=3)

In [65]:
print(x.shape)
print(x_test.shape)
print(x_train.shape)

(5572,)
(1115,)
(4457,)
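The shapes follow from `train_test_split` rounding the test size up: 20% of 5572 is 1114.4, so 1115 messages go to the test set and 4457 remain for training. If spam is much rarer than ham, as the sample rows suggest, passing `stratify` keeps the class ratio the same in both parts. The sketch below uses made-up data (`x_toy`, `y_toy`) to show the idea; it is an optional variation, not what the notebook does.

```python
import math

import pandas as pd
from sklearn.model_selection import train_test_split

# Test-set size for the real dataset: ceil(0.2 * 5572)
print(math.ceil(0.2 * 5572))  # 1115

# Hypothetical imbalanced toy labels: 80 ham (1) and 20 spam (0)
y_toy = pd.Series([1] * 80 + [0] * 20)
x_toy = pd.Series([f"message {i}" for i in range(100)])

# stratify=y_toy preserves the 80/20 class ratio in both splits
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.2, random_state=3, stratify=y_toy
)

spam_in_test = int((y_te == 0).sum())
print(len(x_tr), len(x_te), spam_in_test)
```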


***Feature Extraction***

---



In [66]:
x_train

3075    Mum, hope you are having a great day. Hoping t...
1787                           Yes:)sura in sun tv.:)lol.
1614    Me sef dey laugh you. Meanwhile how's my darli...
4304                Yo come over carlos will be here soon
3266                    Ok then i come n pick u at engin?
                              ...                        
789                          Gud mrng dear hav a nice day
968             Are you willing to go for aptitude class.
1667    So now my dad is gonna call after he gets out ...
3321    Ok darlin i supose it was ok i just worry too ...
1688                     Nan sonathaya soladha. Why boss?
Name: Message, Length: 4457, dtype: object

In [67]:
x_test

2632                       I WILL CAL YOU SIR. In meeting
454     Loan for any purpose å£500 - å£75,000. Homeown...
983     LOOK AT THE FUCKIN TIME. WHAT THE FUCK YOU THI...
1282    Ever green quote ever told by Jerry in cartoon...
4610                                  Wat time Ì_ finish?
                              ...                        
4827    Lol no. Just trying to make your day a little ...
5291      Xy trying smth now. U eat already? We havent...
3325    Huh so fast... Dat means u havent finished pai...
3561    Still chance there. If you search hard you wil...
1136    Dont forget you can place as many FREE Request...
Name: Message, Length: 1115, dtype: object

In [68]:
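# Learn the vocabulary and IDF weights from the training messages only
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
x_train_features = feature_extraction.fit_transform(x_train)
# Reuse the fitted vocabulary for the test messages (no refitting)
x_test_features = feature_extraction.transform(x_test)
# The labels are stored with dtype object, so cast them to integers
y_train = y_train.astype('int')
y_test = y_test.astype('int')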
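The vectorizer is fitted on the training messages only, and the test messages are transformed with that same vocabulary. A small sketch on a hypothetical corpus (`train_docs` and `test_docs` are made up) shows the consequence: words that never appeared during fitting are ignored, and both matrices share the same number of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, not taken from the dataset
train_docs = ["free prize waiting", "see you at lunch"]
test_docs = ["free lunch voucher"]

vec = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)
train_matrix = vec.fit_transform(train_docs)  # learns vocabulary and IDF weights
test_matrix = vec.transform(test_docs)        # reuses them, learns nothing new

print(sorted(vec.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
print("voucher" in vec.vocabulary_)  # False: unseen at fit time
```

Fitting on the test messages as well would leak information about them into the features, which is why only `transform` is applied there.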

***Training the Logistic Regression Model***

---



In [69]:
model = LogisticRegression()
model.fit(x_train_features, y_train)

***Evaluating the Trained Model***

---



In [70]:
prediction = model.predict(x_train_features)
accuracy = accuracy_score(y_train, prediction)
print('Accuracy on training data : ', accuracy)

Accuracy on training data :  0.9661207089970832


In [71]:
prediction = model.predict(x_test_features)
accuracy = accuracy_score(y_test, prediction)
print('Accuracy on testing data : ', accuracy)

Accuracy on testing data :  0.9623318385650225
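Accuracy alone can hide weak spam detection when most messages are ham. The sketch below uses made-up label vectors (`y_true` and `y_pred` are assumptions, not the notebook's results) to show how a confusion matrix and per-class recall expose this; the same calls could be applied to `y_test` and `prediction`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels, using the notebook's encoding: 0 = spam, 1 = ham
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes, in the order [spam, ham]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
acc = accuracy_score(y_true, y_pred)
spam_recall = recall_score(y_true, y_pred, pos_label=0)

print(cm)
print(acc, spam_recall)  # 0.9 0.5
```

Here 90% of the predictions are correct, yet only half of the spam messages are caught.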


***Building the Predictive System***

---



In [78]:
input_mail = ["Help  |  Privacy  |  Reset password  |  Download app We sent this email to @Pri6369. UnsubscribeTwitter International CompanyOne Cumberland Place, Fenian Street Dublin 2, D02 AX07  IRELAND"]

In [79]:
input_mail_features = feature_extraction.transform(input_mail)
prediction_new = model.predict(input_mail_features)
print(prediction_new)

[1]


In [80]:
if prediction_new[0] == 1:
    print("Hurrayy!! It is a Ham Email")
else:
    print("Oops!! It is a Spam Email")

Hurrayy!! It is a Ham Email
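To check many messages without repeating these cells, the transform, predict, and print steps could be wrapped in one helper. This is a minimal sketch: `classify_message` is a hypothetical name, and the tiny training set exists only to make the example self-contained. In the notebook, the fitted `feature_extraction` and `model` would be passed in instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def classify_message(text, vectorizer, classifier):
    """Return 'ham' or 'spam' for one message (encoding: 1 = ham, 0 = spam)."""
    features = vectorizer.transform([text])
    return "ham" if classifier.predict(features)[0] == 1 else "spam"


# Hypothetical stand-in for the fitted vectorizer and model
toy_messages = [
    "win a free prize now",
    "free entry claim your cash",
    "are we still meeting for lunch",
    "call me when you get home",
]
toy_labels = [0, 0, 1, 1]

toy_vectorizer = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)
toy_model = LogisticRegression()
toy_model.fit(toy_vectorizer.fit_transform(toy_messages), toy_labels)

result = classify_message("claim your free prize", toy_vectorizer, toy_model)
print(result)
```

For interactive use, the text could come from `input()` rather than a hard-coded string.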
