# Spam Comments Detection

Detecting spam comments is a text classification task in Machine Learning. Spam comments on social media platforms are comments posted to redirect the user to another social media account, website, or other piece of content.


To detect spam comments with Machine Learning, we need labelled data of spam comments.

##### Let’s start this task by importing the necessary Python libraries and the dataset:

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
df = pd.read_csv('Youtube_Psy.csv',sep = ',')
df[10:25]

| | COMMENT_ID | AUTHOR | DATE | CONTENT | CLASS |
|---|---|---|---|---|---|
| 10 | z13auhww3oufjn1qo04ci3grqqjmfjexxuo0k | Huckyduck | 2013-11-28T17:06:17 | Hey subscribe to me | 1 |
| 11 | z13xit5agm2zyh4f523rst2gowmbx5bml | Lone Twistt | 2013-11-28T17:34:55 | Once you have started reading do not stop. If... | 1 |
| 12 | z13pejoiuozwxtdu323dspopnri4xts0f | Archie Lewis | 2013-11-28T17:54:39 | https://twitter.com/GBphotographyGB | 1 |
| 13 | z121zxaxsq25z5k5o04ch1o5jqqfij3gtm40k | TheUploadaddict | 2013-11-28T18:12:12 | subscribe like comment | 1 |
| 14 | z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k | Francisco Nora | 2013-11-28T19:52:35 | please like :D https://premium.easypromosapp.c... | 1 |
| 15 | z13phrmwrkfisn5er22eyrbpbvaiwfvwf04 | Gaming and Stuff PRO | 2013-11-28T21:14:13 | Hello! Do you like gaming, art videos, scienti... | 1 |
| 16 | z13bgdvyluihfv11i22rgxwhuvabzz1os04 | Zielimeek21 | 2013-11-28T21:49:00 | I'm only checking the views | 0 |
| 17 | z13vxpnoxsyeuv2jr04cctprprb1slnxdf4 | OutrightIgnite | 2013-11-28T21:55:02 | http://www.ebay.com/itm/171183229277?ssPageNam... | 1 |
| 18 | z12qth5j0ob1fx3q404chvy4fz32tbkpllk0k | Tony K Frazier | 2013-11-28T23:57:13 | http://ubuntuone.com/40beUutVu2ZKxK4uTgPZ8K | 1 |
| 19 | z13etj0bclzfztuwc04cgfvrgmf3fvjor1g | Jose Renteria | 2013-11-29T00:22:01 | We are an EDM apparel company dedicated to bri... | 1 |


In [3]:
df.shape

(350, 5)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   COMMENT_ID  350 non-null    object
 1   AUTHOR      350 non-null    object
 2   DATE        350 non-null    object
 3   CONTENT     350 non-null    object
 4   CLASS       350 non-null    int64 
dtypes: int64(1), object(4)
memory usage: 13.8+ KB


In [5]:
df.isna().sum()

COMMENT_ID    0
AUTHOR        0
DATE          0
CONTENT       0
CLASS         0
dtype: int64

##### We only need the CONTENT and CLASS columns from the dataset for the rest of the task. So let’s select both columns and move on:

In [6]:
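# .copy() makes an independent DataFrame, so later assignments do not trigger SettingWithCopyWarning
data = df[['CONTENT','CLASS']].copy()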
data.sample(10)

| | CONTENT | CLASS |
|---|---|---|
| 184 | OPPA &lt;3 | 0 |
| 116 | Made in china.... | 0 |
| 265 | 9 year olds be like, 'How does this have 2 bil... | 0 |
| 205 | 2.126.521.750 views!!!!!!!!!!!!!!!!! | 0 |
| 298 | https://www.facebook.com/SchoolGeniusNITS/phot... | 1 |
| 140 | http://www.gcmforex.com/partners/aw.aspx?Task=... | 1 |
| 198 | OMG over 2 billion views! | 0 |
| 285 | If I knew Korean, this would be even funnier. ... | 0 |
| 222 | Is this the video that started the whole "got ... | 0 |
| 19 | We are an EDM apparel company dedicated to bri... | 1 |


##### The CLASS column contains the values 0 and 1, where 0 indicates not spam and 1 indicates spam. To make the output easier to read, I will replace 0 and 1 with the labels "Not Spam" and "Spam Comment":

In [7]:
data["CLASS"] = data["CLASS"].map({0: "Not Spam",
                                   1: "Spam Comment"})
print(data.sample(5))

                                               CONTENT         CLASS
49   thumbs up if u checked this video to see hw vi...      Not Spam
91   There is one video on my channel about my brot...  Spam Comment
6                            Subscribe to my channel ﻿  Spam Comment
114  Hey guys please check out my new Google+ page ...  Spam Comment
212                                Still the best. :D﻿      Not Spam
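Before training, it can help to check how many comments fall into each class, since a heavily imbalanced dataset makes accuracy harder to interpret. The snippet below is a minimal sketch on a made-up three-row DataFrame (`toy_df` and its contents are hypothetical, not rows from the dataset); it repeats the column selection and label mapping from the cells above and then counts the labels with `value_counts()`.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (values invented for illustration)
toy_df = pd.DataFrame({
    "AUTHOR": ["a", "b", "c"],
    "CONTENT": ["nice video", "subscribe to me", "love this song"],
    "CLASS": [0, 1, 0],
})

# Select the two columns as an independent copy, then map the numeric labels to text
toy = toy_df[["CONTENT", "CLASS"]].copy()
toy["CLASS"] = toy["CLASS"].map({0: "Not Spam", 1: "Spam Comment"})

# Number of comments per label
print(toy["CLASS"].value_counts())
```

Because `toy` is an explicit copy, the original `toy_df` keeps its numeric labels.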


##### Training a Classification Model

##### Now let’s train a Machine Learning model to classify comments as spam or not spam. This is a binary classification problem, and I will use the Bernoulli Naive Bayes algorithm, which models each word as simply present or absent in a comment:



In [8]:
x = np.array(data['CONTENT'])
y = np.array(data['CLASS'])

cv = CountVectorizer()
x = cv.fit_transform(x)
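To make the vectorization step more concrete, here is a small sketch of what `CountVectorizer` produces. The three comments are invented for illustration, and the `toy_` names are hypothetical. With its default settings, `CountVectorizer` lowercases the text and keeps tokens of two or more word characters; `BernoulliNB` with its default `binarize=0.0` then treats any count above zero as "word present".

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy comments, made up for illustration (not taken from the dataset)
toy_comments = [
    "subscribe to my channel",
    "subscribe subscribe please",
    "best song ever",
]

toy_cv = CountVectorizer()
toy_matrix = toy_cv.fit_transform(toy_comments)  # sparse matrix, one row per comment

# vocabulary_ maps each learned token to its column index
vocab = toy_cv.vocabulary_
print(sorted(vocab))
# → ['best', 'channel', 'ever', 'my', 'please', 'song', 'subscribe', 'to']

# "subscribe" appears twice in the second comment
print(toy_matrix[1, vocab["subscribe"]])  # → 2

# Roughly what BernoulliNB works with after binarization: present (1) or absent (0)
binary_view = (toy_matrix > 0).astype(int)
print(binary_view[1, vocab["subscribe"]])  # → 1
```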

In [9]:
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2,random_state = 42)

In [10]:
model = BernoulliNB()
model.fit(x_train,y_train)
# Note: this is the accuracy on the training split, not on the held-out test split
print(model.score(x_train,y_train))

0.9785714285714285
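The score above is computed on the same rows the model was trained on, so it tends to be optimistic. In addition, the vectorizer was fitted on all comments before the split, so the vocabulary already includes words from the test rows. One possible way to get a cleaner estimate is sketched below: a scikit-learn pipeline that fits the vectorizer on the training texts only and reports accuracy on both splits. The `texts` and `labels` lists are hypothetical stand-ins; with the real data you would pass the `CONTENT` and `CLASS` columns instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data (invented for illustration)
texts = [
    "subscribe to my channel", "check out my new page",
    "free gift cards click here", "please like and subscribe",
    "best song ever", "still the best",
    "over 2 billion views", "this video is so funny",
]
labels = ["Spam Comment"] * 4 + ["Not Spam"] * 4

# Split the raw texts first, keeping the class ratio the same in both splits
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits CountVectorizer on the training texts only
clf = make_pipeline(CountVectorizer(), BernoulliNB())
clf.fit(texts_train, labels_train)

train_acc = clf.score(texts_train, labels_train)  # accuracy on seen data
test_acc = clf.score(texts_test, labels_test)     # accuracy on held-out data
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

On a dataset this small the numbers mean little; the point is only the order of operations (split first, then fit) and reporting the held-out score alongside the training score.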


##### Now let’s test the model by giving spam and not spam comments as input:

In [11]:
sample = "Check this out: https://thecleverprogrammer.com/"
# Use a separate name so the `data` DataFrame is not overwritten
features = cv.transform([sample]).toarray()
print(model.predict(features))

['Spam Comment']


In [12]:
sample = 'http://woobox.com/33gxrf/brt0u5 FREE CS GO!!!!﻿'
features = cv.transform([sample]).toarray()
print(model.predict(features))

['Spam Comment']


In [13]:
sample = 'just checking the views﻿'
features = cv.transform([sample]).toarray()
print(model.predict(features))

['Not Spam']
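The three cells above repeat the same transform-and-predict steps, so they could be wrapped in a small helper. The sketch below assumes a fitted `CountVectorizer` and `BernoulliNB`; the function name `classify_comment` and the toy training data are hypothetical, and a tiny model is trained inline only so that the example runs on its own. It also returns the class probabilities from `predict_proba`, which show how confident the model is.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

def classify_comment(text, vectorizer, classifier):
    """Return (predicted label, {label: probability}) for a single comment."""
    # BernoulliNB accepts the sparse matrix directly, so .toarray() is optional
    features = vectorizer.transform([text])
    label = classifier.predict(features)[0]
    probs = classifier.predict_proba(features)[0]
    return label, dict(zip(classifier.classes_, probs))

# Toy training data, invented for illustration (not the real dataset)
toy_texts = [
    "subscribe to my channel", "check out my website",
    "best song ever", "just checking the views",
]
toy_labels = ["Spam Comment", "Spam Comment", "Not Spam", "Not Spam"]

toy_cv = CountVectorizer()
toy_model = BernoulliNB().fit(toy_cv.fit_transform(toy_texts), toy_labels)

label, probs = classify_comment("please subscribe to my channel", toy_cv, toy_model)
print(label, probs)
```

With the notebook's own objects, the call would be `classify_comment(sample, cv, model)`.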


##### So this is how you can train a Machine Learning model for spam detection using Python.

##### Summary
Spam comment detection means classifying comments as spam or not spam. Spam comments on social media platforms are comments posted to redirect the user to another social media account, website, or other piece of content.