<a href="https://colab.research.google.com/github/rcdbe/sma-online/blob/master/day-2/python/Detecting_bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bot Detection

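Machine learning can identify malicious, automated behavior. Commercial tools such as Imperva Bot Management, for example, collect and analyze data about each device's behavior as it browses a website. This notebook applies the same idea to Twitter accounts: it trains a decision tree classifier on account metadata (follower, friend, list, favourite, and status counts, plus verification status) to separate bots from non-bots, then uses the trained model to label new accounts.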

Dataset Source: https://github.com/RohanBhirangi/Twitter-Bot-Detection

**Install and Import Libraries**

In [0]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn.metrics as metrics
import warnings
warnings.filterwarnings('ignore')

## a. Modeling the Bot Detector

**Import Dataset**

In [0]:
# Import Dataset
df_bot = pd.read_csv('https://raw.githubusercontent.com/rcdbe/sma-online/master/day-2/python/data/data_bot.csv', sep = ';')
df_bot.head()

**Set Feature and Target**

In [0]:
# Select the Required Column
df_bot_feature = df_bot[['followers_count',
                         'friends_count',
                         'listed_count',
                         'favourites_count',
                         'statuses_count',
                         'verified']]
df_bot_feature.head()

In [0]:
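# Encode the 'verified' column as integer category codes.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_bot_feature = df_bot_feature.copy()
df_bot_feature["verified"] = df_bot_feature["verified"].astype('category').cat.codes
df_bot_feature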

In [0]:
# Set Target
target = df_bot['bot']
target

**Set Training and Testing Data**

In [0]:
# Set Training and Testing Data (70:30)
from sklearn.model_selection import train_test_split, cross_val_score
feature_train, feature_test, target_train, target_test = train_test_split(df_bot_feature , target, shuffle = True, test_size=0.3, random_state=1)

# Show the Training and Testing Data
print(feature_train.shape)
print(feature_test.shape)
print(target_train.shape)
print(target_test.shape)

**Construct Decision Tree Classifier**

In [0]:
# Import library
from sklearn import tree

# Train Decision Tree
dtree = tree.DecisionTreeClassifier(min_impurity_decrease=0.01)
dtree.fit(feature_train, target_train)
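
The `min_impurity_decrease=0.01` setting limits how far the tree grows: a node is split only if the split lowers the weighted impurity by at least that amount. As a rough illustration, the sketch below computes that quantity in plain Python for Gini impurity, following the weighted formula given in the scikit-learn documentation. The label lists and the helper names `gini` and `impurity_decrease` are made up for this example and are not part of the notebook's data or of scikit-learn.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease of splitting `parent` into `left` and `right`.

    n_total is the number of samples in the whole training set, so splits in
    small nodes deep in the tree contribute proportionally less.
    """
    n_t = len(parent)
    children = (len(left) / n_t) * gini(left) + (len(right) / n_t) * gini(right)
    return (n_t / n_total) * (gini(parent) - children)

# Hypothetical labels (0 = nonbot, 1 = bot) at the root node and its two children
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]
right = [0, 1, 1, 1]

decrease = impurity_decrease(parent, left, right, n_total=len(parent))
print("Impurity decrease:", decrease)
print("Split kept at threshold 0.01:", decrease >= 0.01)
```

Raising the threshold yields a smaller, more easily readable tree, at the possible cost of missing weaker but real patterns.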

In [0]:
# Visualize Tree
# Note: sklearn.externals.six was removed from scikit-learn; use the standard library's StringIO.
# pydotplus also requires the Graphviz binaries to be installed.
from io import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus

dot_data = StringIO()
export_graphviz(dtree, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True,
                class_names=['nonbot', 'bot'],
                feature_names=['followers', 'friends', 'listed', 'favourites', 'statuses', 'verified'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

In [0]:
# Predict on the Test Data
target_predicted = dtree.predict(feature_test)
target_predicted

**Evaluate the Tree's Accuracy**

In [0]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Rows of the confusion matrix are true classes; columns are predicted classes
results = confusion_matrix(target_test, target_predicted)
print('Confusion Matrix:')
print(results)
print('Accuracy Score:', accuracy_score(target_test, target_predicted))
print('Classification Report:')
print(classification_report(target_test, target_predicted))
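
To make the numbers in the report easier to read, the following sketch derives accuracy, precision, recall, and F1 for the bot class by hand from a 2x2 confusion matrix. The counts are invented for illustration and are not results from this notebook; the layout (rows are true classes, columns are predicted classes, labels ordered 0 then 1) matches what `confusion_matrix` returns.

```python
# Illustrative counts only; NOT the output of the model above.
# Layout for labels [0, 1]:
#   [[true negatives,  false positives],
#    [false negatives, true positives]]
cm = [[50, 10],
      [5, 35]]
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tp + tn) / total      # share of all accounts classified correctly
precision = tp / (tp + fp)        # of accounts flagged as bots, how many are bots
recall = tp / (tp + fn)           # of actual bots, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", round(accuracy, 3))
print("precision:", round(precision, 3))
print("recall   :", round(recall, 3))
print("f1       :", round(f1, 3))
```

Accuracy alone can look good when one class dominates, which is why the per-class precision and recall in the classification report are worth checking as well.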

In [0]:
# Set plotting defaults
plt.rcParams['figure.figsize'] = (10, 10)
plt.style.use('ggplot')

# Visualize ROC Curve
# Column 1 of predict_proba holds the predicted probability of the positive class (bot)
target_predicted_prob = dtree.predict_proba(feature_test)[:, 1]
fp_rate, tp_rate, _ = metrics.roc_curve(target_test, target_predicted_prob)
auc = metrics.roc_auc_score(target_test, target_predicted_prob)
plt.plot(fp_rate, tp_rate, label="Decision Tree, AUC = {:.3f}".format(auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend(loc=4)
plt.show()
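
One way to interpret the AUC value is as the probability that a randomly chosen bot receives a higher score than a randomly chosen non-bot, with ties counted as half. The sketch below computes it that way in plain Python on a toy example; the helper name `pairwise_auc` and the scores are hypothetical and unrelated to the notebook's data.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    so this is for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels (1 = bot) and made-up predicted probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = pairwise_auc(y_true, scores)
print("Pairwise AUC:", auc_value)

# A model that gives every account the same score is no better than chance
tied_auc = pairwise_auc([0, 1], [0.5, 0.5])
print("AUC with tied scores:", tied_auc)
```

A shallow decision tree outputs only a handful of distinct probabilities (one per leaf), so ties are common and the ROC curve has only a few corner points.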

## b. Detecting New Data

**Define Model**

In [0]:
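# Refit the tree on the training split so it is ready to score new data.
# Fitting on the DataFrame (rather than .values) keeps the feature names
# consistent with the DataFrame passed to predict() later.
predbot = dtree.fit(feature_train, target_train)
predbot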

**Import New Dataset with Unknown Class**

In [0]:
# Import New Dataset
df_new = pd.read_csv('https://raw.githubusercontent.com/rcdbe/sma-online/master/day-2/python/data/new_data.csv', sep =';', encoding="utf-8")
df_new.head()

In [0]:
# Select Required Columns
df_new_feature = df_new[['followers_count',
                          'friends_count',
                          'listed_count',
                          'favourites_count',
                          'statuses_count',
                          'verified']]
df_new_feature.head()

In [0]:
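# Encode 'verified' the same way as for the training data.
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning.
df_new_feature = df_new_feature.copy()
df_new_feature["verified"] = df_new_feature["verified"].astype('category').cat.codes
df_new_feature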
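
One caveat with encoding the new data separately: `cat.codes` numbers whatever categories happen to be present, so a batch that contains only one value of `verified` can receive different codes than the training data did. The sketch below shows the effect on toy Series and one possible remedy, fixing the category order with `pd.CategoricalDtype`. It assumes the column holds booleans, which the source does not confirm; the variable names are illustrative.

```python
import pandas as pd

# Toy data, assuming 'verified' is a boolean column (an assumption)
train_verified = pd.Series([False, True, False])
new_verified = pd.Series([True, True])  # hypothetical batch with only verified accounts

# Inferring categories per batch: True becomes code 0 because it is the only category
naive_codes = new_verified.astype('category').cat.codes.tolist()

# Declaring the categories once gives the same codes for every batch
verified_dtype = pd.CategoricalDtype(categories=[False, True])
train_codes = train_verified.astype(verified_dtype).cat.codes.tolist()
fixed_codes = new_verified.astype(verified_dtype).cat.codes.tolist()

print("naive new-data codes:", naive_codes)
print("training codes      :", train_codes)
print("fixed new-data codes:", fixed_codes)
```

If the new file always contains both verified and unverified accounts, the two approaches agree; the explicit dtype simply removes the dependence on that.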

**Predict New Data**

In [0]:
# Predict New Data
# Reuse df_new's index so the predictions line up row by row when concatenated later
df_new_prediction = pd.DataFrame(dtree.predict(df_new_feature), columns=['predicted_bot'], index=df_new.index)
df_new_prediction.head()

**Show Prediction Result**

In [0]:
# Show Prediction Result
pred_result = pd.concat([df_new, df_new_prediction], axis=1)
pred_result.head()

In [0]:
# Save Prediction Result
pred_result.to_csv('bot_prediction.csv', index=False)