# Multilayer Perceptron (MLP) Classifier

In [1]:
import pandas as pd
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
import seaborn as sb

import warnings
from collections import Counter
import datetime


from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix

def accuracy(cm):
    """Accuracy from a confusion matrix: correct predictions (the diagonal) over all predictions."""
    diagonal_sum = cm.trace()
    sum_of_all_elements = cm.sum()
    return diagonal_sum / sum_of_all_elements
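As a quick sanity check, the trace-over-sum `accuracy` helper should agree with scikit-learn's `accuracy_score`, because the diagonal of a confusion matrix counts the correctly classified samples. A minimal sketch on made-up labels (the arrays are illustrative, not taken from the dataset):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def accuracy(cm):
    """Fraction of correct predictions: diagonal (hits) over all entries."""
    return np.trace(cm) / cm.sum()


# Illustrative labels only (e.g. number of days trended), not real data
y_true = np.array([1, 2, 2, 3, 3, 3, 4, 4])
y_pred = np.array([1, 2, 3, 3, 3, 1, 4, 2])

# scikit-learn convention: rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_pred)
print(accuracy(cm), accuracy_score(y_true, y_pred))
```

Because transposing a matrix does not change its trace, this metric comes out the same whichever order the labels are passed in, but `confusion_matrix(y_true, y_pred)` is the order scikit-learn documents.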

# Importing cleaned dataset with relevant features

This dataset includes only videos from 2020 onwards, with features such as the video category, views, likes and comment count.

In [2]:
df = pd.read_csv("Dataset.csv")

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21505 entries, 0 to 21504
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   title                      21505 non-null  object
 1   publishedAt                21505 non-null  object
 2   channelTitle               21505 non-null  object
 3   categoryId                 21505 non-null  int64 
 4   trending_date              21505 non-null  object
 5   tags                       21505 non-null  object
 6   view_count                 21505 non-null  int64 
 7   likes                      21505 non-null  int64 
 8   dislikes                   21505 non-null  int64 
 9   comment_count              21505 non-null  int64 
 10  comments_disabled          21505 non-null  bool  
 11  ratings_disabled           21505 non-null  bool  
 12  description                20801 non-null  object
 13  Number_of_days_it_trended  21505 non-null  int64 
dtypes: bool(2), int64(6), object(6)

In [4]:
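dataset = df[['categoryId', 'view_count', 'likes', 'comment_count']]
# Select the target as a 1-D Series; a one-column DataFrame makes sklearn emit a DataConversionWarning
target = df['Number_of_days_it_trended']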

Split the prepared data randomly into training and test sets at a 3:1 ratio (`test_size=0.25`).
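`test_size=0.25` holds out a quarter of the rows, which gives the 3:1 split. The sketch below uses a synthetic 100-row frame (purely illustrative) to show the resulting sizes. It also passes `random_state`, which the notebook does not use, so the split is reproducible across runs, and `stratify`, which keeps the class proportions similar in both halves; both are optional additions, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target (illustrative only)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "categoryId": rng.integers(1, 30, 100),
    "view_count": rng.integers(1_000, 1_000_000, 100),
})
y = pd.Series(rng.integers(1, 5, 100), name="Number_of_days_it_trended")

# 25% test / 75% train, reproducible and stratified by class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

print(len(X_train), len(X_test))
```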

In [5]:
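# Three hidden layers of 150, 100 and 50 ReLU units, trained with the Adam solver
classifier = MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=300, activation='relu', solver='adam', random_state=1)
# Hold out 25% of rows for testing (3:1 split); no random_state here, so the split changes on every run
dataset_train, dataset_test, target_train, target_test = train_test_split(dataset, target, test_size=0.25)
classifier.fit(dataset_train, target_train)
target_test_predicted = classifier.predict(dataset_test)
# sklearn's convention is confusion_matrix(y_true, y_pred)
cm = confusion_matrix(target_test, target_test_predicted)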

#Printing the accuracy
print("Accuracy of MLPClassifier : ",round(accuracy(cm)*100, 1), '\b% (rounded to 1 decimal place)')

Accuracy of MLPClassifier :  25.1 % (rounded to 1 decimal place)


We observe that the accuracy of this model on this dataset is low, at 25.1%. This may be because the target spread is large: the number of days a video trended ranges from 1 to 47, so the classifier has to choose among many classes. We will therefore attempt to reduce the spread by filtering the dataset to include only videos that trended for fewer than 15 days.
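Whether 25% counts as low depends on how unevenly the days-trended labels are distributed. One way to check, sketched here on hypothetical labels, is to count the classes with `collections.Counter` (already imported in the first cell) and compare against a majority-class baseline built with scikit-learn's `DummyClassifier`, which always predicts the most frequent training class. On the real data, `target_train` and `target_test` would replace the toy arrays.

```python
from collections import Counter

import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical days-trended labels; swap in the real train/test targets
y_train = np.array([1, 1, 1, 2, 2, 3, 5, 7, 1, 2])
y_test = np.array([1, 2, 1, 3, 1])

# The dummy model ignores features, so placeholder columns are enough
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

# Most common classes in the training labels
print(Counter(y_train.tolist()).most_common(3))

# Baseline: always predict the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Majority-class baseline accuracy:", baseline.score(X_test, y_test))
```

If the MLP does not clearly beat this baseline, the features are adding little beyond the class imbalance.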

In [6]:
df_15days = df[df['Number_of_days_it_trended'] < 15]
dataset_15days = df_15days[['categoryId','view_count', 'likes', 'comment_count']]
target_15days = df_15days[['Number_of_days_it_trended']]

In [7]:
classifier = MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=300, activation='relu', solver='adam', random_state=1)
dataset_train, dataset_test, target_train, target_test = train_test_split(dataset_15days, target_15days, test_size=0.25)
classifier.fit(dataset_train, target_train)
target_test_predicted = classifier.predict(dataset_test)
cm = confusion_matrix(target_test, target_test_predicted)  # (y_true, y_pred) order

#Printing the accuracy
print("Accuracy of MLPClassifier : ",round(accuracy(cm)*100, 1), '\b% (rounded to 1 decimal place)')

Accuracy of MLPClassifier :  21.9 % (rounded to 1 decimal place)


Even after reducing the target spread, the accuracy remains very low at 21.9%, slightly below the 25.1% obtained on the full dataset. We will therefore reduce the target spread further, to videos that trended for fewer than 10 days.

In [8]:
df_10days = df[df['Number_of_days_it_trended'] < 10]
dataset_10days = df_10days[['categoryId','view_count', 'likes', 'comment_count']]
target_10days = df_10days[['Number_of_days_it_trended']]

In [9]:
classifier = MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=300, activation='relu', solver='adam', random_state=1)
dataset_train, dataset_test, target_train, target_test = train_test_split(dataset_10days, target_10days, test_size=0.25)
classifier.fit(dataset_train, target_train)
target_test_predicted = classifier.predict(dataset_test)
cm = confusion_matrix(target_test, target_test_predicted)  # (y_true, y_pred) order

#Printing the accuracy
print("Accuracy of MLPClassifier : ",round(accuracy(cm)*100, 1), '\b% (rounded to 1 decimal place)')

Accuracy of MLPClassifier :  26.3 % (rounded to 1 decimal place)


Accuracy now rises slightly to 26.3%, better than on the previous two datasets. Next, we restrict the target to videos that trended for fewer than 5 days.
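The filter, split, fit and score steps are repeated verbatim for each cut-off. Below is a hedged sketch of a helper that runs them in a loop; `evaluate_threshold` is a hypothetical name, and synthetic data stands in for `Dataset.csv`. Because fewer classes also raise chance-level accuracy, the sketch reports the majority-class baseline (`DummyClassifier`) next to the MLP score, so each cut-off is compared against its own baseline. A smaller network than the notebook's is used only to keep the demo fast.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

FEATURES = ["categoryId", "view_count", "likes", "comment_count"]
TARGET = "Number_of_days_it_trended"


def evaluate_threshold(df, max_days, seed=1):
    """Hypothetical helper: keep rows with TARGET < max_days and
    return (MLP test accuracy, majority-class baseline accuracy)."""
    subset = df[df[TARGET] < max_days]
    X_train, X_test, y_train, y_test = train_test_split(
        subset[FEATURES], subset[TARGET], test_size=0.25, random_state=seed
    )
    mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200, random_state=seed)
    mlp.fit(X_train, y_train)
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    return mlp.score(X_test, y_test), baseline.score(X_test, y_test)


# Synthetic stand-in data (illustrative only)
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "categoryId": rng.integers(1, 30, n),
    "view_count": rng.integers(1_000, 100_000, n),
    "likes": rng.integers(10, 5_000, n),
    "comment_count": rng.integers(0, 500, n),
    TARGET: rng.integers(1, 20, n),
})

for max_days in (15, 10, 5):
    mlp_acc, base_acc = evaluate_threshold(demo, max_days)
    print(f"< {max_days} days: MLP {mlp_acc:.1%}, majority baseline {base_acc:.1%}")
```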

In [10]:
df_5days = df[df['Number_of_days_it_trended'] < 5]
dataset_5days = df_5days[['categoryId','view_count', 'likes', 'comment_count']]
target_5days = df_5days[['Number_of_days_it_trended']]

In [11]:
classifier = MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=300, activation='relu', solver='adam', random_state=1)
dataset_train, dataset_test, target_train, target_test = train_test_split(dataset_5days, target_5days, test_size=0.25)
classifier.fit(dataset_train, target_train)
target_test_predicted = classifier.predict(dataset_test)
cm = confusion_matrix(target_test, target_test_predicted)  # (y_true, y_pred) order

#Printing the accuracy
print("Accuracy of MLPClassifier : ",round(accuracy(cm)*100, 1), '\b% (rounded to 1 decimal place)')

Accuracy of MLPClassifier :  61.6 % (rounded to 1 decimal place)


We see a major improvement in accuracy, to 61.6%, after limiting the target to videos that trended for fewer than 5 days (1 to 4 days).
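One factor the notebook does not address: `MLPClassifier` is sensitive to feature scale, and raw `view_count` and `likes` values are orders of magnitude larger than `categoryId`. `StandardScaler` is imported in the first cell but never used. The sketch below, assuming you want to try it, wraps the scaler and the same MLP architecture in a `Pipeline` so the scaler is fit on the training split only. It runs on synthetic data; it has not been run on the real dataset, so it makes no claim about how scaling would change the accuracies reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only)
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(1, 30, n),             # categoryId-like
    rng.integers(1_000, 5_000_000, n),  # view_count-like
    rng.integers(10, 200_000, n),       # likes-like
    rng.integers(0, 20_000, n),         # comment_count-like
]).astype(float)
y = rng.integers(1, 5, n)  # days trended, 1 to 4

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# The pipeline fits the scaler on X_train only, then reuses it on X_test
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=300, random_state=1),
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# After scaling, each training column has mean ~0 and standard deviation ~1
scaled = model.named_steps["standardscaler"].transform(X_train)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```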

# Conclusion

Predicting how long a video stays trending takes more than looking at its views, comments, category and likes. The most important factors in making a YouTube video go viral may be its thumbnail and its content, which we were unable to use to train our model. Our model's ability to predict how long a video maintains its trending status is therefore limited.