# TrendTracker: Exploring Trending YouTube Videos in Canada

### Feature Engineering

In this notebook, I create new columns (features) that may help the model.

### Import Libraries and Data

In [34]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

In [35]:
# File name represents most recent update
file = r'/Users/OliverPan/Desktop/youtube_data/trend_12-03-20.csv'

In [36]:
trends = pd.read_csv(file)

In [37]:
# Display floats with five decimal places
pd.options.display.float_format = '{:.5f}'.format

In [38]:
# Keep only the date part (first 10 characters) of trending_date and publishedAt, then convert to datetime
trends['trending_date'] = trends['trending_date'].str[0:10]
trends['publishedAt'] = trends['publishedAt'].str[0:10]
trends['trending_date'] = pd.to_datetime(trends['trending_date'])
trends['publishedAt'] = pd.to_datetime(trends['publishedAt'])
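The cell above keeps the first ten characters of each string before parsing. As a minimal sketch of an alternative, assuming the raw columns hold ISO 8601 timestamps such as `2020-08-11T19:20:14Z` (an assumption, since the raw values are not shown here), the full timestamp can be parsed first and then truncated to the date. The `raw` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking the assumed raw timestamp format
raw = pd.DataFrame({
    'publishedAt': ['2020-08-11T19:20:14Z', '2020-08-12T03:05:00Z'],
    'trending_date': ['2020-08-12T00:00:00Z', '2020-08-14T00:00:00Z'],
})

for col in ['publishedAt', 'trending_date']:
    # Parse the full timestamp as UTC, drop the timezone, then floor to midnight
    parsed = pd.to_datetime(raw[col], utc=True)
    raw[col] = parsed.dt.tz_localize(None).dt.normalize()

print(raw.dtypes)
print(raw)
```

Parsing before truncating fails loudly on malformed values, whereas string slicing would silently pass along whatever the first ten characters happen to be.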

In [39]:
# Mark missing values (e.g. videos without a description) with the sentinel string 'Empty'
trends.fillna('Empty', inplace = True)

### Feature 1: Likes to Dislikes Ratio

In [40]:
trends['likes_to_dislikes'] = trends['likes'] / trends['dislikes']

In [41]:
trends['likes_to_dislikes'].describe()

count   22248.00000
mean       80.11951
std        83.58158
min         0.13961
25%        30.72792
50%        58.27218
75%        99.91116
max      1659.80000
Name: likes_to_dislikes, dtype: float64

In [42]:
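# Look up the row with the highest ratio (avoids comparing against a hard-coded float)
trends[trends['likes_to_dislikes'] == trends['likes_to_dislikes'].max()]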

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description,likes_to_dislikes
14088,NEpKlsUw9XU,ghost + guest 👻🎶,2020-10-19,UCdkkQvJoB0kGgYHCYwSkdww,Louie Zong,1,2020-10-21,[None],188256,66392,40,2961,https://i.ytimg.com/vi/NEpKlsUw9XU/default.jpg,False,False,yes! they are back! and with a new friend!,1659.8


This video has an exceptionally high ratio (66,392 likes to 40 dislikes), but its view count (188,256) is not very high compared to other trending videos.
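The `describe()` count for the ratio (22248) is lower than the 22400 rows reported for `time_to_trend` later on, which suggests some ratios are missing, most likely rows where `dislikes` is 0. Here is a minimal sketch of how pandas handles that division, using a hypothetical toy frame rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the last two rows have zero dislikes
toy = pd.DataFrame({
    'likes':    [66392, 1535, 100, 0],
    'dislikes': [40, 650, 0, 0],
})

# Plain division: x/0 becomes inf and 0/0 becomes NaN (no exception is raised)
plain = toy['likes'] / toy['dislikes']

# Treat a zero denominator as missing so summary statistics stay finite
safe = toy['likes'] / toy['dislikes'].replace(0, np.nan)

print(plain.tolist())
print(safe.tolist())
```

Whether zero-dislike rows should be missing, capped, or smoothed (for example by adding 1 to the denominator) is a modelling choice; the sketch only shows the missing-value option.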

### Feature 2: Time to trending

In [43]:
# How long does it generally take for a video to become viral
trends['time_to_trend'] = trends['trending_date'] - trends['publishedAt']

In [44]:
trends['time_to_trend'].describe()

count                     22400
mean     3 days 11:04:08.142857
std      2 days 02:47:55.029928
min             0 days 00:00:00
25%             2 days 00:00:00
50%             3 days 00:00:00
75%             5 days 00:00:00
max            34 days 00:00:00
Name: time_to_trend, dtype: object

In [45]:
# Quick look at the row with the longest gap (34 days)
trends[trends['time_to_trend'] == pd.Timedelta(days=34)]

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description,likes_to_dislikes,time_to_trend
20995,Z-pdaEa2XC8,iPad Air — Boiiing,2020-10-23,UCE5_hf5ONW6_qy9ShfYfB4w,Apple Canada,28,2020-11-26,apple|apple ipad|ipad|ipad air|apple ipad air|...,1826775,1535,650,0,https://i.ytimg.com/vi/Z-pdaEa2XC8/default.jpg,True,False,Introducing iPad Air. Featuring an all-screen ...,2.36154,34 days


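This is very interesting: the Apple video was on the trending list 34 days after it was published, on November 26, 2020, which was US Thanksgiving and the day before Black Friday. Since each row is one trending day, this is not necessarily its first day on the list. It may say something about how technology is marketed around the holidays; the iPad may also have gained traction around Black Friday (maybe there was a deal).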
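As a small sketch of the date arithmetic behind this feature, the toy frame below reuses three date pairs from the outputs above (the frame itself and the `days_to_trend` column name are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows using dates that appear in the outputs above
toy = pd.DataFrame({
    'publishedAt':   pd.to_datetime(['2020-10-23', '2020-10-19', '2020-08-11']),
    'trending_date': pd.to_datetime(['2020-11-26', '2020-10-21', '2020-08-12']),
})

# Subtracting two datetime columns yields a timedelta column
toy['time_to_trend'] = toy['trending_date'] - toy['publishedAt']

# Whole days as integers, convenient for arithmetic and grouping
toy['days_to_trend'] = toy['time_to_trend'].dt.days

# Compare against an explicit Timedelta rather than a string
slowest = toy[toy['time_to_trend'] == pd.Timedelta(days=34)]
print(toy)
print(slowest)
```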

### Feature 3: YouTube Link

In [46]:
# video_id is the identifier in the watch URL, so the full link can be built from it
trends['youtube_link'] = 'https://www.youtube.com/watch?v=' + trends['video_id']

### Feature 4: No Description

In [47]:
# 1 if the description was missing (filled with 'Empty' earlier), else 0
trends['no_description'] = (trends['description'] == 'Empty').astype(int)

In [48]:
trends[trends['no_description'] == 1].head()

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description,likes_to_dislikes,time_to_trend,youtube_link,no_description
90,yyXGTANms4g,BENEFIT PAPA,2020-08-11,UCxAkiUiL3KnMl0BIFBv6xxA,Mr Macaroni,24,2020-08-12,[None],121898,8795,60,1205,https://i.ytimg.com/vi/yyXGTANms4g/default.jpg,False,False,Empty,146.58333,1 days,https://www.youtube.com/watch?v=yyXGTANms4g,1
222,NSuaUok-wTY,[1147] Locksmith Says My Videos Are BS... Lose...,2020-08-12,UCm9K6rby98W8JigLoZOh6FQ,LockPickingLawyer,27,2020-08-13,Lock|picking,422963,62940,168,7301,https://i.ytimg.com/vi/NSuaUok-wTY/default.jpg,False,False,Empty,374.64286,1 days,https://www.youtube.com/watch?v=NSuaUok-wTY,1
317,yyXGTANms4g,BENEFIT PAPA,2020-08-11,UCxAkiUiL3KnMl0BIFBv6xxA,Mr Macaroni,24,2020-08-13,[None],150314,9737,65,1282,https://i.ytimg.com/vi/yyXGTANms4g/default.jpg,False,False,Empty,149.8,2 days,https://www.youtube.com/watch?v=yyXGTANms4g,1
445,NSuaUok-wTY,[1147] Locksmith Says My Videos Are BS... Lose...,2020-08-12,UCm9K6rby98W8JigLoZOh6FQ,LockPickingLawyer,27,2020-08-14,Lock|picking,511773,69611,214,7837,https://i.ytimg.com/vi/NSuaUok-wTY/default.jpg,False,False,Empty,325.28505,2 days,https://www.youtube.com/watch?v=NSuaUok-wTY,1
542,yyXGTANms4g,BENEFIT PAPA,2020-08-11,UCxAkiUiL3KnMl0BIFBv6xxA,Mr Macaroni,24,2020-08-14,[None],173118,10486,74,1343,https://i.ytimg.com/vi/yyXGTANms4g/default.jpg,False,False,Empty,141.7027,3 days,https://www.youtube.com/watch?v=yyXGTANms4g,1
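The flag above relies on the sentinel string `'Empty'` written by the earlier `fillna` call. A minimal sketch of an alternative, on a hypothetical toy frame, is to flag missing values with `isna()` before filling, so a description that literally reads "Empty" is not mistaken for a missing one:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing description
toy = pd.DataFrame({
    'video_id': ['a1', 'b2', 'c3'],
    'description': ['Official video', np.nan, 'Empty'],
})

# Flag missing values directly, before any fillna() call
toy['no_description'] = toy['description'].isna().astype(int)

# Then fill only the text column, leaving other columns untouched
toy['description'] = toy['description'].fillna('Empty')
print(toy)
```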


### Feature 5: Views Per Date Until Trending

This feature rests on an assumption: that a video gains views linearly between its publish date and the day it trends. Under that assumption, views per day lets me compare how popular trending videos are relative to one another, and later explore which features influence that.

In [49]:
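# Convert the timedelta to a number of whole days (kept as float to match the output below).
# Recent pandas versions may reject .astype('timedelta64[D]'), so use .dt.days instead.
trends['time_to_trend'] = trends['time_to_trend'].dt.days.astype(float)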

In [50]:
# Ten most common values of days until trending, with the number of rows for each
time_to_trend = trends['time_to_trend'].value_counts().to_frame().reset_index().head(10)
time_to_trend.columns = ['days', 'num_rows']
time_to_trend

Unnamed: 0,days,num_rows
0,4.0,4147
1,3.0,4141
2,2.0,3860
3,5.0,3408
4,1.0,3399
5,6.0,1897
6,7.0,703
7,0.0,485
8,8.0,194
9,9.0,52


In [51]:
# Create the feature
trends['views_per_date'] = trends['view_count'] / trends['time_to_trend']

Each video's views are now expressed per day, rather than as a total accumulated over a varying number of days (see the distribution above).

In [52]:
trends[trends['time_to_trend'] != 0]['view_count'].describe()

count       21915.00000
mean      2508591.40703
std       7150149.56437
min         47714.00000
25%        424022.50000
50%        924406.00000
75%       2173876.50000
max     232649205.00000
Name: view_count, dtype: float64

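I filter out rows with time_to_trend = 0 because dividing by zero days makes views_per_date infinite (pandas returns inf here rather than raising a ZeroDivisionError). Note that the summary above describes view_count, not views_per_date, for the remaining 21,915 rows.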
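Two possible ways to handle the zero-day rows when building the feature itself are sketched below on a hypothetical toy frame. The `views_per_date_incl` column is an illustrative name, and the second option is a different definition of the feature, not the one used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the second row trended on its publish date
toy = pd.DataFrame({
    'view_count': [9140911, 500000, 1826775],
    'time_to_trend': [1, 0, 34],
})

# Option 1: mask zero-day rows so the result is NaN instead of inf
days = toy['time_to_trend'].replace(0, np.nan)
toy['views_per_date'] = toy['view_count'] / days

# Option 2 (a different definition): count the publish day itself as day one
toy['views_per_date_incl'] = toy['view_count'] / (toy['time_to_trend'] + 1)

print(toy)
```

Option 1 keeps the original definition and leaves same-day videos missing; option 2 keeps every row but lowers the per-day value for all videos.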

### Feature 6: Popularity Score

This is going to be an interesting feature. We don't have a dataset with subscriber counts per YouTuber, which matters because a trending video may be more popular if its creator is popular. As a proxy, I count how many times each channel appears in the trending data and score YouTubers by that count. Note that a video trending on several days contributes one row per day, so the count reflects trending appearances over time rather than distinct videos.

In [54]:
channels_grouped = trends.groupby(['channelTitle'])['title'].count().reset_index().sort_values(by=['title'])
channels_grouped.columns = ['channelTitle', 'num_trending']
channels_grouped

Unnamed: 0,channelTitle,num_trending
1843,Verizon,1
895,Kurt Hugo Schneider,1
425,Demi Adejuyigbe,1
701,Hoovies Garage,1
1682,The Globe and Mail,1
...,...,...
769,James Charles,88
377,DAZN Canada,89
1146,MrBeast,93
1147,MrBeast Gaming,103


In [55]:
# Left join to trends
trends = pd.merge(trends, channels_grouped, how = 'left', on = 'channelTitle')
trends.head()

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,...,thumbnail_link,comments_disabled,ratings_disabled,description,likes_to_dislikes,time_to_trend,youtube_link,no_description,views_per_date,num_trending
0,KX06ksuS6Xo,Diljit Dosanjh: CLASH (Official) Music Video |...,2020-08-11,UCZRdNleCgW-BGUJf-bbjzQg,Diljit Dosanjh,10,2020-08-12,clash diljit dosanjh|diljit dosanjh|diljit dos...,9140911,296541,...,https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg,False,False,CLASH official music video performed by DILJIT...,47.98398,1.0,https://www.youtube.com/watch?v=KX06ksuS6Xo,0,9140911.0,18
1,J78aPJ3VyNs,I left youtube for a month and THIS is what ha...,2020-08-11,UCYzPXprvl5Y-Sf0g4vX-m6g,jacksepticeye,24,2020-08-12,jacksepticeye|funny|funny meme|memes|jacksepti...,2038853,353797,...,https://i.ytimg.com/vi/J78aPJ3VyNs/default.jpg,False,False,I left youtube for a month and this is what ha...,134.62595,1.0,https://www.youtube.com/watch?v=J78aPJ3VyNs,0,2038853.0,11
2,M9Pmf9AB4Mo,Apex Legends | Stories from the Outlands – “Th...,2020-08-11,UC0ZV6M2THA81QT9hrVWJG3A,Apex Legends,20,2020-08-12,Apex Legends|Apex Legends characters|new Apex ...,2381688,146740,...,https://i.ytimg.com/vi/M9Pmf9AB4Mo/default.jpg,False,False,"While running her own modding shop, Ramya Pare...",52.51969,1.0,https://www.youtube.com/watch?v=M9Pmf9AB4Mo,0,2381688.0,54
3,3C66w5Z0ixs,I ASKED HER TO BE MY GIRLFRIEND...,2020-08-11,UCvtRTOMP2TqYqu51xNrqAzg,Brawadis,22,2020-08-12,brawadis|prank|basketball|skits|ghost|funny vi...,1514614,156914,...,https://i.ytimg.com/vi/3C66w5Z0ixs/default.jpg,False,False,SUBSCRIBE to BRAWADIS ▶ http://bit.ly/Subscrib...,26.79085,1.0,https://www.youtube.com/watch?v=3C66w5Z0ixs,0,1514614.0,12
4,VIUo6yapDbc,Ultimate DIY Home Movie Theater for The LaBran...,2020-08-11,UCDVPcEbVLQgLZX0Rt6jo34A,Mr. Kate,26,2020-08-12,The LaBrant Family|DIY|Interior Design|Makeove...,1123889,45803,...,https://i.ytimg.com/vi/VIUo6yapDbc/default.jpg,False,False,Transforming The LaBrant Family's empty white ...,47.51349,1.0,https://www.youtube.com/watch?v=VIUo6yapDbc,0,1123889.0,11
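The count-then-merge step above can also be written without a join. The sketch below, on a hypothetical toy frame, uses `groupby(...).transform` to attach the per-channel count to every row, and shows an alternative score based on distinct videos (`num_unique_videos` is an illustrative column name, not part of this notebook):

```python
import pandas as pd

# Hypothetical toy data: video 'v1' appears on two trending days
toy = pd.DataFrame({
    'channelTitle': ['A', 'A', 'A', 'B'],
    'video_id':     ['v1', 'v1', 'v2', 'v3'],
    'title':        ['t1', 't1', 't2', 't3'],
})

# Rows per channel, broadcast back to every row (no merge needed)
toy['num_trending'] = toy.groupby('channelTitle')['title'].transform('count')

# Alternative definition: distinct videos rather than trending-day rows
toy['num_unique_videos'] = toy.groupby('channelTitle')['video_id'].transform('nunique')

print(toy)
```

Channel A gets 3 under the row-count definition but 2 under the distinct-video one, which shows how much long-running videos can inflate the score.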


### Feature 7: Number published

This feature counts how many rows in the trending data share a given publish date, as a rough measure of how many trending videos were published that day. A crowded day may affect whether a video makes it onto the trending list.

In [56]:
num_trending = trends.groupby(trends['publishedAt'])['video_id'].count().reset_index()
num_trending.columns = ['publishedAt', 'num_per_day']
num_trending

Unnamed: 0,publishedAt,num_per_day
0,2020-07-27,3
1,2020-08-01,6
2,2020-08-03,5
3,2020-08-05,8
4,2020-08-06,21
...,...,...
119,2020-11-29,182
120,2020-11-30,123
121,2020-12-01,81
122,2020-12-02,33


In [57]:
trends = pd.merge(trends, num_trending, how = 'left', on = 'publishedAt')
trends.head()

Unnamed: 0,video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,view_count,likes,...,comments_disabled,ratings_disabled,description,likes_to_dislikes,time_to_trend,youtube_link,no_description,views_per_date,num_trending,num_per_day
0,KX06ksuS6Xo,Diljit Dosanjh: CLASH (Official) Music Video |...,2020-08-11,UCZRdNleCgW-BGUJf-bbjzQg,Diljit Dosanjh,10,2020-08-12,clash diljit dosanjh|diljit dosanjh|diljit dos...,9140911,296541,...,False,False,CLASH official music video performed by DILJIT...,47.98398,1.0,https://www.youtube.com/watch?v=KX06ksuS6Xo,0,9140911.0,18,230
1,J78aPJ3VyNs,I left youtube for a month and THIS is what ha...,2020-08-11,UCYzPXprvl5Y-Sf0g4vX-m6g,jacksepticeye,24,2020-08-12,jacksepticeye|funny|funny meme|memes|jacksepti...,2038853,353797,...,False,False,I left youtube for a month and this is what ha...,134.62595,1.0,https://www.youtube.com/watch?v=J78aPJ3VyNs,0,2038853.0,11,230
2,M9Pmf9AB4Mo,Apex Legends | Stories from the Outlands – “Th...,2020-08-11,UC0ZV6M2THA81QT9hrVWJG3A,Apex Legends,20,2020-08-12,Apex Legends|Apex Legends characters|new Apex ...,2381688,146740,...,False,False,"While running her own modding shop, Ramya Pare...",52.51969,1.0,https://www.youtube.com/watch?v=M9Pmf9AB4Mo,0,2381688.0,54,230
3,3C66w5Z0ixs,I ASKED HER TO BE MY GIRLFRIEND...,2020-08-11,UCvtRTOMP2TqYqu51xNrqAzg,Brawadis,22,2020-08-12,brawadis|prank|basketball|skits|ghost|funny vi...,1514614,156914,...,False,False,SUBSCRIBE to BRAWADIS ▶ http://bit.ly/Subscrib...,26.79085,1.0,https://www.youtube.com/watch?v=3C66w5Z0ixs,0,1514614.0,12,230
4,VIUo6yapDbc,Ultimate DIY Home Movie Theater for The LaBran...,2020-08-11,UCDVPcEbVLQgLZX0Rt6jo34A,Mr. Kate,26,2020-08-12,The LaBrant Family|DIY|Interior Design|Makeove...,1123889,45803,...,False,False,Transforming The LaBrant Family's empty white ...,47.51349,1.0,https://www.youtube.com/watch?v=VIUo6yapDbc,0,1123889.0,11,230
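Because a video can appear on several trending days, the per-date row count is larger than the number of distinct videos published that day. A minimal sketch of both counts on a hypothetical toy frame (`unique_per_day` is an illustrative name):

```python
import pandas as pd

# Hypothetical toy data: 'v1' trended on two days, so it has two rows
toy = pd.DataFrame({
    'video_id': ['v1', 'v1', 'v2', 'v3'],
    'publishedAt': pd.to_datetime(
        ['2020-08-11', '2020-08-11', '2020-08-11', '2020-08-12']),
})

# Row count per publish date (what groupby(...).count() measures)
rows_per_day = toy.groupby('publishedAt')['video_id'].count()

# Distinct videos per publish date: keep one row per video first
unique_per_day = toy.drop_duplicates('video_id').groupby('publishedAt').size()

# Map the per-date counts back onto every row
toy['num_per_day'] = toy['publishedAt'].map(rows_per_day)
toy['unique_per_day'] = toy['publishedAt'].map(unique_per_day)
print(toy)
```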


In [58]:
trends.to_csv(r'/Users/OliverPan/Desktop/youtube_data/trend_features.csv', index = False)