# TrendTracker: Exploring Trending YouTube Videos in Canada

### Feature Engineering

In this notebook, I create new feature columns that could help the model.

### Import Libraries and Data

In [108]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

In [109]:
# The date in the file name marks the most recent data update
file = r'/Users/OliverPan/Desktop/youtube_data/trend_12-03-20.csv'

In [110]:
trends = pd.read_csv(file)

In [111]:
# Display floats with 5 decimal places
pd.options.display.float_format = '{:.5f}'.format

In [112]:
# Convert trending_date and publishedAt to datetime, keeping only the date part (first 10 characters)
trends['trending_date'] = trends['trending_date'].str[0:10]
trends['publishedAt'] = trends['publishedAt'].str[0:10]
trends['trending_date'] = pd.to_datetime(trends['trending_date'])
trends['publishedAt'] = pd.to_datetime(trends['publishedAt'])
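Slicing to the first 10 characters keeps only the YYYY-MM-DD part of each timestamp, so every parsed value lands at midnight and later date differences come out in whole days. A minimal sketch on two made-up strings; the ISO 8601 timestamp format is an assumption about the raw CSV:

```python
import pandas as pd

# Hypothetical raw values; the CSV is assumed to store ISO 8601 timestamps
raw = pd.Series(["2020-10-19T16:00:05Z", "2020-10-21T05:45:10Z"])

# Keep the date part only, then parse; the time of day is dropped (midnight)
dates = pd.to_datetime(raw.str[0:10])
print(dates)
print(dates.iloc[1] - dates.iloc[0])  # 2 days 00:00:00
```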

In [113]:
# Replace every missing value (e.g. a blank description) with the placeholder 'Empty'
trends.fillna('Empty', inplace=True)
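Because `fillna('Empty')` runs over the whole DataFrame, any numeric column that happens to contain NaN would switch to the `object` dtype, and arithmetic such as the likes-to-dislikes ratio below would then fail on the `'Empty'` strings. A sketch of a more targeted fill on a toy frame (the values are made up):

```python
import numpy as np
import pandas as pd

# Toy frame: one text column and one numeric column, each with a missing value
df = pd.DataFrame({"description": ["hello", np.nan], "dislikes": [5.0, np.nan]})

# Filling the whole frame mixes a string into the numeric column (object dtype)
filled_all = df.fillna("Empty")
print(filled_all.dtypes)

# Filling only the text column leaves the numeric column as float64
df["description"] = df["description"].fillna("Empty")
print(df.dtypes)
```

In this notebook the numeric columns may simply have had no gaps, so the whole-frame fill is harmless here; the targeted version just avoids relying on that.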

### Feature 1: Likes to Dislikes Ratio

In [114]:
# Likes per dislike; a 0 denominator becomes NaN instead of inf
trends['likes_to_dislikes'] = trends['likes'] / trends['dislikes'].replace(0, np.nan)

In [115]:
trends['likes_to_dislikes'].describe()

count   22248.00000
mean       80.11951
std        83.58158
min         0.13961
25%        30.72792
50%        58.27218
75%        99.91116
max      1659.80000
Name: likes_to_dislikes, dtype: float64
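Pandas division follows floating-point rules: a video with likes but 0 dislikes gets `inf`, and one with 0 likes and 0 dislikes (for example when ratings are disabled) gets NaN, which `describe()` leaves out of `count`. That mechanism may explain why `count` here (22248) is below the 22400 rows reported for `time_to_trend`; this is an inference and is not checked in the notebook. A toy sketch with made-up counts:

```python
import numpy as np
import pandas as pd

# Made-up counts: a normal video, one with zero dislikes, one with ratings off
toy = pd.DataFrame({"likes": [100, 50, 0], "dislikes": [4, 0, 0]})

raw_ratio = toy["likes"] / toy["dislikes"]
print(raw_ratio.tolist())  # [25.0, inf, nan]

# Treat a zero denominator as missing so no inf values reach the model
safe_ratio = toy["likes"] / toy["dislikes"].replace(0, np.nan)
print(safe_ratio.tolist())  # [25.0, nan, nan]
print(safe_ratio.describe()["count"])  # 1.0
```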

In [116]:
# Row(s) with the highest likes-to-dislikes ratio
trends[trends['likes_to_dislikes'] == trends['likes_to_dislikes'].max()]

| Unnamed: 0 | video_id | title | publishedAt | channelId | channelTitle | categoryId | trending_date | tags | view_count | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | description | likes_to_dislikes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14088 | NEpKlsUw9XU | ghost + guest 👻🎶 | 2020-10-19 | UCdkkQvJoB0kGgYHCYwSkdww | Louie Zong | 1 | 2020-10-21 | [None] | 188256 | 66392 | 40 | 2961 | https://i.ytimg.com/vi/NEpKlsUw9XU/default.jpg | False | False | yes! they are back! and with a new friend! | 1659.8 |


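This video (Louie Zong's "ghost + guest 👻🎶") has the highest likes-to-dislikes ratio in the data, 66,392 likes to 40 dislikes, but its view count (188,256) was not especially high compared with other trending videos.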
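One way to put a number on "not especially high" is a percentile rank of `view_count`. A sketch on made-up numbers, where the row at index 2 stands in for the high-ratio video; in the notebook the same calls would run on `trends`. Since each trending day is its own row, videos that trend for many days would be counted more than once unless the data is deduplicated by `video_id` first.

```python
import pandas as pd

# Made-up data; index 2 plays the role of the high-ratio video
toy = pd.DataFrame({
    "view_count": [50_000, 900_000, 188_256, 2_500_000, 120_000],
    "likes_to_dislikes": [40.0, 75.0, 1659.8, 60.0, 30.0],
})

# Percentile rank: rank divided by row count (ties would get the average rank)
toy["view_pct"] = toy["view_count"].rank(pct=True)

# Look up the row with the highest ratio and report where its views fall
top = toy.loc[toy["likes_to_dislikes"].idxmax()]
print(top["view_count"], top["view_pct"])
```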

### Feature 2: Time to Trending

In [117]:
# Time between publication and each trending_date; a video that trends on
# several days has one row (and one time_to_trend value) per day
trends['time_to_trend'] = trends['trending_date'] - trends['publishedAt']

In [118]:
trends['time_to_trend'].describe()

count                     22400
mean     3 days 11:04:08.142857
std      2 days 02:47:55.029928
min             0 days 00:00:00
25%             2 days 00:00:00
50%             3 days 00:00:00
75%             5 days 00:00:00
max            34 days 00:00:00
Name: time_to_trend, dtype: object
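Most models expect plain numbers rather than `timedelta64` values, so one possible next step (not taken in this notebook) is extracting whole days with the `.dt.days` accessor. A toy sketch using the publish and trending dates from the two example rows in this notebook:

```python
import pandas as pd

# Dates taken from the ghost + guest and iPad Air rows shown in this notebook
toy = pd.DataFrame({
    "publishedAt": pd.to_datetime(["2020-10-19", "2020-10-23"]),
    "trending_date": pd.to_datetime(["2020-10-21", "2020-11-26"]),
})

toy["time_to_trend"] = toy["trending_date"] - toy["publishedAt"]
# Integer day counts that a model can consume directly
toy["days_to_trend"] = toy["time_to_trend"].dt.days
print(toy["days_to_trend"].tolist())  # [2, 34]
```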

In [119]:
# Quick look at the row(s) with the longest time to trend (34 days)
trends[trends['time_to_trend'] == trends['time_to_trend'].max()]

| Unnamed: 0 | video_id | title | publishedAt | channelId | channelTitle | categoryId | trending_date | tags | view_count | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | description | likes_to_dislikes | time_to_trend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20995 | Z-pdaEa2XC8 | iPad Air — Boiiing | 2020-10-23 | UCE5_hf5ONW6_qy9ShfYfB4w | Apple Canada | 28 | 2020-11-26 | apple\|apple ipad\|ipad\|ipad air\|apple ipad air\|... | 1826775 | 1535 | 650 | 0 | https://i.ytimg.com/vi/Z-pdaEa2XC8/default.jpg | True | False | Introducing iPad Air. Featuring an all-screen ... | 2.36154 | 34 days |


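This is interesting: the Apple iPad Air video is on the trending list on November 26, 2020 (U.S. Thanksgiving), 34 days after it was published. If that was its first day on the list, it may indicate that technology marketing gains traction around the holidays. The timing also lines up with Black Friday (November 27), possibly because of a deal on the iPad.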
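Because the data holds one row per video per trending day, the 34-day row only shows that the iPad Air video was trending on that day; by itself it does not show when the video first trended. A sketch of computing each video's first trending date with `groupby`, on made-up rows (the IDs `"a"` and `"b"` are placeholders):

```python
import pandas as pd

# Made-up rows: video "a" trends on three days, video "b" on one
toy = pd.DataFrame({
    "video_id": ["a", "a", "a", "b"],
    "publishedAt": pd.to_datetime(["2020-10-23"] * 3 + ["2020-11-01"]),
    "trending_date": pd.to_datetime(
        ["2020-11-20", "2020-11-25", "2020-11-26", "2020-11-03"]
    ),
})

# Earliest trending date per video, alongside its publish date
first = toy.groupby("video_id").agg(
    first_trending=("trending_date", "min"),
    publishedAt=("publishedAt", "first"),
)
first["days_to_first_trend"] = (first["first_trending"] - first["publishedAt"]).dt.days
print(first)
```

On the real data, comparing `days_to_first_trend` with the 34 days above would show whether the iPad Air video really started trending on Thanksgiving.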

### Feature 3: YouTube Link

In [120]:
# Each video_id identifies the video's watch page, so build the full URL from it
trends['youtube_link'] = 'https://www.youtube.com/watch?v=' + trends['video_id']

### Feature 4: No Description

In [121]:
# Flag videos whose description was missing (filled with 'Empty' earlier)
trends['no_description'] = (trends['description'] == 'Empty').astype(int)

In [124]:
trends[trends['no_description'] == 1].head()

| Unnamed: 0 | video_id | title | publishedAt | channelId | channelTitle | categoryId | trending_date | tags | view_count | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | description | likes_to_dislikes | time_to_trend | youtube_link | no_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90 | yyXGTANms4g | BENEFIT PAPA | 2020-08-11 | UCxAkiUiL3KnMl0BIFBv6xxA | Mr Macaroni | 24 | 2020-08-12 | [None] | 121898 | 8795 | 60 | 1205 | https://i.ytimg.com/vi/yyXGTANms4g/default.jpg | False | False | Empty | 146.58333 | 1 days | https://www.youtube.com/watch?v=yyXGTANms4g | 1 |
| 222 | NSuaUok-wTY | [1147] Locksmith Says My Videos Are BS... Lose... | 2020-08-12 | UCm9K6rby98W8JigLoZOh6FQ | LockPickingLawyer | 27 | 2020-08-13 | Lock\|picking | 422963 | 62940 | 168 | 7301 | https://i.ytimg.com/vi/NSuaUok-wTY/default.jpg | False | False | Empty | 374.64286 | 1 days | https://www.youtube.com/watch?v=NSuaUok-wTY | 1 |
| 317 | yyXGTANms4g | BENEFIT PAPA | 2020-08-11 | UCxAkiUiL3KnMl0BIFBv6xxA | Mr Macaroni | 24 | 2020-08-13 | [None] | 150314 | 9737 | 65 | 1282 | https://i.ytimg.com/vi/yyXGTANms4g/default.jpg | False | False | Empty | 149.8 | 2 days | https://www.youtube.com/watch?v=yyXGTANms4g | 1 |
| 445 | NSuaUok-wTY | [1147] Locksmith Says My Videos Are BS... Lose... | 2020-08-12 | UCm9K6rby98W8JigLoZOh6FQ | LockPickingLawyer | 27 | 2020-08-14 | Lock\|picking | 511773 | 69611 | 214 | 7837 | https://i.ytimg.com/vi/NSuaUok-wTY/default.jpg | False | False | Empty | 325.28505 | 2 days | https://www.youtube.com/watch?v=NSuaUok-wTY | 1 |
| 542 | yyXGTANms4g | BENEFIT PAPA | 2020-08-11 | UCxAkiUiL3KnMl0BIFBv6xxA | Mr Macaroni | 24 | 2020-08-14 | [None] | 173118 | 10486 | 74 | 1343 | https://i.ytimg.com/vi/yyXGTANms4g/default.jpg | False | False | Empty | 141.7027 | 3 days | https://www.youtube.com/watch?v=yyXGTANms4g | 1 |
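The preview repeats the same videos (for example `yyXGTANms4g` appears on three consecutive trending days), so counting flagged rows overstates how many distinct videos lack a description. A sketch that counts each `video_id` once, on made-up rows; it keeps the first row per video and assumes the flag is the same on every trending day:

```python
import pandas as pd

# Made-up rows mimicking the preview: videos repeated across trending days
toy = pd.DataFrame({
    "video_id": ["x1", "x2", "x1", "x2", "x1", "x3"],
    "no_description": [1, 1, 1, 1, 1, 0],
})

rows_flagged = toy["no_description"].sum()
# Keep one row per video before counting
videos_flagged = toy.drop_duplicates("video_id")["no_description"].sum()
print(rows_flagged, videos_flagged)  # 5 2
```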
