# Introduction

This dataset includes several months (and counting) of data on daily trending YouTube videos. The data is from videos in the United States, with up to 200 trending videos listed per day.

Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.

The associated JSON file lists the video categories; it maps each value of the category_id field to a category title.

---


In [25]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
import json
%matplotlib inline

In [2]:
file = "C:/Users/Carter Carlson/Documents/Thinkful/Large Databases/Youtube/videos.csv"
df = pd.read_csv(file)

In [3]:
file = 'C:/Users/Carter Carlson/Documents/Thinkful/Large Databases/Youtube/category_id.json'
with open(file) as data_file:
    data = json.load(data_file)
# Collect each category's id (stored as a string in the JSON) and its title
num_list = [group['id'] for group in data['items']]
title_list = [group['snippet']['title'] for group in data['items']]

In [4]:
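# Map category id (string) -> category title, then label every video.
# The ids are strings in the JSON, hence the str() cast on category_id.
cat_id_list = dict(zip(num_list, title_list))
df['category_title'] = [cat_id_list[str(i)] for i in df['category_id']]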
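
The lookup above can also be written with a dict comprehension and `Series.map`. This is a minimal sketch rather than the notebook's own code: the `data` dict and the toy `df` below are hypothetical stand-ins for the parsed JSON and the CSV, and `id_to_title` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the parsed category JSON (same shape as data['items'])
data = {"items": [
    {"id": "22", "snippet": {"title": "People & Blogs"}},
    {"id": "24", "snippet": {"title": "Entertainment"}},
]}

# Toy frame; 99 is an id that is deliberately missing from the JSON
df = pd.DataFrame({"category_id": [22, 24, 24, 99]})

# Build the id -> title lookup in one pass
id_to_title = {item["id"]: item["snippet"]["title"] for item in data["items"]}

# map() leaves unknown ids as missing values instead of raising KeyError
df["category_title"] = df["category_id"].astype(str).map(id_to_title)
print(df)
```

One difference in behavior: the list comprehension raises `KeyError` on an id that is missing from the JSON, while `map` leaves a missing value, so which one is preferable depends on whether unknown ids should fail loudly.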

In [36]:
# average views per category, in thousands
df.groupby('category_title')['views'].mean().sort_values(ascending=False) / 1000
# NOTE: let's look at only top 5 category_titles - how do I do this?

category_title
Music                    4731.975601
Nonprofits & Activism    3462.796521
Film & Animation         2392.016401
Sports                   1848.030017
Entertainment            1841.495112
Gaming                   1815.562915
Comedy                   1396.748527
Autos & Vehicles         1361.112571
People & Blogs           1258.737347
Science & Technology     1027.945100
Howto & Style             862.151344
Travel & Events           857.805181
Shows                     697.089771
Pets & Animals            635.517181
Education                 635.104605
News & Politics           432.745734
Name: views, dtype: float64
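
One way to answer the note in the cell above (keeping only the top 5 categories) is `Series.nlargest`, or equivalently `.head(5)` on the sorted result. The sketch below uses a small made-up frame, so the numbers are illustrative and not taken from the dataset; `top5` and `df_top5` are names introduced here.

```python
import pandas as pd

# Toy data (hypothetical values) with the same two columns used above
df = pd.DataFrame({
    "category_title": ["Music", "Music", "Sports", "Gaming",
                       "Comedy", "Education", "Shows"],
    "views": [5_000_000, 4_000_000, 1_800_000, 1_700_000,
              1_300_000, 600_000, 700_000],
})

# Average views per category, in thousands
mean_views_k = df.groupby("category_title")["views"].mean() / 1000

# Keep the five categories with the highest average
top5 = mean_views_k.nlargest(5)
print(top5)

# Optionally restrict the full frame to those categories for later analysis
df_top5 = df[df["category_title"].isin(top5.index)]
```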

In [32]:
views = df.groupby('category_title')['views'].mean()
likes = df.groupby('category_title')['likes'].mean()
likes/views * 1000  # likes per 1,000 views (ratio of the category means)

category_title
Autos & Vehicles          7.782089
Comedy                   43.191054
Education                40.234743
Entertainment            26.328258
Film & Animation         23.341793
Gaming                   33.651060
Howto & Style            38.635503
Music                    37.835230
News & Politics          13.012881
Nonprofits & Activism    88.591661
People & Blogs           34.149489
Pets & Animals           29.666499
Science & Technology     26.709072
Shows                    24.767829
Sports                   22.884121
Travel & Events          13.375147
dtype: float64
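
Note that dividing the category means gives a view-weighted rate, which can differ from the average of each video's own rate when a few videos dominate the view count. The sketch below contrasts the two on made-up numbers; the column name `likes_per_1k` is introduced here for illustration.

```python
import pandas as pd

# Toy data (hypothetical): category A has one small and one very large video
df = pd.DataFrame({
    "category_title": ["A", "A", "B"],
    "views": [1000, 9000, 2000],
    "likes": [100, 90, 40],
})

grouped = df.groupby("category_title")

# Ratio of means: what the cell above computes (dominated by large videos)
ratio_of_means = grouped["likes"].mean() / grouped["views"].mean() * 1000

# Mean of ratios: each video counts equally, regardless of its view count
df["likes_per_1k"] = df["likes"] / df["views"] * 1000
mean_of_ratios = df.groupby("category_title")["likes_per_1k"].mean()

print(pd.DataFrame({"ratio_of_means": ratio_of_means,
                    "mean_of_ratios": mean_of_ratios}))
```

In this toy example category A comes out at about 19 likes per 1,000 views by the first method and about 55 by the second. Neither is wrong; they answer different questions. The per-video version would also need rows with zero views filtered out first.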

In [37]:
dislikes = df.groupby('category_title')['dislikes'].mean()
dislikes/views * 1000  # dislikes per 1,000 views (ratio of the category means)

category_title
Autos & Vehicles          0.465059
Comedy                    1.428402
Education                 1.238194
Entertainment             2.372521
Film & Animation          0.921745
Gaming                    2.332690
Howto & Style             1.335880
Music                     1.320010
News & Politics           3.736069
Nonprofits & Activism    19.908421
People & Blogs            1.542404
Pets & Animals            0.687157
Science & Technology      1.296476
Shows                     0.472126
Sports                    1.228906
Travel & Events           0.997962
dtype: float64

In [53]:
round((1 - dislikes/likes) * 100, 1)  # rating score per category: 100 * (1 - dislikes/likes)

category_title
Autos & Vehicles         94.0
Comedy                   96.7
Education                96.9
Entertainment            91.0
Film & Animation         96.1
Gaming                   93.1
Howto & Style            96.5
Music                    96.5
News & Politics          71.3
Nonprofits & Activism    77.5
People & Blogs           95.5
Pets & Animals           97.7
Science & Technology     95.1
Shows                    98.1
Sports                   94.6
Travel & Events          92.5
dtype: float64
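
The score above is `1 - dislikes/likes`, which is not the same as the share of ratings that are likes, `likes / (likes + dislikes)`. As a hedged alternative, the sketch below computes both on made-up numbers so the difference is visible; `net_score` and `like_share` are names introduced here, not part of the notebook.

```python
import pandas as pd

# Toy data (hypothetical values)
df = pd.DataFrame({
    "category_title": ["A", "A", "B"],
    "likes": [90, 70, 30],
    "dislikes": [10, 30, 10],
})

# Totals per category (ratios of totals equal ratios of means within a group)
totals = df.groupby("category_title")[["likes", "dislikes"]].sum()

# Score used above: 100 * (1 - dislikes / likes)
net_score = (1 - totals["dislikes"] / totals["likes"]) * 100

# Alternative: percentage of all ratings that are likes
like_share = totals["likes"] / (totals["likes"] + totals["dislikes"]) * 100

print(pd.DataFrame({"net_score": net_score, "like_share": like_share}).round(1))
```

The two measures rank categories the same way, but `like_share` always stays between 0 and 100, whereas `net_score` goes negative once dislikes outnumber likes.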

In [36]:
df.head()

Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description
0,2kyS6SvSYSE,17.14.11,WE WANT TO TALK ABOUT OUR MARRIAGE,CaseyNeistat,22,2017-11-13T17:13:01.000Z,SHANtell martin,748374,57527,2966,15954,https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg,False,False,False,SHANTELL'S CHANNEL - https://www.youtube.com/s...
1,1ZAPwfrtAFY,17.14.11,The Trump Presidency: Last Week Tonight with J...,LastWeekTonight,24,2017-11-13T07:30:00.000Z,"last week tonight trump presidency|""last week ...",2418783,97185,6146,12703,https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg,False,False,False,"One year after the presidential election, John..."
2,5qpjK5DgCt4,17.14.11,"Racist Superman | Rudy Mancuso, King Bach & Le...",Rudy Mancuso,23,2017-11-12T19:05:24.000Z,"racist superman|""rudy""|""mancuso""|""king""|""bach""...",3191434,146033,5339,8181,https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg,False,False,False,WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
3,puqaWrEC7tY,17.14.11,Nickelback Lyrics: Real or Fake?,Good Mythical Morning,24,2017-11-13T11:00:04.000Z,"rhett and link|""gmm""|""good mythical morning""|""...",343168,10172,666,2146,https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg,False,False,False,Today we find out if Link is a Nickelback amat...
4,d380meD0W0M,17.14.11,I Dare You: GOING BALD!?,nigahiga,24,2017-11-12T18:01:41.000Z,"ryan|""higa""|""higatv""|""nigahiga""|""i dare you""|""...",2095731,132235,1989,17518,https://i.ytimg.com/vi/d380meD0W0M/default.jpg,False,False,False,I know it's been a while since we did this sho...


Hypothesis - disabling comments changes the number of likes and dislikes a video receives.

- Enabling or disabling comments probably doesn't affect views. People can't tell whether a video has comments enabled before they open it, so it shouldn't affect views in that sense. It might affect views if people come back several times to comment on the same thread, but how often a video is shared shouldn't depend much on whether comments are enabled.

Test - 
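
One possible way to test this, offered as a sketch and not as the notebook's chosen method, is to compare likes per 1,000 views between videos with comments enabled and disabled using Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`). The data below is made up; only the column names `comments_disabled`, `ratings_disabled`, `views`, and `likes` come from the dataset. Excluding videos with ratings disabled is an assumption added here, on the reasoning that their like counts carry no information.

```python
import pandas as pd
from scipy import stats

# Toy data (hypothetical values, not taken from the dataset)
df = pd.DataFrame({
    "comments_disabled": [False] * 5 + [True] * 5,
    "ratings_disabled": [False] * 10,
    "views": [1000] * 10,
    "likes": [40, 35, 45, 38, 42, 20, 25, 18, 22, 30],
})

# Assumption: drop videos with ratings disabled or with no views
usable = df[(~df["ratings_disabled"]) & (df["views"] > 0)]
usable = usable.assign(likes_per_1k=usable["likes"] / usable["views"] * 1000)

enabled = usable.loc[~usable["comments_disabled"], "likes_per_1k"]
disabled = usable.loc[usable["comments_disabled"], "likes_per_1k"]

# Welch's t-test: does not assume equal variances in the two groups
t_stat, p_value = stats.ttest_ind(enabled, disabled, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

On the real data, two caveats would apply: a video that trends on several days appears in several rows, so the rows are not independent unless the frame is first reduced to one row per `video_id`, and the like rate is likely skewed, so a rank-based test such as `scipy.stats.mannwhitneyu` may be a safer choice. The same comparison can be repeated for dislikes.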