# Question

Does publishing a YouTube video at a particular time of day earn more views than publishing it at another time of day?

# Data

Trending YouTube Video Statistics

- Each row is a trending YouTube video with its title, publish time, views, likes, dislikes, and comment count.

- The full dataset covers videos from many countries, but this report focuses only on US videos (`USvideos.csv`).

- Over 40,000 entries (40,949 rows in the US file)

- Average views: about 2.36 million
- Most viewed video: about 225 million views
- Least viewed video: 549 views
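The headline figures above can be reproduced straight from the DataFrame. The sketch below is a minimal illustration on a small hypothetical frame standing in for `USvideos.csv`. It also counts unique `video_id`s, because a video that trends on several days appears in several rows, so "entries" are not the same as distinct videos. The helper names are hypothetical, and reading `trending_date` as `YY.DD.MM` is an assumption based on the `17.14.11` values in the sample rows.

```python
import pandas as pd

def summarize_views(df: pd.DataFrame) -> dict:
    """Return headline view statistics plus row and unique-video counts."""
    return {
        "rows": len(df),
        "unique_videos": df["video_id"].nunique(),
        "mean_views": df["views"].mean(),
        "max_views": df["views"].max(),
        "min_views": df["views"].min(),
    }

def latest_snapshot(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per video: the one from its most recent trending date."""
    out = df.copy()
    # Assumption: '17.14.11' means 2017-11-14, i.e. the format is YY.DD.MM
    out["trending_dt"] = pd.to_datetime(out["trending_date"], format="%y.%d.%m")
    out = out.sort_values("trending_dt")
    return out.drop_duplicates(subset="video_id", keep="last")

# Hypothetical toy data standing in for USvideos.csv
toy = pd.DataFrame({
    "video_id": ["a", "a", "b", "c"],
    "trending_date": ["17.14.11", "17.15.11", "17.14.11", "17.15.11"],
    "views": [100, 150, 549, 2000],
})
print(summarize_views(toy))
print(latest_snapshot(toy)[["video_id", "views"]])
```

Whether to deduplicate before sampling is a design choice: keeping every row weights long-trending videos more heavily in the averages.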

# Experiment Proposal and Outline

### Analysis that highlights your experimental hypothesis.
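1. Split `publish_time` into three groups: Morning (4 AM to 12 PM), Afternoon (12 PM to 8 PM), and Night (8 PM to 4 AM). The raw timestamps are in UTC (note the trailing `Z`), so they should be converted to a US time zone (for example, Eastern Time) before grouping.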

2. Draw a random sample for the control group (e.g. 3,000 videos), remove it from the population, and calculate the control group's average views.

3. Draw the test samples: take a random sample (e.g. 3,000 videos) from each time-of-day group (Morning, Afternoon, Night), remove them from the population, and calculate each test group's average views.

4. Hypotheses:
    - Publishing the video in the morning will yield more views than the control.
    - Publishing the video in the afternoon will yield more views than the control.
    - Publishing the video at night will yield more views than the control.

   Null hypothesis:
    - There will be no difference in the number of views.


5. Test each hypothesis against the control: calculate the difference between each test group's average views and the control's, compute the t-statistic, then compute the p-value and compare it to a 0.05 significance level.


6. Other measured variables:
    - Likes
    - Dislikes
    - Comment count
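Steps 1 to 3 above can be sketched in pandas as follows. This is a minimal illustration, not the author's code: the helper names `time_of_day` and `draw_groups` are hypothetical, converting to `America/New_York` is an assumption (the US spans several time zones), and the toy data is made up so the block runs without the CSV.

```python
import pandas as pd

def time_of_day(publish_time: pd.Series, tz: str = "America/New_York") -> pd.Series:
    """Label each publish timestamp as Morning, Afternoon or Night in local time."""
    # The raw strings end in 'Z' (UTC); convert to an assumed US time zone first
    hours = pd.to_datetime(publish_time, utc=True).dt.tz_convert(tz).dt.hour
    labels = pd.Series("Night", index=publish_time.index)   # 8 PM to 4 AM wraps past midnight
    labels.loc[(hours >= 4) & (hours < 12)] = "Morning"     # 4 AM to 12 PM
    labels.loc[(hours >= 12) & (hours < 20)] = "Afternoon"  # 12 PM to 8 PM
    return labels

def draw_groups(df: pd.DataFrame, n: int, seed: int = 0):
    """Draw a control sample, remove it, then draw n rows from each time-of-day group."""
    control = df.sample(n=n, random_state=seed)
    remaining = df.drop(control.index)  # control rows leave the population
    tests = {
        name: group.sample(n=n, random_state=seed)
        for name, group in remaining.groupby("time_of_day")
    }
    return control, tests

# Hypothetical toy data: one upload per UTC hour over two days
stamps = [f"2017-11-{day}T{hour:02d}:00:00.000Z" for day in (12, 13) for hour in range(24)]
toy = pd.DataFrame({"publish_time": stamps, "views": range(1000, 1048)})
toy["time_of_day"] = time_of_day(toy["publish_time"])
control, tests = draw_groups(toy, n=4)
print(toy["time_of_day"].value_counts())
print({name: round(g["views"].mean(), 1) for name, g in tests.items()})
```

The time zone matters: the Last Week Tonight upload in the sample rows (`2017-11-13T07:30:00.000Z`) would be Morning if read as UTC, but it is 2:30 AM Eastern, i.e. Night. On the real data, `n=3000` only works if every group still has at least 3,000 rows after the control is removed.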

### A rollout plan showing how you would implement and roll out the experiment
Using current data: 
1. Randomly draw the control sample, then exclude it from the population data.
2. Randomly draw the test samples, split by publish time (morning, afternoon, and night), then exclude them from the population data.
3. Plot and summarize each sample group, and compare it against the population or against another random sample (an A/A test) to check that sampling introduced minimal bias.
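The A/A check in step 3 can be sketched as drawing two disjoint random samples from the same population and testing whether they differ; with unbiased sampling they should come out "significant" only about 5% of the time. This is a minimal sketch under assumptions: `aa_test` is a hypothetical helper, the population is synthetic right-skewed data, Welch's t-test (`equal_var=False`) is used because view counts have very uneven spread, and the Kolmogorov–Smirnov test is an extra distribution-level check that the original plan does not mention.

```python
import numpy as np
import pandas as pd
from scipy import stats

def aa_test(df: pd.DataFrame, n: int, column: str = "views", seed: int = 0):
    """Draw two disjoint random samples and test whether they differ (they should not)."""
    a = df.sample(n=n, random_state=seed)
    b = df.drop(a.index).sample(n=n, random_state=seed + 1)
    _, p_t = stats.ttest_ind(a[column], b[column], equal_var=False)  # Welch's t-test on means
    _, p_ks = stats.ks_2samp(a[column], b[column])  # compares the whole distributions
    return a, b, p_t, p_ks

# Hypothetical population with right-skewed view counts
rng = np.random.default_rng(42)
population = pd.DataFrame({"views": rng.lognormal(mean=13.4, sigma=1.5, size=10_000).round()})
a, b, p_t, p_ks = aa_test(population, n=3000)
print(f"Welch t-test p = {p_t:.3f}, KS test p = {p_ks:.3f}")
```

Repeating the A/A test with several seeds and checking that roughly 5% of p-values fall below 0.05 gives a better picture than a single draw.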


If possible:
1. Publish the same video at different times of day.
2. Make sure the copies are not published on the same day. We do not want the video competing against itself for the same potential audience.
3. Run the test for multiple years to minimize seasonal bias.
4. Compare videos within similar niches. For example, I would expect sports videos to perform like other sports videos and beauty videos like other beauty videos, so each video can be compared against videos from the same niche published at different times.
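With the current data, one way to approximate the niche comparison in step 4 is to treat `category_id` as the niche and tabulate views per category and time-of-day group. This is a sketch under assumptions: using `category_id` as a stand-in for niche is my assumption, the `time_of_day` column is presumed to have been created already, the helper name is hypothetical, and the toy rows are made up.

```python
import pandas as pd

def views_by_niche_and_time(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean and median views for every (category_id, time_of_day) pair."""
    return (
        df.groupby(["category_id", "time_of_day"])["views"]
        .agg(["count", "mean", "median"])
        .unstack("time_of_day")  # one column per statistic and time-of-day group
    )

# Hypothetical toy data with two categories
toy = pd.DataFrame({
    "category_id": [17, 17, 17, 26, 26, 26],
    "time_of_day": ["Morning", "Night", "Night", "Morning", "Afternoon", "Afternoon"],
    "views": [100, 300, 500, 40, 80, 120],
})
summary = views_by_niche_and_time(toy)
print(summary["mean"])
```

Empty cells (NaN) mark category and time combinations with no videos; the `count` columns show how thin each cell is before any averages are compared.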

### An evaluation plan showing what constitutes success in this experiment

After collecting the data, we compare the average views for each publishing-time group. Whichever time period receives the most views on average, compared to the control group, is considered the best time to publish. The result must also be statistically significant, i.e. have a p-value below 0.05.
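The success rule above can be sketched as a one-sided test of each group against the control, followed by picking the significant group with the largest lift. Assumptions to flag: `evaluate` and `best_time` are hypothetical helpers, Welch's t-test is swapped in for the standard t-test because the groups need not share a variance, `alternative="greater"` (SciPy 1.6 or later) matches the directional hypotheses, and the normal toy data is synthetic. Because three comparisons are made, passing `alpha=0.05 / 3` (a Bonferroni correction) would keep the overall false-positive rate near 5%.

```python
import numpy as np
import pandas as pd
from scipy import stats

def evaluate(control: pd.Series, tests: dict, alpha: float = 0.05) -> pd.DataFrame:
    """Compare each test group's mean views to the control with a one-sided Welch t-test."""
    rows = []
    for name, sample in tests.items():
        # H1: this group's mean views are greater than the control's
        res = stats.ttest_ind(sample, control, equal_var=False, alternative="greater")
        rows.append({
            "group": name,
            "mean_diff": sample.mean() - control.mean(),
            "t": res.statistic,
            "p": res.pvalue,
            "significant": res.pvalue < alpha,
        })
    return pd.DataFrame(rows).set_index("group")

def best_time(results: pd.DataFrame):
    """Return the significant group with the largest mean lift, or None if none qualifies."""
    winners = results[results["significant"]]
    return None if winners.empty else winners["mean_diff"].idxmax()

# Hypothetical toy samples: only Afternoon is built to beat the control
rng = np.random.default_rng(0)
control = pd.Series(rng.normal(1000, 100, 3000))
tests = {
    "Morning": pd.Series(rng.normal(1000, 100, 3000)),
    "Afternoon": pd.Series(rng.normal(1100, 100, 3000)),
    "Night": pd.Series(rng.normal(900, 100, 3000)),
}
results = evaluate(control, tests)
print(results)
print("Best time to publish:", best_time(results))
```

The same call can be repeated for likes, dislikes, or comment count by passing those columns instead of views.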



# Data Overview

```python
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime as dt
from scipy import stats

# Load the US trending-video data
df = pd.read_csv('/Users/Kevin/Files/Thinkful/Data Files/Youtube Data/USvideos.csv')
```

```python
df.head()
```

| | video_id | trending_date | title | channel_title | category_id | publish_time | tags | views | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | video_error_or_removed | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2kyS6SvSYSE | 17.14.11 | WE WANT TO TALK ABOUT OUR MARRIAGE | CaseyNeistat | 22 | 2017-11-13T17:13:01.000Z | SHANtell martin | 748374 | 57527 | 2966 | 15954 | https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg | False | False | False | SHANTELL'S CHANNEL - https://www.youtube.com/s... |
| 1 | 1ZAPwfrtAFY | 17.14.11 | The Trump Presidency: Last Week Tonight with J... | LastWeekTonight | 24 | 2017-11-13T07:30:00.000Z | last week tonight trump presidency\|"last week ... | 2418783 | 97185 | 6146 | 12703 | https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg | False | False | False | One year after the presidential election, John... |
| 2 | 5qpjK5DgCt4 | 17.14.11 | Racist Superman \| Rudy Mancuso, King Bach & Le... | Rudy Mancuso | 23 | 2017-11-12T19:05:24.000Z | racist superman\|"rudy"\|"mancuso"\|"king"\|"bach"... | 3191434 | 146033 | 5339 | 8181 | https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg | False | False | False | WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http... |
| 3 | puqaWrEC7tY | 17.14.11 | Nickelback Lyrics: Real or Fake? | Good Mythical Morning | 24 | 2017-11-13T11:00:04.000Z | rhett and link\|"gmm"\|"good mythical morning"\|"... | 343168 | 10172 | 666 | 2146 | https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg | False | False | False | Today we find out if Link is a Nickelback amat... |
| 4 | d380meD0W0M | 17.14.11 | I Dare You: GOING BALD!? | nigahiga | 24 | 2017-11-12T18:01:41.000Z | ryan\|"higa"\|"higatv"\|"nigahiga"\|"i dare you"\|"... | 2095731 | 132235 | 1989 | 17518 | https://i.ytimg.com/vi/d380meD0W0M/default.jpg | False | False | False | I know it's been a while since we did this sho... |


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40949 entries, 0 to 40948
Data columns (total 16 columns):
video_id                  40949 non-null object
trending_date             40949 non-null object
title                     40949 non-null object
channel_title             40949 non-null object
category_id               40949 non-null int64
publish_time              40949 non-null object
tags                      40949 non-null object
views                     40949 non-null int64
likes                     40949 non-null int64
dislikes                  40949 non-null int64
comment_count             40949 non-null int64
thumbnail_link            40949 non-null object
comments_disabled         40949 non-null bool
ratings_disabled          40949 non-null bool
video_error_or_removed    40949 non-null bool
description               40379 non-null object
dtypes: bool(3), int64(5), object(8)
memory usage: 4.2+ MB
```


```python
df.describe()
```

| | category_id | views | likes | dislikes | comment_count |
|---|---|---|---|---|---|
| count | 40949.0 | 40949.0 | 40949.0 | 40949.0 | 40949.0 |
| mean | 19.972429 | 2360785.0 | 74266.7 | 3711.401 | 8446.804 |
| std | 7.568327 | 7394114.0 | 228885.3 | 29029.71 | 37430.49 |
| min | 1.0 | 549.0 | 0.0 | 0.0 | 0.0 |
| 25% | 17.0 | 242329.0 | 5424.0 | 202.0 | 614.0 |
| 50% | 24.0 | 681861.0 | 18091.0 | 631.0 | 1856.0 |
| 75% | 25.0 | 1823157.0 | 55417.0 | 1938.0 | 5755.0 |
| max | 43.0 | 225211900.0 | 5613827.0 | 1674420.0 | 1361580.0 |
