# Aim and Objective

* Get familiar with the YouTube API and learn how to obtain video data.
* Analyze video data and verify common "myths" about what makes a video do well on YouTube, for example:
  * Does the number of likes and comments matter for a video to get more views?
  * Does the video duration matter for views and interaction (likes/comments)?
* Analyze video metrics such as views, duration, upload schedule, likes, comments, tags, and upload frequency for popular music channels ('Aditya Music', 'SonyMusicSouthVEVO', 'T-Series', 'Saregama Music', and 'Zee Music Company') to determine their impact on video performance.
* Visualize and compare the best/worst-performing and shortest/longest videos, and examine weekly/yearly upload schedules to identify how these metrics affect views.



# Steps
1. Data Extraction using the YouTube API
2. Data Preprocessing
3. Feature Selection and Adding additional features
4. Exploratory Data Analysis
5. Observations & Conclusions

# Dataset

Few datasets available online cover music channels in a form suitable for this project, so I created my own.

* I created the dataset using the [YouTube Data API v3](https://developers.google.com/youtube/v3).
* I collected data from the following music channels: 'Aditya Music', 'SonyMusicSouthVEVO', 'T-Series', 'Saregama Music', and 'Zee Music Company'. The final dataset contains the columns 'channelTitle', 'title', 'tags', 'publishedAt', 'viewCount', 'likeCount', 'commentCount', 'duration', 'tagcount', 'publishDayName', 'durationSecs', and 'publishYear'. Of these, 'tagcount', 'publishDayName', 'durationSecs', and 'publishYear' are not returned by the API; they are additional features derived from the raw fields.
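The columns 'durationSecs', 'publishDayName', 'publishYear', and 'tagcount' listed above can be computed from the raw 'duration', 'publishedAt', and 'tags' fields. The following is a minimal, standard-library sketch of one way to do that. The helper names `duration_to_seconds` and `derive_features` are hypothetical, and the notebook's own feature code may well use a different approach (for example a dedicated duration-parsing library).

```python
import re
from datetime import datetime

# ISO 8601 durations as they appear in the 'duration' column, e.g. "PT20M55S"
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert a duration such as 'PT3M32S' to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognized duration: {duration!r}")
    # Components that are absent from the string count as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


def derive_features(row):
    """Build the derived columns for one video (hypothetical helper)."""
    published = datetime.strptime(row["publishedAt"], "%Y-%m-%dT%H:%M:%SZ")
    tags = row.get("tags")
    return {
        "durationSecs": duration_to_seconds(row["duration"]),
        "publishDayName": published.strftime("%A"),
        "publishYear": published.year,
        # Videos without tags have None in the 'tags' column
        "tagcount": len(tags) if tags else 0,
    }


example = derive_features(
    {"publishedAt": "2023-02-21T13:00:16Z", "duration": "PT3M32S", "tags": None}
)
print(example)
```

With a DataFrame, the same function could be applied row by row, although vectorized pandas operations would likely be faster on a dataset of this size.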

# Code

In [None]:
from googleapiclient.discovery import build
import pandas as pd
from IPython.display import JSON

import seaborn as sns

In [None]:
api_key='My Youtube API key'
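Rather than pasting the key into the notebook, it can be read from the environment so that it never ends up in a shared file. This is only a sketch: the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something the notebook defines.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage, once the variable has been exported in the shell:
# api_key = load_api_key()
```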

In [None]:
channel_id= ['UC_A7K2dXFsTMAciGmnNxy-Q', #Saregama Music
             'UCq-Fj5jknLsUf-MWSy4_brA', #T-series
             'UCTNtRdBAiZtHP9w7JinzfUg', #SonyMusicSouth
             'UCNApqoVYJbYSrni4YsbXzyQ', #Aditya Music
             'UCFFbwnve3yF62-tVXkTyHqg'  #Zee Music
             ] 

In [None]:
api_service_name = "youtube"
api_version = "v3"


# Get credentials and create an API client
youtube = build(
    api_service_name, api_version, developerKey=api_key)

In [None]:
request= youtube.channels().list(
      part="snippet,contentDetails,statistics",
      id=','.join(channel_id)
  )
response= request.execute()

JSON(response)

<IPython.core.display.JSON object>

In [None]:

def get_channel_stats(youtube, channel_id):
  all_data=[]

  request= youtube.channels().list(
      part="snippet,contentDetails,statistics",
      id=','.join(channel_id)
  )
  response= request.execute()

  for item in response['items']:
    data= {'channelName':item['snippet']['title'],
           'subscribers': item['statistics']['subscriberCount'],
           'views':item['statistics']['viewCount'],
           'totalVideos': item['statistics']['videoCount'],  # number of uploaded videos, not views
           'playlistId': item['contentDetails']['relatedPlaylists']['uploads']
    }

    all_data.append(data)

  return(pd.DataFrame(all_data))

In [None]:
channel_stats= get_channel_stats(youtube, channel_id)

In [None]:
channel_stats

Unnamed: 0,channelName,subscribers,views,totalVideos,playlistId
0,T-Series,236000000,216384586962,18820,UUq-Fj5jknLsUf-MWSy4_brA
1,Saregama Music,32500000,13891908009,6787,UU_A7K2dXFsTMAciGmnNxy-Q
2,Aditya Music,27600000,24556038781,21010,UUNApqoVYJbYSrni4YsbXzyQ
3,Zee Music Company,92700000,54244222583,7766,UUFFbwnve3yF62-tVXkTyHqg
4,SonyMusicSouthVEVO,17400000,16758221652,5452,UUTNtRdBAiZtHP9w7JinzfUg
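The API returns the counters in `statistics` as strings, so they need converting before any arithmetic. Below is a small sketch of flattening one response item and casting its counters to `int`. The helper name `parse_channel_item` and the key `videos` are hypothetical, and the sample item is built by hand from the T-Series row in the table above rather than taken from a live response.

```python
def parse_channel_item(item):
    """Flatten one channels().list item and convert its counters to int."""
    stats = item["statistics"]
    return {
        "channelName": item["snippet"]["title"],
        "subscribers": int(stats["subscriberCount"]),
        "views": int(stats["viewCount"]),
        "videos": int(stats["videoCount"]),
        "playlistId": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }


# Hand-built item mirroring the T-Series row of the table above
sample_item = {
    "snippet": {"title": "T-Series"},
    "statistics": {
        "subscriberCount": "236000000",
        "viewCount": "216384586962",
        "videoCount": "18820",
    },
    "contentDetails": {"relatedPlaylists": {"uploads": "UUq-Fj5jknLsUf-MWSy4_brA"}},
}

row = parse_channel_item(sample_item)
# Integer counters make derived figures such as views per video straightforward
views_per_video = row["views"] // row["videos"]
print(row["channelName"], views_per_video)
```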


In [None]:
def get_video_ids_for_playlists(youtube, playlist_ids):
    """Return the IDs of all videos in the given playlists."""
    video_ids = []
    for playlist_id in playlist_ids:
        # The API returns at most 50 items per page
        request = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=playlist_id,
            maxResults=50
        )
        # list_next() returns None once there are no more pages
        while request is not None:
            response = request.execute()
            for item in response["items"]:
                video_ids.append(item["contentDetails"]["videoId"])
            request = youtube.playlistItems().list_next(request, response)
    return video_ids

In [None]:
playlist_ids = ['UUNApqoVYJbYSrni4YsbXzyQ', 'UUTNtRdBAiZtHP9w7JinzfUg', 'UUq-Fj5jknLsUf-MWSy4_brA', 'UU_A7K2dXFsTMAciGmnNxy-Q', 'UUFFbwnve3yF62-tVXkTyHqg']

# Call the function to get the video IDs for all videos in the playlists
video_ids = get_video_ids_for_playlists(youtube, playlist_ids)
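In the `channel_stats` output, every uploads playlist ID is the channel ID with its `UC` prefix replaced by `UU`. The sketch below checks that pattern against the IDs used in this notebook. It is an observation about this data, not a documented guarantee, so the `relatedPlaylists.uploads` field from the API remains the safer source. The helper name `uploads_playlist_id` is hypothetical.

```python
def uploads_playlist_id(channel_id):
    """Guess the uploads playlist ID from a channel ID (observed pattern only)."""
    if not channel_id.startswith("UC"):
        raise ValueError(f"Unexpected channel ID format: {channel_id!r}")
    return "UU" + channel_id[2:]


channel_ids = [
    "UC_A7K2dXFsTMAciGmnNxy-Q",  # Saregama Music
    "UCq-Fj5jknLsUf-MWSy4_brA",  # T-Series
    "UCTNtRdBAiZtHP9w7JinzfUg",  # SonyMusicSouth
    "UCNApqoVYJbYSrni4YsbXzyQ",  # Aditya Music
    "UCFFbwnve3yF62-tVXkTyHqg",  # Zee Music
]

derived_playlist_ids = [uploads_playlist_id(cid) for cid in channel_ids]
print(derived_playlist_ids)
```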


In [None]:
len(video_ids)

58242

In [None]:
def get_video_details(youtube, video_ids):
        
    all_video_info = []
    
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=','.join(video_ids[i:i+50])
        )
        response = request.execute() 

        for video in response['items']:
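            # NOTE: the API spells this field 'favoriteCount'; 'favouriteCount' is
            # never present in the response, so that column is always None.
            stats_to_keep = {'snippet': ['channelTitle', 'title', 'description', 'tags', 'publishedAt'],
                             'statistics': ['viewCount', 'likeCount', 'favouriteCount', 'commentCount'],
                             'contentDetails': ['duration', 'definition', 'caption']
                            }
            video_info = {}
            video_info['video_id'] = video['id']

            for k in stats_to_keep.keys():
                for v in stats_to_keep[k]:
                    # Some fields (e.g. tags, likeCount) are missing for some videos
                    try:
                        video_info[v] = video[k][v]
                    except KeyError:
                        video_info[v] = None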

            all_video_info.append(video_info)

    return pd.DataFrame(all_video_info)

In [None]:
video_df= get_video_details(youtube, video_ids)
len(video_df)

58242
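`get_video_details` sends the video IDs in batches of 50, so the number of requests grows with the size of the ID list. The sketch below isolates that batching step and counts the requests needed for the 58242 IDs collected here. The generator name `chunked` is hypothetical. The request count matters because the API enforces a daily quota; check the current quota documentation for the exact cost per call.

```python
import math


def chunked(items, size=50):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


total_videos = 58242
batch_size = 50

# One videos().list request is needed per batch of IDs
requests_needed = math.ceil(total_videos / batch_size)
print(requests_needed)

# The same batching applied to a small stand-in list of IDs
demo_ids = [f"id{i}" for i in range(120)]
batches = list(chunked(demo_ids, batch_size))
print([len(batch) for batch in batches])
```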

In [None]:
video_df.head()

Unnamed: 0,video_id,channelTitle,title,description,tags,publishedAt,viewCount,likeCount,favouriteCount,commentCount,duration,definition,caption
0,X-hywjtKK8k,Aditya Music,"Hello Alludu Full Songs Jukebox | Suman, Rambh...",Listen & Enjoy #HelloAlludu Full Songs Jukebox...,,2023-02-21T17:30:05Z,139,1,,0,PT20M55S,hd,False
1,tPqYiT6Elps,Aditya Music,"Ammayi Kosam Full Songs Jukebox | Vineeth, Mee...",Listen & Enjoy Ammayi Kosam Full Songs Jukebox...,,2023-02-21T17:00:38Z,458,2,,0,PT26M56S,hd,False
2,nW3-i0jhhTo,Aditya Music,Ammo Okato Thareeku Full Songs Jukebox | Srika...,Listen & Enjoy Ammo Okato Thareeku Full Songs ...,,2023-02-21T16:30:13Z,534,3,,0,PT26M58S,hd,False
3,8QVolL7qUzg,Aditya Music,Interview With Priya Hegde And Kiran Raj | Nuv...,Watch & Enjoy Interview With Priya Hegde And K...,,2023-02-21T13:30:08Z,1364,14,,0,PT26M52S,hd,False
4,TsAJFnjt2Wo,Aditya Music,Manusutho Preminchi Full Video Song | Samudram...,Watch & Enjoy Manusutho Preminchi Full Video S...,,2023-02-21T13:00:16Z,1930,32,,6,PT3M32S,hd,False


In [None]:
video_df.isnull().sum()

video_id              0
channelTitle          0
title                 0
description           0
tags               1200
publishedAt           0
viewCount             0
likeCount           100
favouriteCount    58242
commentCount         53
duration              0
definition            0
caption               0
dtype: int64

In [None]:
video_df.columns

Index(['video_id', 'channelTitle', 'title', 'description', 'tags',
       'publishedAt', 'viewCount', 'likeCount', 'favouriteCount',
       'commentCount', 'duration', 'definition', 'caption'],
      dtype='object')

In [None]:
final_df= video_df.drop(['video_id','description','caption','favouriteCount','definition'],axis=1)
final_df.head()

Unnamed: 0,channelTitle,title,tags,publishedAt,viewCount,likeCount,commentCount,duration
0,Aditya Music,"Hello Alludu Full Songs Jukebox | Suman, Rambh...",,2023-02-21T17:30:05Z,139,1,0,PT20M55S
1,Aditya Music,"Ammayi Kosam Full Songs Jukebox | Vineeth, Mee...",,2023-02-21T17:00:38Z,458,2,0,PT26M56S
2,Aditya Music,Ammo Okato Thareeku Full Songs Jukebox | Srika...,,2023-02-21T16:30:13Z,534,3,0,PT26M58S
3,Aditya Music,Interview With Priya Hegde And Kiran Raj | Nuv...,,2023-02-21T13:30:08Z,1364,14,0,PT26M52S
4,Aditya Music,Manusutho Preminchi Full Video Song | Samudram...,,2023-02-21T13:00:16Z,1930,32,6,PT3M32S


In [None]:
final_df.to_csv('final_df.csv', index=False, header=True)

In [None]:
final_df.shape

(58242, 8)

In [None]:
final_df.isnull().sum()

channelTitle       0
title              0
tags            1200
publishedAt        0
viewCount          0
likeCount        100
commentCount      53
duration           0
dtype: int64
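To judge how much the missing values matter, the null counts above can be expressed as a share of the 58242 rows. This is a small sketch using the counts printed above; the variable names are hypothetical.

```python
total_rows = 58242

# Null counts copied from the isnull().sum() output above
null_counts = {"tags": 1200, "likeCount": 100, "commentCount": 53}

# Percentage of rows with a missing value, per column
null_share = {
    column: round(count / total_rows * 100, 2)
    for column, count in null_counts.items()
}

for column, share in null_share.items():
    print(f"{column}: {share}% missing")
```

All three shares are small relative to the dataset, which is worth keeping in mind when deciding whether to drop or fill these rows.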

In [None]:
final_df[final_df['title'].isnull()]

Unnamed: 0,channelTitle,title,tags,publishedAt,viewCount,likeCount,commentCount,duration


In [None]:
final_df.tail()

Unnamed: 0,channelTitle,title,tags,publishedAt,viewCount,likeCount,commentCount,duration
58237,Zee Music Company,Sar Utha Ke - Hawaa Hawaai - Full Audio Song -...,"[Hawaa Hawaai, Saqib Saleem (Film Actor), Part...",2014-04-19T07:35:00Z,23382,143,3,PT3M30S
58238,Zee Music Company,Hawaa Hawaai (Title Track) - Hawaa Hawaai - F...,"[Hawaa Hawaai, Saqib Saleem (Film Actor), Part...",2014-04-19T07:34:27Z,17703,91,12,PT4M20S
58239,Zee Music Company,Sapnon Ko Ginte Ginte - Hawaa Hawaai - Full A...,"[Hawaa Hawaai, Saqib Saleem (Film Actor), Part...",2014-04-19T07:33:41Z,17835,91,6,PT5M33S
58240,Zee Music Company,Tu Hi Toh Hai - Full Audio Song | Holiday | Ak...,"[Akshay Kumar (TV Personality), Sonakshi Sinha...",2014-04-18T12:13:21Z,186138,1173,57,PT4M15S
58241,Zee Music Company,Sar Utha Ke | Hawaa Hawaai Official HD Video f...,"[Hawaa Hawaai, Saqib Saleem (Film Actor), Part...",2014-04-11T14:23:18Z,492630,3097,114,PT2M18S
