### YouTube Trending Data

In [1]:
pip install google-api-python-client

Note: you may need to restart the kernel to use updated packages.


In [2]:
## Phase I Project Proposal
### Understanding Factors Influencing YouTube Video Popularity

#### Name: Damanbir Anand DS3000


### Intro 

What makes a YouTube video popular? There are many factors that can affect the success of videos on YouTube, from the video’s length to the time it was uploaded. With millions of videos uploaded every day, creators often struggle to understand which factors drive engagement and viewership.

I'm particularly interested in two questions. First, does the length of a YouTube video affect its total views, likes, and comments? Second, does the time of day or day of the week a video is uploaded affect its overall popularity? These questions matter because the answers can help content creators optimize both the production and the timing of their videos, leading to increased engagement and views.

A study by Pew Research Center highlights that only a small percentage of content creators drive the majority of views on the platform, which suggests that creators need data-driven strategies to succeed.
 
By exploring these two questions, this project seeks to provide practical insights into the factors that influence YouTube video popularity.

### Data Collection

I plan to use the YouTube Data API to collect data on the top 50 trending videos on YouTube in the US. These videos represent the most popular content at the moment, which keeps the data current and relevant to my questions of interest. The YouTube Data API is user-friendly and gives me access to important video attributes such as view count, like count, comment count, upload time, and duration. I demonstrate below how I can retrieve this data, although it may need some cleaning later.

In [4]:

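import os

from googleapiclient.discovery import build
import pandas as pd

# Read the YouTube Data API key from the environment rather than hard-coding it.
# YOUTUBE_API_KEY is an assumed variable name; set it before running this cell.
api_key = os.environ["YOUTUBE_API_KEY"]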


youtube = build('youtube', 'v3', developerKey=api_key)

# Request the 50 most popular (trending) videos in the US right now
request = youtube.videos().list(
    part="snippet,statistics,contentDetails",
    chart="mostPopular",
    maxResults=50,
    regionCode="US"
)


response = request.execute()


videos = []

# Looping over each video in the response
for item in response['items']:
    video_data = {
        'title': item['snippet']['title'],
        'videoId': item['id'],
        'publishedAt': item['snippet']['publishedAt'],
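        # The API returns counts as strings and may omit hidden ones: default to 0, cast to int
        'viewCount': int(item['statistics'].get('viewCount', 0)),
        'likeCount': int(item['statistics'].get('likeCount', 0)),
        'commentCount': int(item['statistics'].get('commentCount', 0)),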
        'duration': item['contentDetails']['duration']
    }
    videos.append(video_data)

video_df = pd.DataFrame(videos)

video_df.head()


| | title | videoId | publishedAt | viewCount | likeCount | commentCount | duration |
|---|---|---|---|---|---|---|---|
| 0 | CHROMAKOPIA VINYL | dL6LM4DyzU8 | 2024-10-22T14:05:55Z | 1116375 | 176268 | 7427 | PT39S |
| 1 | NOID | Qer3lwd5hyA | 2024-10-21T14:44:20Z | 5081775 | 568598 | 28912 | PT2M43S |
| 2 | Ambessa Abilities \| Ability Reveal & Gameplay | pqQ00QqJEys | 2024-10-22T15:00:56Z | 525162 | 15040 | 3163 | PT2M21S |
| 3 | Yellowstone Official Trailer \| Paramount Network | n17AZkUXy58 | 2024-10-22T13:15:05Z | 672269 | 3747 | 315 | PT1M25S |
| 4 | The Brutalist \| Official Trailer HD \| A24 | 6d7yU379Ur0 | 2024-10-22T13:00:07Z | 441460 | 16172 | 946 | PT1M17S |


### How the Data Will Be Used
By analyzing the relationships between video length, upload time, and engagement metrics (views, likes, and comments), I can identify patterns that may help content creators optimize their videos for greater engagement. I anticipate using regression models to predict numerical values, such as the number of views, from video length or upload time. I may also explore classification techniques to group videos into high- or low-engagement categories based on their features. These methods will help me better understand the factors that influence video popularity on YouTube.
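
As a rough illustration of the feature preparation these methods would need, the sketch below derives the upload hour and weekday from a `publishedAt` timestamp and assigns a high/low engagement label with a median split on view count. The function names, the median cutoff, and the choice of view count as the engagement measure are assumptions made for illustration, not a settled design. It also assumes timestamps have no fractional seconds, as in the sample output above, and it reports the hour in UTC rather than in the uploader's local time.

```python
from datetime import datetime, timezone
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def upload_features(published_at: str) -> dict:
    """Split a timestamp like '2024-10-22T14:05:55Z' into UTC hour and weekday."""
    ts = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {"upload_hour_utc": ts.hour, "upload_weekday": WEEKDAYS[ts.weekday()]}


def label_engagement(view_counts: list) -> list:
    """Label each video 'high' if its views exceed the median, else 'low'."""
    cutoff = median(view_counts)
    return ["high" if views > cutoff else "low" for views in view_counts]


# The five rows shown in the sample output
sample_published = [
    "2024-10-22T14:05:55Z",
    "2024-10-21T14:44:20Z",
    "2024-10-22T15:00:56Z",
    "2024-10-22T13:15:05Z",
    "2024-10-22T13:00:07Z",
]
sample_views = [1116375, 5081775, 525162, 672269, 441460]

sample_features = [upload_features(ts) for ts in sample_published]
sample_labels = label_engagement(sample_views)

for features, label in zip(sample_features, sample_labels):
    print(features, label)
```

With only five rows the labels mean little; the point is the shape of the features that a regression or classification model would later consume.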
 
The data I collected from the YouTube API is mostly ready to use but may require some cleaning. For example, the video duration is provided as an ISO 8601 duration string (for example, `PT2M43S` for 2 minutes 43 seconds), which will need to be converted into a more usable numeric format such as total seconds or minutes.
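
One possible way to do the duration conversion is sketched below using only the standard library. The helper name `duration_to_seconds` is hypothetical, and the pattern assumes durations contain only day, hour, minute, and second components (no weeks, months, or years), which matches the values in the sample output.

```python
import re

# Matches durations such as PT39S, PT2M43S, PT1H2M3S, or P1DT2H.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration: str) -> int:
    """Convert an ISO 8601 duration string to a total number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognized ISO 8601 duration: {duration!r}")
    # Components missing from the string come back as None; treat them as 0.
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


# Durations of the five rows shown in the sample output
sample_durations = ["PT39S", "PT2M43S", "PT2M21S", "PT1M25S", "PT1M17S"]
sample_seconds = [duration_to_seconds(d) for d in sample_durations]
print(sample_seconds)  # [39, 163, 141, 85, 77]
```

Applied to the collected data, this could become a new column, for instance `video_df['duration_seconds'] = video_df['duration'].map(duration_to_seconds)`, where `duration_seconds` is an assumed column name.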