##### Rough project parameters
- Statistics that YouTube creators care about include view count, subscriber count, watch time, average view duration, and click-through rate
- Some sort of automation component regarding data collection and processing
- Insights / recommendations based on data analysis
- Integrate data into web applications
- Be creative! Show that I know some statistics!

##### Instructions / Helpers
- Using the `channels`, `playlistItems`, and `videos` resources of the YouTube Data API
- Website to convert a handle (@) to a channel ID: https://www.streamweasels.com/tools/youtube-channel-id-and-user-id-convertor/
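
The handle lookup could also be done through the API itself instead of a third-party website. Below is a minimal sketch, assuming the installed client exposes the `forHandle` parameter of `channels().list`. The function name `channel_id_from_handle` is hypothetical, and `youtube` is expected to be a client created with `build(...)` as in the cells below.

```python
def channel_id_from_handle(youtube, handle):
    """Return the channel ID for a handle such as '@SomeChannel', or None if nothing matches."""
    # Assumption: the client supports the `forHandle` filter on channels().list
    response = youtube.channels().list(part="id", forHandle=handle).execute()
    items = response.get("items", [])
    return items[0]["id"] if items else None
```

The returned ID could then be passed as the `id` argument when requesting channel statistics.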

### Loading Python Libraries

In [117]:
# loading necessary libraries
import pandas as pd
from googleapiclient.discovery import build

### Accessing the YouTube API

##### Accessing the Channel

In [118]:
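# key to access YouTube API, read from an environment variable so it is not stored in the notebook
# (YOUTUBE_API_KEY is an assumed variable name)
import os
api_key = os.environ["YOUTUBE_API_KEY"]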

# interacting with the API
api_service_name = "youtube"
api_version = "v3"

youtube = build(
    api_service_name, api_version, developerKey = api_key)

request = youtube.channels().list(
    part="snippet,contentDetails,statistics",

    # unique channel id that corresponds to the channel I'm interested in
    id="UCIPPMRA040LQr5QPyJEbmXA"
)
channel_response = request.execute()


In [119]:
number_of_subscribers = int(channel_response['items'][0]['statistics']['subscriberCount'])
number_of_views = int(channel_response['items'][0]['statistics']['viewCount'])
number_of_videos = int(channel_response['items'][0]['statistics']['videoCount'])
uploads_id = channel_response['items'][0]['contentDetails']['relatedPlaylists']['uploads']

print('Here are some statistics about the channel, MrBeast Gaming:')
print("Number of subscribers:", number_of_subscribers)
print("Number of views:", number_of_views)
print("Number of videos:", number_of_videos)
print("Upload ID:", uploads_id)

Here are some statistics about the channel, MrBeast Gaming:
Number of subscribers: 30700000
Number of views: 5401897801
Number of videos: 138
Upload ID: UUIPPMRA040LQr5QPyJEbmXA


##### Accessing the Uploaded Videos

In [120]:
# collect every video ID from the channel's uploads playlist, 50 per page (the API maximum)
videos = []
next_page_token = None
while True:
    request = youtube.playlistItems().list(
        part="contentDetails",
        playlistId=uploads_id,
        maxResults=50,
        pageToken=next_page_token  # None on the first request, so no token is sent
    )
    videos_response = request.execute()

    for item in videos_response['items']:
        videos.append(item['contentDetails']['videoId'])

    # keep requesting pages until the API stops returning a token
    next_page_token = videos_response.get('nextPageToken')
    if next_page_token is None:
        break

print('We have successfully accessed', len(videos), 'videos from the channel.')
print("There are actually", number_of_videos, "videos on the channel.")
print('This is a difference of', number_of_videos - len(videos), 'videos.')

We have successfully accessed 138 videos from the channel.
There are actually 138 videos on the channel.
This is a difference of 0 videos.


##### Turning Video Information from JSON into a DataFrame

In [121]:
temp = []
for video_id in videos:
    # getting the information about this video
    video_stats_request = youtube.videos().list(
        part="snippet,contentDetails,statistics",
        id=video_id
    )
    video_stats_response = video_stats_request.execute()
    item = video_stats_response['items'][0]

    # resource kind reported by the API, e.g. 'youtube#video' -> 'video'
    video_type = item['kind'].split('#')[1]
    # get title
    title = item['snippet']['title']
    # get publish date (ISO 8601 timestamp string)
    publish_date = item['snippet']['publishedAt']
    # the statistics come back as strings, so convert them to integers
    # note: a KeyError is raised here if a video's likes are hidden or its comments are disabled
    views = int(item['statistics']['viewCount'])
    likes = int(item['statistics']['likeCount'])
    comments = int(item['statistics']['commentCount'])
    # get duration (ISO 8601 duration string, e.g. 'PT11M42S')
    duration = item['contentDetails']['duration']

    temp.append([title, publish_date, views, likes, comments, duration, video_type])

video_statistics = pd.DataFrame(temp, columns = ['Title', 'Publish Date', 'Views', 'Likes', 'Comments', 'Duration', 'Video Type'])
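
The cell above issues one request per video, which is 138 requests for this channel. The `id` parameter of `videos().list` also accepts a comma-separated list of IDs, so the same data could be fetched in a few batched calls. Below is a sketch under that assumption. The helper names `chunked` and `fetch_video_items` are hypothetical, and the batch size of 50 is assumed to be the per-request limit.

```python
def chunked(ids, size=50):
    """Split a list of IDs into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch_video_items(youtube, video_ids):
    """Fetch video resources in batches instead of one request per video."""
    items = []
    for batch in chunked(video_ids):
        response = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=",".join(batch),  # comma-separated list of video IDs
        ).execute()
        items.extend(response.get("items", []))
    return items
```

Each returned item carries its own `id` field, so rows are safer to match by that field than by position, in case a video is unavailable and missing from the response.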

In [122]:
video_statistics.head()


| | Title | Publish Date | Views | Likes | Comments | Duration | Video Type |
|---|---|---|---|---|---|---|---|
| 0 | If You Build It, I'll Pay For It! | 2022-12-31T20:00:04Z | 15484077 | 585229 | 19409 | PT11M42S | video |
| 1 | World's Hardest Challenge! | 2022-12-16T22:18:00Z | 17339113 | 536531 | 22811 | PT14M30S | video |
| 2 | 100 Youtuber Minecraft Battle Royale! | 2022-10-28T21:00:09Z | 17476263 | 988578 | 45748 | PT16M3S | video |
| 3 | Extreme $1,000,000 Challenge! | 2022-10-12T20:00:12Z | 10025319 | 393436 | 11483 | PT10M43S | video |
| 4 | Minecraft with Ultra Realistic Graphics! | 2022-09-16T19:00:37Z | 14538761 | 482191 | 12887 | PT8M47S | video |


### Data Cleaning

In [123]:
video_statistics.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 138 entries, 0 to 137
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Title         138 non-null    object
 1   Publish Date  138 non-null    object
 2   Views         138 non-null    int64 
 3   Likes         138 non-null    int64 
 4   Comments      138 non-null    int64 
 5   Duration      138 non-null    object
 6   Video Type    138 non-null    object
dtypes: int64(3), object(4)
memory usage: 7.7+ KB


The above code...
- Gets the data type of each variable.
- Shows that there are no missing values, which makes our lives much easier.
- Shows that 'Publish Date' is stored as a string (`object`), not in a datetime format.

In [124]:
video_statistics['Video Type'].value_counts()

video    138
Name: Video Type, dtype: int64

In [125]:
del video_statistics['Video Type']

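Every row is labeled `video`; none is labeled specifically as a Short. This column comes from the `kind` field of the API response (`youtube#video`), which identifies the resource type rather than the video format, so it cannot distinguish Shorts from regular videos. Since all of the values are the same, I decided to delete the column because it doesn't give us much information.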

In [126]:
# duration includes H
video_statistics[video_statistics['Duration'].str.contains('H')]

| | Title | Publish Date | Views | Likes | Comments | Duration |
|---|---|---|---|---|---|---|


No videos are an hour long or greater.
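
The `Duration` values are ISO 8601 duration strings such as `PT11M42S`. Below is a minimal sketch of converting them to seconds with a regular expression. The function name `duration_to_seconds` is hypothetical, and the pattern assumes only day, hour, minute, and whole-second components appear.

```python
import re

# Matches strings such as 'PT11M42S', 'PT1H2M3S' or 'P1DT2H'
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string into a total number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognised duration: {duration!r}")
    # Components that are absent from the string count as zero
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


print(duration_to_seconds("PT11M42S"))  # 702
print(duration_to_seconds("PT16M3S"))   # 963
```

Applied to the DataFrame, this could look like `video_statistics['Duration'].map(duration_to_seconds)`.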

In [127]:
# converting duration to seconds

# TO DO BEFORE SUBMISSION:
- convert duration to seconds
- convert publish to date time
- make a column that is time since published (days)
- add like to view ratio
- add comment to view ratio
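
The date and ratio items in the list above could be sketched with the standard library as follows. This assumes the timestamps always follow the `2022-12-31T20:00:04Z` format seen in the data. The function names are hypothetical, and the reference date in the example is arbitrary.

```python
from datetime import datetime, timezone


def parse_publish_date(published_at):
    """Parse a timestamp such as '2022-12-31T20:00:04Z' into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def days_since_published(published_at, now=None):
    """Whole days between the publish timestamp and `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_publish_date(published_at)).days


def ratio_to_views(count, views):
    """Return count / views, or None when a video has no views."""
    return count / views if views else None


# Example using the first row of the DataFrame shown earlier
reference = datetime(2023, 1, 10, 20, 0, 4, tzinfo=timezone.utc)  # arbitrary reference date
print(days_since_published("2022-12-31T20:00:04Z", now=reference))  # 10
print(ratio_to_views(585229, 15484077))  # like-to-view ratio
print(ratio_to_views(19409, 15484077))   # comment-to-view ratio
```

With pandas, the same helpers could be applied per column, for example `video_statistics['Publish Date'].map(days_since_published)`.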

### Exporting the Data to an Excel File
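
A minimal sketch of what this export could look like, assuming pandas' `DataFrame.to_excel` with an Excel writer engine such as `openpyxl` installed. The function name `export_to_excel` and the file name `video_statistics.xlsx` are hypothetical.

```python
from pathlib import Path


def export_to_excel(df, path="video_statistics.xlsx"):
    """Write a DataFrame to an Excel workbook and return the output path."""
    # index=False leaves the row index out, so reading the file back
    # does not produce an extra unnamed index column
    df.to_excel(path, index=False)
    return Path(path)
```

Calling `export_to_excel(video_statistics)` would write the cleaned DataFrame to the working directory.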

### Basic Exploratory Data Analysis

In [128]:
# sorting by number of views
video_statistics.sort_values(by = 'Views', ascending = False).head(10)

| | Title | Publish Date | Views | Likes | Comments | Duration |
|---|---|---|---|---|---|---|
| 36 | World’s Largest Explosion! | 2021-04-07T18:45:24Z | 112649125 | 1585623 | 76481 | PT8M32S |
| 102 | Whatever You Build, I'll Pay For! | 2020-08-06T17:00:23Z | 102409927 | 4900968 | 201502 | PT11M8S |
| 53 | Minecraft, But It's Only One Block! | 2020-12-17T20:14:17Z | 90856862 | 1054493 | 34053 | PT10M7S |
| 50 | If You Build a House, I'll Pay For It! | 2021-01-02T19:07:38Z | 90054569 | 2217942 | 83514 | PT10M10S |
| 103 | Minecraft, But Everything is Random! | 2020-08-02T17:17:36Z | 84713803 | 1211642 | 47824 | PT10M42S |
| 47 | 1000 Zombies vs Mutant Enderman! | 2021-01-27T19:00:23Z | 79361470 | 1453078 | 50044 | PT10M16S |
| 26 | I Survived 100 Days Of Hardcore Minecraft! | 2021-07-22T19:46:51Z | 77824399 | 1314813 | 78897 | PT15M37S |
| 83 | The Most Insane 900 IQ Among Us Outplay! | 2020-09-22T17:45:34Z | 75610689 | 1582796 | 50645 | PT10M9S |
| 115 | I Made a 100 Player Building Competition! | 2020-07-03T16:30:15Z | 73336475 | 2690968 | 43937 | PT11M37S |
| 18 | $45,600 Squid Game Challenge! | 2021-10-14T18:00:17Z | 70590498 | 1724239 | 55942 | PT11M21S |
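
View counts alone do not show how strongly viewers engaged with a video. One possible follow-up is to rank the same ten videos by like-to-view ratio. Below is a sketch using the values from the table above. The variable names are illustrative.

```python
# (title, views, likes) for the ten most-viewed videos listed above
top_videos = [
    ("World’s Largest Explosion!", 112649125, 1585623),
    ("Whatever You Build, I'll Pay For!", 102409927, 4900968),
    ("Minecraft, But It's Only One Block!", 90856862, 1054493),
    ("If You Build a House, I'll Pay For It!", 90054569, 2217942),
    ("Minecraft, But Everything is Random!", 84713803, 1211642),
    ("1000 Zombies vs Mutant Enderman!", 79361470, 1453078),
    ("I Survived 100 Days Of Hardcore Minecraft!", 77824399, 1314813),
    ("The Most Insane 900 IQ Among Us Outplay!", 75610689, 1582796),
    ("I Made a 100 Player Building Competition!", 73336475, 2690968),
    ("$45,600 Squid Game Challenge!", 70590498, 1724239),
]

# Sort from the highest like-to-view ratio to the lowest
ranked = sorted(
    ((likes / views, title) for title, views, likes in top_videos),
    reverse=True,
)

for ratio, title in ranked:
    print(f"{ratio:.2%}  {title}")
```

In this sample the highest ratio belongs to "Whatever You Build, I'll Pay For!" at roughly 4.8%, even though it is not the most-viewed video.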
