In [30]:
import pandas as pd

from googleapiclient.discovery import build

API_KEY = os.environ.get('API_KEY')

# Getting statistics of YouTube Videos

We will be using Google's YouTube v3 API to scrape the video statistics of each video uploaded by GitHub's YouTube 
channel.\

To get the video id of each video uploaded by a channel, please refer to my GitHub: https://bit.ly/3fG7FP8 \
Which shows you the process of getting information on each video uploaded by a channel.

First we'll read in our csv file into a DataFrame and have a preview our data:\
The __head( )__ method will give us our first 5 entries by default, but you can pass in any value.

In [8]:
df = pd.read_csv('videoids.csv')
df.head(10)

Unnamed: 0,Date,Video ID,Title
0,2020-05-08,NuonD5G28L8,Closing remarks - GitHub Satellite 2020
1,2020-05-08,pYzfGaLTqC0,Finding security vulnerabilities in JavaScript...
2,2020-05-08,nvCd0Ee4FgE,Finding security vulnerabilities in Java with ...
3,2020-05-08,PYsZeFTdJ50,Continuous delivery with GitHub Actions - GitH...
4,2020-05-08,cyh8DU2QPzg,Continuous integration with GitHub Actions - G...
5,2020-05-08,wcxOJq9YemE,Building GitHub integrations with webhooks and...
6,2020-05-08,l3g41dGObJ4,Building automations with GitHub Apps and Grap...
7,2020-05-08,ECdxifljjE4,ART MACHINES: fostering digital creativity thr...
8,2020-05-07,dy2eYaNxaQc,How to get from idea to contribution in minute...
9,2020-05-08,U8NpT_myO9A,Algorithmic Positivity Symphony - GitHub Satel...


#### Grab the video IDs column and turn it into a list:

In [41]:
vid_ids = df['Video ID'].to_list()
print(f'The length of vid_ids is: {len(vid_ids)}.')
print(f'Preview of list: {vid_ids[:10]}.')

The length of vid_ids is: 521.
Preview of list: ['NuonD5G28L8', 'pYzfGaLTqC0', 'nvCd0Ee4FgE', 'PYsZeFTdJ50', 'cyh8DU2QPzg', 'wcxOJq9YemE', 'l3g41dGObJ4', 'ECdxifljjE4', 'dy2eYaNxaQc', 'U8NpT_myO9A'].


# Data Collection:


#### We will be collecting the following data for each video: 
Video Title, the channel name, time and date published, video description, the amount of views the video has, both like and dislike counts, the amount of comments a video got, and the tags used for the video.

YouTube tags,also known as __video tags__, are important key phrases that help with video visibility when a certain topic is being searched.

#### **Google's definition of tags:**
*Tags are descriptive keywords you can add to your video to help viewers find your content. Your video’s title, thumbnail, and description are more important pieces of metadata for your video’s discovery. These main pieces of information help viewers decide which videos to watch.*\

You can see how important it is for a new/smaller channels to use the right tags in order to gain traction for their channel.

#### Here is a function to scrape YouTube tags given to each video by the channel uploading it.
If no tags were provided for a video, then we'll return N/A.\

In [78]:
def get_video_tags(vid_response):
    tags = []
    try:
        for tag in vid_response['items'][0]['snippet']['tags']:
            tags.append(tag)
    except Exception:
        error = 'N/A'
        return error
    return tags

#### I also created some error handling functions in case some of the specified information for each video is not available for public view:

In [81]:
def get_comment_count(vid_reponse):
    try:
        comments = vid_response['items'][0]['statistics']['commentCount']
    except Exception:
        error = 'Comments disabled/turned off.'
        return error
    return comments

In [82]:
def get_video_desc(vid_reponse):
    try:
        description = response ['items'][0]['snippet']['localized']['description'].replace('\n', '')
    except Exception:
        error = 'N/A'
        return error
    return description

In [83]:
def get_like_count(vid_reponse):
    try:
        likes = response['items'][0]['statistics']['likeCount']
    except Exception:
        error = 'Like count hidden.'
        return error
    return likes

In [84]:
def get_dislike_count(vid_reponse):
    try:
        dislikes = response['items'][0]['statistics']['dislikeCount']
    except Exception:
        error = 'Dislike count hidden.'
        return error
    return dislikes

### First we have to build the service object using the build() function in order to use the YouTube service.

To read in depth information please refer to the **google-api-python-client** documentation: https://bit.ly/2zPOrrg

In [28]:
youtube_service_example = build('youtube', 'v3', developerKey=API_KEY)

The videos instance method returns the resource that we need for this case. We will also pass in both the **statistics** and **snippet** parameters into the **part** argument within the **list** method to specify the information we want in our response. \
**statistics** and **snippet** give us different responses that we'll need:
#### 'statistics' parameter example:

In [23]:
git_vid_id = 'w3jLJU7DT5E'
stats_response = youtube_service_example.videos().list(id = git_vid_id,
                                 part='statistics').execute()
stats_response

{'kind': 'youtube#videoListResponse',
 'etag': 'YA6bQNroKFs6oh3VdXSOwKhLOAM',
 'items': [{'kind': 'youtube#video',
   'etag': 'GGoWXR1ApqRBIoOLWhfu8EwuqbU',
   'id': 'w3jLJU7DT5E',
   'statistics': {'viewCount': '869773',
    'likeCount': '15773',
    'dislikeCount': '327',
    'favoriteCount': '0',
    'commentCount': '532'}}],
 'pageInfo': {'totalResults': 1, 'resultsPerPage': 1}}

#### 'snippet' parameter example:

In [24]:
snippet_response = youtube_service_example.videos().list(id = git_vid_id,
                                 part='snippet').execute()
snippet_response

{'kind': 'youtube#videoListResponse',
 'etag': 'yJYDsbcHwo-CBYSDBGryzG9lTrg',
 'items': [{'kind': 'youtube#video',
   'etag': 'iU6_F1ZCYhkqWhUww9IfEj601WU',
   'id': 'w3jLJU7DT5E',
   'snippet': {'publishedAt': '2016-12-19T19:47:35Z',
    'channelId': 'UC7c3Kb6jYCRj4JOHHZTxKsQ',
    'title': 'What is GitHub?',
    'description': "Ever wondered how GitHub works? Let's see how Eddie and his team use GitHub.\n\nAs always, feel free to leave us a comment below and don't forget to subscribe: http://bit.ly/subgithub\n\nThanks!\n\nConnect with us.\nFacebook: http://fb.com/github\nTwitter: http://twitter.com/github\nGoogle+: http://google.com/+github\nLinkedIn: http://linkedin.com/company/github\n\nAbout GitHub\nGitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Millions of people use GitHub to build amazing things together. For more info, go to http://github.com",
    'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/w3jLJU7DT5E/defaul

In [42]:
# For building the service object:
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

# For API key stored as an environmentbal variable
import os
#For Timer
import time

# Our Final Code:

The information we want is retrieved and then added into a list. I added some timers and print statements to see how long it takes to make a service call for each video and how long it takes to scrape all of the desired videos.\
We'll only return the time it took to retrieve the first 20 videos or else this whole notebook will be full of retrieve statuses.

In [85]:
total_time = time.time()

youtube = build(API_SERVICE_NAME, API_VERSION, developerKey=API_KEY)

results = []
count = 1
for video_id in vid_ids:
    
    start = time.time()
    
    response = youtube.videos().list(id = video_id,
                                     part='statistics,snippet').execute()
    
    title = response['items'][0]['snippet']['title']
    channel_name = response['items'][0]['snippet']['channelTitle']
    published_date = response['items'][0]['snippet']['publishedAt'][:10]
    published_time = response['items'][0]['snippet']['publishedAt'][11:19]
    view_count = response['items'][0]['statistics']['viewCount']
    dislike_count = get_like_count(response)
    like_count = get_like_count(response)
    description = get_video_desc(response)
    comment_count = get_comment_count(response)
    tags = get_video_tags(response)
    
    data = {
        'Video_ID' : video_id,
        'Title' : title,
        'Channel' : channel_name,
        'Publish_Date' : published_date,
        'Publish_Time' : published_time,
        'Views' : view_count,
        'Likes' : like_count,
        'Disliked' : dislike_count,
        'Comment_Count' : comment_count,
        'Video_Tags' : tags,
        'Video_Description' : description
    }
    
    results.append(data)
    
    end = time.time() - start
    
    if count < 21:
        print(f'Got YouTube Video #{count}: "{title}" in {round(end,4)} seconds')
    else: 
        pass
    
    count += 1
    time.sleep(1)
    
total_end_time = time.time() - total_time
print()
print(f'Finished fetching all video data in {total_end_time}.')

Got YouTube Video #1: "Closing remarks - GitHub Satellite 2020" in 0.2321 seconds
Got YouTube Video #2: "Finding security vulnerabilities in JavaScript with CodeQL - GitHub Satellite 2020" in 0.1329 seconds
Got YouTube Video #3: "Finding security vulnerabilities in Java with CodeQL - GitHub Satellite 2020" in 0.1312 seconds
Got YouTube Video #4: "Continuous delivery with GitHub Actions - GitHub Satellite 2020" in 0.1232 seconds
Got YouTube Video #5: "Continuous integration with GitHub Actions - GitHub Satellite 2020" in 0.1274 seconds
Got YouTube Video #6: "Building GitHub integrations with webhooks and REST- GitHub Satellite 2020" in 0.1085 seconds
Got YouTube Video #7: "Building automations with GitHub Apps and GraphQL - GitHub Satellite 2020" in 0.1298 seconds
Got YouTube Video #8: "ART MACHINES: fostering digital creativity through live coding and ML - GitHub Satellite 2020" in 0.1256 seconds
Got YouTube Video #9: "How to get from idea to contribution in minutes - GitHub Satellite 

#### Lastly we'll want to save and export our results into a csv or xlsx file.

I'll be posting the data onto my GitHub: https://github.com/stephanie-y

In [93]:
df = pd.DataFrame(results)
df.head(10)

Unnamed: 0,Video_ID,Title,Channel,Publish_Date,Publish_Time,Views,Likes,Disliked,Comment_Count,Video_Tags,Video_Description
0,NuonD5G28L8,Closing remarks - GitHub Satellite 2020,GitHub,2020-05-08,17:46:48,2349,42,42,Comments disabled/turned off.,"[git, github, github universe, github satellit...","Presented by Erica Brescia, COO, GitHubGitHub ..."
1,pYzfGaLTqC0,Finding security vulnerabilities in JavaScript...,GitHub,2020-05-08,06:00:18,1629,50,50,Comments disabled/turned off.,"[git, github, github universe, github satellit...",CodeQL is GitHub's expressive language and eng...
2,nvCd0Ee4FgE,Finding security vulnerabilities in Java with ...,GitHub,2020-05-08,06:00:18,1083,30,30,Comments disabled/turned off.,"[git, github, github universe, github satellit...",CodeQL is GitHub's expressive language and eng...
3,PYsZeFTdJ50,Continuous delivery with GitHub Actions - GitH...,GitHub,2020-05-08,06:00:17,437,8,8,Comments disabled/turned off.,"[git, github, github universe, github satellit...",GitHub Actions gives us the power to use our r...
4,cyh8DU2QPzg,Continuous integration with GitHub Actions - G...,GitHub,2020-05-08,06:00:18,1142,23,23,Comments disabled/turned off.,"[git, github, github universe, github satellit...","""GitHub Actions gives teams world-class CI cap..."
5,wcxOJq9YemE,Building GitHub integrations with webhooks and...,GitHub,2020-05-08,06:00:29,439,15,15,Comments disabled/turned off.,"[git, github, github universe, github satellit...",Webhooks are valuable tools for powering real-...
6,l3g41dGObJ4,Building automations with GitHub Apps and Grap...,GitHub,2020-05-08,06:00:18,763,13,13,Comments disabled/turned off.,"[git, github, github universe, github satellit...",Have you ever wondered how to make cool robots...
7,ECdxifljjE4,ART MACHINES: fostering digital creativity thr...,GitHub,2020-05-08,06:00:18,341,15,15,Comments disabled/turned off.,,Creative Coders use machines to create aesthet...
8,dy2eYaNxaQc,How to get from idea to contribution in minute...,GitHub,2020-05-07,22:27:40,1652,57,57,Comments disabled/turned off.,"[git, github, github universe, github satellit...","Presented by Sasha Rosenbaum, Product Manager,..."
9,U8NpT_myO9A,Algorithmic Positivity Symphony - GitHub Satel...,GitHub,2020-05-08,06:00:34,826,13,13,Comments disabled/turned off.,"[git, github]","During the time of lockdown and isolation, Myn..."


In [94]:
df.to_csv('git_video_info.csv', index=False, header=True)