# YouTube API - Part 1: Channel Specific Data Collection
## Introduction

In this lab, we will familiarize ourselves with the YouTube API. By completing this lab, you should be able to:
* Collect summary information for one or more YouTube channels
* Gather data on the channel's uploaded videos
* Gather samples of comments from each video
* Navigate and use code documentation

Some code/Python skills used in this lab include: dictionaries, JSON, while loops, try/except, and pandas DataFrames.

**NOTE:** This lab is _HEAVILY_ based on [this video](https://www.youtube.com/watch?v=D56_Cx36oGY). A lot of the code comes directly from it, but it works perfectly well for our purposes. Feel free to watch it yourself for a more guided walkthrough of the process (especially setting up your API key).

### Pre-requisites
- Install the `google-api-python-client` package with the code below or directly in your terminal

In [1]:
#%pip install --upgrade google-api-python-client

Note: you may need to restart the kernel to use updated packages.


In [2]:
from googleapiclient.discovery import build
import pandas as pd

To simulate a project, we'll be focusing on analyzing a case study of controversy as it affects one YouTuber. More context will be provided in the next part, but for now, the channel is that of [Gus Johnson](https://www.youtube.com/@gustoonz) and our main question is: **How has controversy affected the metric performance of Gus Johnson's YouTube Channel?**
## Gus Johnson - YouTuber Analytic Performance in the Face of Controversy

### Connecting to the API
Your first goal with most APIs is to get an API key: an access code that identifies you and allows you to send requests to the platform. To get that API key, you'll need to:
* Log into Google API Console
* Create and set up a new project
* Get credentials 
* Create a new key

There are many guides to walk you through this process, including:
* A document uploaded for you in Canvas <- I would start with this one
* the [official quickstart guide](https://developers.google.com/youtube/v3/quickstart/python) 
* or the previously mentioned [YouTube video](https://www.youtube.com/watch?v=D56_Cx36oGY)

In [4]:
api_key = "" #REMEMBER TO REMOVE BEFORE UPLOADING
api_service_name = "youtube"
api_version = "v3"

In [5]:
#Get credentials and create an API client
youtube = build(
    api_service_name, api_version, developerKey=api_key
)
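
Because the key above is pasted straight into the notebook, it is easy to upload it by accident. One common alternative, sketched here, is to read the key from an environment variable. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions made for this example; neither the lab nor the API requires them.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this notebook.")
    return key

# Demonstration only: provide a placeholder if no real key is set,
# so that this example runs on its own.
os.environ.setdefault("YOUTUBE_API_KEY", "placeholder-key")
demo_key = load_api_key()
print(len(demo_key) > 0)
```

In practice you would set the variable in your terminal before starting Jupyter and pass the returned value as `developerKey`.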

## Collecting Channel Info

The first step is to extract our channel of interest. 
<br><br>
We can start by finding the channel ID on the channel page.


In [6]:
channel_id = "UC3w193M5tYPJqF0Hi-7U-2g" # <- Save the ID to a variable
# This ID belongs to the Dr. Eric Berg DC channel (see the response below)

Using the channel ID, we can make a request to the API in this format: (Note the structure of the request - this pattern repeats for almost every request we're going to make to this API)

In [7]:
request = youtube.channels().list(
    part="snippet,contentDetails,statistics",
    id=channel_id
)

response = request.execute()

response

{'kind': 'youtube#channelListResponse',
 'etag': 'felf4od_wYRk_hZTxWgBE4mB6Mc',
 'pageInfo': {'totalResults': 1, 'resultsPerPage': 5},
 'items': [{'kind': 'youtube#channel',
   'etag': 'QJNbc9zmx1aiHbX4Lllzm4Qrmmw',
   'id': 'UC3w193M5tYPJqF0Hi-7U-2g',
   'snippet': {'title': 'Dr. Eric Berg DC',
    'description': "Dr. Eric Berg DC, age 59, discusses the truth about getting healthy and losing weight. Dr. Berg specializes in Healthy Ketosis and Intermittent Fasting. He is the director of Dr. Berg's Nutritionals, and a best-selling amazon.com author. \n\nHis book, The Healthy Keto Plan describes specific strategies on doing the healthy version of the ketogenic diet as well as intermittent\nfasting. He has conducted over 4800 seminars on health-related topics and trained over 2500 doctors world-wide in his methods. Dr. Berg breaks down confusing complex health topics into easy to understand, usable knowledge. \n\nFor more information, go to our website at www.drberg.com or call customer s

As you can see, the response is a JSON object of channel attributes. We don't necessarily want all of this, so let's simplify it.
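
To get a feel for the nesting before writing any extraction code, it can help to poke at a small copy of the response. The dictionary below is a hand-trimmed version of the output above (only a few fields are kept), so it runs without an API key.

```python
# A hand-trimmed copy of the response shown above; most fields are omitted.
sample_response = {
    "kind": "youtube#channelListResponse",
    "pageInfo": {"totalResults": 1, "resultsPerPage": 5},
    "items": [
        {
            "kind": "youtube#channel",
            "id": "UC3w193M5tYPJqF0Hi-7U-2g",
            "snippet": {"title": "Dr. Eric Berg DC"},
        }
    ],
}

# 'items' is a list with one dictionary per channel returned.
first_channel = sample_response["items"][0]

# Listing the keys shows which categories exist at this level.
print(sorted(first_channel.keys()))

# Attributes are nested: the title lives under 'snippet'.
channel_title = first_channel["snippet"]["title"]
print(channel_title)
```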

#### Try it yourself

Create a function that extracts the following information for a given channel ID (I recommend making a [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)):
* Name
* Channel Creation Date
* Subscriber Count
* View Count
* Upload Count
* ID of their uploads playlist <- This will allow us to look at their uploads later

**HINT**: Look closely at the structure of the JSON object. Track what attributes are nested under other categories.

In [8]:
def get_channel_info(userid):
    request = youtube.channels().list(
        part="snippet,contentDetails,statistics",
        id=userid
    )
    response = request.execute()
    item = response['items'][0]
    # Your solution
    return {
        'channelName': item['snippet']['title'],
        'channelStartDate': item['snippet']['publishedAt'],
        'subscribers': item['statistics']['subscriberCount'],
        'viewCount': item['statistics']['viewCount'],
        'videoCount': item['statistics']['videoCount'],
        'uploadsPlaylist': item['contentDetails']['relatedPlaylists']['uploads']
    }

In [9]:
# Test your code:
channel_info = get_channel_info(channel_id)
channel_info

{'channelName': 'Dr. Eric Berg DC',
 'channelStartDate': '2008-11-23T18:27:59Z',
 'subscribers': '11800000',
 'viewCount': '2356082819',
 'videoCount': '5171',
 'uploadsPlaylist': 'UU3w193M5tYPJqF0Hi-7U-2g'}
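
Notice that the counts in the output come back as strings rather than numbers. If you want to do arithmetic with them, a conversion step like the sketch below may help. The helper name `convert_channel_types` is made up for this example, and the date parsing assumes timestamps in exactly the format shown above.

```python
from datetime import datetime, timezone

# Values copied from the output above.
channel_info_sample = {
    'channelName': 'Dr. Eric Berg DC',
    'channelStartDate': '2008-11-23T18:27:59Z',
    'subscribers': '11800000',
    'viewCount': '2356082819',
    'videoCount': '5171',
}

def convert_channel_types(info):
    """Return a copy of info with counts as ints and the start date as a datetime."""
    converted = dict(info)
    for key in ('subscribers', 'viewCount', 'videoCount'):
        converted[key] = int(info[key])
    # The trailing 'Z' means UTC, so attach that timezone explicitly.
    started = datetime.strptime(info['channelStartDate'], '%Y-%m-%dT%H:%M:%SZ')
    converted['channelStartDate'] = started.replace(tzinfo=timezone.utc)
    return converted

typed_info = convert_channel_types(channel_info_sample)
views_per_video = typed_info['viewCount'] / typed_info['videoCount']
print(round(views_per_video))
```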

## Get All Uploads

Now that we've gotten our basic channel info, we can start to collect data on what they're uploading. This is where we use that playlist ID we collected to get the IDs of every video this channel has published.

To get practice using [documentation](https://developers.google.com/youtube/v3/docs), see if you can find the code to get information on items in a playlist. 

Compare the uploads playlist ID returned in `contentDetails.relatedPlaylists.uploads` with the channel ID: the "UC" at the beginning of the channel ID is replaced with "UU".

The same trick works for short videos: replacing "UC" at the beginning of the channel ID with "UUSH" gives a playlist containing only the channel's short videos.

For example, a channel ID of "UCutJqz56653xV2wwSvut_hQ" will result in a playlist ID of "UUSHutJqz56653xV2wwSvut_hQ".
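
The prefix swap described above can be wrapped in a small helper. This is only a sketch: the function name `playlist_id_from_channel` is invented here, and the prefixes are the observed convention from the text, not something this example verifies against the API.

```python
def playlist_id_from_channel(channel_id, prefix="UU"):
    """Swap the leading 'UC' of a channel ID for a playlist prefix.

    prefix="UU" gives the uploads playlist, prefix="UUSH" the short-videos playlist
    (following the convention described in the text above).
    """
    if not channel_id.startswith("UC"):
        raise ValueError(f"Expected a channel ID starting with 'UC', got {channel_id!r}")
    return prefix + channel_id[2:]

print(playlist_id_from_channel("UCutJqz56653xV2wwSvut_hQ", prefix="UUSH"))
print(playlist_id_from_channel("UC3w193M5tYPJqF0Hi-7U-2g"))
```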

In [10]:
shorts_playlistid = "UUSH3w193M5tYPJqF0Hi-7U-2g"

request = youtube.playlistItems().list(
    part="snippet,contentDetails",
    playlistId=shorts_playlistid, # use channel_info['uploadsPlaylist'] instead to get every upload
    maxResults=50
)
response = request.execute()

You might notice that the maximum number of videos you can get from the playlist in a single request is 50. Gus Johnson and most other large YouTubers have uploaded more than that. This is where we use the "next page token". <br>
Below is a while loop that collects the "address" of the next page of data for as long as there is a page to go to. When the response no longer contains a `nextPageToken`, we have reached the last page and the loop stops.

In [11]:
# Create a list to store video IDs in
video_ids = []
# Add our first page to the list
video_ids.extend([item['contentDetails']['videoId'] for item in response['items']])

next_page = response.get('nextPageToken')

while next_page is not None: 
    request = youtube.playlistItems().list(
        part='snippet,contentDetails',
        playlistId = shorts_playlistid,
        maxResults = 50,
        pageToken = next_page # The page token goes here
    )
    response = request.execute()

    video_ids.extend([item['contentDetails']['videoId'] for item in response['items']])
    next_page = response.get('nextPageToken')

len(video_ids) # We queried the Shorts playlist, so expect fewer videos than the upload count collected above

169
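
To see the paging pattern on its own, without spending any API quota, here is a minimal sketch that replaces the API with a dictionary of fake pages. Everything in it (`FAKE_PAGES`, `fetch_page`, the video names) is made up for illustration; only the `items` and `nextPageToken` keys mirror the real responses.

```python
# Stand-in for the API: every page has items, and all but the last have a nextPageToken.
FAKE_PAGES = {
    None: {"items": ["vid1", "vid2"], "nextPageToken": "page2"},
    "page2": {"items": ["vid3", "vid4"], "nextPageToken": "page3"},
    "page3": {"items": ["vid5"]},  # last page: no token
}

def fetch_page(page_token=None):
    """Hypothetical replacement for request.execute()."""
    return FAKE_PAGES[page_token]

def collect_all_items():
    collected = []
    page_token = None  # None asks for the first page
    while True:
        response = fetch_page(page_token)
        collected.extend(response["items"])
        # .get returns None when the key is missing, which marks the last page.
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return collected

all_items = collect_all_items()
print(all_items)
```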

### Video Information Queries
Now that we've collected the video IDs, we can query each one for its own information. 

The code below uses a for loop to request information in batches of 50 video IDs per call instead of sending one request per video. Batching keeps the number of requests low, which is a common way to stay within limits on how many requests one user can make to an API (aka a **limit** on the **rate** at which you can make requests).
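
The slicing behind the batching can be tried in isolation. This sketch uses made-up IDs and a hypothetical helper `make_batches`; the count of 169 matches the number of video IDs collected above.

```python
def make_batches(ids, batch_size=50):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [ids[start:start + batch_size] for start in range(0, len(ids), batch_size)]

fake_ids = [f"video_{n}" for n in range(169)]  # placeholder IDs, not real videos
batches = make_batches(fake_ids)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)

# Each batch is joined into one comma-separated string for the request's id parameter.
id_param = ",".join(batches[-1][:3])
print(id_param)
```

Slicing past the end of a list is safe in Python, which is why the last, shorter batch needs no special handling.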

In [12]:
def get_video_data(video_ids):
    # Information that we care about, grouped by response part... check documentation to choose information
    relevant_stats = {
        'snippet': ['title', 'description', 'tags', 'publishedAt'],
        'statistics': ['viewCount', 'likeCount', 'commentCount'],
        'contentDetails': ['duration', 'definition', 'caption']
    }

    video_data = []
    for i in range(0, len(video_ids), 50): # performs requests in batches of 50 IDs
        request = youtube.videos().list(
            part='snippet,contentDetails,statistics',
            id=','.join(video_ids[i:i+50])
        )
        response = request.execute() # record response

        for item in response['items']:
            video_info = {'video_id': item['id']}

            for cat, stats in relevant_stats.items():
                for stat in stats:
                    # Some fields (e.g. tags) are missing for some videos; store None in that case
                    video_info[stat] = item.get(cat, {}).get(stat)

            video_data.append(video_info)

    return pd.DataFrame(video_data)

In [13]:
videos_df = get_video_data(video_ids)
videos_df

Unnamed: 0,video_id,title,description,tags,publishedAt,viewCount,likeCount,commentCount,duration,definition,caption
0,zqdxQWTdIM4,The Absolute Best Natural Vitamin for Arthriti...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-11T01:32:38Z,616020,52585,970,PT1M,hd,false
1,aYV9EWaiz_Y,3 Tips to Lose Weight While Sleeping #health #...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-07T21:30:23Z,508116,34743,451,PT1M,hd,false
2,G4guVvCYAEA,Discover the biggest culprit behind inflammati...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-07T00:00:12Z,164294,11317,212,PT53S,hd,false
3,u0-U3-f4VHM,Explore the health advantages provided by ging...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-06T20:14:30Z,115227,9890,241,PT40S,hd,false
4,llDg68l626M,"Craving some KFC? 🍗🍟 Before your next run, dis...","Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-05T19:31:40Z,351683,22369,1079,PT59S,hd,false
...,...,...,...,...,...,...,...,...,...,...,...
164,49kS_26uvS0,The Truth About the Most Popular Vitamin!,Check out this video for the truth behind the ...,,2021-07-21T16:24:07Z,231585,14272,708,PT1M,hd,false
165,HNm-kBPv0kc,The Ideal Intermittent Fasting Ratio,Have you ever wondered what the best ratio is ...,,2021-07-19T20:10:30Z,168065,10464,615,PT57S,hd,false
166,nkW_e9RUuAc,Man Against Hill: DON'T TRY THIS AT HOME,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...","[dr berg, dr eric berg, dr berg goes sledding,...",2021-01-18T13:27:43Z,48488,4044,589,PT55S,hd,true
167,Gr0paxS3pvk,Kick-Start Your Weight Loss Goals with Dr. Ber...,Start my 30-day challenge and experience the m...,"[Ketogenic diet, ketosis, keto, intermittent f...",2020-01-19T22:12:04Z,37799,1163,59,PT28S,hd,false
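
The `duration` column in the table above uses ISO 8601 duration strings such as `PT1M` and `PT53S`, and the counts are strings. Before analyzing them you will probably want numbers. The parser below is a simplified sketch that only handles hours, minutes, and seconds (enough for the values shown here); `duration_to_seconds` is a name invented for this example.

```python
import re

# Matches strings like PT53S, PT1M, or PT1H2M3S; days and longer units are not handled.
DURATION_PATTERN = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")

def duration_to_seconds(duration):
    """Convert a simple ISO 8601 duration string into a number of seconds."""
    match = DURATION_PATTERN.fullmatch(duration)
    if match is None:
        raise ValueError(f"Unsupported duration format: {duration!r}")
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds

for value in ["PT1M", "PT53S", "PT40S"]:
    print(value, duration_to_seconds(value))

# Counts can be converted with int().
view_count = int("616020")
```

With a DataFrame, the same function could be applied with `videos_df['duration'].apply(duration_to_seconds)`.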


## Combine Code

In [20]:
# Define any helper functions here
def get_video_ids(playlistID):
    request = youtube.playlistItems().list(
        part='snippet,contentDetails',
        playlistId = playlistID,
        maxResults=50
    )

    response = request.execute()

    video_ids = []
    video_ids.extend([item['contentDetails']['videoId'] for item in response['items']])

    next_page = response.get('nextPageToken')

    while next_page is not None: 
        request = youtube.playlistItems().list(
            part='snippet,contentDetails',
            playlistId = playlistID,
            maxResults = 50,
            pageToken = next_page 
        )
        response = request.execute()

        video_ids.extend([item['contentDetails']['videoId'] for item in response['items']])
        next_page = response.get('nextPageToken')

    return video_ids

In [21]:
def get_channel_data(userid):
    channel_info = get_channel_info(userid)
    shorts_playlistid = "UUSH" + userid[2:]

    video_ids = get_video_ids(shorts_playlistid)
    upload_data = get_video_data(video_ids)

    return channel_info, upload_data

In [22]:
info, uploads = get_channel_data('UC3w193M5tYPJqF0Hi-7U-2g')

print(info)
uploads

{'channelName': 'Dr. Eric Berg DC', 'channelStartDate': '2008-11-23T18:27:59Z', 'subscribers': '11800000', 'viewCount': '2356082819', 'videoCount': '5171', 'uploadsPlaylist': 'UU3w193M5tYPJqF0Hi-7U-2g'}


Unnamed: 0,video_id,title,description,tags,publishedAt,viewCount,likeCount,commentCount,duration,definition,caption
0,zqdxQWTdIM4,The Absolute Best Natural Vitamin for Arthriti...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-11T01:32:38Z,616065,52591,970,PT1M,hd,false
1,aYV9EWaiz_Y,3 Tips to Lose Weight While Sleeping #health #...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-07T21:30:23Z,508128,34746,451,PT1M,hd,false
2,G4guVvCYAEA,Discover the biggest culprit behind inflammati...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-07T00:00:12Z,164302,11319,212,PT53S,hd,false
3,u0-U3-f4VHM,Explore the health advantages provided by ging...,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-06T20:14:30Z,115228,9892,241,PT40S,hd,false
4,llDg68l626M,"Craving some KFC? 🍗🍟 Before your next run, dis...","Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...",,2024-03-05T19:31:40Z,351697,22372,1080,PT59S,hd,false
...,...,...,...,...,...,...,...,...,...,...,...
164,49kS_26uvS0,The Truth About the Most Popular Vitamin!,Check out this video for the truth behind the ...,,2021-07-21T16:24:07Z,231585,14272,708,PT1M,hd,false
165,HNm-kBPv0kc,The Ideal Intermittent Fasting Ratio,Have you ever wondered what the best ratio is ...,,2021-07-19T20:10:30Z,168066,10464,615,PT57S,hd,false
166,nkW_e9RUuAc,Man Against Hill: DON'T TRY THIS AT HOME,"Dr. Eric Berg DC Bio:\nDr. Berg, age 58, is a ...","[dr berg, dr eric berg, dr berg goes sledding,...",2021-01-18T13:27:43Z,48488,4044,589,PT55S,hd,true
167,Gr0paxS3pvk,Kick-Start Your Weight Loss Goals with Dr. Ber...,Start my 30-day challenge and experience the m...,"[Ketogenic diet, ketosis, keto, intermittent f...",2020-01-19T22:12:04Z,37799,1163,59,PT28S,hd,false


## BONUS - Getting Comments

Another way to measure reception of a video beyond the likes and views is the comments the audience leaves. We can do all kinds of text analysis with this data later on.

This method uses something called a [try/except](https://docs.python.org/3/tutorial/errors.html#handling-exceptions), which runs the code in the try section and, if any error occurs, switches to the except section. Here, the function attempts to collect comment data for each video in an inputted list; if a video has comments disabled, the code prints 'Could not get comments for video {video_id}' followed by the error message and moves on to the next video instead of stopping the program.
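
Here is the try/except pattern by itself, using a made-up function in place of the API call. `fetch_comments` and the video names are hypothetical; the point is only that one failure does not stop the loop.

```python
def fetch_comments(video_id):
    """Hypothetical stand-in for an API call; it fails for one particular video."""
    if video_id == "comments_disabled":
        raise RuntimeError("comments are disabled for this video")
    return [f"comment on {video_id}"]

collected = {}
failed = []

for video_id in ["vid_a", "comments_disabled", "vid_b"]:
    try:
        collected[video_id] = fetch_comments(video_id)
    except RuntimeError as error:
        # Report the problem and carry on with the next video.
        print(f"Could not get comments for video {video_id}: {error}")
        failed.append(video_id)

print(list(collected))
print(failed)
```

Keeping a list of the failures, as done here with `failed`, makes it easier to check afterwards which videos are missing from the results.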

In [14]:
# Collects the text of top-level comments for each video, 100 per page, following every page of results
def get_comments_in_videos(video_ids):
    all_comments = []

    for video_id in video_ids:
        try:
            comments_in_video = []

            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                maxResults=100  # Adjust this number as per your requirements
            )

            while request:
                response = request.execute()

                for comment in response['items']:
                    comments_in_video.append(comment['snippet']['topLevelComment']['snippet']['textOriginal'])

                request = youtube.commentThreads().list_next(request, response)

            comments_in_video_info = {'video_id': video_id, 'comments': comments_in_video}
            all_comments.append(comments_in_video_info)

        except Exception as e:
            print(f'Could not get comments for video {video_id}: {str(e)}')

    return pd.DataFrame(all_comments)

In [15]:
# Get comments together with their like counts. All pages are fetched (100 comments per page), then each video's comments are sorted from most to least liked
def get_comments_with_likes(video_ids):
  all_comments = []

  for video_id in video_ids:
    try:
      comments_with_likes = []

      request = youtube.commentThreads().list(
          part="snippet",
          videoId=video_id,
          maxResults=100  # Adjust this number as per your requirements
      )

      while request:
        response = request.execute()

        for comment in response['items']:
          comment_text = comment['snippet']['topLevelComment']['snippet']['textOriginal']
          like_count = comment['snippet']['topLevelComment']['snippet']['likeCount']
          comments_with_likes.append((comment_text, like_count))

        request = youtube.commentThreads().list_next(request, response)

      # Sort comments by like_count (descending order)
      comments_with_likes.sort(key=lambda x: x[1], reverse=True)

      comments_in_video_info = {'video_id': video_id, 'comments': comments_with_likes}
      all_comments.append(comments_in_video_info)

    except Exception as e:
      print(f'Could not get comments for video {video_id}: {str(e)}')

  return pd.DataFrame(all_comments)
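
If you only want the most-liked comments for each video, you can slice the sorted list. The sketch below works on a list of `(comment_text, like_count)` tuples like the one built above; the helper name `top_liked` and the sample data are made up.

```python
def top_liked(comments_with_likes, n=10):
    """Return the n comments with the highest like counts, most liked first."""
    ranked = sorted(comments_with_likes, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# Made-up (text, like_count) pairs for illustration.
sample_comments = [("first", 3), ("second", 12), ("third", 0), ("fourth", 7)]
top_two = top_liked(sample_comments, n=2)
print(top_two)
```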

In [15]:
comments = get_comments_with_likes(videos_df['video_id'])
comments


In [16]:
comments = get_comments_in_videos(videos_df['video_id'])
comments

Unnamed: 0,video_id,comments
0,zqdxQWTdIM4,"[🙄😒, Is nicotinamide riboside (NR-E) the same ..."
1,aYV9EWaiz_Y,[Have tested these and boy they are golden kno...
2,G4guVvCYAEA,"[Let’s do it! Let’s make it illegal!, People ..."
3,u0-U3-f4VHM,[I have used gimger for many years and still h...
4,llDg68l626M,"[I cringe when I hear the words ""fact checked""..."
...,...,...
164,49kS_26uvS0,"[Centrum upset my stomach too much, I stopped ..."
165,HNm-kBPv0kc,"[❤, Does having a liquid protein shake break a..."
166,nkW_e9RUuAc,"[so funny, Really? Was this your first time se..."
167,Gr0paxS3pvk,[Am going to start November 1st . I hope I sta...


In [17]:
comments.to_csv('comments.csv')
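
Two things are worth knowing when you read this file back in. Writing the index adds an extra unnamed column (this is typically where a header like `Unnamed: 0` comes from), and the lists of comments are stored as text. The sketch below shows one way to handle both, using a small made-up DataFrame and an in-memory buffer instead of a real file.

```python
import ast
import io

import pandas as pd

# Made-up data with the same shape as the comments DataFrame.
comments_sample = pd.DataFrame({
    "video_id": ["vid_a", "vid_b"],
    "comments": [["great video", "thanks"], ["first"]],
})

buffer = io.StringIO()  # stands in for 'comments.csv'
comments_sample.to_csv(buffer, index=False)  # index=False skips the extra index column
buffer.seek(0)

# Lists are saved as their text form, so turn them back into lists when reading.
restored = pd.read_csv(buffer, converters={"comments": ast.literal_eval})
print(restored["comments"].tolist())
```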