### YouTube API scraper

This notebook collects channel information from the YouTube Data API for a list of YouTube channel IDs. It was adapted from [this script](https://github.com/lamthuyvo/social-media-data-scripts/blob/master/01-apis/scripts/youtube-get-channel-info.py).

In [None]:
# —————— libraries built into Python ———————
import csv
import json
import time

# —————— libraries that need to be installed, which you can do via pip ———————
import pandas as pd
import requests

To use this script you need a YouTube API key associated with your Google account. You can sign up for one [here](https://developers.google.com/youtube/registering_an_application).

In [None]:
# YouTube credentials
YOUTUBE_API_KEY = "INSERT YOUR OWN API KEY HERE"
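
Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the key has been exported in an environment variable (the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are illustrative choices, not part of the original script), it could be read like this:

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both this helper and the variable name are hypothetical conventions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With that in place, the cell above could become `YOUTUBE_API_KEY = load_api_key()`.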

This is where you specify the channel IDs for which you would like to compile information. A channel ID can usually be found at the end of the channel's URL, for example: https://www.youtube.com/channel/UCjnWysJh9-r9wo82zlbMT3A

If a user has changed the end of their URL, you can also look up the channel ID with free online tools, such as [this one](https://commentpicker.com/youtube-channel-id.php).
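
If you start from a list of channel URLs rather than IDs, the ID can be cut out of the URL programmatically. A minimal sketch, assuming URLs of the `/channel/<ID>` form shown above (`extract_channel_id` is a hypothetical helper; custom or handle-style URLs are not covered):

```python
import re

# Matches the path segment that follows "youtube.com/channel/"
CHANNEL_URL_PATTERN = re.compile(r"youtube\.com/channel/([0-9A-Za-z_-]+)")


def extract_channel_id(url):
    """Return the channel ID from a /channel/ URL, or None if the URL has none."""
    match = CHANNEL_URL_PATTERN.search(url)
    return match.group(1) if match else None


channel_id = extract_channel_id("https://www.youtube.com/channel/UCjnWysJh9-r9wo82zlbMT3A")
```

A video URL such as `https://www.youtube.com/watch?v=...` does not contain a channel ID, so the helper returns `None` for it.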

In [None]:
# this is where we define the API query and all its variables
api_key = YOUTUBE_API_KEY
# add the YouTube channel IDs to the list here; the ID can usually be found at the end of the channel URL: https://www.youtube.com/channel/UCjnWysJh9-r9wo82zlbMT3A
channel_ids = [
   
]

Function to scrape the data (with a built-in 3-second pause before each request to the API):

In [None]:
def get_channel_data(channel_id):
    time.sleep(3)
    # api parameters
    params = 'snippet,status,contentDetails,statistics,topicDetails,localizations'
    api_url = 'https://www.googleapis.com/youtube/v3/channels?part='+ params +'&id='+ channel_id +'&key='+api_key
    # this requests the URL; the response body is JSON text, which is parsed into a Python dictionary below
    api_response = requests.get(api_url)
    channeldetails = json.loads(api_response.text)
    print(channel_id)
    '''
    Alternatively:
    from apiclient.discovery import build
    obj = build('youtube', 'v3', developerKey=api_key)
    channeldetails = obj.channels().list(part=params, id=channel_id).execute()
    '''
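    # 'items' can be missing from the response (e.g. no matching channel or an API error), so default to an empty list
    if len(channeldetails.get('items', [])) > 0: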
        # Assign values from API to variables
        for item in channeldetails['items']:
            youtube_id = item['id']  
            publishedAt = item['snippet']['publishedAt']
            title = item['snippet']['title']
            description = item['snippet']['description']
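            # statistics fields may be absent (e.g. a hidden subscriber count), so use .get() to avoid a KeyError
            viewCount = item['statistics'].get('viewCount')
            subscriberCount = item['statistics'].get('subscriberCount')
            videoCount = item['statistics'].get('videoCount')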
            commentCount = item['statistics'].get('commentCount')
#             country = item['snippet']['country']
            
            row = {
                    'youtube_id': youtube_id,
                    'publishedAt': publishedAt,
                    'title': title,
                    'description': description,
                    'viewCount': viewCount,
                    'subscriberCount': subscriberCount,
                    'videoCount': videoCount,
                    'commentCount': commentCount,
                }
            rows.append(row)
    else:
        print(channel_id + " is not a valid ID")
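
Not every field is guaranteed to be present in every response item; a channel with a hidden subscriber count, for instance, may come back without `subscriberCount`. A minimal sketch of the field extraction that tolerates missing keys (`parse_channel_item` is a hypothetical helper, and the sample item is made up for illustration):

```python
def parse_channel_item(item):
    """Flatten one item of a channels response into a row dict.

    Missing fields become None instead of raising a KeyError.
    """
    snippet = item.get('snippet', {})
    statistics = item.get('statistics', {})
    return {
        'youtube_id': item.get('id'),
        'publishedAt': snippet.get('publishedAt'),
        'title': snippet.get('title'),
        'description': snippet.get('description'),
        'viewCount': statistics.get('viewCount'),
        'subscriberCount': statistics.get('subscriberCount'),
        'videoCount': statistics.get('videoCount'),
        'commentCount': statistics.get('commentCount'),
    }


# Made-up item without a subscriberCount, for illustration only
sample_item = {
    'id': 'UC_example_channel_id',
    'snippet': {'title': 'Example channel', 'publishedAt': '2015-01-01T00:00:00Z'},
    'statistics': {'viewCount': '1200', 'videoCount': '34'},
}
example_row = parse_channel_item(sample_item)
```

Keeping the parsing in a pure function like this also makes it possible to test it without calling the API.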



This cell runs the scraper for every channel ID in the list:

In [None]:
rows = []  # get_channel_data() appends one dictionary per channel to this list
for channel_id in channel_ids:
    get_channel_data(channel_id)
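
The loop above sends one request per channel, pausing 3 seconds each time. The `id` parameter of the channels endpoint accepts a comma-separated list of IDs, so several channels can be requested in one call. A sketch of the batching step only, assuming a limit of 50 IDs per request (worth checking against the current API documentation; `chunk_ids` and `build_params` are hypothetical helpers, and the IDs are placeholders):

```python
def chunk_ids(ids, size=50):
    """Split a list of channel IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_params(batch, api_key, parts='snippet,statistics'):
    """Build the query parameters for one batched channels request."""
    return {'part': parts, 'id': ','.join(batch), 'key': api_key}


# Placeholder IDs and key, for illustration only
batches = chunk_ids(['id_a', 'id_b', 'id_c'], size=2)
example_params = build_params(batches[0], api_key='YOUR_KEY')
```

Each dictionary could then be passed to `requests.get(url, params=...)`, which also takes care of URL-encoding the values.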

In [None]:
len(rows)
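
The API returns the statistics counts as strings rather than numbers, so they will not sort or sum correctly until they are converted. A minimal sketch of converting them to integers before building the DataFrame (`COUNT_FIELDS`, `to_int_or_none` and `convert_counts` are hypothetical names; missing values are kept as `None`):

```python
COUNT_FIELDS = ['viewCount', 'subscriberCount', 'videoCount', 'commentCount']


def to_int_or_none(value):
    """Convert a count string such as '1200' to an int; pass None through."""
    return None if value is None else int(value)


def convert_counts(row):
    """Return a copy of a row with its count fields converted to integers."""
    converted = dict(row)
    for field in COUNT_FIELDS:
        if field in converted:
            converted[field] = to_int_or_none(converted[field])
    return converted


# Made-up row, for illustration only
example = convert_counts({'title': 'Example', 'viewCount': '1200', 'commentCount': None})
```

Applied to the scraped data, this would be `rows = [convert_counts(row) for row in rows]`.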

In [None]:
channel_subscriptions = pd.DataFrame(rows).drop_duplicates()
print(len(channel_subscriptions))
channel_subscriptions.head()

In [None]:
channel_subscriptions.to_csv("../output/channel_subscriptions.csv")
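
`to_csv` raises an error if the target folder does not exist. A small sketch that creates the folder first (`ensure_parent_dir` is a hypothetical helper, not part of the original script):

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent folder of file_path if it is missing; return the path."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

The export could then be written as `channel_subscriptions.to_csv(ensure_parent_dir("../output/channel_subscriptions.csv"))`.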