# YouTube Comments via YouTube API

This notebook uses Google's YouTube Data API to build a dataset of YouTube comments. It pulls comments from the 10 most-viewed videos of each of the 15 most-subscribed YouTube channels in Japan and writes the data to a CSV.

To Do:
1) ~~Check for successful request from API~~
2) ~~Make function that can pull comments using video ID~~
3) ~~Make function that adds relevant parts of response to a list of dictionaries~~
4) Get the necessary video IDs
5) Iterate through the IDs and use the created functions to generate a DF
6) Export DF to CSV
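
For step 4, one possible approach (an assumption, not something this notebook has settled on) is the API's `search.list` endpoint, filtered to a channel and ordered by view count. The helper name `top_video_ids` is hypothetical, and `client` stands for the `youtube` client this notebook builds with `build(...)`.

```python
def top_video_ids(client, channel_id, n=10):
    """Return up to n video ids for a channel, most viewed first.

    client: a googleapiclient resource for the YouTube Data API v3.
    """
    response = client.search().list(
        part='id',
        channelId=channel_id,
        type='video',        # skip playlists and channels in the results
        order='viewCount',   # most viewed first
        maxResults=n,
    ).execute()

    # Each search result carries its video id under item['id']['videoId']
    return [item['id']['videoId'] for item in response.get('items', [])]
```

A call would look like `top_video_ids(youtube, channel_id)`. Note that `search.list` costs far more quota per request than listing comments, so it is worth running once per channel and saving the ids.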

Notes:
* Google allows 10,000 quota units per day by default; listing comments costs 1 unit per request, but verify this when testing
* Listing comments returns at most 100 comment threads per request; assuming each video has at least 100 top-level comments, this should yield 15,000 top-level comments (15 channels × 10 videos × 100), plus any replies
* However, the request may not be able to filter by language, so the usable total may be lower
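
The arithmetic behind these notes can be sketched as below. The constant names are made up for illustration, and the `search.list` line only matters if that endpoint is used to find the video ids, which is an assumption.

```python
# Planned scope, taken from the notebook description
N_CHANNELS = 15
VIDEOS_PER_CHANNEL = 10
MAX_THREADS_PER_CALL = 100   # upper bound for maxResults on commentThreads.list

# Quota cost per request
COMMENT_THREADS_LIST_COST = 1
SEARCH_LIST_COST = 100       # assumption: search.list is used to find video ids

DAILY_QUOTA = 10_000

n_videos = N_CHANNELS * VIDEOS_PER_CHANNEL
max_top_level_comments = n_videos * MAX_THREADS_PER_CALL

# One comment request per video, one search request per channel
comment_units = n_videos * COMMENT_THREADS_LIST_COST
search_units = N_CHANNELS * SEARCH_LIST_COST
total_units = comment_units + search_units

print(f'videos: {n_videos}')
print(f'top-level comments (at most): {max_top_level_comments}')
print(f'quota units: {total_units} of {DAILY_QUOTA}')
```

Under these assumptions a full run fits in a single day's quota with room left for testing.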

In [None]:
import os
import pandas as pd
from dotenv import load_dotenv
from googleapiclient.discovery import build

In [None]:
# Load in credentials from environment variables
load_dotenv()
API_KEY = os.getenv('API_KEY')

# Initialize API client
youtube = build(
    'youtube', 'v3', developerKey=API_KEY
)

In [None]:
def retrieve_comments(video_id, max_results=10):
    
    # Make request to API and save as a variable
    request = youtube.commentThreads().list(
        part='snippet,replies',
        maxResults=max_results,
        order='relevance',
        videoId=video_id
    )
    response = request.execute()
    
    return response

In [None]:
test_response = retrieve_comments(video_id='4V0UAhe8o5c')
test_response

In [None]:
test_response['items'][1]

In [None]:
print(len(test_response['items'][1]['replies']['comments']))
test_response['items'][1]['replies']['comments']

## Making the function to add to a DF

1) Create empty list
2) Iterate through response['items']
3) Create dictionary to hold top level comment
4) Add dictionary to the list
5) Check for replies - if response['items'][i]['snippet']['totalReplyCount']
6) Iterate through replies
7) Create empty dictionary for each reply
8) Add information from snippet

When running through the response, will use a dictionary:
{channel_id: [video_ids]}
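
A minimal sketch of that iteration, assuming the dictionary shape above. `collect_rows` and `fetch_rows` are hypothetical names; `fetch_rows` stands in for whatever turns a video id into a list of row dictionaries, and the demo data replaces real API calls so the sketch runs on its own.

```python
def collect_rows(videos_by_channel, fetch_rows):
    """Tag every row with the channel and video it came from.

    videos_by_channel: {channel_id: [video_id, ...]}
    fetch_rows: callable taking a video id and returning a list of dicts
    """
    rows = []
    for channel_id, video_ids in videos_by_channel.items():
        for video_id in video_ids:
            for row in fetch_rows(video_id):
                # Overwrite any placeholder values with the real ids
                rows.append({**row, 'channel': channel_id, 'video_id': video_id})
    return rows


# Stand-in data so the sketch runs without calling the API
demo_videos = {'channel_a': ['vid_1', 'vid_2'], 'channel_b': ['vid_3']}


def demo_fetch(video_id):
    return [{'comment_id': f'{video_id}_c1', 'text': 'hello'}]


demo_rows = collect_rows(demo_videos, demo_fetch)
print(len(demo_rows))
```

With the notebook's own functions, `fetch_rows` could be `lambda video_id: extract_info(retrieve_comments(video_id, max_results=100))`.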

Columns desired for DataFrame:
* comment_id
    * top-level: response['items'][i]['snippet']['topLevelComment']['id']
    * reply: response['items'][i]['replies']['comments'][j]['id']
* channel - from dictionary used to iterate through
* video_id - from dictionary used to iterate through
* text
    * top-level: response['items'][i]['snippet']['topLevelComment']['snippet']['textOriginal']
    * reply: response['items'][i]['replies']['comments'][j]['snippet']['textOriginal']
* date_published
    * top-level: response['items'][i]['snippet']['topLevelComment']['snippet']['publishedAt']
    * reply: response['items'][i]['replies']['comments'][j]['snippet']['publishedAt']
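
As a rough sketch of how rows with these columns could end up as CSV text: this uses the standard library's `csv` module so it runs without pandas, and `COLUMNS`, `rows_to_csv_text`, and the sample row are illustrative only. With pandas, `pd.DataFrame(rows, columns=COLUMNS).to_csv(path, index=False)` should give an equivalent file.

```python
import csv
import io

# Column order for the output, matching the list above
COLUMNS = ['comment_id', 'channel', 'video_id', 'text', 'date_published']


def rows_to_csv_text(rows, columns=COLUMNS):
    """Serialize a list of row dicts to CSV text with a fixed column order."""
    buffer = io.StringIO()
    # extrasaction='ignore' drops any keys that are not in columns
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up row; the comma in the text shows that quoting is handled
sample_rows = [{
    'comment_id': 'c1',
    'channel': 'channel_a',
    'video_id': 'vid_1',
    'text': 'こんにちは, world',
    'date_published': '2021-01-01T00:00:00Z',
}]

csv_text = rows_to_csv_text(sample_rows)
print(csv_text)
```

Fixing the column order in one place keeps the file layout stable even if the row dictionaries gain extra keys later.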

In [None]:
def extract_info(response):
    
    comment_data = []
    comment_thread = response['items']
    
    for item in comment_thread:
        
        # Grab the top-level comment first
        comment_data.append({
            'comment_id': item['snippet']['topLevelComment']['id'],
            'channel': 'placeholder',
            'video_id': 'placeholder',
            'text': item['snippet']['topLevelComment']['snippet']['textOriginal'],
            'date_published': item['snippet']['topLevelComment']['snippet']['publishedAt']
        })
        
        # Check if there are replies and get same info if there are
        if item['snippet']['totalReplyCount']:
            
            # Fall back to an empty list in case the item has no 'replies' key
            replies = item.get('replies', {}).get('comments', [])
            
            for reply in replies:
                
                comment_data.append({
                    'comment_id': reply['id'],
                    'channel': 'placeholder',
                    'video_id': 'placeholder',
                    'text': reply['snippet']['textOriginal'],
                    'date_published': reply['snippet']['publishedAt']
                })
        
    return comment_data

In [None]:
test_data = extract_info(test_response)
print(len(test_data))
test_data