# YouTube Comments via YouTube API

This notebook uses Google's YouTube Data API to build a dataset of YouTube comments from the 10 most-viewed videos of each of the top 15 YouTube channels in Japan (ranked by subscriber count), and outputs the data to a CSV.

To Do:
1) ~~Check for successful request from API~~
2) ~~Make function that can pull comments using video ID~~
3) ~~Make function that adds relevant parts of response to a list of dictionaries~~
4) Get the necessary video IDs
5) ~~Iterate through IDs and use created functions to generate DF~~
6) Export DF to CSV

Notes:
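* Google's default quota is 10,000 units per day; listing comment threads costs 1 unit per request (double-check this when testing)
* Listing comment threads returns at most 100 per request. Assuming each video has at least 100 comments, 15 channels × 10 videos × 100 threads should yield about 15,000 top-level comments, plus replies
* However, the comment-thread endpoint has no language filter, so comments in other languages will be included and the usable count will likely be lower
* YouTube search can be used to get video IDs, but its quota cost is high (100 units per request)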
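Using the costs from the notes above (1 unit per comment-thread request, 100 units per search request, both worth re-checking against the API's quota documentation), a quick budget check for one full run looks like this. `estimate_units` is a hypothetical helper, and the sketch assumes one comment request per video:

```python
# Hypothetical back-of-the-envelope quota check; costs are assumptions taken from the notes above
DAILY_QUOTA = 10_000          # default daily quota in units
COMMENT_THREADS_COST = 1      # units per commentThreads.list request
SEARCH_COST = 100             # units per search.list request
MAX_RESULTS_PER_PAGE = 100    # most comment threads one request can return


def estimate_units(channels: int, videos_per_channel: int,
                   pages_per_video: int = 1, searches: int = 0) -> int:
    """Estimate the quota units a single collection run would consume."""
    comment_calls = channels * videos_per_channel * pages_per_video
    return comment_calls * COMMENT_THREADS_COST + searches * SEARCH_COST


# 15 channels x 10 videos, one page of comment threads per video
comments_only = estimate_units(15, 10)
# Same run, plus one search per channel to find its top videos
with_search = estimate_units(15, 10, searches=15)
# Upper bound on top-level comments if every page is full
max_top_level = 15 * 10 * MAX_RESULTS_PER_PAGE

print(comments_only, with_search, max_top_level, DAILY_QUOTA)
```

Either way a single run fits in the daily quota, but the search-based variant costs roughly ten times as much, which adds up quickly when re-running during testing.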

In [None]:
import os
import pandas as pd
from dotenv import load_dotenv
from googleapiclient.discovery import build
from time import sleep

In [None]:
# Load in credentials from environment variables
load_dotenv()
API_KEY = os.getenv('API_KEY')

# Initialize API client
youtube = build(
    'youtube', 'v3', developerKey=API_KEY
)
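`os.getenv` returns `None` when `API_KEY` is absent (for example, if the `.env` file is missing), and the problem then tends to surface only later as rejected API requests. A small guard can make a missing key fail immediately; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, raising immediately if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'{name} is not set; add it to your .env file or environment')
    return value
```

It would be used right after `load_dotenv()` as `API_KEY = require_env('API_KEY')`.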

In [None]:
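from googleapiclient.errors import HttpError

def retrieve_comments(video_id, max_results=10):
    """Fetch up to `max_results` comment threads (top-level comments plus replies) for a video.

    Returns the raw API response as a dict, or None if the request fails
    (e.g. comments are disabled or the video ID is invalid).
    """
    # Build the commentThreads.list request (costs 1 quota unit)
    request = youtube.commentThreads().list(
        part='snippet,replies',
        maxResults=max_results,
        order='relevance',
        videoId=video_id
    )
    try:
        return request.execute()
    except HttpError as err:
        # Log the failure and skip this video instead of stopping the whole run
        print(f'Request failed for video {video_id}: {err}')
        return None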
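`retrieve_comments` returns a single page, so it can never return more than 100 threads per video. When a response has more results available, it includes a `nextPageToken`, which can be passed back as `pageToken` to get the next page. A minimal sketch, assuming `client` is the object returned by `build('youtube', 'v3', ...)`; `retrieve_all_comment_threads` and `max_threads` are hypothetical names, and every page costs one more quota unit:

```python
def retrieve_all_comment_threads(client, video_id, max_threads=300):
    """Collect up to `max_threads` comment threads for one video by following nextPageToken.

    Returns a dict with an 'items' key, the same shape extract_info reads.
    """
    items = []
    page_token = None  # None means "first page"; the client omits None-valued parameters
    while len(items) < max_threads:
        response = client.commentThreads().list(
            part='snippet,replies',
            videoId=video_id,
            maxResults=min(100, max_threads - len(items)),  # a page holds at most 100 threads
            order='relevance',
            pageToken=page_token,
        ).execute()
        items.extend(response.get('items', []))
        page_token = response.get('nextPageToken')
        if not page_token:  # no further pages
            break
    return {'items': items[:max_threads]}
```

Because the result keeps the `items` key, it can be passed to `extract_info` in place of a single-page response.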

In [None]:
test_response = retrieve_comments(video_id='4V0UAhe8o5c')
test_response

In [None]:
test_response['items'][1]

In [None]:
# Not every thread has replies, so fall back to an empty list instead of raising a KeyError
replies = test_response['items'][1].get('replies', {}).get('comments', [])
print(len(replies))
replies

## Building the function that converts a response into a DF

1) Create an empty list
2) Iterate through response['items']
3) Create a dictionary to hold the top-level comment
4) Append the dictionary to the list
5) Check for replies, e.g. whether response['items'][i]['snippet']['totalReplyCount'] > 0 (the code checks for a 'replies' key)
6) Iterate through the replies
7) Create a dictionary for each reply
8) Fill it with information from the reply's snippet and append it to the list
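According to the API reference, the replies embedded in a comment thread may not include every reply; totalReplyCount gives the full number, and comments.list with a `parentId` returns the rest. The sketch below compares the two and falls back to `comments.list` only when the embedded list is short. It assumes `client` is the `build('youtube', 'v3', ...)` object; `fetch_all_replies` is a hypothetical helper, and each extra page costs one quota unit:

```python
def fetch_all_replies(client, thread):
    """Return every reply for one comment thread (one element of response['items'])."""
    embedded = thread.get('replies', {}).get('comments', [])
    total = thread['snippet'].get('totalReplyCount', 0)
    if len(embedded) >= total:
        return embedded  # the thread already carries every reply

    # Otherwise page through comments.list, using the top-level comment's ID as parentId
    parent_id = thread['snippet']['topLevelComment']['id']
    replies = []
    page_token = None
    while True:
        response = client.comments().list(
            part='snippet',
            parentId=parent_id,
            maxResults=100,
            pageToken=page_token,
        ).execute()
        replies.extend(response.get('items', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return replies
```

Each returned reply has the same `snippet.textOriginal` and `snippet.publishedAt` fields used for replies below.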

To loop over the videos, a dictionary maps each channel to its video IDs:
{channel_name: [video_ids]}

Columns desired for the DataFrame (i indexes comment threads, j indexes the replies within a thread):
* channel - the key of the dictionary used for iteration
* video_id - from that key's list of video IDs
* text
    * top-level: response['items'][i]['snippet']['topLevelComment']['snippet']['textOriginal']
    * reply: response['items'][i]['replies']['comments'][j]['snippet']['textOriginal']
* date_published
    * top-level: response['items'][i]['snippet']['topLevelComment']['snippet']['publishedAt']
    * reply: response['items'][i]['replies']['comments'][j]['snippet']['publishedAt']
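publishedAt comes back as an ISO 8601 string (ending in `Z` for UTC), so date_published will be plain text in the DataFrame. If the column is needed for time-based analysis, it can be parsed afterward. A minimal sketch using made-up sample timestamps:

```python
import pandas as pd

# Hypothetical sample values in the ISO 8601 format returned for publishedAt
df = pd.DataFrame({'date_published': ['2021-03-04T12:00:00Z', '2022-11-30T08:15:42Z']})

# Parse the strings into timezone-aware UTC timestamps
df['date_published'] = pd.to_datetime(df['date_published'], utc=True)

print(df['date_published'].dt.year.tolist())
```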

In [None]:
def extract_info(response, youtuber, video_id):
    """Flatten a commentThreads response into a DataFrame with one row per comment or reply."""
    
    comment_data = []
    comment_thread = response['items']
    
    for item in comment_thread:
        
        # Grab the top-level comment first
        comment_data.append({
            'channel': youtuber,
            'video_id': video_id,
            'text': item['snippet']['topLevelComment']['snippet']['textOriginal'],
            'date_published': item['snippet']['topLevelComment']['snippet']['publishedAt']
        })
        
        # Check if there are replies and get the same info if there are
        if 'replies' in item:
            
            replies = item['replies']['comments']

            for reply in replies:

                comment_data.append({
                    'channel': youtuber,
                    'video_id': video_id,
                    'text': reply['snippet']['textOriginal'],
                    'date_published': reply['snippet']['publishedAt']
                })
    
    # Fix the column order, so a video with no comments still yields the expected columns
    return pd.DataFrame(comment_data,
                        columns=['channel', 'video_id', 'text', 'date_published'])

In [None]:
# Dictionary of YouTube Channels and top 10 videos
youtubers = {'Junya Official Channel': ['4V0UAhe8o5c', '0dGh2KWJd84', 'c10am2Y1xfo', 'dYgIyCtyVXM', 'uHxJDYjzuVs',
                                        'KnxxMhLcO2Q', 'C8GtKZDXTAk', 'YKqX_ABcI_M', '6C9P1q3oon4', 'wcCNqumbM-I'],
             'Sagawa /さがわ': ['VVrM6JOX6gA', 'hSv5eJKniaQ', 'rSsqD1usaBM', 'zhnElWMuT0w', 'uWw8sfnZjIo',
                               'a7ViRAx1iE0', 'FmyncJTqaQ', 'j5aR4-Bj1aE', 'nGGI_luJsO0', 'xe9XiS9AtNk']}
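For To Do item 4 (getting the video IDs), one way to avoid the 100-unit search cost is to read each channel's "uploads" playlist and rank its videos by view count. The sketch below uses `channels.list`, `playlistItems.list` and `videos.list`, which each cost 1 unit per request. It assumes `client` is the `build('youtube', 'v3', ...)` object and that you already have each channel's ID (the dictionary above is keyed by display name). `top_videos_by_views` is a hypothetical helper, and the 50-ID batch size is an assumption based on the per-request limit. Since every upload is paged through, the cost grows with the size of the channel:

```python
def top_videos_by_views(client, channel_id, n=10):
    """Return the IDs of a channel's `n` most-viewed uploads without calling search.list."""
    # 1) Find the channel's uploads playlist
    channel = client.channels().list(part='contentDetails', id=channel_id).execute()
    uploads_id = channel['items'][0]['contentDetails']['relatedPlaylists']['uploads']

    # 2) Page through the playlist to collect every uploaded video ID
    video_ids, page_token = [], None
    while True:
        page = client.playlistItems().list(
            part='contentDetails', playlistId=uploads_id,
            maxResults=50, pageToken=page_token,
        ).execute()
        video_ids += [item['contentDetails']['videoId'] for item in page.get('items', [])]
        page_token = page.get('nextPageToken')
        if not page_token:
            break

    # 3) Look up view counts in batches of up to 50 IDs per videos.list request
    views = {}
    for start in range(0, len(video_ids), 50):
        batch = video_ids[start:start + 50]
        stats = client.videos().list(part='statistics', id=','.join(batch)).execute()
        for item in stats.get('items', []):
            # viewCount is returned as a string
            views[item['id']] = int(item['statistics'].get('viewCount', 0))

    # 4) Sort by view count, highest first, and keep the top n
    return sorted(views, key=views.get, reverse=True)[:n]
```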

In [None]:
# Initialize a DataFrame to store data
comments_df = pd.DataFrame(columns=['channel',
                                    'video_id',
                                    'text',
                                    'date_published'])

In [None]:
# Loop through youtubers to add data to DF
for youtuber, videos in youtubers.items():
    
    print(f'Getting comments for channel: {youtuber}')
          
    for video in videos:
        
        # Query the YouTube Data API
        response = retrieve_comments(video)
        # Pause between requests to spread them out
        sleep(3)
        
        # Add the data from the response to the DF, renumbering the index as rows are added
        if response:
            data = extract_info(response, youtuber=youtuber, video_id=video)
            comments_df = pd.concat([comments_df, data], ignore_index=True)

In [None]:
comments_df
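To Do item 6 (writing the DataFrame to a CSV) could look like the sketch below. The default filename is a placeholder, and the `utf-8-sig` encoding is a choice rather than a requirement: it adds a byte-order mark, which helps spreadsheet programs such as Excel detect UTF-8 and display the Japanese text correctly. `export_comments` is a hypothetical helper:

```python
import pandas as pd


def export_comments(df: pd.DataFrame, path: str = 'youtube_comments.csv') -> None:
    """Write the comments DataFrame to CSV without the index column."""
    # 'utf-8-sig' prefixes the file with a UTF-8 byte-order mark
    df.to_csv(path, index=False, encoding='utf-8-sig')
```

Usage: `export_comments(comments_df)`. Reading the file back with `pd.read_csv(path, encoding='utf-8-sig')` strips the byte-order mark again.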