# YouTube Analysis

## Google APIs

* Create a [Google Cloud Project](https://console.developers.google.com/cloud-resource-manager), and save the **project ID**.
* Click the Google APIs logo at the top-left corner.
* Click "Library" on the left menu, search for "YouTube Data API v3", and enable it.
* Click "Credentials" on the left menu and create an "OAuth 2.0 client ID" where the application type is "Other".
* Click the client ID you just created and save the **client ID** and the **client secret**.

## Authentication

Create `client_secrets.json` using your project ID, client ID, and client secret:

```json
{
  "installed": {
    "project_id":"your_project_id",
    "client_id":"your_client_id",
    "client_secret":"your_client_secret",
    "auth_uri":"https://accounts.google.com/o/oauth2/auth",
    "token_uri":"https://accounts.google.com/o/oauth2/token",
    "redirect_uris":["urn:ietf:wg:oauth:2.0:oob","http://localhost"],
    "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs"
  }
}
```
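Before running the OAuth flow, it can help to confirm that the file has the shape shown above. The following is a minimal sketch: `EXPECTED_KEYS` and `missing_keys` are hypothetical names, and the key list simply mirrors the template above rather than any check performed by the Google libraries.

```python
import json

# Assumption: these keys are copied from the template above,
# not from the validation logic of any Google library.
EXPECTED_KEYS = {
    "project_id", "client_id", "client_secret",
    "auth_uri", "token_uri", "redirect_uris",
    "auth_provider_x509_cert_url",
}

def missing_keys(config_text):
    """Return the expected keys that are absent from the "installed" section."""
    config = json.loads(config_text)
    installed = config.get("installed", {})
    return sorted(EXPECTED_KEYS - set(installed))

# A deliberately incomplete configuration, used only for illustration.
incomplete = json.dumps({"installed": {"client_id": "your_client_id",
                                       "client_secret": "your_client_secret"}})
print(missing_keys(incomplete))
```

To check a real file, pass its contents, for example `missing_keys(open("src/client_secrets.json").read())`; an empty list means every key from the template is present.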

In [1]:
!pip install google-auth-httplib2
!pip install google-auth-oauthlib
!pip install google-api-python-client

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

CLIENT_SECRETS_FILE = "src/client_secrets.json"
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

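def get_authenticated_service():
    flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES)
    # run_console() relied on the out-of-band flow and is absent from recent
    # google-auth-oauthlib releases; run_local_server() opens a browser and
    # receives the authorization code on a temporary localhost port.
    credentials = flow.run_local_server(port=0)
    return build(API_SERVICE_NAME, API_VERSION, credentials=credentials)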

## Comments per Video

https://developers.google.com/youtube/v3/docs/commentThreads/list

In [3]:
def remove_empty_kwargs(**kwargs):
    # Drop parameters whose value is falsy (None, '', 0, False) so they are not sent to the API.
    return {k: v for k, v in kwargs.items() if v}

def comment_threads_list_by_video_id(client, **kwargs):
    kwargs = remove_empty_kwargs(**kwargs)
    response = client.commentThreads().list(**kwargs).execute()
    return response

In [29]:
import json
import os

os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '0'
client = get_authenticated_service()

response = comment_threads_list_by_video_id(
    client,
    videoId='eWnOU6CAvE0',
    part='snippet,replies',
    maxResults=100)

out_file = 'youtube-video.json'

with open(out_file, 'w') as fout:
    json.dump(response, fout, indent=2)

In [22]:
print(response.keys())

dict_keys(['kind', 'etag', 'nextPageToken', 'pageInfo', 'items'])


## Exercise

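The code above retrieves only the first page of results: at most 100 comment threads, which are the most recent ones because the default order is by time.
Write a function that retrieves all comment threads for the video by setting the `pageToken` field in `kwargs` to the `nextPageToken` value of the previous `response`, and repeating until the response no longer contains `nextPageToken`.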

In [26]:
def comment_items_by_video_id(client, **kwargs):
    kwargs = remove_empty_kwargs(**kwargs)
    response = client.commentThreads().list(**kwargs).execute()
    items = response['items']
    
    while 'nextPageToken' in response:
        kwargs['pageToken'] = response['nextPageToken']
        response = client.commentThreads().list(**kwargs).execute()
        items.extend(response['items'])
    
    return items
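The pagination loop can be exercised without network access or API quota by substituting a stub for the client. The following is a sketch under that assumption: `iter_pages`, `FakeRequest`, and `FakeCommentThreads` are hypothetical names, and the stub imitates only the `list(...).execute()` call pattern used above.

```python
def iter_pages(list_method, **kwargs):
    """Yield each response page, following nextPageToken until it is absent."""
    params = dict(kwargs)
    while True:
        response = list_method(**params).execute()
        yield response
        token = response.get('nextPageToken')
        if not token:
            break
        params['pageToken'] = token


class FakeRequest:
    """Stand-in for the request object returned by list()."""
    def __init__(self, response):
        self._response = response

    def execute(self):
        return self._response


class FakeCommentThreads:
    """Stand-in for client.commentThreads(); serves two canned pages."""
    PAGES = {
        None: {'items': ['a', 'b'], 'nextPageToken': 'p2'},
        'p2': {'items': ['c']},
    }

    def list(self, **kwargs):
        return FakeRequest(self.PAGES[kwargs.get('pageToken')])


fake = FakeCommentThreads()
items = [item
         for page in iter_pages(fake.list, videoId='hypothetical_id')
         for item in page['items']]
print(items)  # ['a', 'b', 'c']
```

With a real client, the same generator would be called as `iter_pages(client.commentThreads().list, videoId=..., part='snippet,replies', maxResults=100)`. Yielding pages instead of building one large list lets the caller stop early or write each page to disk as it arrives.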

In [27]:
import datetime

video_id = 'eWnOU6CAvE0'

items = comment_items_by_video_id(
    client,
    videoId=video_id,
    part='snippet,replies',
    maxResults=100)

time = datetime.datetime.now().strftime('-%Y-%m-%d-%H-%M-%S')
out_file = video_id + time + '.json'

with open(out_file, 'w') as fout:
    json.dump(items, fout, indent=2)

In [28]:
print(len(items))

305


## Exercise

The JSON file saved by the code above includes a lot of metadata.
Write a function that extracts only the comment texts (including replies) using the `textOriginal` field.
Note that YouTube currently supports replies only for top-level comments, as indicated in their [documentation](https://developers.google.com/youtube/v3/docs/comments/list).
Also note that the `replies` object of a comment thread may contain only a subset of the replies; retrieving all of them requires `comments.list` with the `parentId` parameter.

In [68]:
def extract_comments(items):
    texts = []
    for item in items:
        text = item['snippet']['topLevelComment']['snippet']['textOriginal'].strip()
        if text: texts.append(text)
        
        if 'replies' in item:
            for comment in item['replies']['comments']:
                text = comment['snippet']['textOriginal'].strip()
                if text: texts.append(text)

    return texts
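To make the nesting that `extract_comments` walks through easier to see, here is a small sketch that processes one hand-built comment thread. `thread_texts` and `sample_item` are hypothetical; the sample keeps only the fields the extraction reads and omits everything else a real API item contains.

```python
def thread_texts(item):
    """Return the non-empty texts of one comment thread, top-level comment first."""
    comments = [item['snippet']['topLevelComment']]
    # 'replies' is present only when the thread has at least one reply.
    comments += item.get('replies', {}).get('comments', [])
    texts = (c['snippet']['textOriginal'].strip() for c in comments)
    return [t for t in texts if t]


# Hand-built item with one top-level comment and two replies (one blank).
sample_item = {
    'snippet': {'topLevelComment': {'snippet': {'textOriginal': ' Great game! '}}},
    'replies': {'comments': [
        {'snippet': {'textOriginal': 'Agreed.'}},
        {'snippet': {'textOriginal': '   '}},
    ]},
}
print(thread_texts(sample_item))  # ['Great game!', 'Agreed.']
```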

In [76]:
# Use a context manager so the file handle is closed after loading.
with open(out_file) as fin:
    items = json.load(fin)
texts = extract_comments(items)

with open(out_file+'.txt', 'w') as fout:
    for i, text in enumerate(texts):
        fout.write('# comment_id=%d\n' % i)
        fout.write(text+'\n')

In [77]:
print(len(texts))

445


## Comments per Channel

https://developers.google.com/youtube/v3/docs/search/list

In [80]:
def channel_items(client, **kwargs):
    kwargs = remove_empty_kwargs(**kwargs)
    response = client.search().list(**kwargs).execute()
    items = response['items']
    
    while 'nextPageToken' in response:
        kwargs['pageToken'] = response['nextPageToken']
        response = client.search().list(**kwargs).execute()
        items.extend(response['items'])
        
    return items

In [81]:
channel_id = 'UCVSSpcmZD2PwPBqb8yKQKBA'

items = channel_items(
    client,
    part='snippet',
    channelId=channel_id,
    order='date',
    publishedAfter='2018-04-01T00:00:00Z',
    safeSearch='none',
    type='video',
    maxResults=50)

out_file = channel_id + '.json'

with open(out_file, 'w') as fout:
    json.dump(items, fout, indent=2)

In [83]:
print(len(items))

116


## Exercise

Write a function that creates a list of dictionaries, where each dictionary holds the following information for one video in the channel:

* `videoId`
* `title`
* `description`
* `comments`: comment items from the specific video.
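Once records with the four fields listed above exist, a quick summary is the number of comments per video. The sketch below assumes that record shape; `comment_counts` and `sample_data` are hypothetical, and the count covers only the replies embedded in each thread.

```python
def comment_counts(data):
    """Map each videoId to its number of top-level comments plus embedded replies."""
    counts = {}
    for record in data:
        total = 0
        for thread in record['comments']:
            replies = thread.get('replies', {}).get('comments', [])
            total += 1 + len(replies)  # one top-level comment plus its replies
        counts[record['videoId']] = total
    return counts


# Hand-built records in the shape described above; the IDs are placeholders.
sample_data = [
    {'videoId': 'video_a', 'title': 'First', 'description': '', 'comments': [
        {'snippet': {}, 'replies': {'comments': [{}, {}]}},
        {'snippet': {}},
    ]},
    {'videoId': 'video_b', 'title': 'Second', 'description': '', 'comments': []},
]
print(comment_counts(sample_data))  # {'video_a': 4, 'video_b': 0}
```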

In [96]:
def channel_comments(client, **kwargs):
    c_items = channel_items(client, **kwargs)
    data = []

    for c_item in c_items:
        videoId = c_item['id']['videoId']
        snippet = c_item['snippet']
        title = snippet['title']
        description = snippet['description']
        print(title)
        
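        # Use the ID of the current video, not the global `video_id` from the earlier cell.
        items = comment_items_by_video_id(
            client,
            videoId=videoId,
            part='snippet,replies',
            maxResults=100)
        data.append({'videoId': videoId, 'title': title, 'description': description, 'comments': items})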
    
    return data

In [97]:
channel_id = 'UCVSSpcmZD2PwPBqb8yKQKBA'

data = channel_comments(
    client,
    part='snippet',
    channelId=channel_id,
    order='date',
    publishedAfter='2018-04-17T00:00:00Z',
    safeSearch='none',
    type='video',
    maxResults=50)

time = datetime.datetime.now().strftime('-%Y-%m-%d-%H-%M-%S')
out_file = channel_id + time + '.json'

with open(out_file, 'w') as fout:
    json.dump(data, fout, indent=2)

[FULL] Alvin Gentry on Jrue Holiday: Show me a better two-way player in the league | NBA on ESPN
[FULL] Terry Stotts on being down 0-2: 'We're very capable of winning' the next two | NBA on ESPN
[FULL] Jaylen Brown after Celtics' Game 2 win: 'We just keep proving people wrong' | NBA on ESPN
[FULL] Giannis Antetokounmpo on Bucks' Game 2 loss: 'We didn't show up tonight' | NBA on ESPN
[FULL] Brad Stevens on Jaylen Brown's 30-point game: 'Jaylen loves the moment' | NBA on ESPN
[FULL] Kelly Oubre Jr. responds to Drake's trash talk: 'That's my guy, though' | NBA on ESPN
[FULL] John Wall after Game 2 loss: We're 'desperate, but we have a lot of confidence' | NBA on ESPN
[FULL] Kyle Lowry and DeMar DeRozan's great banter during Game 2 news conference | NBA on ESPN
[FULL] Scott Brooks after DeMar DeRozan's 37 points: 'A great player making plays' | NBA on ESPN
Kevin Love on Game 1 loss: 'Sometimes you need to get hit in the mouth' | NBA on ESPN
[FULL] Gregg Popovich on loss to Warriors: 'I don