# Fetching YouTube comments and classifying them

In this notebook we fetch comments from YouTube and classify them using the model trained in the previous notebook.

See this page for a handy way of setting up access to the YouTube data API: [https://python.gotrained.com/youtube-api-extracting-comments/](https://python.gotrained.com/youtube-api-extracting-comments/)

YouTube Data API reference for `commentThreads.list`:
[https://developers.google.com/youtube/v3/docs/commentThreads/list](https://developers.google.com/youtube/v3/docs/commentThreads/list)

In [4]:
!pip install google-api-python-client
!pip install google-auth google-auth-oauthlib google-auth-httplib2

Looking in indexes: https://packages.dns.ad.zopa.com/artifactory/api/pypi/pypi-python2711-virtual/simple/
You are using pip version 10.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Looking in indexes: https://packages.dns.ad.zopa.com/artifactory/api/pypi/pypi-python2711-virtual/simple/
Collecting google-auth-oauthlib
  Downloading https://packages.dns.ad.zopa.com/artifactory/api/pypi/pypi-python2711-virtual/packages/7b/b8/88def36e74bee9fce511c9519571f4e485e890093ab7442284f4ffaef60b/google_auth_oauthlib-0.4.1-py2.py3-none-any.whl
Collecting requests-oauthlib>=0.7.0 (from google-auth-oauthlib)
  Downloading https://packages.dns.ad.zopa.com/artifactory/api/pypi/pypi-python2711-virtual/packages/a3/12/b92740d845ab62ea4edf04d2f4164d82532b5a0b03836d4d4e71c6f3d379/requests_oauthlib-1.3.0-py2.py3-none-any.whl
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->google-auth-oauthlib)
  Downloading https:/

In [11]:
import os
import pickle

Set the path to the OAuth client secrets file for my personal YouTube API credentials.

In [9]:
CLIENT_SECRETS_FILE = 'client_secret_1049876915637-7ia95c7rg5teak6crcuodies22keluuh.apps.googleusercontent.com.json'

Specify the scope of this application.

In [5]:
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

In [6]:
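import google.oauth2.credentials

# Request is needed by credentials.refresh() in get_authenticated_service
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow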

def get_authenticated_service():
    credentials = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            credentials = pickle.load(token)
    #  Check if the credentials are invalid or do not exist
    if not credentials or not credentials.valid:
        # Check if the credentials have expired
        if credentials and credentials.expired and credentials.refresh_token:
            credentials.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                CLIENT_SECRETS_FILE, SCOPES)
            credentials = flow.run_console()
 
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(credentials, token)
 
    return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

In [14]:
# When running locally, disable OAuthlib's HTTPS verification. When
# running in production *do not* leave this option enabled.
os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
service = get_authenticated_service()

Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=1049876915637-7ia95c7rg5teak6crcuodies22keluuh.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fyoutube.force-ssl&state=xfvM85KQNj7cVxvpWQx3PKb5KNJzSG&prompt=consent&access_type=offline


Enter the authorization code:  4/xwGsBXx6iibf8BNxTJaIiykthOk2qKgbNzMStmECaRbNTW0JDqqFFa0


### Search videos and fetch comments
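
The comment text sits several levels deep in the `commentThreads.list` response. As a minimal sketch of that shape (the IDs, page token and comment texts are made up for illustration, and `extract_comment_texts` is a hypothetical helper, not part of this notebook), one page of results can be read like this:

```python
# Hand-written sample in the shape of one page of a commentThreads.list
# response. All IDs, the token and the texts are made up for illustration.
sample_response = {
    'nextPageToken': 'HYPOTHETICAL_TOKEN',
    'items': [
        {
            'id': 'thread-1',
            'snippet': {
                'videoId': 'video-1',
                'topLevelComment': {
                    'snippet': {'textDisplay': 'First comment'}
                },
            },
        },
        {
            'id': 'thread-2',
            'snippet': {
                'videoId': 'video-1',
                'topLevelComment': {
                    'snippet': {'textDisplay': 'Second comment'}
                },
            },
        },
    ],
}


def extract_comment_texts(response):
    """Return the top-level comment text of every thread on one page."""
    return [
        item['snippet']['topLevelComment']['snippet']['textDisplay']
        for item in response.get('items', [])
    ]


texts = extract_comment_texts(sample_response)
# A nextPageToken means another page can be requested with pageToken=<token>.
has_more_pages = 'nextPageToken' in sample_response
print(texts, has_more_pages)
```

`get_video_comments` in the next cell does the same extraction for each page and keeps requesting pages for as long as `nextPageToken` is present.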

In [21]:
import csv


def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return comments

        
def search_videos_by_keyword(service, **kwargs):
    results = get_videos(service, **kwargs)
    final_result = []
    for item in results:
        title = item['snippet']['title']
        video_id = item['id']['videoId']
        comments = get_video_comments(service, part='snippet', videoId=video_id, textFormat='plainText')
        final_result.extend([(video_id, title, comment) for comment in comments])
    
    return final_result


def write_to_csv(comments):
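    # newline='' lets the csv module handle line endings itself, which keeps
    # multi-line comments intact; UTF-8 covers non-ASCII comment text.
    with open('comments.csv', 'w', newline='', encoding='utf-8') as comments_file: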
        comments_writer = csv.writer(comments_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        comments_writer.writerow(['Video ID', 'Title', 'Comment'])
        for row in comments:
            comments_writer.writerow(list(row))
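
`search_videos_by_keyword` calls `get_videos`, which is not defined in any cell shown here. A minimal sketch of what it could look like, assuming it pages through `service.search().list(...)` the same way `get_video_comments` pages through comment threads. The `max_pages` cap and its default of 3 are assumptions, not values taken from this notebook:

```python
def get_videos(service, max_pages=3, **kwargs):
    """Collect search results, following at most `max_pages` result pages.

    `max_pages` is a hypothetical safeguard so that one keyword search
    does not walk through every page of results.
    """
    videos = []
    request_kwargs = dict(kwargs)
    for _ in range(max_pages):
        results = service.search().list(**request_kwargs).execute()
        videos.extend(results.get('items', []))

        # Stop when the API reports no further page.
        next_token = results.get('nextPageToken')
        if not next_token:
            break
        request_kwargs['pageToken'] = next_token

    return videos
```

Each returned item carries `item['id']['videoId']` and `item['snippet']['title']` when the search is restricted with `type='video'` and `part='id,snippet'`, which is what `search_videos_by_keyword` reads.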


In [24]:
keyword = input('Enter a keyword: ')
comments = search_videos_by_keyword(service, q=keyword, part='id,snippet', eventType='completed', type='video')

Enter a keyword:  town of salem


In [26]:
len(comments)

2321

In [28]:
import pandas as pd

comments_data = pd.DataFrame({
    'title': [comment[1] for comment in comments],
    'comment': [comment[2] for comment in comments]
})
comments_data

| | title | comment |
|---|---|---|
| 0 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | When you realize that Jerome’s GA target tried... |
| 1 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | Wait...\nIn ACE ToS vids, his name is JeromeAS... |
| 2 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | I had those exact gumby toys. With the pony i ... |
| 3 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | Your stupid the mayor left the game |
| 4 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | Question...\n\n\nhow can hades the god of the ... |
| 5 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | 59:59 really? |
| 6 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | Here are some sample names person in comments ... |
| 7 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | Your target as GA was trying to kill you but t... |
| 8 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | 1 |
| 9 | TROLLING PLAYERS - TOWN OF SALEM MYSTERY GAME ... | 1 |


In [34]:
import pickle
with open('models/models_dict.pkl', 'rb') as f:
    models_dict = pickle.load(f)

In [39]:
word_vectorizer = models_dict['word_vectorizer']
char_vectorizer = models_dict['char_vectorizer']
models = models_dict['models']

In [43]:
from scipy.sparse import hstack

def classify_comment(comment, word_vectorizer, char_vectorizer, models, probability=False, return_df=False):
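    """
    Score a single comment with every per-class model.

    :param comment: a string, the raw text to score
    :param word_vectorizer: fitted word-level vectorizer
    :param char_vectorizer: fitted character-level vectorizer
    :param models: dict mapping each class name to its fitted classifier
    :param probability: if True, return the positive-class probability
        instead of the predicted label
    :param return_df: if True, return a one-row DataFrame instead of a dict
    """
    word_features = word_vectorizer.transform([comment])
    char_features = char_vectorizer.transform([comment])
    # The stacking order must match the order used when the models were trained.
    combined_features = hstack([char_features, word_features])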
    
    prediction = {}
    for class_name, model in models.items():
        if probability:
            # Take the positive class probability prediction
            class_prediction = model.predict_proba(combined_features)[0][1]
        else:
            class_prediction = model.predict(combined_features)[0]
            
        prediction[class_name] = [class_prediction]
    
    if return_df:
        return pd.DataFrame(prediction)
    else:
        return prediction


In [50]:
comments_data.loc[3, 'comment']

'Your stupid the mayor left the game'

In [51]:
classify_comment(comments_data.loc[3, 'comment'], word_vectorizer, char_vectorizer, models, return_df=True)

| | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
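
The cell above scores one comment. To score every fetched comment, one option is to loop over them and flatten each prediction into a row. This is a sketch only: `classify_comments` is a hypothetical helper, and the stand-in classifier in the demo is a toy rule, not the trained models.

```python
def classify_comments(comments, classify):
    """Apply `classify` to each comment and return one flat dict per comment.

    `classify` is any callable returning a dict of one-element lists,
    e.g. {'toxic': [1], 'insult': [0]}, which is the dict form that
    classify_comment returns when return_df is False.
    """
    rows = []
    for comment in comments:
        prediction = classify(comment)
        row = {'comment': comment}
        # Unwrap the one-element lists so each label maps to a scalar.
        row.update({label: values[0] for label, values in prediction.items()})
        rows.append(row)
    return rows


# Demo with a toy stand-in classifier (NOT the trained models).
def toy_classifier(comment):
    return {'toxic': [1 if 'stupid' in comment.lower() else 0]}


demo_rows = classify_comments(['Your stupid', '59:59 really?'], toy_classifier)
print(demo_rows)
```

With the objects from this notebook, the call would be `classify_comments(comments_data['comment'], lambda c: classify_comment(c, word_vectorizer, char_vectorizer, models))`, and `pd.DataFrame(...)` on the result gives one row per comment. Scoring one comment at a time is simple but slow for a few thousand comments; transforming all comments in a single `transform` call would likely be faster.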
