# YouTube API Demo

This notebook is a local demonstration of how to use the YouTube API to search for videos and filter the results against a natural-language query.

In [1]:
# data science
import numpy as np
import pandas as pd

import joblib
from time import perf_counter
from sentence_transformers import SentenceTransformer

import youtube.get_youtube_data as get_youtube_data
import machine_learning.embedding as embedding
import save_dislikes.sentiment as sentiment



To use our custom function, you will need to write your own `youtube/secrets.toml` file containing your YouTube API key. Otherwise, you can simply set `YOUTUBE_API_KEY` to your key as a string.

In [2]:
# Step 1. Get the YouTube API key. The key is created in Google Cloud by following the instructions on
#         the API overview page: https://developers.google.com/youtube/v3/getting-started
YOUTUBE_API_KEY = get_youtube_data.get_youtube_api_key()

In [3]:
# Step 2. Build the YouTube client to make API calls.
youtube = get_youtube_data.make_client(YOUTUBE_API_KEY)

Next, specify a query. Note that fetching comments or transcripts takes considerable time, even for a small number of videos. Currently, our two filter layers, (1) Save the Dislikes and (2) BERT cosine similarity, require only the titles. This may change in the future.

In [4]:
# Step 3. Perform a YouTube search with a user-specified query.
youtube_df = get_youtube_data.search_youtube(
    youtube,
    query='Patrick Bet David',
    max_vids=15,       # keep at or below 50 to reduce quota usage
    order='relevance', # default is relevance
    comments=True,
    max_comments=20,   # keep at or below 100 to reduce quota usage
    transcripts=True,
)

Searching for: Patrick Bet David
Returning 15 results


In [5]:
# Let's check the columns in the Youtube DataFrame
youtube_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   video_id                15 non-null     object 
 1   published_at            15 non-null     object 
 2   channel_id              15 non-null     object 
 3   title                   15 non-null     object 
 4   description             15 non-null     object 
 5   channel_title           15 non-null     object 
 6   live_broadcast_content  15 non-null     object 
 7   thumbnail_default_url   15 non-null     object 
 8   thumbnail_medium_url    15 non-null     object 
 9   thumbnail_high_url      15 non-null     object 
 10  comments                15 non-null     object 
 11  thumbnail_standard_url  15 non-null     object 
 12  thumbnail_maxres_url    14 non-null     object 
 13  tags                    14 non-null     object 
 14  video_category_id       15 non-null     int6

In [6]:
# Let's check a few rows.
youtube_df.sample(5)

Unnamed: 0,video_id,published_at,channel_id,title,description,channel_title,live_broadcast_content,thumbnail_default_url,thumbnail_medium_url,thumbnail_high_url,...,video_category_id,video_duration,video_caption,video_view_count,video_like_count,video_comment_count,is_comments_enabled,is_live_content,NoCommentsBinary,transcript
0,QnAOUMB-BjE,2023-08-05T01:08:35Z,UCGX7nGXpz-CmO_Arg-cgJ7A,Vivek Ramaswamy Town Hall | PBD Podcast,"For the first time, the PBD Podcast will host ...",PBD Podcast,none,https://i.ytimg.com/vi/QnAOUMB-BjE/default.jpg,https://i.ytimg.com/vi/QnAOUMB-BjE/mqdefault.jpg,https://i.ytimg.com/vi/QnAOUMB-BjE/hqdefault.jpg,...,24,5617.0,False,701598.0,33719.0,6511.0,1,0,0,homie look what I become all right buddy Make ...
5,VWh0imOe3U4,2023-07-28T03:24:29Z,UCaJiHWcjkaZRpwP_PEXCT5Q,Patrick Bet-David Converts to Christianity #ch...,,Lila Rose,none,https://i.ytimg.com/vi/VWh0imOe3U4/default.jpg,https://i.ytimg.com/vi/VWh0imOe3U4/mqdefault.jpg,https://i.ytimg.com/vi/VWh0imOe3U4/hqdefault.jpg,...,22,60.0,False,38240.0,2632.0,73.0,1,0,0,the 22 year old Pat didn't believe in monogamy...
3,M4LJArRl67g,2023-08-05T01:15:07Z,UCIHdDJ0tjn_3j-FS7s_X1kQ,&quot;They Won&#39;t Let Biden Run&quot; - Viv...,"For the first time, PBD Podcast hosts a Town H...",Valuetainment,none,https://i.ytimg.com/vi/M4LJArRl67g/default.jpg,https://i.ytimg.com/vi/M4LJArRl67g/mqdefault.jpg,https://i.ytimg.com/vi/M4LJArRl67g/hqdefault.jpg,...,27,232.0,False,87952.0,2191.0,501.0,1,0,0,the santis and Newsome okay announcing that th...
14,_pDcMKlBkh8,2023-04-25T23:00:13Z,UCIHdDJ0tjn_3j-FS7s_X1kQ,Emotional Story On How PBD Earned $30 Million,shorts #short #valuetainment #patrickbetdavid.,Valuetainment,none,https://i.ytimg.com/vi/_pDcMKlBkh8/default.jpg,https://i.ytimg.com/vi/_pDcMKlBkh8/mqdefault.jpg,https://i.ytimg.com/vi/_pDcMKlBkh8/hqdefault.jpg,...,27,59.0,False,334305.0,17860.0,127.0,1,0,0,I talked to Eli I said Eli how can I help you ...
7,N6asuGrBRpw,2023-08-05T01:48:07Z,UCIHdDJ0tjn_3j-FS7s_X1kQ,Vivek&#39;s Gameplan to Drain the Swamp Better...,"For the first time, PBD Podcast hosts a Town H...",Valuetainment,none,https://i.ytimg.com/vi/N6asuGrBRpw/default.jpg,https://i.ytimg.com/vi/N6asuGrBRpw/mqdefault.jpg,https://i.ytimg.com/vi/N6asuGrBRpw/hqdefault.jpg,...,27,364.0,False,119182.0,3577.0,1181.0,1,0,0,to follow up on that you know for you to say I...
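
Several titles in the sample above contain HTML entities such as `&quot;` and `&#39;`. Since the filter layers work on titles only, it may help to decode these before embedding or display. The following is a minimal sketch using the standard library's `html.unescape`; the helper name `clean_titles` is made up for this example and is not part of the project's modules.

```python
import html


def clean_titles(titles):
    """Decode HTML entities (e.g. &#39; -> ') in a list of title strings."""
    return [html.unescape(title) for title in titles]


raw_titles = [
    "Patrick Bet-David&#39;s Multi Millionaire Diet",
    "&quot;They Need to Step it up!&quot; - Can India Replace China?",
]
for title in clean_titles(raw_titles):
    print(title)
```

With a DataFrame, the same idea could be applied column-wise, for example `youtube_df['title'].map(html.unescape)`.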


We have adapted the Save the Dislikes model to either `sort_by_sentiment` or `filter_by_sentiment`. Here, we simply use the sorting method to push negative (-1) videos to the end of the queue.
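
To make the difference between the two modes concrete, here is a small sketch on plain lists. It assumes each video already has a predicted label where -1 means negative; the function names and signatures below are illustrative only and do not reflect how the `sentiment` module is implemented.

```python
def sort_by_label(titles, labels, negative=-1):
    """Move negatively labelled titles to the end, preserving relative order.

    Python's sort is stable and False sorts before True, so non-negative
    titles keep their original order ahead of the negative ones.
    """
    paired = sorted(zip(titles, labels), key=lambda pair: pair[1] == negative)
    return [title for title, _ in paired]


def filter_by_label(titles, labels, negative=-1):
    """Drop negatively labelled titles entirely."""
    return [title for title, label in zip(titles, labels) if label != negative]


titles = ["a", "b", "c", "d"]
labels = [1, -1, 1, -1]  # hypothetical model predictions
print(sort_by_label(titles, labels))    # ['a', 'c', 'b', 'd']
print(filter_by_label(titles, labels))  # ['a', 'c']
```

Sorting keeps every result available to the user, while filtering shortens the list; which is preferable depends on how strict the layer should be.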

In [7]:
print("Loading model...")
start_load_time = perf_counter()
# Path to load the model
model_pickle_path = 'save_dislikes/rfclf.joblib.pkl'
rf_clf = joblib.load(model_pickle_path)
print(f"  Time taken to load model: {(perf_counter() - start_load_time):.4f} seconds")
print()

youtube_df = sentiment.sort_by_sentiment(rf_clf, youtube_df)

Loading model...
  Time taken to load model: 2.3307 seconds

Making predictions...
  Time taken to make predictions: 0.0184 seconds

No negative videos found. No sorting occured.


For the BERT cosine similarity filter layer, we run two examples:
1. On a curated list of videos, several of which are obviously (at least to us) political.
2. On the YouTube video DataFrame.
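
The idea behind a cosine-similarity filter can be sketched as follows: embed the filter phrase and each title, then drop titles whose embedding is too close to the phrase. This is a simplified illustration on toy vectors, not the implementation of `embedding.filter_out_embed`; the function names and the `threshold` value are assumptions. In practice the vectors would come from a sentence-embedding model (for example via `model.encode(...)`).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def filter_out_similar(query_vec, title_vecs, titles, threshold=0.5):
    """Keep only titles whose similarity to the query is below the threshold.

    `threshold` is a hypothetical cut-off chosen for this toy example.
    """
    return [
        title
        for title, vec in zip(titles, title_vecs)
        if cosine_similarity(query_vec, vec) < threshold
    ]


# Toy 2-D "embeddings": the first and third titles point toward the query.
query = [1.0, 0.0]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(filter_out_similar(query, vectors, ["a", "b", "c"]))  # ['b']
```

The threshold controls how aggressive the filter is: a lower value removes more titles, at the risk of dropping ones that are only loosely related to the filter phrase.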

In [8]:
# How to strikethrough text taken from this StackOverflow post:
# https://stackoverflow.com/questions/25244454/python-create-strikethrough-strikeout-overstrike-string-type
def strikethrough(text):
    result = ''
    for c in text:
        result = result + '\u0336' + c
    return result

def print_filter_results(before, after):
    print(f"Filter layer removed {len(before) - len(after)} videos. {len(after)} remaining.")
    print("-" * 10)
    for title in before:
        if title not in after:
            print(strikethrough(title))
        else:
            print(title)

In [9]:
filter_sent = "Politics"
list_of_videos = ["Who's Really Supporting Russia",
                  "The Perfect Hillary Clinton Analogy",
                  "The Evolution of Alex Jones",
                  "Patrick Bet David on The Breakfast Club",
                  "The Truth About The 2020 Election",
                  "Kobe Bryant's Last Great Interview"]
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
filtered_list_of_videos = embedding.filter_out_embed(model, filter_sent, list_of_videos)

print_filter_results(list_of_videos, filtered_list_of_videos)

Filter layer removed 4 videos. 2 remaining.
----------
̶W̶h̶o̶'̶s̶ ̶R̶e̶a̶l̶l̶y̶ ̶S̶u̶p̶p̶o̶r̶t̶i̶n̶g̶ ̶R̶u̶s̶s̶i̶a
̶T̶h̶e̶ ̶P̶e̶r̶f̶e̶c̶t̶ ̶H̶i̶l̶l̶a̶r̶y̶ ̶C̶l̶i̶n̶t̶o̶n̶ ̶A̶n̶a̶l̶o̶g̶y
̶T̶h̶e̶ ̶E̶v̶o̶l̶u̶t̶i̶o̶n̶ ̶o̶f̶ ̶A̶l̶e̶x̶ ̶J̶o̶n̶e̶s
Patrick Bet David on The Breakfast Club
̶T̶h̶e̶ ̶T̶r̶u̶t̶h̶ ̶A̶b̶o̶u̶t̶ ̶T̶h̶e̶ ̶2̶0̶2̶0̶ ̶E̶l̶e̶c̶t̶i̶o̶n
Kobe Bryant's Last Great Interview


In [10]:
titles = youtube_df['title'].tolist()
filtered_titles = embedding.filter_out_embed(model, filter_sent, titles)

print_filter_results(titles, filtered_titles)

Filter layer removed 2 videos. 13 remaining.
----------
Vivek Ramaswamy Town Hall | PBD Podcast
Patrick Bet-David&#39;s Multi Millionaire Diet
̶C̶e̶n̶k̶ ̶a̶n̶d̶ ̶P̶a̶t̶r̶i̶c̶k̶ ̶B̶e̶t̶-̶D̶a̶v̶i̶d̶ ̶D̶i̶s̶c̶u̶s̶s̶ ̶T̶r̶u̶m̶p̶&̶#̶3̶9̶;̶s̶ ̶C̶o̶r̶r̶u̶p̶t̶i̶o̶n
&quot;They Won&#39;t Let Biden Run&quot; - Vivek Reacts to DeSantis &amp; Newsom Agreeing to Debate
The Day I Became A NEW MAN - Emotional Story by Patrick Bet-David
Patrick Bet-David Converts to Christianity #christianity #lilarose #valuetainment #patrickbetdavid
Cenk Uygur  | PBD Podcast | Ep. 292
Vivek&#39;s Gameplan to Drain the Swamp Better Than Trump Did
Patrick Bet-David&#39;s Top 5 Books - MUST READS for Entrepreneurs!
How Patrick Bet-David Raises His Children
&quot;They Need to Step it up!&quot; - Can India Replace China?
̶B̶i̶g̶g̶e̶s̶t̶ ̶P̶r̶e̶s̶i̶d̶e̶n̶t̶i̶a̶l̶ ̶U̶P̶S̶E̶T̶S̶ ̶-̶ ̶W̶h̶y̶ ̶T̶h̶e̶ ̶U̶n̶d̶e̶r̶d̶o̶g̶s̶ ̶A̶l̶m̶o̶s̶t̶ ̶A̶l̶w̶a̶y̶s̶ ̶W̶i̶n
&quot;You Can&#39;t Say He&#39;s A ____&quot;: Cenk Blows PBD Podcasts&#39