# Experiment: 2. Core Use Cases

## The Problem
As described in ```YT Search - Plan.md```, users run into a set of issues and limitations when searching on YouTube (YT). A basic application that integrates with the YT API could potentially address them. The issues are:
1. Results include Ads 
   - Seems fixable (ads can be filtered by field 'isPaidPromotion')
2. Results include recommended videos that are unrelated to the search
   - Seems fixable -> Validating results by keywords included in title/description
3. Results include only a few videos that are actually related to the search
    - Seems fixable -> Request enough pages to get a minimum number of filtered results
4. It is difficult to find old videos
    - Seems fixable -> Support Search parameters before: and after:
5. It is almost impossible to find videos by date
    - Seems fixable -> Support Search parameters: before: and after:
6. Can't blacklist channels
    - Seems fixable -> A few ways to solve: adding blacklisted channels into query string, filtering out results
7. Can't blacklist keywords
    - Seems fixable -> Always add blacklisted strings into the query

## Requirements
- Implement the functionality needed to address the issues listed above

## Next Steps
- [x] Define and document Logic (e.g.: query, params, default values, process results, present results)
- [x] Function for filtering out ads
- [x] Function for filtering out unrelated videos
- [ ] Query -> set min number of filtered results
- [x] Query Params -> Start and End dates
- [x] Query Defaults -> Blacklist Channels
- [x] Query Defaults -> Blacklist Keywords

## Dependencies
- google-api-python-client
- google-auth-oauthlib
- google-auth-httplib2

In [93]:
# This is an example that was obtained from here:
# https://medium.com/mcd-unison/youtube-data-api-v3-in-python-tutorial-with-examples-e829a25d2ebd#5999

# IMPORTS
import json
import numpy as np
import pandas as pd
# API client library
import googleapiclient.discovery

# API information
api_service_name = "youtube"
api_version = "v3"

# API key (read from a local file; strip the trailing newline)
with open('dev.key') as f:
    DEVELOPER_KEY = f.readline().strip()

# API client
youtube = googleapiclient.discovery.build(
    api_service_name, 
    api_version, 
    developerKey = DEVELOPER_KEY)

In [95]:
# Query Preparations
def add_params_to_query(query, before_date="", after_date="", blacklists=None):
    if(before_date == "" and after_date == "" and blacklists is None):
        return query
    
    paramed_query = query
    
    if(before_date != ""):
        paramed_query = paramed_query + " before:" + before_date
    
    if(after_date != ""):
        paramed_query = paramed_query + " after:" + after_date
        
    if(blacklists is not None):
        for key in blacklists:
            for word in blacklists[key]:
                paramed_query = paramed_query + ' -"' + word + '"'
    
    return paramed_query


# Filters

# Further analysis seems to show that ads are not included on YT API results
#def filter_ads(result_items):
#    """Filters result items that are ads (paid promotion)
#
#        Parameters
#        ----------
#        result_items : list
#            result_items should be a list of YT's search response.items[]
#            https://developers.google.com/youtube/v3/docs/search#resource
#    """
#
#    return list(filter(lambda r: (), result_items)) 

def filter_unrelated(result_items, search_query, title_is_related=True, description_is_related=False):
    """Filters out result items whose title and/or description contain none of the search keywords

        Parameters
        ----------
        result_items : list
            result_items should be a list of YT's search response.items[]
            https://developers.google.com/youtube/v3/docs/search#resource
        
        search_query : str
            The exact search string used to query YT search
            
        title_is_related : bool, optional
            Specifies whether the title must include at least one of the keywords (default is True)
            
        description_is_related : bool, optional
            Specifies whether the description must include at least one of the keywords (default is False)
    """
    # No filtering required
    if(not title_is_related) and (not description_is_related):
        return result_items
    
    # Extract keywords (case-insensitive)
    parsed_query = search_query.lower()

    # Drop excluded quoted phrases (-"some phrase") before stripping quotes,
    # so that their words are not mistaken for search keywords
    while '-"' in parsed_query:
        start = parsed_query.index('-"')
        end = parsed_query.find('"', start + 2)
        end = len(parsed_query) if end == -1 else end + 1
        parsed_query = parsed_query[:start] + ' ' + parsed_query[end:]

    # Replace operator characters with spaces
    chars_to_remove = ['"', '+', '|', '(', ')', '*']
    parsed_query = parsed_query.translate(
        {ord(char): ' ' for char in chars_to_remove}
    )

    # Split into keywords, skipping excluded words (-word), date filters and
    # the OR operator, and stripping search operator prefixes
    prefixes = ('allintitle:', 'intitle:', 'description:')
    keywords = []
    for word in parsed_query.split():
        if word.startswith(('-', 'before:', 'after:')) or word == 'or':
            continue
        for prefix in prefixes:
            if word.startswith(prefix):
                word = word[len(prefix):]
                break
        if word:
            keywords.append(word)

    # Nothing to match against: leave the results untouched
    if not keywords:
        return result_items

    # Keep items whose title and/or description contain at least one keyword
    filtered_items = []
    for item in result_items:
        title = item['snippet']['title'].lower()
        description = item['snippet']['description'].lower()
        in_title = title_is_related and any(k in title for k in keywords)
        in_description = description_is_related and any(k in description for k in keywords)
        # A single check per item avoids appending it twice when both flags are set
        if in_title or in_description:
            filtered_items.append(item)

    return filtered_items
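
Issue 6 mentions a second way to blacklist channels: filtering them out of the results instead of adding them to the query. The following is a minimal sketch of that approach, not part of the notebook's code. `filter_blacklisted_channels` is a hypothetical helper name, and the sample items are hand-made to mimic the shape of search response items (`snippet.channelTitle`, `snippet.channelId`).

```python
def filter_blacklisted_channels(result_items, blacklisted_channels):
    """Drop items published by a blacklisted channel (matched by title or ID)."""
    # Compare case-insensitively so "the dungeon synth archives" also matches
    blocked = {name.lower() for name in blacklisted_channels}
    kept = []
    for item in result_items:
        snippet = item.get("snippet", {})
        title = snippet.get("channelTitle", "").lower()
        channel_id = snippet.get("channelId", "").lower()
        if title in blocked or channel_id in blocked:
            continue
        kept.append(item)
    return kept


# Hand-made items that mimic search response items (not real API data)
sample_items = [
    {"snippet": {"channelTitle": "The Dungeon Synth Archives",
                 "channelId": "UC_example_1", "title": "A"}},
    {"snippet": {"channelTitle": "Some Other Channel",
                 "channelId": "UC_example_2", "title": "B"}},
]
remaining = filter_blacklisted_channels(sample_items, ["The Dungeon Synth Archives"])
print([item["snippet"]["title"] for item in remaining])
```

Filtering after the request keeps the query string short, at the cost of receiving (and discarding) results from the blacklisted channels.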

In [None]:
# Base Query String (direct user input)
query_string = "dungeon synth"

# Query Preparation
# Parameters
before_date = ""
after_date = ""
blacklists = {
    'channels': ['The Dungeon Synth Archives'],
    'keywords': [],
    'videos': []
    }

# The base query must be passed as the first argument
query_string = add_params_to_query(query_string, before_date, after_date, blacklists)
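
As an alternative to placing `before:` and `after:` inside the query text, the `search.list` method also documents `publishedAfter` and `publishedBefore` request parameters, which take RFC 3339 timestamps. The sketch below only builds the keyword arguments for `youtube.search().list(...)`; `build_search_params` and `to_rfc3339` are hypothetical helpers, and the `YYYY-MM-DD` input format is an assumption.

```python
from datetime import datetime, timezone


def to_rfc3339(date_string):
    """Convert 'YYYY-MM-DD' into an RFC 3339 UTC timestamp at midnight."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_params(query, before_date="", after_date="", max_results=50):
    """Build keyword arguments for youtube.search().list()."""
    params = {"part": "snippet", "maxResults": max_results, "q": query}
    if after_date:
        params["publishedAfter"] = to_rfc3339(after_date)
    if before_date:
        params["publishedBefore"] = to_rfc3339(before_date)
    return params


params = build_search_params("dungeon synth",
                             before_date="2015-01-01",
                             after_date="2010-01-01")
print(params)
# The request would then be created with: youtube.search().list(**params)
```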

In [72]:
# Check query length
if(len(query_string) > 300):
    print("Warning! Long Query String: " + str(len(query_string)))

# Execute Query

# 'request' variable is the only thing you must change
# depending on the resource and method you need to use
# in your query
request = youtube.search().list(
    part="snippet",
    maxResults=50,
    q=query_string
)

# Query execution
response = request.execute()

# Extract Results
total_results = response["pageInfo"]["totalResults"]
# The last page has no "nextPageToken", so avoid a KeyError
next_page_token = response.get("nextPageToken")
items = response["items"]
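
The open item "Query -> set min number of filtered results" could be approached by following `nextPageToken` until enough items survive the filters. This is a rough sketch under assumptions: `collect_filtered_results` is a hypothetical helper, and the pages below are hand-made stand-ins for real responses so the loop can run without an API key.

```python
def collect_filtered_results(fetch_page, keep, min_results=20, max_pages=5):
    """Request pages until `min_results` items pass `keep` or `max_pages` is hit.

    fetch_page(page_token) must return a dict shaped like a search response:
    an "items" list plus an optional "nextPageToken".
    """
    collected = []
    page_token = None
    for _ in range(max_pages):
        response = fetch_page(page_token)
        collected.extend(keep(response.get("items", [])))
        page_token = response.get("nextPageToken")
        # Stop when there are enough results or no further page exists
        if len(collected) >= min_results or page_token is None:
            break
    return collected


# Hand-made pages that mimic search responses (not real API data)
fake_pages = {
    None: {"items": [{"snippet": {"title": "dungeon synth mix"}},
                     {"snippet": {"title": "unrelated"}}],
           "nextPageToken": "p2"},
    "p2": {"items": [{"snippet": {"title": "old dungeon synth tape"}}],
           "nextPageToken": "p3"},
    "p3": {"items": [{"snippet": {"title": "more dungeon synth"}}]},
}


def fake_fetch(page_token):
    return fake_pages[page_token]


def keep_related(items):
    return [item for item in items if "dungeon synth" in item["snippet"]["title"]]


results = collect_filtered_results(fake_fetch, keep_related, min_results=2)
print(len(results))
```

In the notebook, `fetch_page` would wrap `youtube.search().list(...).execute()` with the `pageToken` parameter, and `keep` could be `filter_unrelated` bound to the current query. Each page is a separate request and counts against the API quota, which is why `max_pages` caps the loop.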

In [96]:
# Print Results
print("Total Results: " + str(total_results))
result_list = []

for i in items:
    # Extract base data (TODO -> How to identify shorts?)
    result_kind = i["id"]["kind"]
    if(result_kind == "youtube#video"):
        kind = 'video'
        result_id = i["id"]["videoId"]
        link = "https://www.youtube.com/watch?v=" + str(result_id)
    elif(result_kind == "youtube#playlist"):
        kind = 'playlist'
        result_id = i["id"]["playlistId"]
        link = "https://www.youtube.com/playlist?list=" + str(result_id)
    elif(result_kind == "youtube#channel"):
        kind = 'channel'
        result_id = i["id"]["channelId"]
        link = "https://www.youtube.com/channel/" + str(result_id)
    else:
        # i["id"] is a dict, so there is no positional element to fall back on
        kind = 'unknown'
        result_id = None
        link = None

    # Extract Snippet
    snippet = i["snippet"]
    channel_id = snippet["channelId"]
    channel_title = snippet["channelTitle"]
    channel_link = "https://www.youtube.com/channel/" + str(channel_id)
    title = snippet["title"]
    description = snippet["description"]
    date = snippet["publishedAt"]

    # Add to List
    result_list.append({
        'kind': kind,
        'id': result_id, 
        'channel_id': channel_id, 
        'channel_title': channel_title, 
        'channel_link': channel_link,
        'title': title, 
        'description': description, 
        'date': date, 
        'link': link
    })

result_df = pd.DataFrame(result_list)
result_df

Total Results: 409959


index,kind,id,channel_id,channel_title,channel_link,title,description,date,link
0,video,jRNhOdlTMAs,UCsCCifMby57qV_UmrYGladQ,Awesome Restorations,https://www.youtube.com/channel/UCsCCifMby57qV...,CRAZY Zippo lighter rebuild 💀👊,AWESOME Rebuild of a pink Zippo lighter. I wil...,2022-12-10T17:13:02Z,https://www.youtube.com/watch?v=jRNhOdlTMAs
1,video,ahuLxJmbK88,UCWEmaH_sNjEeKlxofh45Gaw,Adrian Yeager,https://www.youtube.com/channel/UCWEmaH_sNjEeK...,How to refill a zippo lighter in under 45 seconds,,2022-06-23T03:19:34Z,https://www.youtube.com/watch?v=ahuLxJmbK88
2,video,N2IsE4cTP-U,UCOFxJWxZQE0A2rPNeImjrZw,TX Tool Crib,https://www.youtube.com/channel/UCOFxJWxZQE0A2...,Zippo : A Beginner&#39;s Guide,This beginner's guide will help you to setup y...,2020-04-30T10:05:12Z,https://www.youtube.com/watch?v=N2IsE4cTP-U
3,video,R0p0Ad6BtOQ,UCRWAuDoxkeYiDUDYwouctDQ,temp,https://www.youtube.com/channel/UCRWAuDoxkeYiD...,Zippo Slim Lighter! 🔥 lovin the orange 🍊,,2021-11-17T01:37:27Z,https://www.youtube.com/watch?v=R0p0Ad6BtOQ
4,video,x5MR3j0KiBg,UCTcxeBzkk-kZrsaLlyNGLMA,Amazing Workshop,https://www.youtube.com/channel/UCTcxeBzkk-kZr...,1993 Zippo Lighter Restoration NO TIME TO DIE ...,zippo #jamesbond #007 1993 Zippo Lighter Resto...,2021-11-10T18:11:05Z,https://www.youtube.com/watch?v=x5MR3j0KiBg
5,video,f1azCHoc4FE,UCHJuQZuzapBh-CuhRYxIZrg,Insider,https://www.youtube.com/channel/UCHJuQZuzapBh-...,How Zippo Lighters Are Made | The Making Of,Zippo lighters are not only used to spark a fl...,2019-10-30T17:00:14Z,https://www.youtube.com/watch?v=f1azCHoc4FE
6,video,g0wOK89tfwc,UCg2EKkPyFCvkI9LSWQL5f6A,Inspired Disorder - Ray Taylor,https://www.youtube.com/channel/UCg2EKkPyFCvkI...,Zippo Lighter - Beginners Guide - How To Use -...,Beginners Guide - How To Use - Unboxing In thi...,2018-01-11T08:33:38Z,https://www.youtube.com/watch?v=g0wOK89tfwc
7,video,nXd2dMZLOZU,UCi6vrYGqo3DU_8E93LpPWYA,Chrispy Things [EDC],https://www.youtube.com/channel/UCi6vrYGqo3DU_...,I found a vintage Zippo lighter (Zippo tape me...,How to tell how old your Zippo lighter is. #ed...,2022-12-07T16:00:32Z,https://www.youtube.com/watch?v=nXd2dMZLOZU
8,video,RivmPTa3pmo,UCkDbLiXbx6CIRZuyW9sZK1g,Taras Kul,https://www.youtube.com/channel/UCkDbLiXbx6CIR...,5 Zippo Lighters Insert You Didn&#39;t Know Ex...,3 Vintage Hot Dog Cookers You Never Knew Exist...,2020-05-06T20:55:50Z,https://www.youtube.com/watch?v=RivmPTa3pmo
9,video,h_oHwLwb9tg,UCaQRmy4Beww-riYiV44qjhg,jedrek29t,https://www.youtube.com/channel/UCaQRmy4Beww-r...,A Burning ZIPPO Lighter in Epoxy Resin. DIY a ...,I will show you how to easily make a realistic...,2022-11-25T13:45:11Z,https://www.youtube.com/watch?v=h_oHwLwb9tg
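
Some titles in the output above contain HTML entities such as `&#39;`. A minimal sketch of decoding them with the standard library's `html.unescape` before building the DataFrame; the sample strings are copied from the table above, and where exactly to apply the decoding in the notebook is left open.

```python
import html

# Titles as they appear in the output above
raw_titles = [
    "Zippo : A Beginner&#39;s Guide",
    "How Zippo Lighters Are Made | The Making Of",
]

# html.unescape turns entities such as &#39; back into plain characters
clean_titles = [html.unescape(title) for title in raw_titles]
print(clean_titles)
```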


# Nice to have: Free Request Methods
This is a brief exploration of how to achieve the previous results without an API key (free).
There are a few approaches:
1. Using URL params
2. Registering an app and using the user's auth (not sure whether costs/limits still apply to the user; needs further research)
3. Using intermediate free APIs (e.g.: https://yt.lemnoslife.com)

## Using URL Params
There are two main approaches here:
1. URL Formation -> help the user fill in the URL parameters, generate the URL and open it in a new tab
2. Scraper -> Same as above, but instead of opening a new tab, fetch the response, scrape the results from the page and format them (need to check the terms of service for this one)
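
The URL Formation approach could be sketched as follows. It assumes the regular YT results page URL (`https://www.youtube.com/results?search_query=...`) and reuses the same `before:`, `after:` and `-"term"` conventions as the API query above. `build_search_url` is a hypothetical helper, not an existing function.

```python
from urllib.parse import urlencode


def build_search_url(query, before_date="", after_date="", excluded_terms=()):
    """Build a youtube.com search URL from the same inputs used for the API query."""
    parts = [query]
    if before_date:
        parts.append("before:" + before_date)
    if after_date:
        parts.append("after:" + after_date)
    parts.extend('-"' + term + '"' for term in excluded_terms)
    # urlencode takes care of escaping spaces, quotes and colons
    return "https://www.youtube.com/results?" + urlencode({"search_query": " ".join(parts)})


url = build_search_url("dungeon synth",
                       after_date="2010-01-01",
                       excluded_terms=["The Dungeon Synth Archives"])
print(url)
# To open it in a new browser tab: import webbrowser; webbrowser.open_new_tab(url)
```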