# Experiment: 2. Core Use Cases

## The Problem
As described in `YT Search - Plan.md`, users face a set of issues and limitations when searching YouTube (YT). These issues could potentially be addressed by a basic application that integrates with the YT API. These issues are:
1. Results include ads
   - Seems fixable (ads can be filtered by the field 'isPaidPromotion')
2. Results include recommended videos that are unrelated to the search
   - Seems fixable -> Validate results by checking that the title/description contains the search keywords
3. Results contain only a few videos that are actually related to the search
   - Seems fixable -> Request enough pages to get a minimum number of filtered results
4. It is difficult to find old videos
   - Seems fixable -> Support the search parameters before: and after:
5. It is almost impossible to find videos by date
   - Seems fixable -> Support the search parameters before: and after:
6. Channels can't be blacklisted
   - Seems fixable -> Possible approaches: add blacklisted channels to the query string, or filter them out of the results
7. Keywords can't be blacklisted
   - Seems fixable -> Always add blacklisted keywords to the query
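
Issue 6 mentions filtering blacklisted channels out of the results as an alternative to the query string. A minimal sketch of that approach, assuming an exact, case-insensitive match on `snippet.channelTitle` is good enough (matching on `snippet.channelId` would survive channel renames). The function name `filter_blacklisted_channels` and the sample items are hypothetical:

```python
def filter_blacklisted_channels(result_items, blacklisted_channels):
    """Drop search results whose channel title is blacklisted.

    Assumes each item follows the search resource layout, i.e. item['snippet']['channelTitle'].
    Matching is exact but case-insensitive (an assumption, not a YT API feature).
    """
    blocked = {name.lower() for name in blacklisted_channels}
    return [
        item for item in result_items
        if item["snippet"]["channelTitle"].lower() not in blocked
    ]


# Hypothetical sample items shaped like search response items
sample_items = [
    {"snippet": {"channelTitle": "Pseudiom", "title": "A"}},
    {"snippet": {"channelTitle": "In The Woods", "title": "B"}},
]
kept = filter_blacklisted_channels(sample_items, ["pseudiom"])
print([item["snippet"]["title"] for item in kept])  # ['B']
```

This check looks at who published each result rather than at the result's text, so it complements the `-"..."` terms added to the query.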

## Requirements
- Implement the functionality needed to address the issues listed above

## Next Steps
- [x] Define and document Logic (e.g.: query, params, default values, process results, present results)
- [x] Function for filtering out ads
- [x] Function for filtering out unrelated videos
- [ ] Query -> set min number of filtered results -> will be done by the service later on
- [x] Query Params -> Start and End dates
- [x] Query Defaults -> Blacklist Channels
- [x] Query Defaults -> Blacklist Keywords

## Dependencies
- google-api-python-client
- google-auth-oauthlib
- google-auth-httplib2
- pandas

```python
# This example was adapted from:
# https://medium.com/mcd-unison/youtube-data-api-v3-in-python-tutorial-with-examples-e829a25d2ebd#5999

# IMPORTS
import pandas as pd
# API client library
import googleapiclient.discovery

# API information
api_service_name = "youtube"
api_version = "v3"

# API key (read from a local file; strip the trailing newline)
with open('dev.key') as f:
    DEVELOPER_KEY = f.readline().strip()

# API client
youtube = googleapiclient.discovery.build(
    api_service_name,
    api_version,
    developerKey=DEVELOPER_KEY)
```

```python
import re


# Query Preparations
def add_params_to_query(query, before_date="", after_date="", blacklists=None):
    """Appends before:/after: operators and -"..." exclusions to a search query

        Parameters
        ----------
        query : str
            The base search string (direct user input)

        before_date, after_date : str, optional
            Dates appended as before:<date> / after:<date> when not empty

        blacklists : dict, optional
            Lists of strings (e.g. channels, keywords, videos); each entry is added as -"entry"
    """
    paramed_query = query

    if before_date:
        paramed_query += " before:" + before_date

    if after_date:
        paramed_query += " after:" + after_date

    # Every blacklisted entry becomes an excluded phrase
    if blacklists is not None:
        for words in blacklists.values():
            for word in words:
                paramed_query += ' -"' + word + '"'

    return paramed_query


# Filters

# Further analysis seems to show that ads are not included in YT API results
#def filter_ads(result_items):
#    """Filters result items that are ads (paid promotion)
#
#        Parameters
#        ----------
#        result_items : list
#            result_items should be a list of YT's search response.items[]
#            https://developers.google.com/youtube/v3/docs/search#resource
#    """
#
#    return list(filter(lambda r: (), result_items))

def filter_unrelated(result_items, search_query, title_is_related=True, description_is_related=False):
    """Filters out result items that include none of the search keywords in the title or description

        Parameters
        ----------
        result_items : list
            result_items should be a list of YT's search response.items[]
            https://developers.google.com/youtube/v3/docs/search#resource

        search_query : str
            The exact search string used to query YT search

        title_is_related : bool, optional
            Specifies whether a keyword match in the title keeps the item (default is True)

        description_is_related : bool, optional
            Specifies whether a keyword match in the description keeps the item (default is False)

        Returns
        -------
        list
            The items with at least one keyword in an enabled field (each item at most once)
    """
    # No filtering required
    if not title_is_related and not description_is_related:
        return result_items

    # Extract keywords: set all to lower case
    parsed_query = search_query.lower()

    # Drop excluded phrases and words first (e.g. -"make great again", -word),
    # so that none of their words end up as keywords
    parsed_query = re.sub(r'-"[^"]*"', ' ', parsed_query)
    parsed_query = re.sub(r'(^|\s)-\S+', ' ', parsed_query)

    # Remove specific characters
    chars_to_remove = ['"', '+', '|', '(', ')', '*']
    parsed_query = parsed_query.translate({ord(char): ' ' for char in chars_to_remove})

    # Drop date operators together with their values (e.g. before:2020-01-01)
    parsed_query = re.sub(r'\b(?:before|after):\S*', ' ', parsed_query)

    # Remove the other operator prefixes, keeping the words they apply to
    # ('allintitle:' must go before 'intitle:', otherwise 'all' would remain)
    for operator in ['allintitle:', 'intitle:', 'description:']:
        parsed_query = parsed_query.replace(operator, ' ')

    # Split query into keywords; drop the OR operator only as a whole word
    keywords = [word for word in parsed_query.split() if word != 'or']

    # Filter results based on the final keywords
    filtered_items = []
    for item in result_items:
        snippet = item['snippet']
        title_match = title_is_related and any(k in snippet['title'].lower() for k in keywords)
        description_match = description_is_related and any(k in snippet['description'].lower() for k in keywords)

        # Keep the item once, even if both title and description match
        if title_match or description_match:
            filtered_items.append(item)

    return filtered_items
```

```python
# Base Query String (direct user input)
query_string = "dungeon synth"
print("Query string: " + query_string)

# Query Preparation
# Parameters
before_date = ""
after_date = ""
blacklists = {
    'channels': ['The Dungeon Synth Archives', 'Pseudiom'],
    'keywords': ['make great again'],
    'videos': []
}

query_string = add_params_to_query(query_string, before_date, after_date, blacklists)
print("Parameterized query: " + query_string)
```

```
Query string: dungeon synth
Parameterized query: dungeon synth -"The Dungeon Synth Archives" -"Pseudiom" -"make great again"
```
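
Issues 4 and 5 are handled here by appending `before:`/`after:` to the query text. The `search.list` method of the YouTube Data API v3 also documents `publishedBefore` and `publishedAfter` parameters, which take RFC 3339 timestamps (e.g. `1970-01-01T00:00:00Z`). A sketch of building them, assuming the dates are typed as `YYYY-MM-DD` (the notebook does not fix a format); `to_rfc3339` and `date_params` are hypothetical helpers:

```python
from datetime import datetime, timezone


def to_rfc3339(day):
    """Convert a 'YYYY-MM-DD' string (assumed input format) into an RFC 3339 timestamp at midnight UTC."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def date_params(before_date="", after_date=""):
    """Build publishedBefore/publishedAfter keyword arguments for youtube.search().list().

    Empty strings are skipped, like in add_params_to_query.
    """
    params = {}
    if before_date:
        params["publishedBefore"] = to_rfc3339(before_date)
    if after_date:
        params["publishedAfter"] = to_rfc3339(after_date)
    return params


print(date_params(before_date="2020-01-31", after_date="2019-06-01"))
```

The result could be unpacked into the request, e.g. `youtube.search().list(part="snippet", maxResults=50, q=query_string, **date_params(before_date, after_date))`, instead of relying on the operators inside `q`.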


```python
# Check query length
if len(query_string) > 300:
    print("Warning! Long Query String: " + str(len(query_string)))

# Execute Query

# 'request' is the only variable that needs to change
# depending on the resource and method used in the query
request = youtube.search().list(
    part="snippet",
    maxResults=50,
    q=query_string
)

# Query execution
response = request.execute()

# Extract Results
total_results = response["pageInfo"]["totalResults"]
# The last page has no nextPageToken, so use .get() to avoid a KeyError
next_page_token = response.get("nextPageToken")
items = response["items"]
```
```python
# Print Results
print("Total Results: " + str(total_results))
result_list = []

for i in items:
    # Extract base data (TODO -> How to identify shorts?)
    # Each kind stores its id under a different key and has a different URL
    result_kind = i["id"]["kind"]
    if result_kind == "youtube#video":
        kind = 'video'
        result_id = i["id"]["videoId"]
        link = "https://www.youtube.com/watch?v=" + result_id
    elif result_kind == "youtube#playlist":
        kind = 'playlist'
        result_id = i["id"]["playlistId"]
        link = "https://www.youtube.com/playlist?list=" + result_id
    elif result_kind == "youtube#channel":
        kind = 'channel'
        result_id = i["id"]["channelId"]
        link = "https://www.youtube.com/channel/" + result_id
    else:
        # Unknown kind: no known id key to read
        kind = 'unknown'
        result_id = None
        link = None

    # Extract Snippet
    snippet = i["snippet"]
    channel_id = snippet["channelId"]
    channel_title = snippet["channelTitle"]
    channel_link = "https://www.youtube.com/channel/" + str(channel_id)
    title = snippet["title"]
    description = snippet["description"]
    date = snippet["publishedAt"]

    # Add to List
    result_list.append({
        'kind': kind,
        'id': result_id,
        'channel_id': channel_id,
        'channel_title': channel_title,
        'channel_link': channel_link,
        'title': title,
        'description': description,
        'date': date,
        'link': link
    })

result_df = pd.DataFrame(result_list)
result_df
```

```
Total Results: 264485
```

| | kind | id | channel_id | channel_title | channel_link | title | description | date | link |
|---|---|---|---|---|---|---|---|---|---|
| 0 | video | QdjXgpmsPCs | UCLWEUXsMZfHHZWc_j3Kkb3A | Lòkideath | https://www.youtube.com/channel/UCLWEUXsMZfHHZ... | &#39;make Dungeon synth great again&#39;: a co... | Modern Dungeon synth (21-onwards) is mostly an... | 2022-11-12T10:47:58Z | https://www.youtube.com/watch?v=QdjXgpmsPCs |
| 1 | video | mCg0gc1bdeM | UCwItdPrWFQpreItlMgy5PMw | Sire Ravaillac | https://www.youtube.com/channel/UCwItdPrWFQpre... | 50 Nuances de Dungeon Synth : épisode 1 | Une audacieuse rétrospective du micro-genre le... | 2019-04-25T11:12:43Z | https://www.youtube.com/watch?v=mCg0gc1bdeM |
| 2 | video | TRhNuGY58iU | UC8xWMcEZF966UudRjbqawfw | The Sovereign Hammer | https://www.youtube.com/channel/UC8xWMcEZF966U... | Coniferous Myst - Forest Rehearsal 5.20.19 (20... | Artist: Coniferous Myst Album: Forest Rehearsa... | 2019-06-17T01:23:44Z | https://www.youtube.com/watch?v=TRhNuGY58iU |
| 3 | video | mZWsm7XAmfo | UCLWEUXsMZfHHZWc_j3Kkb3A | Lòkideath | https://www.youtube.com/channel/UCLWEUXsMZfHHZ... | Make Dungeon Synth Great Again Vol 3 | 1-Cathederal Of glistening Hope-Quest Master 2... | 2022-12-05T13:26:02Z | https://www.youtube.com/watch?v=mZWsm7XAmfo |
| 4 | video | 9KLweqtrIcc | UCvKPc71Dd8qnuKZd2nRyH4g | FlynnFlyTaggart | https://www.youtube.com/channel/UCvKPc71Dd8qnu... | Что такое Dungeon Synth? | https://4ga.me/3xZtuTv - релиз Crowfall 28го и... | 2021-07-26T14:29:05Z | https://www.youtube.com/watch?v=9KLweqtrIcc |
| 5 | video | CBtL8ogXZcw | UCV-RyRTJZym-lksT9VG8E-g | In The Woods | https://www.youtube.com/channel/UCV-RyRTJZym-l... | DIM - Steeped Sky, Stained Light (2022) (Full... | Medieval Ambient/Dungeon Synth/Fantasy Synth f... | 2022-09-17T23:07:24Z | https://www.youtube.com/watch?v=CBtL8ogXZcw |
| 6 | video | CD16AW2lvlg | UCwyHy-h3aajm0kvEQ0qWlOg | Black Casket Records | https://www.youtube.com/channel/UCwyHy-h3aajm0... | Old Distant Weep - Detuned Sonata (Ukrainian D... | The new EP album from Old Distant Weep in the ... | 2022-05-31T09:30:12Z | https://www.youtube.com/watch?v=CD16AW2lvlg |
| 7 | video | MPUFQOh7lnw | UCV-RyRTJZym-lksT9VG8E-g | In The Woods | https://www.youtube.com/channel/UCV-RyRTJZym-l... | Tales Under The Oak - Swamp Kingdom (2022) (Fu... | Medieval Ambient/Dungeon Synth from Berlin Ger... | 2022-08-14T18:01:25Z | https://www.youtube.com/watch?v=MPUFQOh7lnw |
| 8 | video | tGUe0TJwadc | UCedCahx-XHprI3arkbuntXQ | Dueling Dragon Adventures | https://www.youtube.com/channel/UCedCahx-XHprI... | Darkrune - Book of the Black Rose (2022) (Dung... | Artist: Darkrune Album: Book of the Black Rose... | 2022-10-12T21:21:14Z | https://www.youtube.com/watch?v=tGUe0TJwadc |
| 9 | video | wWvuUu4DxCk | UCu-4oG7obwqzVSbh5fdOglg | Funeral Boy | https://www.youtube.com/channel/UCu-4oG7obwqzV... | ¿ QUE ES EL DUNGEON SYNTH?/ Funeral Boy | en este video hablare del subgénero dungeon sy... | 2020-12-18T05:35:31Z | https://www.youtube.com/watch?v=wWvuUu4DxCk |
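
The raw titles and descriptions returned by the API contain HTML entities (row 0 has `&#39;` in place of an apostrophe), which also affects keyword matching in `filter_unrelated`. A possible cleanup step using the standard library's `html.unescape`; `clean_snippet_text` and the sample item are hypothetical:

```python
import html


def clean_snippet_text(items):
    """Return copies of search items with HTML entities in title/description decoded.

    The input items are not modified.
    """
    cleaned = []
    for item in items:
        snippet = dict(item["snippet"])  # shallow copy of the snippet
        snippet["title"] = html.unescape(snippet["title"])
        snippet["description"] = html.unescape(snippet["description"])
        cleaned.append({**item, "snippet": snippet})
    return cleaned


# Hypothetical sample item (title taken from row 0, description made up)
sample = [{"snippet": {"title": "&#39;make Dungeon synth great again&#39;", "description": "A &amp; B"}}]
print(clean_snippet_text(sample)[0]["snippet"]["title"])  # 'make Dungeon synth great again'
```

Running this before `filter_unrelated` or before building `result_list` would keep the DataFrame readable and the substring matches accurate.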


# Nice to have: Free Request Methods
This is a brief exploration of how to achieve all of the previous results without an API key (i.e. for free).
There are a few approaches:
1. Using URL Params
2. Registering an app and using the user's authorization (it is unclear whether quota costs/limits still apply to the user; needs further research)
3. Using free intermediate APIs (e.g.: https://yt.lemnoslife.com)

## Using URL Params
There are two main approaches here:
1. URL Formation -> help the user fill in the URL parameters, generate the URL and open it in a new tab
2. Scraper -> Same as above, but instead of opening a new tab, fetch the response, scrape the results from the page and format them (need to check the SLA for this one)
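
For the URL Formation approach, a minimal sketch that builds a YouTube search-results URL from the same inputs. It assumes the web search page takes the query in the `search_query` parameter of `https://www.youtube.com/results`, and it appends `before:`/`after:` the same way `add_params_to_query` does; whether the web search honors those operators is not verified here. `build_search_url` is a hypothetical name:

```python
from urllib.parse import urlencode


def build_search_url(query, before_date="", after_date=""):
    """Form a YouTube search-results URL that can be opened in a new tab.

    Assumption: the results page reads the query from 'search_query'.
    """
    terms = query
    if before_date:
        terms += " before:" + before_date
    if after_date:
        terms += " after:" + after_date
    # urlencode percent-encodes ':' and turns spaces into '+'
    return "https://www.youtube.com/results?" + urlencode({"search_query": terms})


print(build_search_url("dungeon synth", after_date="2019-01-01"))
```

The standard library's `webbrowser.open_new_tab(url)` could then open the generated URL in a new browser tab.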