# Minimum Viable Product

## YouTube Custom Web App

### MVP Overview
- **Goal:** Create a web app that returns popular YouTube videos for a given subject, based on criteria such as view and comment counts.
- **Solution Path:** The data for this project comes from [YouTube](https://www.youtube.com), accessed with an API key. The key will be used to query several YouTube APIs that return data about video searches and the videos themselves. I will start with approximately 10 known search terms, put the parsed responses into a PostgreSQL database, and then have a front end query that database on fields such as:
  - subject
  - tags
  - comments
  - views
- **Work Completed:** I chose GCP as the cloud platform and set up an account with GCP's Cloud Shell, OAuth, the YouTube API, and a GCP-based SQL database. I then wrote the code to interact with the YouTube API and loaded sample responses into a dataframe.
- **Recent Findings:** Every API call counts against the GCP account, so searching one term at a time is costly. Combining many search terms into a single query should let me gather much more data per API call, since the API is only queried once.
- **Moving Forward:** I will create the database in GCP, interact with it from my console, put all code in Cloud Shell, and deploy the app with Flask.
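
As a rough illustration of the batching idea under Recent Findings, the sketch below joins several terms into one `q` value using the `|` (OR) operator that the `search.list` documentation describes. The helper name `build_or_query` and the sample terms are hypothetical. How multi-word terms interact with `|`, and whether a combined query returns results as useful as separate per-term queries, is not verified here.

```python
def build_or_query(terms):
    """Join search terms into a single query string using the `|` (OR) operator."""
    cleaned = []
    for term in terms:
        term = term.strip()
        # Skip empty strings and duplicates so the query stays compact
        if term and term not in cleaned:
            cleaned.append(term)
    return "|".join(cleaned)


# Hypothetical sample terms; the result would be passed as q= to search().list()
query = build_or_query(["python", " docker ", "", "python", "kubernetes"])
print(query)  # python|docker|kubernetes
```

The resulting string would replace the single `search_term` passed to `youtube_search`.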

### References

- https://www.youtube.com/watch?v=fklHBWow8vE
- https://googleapis.github.io/google-api-python-client/docs/dyn/youtube_v3.html
- https://github.com/Strata-Scratch/api-youtube/blob/main/importing_df_to_db_final.ipynb
- https://www.freecodecamp.org/news/int-object-is-not-iterable-python-error-solved/
- https://www.folkstalk.com/2022/10/futurewarning-the-frame-append-method-is-deprecated-and-will-be-removed-from-pandas-in-a-future-version-use-pandas-concat-instead-with-code-examples.html

In [1]:
import pandas as pd
import time
import os
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors

In [2]:
# ref: https://developers.google.com/youtube/v3/docs/search/list?apix=true
scopes = ["https://www.googleapis.com/auth/youtube.force-ssl"]
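# Read the API key from the environment instead of hard-coding it in the notebook
# (the variable name YOUTUBE_API_KEY is an assumption; use whatever name you export)
DEVELOPER_KEY = os.environ["YOUTUBE_API_KEY"]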
os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
api_service_name = "youtube"
api_version = "v3"
youtube = googleapiclient.discovery.build(api_service_name, api_version, developerKey=DEVELOPER_KEY)

In [3]:
def youtube_search(search_term, max_results):

    request = youtube.search().list(
                                    part="snippet",
                                    maxResults=max_results,
                                    order="date",
                                    q=search_term
                                    )
    response = request.execute()
    return response['items']
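
`youtube_search` returns only the first page of results. A minimal sketch of following `nextPageToken` to collect more is shown here, assuming the same `search().list()` parameters as above. The function name `search_all_pages` and its `max_total` and `page_size` parameters are hypothetical, and `client` stands for the `youtube` object built earlier. Each page is a separate API call, so it counts against the quota.

```python
def search_all_pages(client, search_term, max_total=100, page_size=50):
    """Collect up to max_total search results by following nextPageToken."""
    items = []
    page_token = None
    while len(items) < max_total:
        params = {
            "part": "snippet",
            # Never ask for more than is still needed
            "maxResults": min(page_size, max_total - len(items)),
            "order": "date",
            "q": search_term,
        }
        if page_token:
            params["pageToken"] = page_token
        response = client.search().list(**params).execute()
        items.extend(response.get("items", []))
        # A missing nextPageToken means there are no further pages
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return items[:max_total]
```

In the notebook this would be called as `search_all_pages(youtube, "cyber security", max_total=120)`.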

In [146]:
# def youtube_comments(videoId):
maxResults = 10
request = youtube.commentThreads().list(
                                        part="snippet,replies",
                                        maxResults=maxResults,
                                        textFormat="plainText",
                                        videoId="_VB39Jo8mAQ"
                                        )
response = request.execute()

# Iterate over the items actually returned instead of range(maxResults):
# a video with fewer than maxResults comments would otherwise raise IndexError.
rows = []
for item in response['items']:
    rows.append({
        "vid_id": item['snippet']['videoId'],
        "comment": item['snippet']['topLevelComment']['snippet']['textDisplay'],
    })

# Build the dataframe once rather than concatenating one row at a time
df = pd.DataFrame(rows, columns=["vid_id", "comment"])

In [147]:
df

| | vid_id | comment |
|---|---|---|
| 0 | _VB39Jo8mAQ | The energetic channel invariably reduce becaus... |
| 1 | _VB39Jo8mAQ | Even with physical money, a dollar is really a... |
| 2 | _VB39Jo8mAQ | I kinda disconnected from the talk for a while... |
| 3 | _VB39Jo8mAQ | It is true that credit card transactions are h... |
| 4 | _VB39Jo8mAQ | Sounds like this isn't an experiment about tea... |
| 5 | _VB39Jo8mAQ | So close but still so far, money isn’t real. |
| 6 | _VB39Jo8mAQ | I worked at a bank from 24-31 years old. I we... |
| 7 | _VB39Jo8mAQ | money has never existed |
| 8 | _VB39Jo8mAQ | But money isn’t real. It’s fiat |
| 9 | _VB39Jo8mAQ | Whatever type of investment you decide to get ... |


In [142]:
def youtube_stats(video_ids, df):
    """Fetch details for a comma-separated string of video IDs and append one row per video to df."""

    request = youtube.videos().list(
                                    part="snippet,contentDetails,statistics",
                                    id=video_ids
                                    )
    response = request.execute()

    # The original `while True:` loop returned on its first pass, so it is removed.
    rows = []
    for vid in response['items']:
        snippet = vid['snippet']
        stats = vid['statistics']

        rows.append({
            # snippet
            "added_date": snippet['publishedAt'],
            "channel_id": snippet['channelId'],
            "channel_title": snippet['channelTitle'],
            "vid_id": vid['id'],
            # strip ampersands from the title, as in the original code
            "title": str(snippet['title']).replace("&", ""),
            "description": snippet['description'],
            # tags are absent when the uploader set none
            "tags": snippet.get('tags', ""),
            "category_id": snippet['categoryId'],
            # contentDetails
            "duration": vid['contentDetails']['duration'],
            # statistics: some counts can be missing (for example hidden likes
            # or disabled comments), so fall back to 0 instead of raising KeyError
            "view_count": stats.get('viewCount', 0),
            "like_count": stats.get('likeCount', 0),
            "favorited_count": stats.get('favoriteCount', 0),
            "comment_count": stats.get('commentCount', 0),
        })

    if not rows:
        return df

    # Append all rows in one concat instead of one concat per video
    return pd.concat([df, pd.DataFrame(rows, columns=df.columns)], ignore_index=True)

In [143]:
# inputs: the search term and the maximum number of results
search_results = youtube_search("cyber security", 3)
video_ids = []

for video in search_results:
    if video['id']['kind'] == "youtube#video":
        video_id = video['id']['videoId']
        video_ids.append(video_id)

# ref: https://www.simplilearn.com/tutorials/python-tutorial/list-to-string-in-python
video_ids = ','.join(video_ids)
print(video_ids)

ik-7GkfRzp0,eA3uXu4X5H0,EkOQ11mOVCs
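
Joining every ID into one string works for a handful of videos. To my understanding, `videos().list` accepts a limited number of IDs per request (commonly cited as 50), so a larger search would need batching. The sketch below shows one way to do that. The helper name `chunk_ids` and the default `batch_size` are assumptions, and it takes the ID list as it is before the `','.join` above turns it into a string.

```python
def chunk_ids(video_ids, batch_size=50):
    """Yield comma-separated strings holding at most batch_size IDs each."""
    for start in range(0, len(video_ids), batch_size):
        yield ",".join(video_ids[start:start + batch_size])


ids = ["ik-7GkfRzp0", "eA3uXu4X5H0", "EkOQ11mOVCs"]
# batch_size=2 only to make the batching visible with three IDs
for batch in chunk_ids(ids, batch_size=2):
    print(batch)
# ik-7GkfRzp0,eA3uXu4X5H0
# EkOQ11mOVCs
```

Each yielded string could then be passed to `youtube_stats` in turn, reusing the same dataframe.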


In [144]:
df = pd.DataFrame(columns=["added_date", "channel_id", "channel_title", "vid_id", "title", "description", "tags", "category_id", "duration", "view_count", "like_count", "favorited_count", "comment_count"]) 

df = youtube_stats(video_ids, df)

In [145]:
df

| | added_date | channel_id | channel_title | vid_id | title | description | tags | category_id | duration | view_count | like_count | favorited_count | comment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-11-21T07:57:25Z | UC2gMLnn7iZbOyMKDzyLbbsg | Bosch Energy and Building Solutions | ik-7GkfRzp0 | Vorstellung der neuen Services: NEXOSPACE Cybe... | Erfahren Sie mehr zu unseren neuen Services au... | | 28 | PT12M25S | 0 | 0 | 0 | 0 |
| 1 | 2022-11-21T07:56:19Z | UCWcsLzgs4L7kEpFn9lkp7PA | ITsecura | eA3uXu4X5H0 | CYBER SECURITY CONSULTING VIDEO | About cyber security consulting | | 27 | PT2M20S | 3 | 0 | 0 | 0 |
| 2 | 2022-11-21T07:30:04Z | UCSFGtckCobuF1Sf_cuhExTg | IBC RAJANIKANT BADA BUSINESS | EkOQ11mOVCs | 5th:- Cyber Security Problem Solving Course \| ... | I'm Rajanikant Mallick from Cuttack Odisha.\nS... | [ibc rajanikant bada business, ibc rajnikant b... | 27 | PT4M18S | 0 | 0 | 0 | 0 |
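
The `duration` column holds ISO 8601 duration strings such as `PT12M25S`, which are awkward to sort or filter on in a database. A minimal sketch of converting them to seconds follows. The function name `duration_to_seconds` is hypothetical, and the pattern only covers day, hour, minute, and second components, which is an assumption about what the API returns for videos.

```python
import re

# Matches strings such as PT12M25S, PT1H2M3S, or P1DT2H
_DURATION_RE = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognized duration: {duration!r}")
    # Components that are absent from the string count as zero
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(duration_to_seconds("PT12M25S"))  # 745
print(duration_to_seconds("PT2M20S"))   # 140
```

With the dataframe above, this could be applied as `df["duration"].map(duration_to_seconds)`.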


In [None]:
# prefetch - search on DB first
# possible NLP - similarity between videos
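
One possible shape for the prefetch idea is sketched below: look in the database first and call the API only on a miss. It uses the standard-library `sqlite3` module as a stand-in for the planned PostgreSQL database. The table name `search_results`, the function `get_results`, and the `fetch_from_api` callable are all hypothetical; in the notebook, `fetch_from_api` would wrap `youtube_search` and extract the video IDs.

```python
import sqlite3


def get_results(conn, search_term, fetch_from_api):
    """Return video IDs for search_term, calling fetch_from_api only on a cache miss."""
    rows = conn.execute(
        "SELECT vid_id FROM search_results WHERE search_term = ? ORDER BY rowid",
        (search_term,),
    ).fetchall()
    if rows:
        # Cache hit: no API call needed
        return [row[0] for row in rows]

    # Cache miss: query the API once and store what it returned
    vid_ids = fetch_from_api(search_term)
    conn.executemany(
        "INSERT INTO search_results (search_term, vid_id) VALUES (?, ?)",
        [(search_term, vid_id) for vid_id in vid_ids],
    )
    conn.commit()
    return vid_ids


# In-memory database for demonstration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_results (search_term TEXT, vid_id TEXT)")

calls = []


def fake_fetch(term):
    """Stand-in for the real API call; records how often it is invoked."""
    calls.append(term)
    return ["ik-7GkfRzp0", "eA3uXu4X5H0"]


print(get_results(conn, "cyber security", fake_fetch))  # fetched and stored
print(get_results(conn, "cyber security", fake_fetch))  # served from the table
print(len(calls))  # 1
```

A real version would also need an expiry policy, since results ordered by date go stale quickly.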