# YouTube Custom Web App

### 12-8-2022
- **Goal**: Build a web app that returns popular YouTube videos on a chosen subject, based on criteria such as view and comment counts. The subject chosen is Cyber Security.

- **Work Completed**: I chose GCP as the cloud platform and set up an account with GCP's Cloud Shell, OAuth, the YouTube API, and a GCP-hosted PostgreSQL database. I wrote the code to query the YouTube API, collected sample responses in a dataframe, and wrote them to a CSV file because of YouTube API request limits. The notebook "database-operations.ipynb" loads the CSV data into the database hosted in GCP. I then built a web app in Flask and deployed it via GCP. The deployed web app is [here](https://youtube-b3foqn367a-uw.a.run.app/).

- **Findings**: Pulling data from YouTube was not too difficult, but setting up GCP and making sure it worked correctly was. Getting all the services to talk to each other took the longest: even though the data and code needed to render the app were correct, about five GCP services have to be configured correctly for the data to come through seamlessly.

### References

- https://www.youtube.com/watch?v=fklHBWow8vE
- https://googleapis.github.io/google-api-python-client/docs/dyn/youtube_v3.html
- https://github.com/Strata-Scratch/api-youtube/blob/main/importing_df_to_db_final.ipynb
- https://github.com/miguelgrinberg/flask-tables
- https://www.simplilearn.com/tutorials/python-tutorial/list-to-string-in-python
- https://www.youtube.com/watch?v=IsuhCAptNbg
- https://blog.miguelgrinberg.com/post/beautiful-interactive-tables-for-your-flask-templates
- https://stackoverflow.com/questions/53773578/flask-sqlalchemy-cant-connect-to-google-cloud-postgresql-database-with-unix-soc
- https://www.freecodecamp.org/news/int-object-is-not-iterable-python-error-solved/#:~:text=If%20you%20are%20running%20your,%2C%20dictionaries%2C%20and%20so%20on.
- https://stackoverflow.com/questions/60412473/using-the-youtube-api-im-getting-invalid-filter-parameter-error-what-is-wron
- https://www.folkstalk.com/2022/10/futurewarning-the-frame-append-method-is-deprecated-and-will-be-removed-from-pandas-in-a-future-version-use-pandas-concat-instead-with-code-examples.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

In [2]:
import pandas as pd
import time
import os
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors
import psycopg2 as ps

In [3]:
# ref: https://developers.google.com/youtube/v3/docs/search/list?apix=true
scopes = ["https://www.googleapis.com/auth/youtube.force-ssl"]

DEVELOPER_KEY = "<APIKEY>"  # placeholder: replace with your own API key

os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
api_service_name = "youtube"
api_version = "v3"
youtube = googleapiclient.discovery.build(api_service_name, api_version, developerKey=DEVELOPER_KEY)

In [4]:
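def youtube_search(search_term, max_results):
    """Return the newest search results for search_term as a list of items."""
    # search.list documents maxResults as 0-50, so larger requests are capped here
    request = youtube.search().list(
        part="snippet",
        maxResults=min(max_results, 50),
        order="date",
        q=search_term
    )
    response = request.execute()

    return response['items']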
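
A single `search.list` call returns one page of results. To collect more per term, the response's `nextPageToken` can be passed back as `pageToken` on the next call. The sketch below shows only that loop; `paginate`, `fake_fetch`, and the fake pages are hypothetical names used for illustration, and the API call is replaced by a stand-in so the example runs without a key.

```python
from typing import Callable, Dict, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], Dict], max_items: int) -> Iterator[Dict]:
    """Yield up to max_items items, following nextPageToken between pages."""
    token = None
    yielded = 0
    while yielded < max_items:
        page = fetch_page(token)
        for item in page.get("items", []):
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        token = page.get("nextPageToken")
        if not token:
            # no further pages available
            return


# Stand-in for the API (assumption): three fake pages keyed by page token.
_FAKE_PAGES = {
    None: {"items": [{"n": 1}, {"n": 2}], "nextPageToken": "p2"},
    "p2": {"items": [{"n": 3}, {"n": 4}], "nextPageToken": "p3"},
    "p3": {"items": [{"n": 5}]},
}


def fake_fetch(token):
    return _FAKE_PAGES[token]


first_three = list(paginate(fake_fetch, 3))
everything = list(paginate(fake_fetch, 100))
```

With the real client, `fetch_page` would wrap `youtube.search().list(..., pageToken=token).execute()`. Each extra page costs additional quota, which matters given the request limits noted above.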

In [5]:
def youtube_comments(videoId):
    """Return a dataframe of up to 100 top-level comments for one video."""
    maxResults = 100
    request = youtube.commentThreads().list(
        part="snippet,replies",
        maxResults=maxResults,
        textFormat="plainText",
        videoId=videoId
    )

    columns = ["vid_id", "comment"]

    try:
        response = request.execute()
    except googleapiclient.errors.HttpError:
        # the request failed (for example, comments are disabled): return an empty frame
        return pd.DataFrame(columns=columns)

    rows = []
    for item in response.get('items', []):
        try:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
        except KeyError:
            comment = "unknown or disabled comments"
        rows.append({"vid_id": item['snippet']['videoId'], "comment": comment})

    return pd.DataFrame(rows, columns=columns)

In [6]:
def youtube_stats(video_ids, df):
    """Append one row per video to df and return the result.

    video_ids is a comma-separated string of video ids.
    """
    request = youtube.videos().list(
        part="snippet,contentDetails,statistics",
        id=video_ids
    )
    response = request.execute()

    rows = []
    for vid in response['items']:
        snippet = vid['snippet']
        # statistics fields can be missing (e.g. hidden likes), so default them to 0
        stats = vid.get('statistics', {})

        rows.append({
            "added_date": snippet['publishedAt'],
            "channel_id": snippet['channelId'],
            "channel_title": snippet['channelTitle'],
            "vid_id": vid['id'],
            "title": str(snippet['title']).replace("&", ""),
            "description": snippet['description'],
            "tags": snippet.get('tags', ""),
            "category_id": snippet['categoryId'],
            "duration": vid['contentDetails']['duration'],
            "view_count": stats.get('viewCount', 0),
            "like_count": stats.get('likeCount', 0),
            "favorited_count": stats.get('favoriteCount', 0),
            "comment_count": stats.get('commentCount', 0),
        })

    if rows:
        df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

    return df
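
The `duration` field arrives as an ISO 8601 string such as `PT1M37S`, and the counts arrive as strings, so neither sorts numerically as is. The following is a minimal sketch of converting both before ranking videos; `duration_to_seconds` and `to_int` are hypothetical helpers, not part of the original notebook, and the pattern only covers days, hours, minutes, and seconds.

```python
import re

# Matches durations such as "PT1M37S", "PT2M", or "P1DT2H".
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration: str) -> int:
    """Convert an ISO 8601 duration string into a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


def to_int(value, default=0):
    """Convert a count string such as "912" to int, falling back to default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default
```

Applied to the dataframe, this could look like `df["duration"].map(duration_to_seconds)` and `df["view_count"].map(to_int)`.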

In [7]:
terms = ['advanced persistent threat', 'cyber security', 'APT cyber', 
        'hacking', 'ransomware', 'state sponsored hackers', 
        'anonymous hackers', 'crimeware', 'malware', 
        'offensive security', 'phreaking', 'kevin mitnick', 
        'ive been hacked', 'russian hackers', 'chinese hackers', 
        'north korean hackers', 'iranian hackers', 'american hackers']

# term_df = pd.read_csv("https://raw.githubusercontent.com/StrangerealIntel/EternalLiberty/main/EternalLiberty.csv")
# terms = term_df['Threat Actor Official Name'].to_list()

In [8]:
video_ids = []

for t in terms:
    search_results = youtube_search(t, 1000)
    for video in search_results:
        if video['id']['kind'] == "youtube#video":
            video_id = video['id']['videoId']
            video_ids.append(video_id)
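
Several of the search terms overlap, so the same video can appear under more than one term and end up in `video_ids` twice. A small sketch of removing duplicates while keeping the first-seen order follows; `dedupe_preserving_order` is a hypothetical helper name and the ids are made up.

```python
def dedupe_preserving_order(ids):
    """Drop repeated ids, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(ids))


sample_ids = ["a1", "b2", "a1", "c3", "b2"]
unique_ids = dedupe_preserving_order(sample_ids)
```

Running this on `video_ids` before requesting statistics avoids spending API quota on the same video more than once.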

In [9]:
# videos.list caps the number of ids per request (max 50), so the ids are sent in chunks
chunk_size = 9

columns = ["added_date", "channel_id", "channel_title", "vid_id", "title", "description", "tags", "category_id", "duration", "view_count", "like_count", "favorited_count", "comment_count"]
df = pd.DataFrame(columns=columns)

# step through every chunk, including the final partial one
for slice_start in range(0, len(video_ids), chunk_size):

    video_id_slice = ','.join(video_ids[slice_start:slice_start + chunk_size])
    new_df = youtube_stats(video_id_slice, pd.DataFrame(columns=columns))
    df = pd.concat([df, new_df], ignore_index=True)

In [10]:
df1 = pd.DataFrame(columns=["vid_id", "comment"])
for vid_id in df['vid_id']:
    df1 = pd.concat([df1, youtube_comments(vid_id)], ignore_index=True)

combined_df = df.set_index('vid_id').join(df1.set_index('vid_id'))

In [11]:
combined_df

vid_id,added_date,channel_id,channel_title,title,description,tags,category_id,duration,view_count,like_count,favorited_count,comment_count,comment
-0fFmGoOL7o,2021-05-22T06:47:53Z,UC4mk8pXgOdg_f98_Qmd51lA,Tio Shadow,Status para whatsapp-Crimeware,,,22,PT1M37S,19,4,0,0,
-ACVFM9yNR0,2022-08-04T07:46:37Z,UCt2LlN_IGp3QH07WLVaw6NA,ESET Česká republika,ESET Threat Report T1 2022,Report začíná již tradičně krátkým shrnutím a ...,[ESET],28,PT4M56S,73,0,0,0,
-AyM2UDDoYE,2022-04-10T18:40:38Z,UCMf7wlcoGWaXQHY0o8ahSEQ,DUCTUS EDGE,Cyber attack attempt on Power Grid of India th...,Cyber attacks in India are concern and cyber s...,,22,PT1M,912,34,0,0,
-CWUqrJAGaI,2022-10-10T05:38:50Z,UC5r3Ff-EwRezCtfT3GQgoYg,i am poor,better then Chinese hackers sparys 🤯🫰#shorts #...,,,20,PT53S,3,3,0,0,
-EF3U9eH4MY,2022-09-01T14:06:03Z,UCVkqN-3u8Aa7POrnxFK5e1A,myAppsOnline,Phishing Exploit Hacks LinkedIn 2Factor Authen...,"May 5, 2018 Kevin Mitnick shows how the exploi...","[#KnowBe4, #phishing, #ransomware, #cybersecur...",22,PT6M21S,40,3,0,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
zsQZkX1TKqI,2022-11-24T07:30:58Z,UChxIPVfAkpCIulPXB-NET7w,Greater Iran,Iranian Hackers (Moses Staff) Attack Israel,ضربه امنیتی بزرگ به اسرائیل؛ ویدیویی که عصای م...,"[قدس اشغالی, عصای موسی, Moses Staff, Hackers, ...",25,PT2M,1928,173,0,110,خداروشکر. \nبه امید اتفاقات بیشتر.در سرزمین اش...
zsQZkX1TKqI,2022-11-24T07:30:58Z,UChxIPVfAkpCIulPXB-NET7w,Greater Iran,Iranian Hackers (Moses Staff) Attack Israel,ضربه امنیتی بزرگ به اسرائیل؛ ویدیویی که عصای م...,"[قدس اشغالی, عصای موسی, Moses Staff, Hackers, ...",25,PT2M,1928,173,0,110,درود
zsQZkX1TKqI,2022-11-24T07:30:58Z,UChxIPVfAkpCIulPXB-NET7w,Greater Iran,Iranian Hackers (Moses Staff) Attack Israel,ضربه امنیتی بزرگ به اسرائیل؛ ویدیویی که عصای م...,"[قدس اشغالی, عصای موسی, Moses Staff, Hackers, ...",25,PT2M,1928,173,0,110,سلام دمتون گرم 💕💕👌👌💪💪
zsQZkX1TKqI,2022-11-24T07:30:58Z,UChxIPVfAkpCIulPXB-NET7w,Greater Iran,Iranian Hackers (Moses Staff) Attack Israel,ضربه امنیتی بزرگ به اسرائیل؛ ویدیویی که عصای م...,"[قدس اشغالی, عصای موسی, Moses Staff, Hackers, ...",25,PT2M,1928,173,0,110,خسته نباشید


In [12]:
combined_df.to_csv("youtube-data.csv")
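
The CSV written here is later loaded into the GCP PostgreSQL database by "database-operations.ipynb", and the Flask app reads from that database. When the app runs on GCP, one common way to reach Cloud SQL is a Unix socket under `/cloudsql/`, as discussed in the Stack Overflow reference above. The sketch below only assembles a SQLAlchemy-style URL for that setup; `cloud_sql_socket_url` is a hypothetical helper, and the user, password, database, and instance names are placeholders, not the project's real values.

```python
from urllib.parse import quote_plus


def cloud_sql_socket_url(user, password, database, instance_connection_name,
                         socket_dir="/cloudsql"):
    """Build a postgresql+psycopg2 URL that connects through a Unix socket."""
    # quote_plus escapes characters such as "@" that would break the URL
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@/{database}?host={socket_dir}/{instance_connection_name}"
    )


# Placeholder values (assumptions), shown only to illustrate the URL shape.
example_url = cloud_sql_socket_url(
    "app", "p@ss", "youtube", "my-project:us-west1:my-instance"
)
```

Such a URL would typically be assigned to the Flask app's `SQLALCHEMY_DATABASE_URI` setting, with the Cloud SQL instance attached to the deployed service so the socket exists.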