# YouTube Data Extraction
## Task
Write a script that extracts YouTube data to analyze the #endsars trend that rocked the entire world.
The script should be able to perform the following:
* Filter out channels and playlists.
* Get only videos published this year.
* Include only videos that are between 4 and 20 minutes long.
* Be generic, so that the search query can be changed.
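
In the search request further down, `type='video'` leaves out channels and playlists, and `videoDuration='medium'` restricts results to videos between 4 and 20 minutes long. The "published this year" rule is passed as a hard-coded date. A minimal sketch of how that date could be derived instead is shown below; the helper name `start_of_year_rfc3339` is hypothetical and not part of the original notebook.

```python
from datetime import datetime, timezone


def start_of_year_rfc3339(now=None):
    """Return midnight UTC on 1 January of the current year as an RFC 3339 string."""
    # `now` can be injected for testing; by default the current UTC time is used.
    now = now or datetime.now(timezone.utc)
    start = datetime(now.year, 1, 1, tzinfo=timezone.utc)
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


print(start_of_year_rfc3339(datetime(2020, 10, 20, tzinfo=timezone.utc)))
# prints 2020-01-01T00:00:00Z
```

The returned string has the same shape as the value given to `publishedAfter` in the search request.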

## Output
Store the output in a CSV file whose name has the following format: current_timestamp_youtube_data.

The following video attributes should be a part of the dataset:
* the time the video was published
* the video id
* the title of the video
* description
* the URL of the video thumbnail
* number of views
* number of likes
* number of dislikes
* number of comments

Create an additional column that builds the video URL from the video id.


## Install packages
Install the packages if they are not already present in your environment.

In [None]:
!pip install --upgrade google-api-python-client

!pip install python-decouple

## Import packages

In [None]:
import os
import csv
import json
from decouple import config

from googleapiclient.discovery import build

print("Packages imported successfully.")

## Set API Parameters

In [None]:
api_key = config('API-KEY')

# The API version string is lowercase: 'v3'.
youtube = build('youtube', 'v3', developerKey=api_key)

print("API parameters set successfully.")

## Set Search Request Rate Limit
The YouTube Data API uses a quota to ensure that developers use the service as intended. The rate limit set here caps the number of search requests the script makes in a single run.

Recommended Max Rate Limit: 5
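
To see why a small limit is sensible, the quota used by one run can be estimated. The sketch below is only a rough estimate: the unit costs and the daily allocation are assumptions based on the published quota figures at the time of writing and should be checked against the current documentation. The function `estimate_quota` is hypothetical.

```python
SEARCH_LIST_COST = 100  # assumed quota units per search.list call
VIDEOS_LIST_COST = 1    # assumed quota units per videos.list call
DAILY_QUOTA = 10_000    # assumed default daily allocation


def estimate_quota(search_calls, results_per_page=50, ids_per_videos_call=50):
    """Estimate the quota units used by one run of the notebook."""
    max_videos = search_calls * results_per_page
    # Ceiling division: number of videos.list calls needed to cover all ids.
    videos_calls = -(-max_videos // ids_per_videos_call)
    return search_calls * SEARCH_LIST_COST + videos_calls * VIDEOS_LIST_COST


used = estimate_quota(5)
print("Estimated units per run: {0} of {1}".format(used, DAILY_QUOTA))
# prints Estimated units per run: 505 of 10000
```

Under these assumptions the search requests dominate the cost, which is why the limit applies to them rather than to the videos requests.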

In [None]:
search_rate_limit = 5

print("Search Rate Limit has been set to {0}.".format(search_rate_limit))

## Make Search Request

In [None]:
allItems = []
count = 0

query = '#endsars'
max_results = 50
nextPage_token = None

while count < search_rate_limit:
    searchRequest = youtube.search().list(
        part='snippet',
        type='video',
        q=query,
        videoDuration='medium',
        publishedAfter='2020-01-01T00:00:00Z',
        maxResults=max_results,
        pageToken=nextPage_token
    )

    searchResponse = searchRequest.execute()

    allItems += searchResponse['items']

    nextPage_token = searchResponse.get('nextPageToken')

    count += 1

    if nextPage_token is None:
        break
    
print("Total number of items: {0}".format(len(allItems)))

# TODO: 
# 1. Show only success message. Handle Exceptions
# 2. Place code into function
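
One possible way to address the second TODO item is to move the search loop above into a function. The sketch below keeps the same request parameters; the name `search_videos` and its arguments are hypothetical, and the `youtube` argument is assumed to be the client object created with `build`.

```python
def search_videos(youtube, query, max_requests, published_after, max_results=50):
    """Collect search results page by page, making at most max_requests calls."""
    items = []
    page_token = None

    for _ in range(max_requests):
        response = youtube.search().list(
            part='snippet',
            type='video',
            q=query,
            videoDuration='medium',
            publishedAfter=published_after,
            maxResults=max_results,
            pageToken=page_token,
        ).execute()

        items += response.get('items', [])

        # Stop early when the API reports no further pages.
        page_token = response.get('nextPageToken')
        if page_token is None:
            break

    return items
```

A call such as `search_videos(youtube, query, search_rate_limit, '2020-01-01T00:00:00Z')` would then replace the loop. For the first TODO item, the call could be wrapped in a `try`/`except` that catches `googleapiclient.errors.HttpError`, so that only a short message is printed on failure.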

## Extract videos ids

In [None]:
videosIds = list(map(lambda x:x['id']['videoId'], allItems))

print("{0} video IDs were extracted successfully.".format(len(videosIds)))

## Make Videos Request

In [None]:
allVideoItems = []
count = 0

nextPage_token = None

# Request the video details in batches of 50 ids.
while count < len(videosIds):
    videosRequest = youtube.videos().list(
        part='snippet,statistics',
        id=",".join(videosIds[count:count+50]),
    )

    videosResponse = videosRequest.execute()
    
    allVideoItems += videosResponse['items']
    
    count+=50
    
print("Total number of items: {0}".format(len(allVideoItems)))

# TODO: Show only success message. Handle Exceptions
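
The batching in the loop above can also be written as a small generator, which keeps the slicing arithmetic in one place. This is only a sketch: `chunked` is a hypothetical helper, the ids are made up, and the batch size of 50 is the one the loop above already uses.

```python
def chunked(values, size=50):
    """Yield consecutive slices of `values`, each holding at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


ids = ["id{0}".format(n) for n in range(120)]  # made-up ids for illustration
batches = [",".join(batch) for batch in chunked(ids)]
print([len(batch.split(",")) for batch in batches])
# prints [50, 50, 20]
```

Each joined string has the same form as the value passed to the `id` parameter of the videos request.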

In [None]:
video_id = []
video_title = []
video_desc = []
time_published = []
thumbnail_url = []
view_count = []
like_count = []
dislike_count = []
comments_count = []
video_url = []

for i in allVideoItems:
    video_id.append(i['id'])
    video_title.append(i['snippet']['title'])
    video_desc.append(i['snippet']['description'])
    time_published.append(i['snippet']['publishedAt'])  
    thumbnail_url.append(i['snippet']['thumbnails']['default']['url'])
    
    view_count.append(i['statistics'].get('viewCount'))
    like_count.append(i['statistics'].get('likeCount'))
    # Dislike counts have been private since December 2021, so this is usually None.
    dislike_count.append(i['statistics'].get('dislikeCount'))
    comments_count.append(i['statistics'].get('commentCount'))
    
    video_url.append("https://www.youtube.com/watch?v=" + i['id'])

print("Data extracted successfully.")

# TODO: Handle NoneType being assigned instead of a number or string
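
One way to approach the TODO above is to convert each count as it is read. The API returns counts as strings, and `.get()` returns `None` when a field is absent. The sketch below uses a hypothetical helper `to_int` and made-up sample values; whether a missing count should become `None` or `0` is a choice left to the analysis.

```python
def to_int(value, default=None):
    """Convert an API count (a string, or None when absent) to an int."""
    if value is None:
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        # Unexpected content: fall back to the default instead of failing.
        return default


statistics = {'viewCount': '1024', 'likeCount': '87'}  # made-up sample values

print(to_int(statistics.get('viewCount')))                # prints 1024
print(to_int(statistics.get('dislikeCount')))             # prints None
print(to_int(statistics.get('commentCount'), default=0))  # prints 0
```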

In [None]:
import pandas as pd

data = {
    'videoId': video_id,
    'title': video_title,
    'description': video_desc,
    'timePublished': time_published,
    'thumbnailUrl': thumbnail_url,
    'views': view_count,
    'likes': like_count,
    'dislikes': dislike_count,
    'comments': comments_count,
    'videoUrl': video_url
}

df = pd.DataFrame(data)

df.head()

## Store data to file using current timestamp

In [None]:
import time

def save_to_csv(df):
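    # Build the output directory portably instead of hard-coding backslashes.
    path = os.path.join(os.getcwd(), 'data')
    current_timestamp = time.strftime("%y%m%d_%H%M%S")

    # Create the directory if it does not exist yet.
    os.makedirs(path, exist_ok=True)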

    file_name = current_timestamp+"_youtube_data.csv"
    
    full_path = os.path.join(path, file_name)
    
    df.to_csv(full_path, index=False)
    
    return full_path

# TODO: Handle Error exceptions
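
A minimal sketch of how the TODO above might be handled is shown below. It assumes that failures of interest surface as `OSError` (for example a missing permission or an invalid path); the function name `save_to_csv_safe` and its `directory` parameter are hypothetical. It returns `None` instead of raising, so the caller can decide what to print.

```python
import os
import time


def save_to_csv_safe(df, directory="data"):
    """Write df to a timestamped CSV file; return its path, or None on failure."""
    file_name = time.strftime("%y%m%d_%H%M%S") + "_youtube_data.csv"
    full_path = os.path.join(directory, file_name)

    try:
        os.makedirs(directory, exist_ok=True)
        df.to_csv(full_path, index=False)
    except OSError as error:
        # Report the problem briefly and signal failure to the caller.
        print("Could not save file: {0}".format(error))
        return None

    return full_path
```

The caller would then check the returned value before printing the success message.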

In [None]:
path = save_to_csv(df)
print("File created successfully. File saved to: {0}".format(path))