# YouTube Data Extraction
## Task
Write a script that extracts YouTube data to analyze the #EndSARS trend that rocked the entire world.
The script should be able to perform the following:
* Filter out channels and playlists.
* Get only videos published this year.
* Include only videos that are between 4 and 20 minutes long.
* Be generic, so that the search query can be changed.

## Output
Store the output in a CSV file whose filename has the following format: current_timestamp_youtube_data.
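
The filename format above can be produced with the standard library. This is a minimal sketch: the `YYYYMMDD_HHMMSS` timestamp layout and the `.csv` extension are assumptions (the task does not specify them), and `build_output_filename` is a hypothetical helper name.

```python
from datetime import datetime


def build_output_filename(now=None, suffix="youtube_data"):
    """Return a name like '20201231_235959_youtube_data.csv'.

    The timestamp layout is an assumption; the task only asks for
    'current_timestamp_youtube_data'.
    """
    now = now or datetime.now()
    return "{0}_{1}.csv".format(now.strftime("%Y%m%d_%H%M%S"), suffix)


# Fixed datetime used here so the result is reproducible.
print(build_output_filename(datetime(2020, 12, 31, 23, 59, 59)))
```

Calling `build_output_filename()` with no arguments uses the current local time.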

The following video attributes should be a part of the dataset:
* the time video was published
* the video id
* the title of the video
* description
* the URL of the video thumbnail
* number of views
* number of likes
* number of dislikes
* number of comments

Create an additional column that builds the video URL from the video id.


## Install packages

In [1]:
!pip install --upgrade google-api-python-client
!conda install pandas

Requirement already up-to-date: google-api-python-client in c:\users\kingsley\miniconda3\envs\py-3.7\lib\site-packages (1.12.8)
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Packages installed successfully.



## Import packages

In [12]:
import os
import csv
import json
import pandas as pd

from googleapiclient.discovery import build

print("Packages imported successfully.")

Packages imported successfully.


## Set API Parameters

In [13]:
api_key = os.environ.get('YOUTUBE_V3_API_KEY')
api_service_name = 'youtube'
api_version = 'v3'

youtube = build(api_service_name, api_version, developerKey=api_key)

print("API parameters set successfully.")

API parameters set successfully.
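
`os.environ.get` returns `None` when the variable is missing, so a missing key would only surface later as an API error. The sketch below fails early instead; `require_env` is a hypothetical helper, not part of any library.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name)
    if not value:
        raise RuntimeError("Environment variable {0} is not set.".format(name))
    return value


# Intended use in the cell above (left commented out so the sketch
# runs even where the variable is not defined):
# api_key = require_env('YOUTUBE_V3_API_KEY')
```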


## Set Search Request Rate Limit
The YouTube Data API uses a quota to ensure that developers use the service as intended. The rate limit set here caps the number of search requests the script makes in one run, where each request returns one page of results.

Recommended Max Rate Limit: 5

In [14]:
search_rate_limit = 4

print("Search Rate Limit has been set to {0}.".format(search_rate_limit))

Search Rate Limit has been set to 4.
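
To see how the rate limit relates to the quota, the following sketch estimates the units one run consumes. The per-call costs and the daily allocation are assumptions taken from the YouTube Data API quota documentation and may change; `estimate_quota` is a hypothetical helper.

```python
import math

# Assumed costs; check the current quota documentation before relying on them.
SEARCH_LIST_COST = 100       # units per search.list call
VIDEOS_LIST_COST = 1         # units per videos.list call
DEFAULT_DAILY_QUOTA = 10000  # default units per day


def estimate_quota(search_calls, results_per_page=50, ids_per_videos_call=50):
    """Estimate quota units for a run: searches plus follow-up video lookups."""
    max_videos = search_calls * results_per_page
    videos_calls = math.ceil(max_videos / ids_per_videos_call)
    return search_calls * SEARCH_LIST_COST + videos_calls * VIDEOS_LIST_COST


units = estimate_quota(4)
print("{0} of {1} daily units".format(units, DEFAULT_DAILY_QUOTA))
```

Under these assumptions, search requests dominate the cost, which is why the limit is applied to them rather than to the video lookups.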


## Make Search Request

In [15]:
allItems = []
count = 0

max_results = 50
nextPage_token = None

# Each iteration fetches one page of results (up to max_results items).
while count < search_rate_limit:
    searchRequest = youtube.search().list(
        part='snippet',
        type='video',  # exclude channels and playlists
        q='#endsars',
        videoDuration='medium',  # videos between 4 and 20 minutes long
        publishedAfter='2020-01-01T00:00:00Z',
        maxResults=max_results,
        pageToken=nextPage_token
    )

    searchResponse = searchRequest.execute()

    allItems += searchResponse['items']

    nextPage_token = searchResponse.get('nextPageToken')

    count += 1

    # Stop early when there are no more pages.
    if nextPage_token is None:
        break

print("Total number of items: {0}".format(len(allItems)))

# TODO: Show only success message. Handle Exceptions

Total number of items: 200
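
The search cell hardcodes the query and the start date, while the task asks for a generic script that returns videos from the current year. One possible way to separate the paging logic from the API call is sketched below. `start_of_current_year`, `collect_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that returns made-up pages so the loop can run without network access.

```python
from datetime import datetime, timezone


def start_of_current_year(now=None):
    """Return 1 January of the current year (UTC) as an RFC 3339 string."""
    now = now or datetime.now(timezone.utc)
    return "{0}-01-01T00:00:00Z".format(now.year)


def collect_pages(fetch_page, max_pages):
    """Call fetch_page(page_token) up to max_pages times and merge the items.

    fetch_page must return a dict shaped like a search.list response,
    with an 'items' list and an optional 'nextPageToken'.
    """
    items = []
    page_token = None
    for _ in range(max_pages):
        response = fetch_page(page_token)
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return items


# Stand-in for the real API call, used only to exercise the loop.
def fake_fetch(page_token):
    pages = {
        None: {"items": [1, 2], "nextPageToken": "p2"},
        "p2": {"items": [3]},
    }
    return pages[page_token]


print(collect_pages(fake_fetch, max_pages=4))
```

With the real client, `fetch_page` could wrap `youtube.search().list(..., q=search_query, publishedAfter=start_of_current_year(), pageToken=token).execute()`, where `search_query` is a variable holding the search term. Wrapping that call in a `try` block that catches `googleapiclient.errors.HttpError` would be one way to address the TODO about exceptions.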


## Extract video IDs

In [41]:
videosIds = [item['id']['videoId'] for item in allItems]

print("{0} video IDs were extracted successfully.".format(len(videosIds)))

200 video IDs were extracted successfully.


## Make Videos Request

In [44]:
allVideoItems = []

# Request video details in batches of 50 IDs.
for start in range(0, len(videosIds), 50):
    videosRequest = youtube.videos().list(
        part='snippet,statistics',
        id=','.join(videosIds[start:start + 50]),
    )

    videosResponse = videosRequest.execute()

    allVideoItems += videosResponse['items']

print("Total number of items: {0}".format(len(allVideoItems)))

# TODO: Show only success message. Handle Exceptions

Total number of items: 200
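
The videos request sends the IDs in groups of 50. The batching step on its own can be sketched as follows; `chunked` is a hypothetical helper and the IDs are placeholders, not real video IDs.

```python
def chunked(values, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


# Placeholder IDs, joined into the comma-separated form passed as `id`.
ids = ["id{0}".format(n) for n in range(120)]
batches = [",".join(batch) for batch in chunked(ids, 50)]
print([len(batch.split(",")) for batch in batches])
```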


In [47]:
video_id = []
video_title = []
video_desc = []
time_published = []
thumbnail_url = []
views_count = []
likes_count = []
dislike_count = []
comments_count = []

for i in allVideoItems:
    snippet = i['snippet']
    # Some counts can be missing, for example when likes or comments are hidden.
    statistics = i.get('statistics', {})

    video_id.append(i['id'])
    video_title.append(snippet['title'])
    video_desc.append(snippet['description'])
    time_published.append(snippet['publishedAt'])

    # Thumbnails are keyed by size ('default', 'medium', 'high'); each entry has a 'url'.
    thumbnail_url.append(snippet['thumbnails']['default']['url'])
    views_count.append(statistics.get('viewCount'))
    likes_count.append(statistics.get('likeCount'))
    dislike_count.append(statistics.get('dislikeCount'))
    comments_count.append(statistics.get('commentCount'))

len(video_id)

200
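
The `thumbnails` object in a video's snippet is keyed by size name (such as `default`, `medium` and `high`), and each entry carries its own `url`, so the URL has to be read from one of those entries. Statistics fields can also be absent for some videos. The sketch below flattens one item into a row with the attributes listed under Output, adds the video URL column, and writes the row as CSV. `pick_thumbnail_url` and `video_to_row` are hypothetical helpers, the column names are assumptions, and `sample_item` is made-up data in the shape of a `videos.list` result.

```python
import csv
import io


def pick_thumbnail_url(thumbnails, preferred=("high", "medium", "default")):
    """Return the URL of the first available thumbnail size, or None."""
    for size in preferred:
        if size in thumbnails:
            return thumbnails[size].get("url")
    return None


def video_to_row(item):
    """Flatten one videos.list item into a dict of the required attributes."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "time_published": snippet.get("publishedAt"),
        "video_id": item["id"],
        "title": snippet.get("title"),
        "description": snippet.get("description"),
        "thumbnail_url": pick_thumbnail_url(snippet.get("thumbnails", {})),
        "views": stats.get("viewCount"),
        "likes": stats.get("likeCount"),
        "dislikes": stats.get("dislikeCount"),
        "comments": stats.get("commentCount"),
        "video_url": "https://www.youtube.com/watch?v={0}".format(item["id"]),
    }


# Made-up item in the shape of a videos.list result (not real data).
sample_item = {
    "id": "abc123",
    "snippet": {
        "publishedAt": "2020-10-20T12:00:00Z",
        "title": "Sample title",
        "description": "Sample description",
        "thumbnails": {"default": {"url": "https://example.com/default.jpg"}},
    },
    "statistics": {"viewCount": "10", "commentCount": "2"},
}

row = video_to_row(sample_item)

# Written to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Missing statistics come back as `None` and are written as empty cells, which keeps every row the same width. The same list of rows could be passed to `pd.DataFrame` if pandas is preferred for the final export.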