# YouTube Data Extraction
## Task
Write a script that extracts YouTube data to analyze the #EndSARS trend that rocked the entire world.
The script should be able to perform the following:
* Filter out channels and playlists.
* Get only videos published this year.
* Include only videos that are between 4 and 20 minutes long.
* Accept any search query, so that the script stays generic.
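
The requirements above map fairly directly onto parameters of the `search.list` endpoint. The sketch below shows one way to collect them in a single place. `build_search_params` is a hypothetical helper, not part of the API client, and it assumes that "this year" means January 1 of the current UTC year onward.

```python
from datetime import datetime, timezone


def build_search_params(query, year=None, max_results=50):
    """Return keyword arguments for youtube.search().list() (hypothetical helper)."""
    if year is None:
        # Assumption: "this year" means the current calendar year in UTC.
        year = datetime.now(timezone.utc).year
    return {
        "part": "snippet",
        "type": "video",            # excludes channels and playlists
        "q": query,                 # any search term can be passed in
        "videoDuration": "medium",  # the API's bucket for 4 to 20 minute videos
        "publishedAfter": "{0}-01-01T00:00:00Z".format(year),
        "maxResults": max_results,
    }


params = build_search_params("#endsars", year=2020)
print(params["publishedAfter"])
```

The resulting dictionary could then be unpacked into the request, for example `youtube.search().list(**params, pageToken=token)`.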

## Output
Store the output in a CSV file whose name has the following format: current_timestamp_youtube_data.

The following video attributes should be a part of the dataset:
* the time the video was published
* the video id
* the title of the video
* description
* the URL of the video thumbnail
* number of views
* number of likes
* number of dislikes
* number of comments

Create an additional column that builds the video URL from the video ID.


## Install packages
Install the packages if they are not already present in your environment.

In [None]:
!pip install --upgrade google-api-python-client
!conda install -y pandas

Requirement already up-to-date: google-api-python-client in c:\users\kingsley\miniconda3\envs\py-3.7\lib\site-packages (1.12.8)


## Import packages

In [2]:
import os
import csv
import json

from googleapiclient.discovery import build

print("Packages imported successfully.")

Packages imported successfully.


## Set API Parameters

In [3]:
api_key = os.environ.get('YOUTUBE_V3_API_KEY')
api_version = 'v3'

youtube = build('youtube', api_version, developerKey=api_key)

print("API parameters set successfully.")

API parameters set successfully.
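
`os.environ.get` returns `None` when the variable is missing, and the success message above is printed either way. A minimal sketch of an earlier check follows; `require_env` is a hypothetical helper introduced only for illustration.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            "Environment variable {0} is not set; cannot authenticate.".format(name)
        )
    return value


# Demonstration with an explicit mapping instead of the real environment.
print(require_env("YOUTUBE_V3_API_KEY", {"YOUTUBE_V3_API_KEY": "dummy-key"}))
```

In the notebook, `api_key = require_env('YOUTUBE_V3_API_KEY')` would fail fast before any request is built.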


## Set Search Request Rate Limit
The YouTube Data API uses a quota to ensure that developers use the service as intended. The rate limit below caps the number of search requests this notebook makes in a single run.

Recommended Max Rate Limit: 5

In [4]:
search_rate_limit = 5

print("Search Rate Limit has been set to {0}.".format(search_rate_limit))

Search Rate Limit has been set to 5.
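
As a rough guide to choosing the limit, the sketch below estimates the quota used by one run. The unit costs are assumptions taken from the public quota table as commonly documented (100 units per `search.list` call, 1 unit per `videos.list` call, 10,000 units per day by default) and should be checked against the current documentation.

```python
import math

# Assumed quota costs; verify against the current YouTube Data API quota table.
SEARCH_LIST_COST = 100
VIDEOS_LIST_COST = 1
DAILY_QUOTA = 10000


def estimate_quota(search_calls, results_per_page=50, batch_size=50):
    """Estimate the quota units used by one run of the notebook."""
    max_videos = search_calls * results_per_page
    videos_calls = math.ceil(max_videos / batch_size)
    return search_calls * SEARCH_LIST_COST + videos_calls * VIDEOS_LIST_COST


cost = estimate_quota(search_calls=5)
print("Estimated cost: {0} of {1} units".format(cost, DAILY_QUOTA))
```

Under these assumptions, a limit of 5 costs about 505 units per run, so the search requests dominate the total.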


## Make Search Request

In [5]:
allItems = []
count = 0

query = '#endsars'
max_results = 50
nextPage_token = None

while count < search_rate_limit:
    searchRequest = youtube.search().list(
        part='snippet',
        type='video',
        q=query,
        videoDuration='medium',
        publishedAfter='2020-01-01T00:00:00Z',
        maxResults=max_results,
        pageToken=nextPage_token
    )

    searchResponse = searchRequest.execute()

    allItems += searchResponse['items']

    nextPage_token = searchResponse.get('nextPageToken')

    count += 1

    if nextPage_token is None:
        break
    
print("Total number of items: {0}".format(len(allItems)))

# TODO: Show only success message. Handle Exceptions

Total number of items: 250
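
One possible way to address the exception-handling TODO is to separate the paging loop from the request itself. The sketch below is not the notebook's code: `collect_items` and its `fetch_page` argument are hypothetical, and a dictionary stands in for the API so that the example runs without a key.

```python
def collect_items(fetch_page, max_pages, errors=(Exception,)):
    """Collect 'items' from up to max_pages pages.

    fetch_page is any callable that takes a page token (or None) and
    returns a response dict shaped like a search.list response.
    """
    items = []
    token = None
    for _ in range(max_pages):
        try:
            response = fetch_page(token)
        except errors as exc:
            # Stop early but keep whatever was collected so far.
            print("Request failed, stopping early: {0}".format(exc))
            break
        items.extend(response.get("items", []))
        token = response.get("nextPageToken")
        if token is None:
            break
    return items


# Stand-in for the API: two pages, then no further token.
fake_pages = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3]},
}
print(collect_items(fake_pages.get, max_pages=5))
```

With the real client, `fetch_page` could wrap `youtube.search().list(..., pageToken=token).execute()`, and `errors` could be narrowed to `(googleapiclient.errors.HttpError,)`.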


## Extract Video IDs

In [6]:
videosIds = [item['id']['videoId'] for item in allItems]

print("{0} video IDs were extracted successfully.".format(len(videosIds)))

250 video IDs were extracted successfully.
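
Direct indexing with `['id']['videoId']` raises `KeyError` if an item lacks that key, and repeated IDs would be requested twice in the next step. A more defensive variant, offered only as a sketch with a hypothetical helper name, skips malformed items and removes duplicates while preserving order.

```python
def extract_video_ids(items):
    """Return unique video IDs in first-seen order, skipping malformed items."""
    seen = set()
    ids = []
    for item in items:
        video_id = item.get("id", {}).get("videoId")
        if video_id and video_id not in seen:
            seen.add(video_id)
            ids.append(video_id)
    return ids


sample = [
    {"id": {"kind": "youtube#video", "videoId": "2m8tiFokS78"}},
    {"id": {"kind": "youtube#video", "videoId": "DTFofOXS5PU"}},
    {"id": {"kind": "youtube#video", "videoId": "2m8tiFokS78"}},  # duplicate
    {"id": {"kind": "youtube#channel"}},  # no videoId
]
print(extract_video_ids(sample))
```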


## Make Videos Request

In [7]:
allVideoItems = []
count = 0

# Request details in batches of 50 IDs per videos.list call.
while count < len(videosIds):
    videosRequest = youtube.videos().list(
        part='snippet,statistics',
        id=",".join(videosIds[count:count+50]),
    )

    videosResponse = videosRequest.execute()

    allVideoItems += videosResponse['items']

    count += 50

print("Total number of items: {0}".format(len(allVideoItems)))

# TODO: Show only success message. Handle Exceptions

Total number of items: 250
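
The loop above sends the IDs in groups of 50. The same batching can be written as a small generator, sketched here with placeholder IDs; `chunked` is a hypothetical helper name.

```python
def chunked(values, size=50):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


ids = ["id{0}".format(n) for n in range(120)]  # placeholder IDs
batches = [",".join(batch) for batch in chunked(ids)]
print([len(batch.split(",")) for batch in batches])
```

Each element of `batches` is a comma-separated string suitable for the `id` parameter of one request.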


## Extract Video Attributes

In [11]:
video_id = []
video_title = []
video_desc = []
time_published = []
thumbnail_url = []
view_count = []
like_count = []
dislike_count = []
comments_count = []
video_url = []

for i in allVideoItems:
    video_id.append(i['id'])
    video_title.append(i['snippet']['title'])
    video_desc.append(i['snippet']['description'])
    time_published.append(i['snippet']['publishedAt'])  
    thumbnail_url.append(i['snippet']['thumbnails']['default']['url'])
    
    view_count.append(i['statistics'].get('viewCount'))
    like_count.append(i['statistics'].get('likeCount'))
    dislike_count.append(i['statistics'].get('dislikeCount'))
    comments_count.append(i['statistics'].get('commentCount'))
    
    video_url.append("https://www.youtube.com/watch?v=" + i['id'])

print("Data extracted successfully.")

# TODO: Handle NoneType being assigned instead of a number or string

Data extracted successfully.
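
Regarding the TODO above: `.get()` returns `None` when a key is absent from `statistics`, which can happen, for example, when the uploader disables comments or hides ratings. The API also returns counts as strings. A minimal sketch of a conversion helper follows; `to_int` is hypothetical, and the choice of default is left to the caller.

```python
def to_int(value, default=None):
    """Convert an API count (a string, or None when hidden) to int."""
    if value is None:
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


statistics = {"viewCount": "26605", "likeCount": "2343"}  # commentCount absent
print(to_int(statistics.get("viewCount")))
print(to_int(statistics.get("commentCount"), default=0))
```

Keeping `None` as the default preserves the difference between "zero comments" and "count not available"; passing `default=0` trades that distinction for a purely numeric column.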


## Build the DataFrame

In [12]:
import pandas as pd

data = {
    'videoId': video_id,
    'title': video_title,
    'description': video_desc,
    'timePublished': time_published,
    'thumbnailUrl': thumbnail_url,
    'views': view_count,
    'likes': like_count,
    'dislikes': dislike_count,
    'comments': comments_count,
    'videoUrl': video_url
}

df = pd.DataFrame(data)

df.head()

Unnamed: 0,videoId,title,description,timePublished,thumbnailUrl,views,likes,dislikes,comments,videoUrl
0,2m8tiFokS78,#ENDSARS: The Nigerian footballers protesting ...,#ENDSARS: The Nigerian footballers protesting ...,2020-11-11T07:00:01Z,https://i.ytimg.com/vi/2m8tiFokS78/default.jpg,26605,2343,28,297,https://www.youtube.com/watch?v=2m8tiFokS78
1,DTFofOXS5PU,Why are Nigerians protesting? The #EndSARS mov...,Amid mounting pressure from residents and an o...,2020-11-12T16:00:01Z,https://i.ytimg.com/vi/DTFofOXS5PU/default.jpg,2484,41,5,4,https://www.youtube.com/watch?v=DTFofOXS5PU
2,qHr-wa6sFpw,What is the #EndSars movement in Nigeria? | #T...,The #EndSars hashtag has been trending for wee...,2020-10-27T12:04:00Z,https://i.ytimg.com/vi/qHr-wa6sFpw/default.jpg,2681,48,2,2,https://www.youtube.com/watch?v=qHr-wa6sFpw
3,Il5qL7YbawY,End Sars protests: People 'shot dead' in Lagos...,A number of people taking part in a protest ag...,2020-10-21T12:47:05Z,https://i.ytimg.com/vi/Il5qL7YbawY/default.jpg,199928,2905,173,1177,https://www.youtube.com/watch?v=Il5qL7YbawY
4,0MwlFIPy0OI,Police brutality in Nigeria: what is the #EndS...,"After days of fierce protests, Nigeria's gover...",2020-10-13T19:40:33Z,https://i.ytimg.com/vi/0MwlFIPy0OI/default.jpg,53570,571,14,50,https://www.youtube.com/watch?v=0MwlFIPy0OI


## Store data to file using current timestamp

In [13]:
import time

current_timestamp = time.strftime("%Y%m%d-%H%M%S")
file_name = current_timestamp+"_youtube_data.csv"
df.to_csv(file_name, index=False)

print("File created successfully. File name is {0}".format(file_name))
# TODO: Handle Error exceptions

File created successfully. File name is 20201125-111217_youtube_data.csv
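
For the error-handling TODO, one option is to catch `OSError` around the write and report the failure instead of printing a success message unconditionally. The sketch below uses the standard `csv` module rather than `df.to_csv` so that it runs without pandas; `save_rows` is a hypothetical helper, and the same `try`/`except` could wrap the `df.to_csv` call.

```python
import csv
import os
import tempfile
import time


def save_rows(rows, fieldnames, directory="."):
    """Write rows to <timestamp>_youtube_data.csv; return the path, or None on failure."""
    file_name = time.strftime("%Y%m%d-%H%M%S") + "_youtube_data.csv"
    path = os.path.join(directory, file_name)
    try:
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    except OSError as exc:
        print("Could not write {0}: {1}".format(path, exc))
        return None
    return path


rows = [{"videoId": "2m8tiFokS78", "views": 26605}]
out_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
saved_path = save_rows(rows, ["videoId", "views"], directory=out_dir)
print(saved_path)
```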
