## Course 1: Foundation of Information

**Assignment**: Data extraction and analysis from the social media platform YouTube (30 marks)

**Problem statement**

Video is a fast-growing medium through which people communicate, share knowledge, and showcase skills. YouTube is one of the largest video-hosting platforms, with content from many professions, arts, and cultures across the world.

Viewers can express their opinion of a video through likes, dislikes, and comments. These features, provided by the YouTube platform, give an indication of the sentiment towards the video.

This assignment involves programmatically extracting data from YouTube and analyzing it to understand various attributes of a video.

**Steps to be performed**

1. Connect to the YouTube API using a Python client (5 marks)



> 1.a Create a YouTube API key (3 marks)





> 1.b Install the Google API Python client (2 marks)



Refer to the [supporting link](https://developers.google.com/youtube/v3/getting-started) for how to create a YouTube API key.

Reference link: https://developers.google.com/youtube/v3/quickstart/python

In [None]:
#pip install --upgrade google-api-python-client
#pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
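
Once the key from step 1.a exists, it has to reach the notebook somehow. One option, sketched below, is to read it from an environment variable rather than typing it into a cell. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions made for this illustration; they are not part of the assignment or of the Google client library.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    `env_var` is a hypothetical variable name chosen for this sketch.
    `environ` lets a plain dict stand in for os.environ when testing.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Demonstration with a stand-in mapping instead of the real environment
demo_key = load_api_key(environ={"YOUTUBE_API_KEY": "  not-a-real-key  "})
print(demo_key)
```

In the notebook, the result could then be assigned with `developer_key = load_api_key()`.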

2. Search and extract the data



> 2.a Search for videos related to the query string “avatar movie”. For this part, choose one video from the results and perform the data collection steps on that specific video. (3 marks)

> Output expected: ID and snippet, with the following attributes: channel ID, video description, channel title, video title






Reference link:  https://developers.google.com/youtube/v3/docs/search/list

In [None]:
import json
import googleapiclient.discovery

api_service_name = "youtube"
api_version = "v3"
developer_key = "Your API Key"  # replace with the key created in step 1.a

# Build the YouTube Data API client
youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=developer_key)

# Search for videos matching the query string
search_response = youtube.search().list(
    q='avatar movie',
    part='id,snippet',
    type='video',
).execute()

# Use the first search result as the video to collect data on
items = search_response.get('items', [])
if not items:
    raise ValueError('The search returned no videos')
video = items[0]
video_id = video['id']['videoId']
snippet = video['snippet']
channel_id = snippet['channelId']
video_description = snippet['description']
channel_title = snippet['channelTitle']
video_title = snippet['title']

print(f'Video ID: {video_id}')
print(f'Snippet: {json.dumps(snippet, indent=2)}')
print(f'Channel ID: {channel_id}')
print(f'Video Description: {video_description}')
print(f'Channel Title: {channel_title}')
print(f'Video Title: {video_title}')


> 2.b Provide the following statistics for the top 50 videos matching the query string “avatar movie”, sorted by relevance, in the US region (7 marks)

> Output expected: video ID, title, number of views, number of likes, and number of comments, exported to a CSV file






Reference link: https://developers.google.com/youtube/v3/docs/videos/list

In [None]:
import pandas as pd

# Search for the top 50 videos related to the query string "avatar movie" in the US region
search_response = youtube.search().list(
    q='avatar movie',
    part='id,snippet',
    type='video',
    order='relevance',
    maxResults=50,
    regionCode='US',
).execute()

# Get statistics for each video
video_data_list = []
for video in search_response['items']:
    video_id = video['id']['videoId']
    snippet = video['snippet']
    video_response = youtube.videos().list(
        part='statistics',
        id=video_id,
    ).execute()
    # A video can disappear between the two calls, so guard against an empty result
    items = video_response.get('items', [])
    video_statistics = items[0]['statistics'] if items else {}
    # Counts are returned as strings; a missing count (e.g. hidden likes) is stored as ''
    video_data = {
        'Video ID': video_id,
        'Title': snippet['title'],
        'Views': video_statistics.get('viewCount', ''),
        'Likes': video_statistics.get('likeCount', ''),
        'Comments': video_statistics.get('commentCount', ''),
    }
    video_data_list.append(video_data)

# Create DataFrame from video data list
df = pd.DataFrame(video_data_list)

# Write the data to a CSV file (the path is relative, so the Desktop folder must exist
# in the current working directory)
df.to_csv('Desktop/youtube_videos.csv', index=False)

# Output the data
print(video_data_list)
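
The loop above issues one `videos().list` request per video. The `id` parameter of `videos.list` also accepts a comma-separated list of video IDs, so the statistics can be requested in batches instead. The following is a minimal sketch of that idea. The helper names `chunked` and `rows_from_videos_response`, the batch size of 50, and the sample response are assumptions made for illustration; the sample is hand-written and is not real YouTube data.

```python
def chunked(ids, size=50):
    """Yield successive lists of at most `size` IDs (50 is an assumed batch size)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def rows_from_videos_response(response, titles):
    """Turn one videos().list response dict into CSV-ready rows.

    `titles` maps video ID to title, as collected from the search results.
    """
    rows = []
    for item in response.get("items", []):
        stats = item.get("statistics", {})
        rows.append({
            "Video ID": item["id"],
            "Title": titles.get(item["id"], ""),
            "Views": stats.get("viewCount", ""),
            "Likes": stats.get("likeCount", ""),
            "Comments": stats.get("commentCount", ""),
        })
    return rows


# Hand-written sample shaped like a videos().list response (not real data)
sample_response = {
    "items": [
        {"id": "vid1", "statistics": {"viewCount": "100", "likeCount": "7", "commentCount": "3"}},
        {"id": "vid2", "statistics": {"viewCount": "50"}},  # likes and comments missing
    ]
}
sample_titles = {"vid1": "First video", "vid2": "Second video"}

rows = rows_from_videos_response(sample_response, sample_titles)
for row in rows:
    print(row)
```

With a live client, each batch from `chunked(video_ids)` would be passed as `youtube.videos().list(part='statistics', id=','.join(batch)).execute()`, and the rows from every response appended to one list before building the DataFrame.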

3. Analyze the exported data obtained in 2.b and carry out the following tasks (15 marks)



> 3.a Sort the data from 2.b by number of comments in descending order, and take the video IDs and titles of the 10 videos with the most comments. (3 marks)



In [None]:
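# Counts are strings from the API; convert to int so the sort is numeric ('' counts as 0)
top_10_videos = sorted(video_data_list, key=lambda item: int(item['Comments'] or 0), reverse=True)[:10]
for video in top_10_videos:
    print(video['Video ID'] + " " + video['Title'])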
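
A point worth checking when sorting by comment count: the API returns `viewCount`, `likeCount`, and `commentCount` as strings, and strings sort character by character rather than by numeric value. The small sketch below uses made-up rows (not real data) to show the difference between the two orderings.

```python
# Made-up rows for illustration; counts are strings, as in the API response
rows = [
    {"Video ID": "a", "Comments": "9"},
    {"Video ID": "b", "Comments": "120"},
    {"Video ID": "c", "Comments": ""},  # comment count missing
]

# Sorting the raw strings compares text, so "9" ranks above "120"
by_text = sorted(rows, key=lambda r: r["Comments"], reverse=True)

# Converting to int first (treating a missing count as 0) sorts by value
by_number = sorted(rows, key=lambda r: int(r["Comments"] or 0), reverse=True)

print([r["Video ID"] for r in by_text])    # ['a', 'b', 'c']
print([r["Video ID"] for r in by_number])  # ['b', 'a', 'c']
```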


> 3.b Use a suitable method to retrieve the comments of the top 10 videos from 3.a. To do this, write a program that loops through each video ID from 3.a and sends a request with the part parameter set to "snippet" to retrieve basic details about the comments. Execute each request and print the response using the pprint() method.
- Note: pprint() prints the API response in a more human-readable format.
- Reference link: [link](https://developers.google.com/youtube/v3/docs)


> **Output expected**: Use the Python library “pprint” to print the output of the program, which should include the properties etag, items, id, kind, and snippet. The snippet should contain the text display field (textDisplay), which holds the text of the comment.






In [None]:
import googleapiclient.discovery
from pprint import pprint

def get_video_comments(video_id):
    """Return the first page of comment threads for the given video."""
    request = youtube.commentThreads().list(
        part='snippet',
        videoId=video_id)
    response = request.execute()
    return response

# Print the raw API response for each of the top 10 videos
for video in top_10_videos:
    pprint(get_video_comments(video['Video ID']))
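
In a `commentThreads` response, the comment text sits a few levels down: each entry in `items` has a `snippet` containing a `topLevelComment`, whose own `snippet` holds `textDisplay`. The sketch below shows one way to pull those texts out. The helper name `extract_comment_texts` is hypothetical, and the sample response is hand-written to mirror that shape; it is not real data.

```python
def extract_comment_texts(response):
    """Collect the textDisplay of each top-level comment in a commentThreads response."""
    texts = []
    for thread in response.get("items", []):
        top_level = thread.get("snippet", {}).get("topLevelComment", {})
        text = top_level.get("snippet", {}).get("textDisplay")
        if text is not None:
            texts.append(text)
    return texts


# Hand-written sample shaped like a commentThreads().list response (not real data)
sample_response = {
    "kind": "youtube#commentThreadListResponse",
    "etag": "sample-etag",
    "items": [
        {
            "kind": "youtube#commentThread",
            "id": "thread1",
            "snippet": {
                "topLevelComment": {
                    "snippet": {"textDisplay": "Great trailer!"}
                }
            },
        },
        {"kind": "youtube#commentThread", "id": "thread2", "snippet": {}},
    ],
}

comment_texts = extract_comment_texts(sample_response)
print(comment_texts)
```

Threads without a readable top-level comment are skipped here rather than raising an error, which is a design choice for this sketch.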



> 3.c Write a program to export the output of question 3.b in JSON file format and submit the file as part of the assignment (3 marks)



In [None]:
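import json

json_output = "Desktop/json_output_comments.json"

# Collect all responses first so the file holds a single valid JSON array
comments = [get_video_comments(video['Video ID']) for video in top_10_videos]

with open(json_output, 'w') as json_file:
    json.dump(comments, json_file, indent=2)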
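
When several responses go into one file, it matters whether they are written as separate JSON objects or as one list. The sketch below uses in-memory buffers and two hand-written stand-in responses (not real data) to compare the two approaches: objects written one after another do not parse back as a single JSON document, while a single list does.

```python
import io
import json

# Hand-written stand-ins for two API responses (not real data)
responses = [{"kind": "first"}, {"kind": "second"}]

# Writing each object separately produces back-to-back JSON documents
concatenated = io.StringIO()
for response in responses:
    json.dump(response, concatenated)

try:
    json.loads(concatenated.getvalue())
    concatenated_is_valid = True
except json.JSONDecodeError:
    concatenated_is_valid = False

# Writing one list produces a single document that loads back unchanged
single = io.StringIO()
json.dump(responses, single, indent=2)
round_trip = json.loads(single.getvalue())

print(concatenated_is_valid)
print(round_trip == responses)
```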

> 3.d Write a function to get the likes-to-views ratio of the top 10 videos obtained in 3.a, i.e. those with the most comments (3 marks)




In [None]:
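def get_likes_views():
    """Return the likes-to-views ratio for each of the top 10 videos."""
    likes_views_ratios = []
    for video in top_10_videos:
        # Counts are strings; a missing count is stored as '' and treated as 0
        likes = int(video['Likes'] or 0)
        views = int(video['Views'] or 0)
        likes_views_ratios.append({
            'video_id': video['Video ID'],
            # Avoid dividing by zero when the view count is 0 or missing
            'ratio': likes / views if views else None})
    return likes_views_ratios

print(get_likes_views())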