**Data extraction and analysis from social media platform Youtube**

**Problem statement**

Videos are a fast growing medium where people communicate, share knowledge, showcase skills etc. YouTube is one of the biggest platforms which hosts videos. The YouTube platform hosts content from many different professions/arts/ cultures across the world.

People can express their opinion about the video in the form of likes, dislikes, comments which are features provided by the YouTube platform which provides the information on the sentiment about the video.

The assignment involves the steps on programmatic data extraction from YouTube on which analysis can be conducted to understand various attributes related to a video.

**Steps to be performed**

1. Connect to the Youtube API using a Python client 



> 1.a Create a YouTube API key






In [1]:
# Steps to Obtain a YouTube API Key:
# 1. Navigate to the website: https://console.cloud.google.com/
# 2. Create a new project and begin the setup for the subsequent steps.
# 3. From the left dropdown in the Navigation menu, choose "APIs & Services" >> "Library."
# 4. Search for "YouTube Data API v3" and activate it.
# 5. Proceed to "APIs & Services" > "Credentials" and generate your API Key.


# api_key = "AIzaSyBXoUg_omyWBIVcpgTXaESBhCqzQEQKm0E"



> 1.b Install the Google API python client  



In [2]:
pip install --upgrade google-api-python-client

Collecting google-api-python-client
  Downloading google_api_python_client-2.112.0-py2.py3-none-any.whl.metadata (6.6 kB)
Downloading google_api_python_client-2.112.0-py2.py3-none-any.whl (13.0 MB)
   ---------------------------------------- 0.0/13.0 MB ? eta -:--:--
   ---------------------------------------- 0.0/13.0 MB ? eta -:--:--
   ---------------------------------------- 0.0/13.0 MB 991.0 kB/s eta 0:00:14
   ---------------------------------------- 0.0/13.0 MB 991.0 kB/s eta 0:00:14
   ---------------------------------------- 0.0/13.0 MB 991.0 kB/s eta 0:00:14
   ---------------------------------------- 0.0/13.0 MB 991.0 kB/s eta 0:00:14
   ---------------------------------------- 0.0/13.0 MB 991.0 kB/s eta 0:00:14
    --------------------------------------- 0.2/13.0 MB 518.9 kB/s eta 0:00:25
    --------------------------------------- 0.2/13.0 MB 567.2 kB/s eta 0:00:23
   - -------------------------------------- 0.3/13.0 MB 840.2 kB/s eta 0:00:16
   - -------------------------


[notice] A new release of pip is available: 23.3.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip


refer to the [supporting](https://developers.google.com/youtube/v3/getting-started) link on how to create YouTube API Key

Reference link : https://developers.google.com/youtube/v3/quickstart/python

2. Search and extract the data



> 2.a Search videos related to the query string  “avatar movie”
(For this part, choose/search one video of your choice and perform data collection steps on that specific video ) 

> Output expected : ID, Snippet with following attributes Channel ID, Video Description, Channel Title, Video Title






Reference link:  https://developers.google.com/youtube/v3/docs/search/list

In [3]:
#(2.a)
# Python import statements
import googleapiclient.discovery
from pprint import pprint

# Initializing Path
api_service_name = "youtube"
api_version = "v3"
api_key = "AIzaSyBXoUg_omyWBIVcpgTXaESBhCqzQEQKm0E" 

# Performing Search
youtube = googleapiclient.discovery.build(api_service_name, api_version, developerKey=api_key)
response = youtube.search().list(
    q="avatar movie",
    type='video',
    part="id,snippet",
    maxResults=1, # searching one video of my choice
    fields="items(id(videoId),snippet(channelId,description,channelTitle,title))"
).execute()

# Printing the Response
pprint(response)


{'items': [{'id': {'videoId': 'w8PmrLG6v5I'},
            'snippet': {'channelId': 'UCBOmfqgTZi7yDp4-3Lr_3lA',
                        'channelTitle': 'Shemaroo Movies',
                        'description': 'ShemarooMovies #Hindi #Movie '
                                       '#Bollywood #FullMovie #HD Click here '
                                       'to Subscribe the channel ...',
                        'title': 'BOLLYWOOD BLOCKBUSTER HINDI MOVIE - राजेश '
                                 'खन्ना ,शबाना आजमी और सचिन की सुपरहिट हिंदी '
                                 'मूवी - AVTAAR'}}]}



> 2.b  Provide the following statistics for query string “avatar movie” of top 50 videos sorted by relevance in the US region 

> Output expected: video ID, title, no of views, no of likes,no of comments exported to CSV file






Reference link: https://developers.google.com/youtube/v3/docs/videos/list

In [4]:
#(2.b)
# Python import statements
import csv
import googleapiclient.discovery
from googleapiclient.errors import HttpError
import pandas as pd

#If code executed sucessfully
try:
    youtube_statistics = googleapiclient.discovery.build(api_service_name, api_version, developerKey=api_key)
    response = youtube_statistics.search().list(
        q="avatar movie",
        type='video',
        part="id,snippet",
        order="relevance",
        regionCode="US",
        maxResults=50,
        fields="items(id(videoId),snippet(channelId,description,channelTitle,title))"
    ).execute()
    
    items = response.get("items", [])
    
    video_details= []
    
    for i in items: 
        video_ids = i['id']['videoId']
        video_details_list = youtube_statistics.videos().list(
            part='statistics',
            id= video_ids,
            fields='items(statistics)'
        ).execute()

        video_statistics = video_details_list.get('items', [{}])[0].get('statistics', {})

        video_details.append({
            "VideoID": video_ids,
            "Video_Title": i['snippet']["title"],
            "No_Of_Views": video_statistics.get("viewCount", 0),
            "No_Of_Likes": video_statistics.get("likeCount", 0),
            "No_Of_Comments": video_statistics.get("commentCount", 0)
        })

        
    # Data Exporting to CSV File
    df = pd.DataFrame(video_details)
    csv_file_name = "Aavatar_Movie_Youtube_Data.csv"
    df.to_csv(csv_file_name,  index=False)
    
    # Final Output
    print(f"Data exported to CSV file format: {csv_file_name}")

# If code doesn't exeute sucessfully
except HttpError as e:
    print(f"An error occurred: {e}")

Data exported to CSV file format: Aavatar_Movie_Youtube_Data.csv


 3. Analyze the exported data obtained in 2.b and carry out the following tasks 



> 3.a Sort the data 2.b  by top 10 comments in descending order and consider the video IDs and Titles of top 10 videos which have highest comments. 



In [5]:
# (3.a)
# Sorting the Data
sorting_data = sorted(video_details, key=lambda x: int(x["No_Of_Comments"]), reverse=True)
Top_10_Comments = sorting_data[:10]

# Converting data into DataFrame
sorted_df = pd.DataFrame(Top_10_Comments)

# Printing Top 10 Comments
print(sorted_df)

       VideoID                                        Video_Title No_Of_Views  \
0  d9MyW72ELq0        Avatar: The Way of Water | Official Trailer    58429272   
1  waJKJW_XU90  Avatar: The Last Airbender | Official Teaser |...    19258153   
2  a8Gx8wiNbs8  Avatar: The Way of Water | Official Teaser Tra...    27886757   
3  2r71I8lvTIA  The Last Airbender Film: How it Disrespected a...     5166303   
4  5PSNL1qE6VY  Avatar | Official Trailer (HD) | 20th Century FOX    12867096   
5  0sJeBiUCIt4    AVATAR Clip - Final Battle (2009) James Cameron    32541122   
6  kHNCaWjB-98  Zoe Saldana Performance Capture | AVATAR (2009...    24258498   
7  X8SVkfbt8cs                           Avatar: The Way of Water           0   
8  oFErWcXJLdw            TRAINING TO BE IN THE NEXT AVATAR MOVIE     2661850   
9  PLtgIILX7E8  AVATAR Full Movie 2023: Fallen Kingdom | Super...    49429460   

  No_Of_Likes No_Of_Comments  
0     1042594          43029  
1      443968          39686  
2      682270  


> 3.b Use a suitable method to retrieve comments of those top 10 videos from 3.a. For doing this, write a program to loop through each video id from 3.a and pass in the part parameter set to "snippet", to retrieve basic details about the comments. Execute this request and print the response using the pprint() method.
 - Note: pprint() will print out the response from the API in a more human-readable format.
- Reference link:  [link](https://developers.google.com/youtube/v3/docs )


> **Output expected** : Use the python library “ pprint “ to print the output of the program with the following properties  etag, items, id , kind, snippet and snippet to have the text display field which represents the comment of videos.






In [6]:
# (3.b)
# Collecting data for top 10 video id's from (3.a)
Comments_video_ids = []
for item in Top_10_Comments:
    video_ids = item['VideoID']
    Comments_video_ids.append(video_ids)

# Creating a new list to store comments data
comments_data = []
for video_ids in Comments_video_ids:
    # Retrieving basic details about the comments of a video
    comment_threads = youtube_statistics.commentThreads().list(
        part='snippet',
        videoId=video_ids
    ).execute()

    # Storing comment details in a variable
    comments_data.append(comment_threads)
    
    # Printing Final Output
    pprint(comments_data)
    

[{'etag': 'moc9p-oKWzvIt6qv7tmM8-qjqFs',
  'items': [{'etag': 'ltEdk7ilIzsnzndggavzIT5kQ5k',
             'id': 'Ugza1y8fWhk6yvJtQwR4AaABAg',
             'kind': 'youtube#commentThread',
             'snippet': {'canReply': True,
                         'channelId': 'UCgjxQJ6TlKqhHax8742ZMdA',
                         'isPublic': True,
                         'topLevelComment': {'etag': 'AzQGRynxgFpj1sMtHiulheENX-g',
                                             'id': 'Ugza1y8fWhk6yvJtQwR4AaABAg',
                                             'kind': 'youtube#comment',
                                             'snippet': {'authorChannelId': {'value': 'UCD3Y8NgOfBlkPnTijLbDC0A'},
                                                         'authorChannelUrl': 'http://www.youtube.com/channel/UCD3Y8NgOfBlkPnTijLbDC0A',
                                                         'authorDisplayName': '@mdgouse6729',
                                                         'authorProfileImageUr

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)





> 3.c Write a program to export the output of question 3.b in JSON file format and submit the file as part of the assignment 



In [7]:
# (3.c)

import json

# Storing the (3.b) data into a json file
json_file_name = "Avatar_Movie_Data.json"
with open(json_file_name, 'w') as json_file:
    json.dump(comments_data, json_file, indent=4)

# Printing Final Output
print(f"Data exported to JSON file format: '{json_file_name}'")



Data exported to JSON file format: 'Avatar_Movie_Data.json'


>3.d Write a function to get  the likes vs views ratio of the top 10 videos obtained in 3.a with the highest comments 




In [8]:
#  (3.d)
# Creating a function using (2.b) video details
def likes_vs_views(video_details):
    ratio_data = []
    
    # Creating loop to include details(Views & Likes) of every Video ID
    for i in video_details:
        Views = int(i["No_Of_Views"])
        Likes = int(i["No_Of_Likes"])
        
        if Views > 0:
            ratio = Likes / Views
        else:
            ratio = 0.0
        
        ratio_data.append({
            'Video ID': i['VideoID'],
            'Likes-vs-Views': ratio
        })
    
    return ratio_data

# Calculating ratios for top 10 comments from(3.a)
data = likes_vs_views(Top_10_Comments)

# Printing the Final Output
pprint(data)

[{'Likes-vs-Views': 0.017843693140657306, 'Video ID': 'd9MyW72ELq0'},
 {'Likes-vs-Views': 0.023053508817797844, 'Video ID': 'waJKJW_XU90'},
 {'Likes-vs-Views': 0.024465734757182413, 'Video ID': 'a8Gx8wiNbs8'},
 {'Likes-vs-Views': 0.030133346805249324, 'Video ID': '2r71I8lvTIA'},
 {'Likes-vs-Views': 0.006337949137862965, 'Video ID': '5PSNL1qE6VY'},
 {'Likes-vs-Views': 0.005919709836679879, 'Video ID': '0sJeBiUCIt4'},
 {'Likes-vs-Views': 0.04227978170783698, 'Video ID': 'kHNCaWjB-98'},
 {'Likes-vs-Views': 0.0, 'Video ID': 'X8SVkfbt8cs'},
 {'Likes-vs-Views': 0.018957116291301163, 'Video ID': 'oFErWcXJLdw'},
 {'Likes-vs-Views': 0.004569724209004104, 'Video ID': 'PLtgIILX7E8'}]
