# Data extraction and analysis from the social media platform YouTube

**Steps to be performed**: I performed the following operations for hands-on practice and learning purposes.

1. Connect to the YouTube API using a Python client



> 1.a Create a YouTube API key





> 1.b Install the Google API Python client



Refer to the [supporting link](https://developers.google.com/youtube/v3/getting-started) for how to create a YouTube API key.

Reference link: https://developers.google.com/youtube/v3/quickstart/python

In [1]:
# 1a 
api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = 'API_KEY'  # Enter the generated YouTube API key here
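
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of the original notebook.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY", environ=None):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    # `environ` can be replaced by a plain dict, which keeps the helper easy to test.
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your YouTube API key")
    return key


# Usage (assumes the variable was exported in the shell before starting Jupyter):
# DEVELOPER_KEY = load_api_key()
```

With this approach the notebook itself never contains the secret, so it can be committed or shared as-is.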

In [2]:
#1b
! pip install google-api-python-client

Collecting google-api-python-client
  Obtaining dependency information for google-api-python-client from https://files.pythonhosted.org/packages/15/ef/e5515c6eab9eb5dda9b33ec17b8d43c1e71eb063642f5684bbfc4ddc038d/google_api_python_client-2.117.0-py2.py3-none-any.whl.metadata
  Using cached google_api_python_client-2.117.0-py2.py3-none-any.whl.metadata (6.6 kB)
Using cached google_api_python_client-2.117.0-py2.py3-none-any.whl (12.0 MB)
Installing collected packages: google-api-python-client
Successfully installed google-api-python-client-2.117.0


2. Search and extract the data



> 2.a Search for videos related to the query string “avatar movie”
(For this part, choose one video from the search results and perform the data collection steps on that specific video.)

> Expected output: ID, and Snippet with the following attributes: Channel ID, Video Description, Channel Title, Video Title






Reference link:  https://developers.google.com/youtube/v3/docs/search/list

In [3]:
#2a
import googleapiclient.discovery as gd
from pprint import pprint as pp
youtube = gd.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)
request = youtube.search().list(
        part="id,snippet",
        type='video',
        q="avatar movie",
        maxResults=1,
        fields="items(id(videoId),snippet(channelId,description,channelTitle,title))"
).execute()
pp(request)

{'items': [{'id': {'videoId': 'PLtgIILX7E8'},
            'snippet': {'channelId': 'UC0A86RKLCqTEUna3hPlEpzg',
                        'channelTitle': 'Superhero FXL Games',
                        'description': 'AVATAR Full Movie 2023: Fallen Kingdom '
                                       '| Superhero FXL Action Movies 2023 in '
                                       'English (Game Movie). Best Action Game '
                                       '...',
                        'title': 'AVATAR Full Movie 2023: Fallen Kingdom | '
                                 'Superhero FXL Action Movies 2023 in English '
                                 '(Game Movie)'}}]}


In [4]:
# Use .get() because the 'items' key can be absent when the search returns nothing
if request.get('items'):
    video_id = request['items'][0]['id']['videoId']
    channel_id = request['items'][0]['snippet']['channelId']
    video_description = request['items'][0]['snippet']['description']
    video_title = request['items'][0]['snippet']['title']
    channel_title = request['items'][0]['snippet']['channelTitle']
    print("Video ID:", video_id)
    print("Channel ID:", channel_id)
    print("Video Description:", video_description)
    print("Channel Title:", channel_title)
    print("Video Title:", video_title)
else:
    print("No video was found.")



Video ID: PLtgIILX7E8
Channel ID: UC0A86RKLCqTEUna3hPlEpzg
Video Description: AVATAR Full Movie 2023: Fallen Kingdom | Superhero FXL Action Movies 2023 in English (Game Movie). Best Action Game ...
Channel Title: Superhero FXL Games
Video Title: AVATAR Full Movie 2023: Fallen Kingdom | Superhero FXL Action Movies 2023 in English (Game Movie)
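
The same extraction can be generalized to every item in a search response, which is convenient once more than one result is requested. The sketch below uses a hypothetical helper, `flatten_search_items`, and a hand-written sample dictionary shaped like the response printed above (values shortened), so it runs without calling the API.

```python
def flatten_search_items(search_response):
    """Turn a search.list response into a list of flat dicts (hypothetical helper)."""
    rows = []
    for item in search_response.get("items", []):
        snippet = item.get("snippet", {})
        rows.append({
            "video_id": item.get("id", {}).get("videoId"),
            "channel_id": snippet.get("channelId"),
            "channel_title": snippet.get("channelTitle"),
            "title": snippet.get("title"),
            "description": snippet.get("description"),
        })
    return rows


# Sample shaped like the response above; title and description are shortened.
sample_response = {
    "items": [
        {
            "id": {"videoId": "PLtgIILX7E8"},
            "snippet": {
                "channelId": "UC0A86RKLCqTEUna3hPlEpzg",
                "channelTitle": "Superhero FXL Games",
                "title": "AVATAR Full Movie 2023: Fallen Kingdom",
                "description": "Best Action Game ...",
            },
        }
    ]
}

rows = flatten_search_items(sample_response)
print(rows[0]["video_id"], rows[0]["channel_title"])
```

Using `.get()` at each level means a missing key yields `None` rather than raising a `KeyError`.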



> 2.b Provide the following statistics for the top 50 videos matching the query string “avatar movie”, sorted by relevance in the US region.

> Expected output: video ID, title, number of views, number of likes, and number of comments, exported to a CSV file






Reference link: https://developers.google.com/youtube/v3/docs/videos/list

In [5]:
#2b
import pandas as pd
search_response = youtube.search().list(
    type='video',
    q='avatar movie',
    part='id,snippet',
    maxResults=50,
    regionCode='US',
    order='relevance'  # Sort by relevance
).execute()

# Function to get statistics data for a single video
video_response_list = []
def get_video_statistics(video_id):
    video_response = youtube.videos().list(
        part='statistics',
        id=video_id
    ).execute()
    video_response_list.append(video_response)
    for video_result in video_response.get('items', []):
        statistics = video_result['statistics']
        no_of_views = int(statistics.get('viewCount', 0))
        no_of_likes = int(statistics.get('likeCount', 0))
        no_of_comments = int(statistics.get('commentCount', 0))
        return no_of_views, no_of_likes, no_of_comments
    # No items returned (e.g. the video was removed): fall back to zeros so unpacking still works
    return 0, 0, 0

# Build a list of dicts containing the video ID, title, and number of views, likes, and comments
video_data = []
for search_result in search_response.get('items', []):
    video_id = search_result['id']['videoId']
    title = search_result['snippet']['title']
    # Get video statistics
    views, likes, comments = get_video_statistics(video_id)
    video_data.append({
            'video_id': video_id,
            'title': title,
            'Number of views': views,
            'Number of likes': likes,
            'Number of comments': comments
        }) 

# Convert the data to a DataFrame
df = pd.DataFrame(video_data)
df.head(10)    

Unnamed: 0,video_id,title,Number of views,Number of likes,Number of comments
0,ByAn8DF8Ykk,Avatar: The Last Airbender | Official Trailer ...,6956654,275095,23623
1,waJKJW_XU90,Avatar: The Last Airbender | Official Teaser |...,20176259,459446,41028
2,PLtgIILX7E8,AVATAR Full Movie 2023: Fallen Kingdom | Super...,52268071,234694,1773
3,5PSNL1qE6VY,Avatar | Official Trailer (HD) | 20th Century FOX,12914944,81708,8950
4,-egQ79OrYCs,THE LAST AIRBENDER (2010) | Hollywood.com Movi...,10660713,22287,11070
5,3y8a_TFL_KQ,AVATAR Full Movie 2024: The Way of Water | Fin...,1175021,5023,66
6,d9MyW72ELq0,Avatar: The Way of Water | Official Trailer,58581848,1042388,42982
7,3G6J-ITWekA,Jake Wakes up in his Avatar Body - AVATAR (4k ...,5992421,25021,706
8,l7uZtHJP958,Bạn có biết: Bom tấn Avatar 2 chờ đợi 1 thập k...,2759,39,0
9,2r71I8lvTIA,The Last Airbender Film: How it Disrespected a...,5207140,156505,27339
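
The loop above issues one `videos.list` request per video. The `id` parameter of `videos.list` accepts a comma-separated list of video IDs, so the statistics could instead be fetched in batches. The following is a sketch only: the helper name `fetch_statistics_batched` is hypothetical, and the default batch size of 50 is an assumption about the per-request limit that should be checked against the API reference.

```python
def fetch_statistics_batched(youtube, video_ids, batch_size=50):
    """Return {video_id: (views, likes, comments)} with one videos.list call per batch."""
    stats = {}
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]
        response = youtube.videos().list(
            part="statistics",
            id=",".join(batch),  # several IDs in a single request
        ).execute()
        for item in response.get("items", []):
            s = item.get("statistics", {})
            # Counts arrive as strings; a missing count is treated as 0.
            stats[item["id"]] = (
                int(s.get("viewCount", 0)),
                int(s.get("likeCount", 0)),
                int(s.get("commentCount", 0)),
            )
    return stats


# Usage with the notebook's objects (not executed here):
# ids = [item["id"]["videoId"] for item in search_response.get("items", [])]
# stats = fetch_statistics_batched(youtube, ids)
```

Videos that the API does not return are simply absent from the resulting dictionary, so a lookup with `stats.get(video_id, (0, 0, 0))` keeps the zero fallback.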


In [6]:
df.shape

(50, 5)

In [7]:
# Export the DataFrame to a CSV file
df.to_csv('pooja_avatar_movie.csv', index=False)
print("CSV file 'pooja_avatar_movie.csv' created successfully with the statistics for each video.")

CSV file 'pooja_avatar_movie.csv' created successfully with the statistics for each video.


3. Analyze the exported data obtained in 2.b and carry out the following tasks.



> 3.a Sort the data from 2.b by number of comments in descending order, and take the video IDs and titles of the 10 videos with the most comments.


In [8]:
#3a
avatar_movie_df=pd.read_csv('pooja_avatar_movie.csv')
sorted_avatar_movie_dataframe=avatar_movie_df.sort_values(by='Number of comments',ascending=False)

sorted_avatar_movie_dataframe.head(10).reset_index()

Unnamed: 0,index,video_id,title,Number of views,Number of likes,Number of comments
0,6,d9MyW72ELq0,Avatar: The Way of Water | Official Trailer,58581848,1042388,42982
1,1,waJKJW_XU90,Avatar: The Last Airbender | Official Teaser |...,20176259,459446,41028
2,9,2r71I8lvTIA,The Last Airbender Film: How it Disrespected a...,5207140,156505,27339
3,0,ByAn8DF8Ykk,Avatar: The Last Airbender | Official Trailer ...,6956654,275095,23623
4,43,MgV9vymLIdQ,The Last Airbender is the Worst Film Ever Made...,6217942,116693,14718
5,4,-egQ79OrYCs,THE LAST AIRBENDER (2010) | Hollywood.com Movi...,10660713,22287,11070
6,47,bDHD1ueL4a4,The Weeknd - Nothing Is Lost (You Give Me Stre...,29679871,650830,9651
7,3,5PSNL1qE6VY,Avatar | Official Trailer (HD) | 20th Century FOX,12914944,81708,8950
8,49,RGx8rYbRVR4,Why People Hate Avatar: A Lesson In Lazy Comme...,551863,31012,4742
9,20,bpI3oxTEd_I,The Secret Airbender in Avatar the Last Airben...,8879311,754251,3926


In [9]:
# Keep only the video IDs and titles of the 10 videos with the most comments.
df_top10_videos=sorted_avatar_movie_dataframe[['video_id', 'title']].head(10)
df_top10_videos

Unnamed: 0,video_id,title
6,d9MyW72ELq0,Avatar: The Way of Water | Official Trailer
1,waJKJW_XU90,Avatar: The Last Airbender | Official Teaser |...
9,2r71I8lvTIA,The Last Airbender Film: How it Disrespected a...
0,ByAn8DF8Ykk,Avatar: The Last Airbender | Official Trailer ...
43,MgV9vymLIdQ,The Last Airbender is the Worst Film Ever Made...
4,-egQ79OrYCs,THE LAST AIRBENDER (2010) | Hollywood.com Movi...
47,bDHD1ueL4a4,The Weeknd - Nothing Is Lost (You Give Me Stre...
3,5PSNL1qE6VY,Avatar | Official Trailer (HD) | 20th Century FOX
49,RGx8rYbRVR4,Why People Hate Avatar: A Lesson In Lazy Comme...
20,bpI3oxTEd_I,The Secret Airbender in Avatar the Last Airben...
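
Sorting the whole frame and then taking `head(10)` works well here. As an alternative sketch, pandas also offers `DataFrame.nlargest`, which selects the top rows by a column in one step. The toy frame below is made-up data using the same column names as the notebook.

```python
import pandas as pd

# Made-up rows; only the column names match the notebook's DataFrame.
toy_df = pd.DataFrame({
    "video_id": ["a", "b", "c", "d"],
    "title": ["A", "B", "C", "D"],
    "Number of comments": [5, 42, 0, 17],
})

# Select the 2 rows with the most comments, then keep only ID and title.
top2 = (
    toy_df.nlargest(2, "Number of comments")[["video_id", "title"]]
    .reset_index(drop=True)
)
print(top2)
```

In the notebook this would correspond to `avatar_movie_df.nlargest(10, 'Number of comments')`.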



> 3.b Use a suitable method to retrieve the comments of the top 10 videos from 3.a. To do this, write a program that loops through each video ID from 3.a and passes the part parameter set to "snippet" to retrieve basic details about the comments. Execute this request and print the response using the pprint() method.







In [10]:
#3b
# Function to retrieve comments for a video
def get_video_comments(video_id):
    response = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id
    ).execute()
    return response
all_comments_responses=[]
for video_id in df_top10_videos['video_id']:
    response = get_video_comments(video_id)
    all_comments_responses.append(response)
pp(all_comments_responses)

[{'etag': 'X8vEkaOeuZDbwCOmf8tGx3J4N0E',
  'items': [{'etag': 'Cj81Xe85Xi7dPzCxsNe48pRe5YA',
             'id': 'UgzJc-jfmGiJRSe5Rz14AaABAg',
             'kind': 'youtube#commentThread',
             'snippet': {'canReply': True,
                         'channelId': 'UCgjxQJ6TlKqhHax8742ZMdA',
                         'isPublic': True,
                         'topLevelComment': {'etag': 'UZgp0i8y8rbCL7fQZ9PsyJ6GodU',
                                             'id': 'UgzJc-jfmGiJRSe5Rz14AaABAg',
                                             'kind': 'youtube#comment',
                                             'snippet': {'authorChannelId': {'value': 'UCeXQCG7Ykj1RGWs8RjeruLA'},
                                                         'authorChannelUrl': 'http://www.youtube.com/@LisaLisa-qc8xe',
                                                         'authorDisplayName': '@LisaLisa-qc8xe',
                                                         'authorProfileImageUrl': 'https://y

 {'etag': 'Femx06bCoHzZjfTJOz1H2L8Wc1Y',
  'items': [{'etag': 'V8OVAyBlq6xEoSw7eOB21ZeLhGA',
             'id': 'Ugz4Hw1ZrPQ3ol6N4tl4AaABAg',
             'kind': 'youtube#commentThread',
             'snippet': {'canReply': True,
                         'channelId': 'UCF_fDSgPpBQuh1MsUTgIARQ',
                         'isPublic': True,
                         'topLevelComment': {'etag': 'hVrsZZ-3iHDRN2z_yeC_FKUuFow',
                                             'id': 'Ugz4Hw1ZrPQ3ol6N4tl4AaABAg',
                                             'kind': 'youtube#comment',
                                             'snippet': {'authorChannelId': {'value': 'UCuRLflc8VbR07xK5mi8efSw'},
                                                         'authorChannelUrl': 'http://www.youtube.com/@ReeseStone7',
                                                         'authorDisplayName': '@ReeseStone7',
                                                         'authorProfileImageUrl': 'https://yt3.ggp
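
Each call above returns only the first page of comment threads for a video. A response that has more results carries a `nextPageToken`, which can be passed back as `pageToken` to fetch the next page. The sketch below shows one way to follow those tokens and to pull out the comment text. The helper names `iter_comment_threads` and `top_level_texts` and the `max_pages` cap are assumptions for illustration, and `maxResults=100` is assumed to be an accepted page size.

```python
def iter_comment_threads(youtube, video_id, max_pages=3):
    """Yield comment threads for a video, following nextPageToken up to max_pages pages."""
    page_token = None
    for _ in range(max_pages):
        params = {"part": "snippet", "videoId": video_id, "maxResults": 100}
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        yield from response.get("items", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further pages


def top_level_texts(threads):
    """Extract the text of each thread's top-level comment."""
    return [
        thread["snippet"]["topLevelComment"]["snippet"].get("textDisplay", "")
        for thread in threads
    ]


# Usage with the notebook's objects (not executed here):
# threads = list(iter_comment_threads(youtube, "d9MyW72ELq0"))
# texts = top_level_texts(threads)
```

The page cap keeps quota usage bounded for videos with tens of thousands of comments.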



> 3.c Write a program to export the output of question 3.b to a JSON file.


In [11]:
#3c
import json
# Define the filename for the JSON file
output_file = "pooja_avatar_movie_comments.json"

# Write to the json file
with open(output_file, 'w') as f:
    json.dump(all_comments_responses, f)

print(f"Comments successfully exported to '{output_file}'")

Comments successfully exported to 'pooja_avatar_movie_comments.json'
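
Comments often contain non-ASCII text (one of the video titles above is in Vietnamese, for example). By default `json.dump` escapes such characters as `\uXXXX` sequences, which is valid JSON but harder to read. A possible variant, sketched here with hypothetical helpers `export_json` and `load_json` and a made-up sample, writes UTF-8 directly and checks the round trip.

```python
import json
import os
import tempfile


def export_json(data, path):
    """Write data as indented, human-readable UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read JSON back from a UTF-8 file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Made-up sample shaped loosely like a list of API responses.
sample = [{"items": [{"snippet": {"textDisplay": "Bạn có biết"}}]}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "comments_sample.json")
    export_json(sample, path)
    restored = load_json(path)

print(restored == sample)
```

Passing an explicit `encoding="utf-8"` avoids depending on the platform's default encoding.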


> 3.d Write a function to get the likes-to-views ratio of the top 10 videos obtained in 3.a (the videos with the most comments).




In [12]:
#3d
import numpy as np
top_10_videos_data = sorted_avatar_movie_dataframe.head(10).copy()

def calculate_likes_views_ratio(top_10_videos_data):
    # Calculate the likes-to-views ratio for each video
    top_10_videos_data['Likes Views Ratio'] = top_10_videos_data['Number of likes'] / top_10_videos_data['Number of views']

    # Create a new dataframe with video ID, title, likes, views, and the ratio
    ratio_df = top_10_videos_data[['video_id', 'title', 'Number of likes', 'Number of views', 'Likes Views Ratio']].copy()
    return ratio_df

top10_videos_with_likes_views_ratio = calculate_likes_views_ratio(top_10_videos_data)

# A video with likes but zero views yields inf; report 0 for it instead
top10_videos_with_likes_views_ratio['Likes Views Ratio'] = top10_videos_with_likes_views_ratio['Likes Views Ratio'].replace(np.inf, 0)
top10_videos_with_likes_views_ratio.reset_index(drop=True)

Unnamed: 0,video_id,title,Number of likes,Number of views,Likes Views Ratio
0,d9MyW72ELq0,Avatar: The Way of Water | Official Trailer,1042388,58581848,0.017794
1,waJKJW_XU90,Avatar: The Last Airbender | Official Teaser |...,459446,20176259,0.022772
2,2r71I8lvTIA,The Last Airbender Film: How it Disrespected a...,156505,5207140,0.030056
3,ByAn8DF8Ykk,Avatar: The Last Airbender | Official Trailer ...,275095,6956654,0.039544
4,MgV9vymLIdQ,The Last Airbender is the Worst Film Ever Made...,116693,6217942,0.018767
5,-egQ79OrYCs,THE LAST AIRBENDER (2010) | Hollywood.com Movi...,22287,10660713,0.002091
6,bDHD1ueL4a4,The Weeknd - Nothing Is Lost (You Give Me Stre...,650830,29679871,0.021928
7,5PSNL1qE6VY,Avatar | Official Trailer (HD) | 20th Century FOX,81708,12914944,0.006327
8,RGx8rYbRVR4,Why People Hate Avatar: A Lesson In Lazy Comme...,31012,551863,0.056195
9,bpI3oxTEd_I,The Secret Airbender in Avatar the Last Airben...,754251,8879311,0.084945
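
Replacing `np.inf` covers the case of likes divided by zero views, but in pandas a video with zero likes and zero views produces `NaN` rather than `inf`. A small sketch of a ratio that handles both cases follows; the helper name `likes_views_ratio` and the choice to report 0 when there are no views are assumptions for illustration.

```python
import pandas as pd


def likes_views_ratio(likes, views):
    """Return likes / views per video, using 0.0 wherever views is zero."""
    likes = pd.Series(likes, dtype="float64")
    views = pd.Series(views, dtype="float64")
    # Mask zero views as NaN so the division never produces inf,
    # then map every undefined ratio to 0.0.
    ratio = likes / views.where(views > 0)
    return ratio.fillna(0.0)


# Made-up values: a normal video, one with no likes or views, one with likes but no views.
print(likes_views_ratio([10, 0, 5], [100, 0, 0]).tolist())
```

In the notebook the inputs would be the `Number of likes` and `Number of views` columns of the top 10 frame.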
