In [1]:
import requests
import pandas as pd
import time
import os
import csv
import config

# API Key
The first step is to get an API key; the process differs from one API to another. Since we want to collect data from YouTube, we'll get a YouTube API key by following [this guide](https://www.slickremix.com/docs/get-api-key-for-youtube/).

## Store, hide and retrieve the API key
One option for hiding an API key is to store it as an environment variable. On Windows, go to Control Panel > System and Security > System > Advanced Settings, click <Environment Variables>, and select "Add New". Give the variable a descriptive name such as **XXX_API_KEY** and paste the key as its value.

[Follow a tutorial here](https://rapidapi.com/blog/how-to-hide-an-api-key-with-python/)

To read the API key in Python:
```
import os
API_KEY = os.environ.get('XXX_API_KEY')
```
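
When the variable is not set, `os.environ.get` quietly returns `None`, and the problem only surfaces later when the URL is built. A minimal sketch of a stricter lookup follows; `load_api_key` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import os


def load_api_key(name="YOUTUBE_API_KEY"):
    """Return the API key stored in the environment variable `name`.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    key = os.environ.get(name)
    if not key:
        # Fail early with a readable message instead of passing None around
        raise RuntimeError(f"Environment variable {name} is not set")
    return key
```

Calling `load_api_key()` at the top of the notebook would then stop execution immediately if the key is missing.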

For the **Channel ID**, use the following method:
- Go to the channel page you want, click a video in the "VIDEOS" section, then return to the channel by clicking the channel icon under that video. The URL will now have the form `"https://www.youtube.com/channel/{channel ID}"`.
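
Once the URL has that form, the ID can be pulled out programmatically. This is a small sketch using only the standard library; `channel_id_from_url` is a hypothetical helper and it only handles the `/channel/{channel ID}` URL shape described above.

```python
from urllib.parse import urlparse


def channel_id_from_url(url):
    """Extract the ID from a URL of the form https://www.youtube.com/channel/{channel ID}.

    Hypothetical helper; returns None for any other URL shape.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None


print(channel_id_from_url("https://www.youtube.com/channel/UCsG5dkqFUHZO6eY9uOzQqow"))
```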

In [18]:
# retrieve API key from environment variables
API_KEY = os.environ.get('YOUTUBE_API_KEY')

# Just using a config.py and .gitignore to hide the API key

In [2]:
API_KEY = config.YOUTUBE_API_KEY
Channel_id = 'UCsG5dkqFUHZO6eY9uOzQqow'
page_token = ''

In [3]:
API_KEY

'<redacted>'

## Testing a random API request
The get() method requests the data located at the GitHub API URL, and the json() method parses the body of the response as JSON and returns it as a Python dictionary.

In [4]:
# Make a random API call
response = requests.get('https://api.github.com').json()
response

{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '

# Working with the Youtube API
The root URL below is the location of our data. Now we just need to define what type of data we want to collect. The best way to figure out which parameters and properties to add to the URL is to read the [official documentation](https://developers.google.com/youtube/v3/docs/search).

In [5]:
# Root URL
root_url = "https://www.googleapis.com/youtube/v3/"

In [6]:
search_params = "search?key=" + API_KEY + "&channelId=" + Channel_id + "&part=snippet,id&order=date&maxResults=10000"+ page_token

We’re performing a “search” through the YouTube API. Everything to the right of the '?' is a parameter we add to request specific information.

 - First, we pass our API key, stored in the API_KEY variable, in the key parameter.
 - We specify the channel ID we want to collect information from.
 - Next is the part parameter, where we specify that we want snippet and ID data. The documentation tells us what data to expect for each.
 - We order the data by date and request a maxResults of 10000 videos. The documentation caps maxResults at 50 per page, which is why the response below reports 'resultsPerPage': 50.
 - Lastly, the pageToken is a code needed to get to the next page of the search results. We’ll deal with this later when we try to extract all the data.
 > As described in https://www.stratascratch.com/blog/working-with-python-apis-for-data-science-project/
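
Concatenating the query string by hand makes it easy to drop an `&` or a parameter name. The sketch below builds the same kind of URL with `urllib.parse.urlencode`. The function name `build_search_url`, the placeholder key `DEMO_KEY`, and the choice of `maxResults=50` are assumptions made for illustration.

```python
from urllib.parse import urlencode

ROOT_URL = "https://www.googleapis.com/youtube/v3/"


def build_search_url(api_key, channel_id, page_token=""):
    """Build a search URL for one page of a channel's videos (hypothetical helper)."""
    params = {
        "key": api_key,
        "channelId": channel_id,
        "part": "snippet,id",
        "order": "date",
        "maxResults": 50,  # assumption: 50 is the documented per-page maximum
    }
    if page_token:
        # Only send pageToken once the API has handed us one
        params["pageToken"] = page_token
    # safe="," keeps the comma in part=snippet,id readable
    return ROOT_URL + "search?" + urlencode(params, safe=",")


print(build_search_url("DEMO_KEY", "UCsG5dkqFUHZO6eY9uOzQqow"))
```

`requests.get` also accepts a `params` dictionary and performs the same encoding, so the URL does not have to be assembled manually at all.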

In [7]:
url = root_url + search_params

In [8]:
response = requests.get(url).json()

In [9]:
response

{'kind': 'youtube#searchListResponse',
 'etag': 'k3BO6O0tfr6AMl-9ifPskk6zKYM',
 'nextPageToken': 'CDIQAA',
 'regionCode': 'PT',
 'pageInfo': {'totalResults': 1005, 'resultsPerPage': 50},
 'items': [{'kind': 'youtube#searchResult',
   'etag': 'RMAyq97JwQAMlOqsySJrIR_VJOY',
   'id': {'kind': 'youtube#video', 'videoId': 'R4mebR724Pg'},
   'snippet': {'publishedAt': '2021-09-17T21:47:42Z',
    'channelId': 'UCsG5dkqFUHZO6eY9uOzQqow',
    'title': 'Mason Ho&#39;s 4 Favorite Electric Acid Boards REVEALED',
    'description': 'With ten Electric Acid Surfboards in his quiver, Mason has to decide which four boards he will hold onto.',
    'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/default.jpg',
      'width': 120,
      'height': 90},
     'medium': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/mqdefault.jpg',
      'width': 320,
      'height': 180},
     'high': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/hqdefault.jpg',
      'width': 480,
      'height': 360}},
    

In [10]:
print(f"We retrieved {len(response['items'])} videos")
response['items']

We retrieved 50 videos


[{'kind': 'youtube#searchResult',
  'etag': 'RMAyq97JwQAMlOqsySJrIR_VJOY',
  'id': {'kind': 'youtube#video', 'videoId': 'R4mebR724Pg'},
  'snippet': {'publishedAt': '2021-09-17T21:47:42Z',
   'channelId': 'UCsG5dkqFUHZO6eY9uOzQqow',
   'title': 'Mason Ho&#39;s 4 Favorite Electric Acid Boards REVEALED',
   'description': 'With ten Electric Acid Surfboards in his quiver, Mason has to decide which four boards he will hold onto.',
   'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/default.jpg',
     'width': 120,
     'height': 90},
    'medium': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/mqdefault.jpg',
     'width': 320,
     'height': 180},
    'high': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/hqdefault.jpg',
     'width': 480,
     'height': 360}},
   'channelTitle': 'Stab: We like to surf',
   'liveBroadcastContent': 'none',
   'publishTime': '2021-09-17T21:47:42Z'}},
 {'kind': 'youtube#searchResult',
  'etag': 'HFzuqyenyypw_V3aCqKB5gmMAtk',
  'id': {'kind': '

In [11]:
# To get info from a single video
response['items'][0]

{'kind': 'youtube#searchResult',
 'etag': 'RMAyq97JwQAMlOqsySJrIR_VJOY',
 'id': {'kind': 'youtube#video', 'videoId': 'R4mebR724Pg'},
 'snippet': {'publishedAt': '2021-09-17T21:47:42Z',
  'channelId': 'UCsG5dkqFUHZO6eY9uOzQqow',
  'title': 'Mason Ho&#39;s 4 Favorite Electric Acid Boards REVEALED',
  'description': 'With ten Electric Acid Surfboards in his quiver, Mason has to decide which four boards he will hold onto.',
  'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/default.jpg',
    'width': 120,
    'height': 90},
   'medium': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/mqdefault.jpg',
    'width': 320,
    'height': 180},
   'high': {'url': 'https://i.ytimg.com/vi/R4mebR724Pg/hqdefault.jpg',
    'width': 480,
    'height': 360}},
  'channelTitle': 'Stab: We like to surf',
  'liveBroadcastContent': 'none',
  'publishTime': '2021-09-17T21:47:42Z'}}

# Parsing the videos and storing specific information

In [12]:
video_id = response['items'][0]['id']['videoId']
video_title = str(response['items'][0]['snippet']['title']).replace('&', ' ')
# to keep just the date, split the timestamp on the 'T'
upload_date = str(response['items'][0]['snippet']['publishedAt']).split('T')[0]
print("Video ID: {}\nVideo Title: {}\nUpload Date: {}".format(video_id,video_title,upload_date))


Video ID: R4mebR724Pg
Video Title: Mason Ho #39;s 4 Favorite Electric Acid Boards REVEALED
Upload Date: 2021-09-17
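
The title printed above still contains `#39;` because replacing `&` only strips the first character of the HTML entity `&#39;`. One alternative, sketched here with the standard library only, is to decode the entities with `html.unescape` and to parse the timestamp with `datetime`. The variable names are illustrative, and the sample values are copied from the response shown earlier.

```python
import html
from datetime import datetime

raw_title = "Mason Ho&#39;s 4 Favorite Electric Acid Boards REVEALED"
raw_published = "2021-09-17T21:47:42Z"

# html.unescape turns entities such as &#39; and &amp; back into characters
clean_title = html.unescape(raw_title)
# Parse the full timestamp, then keep only the calendar date
upload_date = datetime.strptime(raw_published, "%Y-%m-%dT%H:%M:%SZ").date()

print(clean_title)
print(upload_date.isoformat())
```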


## Creating the Loop

In [13]:
# we just want to look at videos, so we keep only the items whose
# id has 'kind': 'youtube#video'
for video in response['items']:
    if video['id']['kind'] == 'youtube#video':
        video_id = video['id']['videoId']
        video_title = str(video['snippet']['title']).replace('&', ' ')
        # to keep just the date, split the timestamp on the 'T'
        upload_date = str(video['snippet']['publishedAt']).split('T')[0]

# Making a second API call using the retrieved `video_id`
## Other metrics we want to retrieve from the API
- Video ID
- Video Title
- Publish Date
## Specific video metrics:
- View count
- Like count
- Dislike count
- Comment count

In [14]:
new_search_params = "videos?key=" + API_KEY + "&id=" + video_id + "&part=statistics"
new_url = root_url + new_search_params

In [15]:
response_video_stats = requests.get(new_url).json()
response_video_stats

{'kind': 'youtube#videoListResponse',
 'etag': 'P95iJUX9-e5BhSQ-6Jnt2NkLEL4',
 'items': [{'kind': 'youtube#video',
   'etag': 'JR3LiFBLJPcJeYrmhWf0tRI3kD8',
   'id': '5hgVSjzvDBA',
   'statistics': {'viewCount': '9604',
    'likeCount': '92',
    'dislikeCount': '14',
    'favoriteCount': '0',
    'commentCount': '4'}}],
 'pageInfo': {'totalResults': 1, 'resultsPerPage': 1}}

In [16]:
view_count = response_video_stats['items'][0]['statistics']['viewCount']
like_count = response_video_stats['items'][0]['statistics']['likeCount']
dislike_count = response_video_stats['items'][0]['statistics']['dislikeCount']
comment_count = response_video_stats['items'][0]['statistics']['commentCount']
print("View Count: {}\nLike Count: {}\nDislike Count: {}\nComment Count: {}".format(view_count,like_count,dislike_count,comment_count))



View Count: 9604
Like Count: 92
Dislike Count: 14
Comment Count: 4


## Create the Full loop

In [84]:
rows = []
for video in response['items']:
    if video['id']['kind'] == 'youtube#video':
        video_id = video['id']['videoId']
        video_title = str(video['snippet']['title']).replace('&', ' ')
        # to keep just the date, split the timestamp on the 'T'
        upload_date = str(video['snippet']['publishedAt']).split('T')[0]

        # collecting view, like, dislike, comment counts
        new_search_params = "videos?key=" + API_KEY + "&id=" + video_id + "&part=statistics"
        new_url = root_url + new_search_params

        response_video_stats = requests.get(new_url).json()

        view_count = response_video_stats['items'][0]['statistics']['viewCount']
        like_count = response_video_stats['items'][0]['statistics']['likeCount']
        dislike_count = response_video_stats['items'][0]['statistics']['dislikeCount']
        try:
            comment_count = response_video_stats['items'][0]['statistics']['commentCount']
        except KeyError:
            comment_count = 0

        video_data = {'video_id': video_id, 'video_title': video_title, 'upload_date': upload_date,
                      'view_count':view_count, 'like_count':like_count, 'dislike_count':dislike_count,
                      'comment_count':comment_count}

        # DataFrame.append was removed in pandas 2.0, so collect the rows in a list
        rows.append(video_data)

# build the DataFrame once, after the loop
df = pd.DataFrame(rows)

In [86]:
df.head()

Unnamed: 0,comment_count,dislike_count,like_count,upload_date,video_id,video_title,view_count
0,10,5,365,2021-09-17,R4mebR724Pg,Mason Ho #39;s 4 Favorite Electric Acid Boards...,19837
1,10,7,235,2021-09-16,RF_w8EQVFnY,Kolohe Andino vs Griffin Colapinto vs Ian Cran...,17249
2,81,11,1024,2021-09-13,LUdqckbhzwY,A 100% Authentic New Zealand Surf Film | #39;...,34806
3,37,22,751,2021-09-12,9rd31N1EOwQ,"Dane Reynolds, Mason Ho, And Mikey February Fa...",51184
4,104,6,476,2021-09-11,w6CTQfDjW9o,More Raw Footage Of The WSL Finalists Warming ...,39312
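
The loop above sends one `videos` request per video. The `id` parameter of `videos` also accepts a comma-separated list of IDs, so the statistics could be requested in batches instead. The sketch below only prepares the batched `id` values. The helper name `batched_id_params` is hypothetical, and the default batch size of 50 is an assumption that should be checked against the current documentation.

```python
def batched_id_params(video_ids, batch_size=50):
    """Yield comma-separated ID strings, one per videos request (hypothetical helper)."""
    for start in range(0, len(video_ids), batch_size):
        yield ",".join(video_ids[start:start + batch_size])


ids = ["R4mebR724Pg", "RF_w8EQVFnY", "LUdqckbhzwY"]
for id_param in batched_id_params(ids, batch_size=2):
    # Each string would take the place of video_id in "&id=" + video_id
    print(id_param)
```

With batching, `response['items']` holds one entry per video, so the statistics have to be matched by each item's `id` rather than read from `items[0]`.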


# Improving the Code

In [17]:
def get_video_details(video_id):
    new_search_params = "videos?key=" + API_KEY + "&id=" + video_id + "&part=statistics"
    new_url = root_url + new_search_params

    response = requests.get(new_url).json()

    view_count = response['items'][0]['statistics']['viewCount']
    like_count = response['items'][0]['statistics']['likeCount']
    dislike_count = response['items'][0]['statistics']['dislikeCount']
    try: 
        comment_count = response['items'][0]['statistics']['commentCount']
    except KeyError:
        comment_count = 0
    
    return view_count, like_count, dislike_count, comment_count
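
`get_video_details` guards only `commentCount` against a missing key. Other counters can be absent as well, for example when likes are hidden, and as far as I know the API may not return `dislikeCount` for public requests any more. A sketch of a more tolerant parser follows, assuming that a missing counter should default to 0; `parse_statistics` is a hypothetical helper and the sample dictionary is trimmed from the response shown earlier.

```python
def parse_statistics(response):
    """Return (views, likes, dislikes, comments) as ints, defaulting to 0 (hypothetical helper)."""
    items = response.get("items", [])
    # An empty items list means the video was not found or is not accessible
    stats = items[0].get("statistics", {}) if items else {}
    keys = ("viewCount", "likeCount", "dislikeCount", "commentCount")
    # The API returns counts as strings, so convert them to integers
    return tuple(int(stats.get(key, 0)) for key in keys)


sample = {"items": [{"id": "5hgVSjzvDBA",
                     "statistics": {"viewCount": "9604", "likeCount": "92",
                                    "favoriteCount": "0", "commentCount": "4"}}]}
print(parse_statistics(sample))
```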

In [25]:
def get_videos():
    page_token = ""
    while True:
        # Create URL
        url = "https://www.googleapis.com/youtube/v3/search?key="+API_KEY+"&channelId="+Channel_id+"&part=snippet,id&order=date&maxResults=10000&"+ page_token
        records = []

        response = requests.get(url).json()
        # Wait 1 second between search requests
        time.sleep(1)
        
        for video in response['items']:
            if video['id']['kind'] == 'youtube#video':
                video_id = video['id']['videoId']
                video_title = str(video['snippet']['title']).replace('&amp;', '')
                upload_date = str(video['snippet']['publishedAt']).split('T')[0]

                view_count, like_count, dislike_count, comment_count = get_video_details(video_id)
                
                record = (video_id, video_title, upload_date, view_count, like_count, dislike_count, comment_count)
                records.append(record)

        # Write the header only if the file does not exist yet, then append this page
        write_header = not os.path.exists('Stab_youtube_data.csv')
        with open('Stab_youtube_data.csv', 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(["Video_id","Video_title","Upload_date","View_count","Like_count","Dislike_count","Comment_count"])
            writer.writerows(records)
        # Changing page
        # The last page has no nextPageToken, which ends the loop
        next_page_token = response.get('nextPageToken')
        if next_page_token is None:
            break
        page_token = "pageToken=" + next_page_token

In [26]:
get_videos()
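
The paging logic in `get_videos` can also be separated from the request and the CSV writing. The sketch below shows the pattern on its own with a stand-in for the API so it runs offline. `iter_pages`, `fetch_page`, and `fake_pages` are hypothetical names, and the fake responses only imitate the `items` and `nextPageToken` fields seen earlier.

```python
def iter_pages(fetch_page):
    """Yield every page from a paginated endpoint.

    fetch_page is a hypothetical callable: it takes a page token ('' for the
    first page) and returns the decoded JSON response as a dict.
    """
    token = ""
    while True:
        page = fetch_page(token)
        yield page
        token = page.get("nextPageToken")
        if not token:
            # The last page carries no nextPageToken
            break


# Stand-in for the real API so the sketch runs offline
fake_pages = {
    "": {"items": ["a", "b"], "nextPageToken": "CDIQAA"},
    "CDIQAA": {"items": ["c"]},
}
all_items = [item for page in iter_pages(fake_pages.__getitem__) for item in page["items"]]
print(all_items)
```

With the real API, `fetch_page` would wrap the `requests.get(url).json()` call used in `get_videos`.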

In [28]:
df = pd.read_csv('Stab_youtube_data.csv')

In [29]:
df

Unnamed: 0,Video_id,Video_title,Upload_date,View_count,Like_count,Dislike_count,Comment_count
0,R4mebR724Pg,Mason Ho&#39;s 4 Favorite Electric Acid Boards...,2021-09-17,19986,365,5,10
1,RF_w8EQVFnY,Kolohe Andino vs Griffin Colapinto vs Ian Cran...,2021-09-16,17373,235,7,10
2,LUdqckbhzwY,A 100% Authentic New Zealand Surf Film | &#39;...,2021-09-13,34858,1026,11,81
3,9rd31N1EOwQ,"Dane Reynolds, Mason Ho, And Mikey February Fa...",2021-09-12,51184,751,22,37
4,w6CTQfDjW9o,More Raw Footage Of The WSL Finalists Warming ...,2021-09-11,39323,476,6,104
...,...,...,...,...,...,...,...
495,X9tU8ybzcFs,The Dock,2017-07-15,9653162,119128,2650,2805
496,ryCwJXsOVp0,All You Need To See From The Last Few Days On ...,2017-06-26,120829,791,20,56
497,bnVkEqCYD94,"Drum Circles, Claims and Matt Wilkinson&#39;s ...",2017-05-02,19137,113,2,4
498,WD4ol1BRRWE,The 2017 Rip Curl Pro Warm Ups,2017-04-27,4219,35,1,1
