# Fetching Data with an API and Preparing the Data
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp25&branch=main&urlpath=tree%2Fdata271_sp25%2Flectures%2Fdata271_lec25_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd

## Requesting data from Google's YouTube API

First we have to create credentials. Go to https://console.cloud.google.com/ and sign in with your Google account if you haven't already. Click "Create Project" and give the new project any name you like. When it asks you to enable APIs, look through the API library and select the YouTube Data API. Once you've enabled the API, you should be able to access your API key.

In [None]:
# Paste your YouTube API key between the quotes
api_key = "YOUR_API_KEY"
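
A key pasted into a notebook is easy to share by accident. As one alternative, the minimal sketch below reads the key from an environment variable. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of the YouTube API.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable to your API key first.")
    return key
```

Once the variable is set, `api_key = load_api_key()` could take the place of the pasted string.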

We will just request data from Cal Poly Humboldt's YouTube channel ([HumboldtOnline](https://www.youtube.com/@CalPolyHumboldt)).

In [None]:
# We need the requests library whenever we want to request data from an API
import requests

Read the YouTube API documentation [here](https://developers.google.com/youtube/v3/docs). We first want to get data about the YouTube channel. Navigate to the Channels endpoint and read its documentation.

In [None]:
# the url for making the request 
url = "https://www.googleapis.com/youtube/v3/channels?key="+api_key+"&part=snippet&forHandle=CalPolyHumboldt"
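
The URL above is a base address followed by `?` and a list of `name=value` pairs joined with `&`. As a sketch of the same idea, the standard library's `urlencode` can assemble the query string from a dictionary. The helper name `build_channel_url` and the placeholder key `DEMO_KEY` are assumptions for illustration only.

```python
from urllib.parse import urlencode

def build_channel_url(api_key, handle):
    """Build a channels request URL from its query parameters."""
    base = "https://www.googleapis.com/youtube/v3/channels"
    # urlencode joins the pairs with '&' and escapes any special characters
    query = urlencode({"key": api_key, "part": "snippet", "forHandle": handle})
    return f"{base}?{query}"

example_url = build_channel_url("DEMO_KEY", "CalPolyHumboldt")
print(example_url)
```

With a real key in place of `DEMO_KEY`, this should produce the same string as the concatenation above.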

In [None]:
# Request the data with a "GET" request
response = requests.get(url)
response

In [None]:
# Check what kind of object the response is
type(response)

In [None]:
# learn more about response
help(response)

In [None]:
# use the json() method to access the data
response.json()

In [None]:
# Save the response data
payload = response.json()

In [None]:
# Inspect the payload
payload.keys()

In [None]:
# Inspect the data
payload['items'][0]['id']

In [None]:
# Save the channel id
channel_id = payload['items'][0]['id']
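
Indexing with `payload['items'][0]` raises an error when the request returns no channels, for example after a mistyped handle. The sketch below shows a more defensive lookup on a made-up payload that only imitates the general shape of a channels response. The id and title are placeholders, and `extract_channel_id` is a hypothetical helper.

```python
# Made-up payload imitating the shape of a channels response (placeholder values)
sample_payload = {
    "kind": "youtube#channelListResponse",
    "items": [
        {"id": "UC_PLACEHOLDER_ID", "snippet": {"title": "Example Channel"}}
    ],
}

def extract_channel_id(payload):
    """Return the id of the first channel in the payload, or None if there is none."""
    items = payload.get("items", [])
    if not items:
        return None
    return items[0].get("id")

found_id = extract_channel_id(sample_payload)
missing_id = extract_channel_id({"kind": "youtube#channelListResponse"})
```

Here `found_id` holds the placeholder id and `missing_id` is `None`, so the calling code can decide what to do before making further requests.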

In [None]:
payload['items'][0]['snippet']

In [None]:
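# Use the channel id to request the channel's videos.
# This time the query parameters go in a dict and requests builds the URL for us.
search_url = 'https://www.googleapis.com/youtube/v3/search'
parameters = {'key':api_key,
              'part':'snippet',
              'channelId':channel_id,
              'type':'video',   # only return videos, so every item has a videoId
              'order':'date',
              'maxResults':'50'}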

search_response = requests.get(search_url, params = parameters)

In [None]:
# check to make sure it was a successful request
search_response.status_code
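
A status code of 200 means the request succeeded; other codes (403 is a common one when the key or quota is the problem) mean the body will not contain the data we expect. The sketch below wraps that check in a hypothetical helper, `json_or_error`. `FakeResponse` is a stand-in class defined here only so the example runs without a network call.

```python
def json_or_error(response):
    """Return the parsed JSON body, or raise if the status code is not 200."""
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    return response.json()

class FakeResponse:
    """Stand-in for a requests.Response so the sketch runs offline."""
    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body

    def json(self):
        return self._body

ok_payload = json_or_error(FakeResponse(200, {"items": []}))
```

With a real response, `json_or_error(search_response)` would behave the same way. The `requests` library also offers `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes.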

In [None]:
payload = search_response.json()

In [None]:
payload.keys()
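
A single search request returns at most 50 results. When more are available, the payload includes a `nextPageToken`, which can be sent back as the `pageToken` parameter to get the next page. The sketch below shows that loop. `collect_items` and `fetch_page` are hypothetical names, and the two fake pages exist only so the example runs offline.

```python
def collect_items(fetch_page, max_pages=5):
    """Gather 'items' across pages; fetch_page(token) must return one payload dict."""
    items = []
    token = None  # no token is needed for the first page
    for _ in range(max_pages):
        payload = fetch_page(token)
        items.extend(payload.get("items", []))
        token = payload.get("nextPageToken")
        if token is None:
            break  # the last page has no nextPageToken
    return items

# Fake two-page result so the sketch runs without a network call
fake_pages = {
    None: {"items": [{"n": 1}, {"n": 2}], "nextPageToken": "PAGE2"},
    "PAGE2": {"items": [{"n": 3}]},
}
all_items = collect_items(lambda token: fake_pages[token])
```

For real data, `fetch_page` could call `requests.get(search_url, params=...)` with `pageToken` added to `parameters` whenever a token is given, and return `.json()`. The `max_pages` cap is there to keep the number of requests small, since each one counts against the API quota.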

## Parsing/Preparing the Data

In [None]:
# Put the data in a pandas dataframe
payload_df = pd.DataFrame(payload['items'])
payload_df.head()

In [None]:
payload_normalized_df = pd.json_normalize(payload['items'])
payload_normalized_df.head()
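
`pd.DataFrame` leaves nested dictionaries such as `id` and `snippet` inside single columns, while `pd.json_normalize` expands them into separate columns with dotted names like `id.videoId`. The pure-Python sketch below imitates that naming on one made-up item. It only illustrates the idea and is not how pandas implements it; `flatten` is a hypothetical helper.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict whose keys are joined with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the dotted prefix along
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# Made-up item imitating one search result (placeholder values)
sample_item = {
    "kind": "youtube#searchResult",
    "id": {"kind": "youtube#video", "videoId": "VIDEO_ID_PLACEHOLDER"},
    "snippet": {"title": "Example title"},
}
flat_item = flatten(sample_item)
print(list(flat_item))
```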

In [None]:
payload_normalized_df.drop(columns = 'kind',inplace=True)

In [None]:
# Shorten the column names: keep the last two parts for thumbnails, the last part otherwise
payload_normalized_df.columns = ['_'.join(i.split('.')[-2:]) if 'snippet.thumbnails' in i 
                                 else i.split('.')[-1] for i in payload_normalized_df.columns]
payload_normalized_df.head()
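
To see what the renaming does, the sketch below applies the same rule to a short list of made-up column names. It also checks for duplicates, which suggests why the top-level `kind` column was dropped first: otherwise `kind` and `id.kind` would both end up named `kind`. The helper `shorten` and the sample names are assumptions for illustration.

```python
from collections import Counter

def shorten(column):
    """Keep the last two dotted parts for thumbnail columns, the last part otherwise."""
    parts = column.split(".")
    if "snippet.thumbnails" in column:
        return "_".join(parts[-2:])
    return parts[-1]

# Made-up column names in the style produced by json_normalize
sample_columns = [
    "kind", "etag", "id.kind", "id.videoId", "snippet.title",
    "snippet.thumbnails.default.url", "snippet.thumbnails.high.url",
]
short_columns = [shorten(c) for c in sample_columns]

# Names that appear more than once after shortening
duplicates = [name for name, count in Counter(short_columns).items() if count > 1]
```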

In [None]:
# Work on a copy so the normalized DataFrame stays unchanged
clean_df = payload_normalized_df.copy()
clean_df.head()

## Enhancing the data with video-specific info

In [None]:
# Test getting data for a specific video
video_id = "Wkbj2V8CQTw"
video_url = "https://www.googleapis.com/youtube/v3/videos"
video_params = {'key':api_key,
               'part':'statistics',
               'id':video_id}

In [None]:
response_video_stats_test = requests.get(video_url,params = video_params)

In [None]:
response_video_stats_test.json()

Get data for multiple videos

In [None]:
# get data for multiple videos
ids = ','.join(clean_df.videoId)
ids
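
The videos endpoint accepts several ids in one request as a comma-separated string. If you ever collect more ids than one request allows, they can be split into batches. The sketch below assumes a cap of 50 ids per request, which is worth confirming in the documentation; `chunk_ids` is a hypothetical helper and the ids are made up.

```python
def chunk_ids(video_ids, size=50):
    """Split ids into comma-separated strings with at most `size` ids each."""
    video_ids = list(video_ids)
    return [",".join(video_ids[i:i + size]) for i in range(0, len(video_ids), size)]

# 120 made-up ids become three batches of 50, 50, and 20
demo_ids = [f"id{n}" for n in range(120)]
batches = chunk_ids(demo_ids)
```

Each string in `batches` could then be used as the `id` parameter of its own request.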

In [None]:
# Create parameters for more video requests
more_video_params = {'key':api_key,
               'part':'statistics',
               'id':ids}

In [None]:
# Request the data
response_stats = requests.get(video_url, params = more_video_params).json()

In [None]:
# Inspect the result
response_stats.keys()

In [None]:
# Access the statistics
response_stats['items'][0]['statistics']

In [None]:
# Add view counts to the DataFrame (assumes one result per video, in the same order as clean_df)
clean_df['viewCount'] = [i['statistics']['viewCount'] for i in response_stats['items']]
clean_df.head()

In [None]:
# Add like counts to the DataFrame; .get() gives None if a video has no likeCount
clean_df['likeCount'] = [i['statistics'].get('likeCount') for i in response_stats['items']]
clean_df.head()
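
The list comprehensions above rely on the statistics coming back in the same order as the rows of `clean_df`, and the API returns counts as strings. A more defensive sketch is to build a lookup keyed by video id and convert the values to integers. The items below are made up, and `stat_by_id` is a hypothetical helper.

```python
# Made-up items imitating the shape of a videos response (placeholder values)
sample_items = [
    {"id": "vidA", "statistics": {"viewCount": "120", "likeCount": "7"}},
    {"id": "vidB", "statistics": {"viewCount": "45"}},  # no likeCount key
]

def stat_by_id(items, field):
    """Map each video id to an integer statistic, or None when the field is absent."""
    lookup = {}
    for item in items:
        value = item.get("statistics", {}).get(field)
        lookup[item["id"]] = int(value) if value is not None else None
    return lookup

views = stat_by_id(sample_items, "viewCount")
likes = stat_by_id(sample_items, "likeCount")
```

With real data, `clean_df['videoId'].map(stat_by_id(response_stats['items'], 'viewCount'))` would match each row to its own video regardless of order.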

## Activity

**Activity 1:** Explore other endpoints or parts of the YouTube API to get more information about Cal Poly Humboldt's channel or a specific video.

**Activity 2:** With a partner, pick a YouTube channel and use the `requests` module to fetch basic video data from the YouTube API (e.g. videoId, publishedAt, title).

**Activity 3:** Put the response data in a pandas DataFrame and use it to create two new columns, `date` and `time`, showing the date and the time each video was published.