# Fetching Data from an API and Preparing the Data
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo26_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd

## Requesting data from Google's YouTube API

First we have to create credentials. Go to https://console.cloud.google.com/ and sign in with your Google account if you haven't already. Click "Create Project" and give the new project any name you like. When it asks you to enable APIs, look through the API library and select the YouTube Data API. Once you've enabled the API, you should be able to access your API key.

In [None]:
# Paste your YouTube API key here, inside the quotes
api_key = ""
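
If you would rather not paste the key into a notebook you might share, one option is to read it from an environment variable. This is only a sketch: the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are made up for this example and are not part of the YouTube API.

```python
import os

# "YOUTUBE_API_KEY" is a hypothetical variable name chosen for this example.
def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the key stored in the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable set, `api_key = load_api_key()` would take the place of pasting the key by hand.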

We will request data from Cal Poly Humboldt's YouTube channel (@HumboldtOnline). For this, we need the channel's ID. Go to YouTube and open one of Cal Poly Humboldt's videos. Right-click on the page and click View Page Source. Search (Ctrl-F) the page source for https://www.youtube.com/channel/. The channel ID appears directly after /channel/ in the URL path.

In [None]:
# Paste the channel ID here
channel_id = "UCg7Fdhrmwi8ZqakqiO3xPkg"

In [None]:
# Import the requests module whenever we want to request data from an API
import requests

Read the YouTube API documentation [here](https://developers.google.com/youtube/v3/docs).

In [None]:
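# The URL to request from (maxResults is capped at 50 per request;
# type=video keeps only videos, so every item has an id.videoId)
url = "https://www.googleapis.com/youtube/v3/search?key="+api_key+"&channelId="+channel_id+"&part=snippet,id&order=date&type=video&maxResults=50"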
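
Gluing a URL together with `+` works, but it is easy to drop an `&` or forget to escape a special character. Here is a minimal sketch of the same search URL built from a dictionary with the standard library's `urlencode`. The helper name `build_search_url` is made up for this example, and the default of 50 reflects the largest `maxResults` value the search endpoint's documentation lists.

```python
from urllib.parse import urlencode

def build_search_url(api_key, channel_id, max_results=50):
    """Build the search endpoint URL from a dictionary of query parameters."""
    base = "https://www.googleapis.com/youtube/v3/search"
    params = {
        "key": api_key,
        "channelId": channel_id,
        "part": "snippet,id",
        "order": "date",
        "maxResults": max_results,
    }
    # urlencode joins the pairs with & and escapes special characters
    return base + "?" + urlencode(params)
```

`requests.get` also accepts a `params=` dictionary and does this encoding for you, so `requests.get(base, params=params)` is another way to send the same request.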

In [None]:
# Request the data with a "GET" request
response = requests.get(url)
response
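
Before digging into the data, it helps to confirm that the request actually succeeded. A bad key or an invalid parameter still gives back a response object, just with an error status code. The helper below is a small sketch (the name `json_or_raise` is made up here) that relies only on the `status_code`, `text`, and `json()` members of a `requests` response.

```python
def json_or_raise(response):
    """Return the parsed JSON for a 2xx status, otherwise raise with details."""
    if not 200 <= response.status_code < 300:
        # Show the start of the body; APIs usually explain the problem there
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text[:200]}"
        )
    return response.json()
```

`requests` also provides `response.raise_for_status()`, which raises an exception for 4xx and 5xx status codes if you don't need a custom message.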

In [None]:
# check out response
type(response)

In [None]:
# learn more about response
help(response)

In [None]:
# use the json() method to access the data
response.json()

In [None]:
# Save the response data
payload = response.json()

In [None]:
# inspect the payload
payload.keys()

In [None]:
# Inspect the data
payload['items'][0]['snippet']
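
A single search request returns only one page of results. When more results exist, the payload includes a `nextPageToken` key whose value can be sent back as the `pageToken` parameter to get the next page. The loop below is a sketch of that idea. `collect_items` and its `fetch_page` argument are hypothetical names: `fetch_page` stands for any function you write that takes a token (or `None` for the first page) and returns the parsed JSON for that page.

```python
def collect_items(fetch_page, max_pages=10):
    """Follow nextPageToken links and gather the items from every page."""
    items = []
    token = None  # None means "ask for the first page"
    for _ in range(max_pages):  # hard stop so we never loop forever
        page = fetch_page(token)
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:  # no token means this was the last page
            break
    return items
```

Each page is a separate request that counts against your API quota, so keep `max_pages` small while experimenting.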

## Parsing/Preparing the Data

In [None]:
# Put the data in a pandas dataframe
payload_df = pd.DataFrame(payload['items'])
payload_df.head()

In [None]:
# inspect the id data
payload_df.id[0]

In [None]:
# inspect the snippet data
payload_df.snippet[0]

In [None]:
# Create a new dataframe to store the clean data
clean_df = pd.DataFrame()

In [None]:
# add video ID
clean_df['videoID'] = [i['videoId'] for i in payload_df.id]
clean_df.head()

In [None]:
# add published date/time
clean_df['publishedAt'] = [i['publishedAt'] for i in payload_df.snippet]
clean_df.head()

In [None]:
# Video title
clean_df['title'] = [i['title'] for i in payload_df.snippet]
clean_df.head()
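
The three list comprehensions above can also be folded into one pass over `payload['items']`. The sketch below (the function name `flatten_search_items` is made up here) builds one small dictionary per video and skips any search result, such as a playlist or channel, whose `id` has no `videoId`.

```python
def flatten_search_items(items):
    """Turn nested search results into a flat list of per-video records."""
    records = []
    for item in items:
        video_id = item.get("id", {}).get("videoId")
        if video_id is None:
            continue  # not a video (e.g. a playlist or channel result)
        snippet = item.get("snippet", {})
        records.append({
            "videoID": video_id,
            "publishedAt": snippet.get("publishedAt"),
            "title": snippet.get("title"),
        })
    return records
```

Passing the result to `pd.DataFrame(flatten_search_items(payload['items']))` should give the same three columns as `clean_df`.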

## Enhancing the data with video-specific info

In [None]:
# Test getting data for a specific video
video_id = "GpOplrOC7X0"
video_stats_test = "https://www.googleapis.com/youtube/v3/videos?id="+video_id+"&part=statistics&key="+api_key

In [None]:
response_video_stats_test = requests.get(video_stats_test)

In [None]:
response_video_stats_test.json()

Get data for multiple videos

In [None]:
# get data for multiple videos
ids = ','.join(clean_df.videoID)
ids
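
One long comma-separated string works for a handful of videos. To my understanding, the videos endpoint accepts only a limited number of IDs per request (commonly cited as 50), so a larger list needs to be split into batches. Here is a sketch with a made-up helper `chunk_ids`, where the batch size of 50 is that assumption.

```python
def chunk_ids(video_ids, size=50):
    """Split video IDs into comma-joined batches of at most `size` IDs."""
    video_ids = list(video_ids)
    return [
        ",".join(video_ids[start:start + size])
        for start in range(0, len(video_ids), size)
    ]
```

Each string in `chunk_ids(clean_df.videoID)` can then be placed after `id=` in its own request.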

In [None]:
# Create url for the video statistics
url_for_stats = "https://www.googleapis.com/youtube/v3/videos?id="+ids+"&part=statistics&key="+api_key

In [None]:
# Request the data
response_stats = requests.get(url_for_stats).json()

In [None]:
# Inspect the result
response_stats.keys()

In [None]:
# Access the statistics
response_stats['items'][0]['statistics']

In [None]:
# Add view counts to the dataframe
clean_df['viewCount'] = [i['statistics']['viewCount'] for i in response_stats['items']]
clean_df.head()

In [None]:
# Add like counts to the dataframe (likeCount is missing when a video hides its likes)
clean_df['likeCount'] = [i['statistics'].get('likeCount') for i in response_stats['items']]
clean_df.head()
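
Filling the columns by position assumes the statistics come back in the same order as the rows of `clean_df` and that no video is missing from the response. A more defensive sketch is to look each video up by its ID. It also converts the counts to integers, since the API returns them as strings. The helper names `stats_by_video_id` and `lookup_count` are made up for this example.

```python
def stats_by_video_id(stats_items):
    """Map each video's ID to its statistics dictionary."""
    return {item["id"]: item.get("statistics", {}) for item in stats_items}

def lookup_count(stats, video_id, field):
    """Return one count as an int, or None when the video or field is absent."""
    value = stats.get(video_id, {}).get(field)
    return int(value) if value is not None else None
```

For example, with `stats = stats_by_video_id(response_stats['items'])`, the column could be filled with `[lookup_count(stats, v, 'viewCount') for v in clean_df.videoID]`.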

## Activity

**Activity 1:** With a partner, choose a YouTube channel and use the `requests` module to fetch basic video data from the YouTube API (e.g. videoId, publishedAt, title).

**Activity 2:** Put the response data in a pandas DataFrame and use it to create two new columns, `date` and `time`, showing the date and the time each video was published.