# Ingesting Each Video in the Playlist

We want likes, view counts, etc. for each video in the playlist.

## Setting up the directories

Add the parent directory to `sys.path` so that modules stored there (such as `api_key`, imported further down) are available in this notebook.

In [1]:
from pathlib import Path
import sys

# Take absolute() first, then parent: Path().parent is still `.`, because the relative `.` directory has no parent
parent_path = Path().absolute().parent
sys.path.append(str(parent_path))
sys.path

['/home/phood/Documents/GitHub/Reviewing-Game-Reviews/Ingestion/video',
 '/home/phood/Documents/Anaconda/Installation/anaconda3/envs/slit/lib/python311.zip',
 '/home/phood/Documents/Anaconda/Installation/anaconda3/envs/slit/lib/python3.11',
 '/home/phood/Documents/Anaconda/Installation/anaconda3/envs/slit/lib/python3.11/lib-dynload',
 '',
 '/home/phood/Documents/Anaconda/Installation/anaconda3/envs/slit/lib/python3.11/site-packages',
 '/home/phood/Documents/GitHub/Reviewing-Game-Reviews/Ingestion']

## Requesting videos

The YouTube API's `videos().list` method accepts a comma-separated list of `videoId`s and returns data for each video with a matching `videoId`.

### Getting the `videoId`s

Each `videoId` can be found under `resourceId` in the `snippet` of each `item` from the ingested playlist.
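
To make the nesting concrete, here is a minimal sketch of pulling the ids out of one response. The `sample_response` dict is a hypothetical, heavily trimmed stand-in for a real `playlistItems().list` response, and its titles and ids are placeholders, not real data.

```python
# Hypothetical, trimmed-down playlistItems().list response.
# Real responses carry many more fields; the ids here are placeholders.
sample_response = {
    "items": [
        {"snippet": {"title": "First video",
                     "resourceId": {"kind": "youtube#video", "videoId": "VIDEO_ID_1"}}},
        {"snippet": {"title": "Second video",
                     "resourceId": {"kind": "youtube#video", "videoId": "VIDEO_ID_2"}}},
    ]
}

# Walk items -> snippet -> resourceId -> videoId for every item.
sample_video_ids = [
    item["snippet"]["resourceId"]["videoId"]
    for item in sample_response["items"]
]
print(sample_video_ids)
```

The same path through the dictionaries applies to every saved playlist file, so the extraction can be repeated per file and the results concatenated.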

Goal: collect every `videoId` into one list, `videoIds`

In [6]:
ingested_playlist_path = Path().absolute().parent.joinpath("playlist/results")
ingested_playlist_path

PosixPath('/home/phood/Documents/GitHub/Reviewing-Game-Reviews/Ingestion/playlist/results')

In [19]:
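# `pd` (pandas) and `result` (one loaded playlist response) come from the cell further down, which was run earlier
pd.DataFrame(result['items'])['snippet']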

0     {'publishedAt': '2017-03-20T16:41:48Z', 'chann...
1     {'publishedAt': '2017-03-18T01:00:10Z', 'chann...
2     {'publishedAt': '2017-03-17T18:00:21Z', 'chann...
3     {'publishedAt': '2017-03-17T00:30:21Z', 'chann...
4     {'publishedAt': '2017-03-15T02:15:10Z', 'chann...
5     {'publishedAt': '2017-03-09T02:30:19Z', 'chann...
6     {'publishedAt': '2017-03-08T01:21:25Z', 'chann...
7     {'publishedAt': '2017-03-07T18:00:14Z', 'chann...
8     {'publishedAt': '2017-03-07T13:00:10Z', 'chann...
9     {'publishedAt': '2017-03-07T08:01:34Z', 'chann...
10    {'publishedAt': '2017-03-07T02:00:14Z', 'chann...
11    {'publishedAt': '2017-03-06T14:00:13Z', 'chann...
12    {'publishedAt': '2017-03-02T12:00:25Z', 'chann...
13    {'publishedAt': '2017-03-02T11:00:21Z', 'chann...
14    {'publishedAt': '2017-03-02T09:00:24Z', 'chann...
15    {'publishedAt': '2017-03-01T14:09:00Z', 'chann...
16    {'publishedAt': '2017-02-28T21:24:19Z', 'chann...
17    {'publishedAt': '2017-02-22T02:08:05Z', 'c

In [21]:
import pandas as pd
import json

def get_videoId(playlistItems_list_result):
    '''
    Returns a list with the videoId of every item in a playlistItems_list_result,
    the response one gets from a successful youtube.playlistItems().list call.
    '''
    items = playlistItems_list_result['items']
    videoIds = pd.DataFrame(items)['snippet'].apply(lambda d: d['resourceId']['videoId'])
    return list(videoIds)

videoIds = []
for j in ingested_playlist_path.glob('*.json'):
    with open(j, 'r') as fp:
        result = json.load(fp)
        videoIds.extend(get_videoId(result))
        
len(videoIds)

2178

### Calling the API

Goal: one JSON file per `youtube.videos().list` call, saved in the `results` folder
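
The download cell that follows steps through `videoIds` 50 at a time. As a sketch of that batching on its own, without touching the API: the helper name `chunked` and the placeholder ids are assumptions made for illustration and are not part of the notebook.

```python
def chunked(ids, size=50):
    """Yield consecutive slices of `ids`, each holding at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Placeholder ids standing in for the 2178 real ones.
placeholder_ids = [f"id_{n}" for n in range(2178)]

batches = list(chunked(placeholder_ids))
print(len(batches), len(batches[-1]))

# Each batch becomes one comma-separated string for the `id` parameter.
first_id_parameter = ",".join(batches[0])
```

With 2178 ids this gives 44 batches, the last one holding 28 ids, so the download should produce 44 JSON files.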

In [25]:
import api_key

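def download_videos():
    '''
    Requests data for every id in videoIds, 50 ids per call, and writes each
    response to results/<request_number>.json. The results folder must already exist.
    '''
    youtube = api_key.get_youtube()

    video_count = 0
    request_number = 0
    for start_index in range(0, len(videoIds), 50):
        # next batch of up to 50 ids
        videoIds_to_process = videoIds[start_index:start_index + 50]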

        request = youtube.videos().list(
            part="contentDetails,id,liveStreamingDetails,localizations,player,recordingDetails,snippet,statistics,status,topicDetails",
            id=','.join(videoIds_to_process),
            maxResults=50
        )
        response = request.execute()

        with open(f"results/{request_number}.json", "w") as fp:
            json.dump(response, fp)

        request_number += 1
        video_count += len(videoIds_to_process)

        print(f"{round(video_count / len(videoIds) * 100, 2)}%", end="\r")
        
# wrapped inside a function so that re-running this cell does not call the API by accident; uncomment to download
# download_videos()

100.0%
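
With the responses saved, the likes and view counts mentioned at the top can be read from the `statistics` part of each item. The following is only a sketch: `sample_videos_response` is a hypothetical, trimmed stand-in for one saved `videos().list` response, the ids and numbers are made up, and `extract_counts` is an illustrative helper rather than code from this project. It assumes the counts arrive as strings and that a count can be missing from an item.

```python
# Hypothetical, trimmed-down videos().list response; values are made up.
sample_videos_response = {
    "items": [
        {"id": "VIDEO_ID_1", "statistics": {"viewCount": "1200", "likeCount": "34"}},
        {"id": "VIDEO_ID_2", "statistics": {"viewCount": "560"}},  # no likeCount here
    ]
}

def extract_counts(videos_response):
    """Return one dict per video with integer counts, or None where a count is absent."""
    rows = []
    for item in videos_response["items"]:
        stats = item.get("statistics", {})
        rows.append({
            "videoId": item["id"],
            "viewCount": int(stats["viewCount"]) if "viewCount" in stats else None,
            "likeCount": int(stats["likeCount"]) if "likeCount" in stats else None,
        })
    return rows

counts = extract_counts(sample_videos_response)
print(counts)
```

The resulting list of dicts can be passed to `pd.DataFrame` for further analysis.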