# How to use the YouTube Data API: YoutubeDataApi
[Youtube Package Documentation](https://youtube-data-api.readthedocs.io/en/latest/index.html) <br>


Authors: Megan Brown & Leon Yin <br>
Presented on: 2018/10/09
<hr>

## Agenda
Today we will discuss:

1. A brief overview of data available in the YouTube Data API
2. How to install the package
3. How to create an API key
4. A brief look at how to use the package
5. A cool example that I have yet to come up with

The packages needed for this tutorial are listed in `requirements.txt`:
```
pip install -r requirements.txt
```


# FAQs

### So what kind of data can you get?

* Short answer: a lot

* Comprehensive answer: [here](https://developers.google.com/youtube/v3/docs/)

* What is included in the package:
    * video metadata: [api ref](https://developers.google.com/youtube/v3/docs/videos/list)
    * channel metadata: [api ref](https://developers.google.com/youtube/v3/docs/channels/list)
    * playlist metadata: [api ref](https://developers.google.com/youtube/v3/docs/playlistItems/list)
    * subscription metadata: [api ref](https://developers.google.com/youtube/v3/docs/subscriptions/list)
    * featured channel metadata: [api ref](https://developers.google.com/youtube/v3/docs/channels/list) 
    * comment metadata: [api ref](https://developers.google.com/youtube/v3/docs/commentThreads/list)
    * captions metadata: [api ref](https://python-pytube.readthedocs.io/en/latest/)
    * search results: [api ref](https://developers.google.com/youtube/v3/docs/search/list)
    * recommended video results: [api ref](https://developers.google.com/youtube/v3/docs/search/list)

### What is the difference between a user and a channel?
* Essentially: how YouTube stores the data internally.<br>
* A user is the name that a content creator registers (ex: **LastWeekTonight**). You cannot use this value directly to request more data about the channel.<br>
* The channel id is the internal ID for a given user (ex: **UC3XTzVzaHQEd30rQbuvCtTQ**). You can use this value to get more data about a channel. <br>

###### But fear not, there is a solution!
Use `yt.get_channel_id_from_user(username)` to get the channel id for a given user.

### What is the difference between a featured channel and a subscription?
* A subscription is a channel that a user opts into getting updates for
* A featured channel is a feature a channel can use to direct their viewers towards other channels

## How to Install

   The software is on PyPI, so you can download it via `pip`
   
   
   `pip install youtube-data-api`

## How to get an API key

### A quick guide: https://developers.google.com/youtube/v3/getting-started

1. You need a Google Account to access the Google API Console, request an API key, and register your application. You already have this as an NYU student/affiliate.

2. Create a project in the <a href="https://console.developers.google.com/apis/">Google Developers Console</a> and <a href="https://developers.google.com/youtube/registering_an_application">obtain authorization credentials</a> so your application can submit API requests.

3. After creating your project, make sure the YouTube Data API is one of the services that your application is registered to use.

    a. Go to the <a href="https://console.developers.google.com/apis/">API Console</a> and select the project that you just registered.

    b. Visit the <a href="https://console.developers.google.com/apis/enabled">Enabled APIs page</a>. In the list of APIs, make sure the status is ON for the YouTube Data API v3. You do not need to enable OAuth 2.0, since there are no methods in the package that require it.

## A brief overview of how to use the package

In [42]:
import os
import datetime
import pandas as pd

In [31]:
from youtube_api import YoutubeDataApi
from youtube_api.youtube_api_utils import *

yt = YoutubeDataApi(os.environ.get('YT_KEY'))
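
`os.environ.get('YT_KEY')` returns `None` when the variable is not set, so a missing key only surfaces later as a failed request. A minimal sketch of a guard that fails early is shown below. The helper name `load_api_key` and its `environ` argument are hypothetical and not part of the package.

```python
import os


def load_api_key(env_var="YT_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise a clear error.

    Hypothetical helper for this tutorial; not part of youtube-data-api.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var)
    if not key:
        # .get() returns None for an unset variable; stop here instead of
        # handing None to the client constructor.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Demonstration with a stand-in mapping instead of the real environment.
print(load_api_key(environ={"YT_KEY": "not-a-real-key"}))
```

With the key exported in your shell, the client above could then be created as `yt = YoutubeDataApi(load_api_key())`.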

### Starting with a channel name and getting some basic metadata

In [29]:
channel_id = yt.get_channel_id_from_user('LastWeekTonight')
print(channel_id)

UC3XTzVzaHQEd30rQbuvCtTQ


You can get more information from this `channel_id`

In [35]:
yt.get_channel_metadata(channel_id)

OrderedDict([('channel_id', 'UC3XTzVzaHQEd30rQbuvCtTQ'),
             ('title', 'LastWeekTonight'),
             ('account_creation_date',
              datetime.datetime(2014, 3, 18, 17, 41, 39)),
             ('keywords', None),
             ('description',
              'Breaking news on a weekly basis. Sundays at 11PM - only on HBO.\nSubscribe to the Last Week Tonight channel for the latest videos from John Oliver and the LWT team.'),
             ('view_count', '1712963061'),
             ('video_count', '252'),
             ('subscription_count', '6471835'),
             ('playlist_id_likes', 'LL3XTzVzaHQEd30rQbuvCtTQ'),
             ('playlist_id_uploads', 'UU3XTzVzaHQEd30rQbuvCtTQ'),
             ('topic_ids',
              'https://en.wikipedia.org/wiki/Entertainment|https://en.wikipedia.org/wiki/Television_program|https://en.wikipedia.org/wiki/Humour'),
             ('country', None),
             ('collection_date',
              datetime.datetime(2018, 10, 8, 11, 51, 41, 17

The default parser returns the items in the JSON as an `OrderedDict`. Passing `parser=None` returns the raw JSON.

In [49]:
yt.get_channel_metadata(channel_id, parser=None)

{'kind': 'youtube#channel',
 'etag': '"XI7nbFXulYBIpL0ayR_gDh3eu1k/SXEcsdWHVw8gnl25pvHqBtGzoHo"',
 'id': 'UC3XTzVzaHQEd30rQbuvCtTQ',
 'snippet': {'title': 'LastWeekTonight',
  'description': 'Breaking news on a weekly basis. Sundays at 11PM - only on HBO.\nSubscribe to the Last Week Tonight channel for the latest videos from John Oliver and the LWT team.',
  'customUrl': 'LastWeekTonight',
  'publishedAt': '2014-03-18T17:41:39.000Z',
  'thumbnails': {'default': {'url': 'https://yt3.ggpht.com/a-/AN66SAxIEUI6f-101_t2Dy8703mNjD8eikQOVffxBw=s88-mo-c-c0xffffffff-rj-k-no',
    'width': 88,
    'height': 88},
   'medium': {'url': 'https://yt3.ggpht.com/a-/AN66SAxIEUI6f-101_t2Dy8703mNjD8eikQOVffxBw=s240-mo-c-c0xffffffff-rj-k-no',
    'width': 240,
    'height': 240},
   'high': {'url': 'https://yt3.ggpht.com/a-/AN66SAxIEUI6f-101_t2Dy8703mNjD8eikQOVffxBw=s800-mo-c-c0xffffffff-rj-k-no',
    'width': 800,
    'height': 800}},
  'localized': {'title': 'LastWeekTonight',
   'description': 'Breaking
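
If you work with the raw JSON, you have to pull out the fields yourself. The sketch below shows one way to do that for the `id`, `snippet.title` and `snippet.publishedAt` keys visible in the output above. The function name `parse_channel_basics` is hypothetical, and the sample item is abbreviated by hand from that output; this is not the package's own parser.

```python
from collections import OrderedDict
from datetime import datetime


def parse_channel_basics(item):
    """Pull a few fields out of one raw channel item (hypothetical helper)."""
    snippet = item.get("snippet", {})
    published = snippet.get("publishedAt")
    return OrderedDict([
        ("channel_id", item.get("id")),
        ("title", snippet.get("title")),
        # Timestamps in the raw output look like 2014-03-18T17:41:39.000Z.
        ("account_creation_date",
         datetime.strptime(published, "%Y-%m-%dT%H:%M:%S.%fZ") if published else None),
    ])


# Abbreviated copy of the raw item shown above.
raw_item = {
    "kind": "youtube#channel",
    "id": "UC3XTzVzaHQEd30rQbuvCtTQ",
    "snippet": {
        "title": "LastWeekTonight",
        "publishedAt": "2014-03-18T17:41:39.000Z",
    },
}

parsed = parse_channel_basics(raw_item)
print(parsed["title"], parsed["account_creation_date"])
```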

In [39]:
yt.get_subscriptions(channel_id)[:2]

[OrderedDict([('subscription_title', 'HBOBoxing'),
              ('subscription_channel_id', 'UCWPQB43yGKEum3eW0P9N_nQ'),
              ('subscription_kind', 'youtube#channel'),
              ('subscription_publish_date',
               datetime.datetime(2014, 3, 20, 19, 5, 54)),
              ('collection_date',
               datetime.datetime(2018, 10, 8, 11, 54, 45, 192640))]),
 OrderedDict([('subscription_title', 'Real Time with Bill Maher'),
              ('subscription_channel_id', 'UCy6kyFxaMqGtpE3pQTflK8A'),
              ('subscription_kind', 'youtube#channel'),
              ('subscription_publish_date',
               datetime.datetime(2014, 12, 11, 18, 55, 41)),
              ('collection_date',
               datetime.datetime(2018, 10, 8, 11, 54, 45, 192640))])]
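
Each subscription record carries a `subscription_channel_id`, which is the value you would need to look up that channel in turn. A small sketch of collecting those ids follows. The records are copied by hand from the output above with only two of their fields kept, so it runs without an API key.

```python
from collections import OrderedDict

# Two records abbreviated from the get_subscriptions output above.
subscriptions = [
    OrderedDict([("subscription_title", "HBOBoxing"),
                 ("subscription_channel_id", "UCWPQB43yGKEum3eW0P9N_nQ")]),
    OrderedDict([("subscription_title", "Real Time with Bill Maher"),
                 ("subscription_channel_id", "UCy6kyFxaMqGtpE3pQTflK8A")]),
]

# Map each subscription title to its channel id.
title_to_id = {
    s["subscription_title"]: s["subscription_channel_id"] for s in subscriptions
}
print(title_to_id["HBOBoxing"])
```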

You can convert the `channel_id` into the id of the channel's uploads playlist, which lists every video the channel has posted, using a function from the package's `youtube_api_utils` module.

In [50]:
from youtube_api.youtube_api_utils import *

playlist_id = get_upload_playlist_id(channel_id)
print(playlist_id)

UU3XTzVzaHQEd30rQbuvCtTQ
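
Comparing the two ids printed so far, the uploads playlist id differs from the channel id only in its prefix: `UC` becomes `UU`. The sketch below illustrates that pattern with a hypothetical helper, `uploads_playlist_from_channel`. It is inferred from the values in this tutorial and is not presented as the package's implementation.

```python
def uploads_playlist_from_channel(channel_id):
    """Hypothetical helper: swap the 'UC' prefix for 'UU'."""
    if not channel_id.startswith("UC"):
        raise ValueError(
            f"expected a channel id starting with 'UC', got {channel_id!r}"
        )
    # Keep everything after the two-character prefix unchanged.
    return "UU" + channel_id[2:]


print(uploads_playlist_from_channel("UC3XTzVzaHQEd30rQbuvCtTQ"))
# prints UU3XTzVzaHQEd30rQbuvCtTQ
```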


You can now get the videos from this `playlist_id`

In [37]:
videos = yt.get_videos_from_playlist_id(playlist_id)
videos[:5]

[OrderedDict([('publish_date', datetime.datetime(2018, 10, 8, 6, 30)),
              ('video_id', 'FsZ3p9gOkpY'),
              ('channel_id', 'UC3XTzVzaHQEd30rQbuvCtTQ'),
              ('collection_date',
               datetime.datetime(2018, 10, 8, 11, 53, 51, 246426))]),
 OrderedDict([('publish_date', datetime.datetime(2018, 10, 1, 6, 30, 1)),
              ('video_id', 'opi8X9hQ7q8'),
              ('channel_id', 'UC3XTzVzaHQEd30rQbuvCtTQ'),
              ('collection_date',
               datetime.datetime(2018, 10, 8, 11, 53, 51, 246426))]),
 OrderedDict([('publish_date', datetime.datetime(2018, 9, 24, 6, 30)),
              ('video_id', 'OjPYmEZxACM'),
              ('channel_id', 'UC3XTzVzaHQEd30rQbuvCtTQ'),
              ('collection_date',
               datetime.datetime(2018, 10, 8, 11, 53, 51, 246426))]),
 OrderedDict([('publish_date', datetime.datetime(2018, 9, 10, 6, 30, 2)),
              ('video_id', 'NpPyLcQ2vdI'),
              ('channel_id', 'UC3XTzVzaHQEd30rQbuvCt

In [48]:
df = pd.DataFrame(videos)
df.head(2)

| | publish_date | video_id | channel_id | collection_date |
|---|---|---|---|---|
| 0 | 2018-10-08 06:30:00 | FsZ3p9gOkpY | UC3XTzVzaHQEd30rQbuvCtTQ | 2018-10-08 11:53:51.246426 |
| 1 | 2018-10-01 06:30:01 | opi8X9hQ7q8 | UC3XTzVzaHQEd30rQbuvCtTQ | 2018-10-08 11:53:51.246426 |
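
Because every record has a `publish_date`, simple summaries are possible before requesting any further metadata. As an illustration, the sketch below counts uploads per month using only the standard library. The four records are copied by hand from the playlist output above and trimmed to two fields.

```python
from collections import Counter
from datetime import datetime

# Records abbreviated from the get_videos_from_playlist_id output above.
videos = [
    {"publish_date": datetime(2018, 10, 8, 6, 30), "video_id": "FsZ3p9gOkpY"},
    {"publish_date": datetime(2018, 10, 1, 6, 30, 1), "video_id": "opi8X9hQ7q8"},
    {"publish_date": datetime(2018, 9, 24, 6, 30), "video_id": "OjPYmEZxACM"},
    {"publish_date": datetime(2018, 9, 10, 6, 30, 2), "video_id": "NpPyLcQ2vdI"},
]

# Bucket each video by the year and month of its publish date.
uploads_per_month = Counter(v["publish_date"].strftime("%Y-%m") for v in videos)
print(sorted(uploads_per_month.items()))
```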


From here we can request the full metadata for these videos by passing a list of video ids.

In [47]:
video_meta = yt.get_video_metadata(df.video_id.tolist()[:5])
video_meta[:2]

[OrderedDict([('video_id', 'FsZ3p9gOkpY'),
              ('channel_title', 'LastWeekTonight'),
              ('channel_id', 'UC3XTzVzaHQEd30rQbuvCtTQ'),
              ('video_publish_date', datetime.datetime(2018, 10, 8, 6, 30)),
              ('video_title',
               'Brazilian Elections: Last Week Tonight with John Oliver (HBO)'),
              ('video_description',
               'Brazil is about to elect a new president during a turbulent period of political corruption and economic uncertainty. John Oliver urges the people of Brazil not to figuratively fingerbang their democracy.\n\nConnect with Last Week Tonight online...\n \nSubscribe to the Last Week Tonight YouTube channel for more almost news as it almost happens: www.youtube.com/user/LastWeekTonight\n \nFind Last Week Tonight on Facebook like your mom would: http://Facebook.com/LastWeekTonight\n \nFollow us on Twitter for news about jokes and jokes about news: http://Twitter.com/LastWeekTonight\n \nVisit our official si