<a href="https://colab.research.google.com/github/michalszczecinski/data-driven-notebooks/blob/master/subjects/tools/tools_get_list_of_videos_from_youtube_playlist.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extracting metadata for all videos in a YouTube playlist
A short notebook that builds a full list of the videos in a YouTube playlist. The output can be used, for example, as a checklist for tracking learning progress.

**Input**
* YouTube playlist URL

**Output**
* dataframe with one row per video: | video_id | title | description | length | url | length_minutes |

**Extensions**

The pytube package also supports downloading YouTube videos, so this notebook could be extended to download every video in the playlist.
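
A minimal sketch of that extension, assuming pytube's `streams.get_highest_resolution()` and `Stream.download(output_path=...)` behave as described in the pytube documentation. The `download_all` helper and the `downloads` directory name are hypothetical and not part of the original notebook, and downloading only works while pytube remains compatible with YouTube's current page format.

```python
from pathlib import Path


def download_all(videos, output_dir="downloads"):
    """Download each video's highest-resolution progressive stream.

    `videos` is any iterable of pytube video objects, e.g. Playlist(url).videos.
    Returns a list of (video_id, saved_path or None) so failures stay visible.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    results = []
    for video in videos:
        try:
            stream = video.streams.get_highest_resolution()
            saved_path = stream.download(output_path=output_dir)
        except Exception as e:
            # keep going: one broken video should not stop the whole playlist
            print(f"could not download {video.video_id}: {e}")
            saved_path = None
        results.append((video.video_id, saved_path))
    return results


# usage (needs network access and pytube installed):
# from pytube import Playlist
# download_all(Playlist(url).videos)
```

Passing the videos in as an argument keeps the helper independent of how the playlist object was created.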

In [5]:
# uncomment to install library for interacting with youtube
# https://pytube.io/en/latest/user/quickstart.html
# !pip install pytube

In [15]:
import numpy as np
import pandas as pd
from IPython.display import display, Markdown, HTML, Image
from pytube import Playlist

## Playlist details

In [45]:
# configuration
# set the URL of the playlist you want to list videos from

# 3Blue1Brown neural networks playlist
url = 'https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi'
p = Playlist(url)

In [46]:
# this section is optional; you can comment it out

# at the time of writing, pytube cannot retrieve the playlist description,
# probably because the metadata structure of the sidebar objects can change,
# so this code tries to read the sidebar data directly using the current schema
# and also extracts the URL of the playlist thumbnail

try:
    # get basic information about playlist
    title = p.title
    print(f'title: {title}')
    print('\n')

    # get additional information about playlist
    sidebar_info = p.sidebar_info[0]

    # display description
    description = sidebar_info['playlistSidebarPrimaryInfoRenderer']['description']['runs'][0]['text']
    print(f'description: {description}')

    # display image
    thumbnail = sidebar_info['playlistSidebarPrimaryInfoRenderer']['thumbnailRenderer']['playlistVideoThumbnailRenderer']['thumbnail']['thumbnails'][0]['url']
    image_url = thumbnail.split('jpg')[0] +'jpg'
    # an explicit display() is needed because this is not the last expression in the cell
    display(Image(url=image_url))

except Exception as e:
    print("An error occurred:", e)

title: Neural networks


An error occurred: 'runs'
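
The `'runs'` error above means the `description` object did not contain the key the code expected. A minimal sketch of a more defensive lookup follows. `dig` is a hypothetical helper, and the sample dictionary is made up for illustration; it is only shaped like the sidebar data and is not a real YouTube response.

```python
def dig(data, *path, default=None):
    """Follow a path of dict keys / list indexes, returning `default` on any miss."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# made-up structure, only shaped like the sidebar data used above
sample = {
    'playlistSidebarPrimaryInfoRenderer': {
        'title': {'runs': [{'text': 'Neural networks'}]},
        'description': {},  # no 'runs' key, as in the failing case
    }
}

root = 'playlistSidebarPrimaryInfoRenderer'
print(dig(sample, root, 'title', 'runs', 0, 'text'))
print(dig(sample, root, 'description', 'runs', 0, 'text', default='(no description)'))
```

With a lookup like this, a missing description would no longer prevent the thumbnail code after it from running.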


## Getting details of all videos in playlist

In [53]:
# download all urls for the playlist
urls = p.video_urls
total_videos = len(urls)

# download title, description and length
l = []
for i,video in enumerate(p.videos):
  # for description I just need the first line, that might differ for other playlists
  try:
    video_id, title, description, length = video.video_id, video.title, video.description.split('\n')[0], video.length
    print(f'extracted video {i+1} of {total_videos}, {video.video_id}')
  except Exception as e:
    # print the original error message
    print("An error occurred:", e)
    print(f'failed on video {i+1} of {total_videos}, {video.video_id}')
    video_id, title, description, length = video.video_id, None, None, 0
  l.append([video_id, title, description, length])
print('done')

extracted video 1 of 4, aircAruvnKk
extracted video 2 of 4, IHZwWFHWa-w
extracted video 3 of 4, Ilg3gGewQ5U
extracted video 4 of 4, tIeHLnjs5U8
done
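
In the loop above, one failing attribute discards every field for that video, because all four are read in a single statement. A sketch of reading each field independently follows. `safe_get` and `video_row` are hypothetical helpers, and the `FakeVideo` class only stands in for a pytube video object so that the example runs offline.

```python
def safe_get(getter, default=None):
    """Call `getter()` and return its result, or `default` if it raises."""
    try:
        return getter()
    except Exception as e:
        print("An error occurred:", e)
        return default


def video_row(video):
    """Build [video_id, title, description, length], keeping whatever could be read."""
    description = safe_get(lambda: video.description)
    # keep only the first line; description may be None or empty
    first_line = description.split('\n')[0] if description else None
    return [
        video.video_id,
        safe_get(lambda: video.title),
        first_line,
        safe_get(lambda: video.length, default=0),
    ]


class FakeVideo:
    """Stand-in for a pytube video whose title lookup fails."""
    video_id = 'abc123'
    description = 'First line\nSecond line'
    length = 90

    @property
    def title(self):
        raise KeyError('title')


print(video_row(FakeVideo()))
```

Here the failed title becomes `None` while the description and length are still kept.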


In [54]:
# creating dataframe
cols = ['video_id','title', 'description', 'length']
df = pd.DataFrame(data = l, columns = cols)
df['url'] = urls
# round up to minutes 
df['length_minutes'] = np.ceil(df['length']/60).astype(int)
df

| | video_id | title | description | length | url | length_minutes |
|---|---|---|---|---|---|---|
| 0 | aircAruvnKk | But what is a neural network? \| Chapter 1, Dee... | What are the neurons, why are there layers, an... | 1153 | https://www.youtube.com/watch?v=aircAruvnKk | 20 |
| 1 | IHZwWFHWa-w | Gradient descent, how neural networks learn \| ... | Enjoy these videos? Consider sharing one or two. | 1261 | https://www.youtube.com/watch?v=IHZwWFHWa-w | 22 |
| 2 | Ilg3gGewQ5U | What is backpropagation really doing? \| Chapte... | What's actually happening to a neural network ... | 834 | https://www.youtube.com/watch?v=Ilg3gGewQ5U | 14 |
| 3 | tIeHLnjs5U8 | Backpropagation calculus \| Chapter 4, Deep lea... | Help fund future projects: https://www.patreon... | 617 | https://www.youtube.com/watch?v=tIeHLnjs5U8 | 11 |


## Get high-level stats of the playlist

In [56]:
# get playlist length including number of videos and duration stats
df['length_minutes'].agg(['count','sum','median','mean'])

count      4.00
sum       67.00
median    17.00
mean      16.75
Name: length_minutes, dtype: float64
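
Because each video is rounded up to whole minutes before summing, the `sum` of 67 minutes slightly overstates the real running time. A small sketch that totals the raw `length` values (in seconds) from the dataframe output and formats them; `format_duration` is a hypothetical helper, not part of the original notebook.

```python
def format_duration(total_seconds):
    """Format a number of seconds as H:MM:SS (or M:SS when under one hour)."""
    minutes, seconds = divmod(int(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f'{hours}:{minutes:02d}:{seconds:02d}'
    return f'{minutes}:{seconds:02d}'


# lengths in seconds, copied from the dataframe output above
lengths = [1153, 1261, 834, 617]

for length in lengths:
    print(format_duration(length))      # 19:13, 21:01, 13:54, 10:17
print(format_duration(sum(lengths)))    # 1:04:25
```

In the notebook itself, `format_duration(df['length'].sum())` would give the same total without retyping the values.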

## Export formatted links

In [63]:
# output the list of titles as hyperlinks for pasting into a checklist (e.g. a Google Doc)
hyperlink_format = '<a href="{link}">{text}</a>'

titles = df['title'].values
links = df['url'].values

for i, (video_title, link) in enumerate(zip(titles, links), start=1):
  display(HTML(hyperlink_format.format(link=link, text=f'{i}:{video_title}')))
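
If the checklist lives in a Markdown file rather than a Google Doc, the same titles and links can be written as task-list items. A sketch under that assumption follows. `to_markdown_checklist` is a hypothetical helper, the sample titles are made up, and the two sample URLs are taken from the playlist above.

```python
def to_markdown_checklist(titles, links):
    """Return one '- [ ] [n: title](url)' line per video, numbered from 1."""
    lines = []
    for i, (title, link) in enumerate(zip(titles, links), start=1):
        # escape square brackets so they cannot break the link text
        safe_title = str(title).replace('[', r'\[').replace(']', r'\]')
        lines.append(f'- [ ] [{i}: {safe_title}]({link})')
    return '\n'.join(lines)


# made-up titles for illustration
sample_titles = ['First video', 'Second [bonus] video']
sample_links = [
    'https://www.youtube.com/watch?v=aircAruvnKk',
    'https://www.youtube.com/watch?v=IHZwWFHWa-w',
]
print(to_markdown_checklist(sample_titles, sample_links))
```

In the notebook, `print(to_markdown_checklist(df['title'], df['url']))` would produce the checklist for the whole playlist.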

## References

1) https://pytube.io/en/latest/api.html#pytube.YouTube

2) https://stackoverflow.com/questions/54710982/using-pytube-to-download-playlist-from-youtube