jim-schwoebel/youtube_scrape

youtube_scrape

This is a library for building playlists and scraping YouTube videos.

Enter a YouTube playlist name and URL, and the scripts download the playlist to a folder. All videos are converted to .mp4 format for further processing.

These scripts are useful for human-in-the-loop labeling of YouTube videos (angry, happy, sad, etc.) so that the videos can be further processed by machine learning models.

Install dependencies

On a Linux computer:

sudo apt-get install youtube-dl

Making playlists

To begin making a playlist:

cd ~
cd youtube_scrape 
python3 make_playlist.py
what is the playlist id or URL?
...

To stop building your playlist and have it written to JSON, enter nothing ('') or 'n' at the prompt. The make_playlist.py script then makes one playlist from all the playlist IDs or URLs that you have entered.
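The prompt loop described above can be sketched as follows. This is a minimal illustration, not the code in make_playlist.py: the helper names collect_entries and write_playlist, the JSON layout (a "name" and an "entries" key), and the output folder are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Answers that end the session, as described in the README.
STOP_WORDS = {"", "n"}


def collect_entries(responses: Iterable[str]) -> List[str]:
    """Gather playlist IDs or URLs until an empty answer or 'n' is seen."""
    entries = []
    for response in responses:
        answer = response.strip()
        if answer in STOP_WORDS:
            break
        entries.append(answer)
    return entries


def write_playlist(name: str, entries: List[str], folder: Path) -> Path:
    """Write the collected entries to <folder>/<name>.json (assumed layout)."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.json"
    path.write_text(json.dumps({"name": name, "entries": entries}, indent=2))
    return path
```

The responses argument can be any iterable of strings, so an interactive session could feed it a generator that repeatedly yields the result of input("what is the playlist id or URL? ").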

What is the playlist ID?

Playlist IDs are available on YouTube as the value of the list parameter in the URL. For example, in https://www.youtube.com/watch?v=xPU8OAjjS4k&list=PLpoUYdDxb6P56t8lnxnA412k_H5EMHd-8 the playlist ID is PLpoUYdDxb6P56t8lnxnA412k_H5EMHd-8.

You do not have to enter the playlist ID for this script to work; you can also enter the full playlist URL (e.g. https://www.youtube.com/playlist?list=PL1v-PVIZFDsqbzPIsEPZPnvcgIQ8bNTKS).
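As an illustration of how a bare ID and a full URL can both be accepted, here is a small sketch using only the Python standard library. The function name extract_playlist_id is hypothetical, and the rule "anything without a URL scheme is a bare ID" is an assumption; the script itself may handle input differently.

```python
from urllib.parse import parse_qs, urlparse


def extract_playlist_id(text: str) -> str:
    """Return the playlist ID from a bare ID or a YouTube URL (hypothetical helper)."""
    text = text.strip()
    # Assumption: input without a URL scheme is already a playlist ID.
    if "://" not in text:
        return text
    # The playlist ID is the value of the 'list' query parameter.
    query = parse_qs(urlparse(text).query)
    ids = query.get("list")
    if not ids:
        raise ValueError(f"no 'list' parameter in URL: {text}")
    return ids[0]


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=xPU8OAjjS4k&list=PLpoUYdDxb6P56t8lnxnA412k_H5EMHd-8"
    print(extract_playlist_id(url))
```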

Only the first 100 videos in each playlist are added to the master playlist. Duplicate video links across similar playlists (e.g. CNN videos) are not a problem; the script makes sure no duplicate links go into the playlist.
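The two rules above (a cap of 100 videos per playlist and no duplicate links) can be sketched like this. The function name merge_playlists is hypothetical, and applying the cap before deduplication is an assumption; the README does not say in which order the script applies them.

```python
from typing import Iterable, List

MAX_PER_PLAYLIST = 100  # per-playlist limit stated in the README


def merge_playlists(playlists: Iterable[List[str]]) -> List[str]:
    """Merge video links into one list, capped per playlist and deduplicated."""
    seen = set()
    merged = []
    for links in playlists:
        # Assumption: the cap is applied to each source playlist first.
        for link in links[:MAX_PER_PLAYLIST]:
            if link not in seen:
                seen.add(link)
                merged.append(link)  # keeps first-seen order
    return merged
```

Tracking seen links in a set keeps the membership check cheap, while the separate list preserves the order in which videos were first encountered.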

Downloading generated playlists

Once you have made a playlist, you can download it with:

cd ~ 
cd youtube_scrape
python3 download_playlist.py 

You can then specify which playlist in the /playlists folder you'd like to download, using either its name or its .json file name (e.g. yc_podcast or yc_podcast.json will both work).
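Accepting either form of the playlist name comes down to normalizing the answer before looking for the file. The sketch below assumes playlists are stored as lowercase .json files in a playlists folder relative to the working directory; resolve_playlist_path is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def resolve_playlist_path(answer: str, folder: Path = Path("playlists")) -> Path:
    """Map 'yc_podcast' or 'yc_podcast.json' to the same playlist file."""
    name = answer.strip()
    # Drop a trailing .json (any letter case) so both forms are equivalent.
    if name.lower().endswith(".json"):
        name = name[: -len(".json")]
    return folder / f"{name}.json"
```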

The script will then download the playlist and format it according to the style needed to train machine learning models.
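The README does not show how download_playlist.py calls youtube-dl, so the following is only a sketch of one way to fetch a video and convert it to .mp4 from Python. The function names, the output file naming by video ID, and the choice of the --recode-video option (which needs ffmpeg or avconv installed) are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_download_command(url: str, folder: Path) -> List[str]:
    """Build a youtube-dl command that saves one video as .mp4 in folder."""
    return [
        "youtube-dl",
        "--recode-video", "mp4",  # convert to mp4 after download if needed
        "-o", str(folder / "%(id)s.%(ext)s"),  # assumed naming: <video id>.<ext>
        url,
    ]


def download_video(url: str, folder: Path) -> None:
    """Run the command; raises if youtube-dl is missing or the download fails."""
    if shutil.which("youtube-dl") is None:
        raise RuntimeError("youtube-dl is not installed or not on PATH")
    folder.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(url, folder), check=True)
```

Keeping the command construction separate from the call to subprocess.run makes it possible to inspect or log the exact command without touching the network.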

Feedback

Any feedback on this repository is greatly appreciated.

  • If you find something that is missing or doesn't work, please consider opening a GitHub issue.
  • If you want to talk to me directly, please email me at js@neurolex.co.

License

This repository is licensed under the Apache 2.0 License.
