Scraping YouTube comments with Selenium

This repository holds a basic script to scrape comments from a YouTube video page.

Selenium is a convenient way to simulate an end user and thereby get access to dynamically generated web content that is difficult to scrape with tools that rely on static pages (e.g. BeautifulSoup).
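As a rough illustration of simulating an end user, the sketch below keeps scrolling until the page height stops growing, so that dynamically loaded content has a chance to appear. It is not taken from the repository's scripts: the function name `scroll_to_bottom` and the parameters `pause_seconds` and `max_rounds` are hypothetical, and the only thing assumed about `driver` is that it offers Selenium's `execute_script` method.

```python
import time


def scroll_to_bottom(driver, pause_seconds=2.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls.

    `driver` is expected to be a Selenium WebDriver (any object exposing
    `execute_script` works). `pause_seconds` and `max_rounds` are
    illustrative parameters, not taken from the repository's scripts.
    """
    get_height = "return document.documentElement.scrollHeight"
    last_height = driver.execute_script(get_height)
    scrolls = 0
    for _ in range(max_rounds):
        # Jump to the current bottom, which may trigger loading of more content.
        driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
        scrolls += 1
        time.sleep(pause_seconds)  # give the page time to load new content
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so we treat this as the bottom
        last_height = new_height
    return scrolls
```

With a real browser session this would be called after `driver.get(url)`. The fixed pause is a heuristic: on a slow connection the height may not have changed yet when it is checked, so the loop can stop early.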

Running the script saves all comment text to a JSON file for further processing.
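A minimal sketch of the saving step, assuming the scraped comments are already available as a list of strings. The helper names `save_comments` and `load_comments` are hypothetical and may not match what the repository's script does internally.

```python
import json


def save_comments(comments, path):
    """Write a list of comment strings to `path` as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps emoji and non-Latin text readable in the file
        json.dump(comments, f, ensure_ascii=False, indent=2)


def load_comments(path):
    """Read the JSON array back into a Python list for further processing."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Opening the file in `"w"` mode overwrites any earlier output, which matches the behavior the task list aims to change.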

TASKS

find the title of the video (to later map comments to it in a JSON structure)
replace punctuation characters and spaces in the title with underscores (for the file name)
combine both scrolling approaches so the script always scrolls to the bottom without an arbitrary magic number being passed to the code
grab the complete comment element via xpath('//*[@id="body"]')
- it contains the text plus author, likes, etc.
add the capability to run the script for multiple URLs
transform the output into a JSON dict that maps file names to comment lists
append to the output file instead of overwriting it, in order to build a YouTube comment corpus
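Two of the tasks above, turning a title into a file-friendly name and appending to a JSON dict instead of overwriting it, could be sketched as follows. This is only one possible approach under stated assumptions: `title_to_key` and `append_to_corpus` are hypothetical names, and the output file is assumed to hold a JSON object (dict) rather than a bare list.

```python
import json
import re
import string
from pathlib import Path


def title_to_key(title):
    """Replace each run of ASCII punctuation and whitespace with one underscore."""
    pattern = "[" + re.escape(string.punctuation) + r"\s]+"
    return re.sub(pattern, "_", title).strip("_")


def append_to_corpus(path, key, comments):
    """Merge `comments` into the JSON dict stored at `path` under `key`.

    Assumes the file, if it exists, contains a JSON object that maps
    names to lists of comments.
    """
    corpus_file = Path(path)
    corpus = {}
    if corpus_file.exists():
        corpus = json.loads(corpus_file.read_text(encoding="utf-8"))
    # Extend an existing entry instead of overwriting it.
    corpus.setdefault(key, []).extend(comments)
    corpus_file.write_text(
        json.dumps(corpus, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return corpus
```

Running the scraper for several URLs would then amount to calling `append_to_corpus` once per video. Only ASCII punctuation is replaced here, so typographic quotes or similar characters in a title would remain in the key.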

NOTES

posting to YouTube is still an unsolved challenge
posting to Twitter works using Selenium, so there's no absolute need for a Twitter API key

FILES

README.md
chromedriver
collect_comments.py
post_comments.py
requirements.txt
simple_comment.py
yt_comments.json

martin-martin/scrape-youtube-comments

Folders and files

Latest commit

History

Repository files navigation

Scraping Youtube comments with Selenium

TASKS

NOTES

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages