Skip to content

Automated the web scraping of Youtube videos from top channels

Notifications You must be signed in to change notification settings

j-hendricks/Webscraping-Youtube

Repository files navigation

Webscraping-Youtube

Overview

Using Beatiful Soup and Splinter, the titles of Youtube videos were scraped, along with the number of views and time of posting. The videos chosen were the most recent videos posted by the top 50 most subscribed YouTube channels. The goal was to determine the most frequently used words used by the most popular channels. A word cloud was constructed to visualize this distribution.

Images

The following image shows the table from the wikipedia website. This table contains the names of the top 50 subscribed channels along with a link to their Youtube homepage.

wiki_table

The two images below are examples of the webpages the program visits.

image1

image2

Results

Once the video titles were scraped, a wordcloud was constructed in Python. Here is an example created on September 14th, 2022. The most common words were "short" and "song", along with phrases such as "nursery rhymes".

wordcloud

Scatter plots and regression lines were plotted for views against 1) number of characters in title and 2) number of days since posted. Both linear regressions are statistically significant, but the effect sizes, measured by r-squared, are relatively small. The p-value for views vs number of characters in title is less than 0.05, but the r-squared value is 0.037. This confirms that vews and length of title is negatively correlated, but not by a large amount.

The p-value for views vs number of days is less than 0.05, but the r-squared values is 0.08. This low effect size suggests that most of the views are obtained in the first few days and then cease to increase afterwards.

views_chars_scatter

views_days_scatter

On average, Blackpink and T-Series videos were the most viral, when measured in views per day.

bar

Discussion

This data is scraped from the web in real-time, meaning a new word cloud can be generated to visualize the most current words populating YouTube video titles.

As with most webscraping projects, the program may fail in the future if either Wikipedia or YouTube change the html formats of their website. Nonetheless, the project was successful and provided insight into how Youtube channels market their content.

About

Automated the web scraping of Youtube videos from top channels

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published