Webscraping-Youtube

Overview

Using Beatiful Soup and Splinter, the titles of Youtube videos were scraped, along with the number of views and time of posting. The videos chosen were the most recent videos posted by the top 50 most subscribed YouTube channels. The goal was to determine the most frequently used words used by the most popular channels. A word cloud was constructed to visualize this distribution.

Images

The following image shows the table from the wikipedia website. This table contains the names of the top 50 subscribed channels along with a link to their Youtube homepage.

The two images below are examples of the webpages the program visits.

Results

Once the video titles were scraped, a wordcloud was constructed in Python. Here is an example created on September 14th, 2022. The most common words were "short" and "song", along with phrases such as "nursery rhymes".

Scatter plots and regression lines were plotted for views against 1) number of characters in title and 2) number of days since posted. Both linear regressions are statistically significant, but the effect sizes, measured by r-squared, are relatively small. The p-value for views vs number of characters in title is less than 0.05, but the r-squared value is 0.037. This confirms that vews and length of title is negatively correlated, but not by a large amount.

The p-value for views vs number of days is less than 0.05, but the r-squared values is 0.08. This low effect size suggests that most of the views are obtained in the first few days and then cease to increase afterwards.

On average, Blackpink and T-Series videos were the most viral, when measured in views per day.

Discussion

This data is scraped from the web in real-time, meaning a new word cloud can be generated to visualize the most current words populating YouTube video titles.

As with most webscraping projects, the program may fail in the future if either Wikipedia or YouTube change the html formats of their website. Nonetheless, the project was successful and provided insight into how Youtube channels market their content.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.ipynb_checkpoints		.ipynb_checkpoints
images		images
tutorial		tutorial
README.md		README.md
data_viz.ipynb		data_viz.ipynb
video_info_2022_8_30.csv		video_info_2022_8_30.csv
video_info_2022_9_14.csv		video_info_2022_9_14.csv
video_info_2022_9_3.csv		video_info_2022_9_3.csv
web_scraping.ipynb		web_scraping.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Webscraping-Youtube

Overview

Images

Results

Discussion

About

Releases

Packages

Contributors 2

Languages

j-hendricks/Webscraping-Youtube

Folders and files

Latest commit

History

Repository files navigation

Webscraping-Youtube

Overview

Images

Results

Discussion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages