Shake It Off

The critics don't know what the people want

We analyze Nielsen ratings and Rotten Tomatoes scores to find that critic and audience reviews don't have much to do with what people actually watch.

This is our Flatiron School (NYC Data Science) Module 1 project.

See the presentation and conclusions on Google Slides, or view the PDF (shake-it-off.pdf) in our repo.

The purpose of this project was to provide actionable insights to a hypothetical large company looking to enter the streaming wars (i.e., to compete with Amazon Prime Video, Netflix, Hulu, Disney+, Apple TV+, etc.).
A secondary purpose was to demonstrate and practice our newfound skills in web scraping, API usage, SQL, pandas, visualization (matplotlib/seaborn), and building ETL pipelines.
Data:
1. Nielsen ratings (national overnights, adults 18–49) on a daily basis for broadcast primetime and the top 25 cable shows. Scraped from TV By the Numbers. Available back to 2015.
2. Rotten Tomatoes audience and critic scores for matched TV shows (766 shows)
3. A list of Netflix and Amazon shows (via Wikipedia: Netflix, Amazon)
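
Matching show titles between the Nielsen data and Rotten Tomatoes could be sketched roughly as below. This is a minimal illustration using only the standard library, not necessarily how the notebooks in this repo do it: the function names, the normalization rules, and the 0.85 cutoff are all assumptions, and the titles are sample inputs only.

```python
import difflib
import re


def normalize_title(title):
    """Lowercase, drop a trailing parenthetical such as '(2005)', strip punctuation."""
    title = title.lower()
    title = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    title = re.sub(r"[^a-z0-9 ]+", "", title)
    return re.sub(r"\s+", " ", title).strip()


def match_titles(nielsen_titles, rt_titles, cutoff=0.85):
    """Map each Nielsen title to its closest Rotten Tomatoes title, or None.

    The cutoff is a hypothetical similarity threshold; it would need tuning
    against real data.
    """
    rt_by_norm = {normalize_title(t): t for t in rt_titles}
    matches = {}
    for title in nielsen_titles:
        close = difflib.get_close_matches(
            normalize_title(title), list(rt_by_norm), n=1, cutoff=cutoff
        )
        matches[title] = rt_by_norm[close[0]] if close else None
    return matches


# Sample inputs for illustration only.
nielsen = ["THE BIG BANG THEORY", "Grey's Anatomy", "Some Unlisted Show"]
rotten = ["The Big Bang Theory", "Grey's Anatomy (2005)", "Game of Thrones"]
matched = match_titles(nielsen, rotten)
```

Titles that come back as None are the ones that would need manual review or a looser cutoff.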
Tools (all in Python):
1. BeautifulSoup
2. pandas
3. SQLAlchemy
4. MySQL Server on AWS RDS
5. Seaborn/Matplotlib
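
As a rough sketch of how pandas and SQLAlchemy fit together in the load step, the example below writes a small DataFrame to a database and reads back an aggregate. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for MySQL on AWS RDS, and the table name, column names, and numbers are made up rather than taken from this project.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite replaces the project's MySQL server on RDS
# so the sketch runs without credentials.
engine = create_engine("sqlite:///:memory:")

# Made-up rows; these are not real Nielsen figures.
ratings = pd.DataFrame(
    {
        "show": ["Example Show A", "Example Show A", "Example Show B"],
        "air_date": ["2019-01-07", "2019-01-14", "2019-01-07"],
        "viewers_millions": [2.0, 4.0, 1.5],
    }
)


def load_and_summarize(frame, engine):
    """Write the frame to a table, then read back average viewers per show."""
    query = text(
        "SELECT show, AVG(viewers_millions) AS avg_viewers "
        "FROM nielsen_ratings GROUP BY show ORDER BY show"
    )
    # A single transaction keeps the write and the read on one connection.
    with engine.begin() as conn:
        frame.to_sql("nielsen_ratings", conn, index=False, if_exists="replace")
        return pd.read_sql(query, conn)


summary = load_and_summarize(ratings, engine)
```

Pointing the same code at MySQL would mainly mean changing the connection URL passed to create_engine and installing a MySQL driver.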

data-extraction.ipynb runs the extraction (start here).
tv_by_the_numbers.py scrapes TV By the Numbers. It also contains several scraping utilities.
tv-show-extra-finding.ipynb improves the matching of TV By the Numbers shows to Rotten Tomatoes.
nflix_amaz_shows.ipynb loads the Wikipedia data from a stored CSV.
rotten_tomatoes.py provides Rotten Tomatoes scraping.
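
For readers new to BeautifulSoup, scraping a ratings table might look something like the sketch below. It parses an inline HTML string rather than a live page; the markup, column names, and the parse_ratings_table helper are hypothetical and do not reflect the actual layout of TV By the Numbers or the code in tv_by_the_numbers.py.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values, used in place of a fetched page.
SAMPLE_HTML = """
<table>
  <tr><th>Show</th><th>Network</th><th>Rating</th></tr>
  <tr><td>Example Show A</td><td>NET1</td><td>1.2</td></tr>
  <tr><td>Example Show B</td><td>NET2</td><td>0.8</td></tr>
</table>
"""


def parse_ratings_table(html):
    """Return one dict per data row of the first table in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    header = [th.get_text(strip=True).lower() for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != len(header):
            continue  # skips the header row and any malformed rows
        row = dict(zip(header, cells))
        row["rating"] = float(row["rating"])
        rows.append(row)
    return rows


rows = parse_ratings_table(SAMPLE_HTML)
```

The resulting list of dicts can be passed straight to pandas.DataFrame for further cleaning.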
