mspbannister/dand-p4-billboard

dand-p4-billboard

Udacity DAND Project 4 – Explore and Summarise Data

This is my submission for Project 4 ('Explore and Summarise Data') on Udacity's Data Analyst Nanodegree. The project brief asked the student to conduct exploratory data analysis and create an R Markdown (RMD) file that explores the variables, structure, patterns, oddities, and underlying relationships of a data set of the student's choice. I chose to examine the relationships between musical and non-musical variables for every Billboard Hot 100 number one single.

Update: a trend I highlighted in this report formed the basis of a Newsweek article published on 13 April 2017.

The intended project outcomes were to demonstrate the student's ability to:

  • Understand the distribution of a variable and check for anomalies and outliers
  • Learn how to quantify and visualise individual variables within a data set by using appropriate plots such as scatter plots, histograms, bar charts, and box plots
  • Explore variables to identify the most important variables and relationships within a data set before building predictive models; calculate correlations, and investigate conditional means
  • Learn powerful methods and visualisations for examining relationships among multiple variables, such as reshaping data frames and using aesthetics like colour and shape to uncover more information
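
As a rough illustration of two of the techniques named above (correlations and conditional means), here is a minimal Python sketch. The project's own analysis lives in the RMD file, so this is not the submitted code. The column choices (decade, tempo, weeks at number one) and every value in `rows` are invented purely for illustration, and `pearson` and `conditional_means` are hypothetical helper names.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Invented toy rows: (decade, tempo in BPM, weeks at number one).
# These are NOT real chart data; they only give the functions something to run on.
rows = [
    ("1960s", 120.0, 2),
    ("1960s", 100.0, 4),
    ("1980s", 110.0, 3),
    ("1980s", 130.0, 1),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def conditional_means(rows):
    """Mean tempo conditional on decade, i.e. the average tempo within each group."""
    groups = defaultdict(list)
    for decade, tempo, _weeks in rows:
        groups[decade].append(tempo)
    return {decade: mean(tempos) for decade, tempos in groups.items()}


tempos = [tempo for _decade, tempo, _weeks in rows]
weeks = [w for _decade, _tempo, w in rows]
print(conditional_means(rows))
print(pearson(tempos, weeks))
```

A conditional mean answers "what is the average of one variable within each level of another?", while the correlation summarises how two numeric variables move together across the whole data set.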

List of files:

  • 'Billboard_analysis__100417_.md': Markdown output of project submission file
  • 'Billboard_analysis__100417__files': Plot images for use with 'Billboard_analysis__100417_.md'
  • 'Billboard_analysis__100417_.html': HTML output of project submission file
  • 'Billboard data description.txt': description of the underlying data set and variables
  • 'Spotify_API.py': Python script for collecting track information from Spotify API
  • 'Wikipedia_scraping.py': Python script for scraping Billboard Hot 100 data from Wikipedia
  • 'Matching_datasets.py': Python script for compiling Spotify and Wikipedia data sets
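
To give a sense of what compiling the two data sets involves, the sketch below joins chart rows to track rows on a normalised (title, artist) key. It is not the code in 'Matching_datasets.py': the field names (`title`, `artist`), the helper names `normalise` and `match_records`, and the matching rule itself are all assumptions made for illustration.

```python
import re


def normalise(text):
    """Case-fold, drop anything other than a-z, 0-9 and spaces, collapse whitespace.

    Assumption: titles and artists are mostly ASCII; accented letters would be
    dropped by this simple rule and would need extra handling.
    """
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.casefold())
    return " ".join(cleaned.split())


def match_records(chart_rows, track_rows):
    """Join chart rows to track rows on a normalised (title, artist) key.

    Returns (matched, unmatched): merged dicts for rows that found a track,
    and the chart rows that did not.
    """
    index = {
        (normalise(t["title"]), normalise(t["artist"])): t for t in track_rows
    }
    matched, unmatched = [], []
    for row in chart_rows:
        key = (normalise(row["title"]), normalise(row["artist"]))
        track = index.get(key)
        if track is None:
            unmatched.append(row)
        else:
            # Chart fields are applied last, so the chart's spelling of the
            # title and artist is kept in the merged record.
            matched.append({**track, **row})
    return matched, unmatched
```

Returning the unmatched rows separately makes it easy to review the singles that need manual matching, for example where the two sources spell a title or credit an artist differently.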