Racing bar chart animation of artists with the most Billboard Hot 100 Entries using the pandas and data visualization libraries in Python 3.

joicodes/Top-100-Racing-Bar-Chart

Create a Race Bar in Python

Tutorial Prepared by: Joi Anderson (@joicodes) • View on Notion • Updated: Sept. 27, 2020


Step 1: Curate a Data File 📊


Finding an interesting data set:

A data set is a collection of data.

Data sets are created in many different ways. Some are based on human observations or surveys, like the U.S. Census. Others may be machine-generated, like satellite forecast data.

The most common format for data sets is a spreadsheet or CSV. Let's aim to find a dataset that is formatted as a CSV.

Here is a list of sources for interesting data sets to explore:

👉🏽 For this workshop, we will be using: Hot 100 singles (1/1/2000 to 12/28/2019)


Understanding Your Data:

Before analyzing the data set, let's take the time to understand the data we are working with:


readme_assets/Untitled.png

Observations:

  • About: New Hot 100 singles from January 1, 2000 to December 28, 2019
  • Data Source: Web scraped from Billboard.com
  • Size: 7,850 rows of data (i.e. 7,850 songs)
  • The first row of my data contains column names.
  • Columns:
    • Week - The week the song entered the Billboard Hot 100
    • EnterPosition - The position the song entered the Billboard Hot 100
    • Song - Name of the song
    • Performer - Name of the performer and features on the song.

Download Data

Export the Google Sheets file as a CSV and move it to our repository:

File > Download > Comma Separated Values (.csv, current sheet)

Rename the file hot100.csv and add it to your repository.

👀 Here is how your data looks as a raw CSV file: Preview




Step 2: Using Pandas 🐼


Meet Pandas (Python Data Analysis Library)

pandas is a Python library that gives you a set of tools for data analysis. If you want to work with big data sets, pandas is going to be your best friend. 👯‍♀️

readme_assets/Startimage.gif

Image from: Python Awesome


To install pandas, in your Terminal write:

pip3 install pandas

After it installs, we can import it into our main.py file:

import pandas as pd

Loading our data from CSV file


Now that we've imported pandas, we are ready to read the CSV file into Python using read_csv() from pandas:

data_frame = pd.read_csv("hot100.csv")

To see if it worked, we can see the first few rows of the data by adding the following to our code:

print( data_frame.head() ) 

head() gives us a snapshot of our data by displaying the first few rows of the data set (five by default).

You should see a table printed to the terminal like this:


readme_assets/Untitled%201.png

We can also see the last rows of the data by using tail()

print( data_frame.tail() )

readme_assets/Untitled%202.png
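Besides head() and tail(), a few other quick checks can confirm that the file loaded as expected. The sketch below uses three made-up rows (placeholder song and artist names, not real chart data) with the same column names as hot100.csv, read from an in-memory string so it runs without the file:

```python
import io
import pandas as pd

# Made-up sample rows that mimic the columns of hot100.csv
sample_csv = io.StringIO(
    "Week,EnterPosition,Song,Performer\n"
    "1/1/2000,45,Song A,Artist X\n"
    "1/8/2000,72,Song B,Artist Y\n"
    "1/8/2000,90,Song C,Artist X Featuring Artist Y\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)                    # (number of rows, number of columns)
print(list(sample.columns))            # column names taken from the first row
print(sample.dtypes)                   # data type pandas inferred for each column
print(sample["Performer"].nunique())   # how many distinct performers appear
```

With the real file, the same calls on data_frame should report the row count and the four columns listed in the observations above.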


Step 3: Install Bar Chart Race 🏁


Meet Bar Chart Race


bar_chart_race is an open source Python library for creating animated bar and line chart races. It is built on top of two popular Python plotting libraries: matplotlib and plotly. This library simplifies creating racing graph animations!


👉🏽 See repo


readme_assets/covid19_horiz.gif

To install bar_chart_race, in your Terminal write:

pip3 install bar_chart_race

After it installs, we can import it into our main.py file:

import bar_chart_race as bcr

Install Dependency

brew install ffmpeg

If you decide that you want to create a GIF animation, also install ImageMagick and Ghostscript:

brew install imagemagick
brew install ghostscript


Step 4: Prepare Data for Bar Chart 🔧


Transform data into 'wide' data


In order to create a racing bar chart, our data set must be in 'wide' form where:

  • Each row represents a single period of time
  • Each column holds the value for a particular category
  • The index contains the time component

To transform our data set into wide form, we need the following:

  • The index is the week, taken from the Week column
  • Each column is named after an artist who had a Hot 100 hit, taken from the Performer column
  • Each row holds the cumulative count of songs per artist up to that week

Here is a rough sketch of how it would look:


readme_assets/Untitled%203.png

We can transform our data into 'wide' form by creating a pivot table with pandas:

wide_data = data_frame.pivot_table(index='Week', columns='Performer', aggfunc='count', fill_value=0).cumsum()

Here is what wide_data.head() will print:

readme_assets/Untitled%204.png

If you want to see the full output, check it out here.
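To see what the pivot table and cumsum() are doing, here is a minimal sketch on a four-row toy table. The song and artist names are placeholders, and the dates are made up for illustration:

```python
import pandas as pd

# Toy data with the same columns as hot100.csv (placeholder values)
toy = pd.DataFrame({
    "Week": ["2000-01-01", "2000-01-01", "2000-01-08", "2000-01-15"],
    "EnterPosition": [10, 55, 30, 70],
    "Song": ["Song A", "Song B", "Song C", "Song D"],
    "Performer": ["Artist X", "Artist Y", "Artist X", "Artist Y"],
})

# Count the new songs per performer per week, then keep a running total
toy_wide = toy.pivot_table(
    index="Week", columns="Performer", aggfunc="count", fill_value=0
).cumsum()

print(toy_wide)
```

Because no values column is specified, pandas counts every remaining column (EnterPosition and Song), so each performer appears once under each of those two headers. Artist X's running total is 1, 2, 2 across the three weeks and Artist Y's is 1, 1, 2.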


Remove header


The pivot table generated an extra header level above the performer names that is not useful to us.

We can remove this header by using droplevel():

wide_data.columns = wide_data.columns.droplevel(0)
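As a small illustration of what droplevel(0) does, the sketch below builds a two-level column header by hand (placeholder artist names, made-up counts) and removes the top level:

```python
import pandas as pd

# Two-level header similar to what the pivot table produces
cols = pd.MultiIndex.from_product(
    [["EnterPosition", "Song"], ["Artist X", "Artist Y"]]
)
toy_wide = pd.DataFrame(
    [[1, 1, 1, 1], [2, 1, 2, 1]],
    index=["2000-01-01", "2000-01-08"],
    columns=cols,
)

# Drop the top level, keeping only the performer names
toy_wide.columns = toy_wide.columns.droplevel(0)

print(list(toy_wide.columns))
# ['Artist X', 'Artist Y', 'Artist X', 'Artist Y']
```

Each performer name now appears twice, once for each of the original top-level headers.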

Remove duplicate columns

If you look at the results, there are some duplicated columns. We can keep only the first copy of each column:

wide_data = wide_data.loc[:,~wide_data.columns.duplicated()] 
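The expression above can be read in two steps, sketched here on a toy table with placeholder artist names: columns.duplicated() marks every repeat of a column name as True, and ~ flips that mask so that .loc keeps only the first occurrence of each name.

```python
import pandas as pd

# Toy table where each performer column appears twice
toy_wide = pd.DataFrame(
    [[1, 1, 1, 1], [2, 1, 2, 1]],
    columns=["Artist X", "Artist Y", "Artist X", "Artist Y"],
)

# True for every column name that has already appeared to the left
mask = toy_wide.columns.duplicated()
print(mask)  # [False False  True  True]

# Keep the columns where the mask is False
deduped = toy_wide.loc[:, ~mask]
print(list(deduped.columns))  # ['Artist X', 'Artist Y']
```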

Create a subset

There are hundreds of artists with Billboard Hot 100 hits. Our graph would be wayyyy too big if we decided to make all artists race. Let's shorten our table to 5 columns to compare.

Rather than deleting the columns we are currently not using, we can create a subset with the columns we need with pandas:

Let's choose 5 Performers (i.e. 5 columns of data) to race and store them in a list:


columns = [ "Mariah Carey", "Michael Jackson", "Drake", "Rihanna", "Lady Gaga"]

Using that list of column names, we can create a sub-dataset by doing the following:

sub_dataset = wide_data[columns]

Let's print the first few rows of sub_dataset to see what data it contains:

print(sub_dataset.head())
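Selecting columns by name raises a KeyError if any name in the list is not an exact column name, and the Performer values include featured artists, so a solo name may not match every entry. A minimal sketch of checking the list first, using a toy table and placeholder names (the variable names here are illustrative):

```python
import pandas as pd

# Toy wide table with placeholder performer columns
toy_wide = pd.DataFrame(
    {"Artist X": [1, 2], "Artist Y": [1, 1]},
    index=["2000-01-01", "2000-01-08"],
)

wanted = ["Artist X", "Artist Z"]

# Names that are not column headers in the table
missing = [name for name in wanted if name not in toy_wide.columns]
print(missing)  # ['Artist Z']

# Select only the names that exist, so the lookup cannot fail
available = [name for name in wanted if name in toy_wide.columns]
toy_subset = toy_wide[available]
print(list(toy_subset.columns))  # ['Artist X']
```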

Now that we have our data ready... let the games begin!


Step 5: Create Your Animation 🏁


Create .mp4 with Racing Bar Chart Animation

bcr.bar_chart_race(sub_dataset, filename='hot100.mp4')

Check out your video

Once your program has finished, check your repo for hot100.mp4 and watch your 5 artists race!

Which artists did you choose? Were you surprised about who won?

Here is mine (watch in 5x speed):

https://youtu.be/mgFmybMTnXs

Check the docs for Bar Chart Race to customize your animation!
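As a starting point for customization, here is a hedged sketch that passes a few of the keyword arguments documented by bar_chart_race (n_bars, steps_per_period, period_length, title). The values and the render_race helper are illustrative assumptions, not settings used in this tutorial:

```python
# Keyword arguments passed to bar_chart_race; the values are illustrative
race_options = {
    "n_bars": 5,               # number of bars shown at once
    "steps_per_period": 10,    # animation frames per row of data
    "period_length": 500,      # milliseconds spent on each row
    "title": "Billboard Hot 100 Entries (2000-2019)",
}


def render_race(wide_subset, filename="hot100.mp4"):
    """Render the animation; needs bar_chart_race and ffmpeg installed."""
    # Imported here so the options above can be edited without the library
    import bar_chart_race as bcr

    bcr.bar_chart_race(wide_subset, filename=filename, **race_options)
```

Calling render_race(sub_dataset) would then write hot100.mp4 with these settings. A smaller period_length makes the race run faster.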
