## Bar Chart Race - Premier League

Download the dataset and load the libraries

[https://www.kaggle.com/lynuhs/premier-league-19922017/version/1?select=premierLeague_tables_1992-2017.csv]

In [1]:
import pandas as pd
import numpy as np

Because we’re using a custom dataset, we first need to preprocess it. According to the official documentation of the bar-chart-race library, the data must be in a specific format before you can generate a bar chart race animation.

To generate a bar chart race, convert your data into wide format, where:

- Each row represents a single period of time.
- Each column holds the values of one category that we want to visualize.
- The time component is used as the index.
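
As a minimal sketch of what such a wide-format table looks like, the snippet below builds one by hand. The club names and point values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: three seasons (rows) and two made-up clubs (columns).
wide = pd.DataFrame(
    {"Club A": [50, 61, 47], "Club B": [72, 0, 66]},
    index=pd.Index(["1992-93", "1993-94", "1994-95"], name="season"),
)

# One row per time period, one column per category, time as the index.
print(wide)
```

The rest of the preprocessing brings the Premier League data into this same shape.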

In [2]:
premierLeague = pd.read_csv("premierLeague_tables_1992-2017.csv")

In [3]:
premierLeague.head()

Unnamed: 0,season,team,points,w,d,l,gf,ga,gd,pld,...,d_h,d_a,l_h,l_a,gf_h,gf_a,ga_h,ga_a,gd_h,gd_a
0,2017-18,Manchester City,100,32,4,2,106,27,79,38,...,2,2,1,1,61,45,14,13,47,32
1,2017-18,Manchester United,81,25,6,7,68,28,40,38,...,2,4,2,5,38,30,9,19,29,11
2,2017-18,Tottenham Hotspur,77,23,8,7,74,36,38,38,...,4,4,2,5,40,34,16,20,24,14
3,2017-18,Liverpool,75,21,12,5,84,38,46,38,...,7,5,0,5,45,39,10,28,35,11
4,2017-18,Chelsea,70,21,7,10,62,38,24,38,...,4,3,4,6,30,32,16,22,14,10


In [4]:
premierLeague.shape

(526, 26)

Remove unnecessary features

In [7]:
premierLeague = premierLeague[['season', 'team', 'points']]
premierLeague.head()

Unnamed: 0,season,team,points
0,2017-18,Manchester City,100
1,2017-18,Manchester United,81
2,2017-18,Tottenham Hotspur,77
3,2017-18,Liverpool,75
4,2017-18,Chelsea,70


To transform our data into the wide format that bar-chart-race expects, we can use the pivot_table method from Pandas. The relevant arguments are:

- data: the dataframe that we want to transform. Here we call pivot_table directly on the dataframe, so it is passed implicitly.
- index: the feature that we want to use as the index. In our case, it should be the season.
- columns: the feature whose unique values become the columns. In our case, this should be the team name.
- values: the value that fills each column in each row. In our case, it should be the number of points.

In [6]:
df = premierLeague.pivot_table(values='points', index=['season'], columns='team')
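
To see what the pivot does on a small scale, here is a sketch on a made-up long-format table (the clubs and points are hypothetical). Note that pivot_table aggregates with the mean by default. Since each season and team pair appears only once, the mean is just the original value.

```python
import pandas as pd

# Hypothetical long-format data: one row per (season, team) pair.
long_df = pd.DataFrame({
    "season": ["1992-93", "1992-93", "1993-94"],
    "team": ["Club A", "Club B", "Club A"],
    "points": [50, 72, 61],
})

# Seasons become the index, teams become the columns, points fill the cells.
wide_df = long_df.pivot_table(values="points", index="season", columns="team")

# Club B has no row for 1993-94, so that cell is NaN.
print(wide_df)
```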

Replace NaN values

In [8]:
# Clubs that did not play in a season show up as NaN after the pivot.
df = df.fillna(0)

# Keep the seasons in chronological order.
df = df.sort_index()

In [9]:
df.head()

season,Arsenal,Aston Villa,Barnsley,Birmingham City,Blackburn Rovers,Blackpool,Bolton Wanderers,Bournemouth,Bradford City,Brighton and Hove Albion,...,Sunderland,Swansea City,Swindon Town,Tottenham Hotspur,Watford,West Bromwich Albion,West Ham United,Wigan Athletic,Wimbledon FC,Wolverhampton Wanderers
1992-93,56.0,74.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,59.0,0.0,0.0,0.0,0.0,54.0,0.0
1993-94,71.0,57.0,0.0,0.0,84.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,30.0,45.0,0.0,0.0,52.0,0.0,65.0,0.0
1994-95,51.0,48.0,0.0,0.0,89.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,62.0,0.0,0.0,50.0,0.0,56.0,0.0
1995-96,63.0,63.0,0.0,0.0,61.0,0.0,29.0,0.0,0.0,0.0,...,0.0,0.0,0.0,61.0,0.0,0.0,51.0,0.0,41.0,0.0
1996-97,68.0,61.0,0.0,0.0,42.0,0.0,0.0,0.0,0.0,0.0,...,40.0,0.0,0.0,46.0,0.0,0.0,42.0,0.0,56.0,0.0


By replacing the NaN values with 0, we can interpret the data properly: a club with 0 points in a given season wasn’t competing in the Premier League that season.

Now we need to accumulate each club’s points over time, so that every row holds the running total up to that season.

To do this, we can use the cumsum method from Pandas.
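
A minimal sketch on made-up numbers shows how filling the gaps and accumulating work together. The clubs and values are hypothetical. A club that misses a season simply keeps its previous total.

```python
import numpy as np
import pandas as pd

# Hypothetical per-season points; Club B did not play in 1993-94.
season_points = pd.DataFrame(
    {"Club A": [50.0, 61.0, 47.0], "Club B": [72.0, np.nan, 66.0]},
    index=pd.Index(["1992-93", "1993-94", "1994-95"], name="season"),
)

# Fill the gaps with 0, order the seasons, then sum down each column.
running_total = season_points.fillna(0).sort_index().cumsum()
print(running_total)
```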

In [10]:
# Accumulate every club's points season by season.
# (Slicing with 0:-1 would leave the last column out of the running total.)
df = df.cumsum()

In [11]:
df.head()

season,Arsenal,Aston Villa,Barnsley,Birmingham City,Blackburn Rovers,Blackpool,Bolton Wanderers,Bournemouth,Bradford City,Brighton and Hove Albion,...,Sunderland,Swansea City,Swindon Town,Tottenham Hotspur,Watford,West Bromwich Albion,West Ham United,Wigan Athletic,Wimbledon FC,Wolverhampton Wanderers
1992-93,56.0,74.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,59.0,0.0,0.0,0.0,0.0,54.0,0.0
1993-94,127.0,131.0,0.0,0.0,155.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,30.0,104.0,0.0,0.0,52.0,0.0,119.0,0.0
1994-95,178.0,179.0,0.0,0.0,244.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,30.0,166.0,0.0,0.0,102.0,0.0,175.0,0.0
1995-96,241.0,242.0,0.0,0.0,305.0,0.0,29.0,0.0,0.0,0.0,...,0.0,0.0,30.0,227.0,0.0,0.0,153.0,0.0,216.0,0.0
1996-97,309.0,303.0,0.0,0.0,347.0,0.0,29.0,0.0,0.0,0.0,...,40.0,0.0,30.0,273.0,0.0,0.0,195.0,0.0,272.0,0.0


At this point we could already generate a bar chart race. However, we have a total of 49 clubs to visualize, which would make the chart overcrowded and hard to read. To reduce the clutter, we will only visualize the top 10 clubs at each point in time.

For this reason, there is no point in keeping clubs that never make the top 10 in any season, so it’s better to remove them from the dataset completely. The code below does the job.

In [12]:
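top10 = set()

# Collect every club that is in the top 10 by cumulative points in at least one season.
for index, row in df.iterrows():
    top10 |= set(row[row > 0].sort_values(ascending=False).head(10).index)

# Select with a list: newer pandas versions reject a set as an indexer.
df = df[list(top10)]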
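
The same selection can also be sketched without an explicit loop by ranking the clubs within each row. The helper name clubs_ever_in_top_n and the toy data are made up for illustration. The sketch assumes non-negative totals, and it may break ties differently from the loop version.

```python
import pandas as pd

def clubs_ever_in_top_n(totals: pd.DataFrame, n: int) -> list:
    """Return the columns that rank in the top n of at least one row."""
    # Rank clubs within each season (row); rank 1 is the highest total.
    ranks = totals.rank(axis=1, ascending=False, method="first")
    # A club qualifies in a season if it is in the top n and has any points.
    in_top_n = (ranks <= n) & (totals > 0)
    # Keep clubs that qualify at least once, in their original column order.
    return list(totals.columns[in_top_n.any(axis=0)])

# Hypothetical cumulative totals for four made-up clubs over two seasons.
toy = pd.DataFrame(
    {"A": [10, 20], "B": [5, 30], "C": [0, 25], "D": [1, 2]},
    index=pd.Index(["1992-93", "1993-94"], name="season"),
)
kept = clubs_ever_in_top_n(toy, n=2)
print(kept)
```

On the real data, the equivalent call would be df[clubs_ever_in_top_n(df, 10)], which also keeps the columns in a stable order.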

In [13]:
df.head()

season,Sheffield Wednesday,Chelsea,Manchester City,Blackburn Rovers,Norwich City,West Ham United,Tottenham Hotspur,Leeds United,Newcastle United,Manchester United,Arsenal,Everton,Wimbledon FC,Queens Park Rangers,Aston Villa,Liverpool
1992-93,59.0,56.0,57.0,71.0,72.0,0.0,59.0,51.0,0.0,84.0,56.0,53.0,54.0,63.0,74.0,59.0
1993-94,123.0,107.0,102.0,155.0,125.0,52.0,104.0,121.0,77.0,176.0,127.0,97.0,119.0,123.0,131.0,119.0
1994-95,174.0,161.0,151.0,244.0,168.0,102.0,166.0,194.0,149.0,264.0,178.0,147.0,175.0,183.0,179.0,193.0
1995-96,214.0,211.0,189.0,305.0,168.0,153.0,227.0,237.0,227.0,346.0,241.0,208.0,216.0,216.0,242.0,264.0
1996-97,271.0,270.0,189.0,347.0,168.0,195.0,273.0,283.0,295.0,421.0,309.0,250.0,272.0,216.0,303.0,332.0


In [14]:
df.shape

(26, 16)

Let's generate the bar chart race!

In [15]:
import bar_chart_race as bcr

In [None]:
bcr.bar_chart_race(df=df, n_bars=10, sort='desc', title="Premier League 1992-2017", filename='PL_clubs.mp4')

- df: the dataframe to visualize.
- n_bars: the number of bars displayed in the visualization.
- sort: the sort order of the bars, either 'asc' (ascending) or 'desc' (descending).
- title: the title of the bar chart race visualization.
- filename: the name of the output file, used here to save the animation as an mp4 file.

Note that saving the bar chart animation as an mp4 file requires ffmpeg to be installed on your computer. You can download it from [this site](https://ffmpeg.org/). After installing it, add the ffmpeg executable to your PATH.
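
As a quick sanity check before rendering, the sketch below uses the standard library to see whether an ffmpeg executable can be found on the PATH. It only checks that the executable is discoverable, not that the installation works.

```python
import shutil

# shutil.which returns the full path of the executable, or None if it is not found.
ffmpeg_path = shutil.which("ffmpeg")

if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; saving to mp4 is likely to fail.")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```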