# Home Run Bar Chart Race

In this visualization we use a Python package, bar_chart_race, that creates bar chart races. A bar chart race is a highly visual way to display data changing over time as an animated bar chart: each frame shows the ranking at one point in time, and the bars grow and reorder as the values change. It suits sports statistics well because it makes changes in a leaderboard over a season easy to follow.
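To make the idea concrete, here is a minimal sketch of the "wide" table a bar chart race animates, using invented numbers for three hypothetical players: one row per period, one column per category, and values that are already running totals. Each frame of the race is that row sorted from largest to smallest. The sketch computes those orderings with plain pandas and does not call bar_chart_race itself.

```python
import pandas as pd

# Invented toy data in "wide" format: one row per period, one column per
# category, values already cumulative. bar_chart_race animates a table of
# this shape, drawing each row as one step of the race.
wide = pd.DataFrame(
    {"Player A": [1, 3, 4], "Player B": [2, 2, 5], "Player C": [0, 1, 1]},
    index=["Week 1", "Week 2", "Week 3"],
)

# The bar order in each period is just that row sorted from largest to smallest.
leaders = {
    period: list(row.sort_values(ascending=False).index)
    for period, row in wide.iterrows()
}
for period, order in leaders.items():
    print(period, order)
```

Player A overtakes Player B in week 2 and falls back behind in week 3; those swaps are exactly what the animation shows as bars trading places.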

We can apply this visualization to home runs over the 2022 season. In 2022 Aaron Judge blew everyone away with 62 home runs, breaking the American League single-season record of 61 previously held by Roger Maris. Unsurprisingly, the visualization also shows his dominance throughout the season. We plot each player's cumulative home run total week by week over the regular season to see how the home run leaders change.

The data is pulled with pybaseball, which is covered in the "Learn Python with Baseball" course. Specifically, we use Statcast pitch-by-pitch data to count each player's home runs per week and then take a cumulative sum over the season.
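A minimal sketch of that count-then-accumulate step on a handful of invented home run events (the batter ids and dates are made up). For simplicity it uses calendar weeks ending on Sunday (`W-SUN`) and `unstack` rather than the notebook's shifted `W-MON` bins and `pivot_table`, so treat it as an illustration of the idea rather than a copy of the notebook code.

```python
import pandas as pd

# Invented home run events: one row per home run, as left after filtering
# Statcast data to events == 'home_run'.
hrs = pd.DataFrame({
    "batter": [1, 1, 2, 1, 2, 2],
    "game_date": ["2022-04-08", "2022-04-10", "2022-04-12",
                  "2022-04-15", "2022-04-16", "2022-04-20"],
    "events": ["home_run"] * 6,
})
hrs["game_date"] = pd.to_datetime(hrs["game_date"])

# Count home runs per batter per week (weeks ending Sunday in this sketch),
# then spread batters into columns, filling weeks without a home run with 0.
weekly = (
    hrs.groupby(["batter", pd.Grouper(key="game_date", freq="W-SUN")])["events"]
    .count()
    .unstack("batter", fill_value=0)
)

# Running season totals: cumulative sum down each batter's column.
cumulative = weekly.cumsum()
print(cumulative)
```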

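```python
import pandas as pd
import pybaseball as pyb
import bar_chart_race as bcr
import warnings

pd.set_option('display.max_columns', None)  # show every Statcast column when displaying frames
warnings.simplefilter('ignore')  # hide library warnings in the notebook output
```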

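```python
# Pull every pitch from opening day (April 7) through the end of the 2022 regular season,
# then keep only the pitches that ended in a home run
df = pyb.statcast(start_dt='2022-04-07', end_dt='2022-10-06')
df = df[df['events'] == 'home_run']
data = df.copy()
```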

pybaseball warns that this is a large query. In the original run it made 183 requests, matching the number of days in the date range, and took about four minutes.

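```python
# Shift each game back 7 days before binning into weeks that end on Monday (W-MON),
# so each Tuesday-to-Monday week of games is labeled by the Monday just before it
data['Date'] = pd.to_datetime(data['game_date']) - pd.to_timedelta(7, unit='d')
# Count home runs per batter per week
data = (data.groupby(['batter', pd.Grouper(key='Date', freq='W-MON')])['events']
            .count()
            .reset_index()
            .sort_values('Date'))

# Look up names for the MLBAM batter ids
player_ids = data['batter'].unique()
pid = pyb.playerid_reverse_lookup(player_ids, key_type='mlbam')
```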
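The seven-day shift only changes how weeks are labeled, not which home runs land together. The sketch below bins a few made-up 2022 game dates (one home run per game) with and without the shift: the counts match, and the shifted labels sit exactly one week earlier, so each Tuesday-to-Monday week of games is labeled by the Monday just before it.

```python
import pandas as pd

# Made-up game dates, one home run each (Apr 7, 2022 was a Thursday).
games = pd.DataFrame({"game_date": pd.to_datetime(
    ["2022-04-07", "2022-04-11", "2022-04-12", "2022-04-18", "2022-04-19"])})
games["hr"] = 1

# Without the shift: each week is labeled by the Monday that closes it.
unshifted = games.groupby(pd.Grouper(key="game_date", freq="W-MON"))["hr"].count()

# With the notebook's shift: move dates back 7 days, then bin the same way.
games["Date"] = games["game_date"] - pd.to_timedelta(7, unit="d")
shifted = games.groupby(pd.Grouper(key="Date", freq="W-MON"))["hr"].count()

print(unshifted)
print(shifted)
```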

```python
# Attach first and last names, then build a title-cased display name such as "Aaron Judge"
data = data.merge(pid[['name_last', 'name_first', 'key_mlbam']],
                  left_on='batter', right_on='key_mlbam', how='left')
data['Player'] = (data['name_first'] + ' ' + data['name_last']).str.title()
```
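Because the table is later pivoted on the display name rather than the MLBAM id, it may be worth checking two things after the merge: distinct ids that share a display name (they would end up in one column) and ids the reverse lookup could not resolve (their name is missing). `check_player_names` is a hypothetical helper, shown here on invented rows; on the notebook's data it would be called as `check_player_names(data)`.

```python
import pandas as pd

def check_player_names(df, id_col="batter", name_col="Player"):
    """Hypothetical helper: names shared by several ids, and ids with no name."""
    # Names that map to more than one distinct batter id
    # (groupby skips rows whose name is missing).
    ids_per_name = df.groupby(name_col)[id_col].nunique()
    collisions = sorted(ids_per_name[ids_per_name > 1].index)
    # Ids whose name lookup failed, leaving the display name empty.
    unmatched = sorted(df.loc[df[name_col].isna(), id_col].unique().tolist())
    return collisions, unmatched

# Invented rows: ids 100 and 200 share a display name; id 400 has no name.
sample = pd.DataFrame({
    "batter": [100, 200, 300, 100, 400],
    "Player": ["Player X", "Player X", "Player Y", "Player X", None],
})
print(check_player_names(sample))
```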

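```python
# Wide table: one row per week, one column per player, weekly home run counts
# (sum rather than the default mean, in case two ids share a display name)
df_weekly = data.pivot_table(values='events', index='Date', columns='Player', aggfunc='sum')
df_weekly = df_weekly.fillna(0)
# Running season totals for every player
df_weekly = df_weekly.cumsum()

# Keep only players who were in the top 10 (with at least one home run) in some week
top_hr = set()
for _, row in df_weekly.iterrows():
    top_hr |= set(row[row > 0].sort_values(ascending=False).head(10).index)
df_weekly = df_weekly[sorted(top_hr)]  # recent pandas versions reject a set as a column indexer
df_weekly.index = df_weekly.index.astype(str)  # plain date strings as period labels
#df_weekly.to_csv('HomeRun_Bar_Plot.csv')
```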
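The loop above visits every week with `iterrows`. A vectorized alternative is sketched below as the hypothetical helper `ever_in_top_n`: it ranks players within each week and keeps anyone who ranks in the top `n` with a positive total at least once. It should select the same players as the loop except, possibly, when players are tied for the last spot, since the two approaches break ties differently. The toy totals are invented.

```python
import pandas as pd

def ever_in_top_n(cumulative, n=10):
    """Columns that rank in the top n, with a positive total, in at least one row."""
    # Rank players within each row (week); 1 = most home runs.
    # method="first" breaks ties by column order.
    ranks = cumulative.rank(axis=1, method="first", ascending=False)
    in_top = (ranks <= n) & (cumulative > 0)
    return [col for col in cumulative.columns if in_top[col].any()]

# Invented cumulative totals for four players over three weeks.
toy = pd.DataFrame(
    {"A": [1, 1, 1], "B": [0, 3, 3], "C": [2, 2, 4], "D": [0, 0, 2]},
    index=["w1", "w2", "w3"],
)
print(ever_in_top_n(toy, n=2))  # ['A', 'B', 'C']
```

Since it returns a list, it could replace the loop as `df_weekly = df_weekly[ever_in_top_n(df_weekly)]`.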

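```python
# Animate the cumulative totals: up to 15 bars, one second per week, 10 frames per week
bcr.bar_chart_race(df=df_weekly,
                   n_bars=15,
                   sort='desc',
                   title='HOME RUN LEADERS BY WEEK',
                   period_length=1000,        # milliseconds spent on each weekly row
                   interpolate_period=False,  # keep week labels as-is (the index is strings)
                   steps_per_period=10,       # frames used to grow the bars between weeks
                   fixed_max=True)            # hold the value axis at its overall maximum
```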
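A rough way to predict the animation's length from these settings, assuming each weekly row is shown for `period_length` milliseconds and split evenly into `steps_per_period` frames. `race_timing` is a hypothetical helper, and the 27-row figure is only an example, not a count taken from the data.

```python
def race_timing(n_periods, period_length_ms=1000, steps_per_period=10):
    """Hypothetical helper: rough frame interval and running time of a race."""
    # Assumes each period (row) lasts period_length_ms and is split evenly
    # into steps_per_period animation frames.
    frame_interval_ms = period_length_ms / steps_per_period
    total_seconds = n_periods * period_length_ms / 1000
    return frame_interval_ms, total_seconds

# e.g. a hypothetical 27 weekly rows with the settings used above
print(race_timing(27))  # (100.0, 27.0)
```

Raising `steps_per_period` makes the bars move more smoothly without changing the total length; raising `period_length` slows the whole race down.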