# Starting Pitcher Interactive Scouting Report

**Background:**

The search for a competitive advantage for hitters through intel on opposing pitchers dates back to the inception of Major League Baseball. Today, MLB teams prepare detailed scouting packets and reports before every game that capture opposing pitchers' tendencies, aiming to give their hitters an advantage at the plate.

Throughout an at-bat, hitters need to be ready for several pitch types, each with its own movement pattern, thrown at different velocities and to different locations. This is no easy task, which is why big leaguers are paid multi-million-dollar salaries to hit a round white ball. For instance, a 90 mile-per-hour fastball takes only about 0.4 seconds to reach home plate after it leaves the pitcher's hand. According to researchers at UC Berkeley, hitters have only about 0.25 seconds to see the ball as it moves towards them [(source)](https://www.kqed.org/news/96462/how-can-anyone-hit-a-90-mph-fastball-science-explains). Only 24.4% of pitches thrown were hit in the 2021 season. Any intel that helps a hitter better guess which pitch is coming next is a potential advantage against the opposing team.
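
As a rough check of the 0.4-second figure, flight time is approximately distance divided by speed. The sketch below assumes a release point about 55 feet from home plate (the pitching rubber is 60.5 feet away, and pitchers release the ball several feet in front of it) and ignores the speed a pitch loses to drag. Both are simplifying assumptions, and `flight_time_seconds` is a hypothetical helper that is not part of this report's code.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def flight_time_seconds(velocity_mph, release_distance_ft=55.0):
    """Approximate time for a pitch to reach home plate at constant speed.

    release_distance_ft is an assumed value, not a measured one.
    """
    velocity_fps = velocity_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return release_distance_ft / velocity_fps

# 90 mph is 132 feet per second, so 55 feet takes roughly 0.42 seconds
print(round(flight_time_seconds(90), 2))
```

Under these assumptions the estimate lands close to the 0.4 seconds quoted above; a longer assumed release distance would push it slightly higher.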

Most MLB teams prepare their hitters before games with analysis of the opposing pitcher, typically including pitch locations, velocity, spin rates, and tendencies. Since 2015, video technology and cloud services have tracked pitch-level game events, enabling both in-game and out-of-game baseball analytics. This tracking captures every pitch, including its velocity, location, spin rate, and result (e.g., strike, ball, out, single, home run). Using the publicly available MLB data from [baseballsavant.mlb.com](https://baseballsavant.mlb.com/), this interactive visual scouting report provides insights for hitters preparing to face MLB starting pitchers.

**Scope:** 

2021 pitch data from starting pitchers with more than 100 pitches registered.
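
One way to apply the pitch-count part of this scope in pandas is sketched below. It assumes one row per pitch and a `pitcher_name` column (the column name used by the filter functions later in this notebook). `filter_min_pitches` is a hypothetical helper, and how starting pitchers are identified is not shown here, so only the pitch-count threshold is illustrated.

```python
import pandas as pd

def filter_min_pitches(df, min_pitches=100, pitcher_col="pitcher_name"):
    """Keep only rows for pitchers with more than `min_pitches` pitches."""
    # number of rows (pitches) per pitcher, aligned to the original rows
    pitches_per_pitcher = df.groupby(pitcher_col)[pitcher_col].transform("size")
    return df[pitches_per_pitcher > min_pitches]

# toy example with a lowered threshold: pitcher A has 3 pitches, pitcher B has 1
toy = pd.DataFrame({"pitcher_name": ["A", "A", "A", "B"]})
print(filter_min_pitches(toy, min_pitches=2)["pitcher_name"].unique())
```

Using `transform("size")` keeps the result aligned with the original rows, so the filter is a single boolean mask rather than a merge back onto the data.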

**Notes:**

* All pitch heights are normalized to the top and bottom of the average MLB hitter's strike zone.
* Only the first 9 innings are included.
* No postseason data is included.
* Rare pitch types such as knuckleballs, eephus pitches, and screwballs are filtered out of the data.
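
The height normalization in the first note can be sketched as a linear rescale from each batter's strike zone to the average zone. This is an assumption about how the normalization could be done, not a record of the preprocessing actually used. The two constants are the strike-zone bounds drawn later in `plot_pitch_location`, and `normalize_plate_z` is a hypothetical helper.

```python
# average strike-zone bounds in feet (the rectangle drawn in plot_pitch_location)
AVG_SZ_BOT = 1.574895560522476
AVG_SZ_TOP = 3.394016229859721

def normalize_plate_z(plate_z, sz_bot, sz_top):
    """Linearly rescale a pitch height from a batter's zone to the average zone.

    Assumed approach: a pitch at the batter's zone bottom maps to AVG_SZ_BOT
    and a pitch at the batter's zone top maps to AVG_SZ_TOP.
    """
    zone_fraction = (plate_z - sz_bot) / (sz_top - sz_bot)
    return AVG_SZ_BOT + zone_fraction * (AVG_SZ_TOP - AVG_SZ_BOT)

# a pitch at the top of a tall batter's zone maps to the top of the average zone
print(normalize_plate_z(plate_z=3.8, sz_bot=1.8, sz_top=3.8))
```

Because the function uses only arithmetic, it should also work column-wise on a DataFrame, for example `df['plate_z_norm'] = normalize_plate_z(df['plate_z'], df['sz_bot'], df['sz_top'])`, assuming the Statcast columns `plate_z`, `sz_bot`, and `sz_top` are present. The inning note could similarly be expressed as `df[df['inning'] <= 9]`.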

In [25]:
##### Import Libraries -----

# data manipulation
import numpy as np
import pandas as pd 
import os
import zipfile

# image insertion
from IPython.display import HTML, Image, display
from IPython.display import Markdown as md

# widgets
import ipywidgets as widgets

# plotting
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
from matplotlib import gridspec
import plotly.graph_objects as go
from math import pi

# math
import math

# pybaseball
from pybaseball import playerid_reverse_lookup
import pybaseball as pyb


##### Global Options -----

# do not show warnings
import warnings
warnings.filterwarnings('ignore')

# set pandas display options
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 2000)
pd.set_option('display.max_colwidth', 2000)
pd.options.display.float_format = '{:.5f}'.format

# set data directory
DATA_DIR = "C:/Users/13202/final-project-dataviz/data"

In [54]:
##### Define Functions -----

'''
Define a function to load a semicolon-delimited CSV file into a DataFrame.
Note: the `name` argument is not used in the function body.
'''
def load_data(in_path, name):
    df = pd.read_csv(in_path, sep=';')
    return df

'''
Define a function to filter the original statcast data to the desired scope
'''
def statcast_df_filter(data,
                       pitcher_name_filter,
                       pitch_name_filter,
                       stand_filter,
                       batter_name_filter, 
                       count_filter,
                       count_advantage_filter,
                       outs_when_up_filter,
                       inning_filter,
                       runners_on_base_filter,
                       run_differential_filter):
    
    # filter pitcher name
    statcast_df_filtered = data[data['pitcher_name'] == pitcher_name_filter]
    
    # filter pitch name
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['pitch_name'].isin(pitch_name_filter)]
    
    # filter batter stance
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['stand'].isin(stand_filter)]
    
    # filter batter name
    if batter_name_filter != 'All':
        statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['batter_name'] == batter_name_filter]
        
    # filter count
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['count'].isin(count_filter)]
    
    # filter count advantage
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['count_advantage'].isin(count_advantage_filter)]
    
    # filter outs
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['outs_when_up'].isin(outs_when_up_filter)]
    
    # filter inning
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['inning'].isin(inning_filter)]
    
    # filter runners on base
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['runners_on_base'].isin(runners_on_base_filter)]
    
    # filter run differential
    statcast_df_filtered = statcast_df_filtered[(statcast_df_filtered['run_differential'] >= run_differential_filter[0]) &
                                                (statcast_df_filtered['run_differential'] <= run_differential_filter[1])]
    
    return statcast_df_filtered


'''
Define a function to filter the original statcast data to the desired scope.
Does not filter on pitcher.
'''
def statcast_df_non_pitcher_filter(data,
                                   pitcher_name_filter,
                                   pitch_name_filter,
                                   stand_filter,
                                   batter_name_filter, 
                                   count_filter,
                                   count_advantage_filter,
                                   outs_when_up_filter,
                                   inning_filter,
                                   runners_on_base_filter,
                                   run_differential_filter):
    
    # filter pitch name
    statcast_df_filtered = data[data['pitch_name'].isin(pitch_name_filter)]
    
    # filter batter stance
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['stand'].isin(stand_filter)]
    
    # filter batter name
    if batter_name_filter != 'All':
        statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['batter_name'] == batter_name_filter]
        
    # filter count
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['count'].isin(count_filter)]
    
    # filter count advantage
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['count_advantage'].isin(count_advantage_filter)]
    
    # filter outs
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['outs_when_up'].isin(outs_when_up_filter)]
    
    # filter inning
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['inning'].isin(inning_filter)]
    
    # filter runners on base
    statcast_df_filtered = statcast_df_filtered[statcast_df_filtered['runners_on_base'].isin(runners_on_base_filter)]
    
    # filter run differential
    statcast_df_filtered = statcast_df_filtered[(statcast_df_filtered['run_differential'] >= run_differential_filter[0]) &
                                                (statcast_df_filtered['run_differential'] <= run_differential_filter[1])]
    
    return statcast_df_filtered


'''
Define a function to print number of pitches given dashboard filters
'''
def number_of_pitches(data,
                      pitcher_name_filter,
                      pitch_name_filter,
                      stand_filter,
                      batter_name_filter, 
                      count_filter,
                      count_advantage_filter,
                      outs_when_up_filter,
                      inning_filter,
                      runners_on_base_filter,
                      run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)
    
    num_observations = len(statcast_df_filtered)
    
    display(md("**<font size='8'>{}</font>**".format(num_observations)))
    

'''
Define a function to print bar plot of pitch selection percents given dashboard filters
'''
def pitch_selection_bar(data,
                        pitcher_name_filter,
                        pitch_name_filter,
                        stand_filter,
                        batter_name_filter, 
                        count_filter,
                        count_advantage_filter,
                        outs_when_up_filter,
                        inning_filter,
                        runners_on_base_filter,
                        run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)
    
    # create a dataframe with counts and relative frequency of pitch name
    temp_df1 = pd.DataFrame(statcast_df_filtered.groupby(['pitch_name'])['pitch_name'].count())
    temp_df1 = pd.DataFrame(temp_df1['pitch_name'] / temp_df1['pitch_name'].sum()).add_suffix('_percent').reset_index()

    # determine order and create a categorical type of pitch name
    pitch_name_list = temp_df1.sort_values(by = 'pitch_name_percent').pitch_name.tolist()
    pitch_name_cat = pd.Categorical(temp_df1['pitch_name'], categories = pitch_name_list)

    # assign category to a new column
    temp_df1 = temp_df1.assign(pitch_name_cat = pitch_name_cat)

    # plot a bar chart of the relative frequency of pitch selection
    (ggplot(temp_df1) +
      aes(x = 'pitch_name_cat', y = 'pitch_name_percent', fill = 'pitch_name') +
      geom_bar(size = 20, stat = 'identity') +
      coord_flip() +
      geom_text(aes(label = 'pitch_name_percent*100'), format_string = '{:.1f}%', position = position_stack(vjust = 0.5)) +
      scale_y_continuous(labels = lambda l: ["%d%%" % (v * 100) for v in l]) +
      labs(x = "Pitch Type", y = "% of Pitches", title = "Pitch Selection Breakdown", fill = "")  +
      theme_minimal() +
      theme(legend_position = "none",
            panel_grid_major = element_blank(),
            panel_grid_minor = element_blank(),
            axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
            axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
            figure_size = (8, 5))).draw();
    
    
'''
Define a function to plot pitch frequency by count given dashboard filters
'''
def pitch_count_sankey(data,
                       pitcher_name_filter,
                       pitch_name_filter,
                       stand_filter,
                       batter_name_filter, 
                       count_filter,
                       count_advantage_filter,
                       outs_when_up_filter,
                       inning_filter,
                       runners_on_base_filter,
                       run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)

    # calculate number of pitches in each source/target
    count_df = statcast_df_filtered.groupby(['count', 'lead_count'])['pitch_number'].agg('count').reset_index()

    # specify a reference table with the count labels and their locations in the sankey chart
    d = {'count': ['0-0', '0-1',  '0-2', '1-1', '1-0',  '2-0', '1-2', '2-1', '2-2', '3-0', '3-1', '3-2'],
         'x':     ['0',   '0.2',  '0.4', '0.4', '0.2',  '0.4', '0.6', '0.6', '0.8', '0.6', '0.8', '1'],
         'y':     ['0.5', '0.75', '1',   '0.5', '0.25', '0',   '1',   '0.5', '0.75', '0', '0.25', '0.5'],
         'count_index': [*range(12)]}

    count_ref_df = pd.DataFrame(data=d)

    # merge the count index
    tmp_df = count_df.merge(count_ref_df, how='left', on='count')

    # merge the next count index
    final_count_df = tmp_df.merge(count_ref_df, how='left', left_on='lead_count', right_on='count', suffixes=('', '_lead'))

    # plot the sankey chart
    fig = go.FigureWidget(data=[go.Sankey(
        arrangement = "snap",
        valueformat = ".0f",
        valuesuffix = " pitches",
        node = dict(
          pad = 15,
          thickness = 20,
          line = dict(color = "black", width = 0.5),
          label = count_ref_df['count'],
          x = list(count_ref_df['x']),
          y = list(count_ref_df['y']),
          color = "blue"
        ),
        link = dict(
          source = list(final_count_df['count_index']),
          target = list(final_count_df['count_index_lead']),
          value = list(final_count_df['pitch_number'])
      ))])

    fig.update_layout(title_text="Pitch Count Flow", 
                      font_size=12, 
                      autosize=False,
                      width=900,
                      height=600)

    fig.show(renderer="png")
    
    
'''
Define a function to plot pitch type percent by count given dashboard filters
'''    
def pitch_count_stacked_bar(data,
                            pitcher_name_filter,
                            pitch_name_filter,
                            stand_filter,
                            batter_name_filter, 
                            count_filter,
                            count_advantage_filter,
                            outs_when_up_filter,
                            inning_filter,
                            runners_on_base_filter,
                            run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)
    
    # create a dataframe with counts and relative frequency of count
    temp_df1 = pd.DataFrame(statcast_df_filtered.groupby(['count'])['count'].count())
    temp_df1 = pd.DataFrame(temp_df1['count'] / temp_df1['count'].sum()).add_suffix('_percent').reset_index()

    # determine order and create a categorical type of count
    count_list = temp_df1.sort_values(by = 'count_percent')['count'].tolist()
    count_cat = pd.Categorical(temp_df1['count'], categories = count_list)

    # assign category to a new column
    temp_df1 = temp_df1.assign(count_cat = count_cat)

    # create a dataframe with counts and relative frequency of count and pitch name
    temp_df2 = pd.DataFrame(statcast_df_filtered.groupby(['count', 'pitch_name'])['count'].count()).add_suffix('_group').reset_index()
    temp_df3 = pd.DataFrame(temp_df2['count_group'] / temp_df2.groupby('count')['count_group'].transform('sum')).add_suffix('_percent')
    temp_df4 = pd.concat([temp_df2.reset_index(drop = True), temp_df3], axis = 1)

    # reuse the count ordering computed above to create a categorical type of count
    count_cat = pd.Categorical(temp_df4['count'], categories = count_list)

    # assign category to a new column
    temp_df4 = temp_df4.assign(count_cat = count_cat)

    # plot a bar chart of the relative frequency of pitch selection by count
    (ggplot(temp_df4) +
      aes(x = 'count_cat', y = 'count_group_percent', fill = 'pitch_name') +
      geom_bar(size = 20, stat = 'identity', position = "stack") +
      coord_flip() +
      geom_text(aes(label = 'count_group_percent*100'), format_string = '{:.1f}%', position = position_stack(vjust = 0.5)) +
      scale_y_continuous(labels = lambda l: ["%d%%" % (v * 100) for v in l]) +
      labs(x = "Count", y = "% of Pitches", title = "Pitch Count Breakdown", fill = "Pitch Type") +
      theme_minimal() +
      theme(panel_grid_major = element_blank(),
            panel_grid_minor = element_blank(),
            axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
            axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
            figure_size = (8, 5))).draw();
    
    
'''
Define a function to plot pitch type percent by count advantage given dashboard filters
'''
def pitch_count_advantage_stacked_bar(data,
                                      pitcher_name_filter,
                                      pitch_name_filter,
                                      stand_filter,
                                      batter_name_filter, 
                                      count_filter,
                                      count_advantage_filter,
                                      outs_when_up_filter,
                                      inning_filter,
                                      runners_on_base_filter,
                                      run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)

    # create a dataframe with counts and relative frequency
    temp_df2 = pd.DataFrame(statcast_df_filtered.groupby(['count_advantage', 'pitch_name'])['count_advantage'].count()).add_suffix('_group').reset_index()
    temp_df3 = pd.DataFrame(temp_df2['count_advantage_group'] / temp_df2.groupby('count_advantage')['count_advantage_group'].transform('sum')).add_suffix('_percent')
    temp_df4 = pd.concat([temp_df2.reset_index(drop=True), temp_df3], axis = 1)

    # plot a bar chart of the relative frequency of pitch selection by count advantage
    (ggplot(temp_df4) +
      aes(x = 'count_advantage', y = 'count_advantage_group_percent', fill = 'pitch_name') +
      geom_bar(size = 20, stat = 'identity', position = "stack") +
      coord_flip() +
      geom_text(aes(label = 'count_advantage_group_percent*100'), format_string = '{:.1f}%', position = position_stack(vjust = 0.5)) +
      scale_y_continuous(labels = lambda l: ["%d%%" % (v * 100) for v in l]) +
      labs(x = "Pitcher\nCount\nAdvantage", y = "% of Pitches", title = "Pitch Count Breakdown", fill = "Pitch Type") +
      theme_minimal() +
      theme(panel_grid_major = element_blank(),
            panel_grid_minor = element_blank(),
            axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
            axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
            figure_size = (8, 2))).draw();
    

'''
Define a function to plot pitch location broken down by selection given dashboard filters
'''
def plot_pitch_location(data,
                        pitcher_name_filter,
                        pitch_name_filter,
                        stand_filter,
                        batter_name_filter, 
                        count_filter,
                        count_advantage_filter,
                        outs_when_up_filter,
                        inning_filter,
                        runners_on_base_filter,
                        run_differential_filter,
                        breakdown_var_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)

    # initialise a list which will store the breakdown plots
    plots = list()

    # initialize the breakdown variable
    if breakdown_var_filter == 'none':
        breakdown = statcast_df_filtered['pitcher_name']
    else:
        breakdown = statcast_df_filtered[breakdown_var_filter]

    # get breakdown variable categories
    categories = breakdown.value_counts().index.tolist()

    # create subplot for each level of category in breakdown variable
    for j, i in enumerate(categories):
        
        dat = statcast_df_filtered[breakdown == i]
        
        # if there are at least 50 pitches, plot a 2d density of pitch location;
        # otherwise fall back to a scatterplot of individual pitches
        if len(dat) >= 50:
            
            if j == 0:
                leg_pos = 'bottom'
            else:
                leg_pos = 'none'
            
            p1 = (ggplot(dat) +
                    aes(x = 'plate_x*-1', y = 'plate_z_norm') +
                    stat_density_2d(aes(fill = 'stat(..level..)'), geom = "polygon", levels = 30) +
                    geom_rect(aes(xmin = -0.83, xmax = 0.83, ymin = 1.574895560522476, ymax = 3.394016229859721), alpha = 0, color = 'white') +
                    coord_cartesian(ylim = [1, 4], xlim = [-1.66, 1.66]) +
                    labs(title = i, fill='           Density') +
                    theme_minimal() +
                    theme(legend_position = leg_pos,
                          panel_background = element_rect(fill = '#440154FF', color = '#440154FF'),
                          panel_grid_major = element_blank(),
                          panel_grid_minor = element_blank(),
                          axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
                          axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
                          axis_ticks = element_blank(),
                          axis_text = element_blank(),
                          figure_size = (10, 5)))
        else:
            p1 = (ggplot(dat) +
                    aes(x = 'plate_x*-1', y = 'plate_z_norm') +
                    geom_point(alpha=.3) +
                    geom_rect(aes(xmin = -0.83, xmax = 0.83, ymin = 1.574895560522476, ymax = 3.394016229859721), alpha = 0, color = 'grey') +
                    coord_cartesian(ylim = [1, 4], xlim = [-1.66, 1.66]) +
                    labs(title = i) +
                    theme_minimal() +
                    theme(legend_position = 'none',
                          panel_grid_major = element_blank(),
                          panel_grid_minor = element_blank(),
                          axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
                          axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
                          axis_ticks = element_blank(),
                          axis_text = element_blank(),
                          figure_size = (10, 5)))

        # append plot to list
        plots.append(p1)

    # create empty figure for subplots to be added
    fig = (ggplot() + 
              geom_blank(data = statcast_df_filtered) + 
              theme_void() + 
              theme(figure_size = (5*len(categories), 5))).draw()

    # create grid specification
    gs = gridspec.GridSpec(1, len(categories))

    # initialise a blank list for the subplots
    ax = list()

    # for each level of the category assign subplot to grid
    for i in range(len(categories)):
        ax.append(fig.add_subplot(gs[0, i]))
        ax[i].set_title(categories[i])

        # add subplot to figure
        _ = plots[i]._draw_using_figure(fig, [ax[i]])

    # show figure
    fig.show()
    
    
'''
Define a function to plot scatterplot of pitch movement by pitch type given dashboard filters
'''
def pitch_movement_scatter(data,
                           pitcher_name_filter,
                           pitch_name_filter,
                           stand_filter,
                           batter_name_filter, 
                           count_filter,
                           count_advantage_filter,
                           outs_when_up_filter,
                           inning_filter,
                           runners_on_base_filter,
                           run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)

    # plot the pitch movement by pitch name
    (ggplot(statcast_df_filtered) +
       aes(x = 'pfx_x*-12', y = 'pfx_z*12', color = 'pitch_name') +
       geom_point() +
       scale_x_continuous(limits = [-36, 36], breaks = list(range(-36, 48, 12))) +
       labs(x = "Horizontal Movement", y = "Vertical \nMovement", title = "Pitch Movement (inches) by Pitch Type", color = "Pitch Type") +
       theme_minimal() +
       theme(panel_grid_major = element_blank(),
             panel_grid_minor = element_blank(),
             axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
             axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
             figure_size = (5, 5))).draw();
    

'''
Define a function to plot scatterplot of pitch release point by pitch type given dashboard filters
'''
def pitch_release_scatter(data,
                          pitcher_name_filter,
                          pitch_name_filter,
                          stand_filter,
                          batter_name_filter, 
                          count_filter,
                          count_advantage_filter,
                          outs_when_up_filter,
                          inning_filter,
                          runners_on_base_filter,
                          run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)

    # plot the pitch release position by pitch type
    (ggplot(statcast_df_filtered) +
       aes(x = 'release_pos_x', y = 'release_pos_z', color = 'pitch_name') +
       geom_point() +
       scale_x_continuous(limits = [-5, 5]) +
       scale_y_continuous(limits = [0, 7]) +
       labs(x = "Horizontal Release Point", y = "Vertical \nRelease\nPoint", title = "Pitch Release Point (feet) by Pitch Type", color = "Pitch Type") +
       theme_minimal() +
       theme(panel_grid_major = element_blank(),
             panel_grid_minor = element_blank(),
             axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
             axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
             figure_size = (5, 5))).draw();
    

'''
Define a function to plot bar plot of pitch event results given dashboard filters
'''
def pitch_result_bar(data,
                     pitcher_name_filter,
                     pitch_name_filter,
                     stand_filter,
                     batter_name_filter, 
                     count_filter,
                     count_advantage_filter,
                     outs_when_up_filter,
                     inning_filter,
                     runners_on_base_filter,
                     run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)

    # create a dataframe with the relative frequency of each event
    temp_df1 = (statcast_df_filtered['events']
                .value_counts(normalize = True)
                .rename_axis('events')
                .reset_index(name = 'events_percent'))

    # order and create a categorical variable of event
    events_list = temp_df1.sort_values(by = 'events_percent').events.tolist()
    events_cat = pd.Categorical(temp_df1['events'], categories = events_list)

    # assign category to a new column
    temp_df1 = temp_df1.assign(events_cat = events_cat)

    # plot a bar chart of the relative frequency of at bat event
    (ggplot(temp_df1) +
      aes(x = 'events_cat', y = 'events_percent', fill = 'events') +
      geom_bar(size = 20, stat = 'identity') +
      coord_flip() +
      geom_text(aes(label = 'events_percent*100'), format_string = '{:.1f}%', position=position_stack(vjust = 0.5)) +
      scale_y_continuous(labels = lambda l: ["%d%%" % (v * 100) for v in l]) +
      labs(x = "Result Type", y = "% of At-bats", title = "Results Breakdown", fill = "")  +
      theme_minimal() +
      theme(legend_position = "none",
            panel_grid_major = element_blank(),
            panel_grid_minor = element_blank(),
            axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
            axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
            figure_size = (8, 5))).draw();
    

'''
Define a function to plot bar plot of batted ball type given dashboard filters
'''
def pitch_bb_type_bar(data,
                      pitcher_name_filter,
                      pitch_name_filter,
                      stand_filter,
                      batter_name_filter, 
                      count_filter,
                      count_advantage_filter,
                      outs_when_up_filter,
                      inning_filter,
                      runners_on_base_filter,
                      run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)
    
    # create a dataframe with the relative frequency of each batted ball type
    batted_balls = statcast_df_filtered[statcast_df_filtered['batted_ball_type'] != "nan"]
    temp_df1 = (batted_balls['batted_ball_type']
                .value_counts(normalize = True)
                .rename_axis('batted_ball_type')
                .reset_index(name = 'batted_ball_type_percent'))

    # order and create a categorical variable of batted ball type
    batted_ball_type_list = temp_df1.sort_values(by = 'batted_ball_type_percent').batted_ball_type.tolist()
    batted_ball_type_cat = pd.Categorical(temp_df1['batted_ball_type'], categories = batted_ball_type_list)

    # assign category to a new column
    temp_df1 = temp_df1.assign(batted_ball_type_cat = batted_ball_type_cat)

    # plot a bar chart of the relative frequency of batted ball type
    (ggplot(temp_df1) +
      aes(x = 'batted_ball_type_cat', y = 'batted_ball_type_percent', fill = 'batted_ball_type') +
      geom_bar(size = 20, stat = 'identity') +
      coord_flip() +
      geom_text(aes(label = 'batted_ball_type_percent*100'), format_string = '{:.1f}%', position = position_stack(vjust = 0.5)) +
      scale_y_continuous(labels = lambda l: ["%d%%" % (v * 100) for v in l]) +
      labs(x = "Batted Ball Type", y = "% of Batted Balls", title = "Batted Ball Type Breakdown", fill = "")  +
      theme_minimal() +
      theme(legend_position = "none",
            panel_grid_major = element_blank(),
            panel_grid_minor = element_blank(),
            axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
            axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
            figure_size=(8, 5))).draw();
    
    
'''
Define a function to plot scatterplot of batted balls given dashboard filters
'''
def pitch_bb_location(data,
                      pitcher_name_filter,
                      pitch_name_filter,
                      stand_filter,
                      batter_name_filter, 
                      count_filter,
                      count_advantage_filter,
                      outs_when_up_filter,
                      inning_filter,
                      runners_on_base_filter,
                      run_differential_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)
    
    # plot a scatterplot of batted ball location
    # note: the stadium outline is hard-coded to 'dodgers' for every batted ball
    pyb.spraychart(statcast_df_filtered, 'dodgers', title = 'Batted Ball Spray Chart', size = 50)
    

'''
Define a function to plot pitcher performance by times through order (tto) given dashboard filters
'''
def pitcher_tto_line(data,
                     pitcher_name_filter,
                     pitch_name_filter,
                     stand_filter,
                     batter_name_filter, 
                     count_filter,
                     count_advantage_filter,
                     outs_when_up_filter,
                     inning_filter,
                     runners_on_base_filter,
                     run_differential_filter,
                     breakdown_tto_var_filter):
    
    # filter data
    statcast_df_filtered = statcast_df_filter(data,
                                              pitcher_name_filter,
                                              pitch_name_filter,
                                              stand_filter,
                                              batter_name_filter, 
                                              count_filter,
                                              count_advantage_filter,
                                              outs_when_up_filter,
                                              inning_filter,
                                              runners_on_base_filter,
                                              run_differential_filter)
    
    # group the data frame by pitcher and times through order and calculate a number of stats from each group
    tto_summary = statcast_df_filtered.groupby(
        ['pitcher_name', 'tto']
    ).agg(
        {
            'game_pk':'count',
            'strike_ind': "mean",
            'whiff_ind': "mean",
            'woba_value':"mean",
            'launch_speed':'mean',
            'release_spin_rate':'mean'
        }
    ).reset_index()

    # reformat output
    tto_summary['Strike %'] = round(tto_summary['strike_ind']*100, 1)
    tto_summary['Whiff %'] = round(tto_summary['whiff_ind']*100, 1)
    tto_summary['wOBA'] = tto_summary['woba_value']
    tto_summary['Exit Velocity'] = tto_summary['launch_speed']
    tto_summary['Spin Rate'] = tto_summary['release_spin_rate']
    
    # plot the statistic summary by times through order
    (ggplot(tto_summary) +
      aes(x = 'tto', y = breakdown_tto_var_filter) +
      geom_point(stat = 'identity') +
      geom_line(stat = 'identity') +
      geom_text(aes(label = breakdown_tto_var_filter), format_string = '{:.2f}', va = "bottom", ha = "left") +
      labs(x = "Times through order", title = "Statistics by Times Through Order")  +
      theme_minimal() +
      scale_x_continuous(limits = [1, 4.2]) +
      theme(legend_position = "none",
            panel_grid_major = element_blank(),
            panel_grid_minor = element_blank(),
            axis_title_y = element_text(angle = 0, margin = dict([('r', 40)])),
            axis_title_x = element_text(angle = 0, margin = dict([('t', 15)])),
            figure_size = (8, 5))).draw();
    

'''
Define a function to print table of pitcher statistics vs. MLB and
plot radar chart with MLB pitcher percentiles given dashboard filters
'''
def pitcher_compare(data,
                    pitcher_name_filter,
                    pitch_name_filter,
                    stand_filter,
                    batter_name_filter, 
                    count_filter,
                    count_advantage_filter,
                    outs_when_up_filter,
                    inning_filter,
                    runners_on_base_filter,
                    run_differential_filter):
    
    # filter data to all mlb pitchers
    statcast_df_non_pitcher_filtered = statcast_df_non_pitcher_filter(data,
                                                                      pitcher_name_filter,
                                                                      pitch_name_filter,
                                                                      stand_filter,
                                                                      batter_name_filter, 
                                                                      count_filter,
                                                                      count_advantage_filter,
                                                                      outs_when_up_filter,
                                                                      inning_filter,
                                                                      runners_on_base_filter,
                                                                      run_differential_filter)
    
    # group the data frame by pitcher and calculate a number of stats from each group
    statcast_pitcher_summary = statcast_df_non_pitcher_filtered.groupby(
        ['pitcher_name']
    ).agg(
        {
            'strike_ind': "mean",
            'whiff_ind': "mean",
            'woba_value':"mean",
            'launch_speed':'mean',
            'release_spin_rate':'mean'
        }
    ).reset_index()
    
    # calculate pitcher percentile for all statistics
    statcast_pitcher_summary['strike_ind_pct'] = statcast_pitcher_summary.strike_ind.rank(pct = True)*100
    statcast_pitcher_summary['whiff_ind_pct'] = statcast_pitcher_summary.whiff_ind.rank(pct = True)*100
    statcast_pitcher_summary['woba_value_pct'] = (1 - statcast_pitcher_summary.woba_value.rank(pct = True))*100
    statcast_pitcher_summary['launch_speed_pct'] = (1 - statcast_pitcher_summary.launch_speed.rank(pct = True))*100
    statcast_pitcher_summary['release_spin_rate_pct'] = statcast_pitcher_summary.release_spin_rate.rank(pct = True)*100
    
    # filter to pitcher of interest
    statcast_pitcher_summary_filtered = statcast_pitcher_summary[statcast_pitcher_summary['pitcher_name'] == pitcher_name_filter]
    
    # group all other pitchers outside of pitcher of interest to same group
    statcast_df_non_pitcher_filtered['pitcher_ind'] = np.where(statcast_df_non_pitcher_filtered['pitcher_name'] == pitcher_name_filter, pitcher_name_filter, 'Rest of MLB')

    # group the data frame by pitcher category and extract a number of stats from each group
    compare_df = statcast_df_non_pitcher_filtered.groupby(
        ['pitcher_ind']
    ).agg(
        {
            'strike_ind': "mean",
            'whiff_ind': "mean",
            'woba_value':"mean",
            'launch_speed':'mean',
            'release_spin_rate':'mean'
        }
    ).reset_index()
    
    # reformat output
    compare_df['strike_ind'] = round(compare_df['strike_ind']*100, 1).astype(str)
    compare_df['whiff_ind'] = round(compare_df['whiff_ind']*100, 1).astype(str)
    compare_df['woba_value'] = round(compare_df['woba_value'], 3).astype(str)
    compare_df['launch_speed'] = round(compare_df['launch_speed'], 1).astype(str)
    compare_df['release_spin_rate'] = round(compare_df['release_spin_rate'], 1).astype(str)
    
    # set the column names of the clean table output
    compare_df.columns = [" ", 'Strike %', 'Whiff %', 'wOBA', 'Avg. Exit Velocity', 'Avg. Spin Rate']
    
    # display statistics table
    display(compare_df)
    
    # set variables to select from summary data
    categories=['strike_ind_pct', 'whiff_ind_pct', 'woba_value_pct', 'launch_speed_pct', 'release_spin_rate_pct']
    
    # set clean names of variables for labelling
    categories_clean=['Strike %', 'Whiff %', 'wOBA', 'Avg. Exit Velocity', 'Avg. Spin Rate']
    
    # get number of categories
    N = len(categories)

    # get the percentile values - repeat the first value at the end to close the loop
    values = np.array(statcast_pitcher_summary_filtered[categories])[0].tolist()
    values += values[:1]

    # set angles for each statistic in radar chart
    angles = [n / float(N) * 2 * pi for n in range(N)]
    angles += angles[:1]
    
    # initialise the radar plot size
    plt.figure(figsize = (8, 8), facecolor='white')
    
    # initialise the sub plot
    ax = plt.subplot(111, polar = True)

    # set x ticks
    plt.xticks(angles[:-1], categories_clean, color = 'grey', size = 15)

    # set y labels
    ax.set_rlabel_position(30)
    plt.yticks([20, 40, 60, 80, 100], ["20", "40", "60", "80", "100"], color = "grey", size = 15)
    plt.ylim(0,100)

    # plot data
    ax.plot(angles, values, linewidth = 1, linestyle = 'solid')

    # fill area
    ax.fill(angles, values, 'b', alpha = 0.2)
    
    # set plot title
    ax.set_title("Pitcher Performance MLB Percentiles", size = 15)
    
    # hide the polar spine (outer circle)
    ax.spines['polar'].set_visible(False)
    
    # print space
    print("")
    
    # show plot
    plt.show()

In [27]:
##### Load Data -----

# set the name of the input dataset to load
ds_name = "pitch_data"

# load the dataset
statcast_df = load_data(os.path.join(DATA_DIR, f'{ds_name}.csv'), ds_name)

## Pitcher Filters

Search for the pitcher you want to analyze ahead of the next game. This interactive dashboard lets you view the pitcher's tendencies, including:

* What pitch types the pitcher throws
* What pitch types the pitcher throws based on the count
* Where the pitcher throws their pitches
* How much movement is typical for their pitches
* Where they release their pitches
* Historical batter results and location of batted balls
* The pitcher's performance by times through the batting order
* The pitcher's performance relative to other pitchers in MLB

You can also filter by the pitch type to see the same pitcher tendencies for a particular pitch type.

In [28]:
# pitcher name
pitcher_name_case = widgets.Combobox(
    value='Max Scherzer',
    options=list(statcast_df['pitcher_name'].unique()),
    description='Pitcher:',
    ensure_option=True,
    disabled=False
)

# pitch type
pitch_name_case = widgets.SelectMultiple(
    options=sorted(list(statcast_df['pitch_name'].dropna().unique())),
    value=sorted(list(statcast_df['pitch_name'].dropna().unique())),
    description='Pitch Type',
    disabled=False
)

# set the tabbed pitcher filters
tab_contents = ['Pitcher',
                'Pitch Type']

children = [pitcher_name_case,
            pitch_name_case]

tab_pitcher = widgets.Tab()
tab_pitcher.children = children
for i in range(len(tab_contents)):
    tab_pitcher.set_title(i, str(tab_contents[i]))
    
tab_pitcher

Tab(children=(Combobox(value='Max Scherzer', description='Pitcher:', ensure_option=True, options=('Robert Gsel…

## Batter Filters

Most starting pitcher tendencies vary based on the stance of the batter. You can filter to your hitting stance to see historical pitcher tendencies for right-handed or left-handed batters. 

If you want to see historical data only for a particular hitter's at-bats against a pitcher, you can filter to a specific MLB hitter.

In [29]:
# stand (batter stance)
stand_case = widgets.SelectMultiple(
    options=sorted(list(statcast_df['stand'].dropna().unique())),
    value=sorted(list(statcast_df['stand'].dropna().unique())),
    description='Batter Stance',
    disabled=False
)

# batter name
batter_name_case = widgets.Combobox(
    value='All',
    options=list(statcast_df['batter_name'].unique()),
    description='Batter:',
    ensure_option=True,
    disabled=False
)

# set the tabbed batter filters
tab_contents = ['Batter Stance',
                'Batter']

children = [stand_case,
            batter_name_case]

tab_batter = widgets.Tab()
tab_batter.children = children
for i in range(len(tab_contents)):
    tab_batter.set_title(i, str(tab_contents[i]))
    
tab_batter

Tab(children=(SelectMultiple(description='Batter Stance', index=(0, 1), options=('L', 'R'), value=('L', 'R')),…

## Game Situation Filters

Pitcher tendencies are heavily influenced by the scenario of the game. For example, at-bats with runners on base may differ greatly from at-bats with no runners on base in terms of pitch selection, location, and results.

Use the filters below to see how pitcher tendencies vary by:

* Count
* Outs
* Inning
* Runners on base
* Score (run differential: pitching team - batting team)
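
The run differential and runners-on-base filters depend on derived columns. The sketch below shows one way such columns could be built. It assumes the raw Statcast columns `fld_score`, `bat_score`, `on_1b`, `on_2b`, and `on_3b` are available; the helper name `add_situation_columns` is hypothetical and is not taken from this notebook.

```python
import numpy as np
import pandas as pd

# labels matching the 'Runners On Base' filter options
RUNNER_LABELS = {
    (False, False, False): 'Empty',
    (True, False, False): '1B',
    (False, True, False): '2B',
    (False, False, True): '3B',
    (True, True, False): '1B & 2B',
    (True, False, True): '1B & 3B',
    (False, True, True): '2B & 3B',
    (True, True, True): 'Bases Loaded',
}

def add_situation_columns(df):
    """Hypothetical helper: derive run differential and a runners-on-base label."""
    out = df.copy()
    # pitching (fielding) team score minus batting team score
    out['run_differential'] = out['fld_score'] - out['bat_score']
    # assumption: a base is occupied when its runner id is not missing
    occupied = zip(out['on_1b'].notna(), out['on_2b'].notna(), out['on_3b'].notna())
    out['runners_on_base'] = [RUNNER_LABELS[tuple(bool(b) for b in o)] for o in occupied]
    return out

# made-up rows for illustration only (runner ids are placeholders)
example = pd.DataFrame({
    'fld_score': [3, 0],
    'bat_score': [1, 2],
    'on_1b': [np.nan, 111111.0],
    'on_2b': [np.nan, 222222.0],
    'on_3b': [np.nan, np.nan],
})
situations = add_situation_columns(example)
print(situations[['run_differential', 'runners_on_base']])
```

A positive run differential here means the pitching team is ahead, which matches the sign convention stated in the list above.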

In [30]:
# count
count_case = widgets.SelectMultiple(
    options=sorted(list(statcast_df['count'].dropna().unique())),
    value=sorted(list(statcast_df['count'].dropna().unique())),
    description='Pitch Count',
    disabled=False
)

# count_advantage
count_advantage_case = widgets.SelectMultiple(
    options=sorted(list(statcast_df['count_advantage'].dropna().unique())),
    value=sorted(list(statcast_df['count_advantage'].dropna().unique())),
    description='Pitch Count Description',
    disabled=False
)

# innings
inning_case = widgets.SelectMultiple(
    options=sorted(list(statcast_df['inning'].dropna().unique())),
    value=sorted(list(statcast_df['inning'].dropna().unique())),
    description='Inning',
    disabled=False
)

# outs
outs_when_up_case = widgets.SelectMultiple(
    options=sorted(list(statcast_df['outs_when_up'].dropna().unique())),
    value=sorted(list(statcast_df['outs_when_up'].dropna().unique())),
    description='Outs',
    disabled=False
)

# runners on base
allowed_tags = ['Empty',
                '1B', 
                '2B', 
                '3B',
                '1B & 2B',
                '1B & 3B',
                '2B & 3B',
                'Bases Loaded']

runners_on_base_case = widgets.SelectMultiple(
    options=allowed_tags,
    value=allowed_tags,
    description='Runners On Base',
    disabled=False
)

# run differential
options = list(range(-20, 21))
run_differential_case = widgets.SelectionRangeSlider(
    options=options,
    index=(0, 40),
    description='Run Differential',
    disabled=False
)

# set the tabbed situation filters
tab_contents = ['Pitch Count',
                'Pitch Count Advantage',
                'Outs',
                'Inning',
                'Runners On Base',
                'Run Differential']

children = [count_case,
            count_advantage_case,
            outs_when_up_case,
            inning_case,
            runners_on_base_case,
            run_differential_case]

tab_situation = widgets.Tab()
tab_situation.children = children
for i in range(len(tab_contents)):
    tab_situation.set_title(i, str(tab_contents[i]))
    
tab_situation

Tab(children=(SelectMultiple(description='Pitch Count', index=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), options=…

## Number of Pitches

In [31]:
widgets.interactive_output(number_of_pitches, {'data': widgets.fixed(statcast_df),
                                               'pitcher_name_filter':pitcher_name_case,
                                               'pitch_name_filter':pitch_name_case,
                                               'stand_filter':stand_case,
                                               'batter_name_filter':batter_name_case,
                                               'count_filter':count_case,
                                               'count_advantage_filter':count_advantage_case,
                                               'outs_when_up_filter':outs_when_up_case,
                                               'inning_filter':inning_case,
                                               'runners_on_base_filter':runners_on_base_case,
                                               'run_differential_filter':run_differential_case})

Output()

## Pitcher Tendencies

### Pitch Selection

Most pitchers mix their pitches substantially during at-bats. Below is the distribution of pitches thrown by pitch type.

Use the filters above to observe how this visual changes based on the game scenario.

In [32]:
widgets.interactive_output(pitch_selection_bar, {'data': widgets.fixed(statcast_df),
                                                 'pitcher_name_filter':pitcher_name_case,
                                                 'pitch_name_filter':pitch_name_case,
                                                 'stand_filter':stand_case,
                                                 'batter_name_filter':batter_name_case,
                                                 'count_filter':count_case,
                                                 'count_advantage_filter':count_advantage_case,
                                                 'outs_when_up_filter':outs_when_up_case,
                                                 'inning_filter':inning_case,
                                                 'runners_on_base_filter':runners_on_base_case,
                                                 'run_differential_filter':run_differential_case})

Output()

### Pitch Count Flow

Pitchers often perform better when they are ahead in the count. One way to assess a pitcher's count advantage is to visualize how frequently the pitcher arrives at specific pitch counts. A Sankey diagram illustrates the flow of pitches from one pitch count to the next.

Use the filters to compare results based on pitch type and other scenarios.
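
The links of such a diagram are count-to-count transitions. The sketch below tallies them, assuming the Statcast columns `game_pk`, `at_bat_number`, `pitch_number`, `balls`, and `strikes`. The helper `count_transitions` is hypothetical and is not necessarily how `pitch_count_sankey` builds its links.

```python
import pandas as pd

def count_transitions(df):
    """Hypothetical helper: tally pitch-to-pitch count transitions within each at-bat."""
    ordered = df.sort_values(['game_pk', 'at_bat_number', 'pitch_number']).copy()
    ordered['count'] = ordered['balls'].astype(str) + '-' + ordered['strikes'].astype(str)
    # count on the following pitch of the same at-bat (missing on the last pitch)
    ordered['next_count'] = ordered.groupby(['game_pk', 'at_bat_number'])['count'].shift(-1)
    moved = ordered.dropna(subset=['next_count'])
    # drop self-links, e.g. a foul ball with two strikes leaves the count unchanged
    moved = moved[moved['count'] != moved['next_count']]
    return moved.groupby(['count', 'next_count']).size().reset_index(name='n_pitches')

# made-up pitches for illustration only: two at-bats in one game
example = pd.DataFrame({
    'game_pk': [1, 1, 1, 1, 1],
    'at_bat_number': [1, 1, 1, 2, 2],
    'pitch_number': [1, 2, 3, 1, 2],
    'balls': [0, 0, 1, 0, 0],
    'strikes': [0, 1, 1, 0, 1],
})
links = count_transitions(example)
print(links)
```

Each row of `links` can serve as one source, target, and weight triple for a Sankey plotting library.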

In [55]:
widgets.interactive_output(pitch_count_sankey, {'data': widgets.fixed(statcast_df),
                                                'pitcher_name_filter':pitcher_name_case,
                                                'pitch_name_filter':pitch_name_case,
                                                'stand_filter':stand_case,
                                                'batter_name_filter':batter_name_case,
                                                'count_filter':count_case,
                                                'count_advantage_filter':count_advantage_case,
                                                'outs_when_up_filter':outs_when_up_case,
                                                'inning_filter':inning_case,
                                                'runners_on_base_filter':runners_on_base_case,
                                                'run_differential_filter':run_differential_case})

Output()

### Pitch Selection by Count

The pitch count heavily influences which pitch types a pitcher will throw in an at-bat. It is typically the most predictive input for determining which pitch will be thrown next.
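
The pitch mix within each count is a row-normalized cross-tabulation. The sketch below uses made-up rows and assumes `count` and `pitch_name` columns like those used by the filters; it is not necessarily how `pitch_count_stacked_bar` is implemented.

```python
import pandas as pd

# made-up pitches for illustration only
example = pd.DataFrame({
    'count': ['0-0', '0-0', '0-0', '0-0', '0-2', '0-2'],
    'pitch_name': ['4-Seam Fastball', '4-Seam Fastball', '4-Seam Fastball', 'Slider',
                   'Slider', 'Slider'],
})

# share of each pitch type within each count (each row sums to 1)
pitch_mix_by_count = pd.crosstab(example['count'], example['pitch_name'], normalize='index')
print(pitch_mix_by_count.round(2))
```

Normalizing by row (`normalize='index'`) makes counts with very different pitch totals directly comparable.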

In [35]:
widgets.interactive_output(pitch_count_stacked_bar, {'data': widgets.fixed(statcast_df),
                                                     'pitcher_name_filter':pitcher_name_case,
                                                     'pitch_name_filter':pitch_name_case,
                                                     'stand_filter':stand_case,
                                                     'batter_name_filter':batter_name_case,
                                                     'count_filter':count_case,
                                                     'count_advantage_filter':count_advantage_case,
                                                     'outs_when_up_filter':outs_when_up_case,
                                                     'inning_filter':inning_case,
                                                     'runners_on_base_filter':runners_on_base_case,
                                                     'run_differential_filter':run_differential_case})

Output()

It is easier to summarise pitch selection by grouping count situations. Most pitchers' tendencies are similar within a given level of count advantage against a hitter.

Recall that 3 strikes mean the hitter is out, while 4 balls mean the hitter walks and reaches 1st base.

A pitcher either has a count advantage (ahead), a disadvantage (behind), or neither (even). Pitchers often throw their "strikeout" or best pitch with 2 strikes, so we split the ahead category in two: ahead with fewer than 2 strikes and ahead with 2 strikes.

* Even counts: 1-0, 2-1, 3-2
* Behind counts: 2-0, 3-0, 3-1
* Ahead (<2 strikes) counts: 0-0, 0-1, 1-1
* Ahead (2 strikes) counts: 0-2, 1-2, 2-2
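
The grouping listed above can be written as a simple lookup table. In this sketch the counts are copied from the list, written as balls-strikes; the label strings and the helper `count_advantage` are assumptions and may differ from the values stored in the `count_advantage` column.

```python
# grouping copied from the list above; the label strings are assumptions
COUNT_GROUPS = {
    'even': ['1-0', '2-1', '3-2'],
    'behind': ['2-0', '3-0', '3-1'],
    'ahead (<2 strikes)': ['0-0', '0-1', '1-1'],
    'ahead (2 strikes)': ['0-2', '1-2', '2-2'],
}

# invert to a count -> group lookup
COUNT_ADVANTAGE = {count: group
                   for group, counts in COUNT_GROUPS.items()
                   for count in counts}

def count_advantage(balls, strikes):
    """Return the advantage group for a balls-strikes count."""
    return COUNT_ADVANTAGE[f'{balls}-{strikes}']

print(count_advantage(1, 2))
```

The four groups together cover all 12 possible counts, so every pitch falls into exactly one group.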

In [36]:
widgets.interactive_output(pitch_count_advantage_stacked_bar, {'data': widgets.fixed(statcast_df),
                                                               'pitcher_name_filter':pitcher_name_case,
                                                               'pitch_name_filter':pitch_name_case,
                                                               'stand_filter':stand_case,
                                                               'batter_name_filter':batter_name_case,
                                                               'count_filter':count_case,
                                                               'count_advantage_filter':count_advantage_case,
                                                               'outs_when_up_filter':outs_when_up_case,
                                                               'inning_filter':inning_case,
                                                               'runners_on_base_filter':runners_on_base_case,
                                                               'run_differential_filter':run_differential_case})

Output()

### Pitch Location

Knowing where a pitch is likely to be thrown can increase the hitter's chances of reaching base.

Pitch location is heavily influenced by the pitch type thrown, the batter's stance, the pitch count, and other in-game scenarios. 

Use the breakdown filter below to plot side-by-side heatmaps of pitch locations, split by the selected variable.

Notes:
* The strike zone is highlighted by the box in the middle of each heatmap.
* Heatmaps show the density of pitch locations, which makes them easier to interpret than individual points. The more yellow an area, the higher the relative number of pitches thrown in that location. The more purple, the lower the number of pitches thrown in that area.
* Small pitch samples for a specific breakdown create problems for heatmaps. Thus, any breakdown with fewer than 50 pitches is shown as a simple scatterplot of the historical pitch locations.
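
The 50-pitch rule in the notes above can be sketched as a per-group decision. The helper `plot_kind_by_group` and the constant name are hypothetical; `plot_pitch_location` may apply the threshold differently.

```python
import pandas as pd

MIN_PITCHES_FOR_HEATMAP = 50  # threshold stated in the notes above

def plot_kind_by_group(df, breakdown_var):
    """Hypothetical helper: choose 'heatmap' or 'scatter' for each breakdown level."""
    n_pitches = df.groupby(breakdown_var).size()
    return n_pitches.map(lambda n: 'heatmap' if n >= MIN_PITCHES_FOR_HEATMAP else 'scatter')

# made-up sample: 60 pitches to right-handed batters, 12 to left-handed batters
example = pd.DataFrame({'stand': ['R'] * 60 + ['L'] * 12})
kinds = plot_kind_by_group(example, 'stand')
print(kinds)
```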

In [37]:
# pitch location breakdown options
options = ['none',
           'pitch_name',
           'stand',
           'count_advantage',
           'outs_when_up',
           'runners_on_base',
           'run_differential']

# breakdown_var
breakdown_var_case = widgets.Dropdown(
    options=options,
    value='none',
    description='Pitch Location Breakdown',
    disabled=False
)

In [38]:
breakdown_var_case

Dropdown(description='Pitch Location Breakdown', options=('none', 'pitch_name', 'stand', 'count_advantage', 'o…

In [39]:
widgets.interactive_output(plot_pitch_location, {'data': widgets.fixed(statcast_df),
                                                 'pitcher_name_filter':pitcher_name_case,
                                                 'pitch_name_filter':pitch_name_case,
                                                 'stand_filter':stand_case,
                                                 'batter_name_filter':batter_name_case,
                                                 'count_filter':count_case,
                                                 'count_advantage_filter':count_advantage_case,
                                                 'outs_when_up_filter':outs_when_up_case,
                                                 'inning_filter':inning_case,
                                                 'runners_on_base_filter':runners_on_base_case,
                                                 'run_differential_filter':run_differential_case,
                                                 'breakdown_var_filter':breakdown_var_case})

Output()

### Pitch Movement

Pitch types differ in spin rate and movement, and movement varies greatly by the pitch thrown. Pitchers with more movement variation across their pitches tend to perform better than pitchers whose pitch types move similarly.

Notes: 
* Pitch movement is calculated from the pitcher's perspective. Gravity is not taken into account when calculating vertical or horizontal movement, so a positive vertical movement means the pitch dropped less than it would under gravity alone.
* Horizontal movement is shown relative to the pitcher's perspective. Thus a positive movement corresponds to movement away from a right-handed batter and into a left-handed batter.

In [40]:
widgets.interactive_output(pitch_movement_scatter, {'data': widgets.fixed(statcast_df),
                                                    'pitcher_name_filter':pitcher_name_case,
                                                    'pitch_name_filter':pitch_name_case,
                                                    'stand_filter':stand_case,
                                                    'batter_name_filter':batter_name_case,
                                                    'count_filter':count_case,
                                                    'count_advantage_filter':count_advantage_case,
                                                    'outs_when_up_filter':outs_when_up_case,
                                                    'inning_filter':inning_case,
                                                    'runners_on_base_filter':runners_on_base_case,
                                                    'run_differential_filter':run_differential_case})

Output()

### Pitch Release Point

Where a pitcher releases the ball can also vary greatly by the pitch type thrown.

Pitchers who release different pitch types from a similar arm height and angle tend to perform better than pitchers whose arm height and angle differ greatly by pitch type. Most major league hitters are skilled enough to identify the pitch type from the pitcher's arm angle when release points differ.

The purpose of the visual below is to show if there is significant overlap in release point location by pitch type for a given pitcher.
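
Overlap can also be summarized numerically. The sketch below measures how far each pitch type's average release point sits from the pitcher's overall average, using the `release_pos_x` and `release_pos_z` columns plotted in the chart. The helper `release_point_spread` is hypothetical and is not part of this notebook.

```python
import pandas as pd

def release_point_spread(df):
    """Hypothetical helper: distance (feet) of each pitch type's mean release point from the overall mean."""
    centroids = df.groupby('pitch_name')[['release_pos_x', 'release_pos_z']].mean()
    overall = df[['release_pos_x', 'release_pos_z']].mean()
    # Euclidean distance between each pitch-type centroid and the overall centroid
    return ((centroids - overall) ** 2).sum(axis=1) ** 0.5

# made-up release points for illustration only
example = pd.DataFrame({
    'pitch_name': ['4-Seam Fastball', '4-Seam Fastball', 'Curveball', 'Curveball'],
    'release_pos_x': [-2.0, -2.0, -1.4, -1.4],
    'release_pos_z': [6.0, 6.0, 5.2, 5.2],
})
spread = release_point_spread(example)
print(spread.round(2))
```

Smaller distances suggest that pitch types come out of a similar slot and are harder to tell apart at release.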

In [41]:
widgets.interactive_output(pitch_release_scatter, {'data': widgets.fixed(statcast_df),
                                                   'pitcher_name_filter':pitcher_name_case,
                                                   'pitch_name_filter':pitch_name_case,
                                                   'stand_filter':stand_case,
                                                   'batter_name_filter':batter_name_case,
                                                   'count_filter':count_case,
                                                   'count_advantage_filter':count_advantage_case,
                                                   'outs_when_up_filter':outs_when_up_case,
                                                   'inning_filter':inning_case,
                                                   'runners_on_base_filter':runners_on_base_case,
                                                   'run_differential_filter':run_differential_case})

Output()

## Pitcher Performance

### At-bat Results Type

One way to evaluate a pitcher's ability is to analyze the results of their at-bats. Pitchers are often evaluated by their strikeout and walk rates. This section provides that information along with home-run rate and other possible at-bat results.
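As a rough illustration of how such rates can be derived, the sketch below counts the outcome of each plate appearance. It assumes the Statcast `events` column is populated only on the pitch that ends a plate appearance, with values such as `strikeout`, `walk`, and `home_run`; the function name `at_bat_result_rates` is hypothetical and is not the implementation of `pitch_result_bar`.

```python
import pandas as pd


def at_bat_result_rates(pitches: pd.DataFrame) -> pd.Series:
    """Share of plate appearances ending in each event type."""
    # Keep only the rows that close a plate appearance (non-null `events`).
    # Simplification: every non-null event is treated as a plate appearance.
    plate_appearances = pitches.dropna(subset=["events"])
    return plate_appearances["events"].value_counts(normalize=True)
```

For example, `at_bat_result_rates(df).get("strikeout", 0.0)` gives the strikeout rate for whatever subset of pitches `df` contains.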

In [42]:
widgets.interactive_output(pitch_result_bar, {'data': widgets.fixed(statcast_df),
                                              'pitcher_name_filter':pitcher_name_case,
                                              'pitch_name_filter':pitch_name_case,
                                              'stand_filter':stand_case,
                                              'batter_name_filter':batter_name_case,
                                              'count_filter':count_case,
                                              'count_advantage_filter':count_advantage_case,
                                              'outs_when_up_filter':outs_when_up_case,
                                              'inning_filter':inning_case,
                                              'runners_on_base_filter':runners_on_base_case,
                                              'run_differential_filter':run_differential_case})

Output()

### Batted Ball Type

A more advanced performance measure is based on the type of contact hitters make on their batted balls. MLB defines several contact types.

High-performing pitchers typically allow lower barrel and solid-contact rates.
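The sketch below shows one way barrel and solid-contact rates could be tabulated. It assumes the data frame includes Statcast's `launch_speed_angle` column and that its integer codes 1 through 6 map to the labels shown; both the mapping and the function name `contact_type_rates` are assumptions for illustration, not necessarily what `pitch_bb_type_bar` does.

```python
import pandas as pd

# Assumed mapping of Statcast `launch_speed_angle` codes to contact-type labels.
CONTACT_TYPES = {
    1: "Weak",
    2: "Topped",
    3: "Under",
    4: "Flare/Burner",
    5: "Solid Contact",
    6: "Barrel",
}


def contact_type_rates(pitches: pd.DataFrame) -> pd.Series:
    """Share of classified batted balls falling into each contact type."""
    # Only batted balls carry a contact classification; other pitches are NaN.
    batted = pitches.dropna(subset=["launch_speed_angle"])
    labels = batted["launch_speed_angle"].astype(int).map(CONTACT_TYPES)
    return labels.value_counts(normalize=True)
```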

In [43]:
widgets.interactive_output(pitch_bb_type_bar, {'data': widgets.fixed(statcast_df),
                                               'pitcher_name_filter':pitcher_name_case,
                                               'pitch_name_filter':pitch_name_case,
                                               'stand_filter':stand_case,
                                               'batter_name_filter':batter_name_case,
                                               'count_filter':count_case,
                                               'count_advantage_filter':count_advantage_case,
                                               'outs_when_up_filter':outs_when_up_case,
                                               'inning_filter':inning_case,
                                               'runners_on_base_filter':runners_on_base_case,
                                               'run_differential_filter':run_differential_case})

Output()

### Batted Ball Location

In addition to results and contact types, the locations of batted balls allowed by a pitcher may determine how defenses position themselves against certain hitters.

Use the filters above to select a hitter stance and/or a specific hitter and view the resulting batted ball locations.
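Batted ball locations can also be summarized as a spray angle, which is convenient for describing pull and opposite-field tendencies. The sketch below is a hedged illustration: it assumes the Statcast hit-coordinate columns `hc_x` and `hc_y`, and it places home plate at roughly (125.42, 198.27), a commonly used approximation rather than an official constant. The function name `add_spray_angle` is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed home-plate location in the hit-coordinate system (an approximation).
HOME_X, HOME_Y = 125.42, 198.27


def add_spray_angle(pitches: pd.DataFrame) -> pd.DataFrame:
    """Return batted balls with a `spray_angle` column in degrees.

    0 points to straightaway center field; under the assumed coordinate
    system, negative angles lean toward left field and positive toward right.
    """
    batted = pitches.dropna(subset=["hc_x", "hc_y"]).copy()
    dx = batted["hc_x"] - HOME_X   # horizontal offset from home plate
    dy = HOME_Y - batted["hc_y"]   # depth; hc_y is assumed to shrink with distance
    batted["spray_angle"] = np.degrees(np.arctan2(dx, dy))
    return batted
```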

In [44]:
widgets.interactive_output(pitch_bb_location, {'data': widgets.fixed(statcast_df),
                                               'pitcher_name_filter':pitcher_name_case,
                                               'pitch_name_filter':pitch_name_case,
                                               'stand_filter':stand_case,
                                               'batter_name_filter':batter_name_case,
                                               'count_filter':count_case,
                                               'count_advantage_filter':count_advantage_case,
                                               'outs_when_up_filter':outs_when_up_case,
                                               'inning_filter':inning_case,
                                               'runners_on_base_filter':runners_on_base_case,
                                               'run_differential_filter':run_differential_case})

Output()

### MLB Comparison

Several modern statistics are used to assess pitcher performance:
* Strike %: The percentage of pitches thrown that are deemed strikes.
* Whiff %: The percentage of pitches thrown that resulted in a swing and miss.
* wOBA: A version of on-base percentage that accounts for how a player reached base. The value for each method of reaching base is determined by how much that event is worth in relation to projected runs scored (example: a double is worth more than a single) [(source)](https://www.mlb.com/glossary/advanced-stats/weighted-on-base-average).
* Exit Velocity: How fast, in miles per hour, a ball was hit by a batter (on average).
* Spin Rate: How much spin, in revolutions per minute, a pitch was thrown with (on average).
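As a minimal sketch of the first two definitions, the snippet below computes Strike % and Whiff % per pitch thrown from the Statcast `description` column. Which `description` values count as strikes or whiffs is an assumption made here for illustration (for example, balls put in play are counted as strikes), and `strike_and_whiff_pct` is a hypothetical helper name.

```python
import pandas as pd

# Assumed groupings of Statcast `description` values; adjust to your own definitions.
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}
STRIKES = WHIFFS | {"called_strike", "foul", "foul_tip", "hit_into_play"}


def strike_and_whiff_pct(pitches: pd.DataFrame) -> pd.Series:
    """Strike % and Whiff %, both expressed as a percentage of all pitches thrown."""
    descriptions = pitches["description"]
    return pd.Series({
        "Strike %": 100 * descriptions.isin(STRIKES).mean(),
        "Whiff %": 100 * descriptions.isin(WHIFFS).mean(),
    })
```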

We can compare each pitcher to the rest of the league by analyzing their percentile rank among MLB pitchers in each of these statistical categories. The higher the percentile, the better the pitcher performs in that category relative to other MLB pitchers. The radar chart below displays a relative comparison of the selected pitcher across each category, showing the statistics in which the pitcher performs exceptionally well or poorly.

Use the filters above to determine percentile ranks for pitch types, hitter stance, scenarios, etc.
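The percentile ranks themselves can be sketched with pandas' `rank(pct=True)`. The example below assumes a data frame with one row per pitcher and one column per statistic; the helper `percentile_ranks` and its `lower_is_better` argument are hypothetical, and deciding which statistics to invert (for instance wOBA or exit velocity allowed) is left as an assumption for the reader.

```python
import pandas as pd


def percentile_ranks(pitcher_stats: pd.DataFrame, lower_is_better=()) -> pd.DataFrame:
    """Percentile rank (0-100) of each pitcher within each statistic column."""
    ranks = pitcher_stats.rank(pct=True) * 100
    for column in lower_is_better:
        # Invert so that a higher percentile always means a better result for the pitcher.
        ranks[column] = pitcher_stats[column].rank(pct=True, ascending=False) * 100
    return ranks
```

Selecting one row of the result, such as `percentile_ranks(stats, lower_is_better=["wOBA"]).loc[pitcher]`, yields the values a radar chart would plot for that pitcher.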

In [45]:
widgets.interactive_output(pitcher_compare, {'data': widgets.fixed(statcast_df),
                                             'pitcher_name_filter':pitcher_name_case,
                                             'pitch_name_filter':pitch_name_case,
                                             'stand_filter':stand_case,
                                             'batter_name_filter':batter_name_case,
                                             'count_filter':count_case,
                                             'count_advantage_filter':count_advantage_case,
                                             'outs_when_up_filter':outs_when_up_case,
                                             'inning_filter':inning_case,
                                             'runners_on_base_filter':runners_on_base_case,
                                             'run_differential_filter':run_differential_case})

Output()

In [46]:
from platform import python_version

print(python_version())

3.9.13


### Times Through Lineup

Hitters often perform better with each additional plate appearance against the same pitcher in a game. The highest-performing pitchers typically show little decline in performance as they face the order multiple times.

Select a performance measure to see whether the pitcher's average performance drops off as the pitcher faces hitters multiple times within a game.
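One way to approximate "times through the order" is to count how many times a pitcher has faced each batter within a game. The sketch below assumes the Statcast columns `game_pk`, `pitcher`, `batter`, and `at_bat_number`; the helper name `add_times_through_order` and the new `times_through_order` column are hypothetical, and this is not necessarily how `pitcher_tto_line` derives the value.

```python
import pandas as pd


def add_times_through_order(pitches: pd.DataFrame) -> pd.DataFrame:
    """Label each pitch with the number of times the pitcher has faced that batter in the game."""
    out = pitches.copy()
    # All pitches in one plate appearance share an `at_bat_number`, so a dense
    # rank within each game/pitcher/batter group yields 1 for the first
    # meeting, 2 for the second, and so on.
    out["times_through_order"] = (
        out.groupby(["game_pk", "pitcher", "batter"])["at_bat_number"]
        .rank(method="dense")
        .astype(int)
    )
    return out
```

With that column in place, `labeled.groupby("times_through_order")[metric].mean()` gives the average of any per-pitch metric column by time through the order. Counting by batter is an approximation: substitutions such as pinch hitters can make it differ from a strict lineup-position count.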

In [47]:
# performance indicator options for the times-through-the-lineup breakdown
options = ['Strike %',
           'Whiff %',
           'wOBA',
           'Exit Velocity',
           'Spin Rate']

# dropdown widget for choosing which performance indicator to plot
breakdown_tto_var_case = widgets.Dropdown(
    options=options,
    value='Strike %',
    description='Pitcher Performance Indicator',
    disabled=False
)

In [48]:
breakdown_tto_var_case

Dropdown(description='Pitcher Performance Indicator', options=('Strike %', 'Whiff %', 'wOBA', 'Exit Velocity',…

In [49]:
widgets.interactive_output(pitcher_tto_line, {'data': widgets.fixed(statcast_df),
                                              'pitcher_name_filter':pitcher_name_case,
                                              'pitch_name_filter':pitch_name_case,
                                              'stand_filter':stand_case,
                                              'batter_name_filter':batter_name_case,
                                              'count_filter':count_case,
                                              'count_advantage_filter':count_advantage_case,
                                              'outs_when_up_filter':outs_when_up_case,
                                              'inning_filter':inning_case,
                                              'runners_on_base_filter':runners_on_base_case,
                                              'run_differential_filter':run_differential_case,
                                              'breakdown_tto_var_filter':breakdown_tto_var_case})

Output()