Calculating the best pitch framer

Methodology - Instead of using traditional framing methodologies, like comparing where a pitch crossed the plate to an exact strike zone, I compare each pitch to the 50 pitches closest to it in location (plate_x, plate_z). The share of those 50 neighbors that were called strikes is the probability of that pitch being called a strike.
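The nearest-neighbor idea can be sketched in vectorized NumPy. This is a minimal sketch, not the notebook's own code: the function name `knn_strike_prob` and the toy inputs are hypothetical. It builds a full n×n distance matrix, which is fine for a few thousand pitches. For a whole season, a spatial index such as `scipy.spatial.cKDTree` would avoid that quadratic memory cost.

```python
import numpy as np

def knn_strike_prob(coords, is_strike, k=50):
    """Expected strike probability for each pitch: the share of called strikes
    among its k nearest pitches by plate location, excluding the pitch itself."""
    coords = np.asarray(coords, dtype=float)        # shape (n, 2): plate_x, plate_z
    is_strike = np.asarray(is_strike, dtype=float)  # 1 = called strike, 0 = ball
    # pairwise squared Euclidean distances, shape (n, n)
    diff = coords[:, None, :] - coords[None, :, :]
    dist_2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist_2, np.inf)  # a pitch is never its own neighbor
    # indices of the k smallest distances in each row (order within the k is irrelevant)
    nearest = np.argpartition(dist_2, k - 1, axis=1)[:, :k]
    return is_strike[nearest].mean(axis=1)

# hypothetical toy data: three called strikes in one tight cluster, three balls in another
coords = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]
is_strike = [1, 1, 1, 0, 0, 0]
print(knn_strike_prob(coords, is_strike, k=2))
```

With `k=2` every pitch's neighbors come from its own cluster, so the strikes get a probability of 1 and the balls 0. With a larger `k` the estimate starts pulling in pitches from farther away, which is the smoothing trade-off behind choosing 50.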

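Using these probabilities, we can check whether each catcher is over- or underperforming this average benchmark. For every pitch he receives, take the actual call (1 for a strike, 0 for a ball), subtract the pitch's strike probability, and average the differences. A positive difference means more strikes than expected; a negative one means fewer. For example, suppose the final result shows that Austin Hedges has a difference of 0.05. That means that on any given pitch, Hedges has a 5 percentage point higher chance of getting a strike called than the average catcher. Pretty nice!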
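The benchmark comparison is just an average of actual minus expected per catcher. Below is a minimal pure-Python sketch of that arithmetic. The function name `framing_diff`, the catcher labels, and the probabilities are all made up for illustration. The notebook itself does this with a pandas `groupby`.

```python
from collections import defaultdict

def framing_diff(pitches):
    """Mean of (actual call - expected strike probability) per catcher.
    `pitches` is an iterable of (catcher, called_strike, expected_prob) tuples,
    where called_strike is 1 for a strike and 0 for a ball."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for catcher, called_strike, expected in pitches:
        totals[catcher] += called_strike - expected
        counts[catcher] += 1
    return {c: totals[c] / counts[c] for c in totals}

# hypothetical catchers and made-up probabilities, for illustration only
pitches = [
    ("catcher_a", 1, 0.60), ("catcher_a", 1, 0.70), ("catcher_a", 0, 0.40), ("catcher_a", 0, 0.10),
    ("catcher_b", 0, 0.60), ("catcher_b", 1, 0.70), ("catcher_b", 0, 0.40), ("catcher_b", 0, 0.10),
]
print(framing_diff(pitches))
```

With these numbers catcher_a averages about +0.05: strikes are called 5 percentage points more often than the neighboring pitches predict. Catcher_b comes out at about -0.20.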

In [1]:
from pybaseball import statcast
import pandas as pd
from pybaseball import playerid_reverse_lookup
from tqdm import tqdm
import numpy as np

In [2]:
data_2021 = statcast(start_dt='2021-04-01', end_dt='2021-06-10')

This is a large query, it may take a moment to complete
Completed sub-query from 2021-04-01 to 2021-04-06
Completed sub-query from 2021-04-07 to 2021-04-12
Completed sub-query from 2021-04-13 to 2021-04-18
Completed sub-query from 2021-04-19 to 2021-04-24
Completed sub-query from 2021-04-25 to 2021-04-30
Completed sub-query from 2021-05-01 to 2021-05-06
Completed sub-query from 2021-05-07 to 2021-05-12
Completed sub-query from 2021-05-13 to 2021-05-18
Completed sub-query from 2021-05-19 to 2021-05-24
Completed sub-query from 2021-05-25 to 2021-05-30
Completed sub-query from 2021-05-31 to 2021-06-05
Completed sub-query from 2021-06-06 to 2021-06-10


In [3]:
# keep only taken pitches that drew a ball or called-strike decision
# (these two descriptions already exclude balls in play, type 'X')
data_2021 = data_2021[data_2021['description'].isin(['ball', 'called_strike'])].copy()

In [4]:
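# drop pitches without a recorded plate location
data_2021 = data_2021[data_2021['plate_x'].notna() & data_2021['plate_z'].notna()].copy()
# encode the call in the 'type' column only: strike = 1, ball = 0
data_2021['type'] = data_2021['type'].map({'S': 1, 'B': 0})
# a fresh 0..n-1 index keeps row labels equal to row positions
data_2021.reset_index(drop=True, inplace=True)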

In [5]:
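def closest_node(node, nodes, k=50):
    # positions of the k rows in `nodes` nearest to `node` (squared Euclidean distance)
    nodes = np.asarray(nodes)
    dist_2 = np.sum((nodes - node)**2, axis=1)
    return np.argsort(dist_2)[:k]

def mean_ball_strike(df, position):
    # share of called strikes among the 50 pitches nearest to the pitch at `position`;
    # neighbors are searched among the *other* pitches, and the returned positions
    # index into that same reduced frame so they stay aligned with the right rows
    others = df[['x_y', 'type']].drop(df.index[position])
    nearest = closest_node(df['x_y'].iloc[position], others['x_y'].tolist())
    return others['type'].iloc[nearest].mean()

def loop_impl(df):
    # expected strike probability ('str_pct') for every pitch in df
    result = [mean_ball_strike(df, i) for i in tqdm(range(len(df)))]
    return df.assign(str_pct=result)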

In [6]:
data_2021['x_y'] = data_2021[['plate_x', 'plate_z']].values.tolist()

In [7]:
# only the first 10,000 taken pitches, to keep the brute-force neighbor search quick
data_test = loop_impl(data_2021.head(10000))

100%|███████████████████████████████████████████████████████████████████████████| 10000/10000 [00:51<00:00, 193.08it/s]


In [11]:
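# 'fielder_2.1' holds the catcher's MLBAM id; keep catchers with more than 100 taken pitches
data_test2 = data_test.groupby('fielder_2.1').filter(lambda x: len(x) > 100)
# actual call minus expected strike probability: positive = more strikes than expected
data_test2['diff'] = data_test2['type'] - data_test2['str_pct']
framing = data_test2.groupby('fielder_2.1')['diff'].mean().sort_values()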

In [12]:
# a list of mlbam ids
player_ids = framing.index.to_list()

# find the names of the players in player_ids, along with their ids from other data sources
data = playerid_reverse_lookup(player_ids, key_type='mlbam')
data.set_index('key_mlbam', inplace=True)

Gathering player lookup table. This may take a moment.


In [13]:
data_all = data.join(framing)
data_all.sort_values('diff')

           name_last  name_first  key_retro  key_bbref  key_fangraphs  mlb_played_first  mlb_played_last       diff
key_mlbam
608596        murphy         tom   murpt002  murphto04          13499            2015.0           2021.0  -0.076532
642851         wynns      austin   wynna001  wynnsau01          15271            2018.0           2021.0  -0.063846
543228         gomes         yan   gomey001  gomesya01           9627            2012.0           2021.0  -0.053619
519390          vogt     stephen   vogts001   vogtst01           5000            2012.0           2021.0  -0.051667
669257         smith        will   smitw003  smithwi05          19197            2019.0           2021.0  -0.048745
661388     contreras     william   contw002  contrwi02             -1            2020.0           2021.0  -0.031728
668670        rogers        jake   rogej004  rogerja03          19452            2019.0           2021.0  -0.029212
621512          nido       tomas   nidot001   nidoto01          13755            2017.0           2021.0  -0.028281
553882       narvaez        omar   narvo001  narvaom01          13338            2016.0           2021.0  -0.024479
624512       mcguire       reese   mcgur002  mcguire01          15674            2018.0           2021.0  -0.020388
