**Quantifying Pitcher Deception**

In addition to how nasty a pitch is in terms of velocity, movement, and location, the difference between what the batter expects and what they actually get can play a large role in the success of a pitch.

This difference between expectation and reality could refer to pitch selection, that is, which pitch is thrown versus which pitch the pitcher tends to throw in that count, but we will leave that aside for today. Instead, we will investigate the difference between the movement a batter would expect from a pitcher's release point and the actual movement on the pitch. We will call this the deception of the pitch. Intuitively, a batter might expect a 4-seam fastball thrown from a more horizontal arm-slot to have less ride than one thrown from a more vertical arm-slot. If a pitcher can get a lot of ride on their 4-seam from a horizontal arm-slot, they might find a lot of batters swinging under it.

I will use Statcast data from the 2024 MLB season to try to model deception.

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pybaseball import cache, statcast
%matplotlib inline

#cache Statcast queries locally so repeated runs are faster
cache.enable()

In [6]:
stat_data = statcast(start_dt="2024-03-19", end_dt = "2024-10-16")

This is a large query, it may take a moment to complete


100%|██████████| 212/212 [00:21<00:00, 10.01it/s]


We will start by selecting the columns we need. We will also mirror the horizontal movement and release points of left-handed pitchers (LHP) so that they are on equal footing with right-handed pitchers (RHP).

In [10]:
#restrict to data we care about; reset_index() keeps the old row labels in an
#'index' column, which is used later to count pitches
df = stat_data[['pitch_type', 'release_pos_x', 'release_pos_z', 'player_name', 'pitcher', 'p_throws', 'pfx_x', 'pfx_z']].dropna().reset_index()
#mirror the pfx_x and release_pos_x of LHP to match RHP
df.loc[df['p_throws'] == 'L', 'pfx_x'] = df['pfx_x']*(-1)
df.loc[df['p_throws'] == 'L', 'release_pos_x'] = df['release_pos_x']*(-1)
#add horizontal arm angle feature (angle of the release point from the horizontal axis)
df['h_release_angle'] = np.arctan(df['release_pos_z'] / df['release_pos_x'])
#add vertical arm angle feature (angle of the release point from the vertical axis)
df['v_release_angle'] = np.arctan(df['release_pos_x'] / df['release_pos_z'])
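
As a quick sanity check of the mirroring step, here is a minimal sketch on a made-up two-pitch frame (the values are illustrative, not Statcast data). After the sign flip, the left-hander's release point and horizontal break should land on the same side as the right-hander's, and the two angle features should come out identical for both pitches.

```python
import numpy as np
import pandas as pd

# Toy data (made-up values): one RHP pitch and one mirror-image LHP pitch
toy = pd.DataFrame({
    'p_throws': ['R', 'L'],
    'release_pos_x': [-2.0, 2.0],
    'release_pos_z': [6.0, 6.0],
    'pfx_x': [-0.7, 0.7],
})

# Flip the horizontal columns for left-handers only
lhp = toy['p_throws'] == 'L'
toy.loc[lhp, ['release_pos_x', 'pfx_x']] *= -1

# Same angle features as in the cell above
toy['h_release_angle'] = np.arctan(toy['release_pos_z'] / toy['release_pos_x'])
toy['v_release_angle'] = np.arctan(toy['release_pos_x'] / toy['release_pos_z'])

print(toy)
```

In this toy frame the two angles sum to -π/2 for each pitch, so they carry the same information in two different forms.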

I have removed sweepers and combined curveballs and knuckle-curves into one category.

In [40]:
df_SL = df[(df['pitch_type'] == 'SL')]
df_FF = df[df['pitch_type'] == 'FF']
df_CH = df[df['pitch_type'] == 'CH']
df_CU = df[(df['pitch_type'] == 'CU') | (df['pitch_type'] == 'KC')]
df_SI = df[df['pitch_type'] == 'SI']
df_FC = df[df['pitch_type'] == 'FC']
df_FS = df[df['pitch_type'] == 'FS']


I selected a KNeighborsRegressor because it is similar in spirit to what a batter does: he instinctively compares the release he sees with releases he has seen in the past. I set the number of neighbors to 10. This value of k balances the better performance of higher values of k against the fact that a batter cannot memorize every pitch he has ever seen; to measure how much a pitch deceives the batter, we want the model to imitate the batter a bit.

In [23]:
from sklearn.neighbors import KNeighborsRegressor

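def my_model(data):
    #features: release point and the two release angles
    X = data[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]
    #targets: horizontal and vertical movement, predicted jointly
    y = data[['pfx_x', 'pfx_z']]

    #fit() returns the fitted estimator itself
    clf = KNeighborsRegressor(n_neighbors = 10)
    clf_name = clf.fit(X, y)
    return clf_name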
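
To make the "compare with similar releases" idea concrete, here is a small sketch on synthetic data (not Statcast). It illustrates that a `KNeighborsRegressor` prediction with the default uniform weights is the average movement of the k nearest release points. The two-feature setup and the random values are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic "release points" (x, z) and "movement" (pfx_x, pfx_z); made-up values
X = rng.normal(loc=[-2.0, 6.0], scale=0.5, size=(200, 2))
y = rng.normal(loc=[-0.6, 1.3], scale=0.3, size=(200, 2))

knn = KNeighborsRegressor(n_neighbors=10).fit(X, y)

# Movement expected from one release point
query = np.array([[-2.0, 6.0]])
pred = knn.predict(query)

# The same numbers by hand: average the targets of the 10 nearest neighbors
_, idx = knn.kneighbors(query)
by_hand = y[idx[0]].mean(axis=0)

print(pred[0], by_hand)
```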

In [87]:
#compute average release point for a given pitch for all pitchers
def averages(df1):
    df2 = pd.DataFrame()
    df2['release_pos_x'] = df1.groupby('player_name').release_pos_x.agg('mean')
    df2['release_pos_z'] = df1.groupby('player_name').release_pos_z.agg('mean')
    df2['h_release_angle'] = df1.groupby('player_name').h_release_angle.agg('mean')
    df2['v_release_angle'] = df1.groupby('player_name').v_release_angle.agg('mean')
    df2['pfx_x'] = df1.groupby('player_name').pfx_x.agg('mean')
    df2['pfx_z'] = df1.groupby('player_name').pfx_z.agg('mean')
    
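    #note: counted from the full df, so this is each pitcher's total across all
    #pitch types, and the threshold below applies to that total
    df2['tot_pitches'] = df.groupby('player_name').index.count()
    df2 = df2[df2['tot_pitches'] > 100]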

    return df2
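
The `averages` function filters on a pitcher's overall pitch count, since `tot_pitches` is computed from the full `df` rather than from `df1`. If one instead wanted the threshold to apply to the specific pitch type being averaged, a variant might look like the sketch below. The name `averages_by_pitch`, the `min_pitches` parameter, and the toy frame are all hypothetical and only meant to illustrate the idea.

```python
import pandas as pd

FEATURES = ['release_pos_x', 'release_pos_z', 'h_release_angle',
            'v_release_angle', 'pfx_x', 'pfx_z']

def averages_by_pitch(df1, min_pitches=100):
    """Per-pitcher means for one pitch type, filtered on that pitch's own count."""
    grouped = df1.groupby('player_name')
    out = grouped[FEATURES].mean()
    # size() counts only the rows in df1, i.e. pitches of this type
    out['n_pitches'] = grouped.size()
    return out[out['n_pitches'] > min_pitches]

# Toy usage with made-up values: pitcher A has 3 pitches of this type, pitcher B has 1
toy = pd.DataFrame({
    'player_name': ['A', 'A', 'A', 'B'],
    'release_pos_x': [-2.0, -2.1, -1.9, -1.0],
    'release_pos_z': [6.0, 6.0, 6.0, 5.5],
    'h_release_angle': [-1.25, -1.23, -1.26, -1.39],
    'v_release_angle': [-0.32, -0.34, -0.31, -0.18],
    'pfx_x': [-0.6, -0.7, -0.5, -0.9],
    'pfx_z': [1.3, 1.4, 1.2, 0.6],
})
result = averages_by_pitch(toy, min_pitches=2)
print(result)
```

With a per-type threshold, a pitcher who threw only a handful of a given pitch would drop out of that pitch's table even if his overall pitch count is high.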



Now that we have computed the average release positions of all pitchers for a given pitch, we can use our model to compare the expected break on the ball to the actual average break on that pitch for that pitcher.

In [90]:
#fastball deception list
avg_FF = averages(df_FF)
pred_FF = pd.DataFrame(my_model(df_FF).predict(avg_FF[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_FF['pred_pfx_x'] = pred_FF[0].tolist()
avg_FF['pred_pfx_z'] = pred_FF[1].tolist()
avg_FF['x_dec'] = avg_FF['pfx_x'] - avg_FF['pred_pfx_x']
avg_FF['z_dec'] = avg_FF['pfx_z'] - avg_FF['pred_pfx_z']
avg_FF['FF_dec'] = np.abs(avg_FF['x_dec']) + np.abs(avg_FF['z_dec'])
FF_Dec = avg_FF.sort_values(by = ['FF_dec'], ascending = False)
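
Written out, the quantities computed in this cell for a pitcher with average release features $r$ are as follows, where the bar denotes the pitcher's average movement and the hat denotes the model's prediction at $r$. The notation is mine and is only meant to summarize the code.

```latex
\begin{aligned}
x_{\mathrm{dec}} &= \overline{\mathrm{pfx}_x} - \widehat{\mathrm{pfx}_x}(r) \\
z_{\mathrm{dec}} &= \overline{\mathrm{pfx}_z} - \widehat{\mathrm{pfx}_z}(r) \\
\mathrm{dec} &= \lvert x_{\mathrm{dec}} \rvert + \lvert z_{\mathrm{dec}} \rvert
\end{aligned}
```

As a check against the fastball table shown in the next section, Erik Sabrowski's row gives $x_{\mathrm{dec}} = -0.62109 - (-0.671) = 0.04991$, $z_{\mathrm{dec}} = 1.676859 - 1.255 = 0.421859$, and $\mathrm{dec} = 0.04991 + 0.421859 = 0.471769$.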

In [92]:
#slider deception list
avg_SL = averages(df_SL)
pred_SL = pd.DataFrame(my_model(df_SL).predict(avg_SL[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_SL['pred_pfx_x'] = pred_SL[0].tolist()
avg_SL['pred_pfx_z'] = pred_SL[1].tolist()
avg_SL['x_dec'] = avg_SL['pfx_x'] - avg_SL['pred_pfx_x']
avg_SL['z_dec'] = avg_SL['pfx_z'] - avg_SL['pred_pfx_z']
avg_SL['SL_dec'] = np.abs(avg_SL['x_dec']) + np.abs(avg_SL['z_dec'])
SL_Dec = avg_SL.sort_values(by = ['SL_dec'], ascending = False)


In [94]:
#curveball deception list
avg_CU = averages(df_CU)
pred_CU = pd.DataFrame(my_model(df_CU).predict(avg_CU[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_CU['pred_pfx_x'] = pred_CU[0].tolist()
avg_CU['pred_pfx_z'] = pred_CU[1].tolist()
avg_CU['x_dec'] = avg_CU['pfx_x'] - avg_CU['pred_pfx_x']
avg_CU['z_dec'] = avg_CU['pfx_z'] - avg_CU['pred_pfx_z']
avg_CU['CU_dec'] = np.abs(avg_CU['x_dec']) + np.abs(avg_CU['z_dec'])
CU_Dec = avg_CU.sort_values(by = ['CU_dec'], ascending = False)

In [96]:
#changeup deception
avg_CH = averages(df_CH)
pred_CH = pd.DataFrame(my_model(df_CH).predict(avg_CH[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_CH['pred_pfx_x'] = pred_CH[0].tolist()
avg_CH['pred_pfx_z'] = pred_CH[1].tolist()
avg_CH['x_dec'] = avg_CH['pfx_x'] - avg_CH['pred_pfx_x']
avg_CH['z_dec'] = avg_CH['pfx_z'] - avg_CH['pred_pfx_z']
avg_CH['CH_dec'] = np.abs(avg_CH['x_dec']) + np.abs(avg_CH['z_dec'])
CH_Dec = avg_CH.sort_values(by = ['CH_dec'], ascending = False)


In [98]:
#sinker deception
avg_SI = averages(df_SI)
pred_SI = pd.DataFrame(my_model(df_SI).predict(avg_SI[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_SI['pred_pfx_x'] = pred_SI[0].tolist()
avg_SI['pred_pfx_z'] = pred_SI[1].tolist()
avg_SI['x_dec'] = avg_SI['pfx_x'] - avg_SI['pred_pfx_x']
avg_SI['z_dec'] = avg_SI['pfx_z'] - avg_SI['pred_pfx_z']
avg_SI['SI_dec'] = np.abs(avg_SI['x_dec']) + np.abs(avg_SI['z_dec'])
SI_Dec = avg_SI.sort_values(by = ['SI_dec'], ascending = False)

In [100]:
#cutter deception list
avg_FC = averages(df_FC)
pred_FC = pd.DataFrame(my_model(df_FC).predict(avg_FC[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_FC['pred_pfx_x'] = pred_FC[0].tolist()
avg_FC['pred_pfx_z'] = pred_FC[1].tolist()
avg_FC['x_dec'] = avg_FC['pfx_x'] - avg_FC['pred_pfx_x']
avg_FC['z_dec'] = avg_FC['pfx_z'] - avg_FC['pred_pfx_z']
avg_FC['FC_dec'] = np.abs(avg_FC['x_dec']) + np.abs(avg_FC['z_dec'])
FC_Dec = avg_FC.sort_values(by = ['FC_dec'], ascending = False)

In [102]:
#Splitter Deception list
avg_FS = averages(df_FS)
pred_FS = pd.DataFrame(my_model(df_FS).predict(avg_FS[['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']]))
avg_FS['pred_pfx_x'] = pred_FS[0].tolist()
avg_FS['pred_pfx_z'] = pred_FS[1].tolist()
avg_FS['x_dec'] = avg_FS['pfx_x'] - avg_FS['pred_pfx_x']
avg_FS['z_dec'] = avg_FS['pfx_z'] - avg_FS['pred_pfx_z']
avg_FS['FS_dec'] = np.abs(avg_FS['x_dec']) + np.abs(avg_FS['z_dec'])
FS_Dec = avg_FS.sort_values(by = ['FS_dec'], ascending = False)
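
The seven cells above differ only in the pitch label, so the same steps could be wrapped in one helper. The sketch below shows one way to do that. `add_deception` is a hypothetical name, and the example runs on synthetic data with a stand-in model so that it is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

FEATURES = ['release_pos_x', 'release_pos_z', 'h_release_angle', 'v_release_angle']

def add_deception(avg, model, label):
    """Append predicted movement and deception columns for one pitch type."""
    out = avg.copy()
    pred = model.predict(out[FEATURES])  # array of shape (n_pitchers, 2)
    out['pred_pfx_x'] = pred[:, 0]
    out['pred_pfx_z'] = pred[:, 1]
    out['x_dec'] = out['pfx_x'] - out['pred_pfx_x']
    out['z_dec'] = out['pfx_z'] - out['pred_pfx_z']
    out[label + '_dec'] = out['x_dec'].abs() + out['z_dec'].abs()
    return out.sort_values(by=label + '_dec', ascending=False)

# Synthetic stand-in for one pitch type (made-up values, three fake pitchers)
rng = np.random.default_rng(1)
n = 300
pitches = pd.DataFrame({
    'player_name': rng.choice(['A', 'B', 'C'], size=n),
    'release_pos_x': rng.normal(-2.0, 0.3, n),
    'release_pos_z': rng.normal(6.0, 0.3, n),
    'pfx_x': rng.normal(-0.6, 0.2, n),
    'pfx_z': rng.normal(1.3, 0.2, n),
})
pitches['h_release_angle'] = np.arctan(pitches['release_pos_z'] / pitches['release_pos_x'])
pitches['v_release_angle'] = np.arctan(pitches['release_pos_x'] / pitches['release_pos_z'])

model = KNeighborsRegressor(n_neighbors=10).fit(pitches[FEATURES], pitches[['pfx_x', 'pfx_z']])
avg = pitches.groupby('player_name')[FEATURES + ['pfx_x', 'pfx_z']].mean()
result = add_deception(avg, model, 'FF')
print(result[['x_dec', 'z_dec', 'FF_dec']])
```

With the notebook's own objects, the fastball cell would then reduce to something like `FF_Dec = add_deception(averages(df_FF), my_model(df_FF), 'FF')`.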


**The most deceptive pitches**

Let's look at the most deceptive pitches in the league last season, starting with the pitchers who get the most ride over expected on their fastballs.

In [109]:
FF_Dec.sort_values(by = ['z_dec'], ascending = False).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,FF_dec
"Sabrowski, Erik",-1.228333,5.986667,-1.368482,-0.202315,-0.62109,1.676859,278,-0.671,1.255,0.04991,0.421859,0.471769
"Henriquez, Ronny",-1.970112,5.487303,-1.226437,-0.344359,-0.62427,1.51427,303,-0.636,1.106,0.01173,0.40827,0.42
"Kopech, Michael",-2.294806,5.729372,-1.189777,-0.38102,-0.808848,1.518984,1225,-0.639,1.168,-0.169848,0.350984,0.520832
"Matsui, Yuki",-2.014739,6.01254,-1.247532,-0.323264,-0.353673,1.648367,1076,-0.722,1.301,0.368327,0.347367,0.715694
"Gillispie, Connor",-1.835517,5.419138,-1.244253,-0.326543,-0.761034,1.436724,138,-0.658,1.09,-0.103034,0.346724,0.449759
"Gore, MacKenzie",-1.94134,5.848304,-1.250263,-0.320533,-0.492842,1.465745,2994,-0.639,1.122,0.146158,0.343745,0.489903
"Hernandez, Nick",-2.188077,5.694231,-1.20405,-0.366746,-0.260192,1.581346,139,-0.764,1.262,0.503808,0.319346,0.823154
"Eisert, Brandon",-3.618197,5.116066,-0.955781,-0.615015,-0.717213,1.354262,149,-0.673,1.049,-0.044213,0.305262,0.349475
"Burke, Sean",-2.2424,6.1536,-1.221228,-0.349569,-0.5348,1.54776,309,-0.538,1.245,0.0032,0.30276,0.30596
"Pepiot, Ryan",-1.00655,5.945808,-1.402778,-0.168018,-0.731467,1.621415,2291,-0.405,1.328,-0.326467,0.293415,0.619882


These seem to be mostly pitchers who get a lot of ride on their fastball regardless of their arm-slot.

We also see something interesting if we look at the pitchers with the least ride on their fastballs relative to expected. One thing that jumps out to me is Framber Valdez, who has the second-most drop on his fastball compared to expected.

If we dig into some of these pitchers' usage numbers, we see that they rarely throw the 4-seam fastball and throw a lot of sinkers. My hypothesis is that these pitches classified as fastballs are actually intended to be sinkers.

This also points to a problem with our model. We want to give credit to pitchers with more ride on their 4-seam fastball than expected, but for pitchers whose 2-seamers or sinkers get classified as fastballs by Statcast, we want to do the opposite and give credit for extra drop.

In [112]:
FF_Dec.sort_values(by = ['z_dec']).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,FF_dec
"Mayza, Tim",-1.575,6.33,-1.327003,-0.243793,-0.13,0.573333,761,-0.601,1.495,0.471,-0.921667,1.392667
"Valdez, Framber",-0.602,5.987,-1.470075,-0.100721,-0.842333,0.551667,2676,-0.46,1.472,-0.382333,-0.920333,1.302667
"McFarland, T.J.",-2.09,5.366,-1.199271,-0.371526,-1.498,0.256,836,-0.718,1.057,-0.78,-0.801,1.581
"King, John",-1.900769,5.436923,-1.234402,-0.336395,-0.950769,0.616154,878,-0.722,1.333,-0.228769,-0.716846,0.945615
"Blach, Ty",-2.990921,5.734501,-1.09013,-0.480667,-1.090716,0.498798,1133,-0.847,1.203,-0.243716,-0.704202,0.947918
"Kitchen, Austin",-1.487727,5.570909,-1.309764,-0.261033,-1.025,0.632273,158,-0.539,1.324,-0.486,-0.691727,1.177727
"Zulueta, Yosver",-1.39,5.910714,-1.339774,-0.231022,-0.77,0.722143,286,-0.578,1.353,-0.192,-0.630857,0.822857
"Hudson, Dakota",-2.240741,6.142551,-1.221284,-0.349513,-0.453086,0.673868,1595,-0.491,1.302,0.037914,-0.628132,0.666045
"Peralta, Sammy",-2.631603,4.79771,-1.068626,-0.502171,-1.211374,0.572443,282,-0.888,1.197,-0.323374,-0.624557,0.947931
"Herget, Jimmy",-2.05,5.203333,-1.195344,-0.375452,-1.280909,0.715758,189,-0.814,1.312,-0.466909,-0.596242,1.063152


Let's look at the most deceptive pitches in other categories now.

In [128]:
#Sliders that move more horizontally than expected
SL_Dec.sort_values(by = ['x_dec'], ascending = False).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,SL_dec
"Castro, Miguel",-2.52172,4.948817,-1.099641,-0.471156,1.376559,0.209892,242,0.215,0.262,1.161559,-0.052108,1.213667
"Bazardo, Eduard",-2.016553,5.857961,-1.239281,-0.331515,1.325097,-0.111456,441,0.313,0.035,1.012097,-0.146456,1.158553
"Bibee, Tanner",-2.50265,6.0222,-1.177016,-0.39378,1.24075,-0.0002,3113,0.231,0.078,1.00975,-0.0782,1.08795
"Clevinger, Mike",-2.011379,5.703448,-1.231787,-0.339009,1.401207,-0.195517,303,0.481,0.287,0.920207,-0.482517,1.402724
"Strzelecki, Peter",-2.4,4.749706,-1.102573,-0.468223,1.269118,0.487941,192,0.37,0.26,0.899118,0.227941,1.127059
"Devenski, Chris",-0.23,6.36,-1.534649,-0.036148,0.97,-0.45,545,0.114,0.198,0.856,-0.648,1.504
"Bachar, Lake",-1.2102,5.635,-1.3594,-0.211396,1.0452,-0.466,161,0.207,0.104,0.8382,-0.57,1.4082
"Armstrong, Shawn",-1.384167,5.910556,-1.340794,-0.230002,0.995556,-0.130556,1118,0.195,0.054,0.800556,-0.184556,0.985111
"Wrobleski, Justin",-2.204082,5.521429,-1.190831,-0.379965,1.096735,-0.073673,592,0.33,0.235,0.766735,-0.308673,1.075408
"Blackburn, Paul",-2.178894,5.940096,-1.219325,-0.351472,1.23601,-0.357212,1286,0.472,0.054,0.76401,-0.411212,1.175221


In [134]:
#more drop than expected on curveball
CU_Dec.sort_values(by = ['z_dec']).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,CU_dec
"Dunning, Dane",-2.221548,5.569405,-1.191163,-0.379633,0.921905,-1.400476,1755,0.525,-0.418,0.396905,-0.982476,1.379381
"France, J.P.",-1.244388,5.595918,-1.351981,-0.218816,0.763776,-1.601224,532,0.592,-0.718,0.171776,-0.883224,1.055
"Matsui, Yuki",-1.768333,6.0325,-1.285735,-0.285061,0.606667,-1.3975,1076,0.506,-0.594,0.100667,-0.8035,0.904167
"Bradley, Taj",-1.225707,6.240976,-1.376822,-0.193975,0.573902,-1.459463,2294,0.596,-0.683,-0.022098,-0.776463,0.798561
"Harris, Hogan",-2.440474,5.932947,-1.180475,-0.390321,0.875737,-1.487632,1213,0.662,-0.722,0.213737,-0.765632,0.979368
"Bloss, Jake",-2.217419,5.910645,-1.211987,-0.35881,0.764194,-1.323871,238,0.777,-0.607,-0.012806,-0.716871,0.729677
"Tyler, Kyle",-2.100121,5.899515,-1.228762,-0.342034,0.580182,-1.157212,580,0.394,-0.447,0.186182,-0.710212,0.896394
"Holton, Tyler",-1.314,6.1432,-1.360115,-0.210682,0.7128,-1.2392,1501,0.538,-0.557,0.1748,-0.6822,0.857
"Cole, Gerrit",-2.270912,6.025529,-1.210365,-0.360432,0.726559,-1.231588,1790,0.745,-0.552,-0.018441,-0.679588,0.698029
"Leiter Jr., Mark",-1.337616,5.836512,-1.345578,-0.225219,0.768547,-1.42,1015,0.555,-0.743,0.213547,-0.677,0.890547


In [138]:
#more drop on sinker than expected
SI_Dec.sort_values(by = ['z_dec'], ascending = True).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,SI_dec
"Cano, Yennier",-2.218333,5.453731,-1.184404,-0.386393,-1.426686,-0.150038,1028,-1.237,0.62,-0.189686,-0.770038,0.959723
"Ramirez, Nick",-3.188349,5.763303,-1.065481,-0.505315,-1.329541,0.279817,228,-0.968,1.036,-0.361541,-0.756183,1.117725
"Little, Brendon",-1.731312,5.899282,-1.285326,-0.28547,-1.341955,-0.016238,710,-1.354,0.708,0.012045,-0.724238,0.736282
"Ellard, Fraser",-2.985,5.847778,-1.098799,-0.471997,-1.092778,0.306944,432,-1.029,0.989,-0.063778,-0.682056,0.745833
"Dobnak, Randy",-1.961463,5.509268,-1.228649,-0.342147,-0.970732,-0.065854,183,-1.245,0.523,0.274268,-0.588854,0.863122
"Raley, Brooks",-2.990625,5.600625,-1.080258,-0.490538,-1.20375,0.323125,137,-1.306,0.881,0.10225,-0.557875,0.660125
"Soriano, José",-1.47598,5.981538,-1.328896,-0.2419,-1.275608,0.3617,1737,-1.068,0.893,-0.207608,-0.5313,0.738908
"Hurter, Brant",-3.129507,5.870049,-1.08109,-0.489706,-1.322266,0.353079,797,-1.213,0.884,-0.109266,-0.530921,0.640187
"Emanuel, Kent",-2.234148,5.924148,-1.210212,-0.360585,-1.24233,0.228409,302,-1.155,0.758,-0.08733,-0.529591,0.61692
"Skenes, Paul",-2.323648,5.646186,-1.181165,-0.389631,-1.168027,0.055854,2125,-1.329,0.583,0.160973,-0.527146,0.688119


In [144]:
#more movement on cutter than expected
FC_Dec.sort_values(by = ['FC_dec'], ascending = False).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,FC_dec
"Faucher, Calvin",-1.031941,6.007846,-1.400457,-0.170339,0.347154,0.080266,987,0.111,1.011,0.236154,-0.930734,1.166888
"Tinoco, Jesus",-2.28,5.31,-1.165223,-0.405573,-0.58,0.25,664,0.128,0.664,-0.708,-0.414,1.122
"McMillon, John",-1.14,5.74,-1.374741,-0.196055,-0.27,1.69,180,0.156,1.025,-0.426,0.665,1.091
"Lucchesi, Joey",-2.361667,5.96875,-1.193792,-0.377005,-0.424167,1.1375,179,0.275,0.777,-0.699167,0.3605,1.059667
"Banda, Anthony",-2.63,5.39,-1.116843,-0.453954,-0.41,0.08,875,0.153,0.523,-0.563,-0.443,1.006
"Kittredge, Andrew",-1.593333,5.81,-1.303201,-0.267596,0.68,0.076667,1064,0.214,0.614,0.466,-0.537333,1.003333
"Waguespack, Jacob",-0.474118,6.407059,-1.497011,-0.073785,0.469176,0.185529,261,0.103,0.816,0.366176,-0.630471,0.996647
"Bigge, Hunter",-1.0,5.606667,-1.394184,-0.176612,0.678333,-0.095,311,0.381,0.596,0.297333,-0.691,0.988333
"Pennington, Walter",-3.262584,5.806517,-1.058802,-0.511994,-0.072921,0.739326,330,0.658,0.508,-0.730921,0.231326,0.962247
"Sánchez, Cristopher",-1.971667,6.270833,-1.26626,-0.304536,-0.2975,0.354167,2957,0.254,0.745,-0.5515,-0.390833,0.942333


In [146]:
#more movement on splitter than expected
FS_Dec.sort_values(by = ['FS_dec'], ascending = False).head(10)

player_name,release_pos_x,release_pos_z,h_release_angle,v_release_angle,pfx_x,pfx_z,tot_pitches,pred_pfx_x,pred_pfx_z,x_dec,z_dec,FS_dec
"Pfaadt, Brandon",-2.305,5.635625,-1.184973,-0.385823,-0.344375,0.6675,2864,-1.012,0.043,0.667625,0.6245,1.292125
"Mahle, Tyler",-2.107826,5.463768,-1.20289,-0.367906,-1.132754,1.03971,229,-0.822,0.181,-0.310754,0.85871,1.169464
"Hernández, Carlos",-1.245156,5.958437,-1.364988,-0.205809,-1.013594,0.860156,560,-0.531,0.244,-0.482594,0.616156,1.09875
"Falter, Bailey",-1.264615,5.660769,-1.350908,-0.219888,-0.585385,1.246923,2221,-0.599,0.27,0.013615,0.976923,0.990538
"Megill, Tylor",-1.835772,5.70878,-1.26141,-0.309387,-0.357724,-0.156179,1560,-0.984,0.134,0.626276,-0.290179,0.916455
"Civale, Aaron",-1.167407,6.185185,-1.384162,-0.186634,-1.343889,0.70537,2734,-0.68,0.461,-0.663889,0.24437,0.908259
"Curry, Xzavion",-1.704,5.956,-1.291929,-0.278867,-0.518,0.866,639,-0.983,0.432,0.465,0.434,0.899
"de Geus, Brett",-1.225556,5.978889,-1.368616,-0.20218,-1.235556,0.07,202,-0.658,0.391,-0.577556,-0.321,0.898556
"Sims, Lucas",-2.325,5.62,-1.178678,-0.392118,-0.915,0.71,885,-1.033,-0.069,0.118,0.779,0.897
"Kershaw, Clayton",-1.7,6.136667,-1.300546,-0.27025,-0.706667,1.13,504,-0.82,0.356,0.113333,0.774,0.887333


The deception metric is clearly not perfect (take, for example, the discussion about fastballs above), but if we weight each pitch's deception by its usage to get a total deception for each pitcher, we should eliminate some of those problems.

We still may want to tweak the way we quantify deception for each pitch individually, but I think that the total deception metric is a good first approximation of what we want out of a deception metric.

In [148]:
#get pitch usage
df2 = pd.DataFrame()
df2['tot_pitches'] = df.groupby('player_name').index.count()
for p_type in ['FF', 'CH', 'SI', 'FS', 'FC']:
    df2[p_type + '_pitches'] = df[df['pitch_type'] == p_type].groupby('player_name').index.count()
    df2[p_type + '_use'] = df2[p_type + '_pitches'] / df2['tot_pitches']
#curveballs
df2['CU_pitches'] = df[(df['pitch_type'] == 'CU') | (df['pitch_type'] == 'KC')].groupby('player_name').index.count()
df2['CU_use'] = df2['CU_pitches'] / df2['tot_pitches']
#sliders (sweepers are excluded)
df2['SL_pitches'] = df[(df['pitch_type'] == 'SL')].groupby('player_name').index.count()
df2['SL_use'] = df2['SL_pitches'] / df2['tot_pitches']
df2 = df2.fillna(0)

In [150]:
#get usage and deception together
data = df2.copy()
for x in [FF_Dec[['FF_dec']], SL_Dec[['SL_dec']], CH_Dec[['CH_dec']], CU_Dec[['CU_dec']], SI_Dec[['SI_dec']], FC_Dec[['FC_dec']], FS_Dec[['FS_dec']]]:
    data = pd.merge(data, x, how = 'outer',  left_index = True, right_index = True).fillna(0)

In [152]:
#calculate total deception

data['total_deception'] = (data['FF_use']*data['FF_dec']) + (data['CU_use']*data['CU_dec']) + (data['SL_use']*data['SL_dec']) + (data['CH_use']*data['CH_dec']) + (data['SI_use']*data['SI_dec']) + (data['FS_use']*data['FS_dec']) + (data['FC_use']*data['FC_dec'])

data = data[data['tot_pitches'] > 500]
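
The total is simply a usage-weighted sum of the per-pitch deception scores. A minimal worked example for a single hypothetical pitcher, using made-up usage shares and scores:

```python
# Made-up usage shares and deception scores for one hypothetical pitcher
usage = {'FF': 0.50, 'SL': 0.30, 'CH': 0.20}
# 0.0 stands in for a pitch with no deception score (as after fillna(0))
deception = {'FF': 0.40, 'SL': 1.00, 'CH': 0.0}

# 0.50*0.40 + 0.30*1.00 + 0.20*0.0
total_deception = sum(usage[p] * deception[p] for p in usage)
print(round(total_deception, 3))
```

A pitch with no deception score contributes nothing to the sum, so pitch types that are left out of the model (such as sweepers) or that are missing a score still count toward `tot_pitches` and therefore pull a pitcher's total toward zero.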

In [158]:
deception_leaders = data['total_deception'].sort_values(ascending = False)
#top 15
deception_leaders.head(15)

player_name
Hader, Josh        1.290838
Pallante, Andre    0.897672
Cano, Yennier       0.81082
Faucher, Calvin    0.782069
Smyly, Drew        0.775422
Skenes, Paul       0.748041
Barlow, Scott      0.700088
Farmer, Buck       0.697556
Ray, Robbie         0.69134
Ureña, José        0.675251
Keller, Brad       0.668225
Poche, Colin       0.665809
Ragans, Cole       0.661174
Blach, Ty          0.653538
Gore, MacKenzie    0.650841
Name: total_deception, dtype: Float64