# Project: Classify Pitches as Swinging Strikes

Swinging strikes are strongly correlated with pitcher effectiveness. This is likely because strikeouts are strongly correlated with pitching effectiveness. Previous research of mine suggested that there is more variance in pitchers' swinging strikeout rate, and that it is more projectable (i.e., has higher $r^2$) from season to season than called strikeout rate. Furthermore, the majority of strikeouts are swinging strikeouts.

The goal of this project is to determine whether data about pitch movement, velocity, release point, and location relative to the strike zone is sufficient to classify pitches as swinging strikes. If so, this suggests that the characteristics of the pitch itself induce swinging strikes. It is possible, however, that additional factors involving the pitcher, game state, and the batter are significant influences on whether a pitch will result in a swinging strike.



In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('data/statcast_dumps/statcast2017.csv').drop('Unnamed: 0', axis=1)

In [4]:
df1 = pd.read_csv('data/statcast_dumps/statcast2018.csv').drop('Unnamed: 0', axis=1)
df2 = pd.read_csv('data/statcast_dumps/statcast2019.csv').drop('Unnamed: 0', axis=1)
df3 = pd.read_csv('data/statcast_dumps/statcast2020.csv').drop('Unnamed: 0', axis=1)
df = pd.concat([df,df1,df2,df3])

In [5]:
print(df.shape)

(2355353, 91)


## What's a swinging strike?

The question isn't as simple as it sounds. A foul ball with fewer than 2 strikes has the effect of a swinging strike (unless it's also a popup); it's worse than not swinging at a ball out of the zone and no better than a called strike. With 2 strikes, a foul ball is worse than swinging at a ball out of the zone but better than a called strike. 

_I'm going to stipulate that a foul ball is not a swinging strike, but a foul tip is._ This is mostly for simplicity at present, but if this were a multiclass classification problem, I would regard foul balls as a separate class.

In [6]:
def is_swinging_strike(row):
    # Foul tips count as swinging strikes; ordinary foul balls do not.
    return row['description'] in ['swinging_strike','swinging_strike_blocked','foul_tip']

In [7]:
df['swinging_strike'] = df.apply(is_swinging_strike,axis=1)
df.columns
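
Applying a Python function row by row over roughly 2.4 million pitches can be slow. Below is a minimal sketch of a vectorized alternative using `Series.isin`, assuming the same three `description` values define a swinging strike. The names `SWINGING_STRIKE_DESCRIPTIONS` and `label_swinging_strikes`, and the toy frame, are illustrative and not part of the original notebook.

```python
import pandas as pd

# Statcast `description` values treated as swinging strikes here
# (foul tips included, ordinary foul balls excluded, per the stipulation above).
SWINGING_STRIKE_DESCRIPTIONS = [
    "swinging_strike",
    "swinging_strike_blocked",
    "foul_tip",
]

def label_swinging_strikes(pitches: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True where the pitch is a swinging strike."""
    return pitches["description"].isin(SWINGING_STRIKE_DESCRIPTIONS)

# Toy data standing in for the real Statcast frame (illustrative only).
toy = pd.DataFrame(
    {
        "description": [
            "swinging_strike",
            "foul",
            "ball",
            "foul_tip",
            "called_strike",
            "swinging_strike_blocked",
        ]
    }
)
toy["swinging_strike"] = label_swinging_strikes(toy)

# The share of positive labels gives a quick read on class balance.
print(toy["swinging_strike"].mean())
```

On the real data, `df['swinging_strike'] = label_swinging_strikes(df)` would replace the `apply` call. Checking the mean of the new column is also a cheap way to see how imbalanced the two classes are before fitting a classifier.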