# Pitch Classification
**Author**: Eric Wehmueller
***

## Overview

This is the final (Capstone) project for Flatiron School's Data Science bootcamp. We take on the role of a data scientist in a hypothetical scenario and aim to provide value to the business in that scenario.

## Business Problem

A hot topic in the 2021 Major League Baseball season is the use of certain substances by pitchers to increase their "spin rate", an advanced metric now recorded on every pitch by sophisticated cameras. The argument is that a higher spin rate gives better results, and that the substance is legal and used by a high percentage of pitchers around the league. However, spin rate is not the sole determining factor in throwing an effective pitch, namely one that will cause a Major League batter to swing and miss. Although baseball is typically regarded as an "old man's game", can we get a step ahead and leverage this metric, along with a variety of other pitch data, to learn which types of pitches will give us the best results?

In this hypothetical scenario, we have been hired by the Cardinals baseball organization as a member of the coaching staff. As a coaching analyst, our job is to create a model that gives us insight into pitch quality and classifies a pitch, given its metrics, as a "strike" or as a red-flag "hit" for our opponent.

## Project Setup

In [34]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# tensorflow/keras libraries
import keras
import tensorflow as tf
from sklearn import metrics
from keras import optimizers
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report, confusion_matrix

In [6]:
from pybaseball import playerid_lookup, statcast_batter, statcast_pitcher

## Data Exploration

To start, let's see if we can get some immediate value in the current season against one particular player who is giving us trouble. So far in the 2021 season, Jesse Winker of the Cincinnati Reds has proven himself an elite hitter. Since the Reds are in our division (NL Central), he is going to be in the batter's box against our pitchers very often. Ideally, we can find a way to mitigate the damage he does against our ball club. Let's work towards creating a model specifically for this.

In [15]:
player_info_df = playerid_lookup('winker','jesse')
player_info_df.head()

Unnamed: 0,name_last,name_first,key_mlbam,key_retro,key_bbref,key_fangraphs,mlb_played_first,mlb_played_last
0,winker,jesse,608385,winkj002,winkeje01,13590,2017.0,2021.0


In [24]:
jwinker_id = 608385
#statcast data (data per pitch, goes back to 2015)
df = statcast_batter('2016-08-01','2021-08-01', jwinker_id)

Gathering Player Data
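
Pulling several seasons of pitch-level data takes a while, so it can help to store the result locally. The sketch below is one way to do that with plain pandas. `load_or_fetch` and the file name are hypothetical names introduced here, not part of pybaseball, and the stand-in `fake_fetch` function only mimics the shape of a `statcast_batter` result.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_or_fetch(csv_path, fetch_fn):
    """Return the cached CSV if it exists; otherwise call fetch_fn and cache the result."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        return pd.read_csv(csv_path)
    frame = fetch_fn()
    # index=False keeps the row index out of the file, so reloading
    # does not add an extra "Unnamed: 0" column.
    frame.to_csv(csv_path, index=False)
    return frame


# Stand-in for a real download (hypothetical data, for illustration only).
def fake_fetch():
    return pd.DataFrame({"pitch_type": ["FS", "FC"], "release_speed": [86.5, 90.4]})


cache_file = Path(tempfile.mkdtemp()) / "winker_statcast.csv"
first = load_or_fetch(cache_file, fake_fetch)   # calls fake_fetch and writes the file
second = load_or_fetch(cache_file, fake_fetch)  # reads the cached file instead
```

In the notebook, the fetch function could be `lambda: statcast_batter('2016-08-01', '2021-08-01', jwinker_id)`.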


In [22]:
df.shape

(5805, 92)

In [45]:
print(df.columns)

Index(['pitch_type', 'game_date', 'release_speed', 'release_pos_x',
       'release_pos_z', 'player_name', 'batter', 'pitcher', 'events',
       'description', 'spin_dir', 'spin_rate_deprecated',
       'break_angle_deprecated', 'break_length_deprecated', 'zone', 'des',
       'game_type', 'stand', 'p_throws', 'home_team', 'away_team', 'type',
       'hit_location', 'bb_type', 'balls', 'strikes', 'game_year', 'pfx_x',
       'pfx_z', 'plate_x', 'plate_z', 'on_3b', 'on_2b', 'on_1b',
       'outs_when_up', 'inning', 'inning_topbot', 'hc_x', 'hc_y',
       'tfs_deprecated', 'tfs_zulu_deprecated', 'fielder_2', 'umpire', 'sv_id',
       'vx0', 'vy0', 'vz0', 'ax', 'ay', 'az', 'sz_top', 'sz_bot',
       'hit_distance_sc', 'launch_speed', 'launch_angle', 'effective_speed',
       'release_spin_rate', 'release_extension', 'game_pk', 'pitcher.1',
       'fielder_2.1', 'fielder_3', 'fielder_4', 'fielder_5', 'fielder_6',
       'fielder_7', 'fielder_8', 'fielder_9', 'release_pos_y',
       'estima

In [54]:
pd.set_option('display.max_columns', 93)
df.head(5)

Unnamed: 0,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,description,spin_dir,spin_rate_deprecated,break_angle_deprecated,break_length_deprecated,zone,des,game_type,stand,p_throws,home_team,away_team,type,hit_location,bb_type,balls,strikes,game_year,pfx_x,pfx_z,plate_x,plate_z,on_3b,on_2b,on_1b,outs_when_up,inning,inning_topbot,hc_x,hc_y,tfs_deprecated,tfs_zulu_deprecated,fielder_2,umpire,sv_id,vx0,vy0,vz0,ax,ay,az,sz_top,sz_bot,hit_distance_sc,launch_speed,launch_angle,effective_speed,release_spin_rate,release_extension,game_pk,pitcher.1,fielder_2.1,fielder_3,fielder_4,fielder_5,fielder_6,fielder_7,fielder_8,fielder_9,release_pos_y,estimated_ba_using_speedangle,estimated_woba_using_speedangle,woba_value,woba_denom,babip_value,iso_value,launch_speed_angle,at_bat_number,pitch_number,pitch_name,home_score,away_score,bat_score,fld_score,post_away_score,post_home_score,post_bat_score,post_fld_score,if_fielding_alignment,of_fielding_alignment,spin_axis,delta_home_win_exp,delta_run_exp
0,FS,2021-08-01,86.5,-1.24,5.28,"Winker, Jesse",608385,573186,double,hit_into_play,,,,,14.0,Jesse Winker doubles (27) on a ground ball to ...,R,L,R,NYM,CIN,X,4.0,ground_ball,1,2,2021,-0.94,-0.1,0.28,0.64,,,,0,6,Top,126.0,204.5,,,621512,,,5.470293,-125.956504,-4.687578,-11.03261,22.430124,-32.57362,3.49,1.6,72.0,85.4,4.0,85.9,1767.0,5.6,633129,573186,621512,624413,643446,592273,595879,642086,607680,624424,54.91,0.363,0.35,1.25,1.0,1.0,1.0,2.0,37,5,Split-Finger,0,1,1,0,1,0,1,0,Infield shift,Standard,242.0,-0.068,0.707
1,FS,2021-08-01,88.3,-1.22,5.26,"Winker, Jesse",608385,573186,,foul,,,,,8.0,Jesse Winker doubles (27) on a ground ball to ...,R,L,R,NYM,CIN,S,,,1,2,2021,-0.69,-0.51,-0.15,1.84,,,,0,6,Top,,,,,621512,,,3.98888,-128.595114,-1.212061,-8.478128,25.878591,-37.828954,3.49,1.6,,,,87.4,1902.0,5.6,633129,573186,621512,624413,643446,592273,595879,642086,607680,624424,54.89,,,,,,,,37,4,Split-Finger,0,1,1,0,1,0,1,0,Infield shift,Standard,243.0,0.0,0.0
2,FC,2021-08-01,90.4,-1.37,5.24,"Winker, Jesse",608385,573186,,foul,,,,,2.0,Jesse Winker doubles (27) on a ground ball to ...,R,L,R,NYM,CIN,S,,,1,1,2021,0.23,0.85,0.06,3.21,,,,0,6,Top,,,,,621512,,,3.061059,-131.745523,-1.004938,2.110204,24.609554,-22.198202,3.49,1.6,219.0,73.4,28.0,90.2,2743.0,5.8,633129,573186,621512,624413,643446,592273,595879,642086,607680,624424,54.72,,,,,,,,37,3,Cutter,0,1,1,0,1,0,1,0,Infield shift,Standard,213.0,0.0,-0.056
3,SL,2021-08-01,86.2,-1.27,5.27,"Winker, Jesse",608385,573186,,ball,,,,,14.0,Jesse Winker doubles (27) on a ground ball to ...,R,L,R,NYM,CIN,B,,,0,1,2021,0.94,0.16,1.97,-0.32,,,,0,6,Top,,,,,621512,,,5.749211,-125.440565,-7.388813,9.023742,22.363264,-29.337223,3.32,1.52,,,,86.0,2937.0,5.9,633129,573186,621512,624413,643446,592273,595879,642086,607680,624424,54.63,,,,,,,,37,2,Slider,0,1,1,0,1,0,1,0,Infield shift,Standard,49.0,0.0,0.028
4,SI,2021-08-01,91.2,-1.23,5.29,"Winker, Jesse",608385,573186,,foul,,,,,4.0,Jesse Winker doubles (27) on a ground ball to ...,R,L,R,NYM,CIN,S,,,0,0,2021,-0.96,0.18,-0.64,2.45,,,,0,6,Top,,,,,621512,,,3.502166,-133.082664,-1.781584,-12.223426,21.352922,-29.919013,3.49,1.6,228.0,76.4,45.0,91.7,2482.0,5.9,633129,573186,621512,624413,643446,592273,595879,642086,607680,624424,54.63,,,,,,,,37,1,Sinker,0,1,1,0,1,0,1,0,Infield shift,Standard,218.0,0.0,-0.038
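
Several columns in the preview (for example `spin_dir` and the `*_deprecated` fields) are empty in every row shown. Here is a minimal sketch of how such columns could be dropped, assuming they are empty across the whole frame and not just in these five rows. The small `preview` frame copies a few values from the output above.

```python
import numpy as np
import pandas as pd

preview = pd.DataFrame({
    "pitch_type": ["FS", "FS", "FC", "SL", "SI"],
    "release_speed": [86.5, 88.3, 90.4, 86.2, 91.2],
    "spin_dir": [np.nan] * 5,
    "spin_rate_deprecated": [np.nan] * 5,
    "launch_speed": [85.4, np.nan, 73.4, np.nan, 76.4],
})

# how="all" drops a column only when every value is missing,
# so partially filled columns such as launch_speed are kept.
trimmed = preview.dropna(axis=1, how="all")
dropped = sorted(set(preview.columns) - set(trimmed.columns))
print(dropped)
```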


Features that need clarification:
***

pfx_x, pfx_z

plate_x, plate_z

vx0, vy0, vz0

ax, ay, az

sz_top, sz_bot

spin_axis

zone
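
As a starting point for the fields above, here is a rough sketch of how `plate_x`, `plate_z`, `sz_top`, and `sz_bot` could be combined. It assumes (this is not confirmed by the notebook) that they are measured in feet, that `plate_x` is the horizontal offset from the center of home plate, and that `zone` values 1 through 9 mark pitches inside the strike zone. The rows are the five pitches from the preview above; `in_zone` and `zone_says_in` are hypothetical column names.

```python
import pandas as pd

# Half the width of home plate (17 inches), in feet.
HALF_PLATE_FT = 17 / 12 / 2

pitches = pd.DataFrame({
    "plate_x": [0.28, -0.15, 0.06, 1.97, -0.64],
    "plate_z": [0.64, 1.84, 3.21, -0.32, 2.45],
    "sz_top": [3.49, 3.49, 3.49, 3.32, 3.49],
    "sz_bot": [1.6, 1.6, 1.6, 1.52, 1.6],
    "zone": [14, 8, 2, 14, 4],
})

# A pitch counts as in the zone when the ball's center is over the plate
# and between the bottom and top of the batter's strike zone.
# The ball's radius is ignored in this simplified check.
pitches["in_zone"] = (
    pitches["plate_x"].abs().le(HALF_PLATE_FT)
    & pitches["plate_z"].between(pitches["sz_bot"], pitches["sz_top"])
)

# Compare against the assumed meaning of the zone column (1-9 = inside).
pitches["zone_says_in"] = pitches["zone"].le(9)
print(pitches[["zone", "in_zone", "zone_says_in"]])
```

On these five rows the geometric check and the assumed reading of `zone` agree, which is weak evidence for the interpretation, not a confirmation of it.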


***
Other relevant features for our model:
***
pitch_type/pitch_name, pitch_number

release_speed

release_pos_x, release_pos_z

events, description

stand, p_throws

balls, strikes

release_spin_rate, release_extension
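
The columns listed above can be collected into a single list and used to subset the frame. This is only a sketch: `FEATURE_COLUMNS` and `select_features` are names introduced here, and the tiny `sample` frame stands in for the real Statcast data.

```python
import pandas as pd

FEATURE_COLUMNS = [
    "pitch_type", "pitch_name", "pitch_number",
    "release_speed", "release_pos_x", "release_pos_z",
    "events", "description", "stand", "p_throws",
    "balls", "strikes", "release_spin_rate", "release_extension",
]


def select_features(frame, columns=FEATURE_COLUMNS):
    """Keep only the listed columns that are actually present in the frame."""
    present = [name for name in columns if name in frame.columns]
    return frame[present].copy()


# Stand-in data with two columns that are not in the feature list.
sample = pd.DataFrame({
    "pitch_type": ["FS", "FC"],
    "release_speed": [86.5, 90.4],
    "release_spin_rate": [1767.0, 2743.0],
    "game_pk": [633129, 633129],
    "umpire": [None, None],
})
features = select_features(sample)
print(list(features.columns))
```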

In [31]:
df['description'].unique()
#looking for "intent_ball"

array(['hit_into_play', 'foul', 'ball', 'swinging_strike',
       'called_strike', 'foul_tip', 'blocked_ball',
       'swinging_strike_blocked', 'hit_by_pitch', 'foul_bunt',
       'missed_bunt'], dtype=object)
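
To turn these outcomes into the "strike" versus "hit" target described in the business problem, each `description` value has to be mapped to a label. The mapping below is an assumption made for illustration, not the notebook's actual labeling: it treats swings-and-misses and called strikes as "strike", treats `hit_into_play` as "hit" (even though balls in play include outs, so the `events` column may give a finer split), and leaves everything else unlabeled. `label_pitch` is a hypothetical helper.

```python
import pandas as pd

# Assumed grouping of Statcast descriptions into target labels.
STRIKE_DESCRIPTIONS = {
    "swinging_strike", "called_strike", "swinging_strike_blocked",
    "foul_tip", "missed_bunt",
}
HIT_DESCRIPTIONS = {"hit_into_play"}


def label_pitch(description):
    """Return 'strike', 'hit', or None for outcomes outside both groups."""
    if description in STRIKE_DESCRIPTIONS:
        return "strike"
    if description in HIT_DESCRIPTIONS:
        return "hit"
    return None


# Made-up sample of description values taken from the list above.
sample = pd.DataFrame({
    "description": ["hit_into_play", "foul", "ball", "swinging_strike", "called_strike"],
})
sample["label"] = sample["description"].map(label_pitch)

# Rows such as fouls and balls get no label and are dropped here.
labeled = sample.dropna(subset=["label"])
print(labeled)
```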

In [32]:
df['pitch_type'].unique()

array(['FS', 'FC', 'SL', 'SI', 'FF', 'CU', 'CH', 'KC', 'FT', nan, 'FO',
       'KN'], dtype=object)
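
The `nan` entry means some pitches have no recorded type. Here is a short sketch of how those rows could be counted and set aside before modeling. The `pitch_types` series is made-up sample data, and dropping the rows is one option, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd

pitch_types = pd.Series(
    ["FS", "FS", "FC", "SL", "SI", np.nan, "FF"], name="pitch_type"
)

# dropna=False keeps the missing entries visible in the tally.
counts = pitch_types.value_counts(dropna=False)
missing = int(pitch_types.isna().sum())

# Keep only pitches whose type was recorded.
known = pitch_types.dropna()
print(counts)
print(f"{missing} pitch(es) without a recorded type")
```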