# Statcast - Recording every pitch #

Name: Michael Dresser

Statcast is a high-speed camera and analytics system installed in MLB parks to analyze player movements. Since 2015 it has been installed in every MLB park; before then, Statcast data was available from select parks only. Statcast records information on every pitch in a game. We'll use the pybaseball Python package to get Statcast pitching data. There is even more Statcast data available, but we'll get to that later.

A few things to note: pybaseball is open source and appears to be maintained by one person who loves baseball and Python. It's sometimes a little buggy. When queries fail or time out, they often do so without a clear explanation of why. The documentation isn't perfect either: some queries listed in the docs don't work exactly as described.

First, import statcast and get some data.

In [1]:
from pybaseball import statcast
data = statcast(start_dt='2017-06-25', end_dt='2017-06-27')
data.head(2)

Unnamed: 0,index,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,...,home_score,away_score,bat_score,fld_score,post_away_score,post_home_score,post_bat_score,post_fld_score,if_fielding_alignment,of_fielding_alignment
0,313,CU,2017-06-27,79.7,-1.3441,5.4075,Matt Bush,608070.0,456713.0,field_out,...,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0,Standard,Strategic
1,324,FF,2017-06-27,98.1,-1.3547,5.4196,Matt Bush,429665.0,456713.0,field_out,...,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0,Standard,Strategic
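
Because queries can fail or time out without much explanation, it may help to wrap them in a small retry loop. The helper below is a minimal sketch, not part of pybaseball: `call_with_retries` and its parameters are hypothetical names, and it simply re-runs whatever function it is given.

```python
import time

def call_with_retries(query, attempts=3, wait_seconds=2.0):
    """Call query() up to `attempts` times, pausing between failures.

    `query` is any zero-argument function. This helper is an illustration
    only and is not provided by pybaseball.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except Exception as error:  # the exact exception types are not known in advance
            last_error = error
            print(f"attempt {attempt} failed: {error!r}")
            if attempt < attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(f"query failed after {attempts} attempts") from last_error

# Usage (needs pybaseball and a network connection):
# from pybaseball import statcast
# data = call_with_retries(
#     lambda: statcast(start_dt='2017-06-25', end_dt='2017-06-27'))
```

Retrying only helps with transient failures; a query that is malformed will fail on every attempt, so the last error is attached to the final exception for inspection.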


### Documentation on Statcast data ###
The query returns a considerable amount of data: 90 separate variables are recorded for each pitch. Documentation for these variables is available from several sources:
* <a href="https://baseballsavant.mlb.com/csv-docs" target="_blank">Statcast docs on Baseball Savant</a>
* <a href="https://fastballs.wordpress.com/category/pitchfx-glossary/" target="_blank">PITCHf/x glossary</a>

We can look at the list of variable names first. It is then easier to think of them in categories: variables that describe the state of the game, what happened on the play, the pitch, the pitcher, and the batter.

In [7]:
list(data)

['index',
 'pitch_type',
 'game_date',
 'release_speed',
 'release_pos_x',
 'release_pos_z',
 'player_name',
 'batter',
 'pitcher',
 'events',
 'description',
 'spin_dir',
 'spin_rate_deprecated',
 'break_angle_deprecated',
 'break_length_deprecated',
 'zone',
 'des',
 'game_type',
 'stand',
 'p_throws',
 'home_team',
 'away_team',
 'type',
 'hit_location',
 'bb_type',
 'balls',
 'strikes',
 'game_year',
 'pfx_x',
 'pfx_z',
 'plate_x',
 'plate_z',
 'on_3b',
 'on_2b',
 'on_1b',
 'outs_when_up',
 'inning',
 'inning_topbot',
 'hc_x',
 'hc_y',
 'tfs_deprecated',
 'tfs_zulu_deprecated',
 'fielder_2',
 'umpire',
 'sv_id',
 'vx0',
 'vy0',
 'vz0',
 'ax',
 'ay',
 'az',
 'sz_top',
 'sz_bot',
 'hit_distance_sc',
 'launch_speed',
 'launch_angle',
 'effective_speed',
 'release_spin_rate',
 'release_extension',
 'game_pk',
 'pitcher.1',
 'fielder_2.1',
 'fielder_3',
 'fielder_4',
 'fielder_5',
 'fielder_6',
 'fielder_7',
 'fielder_8',
 'fielder_9',
 'release_pos_y',
 'estimated_ba_using_speedangle',
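
One way to work with the categories described above is to keep a small lookup of column names per category and sort whatever columns a query returns into it. This is only a sketch: the `CATEGORIES` mapping and the `columns_by_category` helper are hypothetical, and the assignment of columns to categories is a judgment call.

```python
# Hypothetical grouping of a few Statcast columns; extend as needed.
CATEGORIES = {
    "game_state": ["game_date", "inning", "inning_topbot", "outs_when_up",
                   "balls", "strikes", "on_1b", "on_2b", "on_3b"],
    "event": ["events", "description", "des", "bb_type", "hit_location"],
    "pitch": ["pitch_type", "release_speed", "pfx_x", "pfx_z",
              "plate_x", "plate_z", "zone"],
}

def columns_by_category(columns, categories=CATEGORIES):
    """Sort column names into categories; unlisted columns go to 'other'."""
    available = set(columns)
    # Keep only the columns that the query actually returned.
    grouped = {name: [c for c in wanted if c in available]
               for name, wanted in categories.items()}
    assigned = {c for cols in grouped.values() for c in cols}
    grouped["other"] = [c for c in columns if c not in assigned]
    return grouped

# Usage with the DataFrame from the query above:
# groups = columns_by_category(list(data))
# data[groups["pitch"]].head()
```

Filtering against the columns actually present avoids a `KeyError` if a column is renamed or missing in a later data pull.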

### Game-state variables ###
Some variables describe the state of the game at the time of the pitch. Game-state variables include information such as:
* Who is batting
* Who is pitching
* How many outs
* What inning
* Runners on base
* Score

In [3]:
gameState = data.copy()
gameState = gameState[['index',
 'game_date',
 'player_name',
 'batter',
 'pitcher',
 'home_team',
 'away_team',
 'balls',
 'strikes',
 'on_3b',
 'on_2b',
 'on_1b',
 'outs_when_up',
 'inning',
 'inning_topbot',
 'home_score',
 'away_score',
 'bat_score',
 'fld_score',
 'if_fielding_alignment',
 'of_fielding_alignment']]

gameState.head(10)

Unnamed: 0,index,game_date,player_name,batter,pitcher,home_team,away_team,balls,strikes,on_3b,...,on_1b,outs_when_up,inning,inning_topbot,home_score,away_score,bat_score,fld_score,if_fielding_alignment,of_fielding_alignment
0,313,2017-06-27,Matt Bush,608070.0,456713.0,CLE,TEX,0.0,0.0,,...,488726.0,2.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
1,324,2017-06-27,Matt Bush,429665.0,456713.0,CLE,TEX,1.0,1.0,,...,488726.0,1.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
2,335,2017-06-27,Matt Bush,429665.0,456713.0,CLE,TEX,0.0,1.0,,...,488726.0,1.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
3,346,2017-06-27,Matt Bush,429665.0,456713.0,CLE,TEX,0.0,0.0,,...,488726.0,1.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
4,359,2017-06-27,Matt Bush,488726.0,456713.0,CLE,TEX,1.0,1.0,,...,,1.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
5,367,2017-06-27,Matt Bush,488726.0,456713.0,CLE,TEX,1.0,0.0,,...,,1.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
6,380,2017-06-27,Matt Bush,488726.0,456713.0,CLE,TEX,0.0,0.0,,...,,1.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
7,389,2017-06-27,Matt Bush,596019.0,456713.0,CLE,TEX,1.0,1.0,,...,,0.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
8,402,2017-06-27,Matt Bush,596019.0,456713.0,CLE,TEX,1.0,0.0,,...,,0.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
9,413,2017-06-27,Matt Bush,596019.0,456713.0,CLE,TEX,0.0,0.0,,...,,0.0,9.0,Bot,1.0,2.0,1.0,2.0,Standard,Strategic
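
The three `on_*b` columns can be collapsed into a single base-occupancy label. In the output above these columns appear to hold a runner's player ID when the base is occupied and `NaN` when it is empty; the sketch below relies on that assumption. `base_state` is a hypothetical helper name.

```python
import pandas as pd

def base_state(frame):
    """Return labels such as '1--' or '-23' marking the occupied bases.

    Assumes on_1b/on_2b/on_3b are NaN when the base is empty.
    """
    first = frame["on_1b"].notna().map({True: "1", False: "-"})
    second = frame["on_2b"].notna().map({True: "2", False: "-"})
    third = frame["on_3b"].notna().map({True: "3", False: "-"})
    return first + second + third

# Usage with the frame built above:
# gameState["bases"] = base_state(gameState)
# gameState.groupby(["bases", "outs_when_up"]).size()
```

Grouping by the base label and `outs_when_up` gives a quick count of how often each base-out situation occurred in the sample.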


### Event variables ###
Event variables describe what happened on the play. It's debatable what belongs in this category. I've included variables related to the outcome of the pitch and a little information about the contact. They help answer questions such as:
* What was the game state after the pitch happened?
* Did the number of outs change?
* Did the batter get on base?
* Did the score change?

In [4]:
eventVariables = data.copy()
eventVariables = eventVariables[['index',
 'batter',
 'pitcher',
 'events',
 'description',
 'des',
 'hit_location',
 'bb_type',
 'hit_distance_sc',
 'home_score',
 'away_score',
 'bat_score',
 'fld_score',
 'post_away_score',
 'post_home_score',
 'post_bat_score',
 'post_fld_score',
 ]]
eventVariables.head(10)

Unnamed: 0,index,batter,pitcher,events,description,des,hit_location,bb_type,hit_distance_sc,home_score,away_score,bat_score,fld_score,post_away_score,post_home_score,post_bat_score,post_fld_score
0,313,608070.0,456713.0,field_out,hit_into_play,Jose Ramirez lines out to left fielder Nomar M...,7.0,line_drive,314.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
1,324,429665.0,456713.0,field_out,hit_into_play,Edwin Encarnacion pops out to second baseman R...,4.0,popup,190.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
2,335,429665.0,456713.0,,ball,,,,,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
3,346,429665.0,456713.0,,called_strike,,,,,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
4,359,488726.0,456713.0,single,hit_into_play_no_out,Michael Brantley singles on a line drive to le...,7.0,line_drive,180.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
5,367,488726.0,456713.0,,called_strike,,,,,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
6,380,488726.0,456713.0,,ball,,,,,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
7,389,596019.0,456713.0,field_out,hit_into_play,Francisco Lindor lines out to center fielder D...,8.0,line_drive,348.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
8,402,596019.0,456713.0,,foul,,,,,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
9,413,596019.0,456713.0,,ball,,,,,1.0,2.0,1.0,2.0,2.0,1.0,1.0,2.0
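
Two of the questions above can be answered directly from the columns shown. The sketch below assumes that `events` is filled in only on the pitch that ends a plate appearance (which matches the sample output, where balls, called strikes, and fouls have no event) and that `post_bat_score` minus `bat_score` is the number of runs the batting team scored on the pitch. `summarize_outcomes` is a hypothetical helper name.

```python
import pandas as pd

def summarize_outcomes(frame):
    """Add per-pitch outcome columns derived from the event variables."""
    out = frame.copy()
    # Runs scored by the batting team as a result of this pitch.
    out["runs_scored"] = out["post_bat_score"] - out["bat_score"]
    # True on the pitch that ended the plate appearance (assumption above).
    out["ends_plate_appearance"] = out["events"].notna()
    return out

# Usage with the frame built above:
# summary = summarize_outcomes(eventVariables)
# summary.loc[summary["runs_scored"] > 0, ["des", "runs_scored"]]
```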


### Pitch variables ###
Pitch variables describe what the pitch looked like, including:
* the release position (x, y, z) from the pitcher
* the position (x, z) where the pitch crossed the plate
* zone
* speed
* spin
* pitch type

In [5]:
pitchVariables = data.copy()
pitchVariables = pitchVariables[['index',
 'pitch_type',
 'release_speed',
 'release_pos_x',
 'release_pos_y',                                
 'release_pos_z',
 'player_name',
 'pitcher',
 'zone',
 'type',
 'pfx_x',
 'pfx_z',
 'plate_x',
 'plate_z',
 'hc_x',
 'hc_y',
 'sv_id',
 'vx0',
 'vy0',
 'vz0',
 'ax',
 'ay',
 'az',
 'effective_speed',
 'release_spin_rate',
 'release_extension',
 'sz_top',
 'sz_bot'
 ]]
print(pitchVariables.loc[3])

index                          346
pitch_type                      FC
release_speed                 90.9
release_pos_x              -1.4572
release_pos_y              54.7408
release_pos_z               5.2523
player_name              Matt Bush
pitcher                     456713
zone                            14
type                             S
pfx_x                        0.463
pfx_z                       0.7223
plate_x                      0.903
plate_z                     2.5355
hc_x                           NaN
hc_y                           NaN
sv_id                170628_020003
vx0                         4.8957
vy0                       -132.122
vz0                        -2.4148
ax                          4.3641
ay                         26.3037
az                         -23.593
effective_speed             90.183
release_spin_rate             2617
release_extension            5.759
sz_top                      3.7291
sz_bot                      1.7681
Name: 3, dtype: object
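
The plate location and strike-zone columns can be combined into a rough check of whether a pitch crossed the zone. This is a sketch under stated assumptions: `plate_x` and `plate_z` are taken to be in feet, with `plate_x` measured from the center of the plate, and the ball radius of about 1.45 inches is an approximation. `in_strike_zone` is a hypothetical helper, not a Statcast or pybaseball function.

```python
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # assumed baseball radius, about 1.45 inches

def in_strike_zone(plate_x, plate_z, sz_top, sz_bot, include_ball_radius=True):
    """Return True if the pitch location falls inside the batter's zone.

    With include_ball_radius=True, a pitch counts if any part of the ball
    touches the zone rather than only its center.
    """
    margin = BALL_RADIUS_FT if include_ball_radius else 0.0
    inside_horizontally = abs(plate_x) <= PLATE_HALF_WIDTH_FT + margin
    inside_vertically = (sz_bot - margin) <= plate_z <= (sz_top + margin)
    return inside_horizontally and inside_vertically

# The pitch printed above: plate_x=0.903, plate_z=2.5355
print(in_strike_zone(0.903, 2.5355, sz_top=3.7291, sz_bot=1.7681))
```

Under these assumptions the pitch printed above falls just off the edge of the plate, even though its `type` is `S` (it was a called strike).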

### Questions: ###
Select two games that the Rockies played in 2017: one at home at Coors Field and one at Petco Park in San Diego.
1. How many of each type of pitch did Rockies pitchers throw in each game?
2. If time permits, how did the movement on pitches at Coors Field compare to the movement at Petco? It's enough to compare the mean movement in the x and z directions for each pitch type.

In [75]:
import pandas as pd
datahome = statcast(start_dt='2017-04-07', end_dt='2017-04-08')
dataaway = statcast(start_dt='2017-06-27', end_dt='2017-06-28')

In [76]:
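# Keep only the Rockies' home game on April 7 and their road game on June 27.
# Note: the road filter does not check the opponent. Adding
# (dataaway["home_team"] == "SD") would confirm the game was at Petco Park.
datahome = datahome.loc[(datahome["home_team"] == "COL") &
                        (datahome["game_date"] == pd.Timestamp("2017-04-07"))]
dataaway = dataaway.loc[(dataaway["away_team"] == "COL") &
                        (dataaway["game_date"] == pd.Timestamp("2017-06-27"))]

In [77]:
# The home team pitches in the top of each inning and the visitors in the
# bottom, so these rows are the pitches thrown by Rockies pitchers.
rockieshome = datahome.loc[datahome["inning_topbot"] == "Top"]
rockiespetco = dataaway.loc[dataaway["inning_topbot"] == "Bot"]

# Count the pitches of each type thrown in each game.
homepitches = rockieshome.groupby("pitch_type")["pitch_type"].count()
print(homepitches)
petcopitches = rockiespetco.groupby("pitch_type")["pitch_type"].count()
print(petcopitches)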

pitch_type
CH    12
FF    61
FT    22
SL    32
Name: pitch_type, dtype: int64
pitch_type
CH    20
CU    22
FC     6
FF    91
SI    26
SL    28
Name: pitch_type, dtype: int64
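
The two sets of counts are easier to compare side by side. The sketch below builds one table with a column per game; `pitch_counts_by_park` is a hypothetical helper, and pitch types that were not thrown in a game are shown as 0.

```python
import pandas as pd

def pitch_counts_by_park(frames):
    """Build a table of pitch-type counts with one column per game.

    `frames` maps a label (for example a park name) to a DataFrame of pitches.
    """
    counts = {label: frame["pitch_type"].value_counts()
              for label, frame in frames.items()}
    # Series are aligned on pitch type; a type missing from a game becomes 0.
    table = pd.DataFrame(counts).fillna(0).astype(int)
    return table.sort_index()

# Usage with the frames built above:
# pitch_counts_by_park({"home": rockieshome, "road": rockiespetco})
```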


In [78]:

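# Caveats: delta_x and delta_z measure the displacement from the release point
# to the plate crossing, not pitch break (Statcast reports break in pfx_x and
# pfx_z). datahome and dataaway also hold both teams' pitches, not only the
# Rockies'. assign() returns new frames, avoiding SettingWithCopyWarning.
datahome = datahome.assign(
    delta_x=(datahome["release_pos_x"] - datahome["plate_x"]).abs(),
    delta_z=(datahome["release_pos_z"] - datahome["plate_z"]).abs())
dataaway = dataaway.assign(
    delta_x=(dataaway["release_pos_x"] - dataaway["plate_x"]).abs(),
    delta_z=(dataaway["release_pos_z"] - dataaway["plate_z"]).abs())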

homebytype = datahome.groupby("pitch_type")
awaybytype = dataaway.groupby("pitch_type")

print("home movement")
print(homebytype["delta_x"].mean())
print(homebytype["delta_z"].mean())

print("petco movement")
print(awaybytype["delta_x"].mean())
print(awaybytype["delta_z"].mean())

home movement
pitch_type
CH    2.083196
CU    2.630238
FC    1.885640
FF    2.377933
FT    2.410357
KC    2.131487
SL    2.908103
Name: delta_x, dtype: float64
pitch_type
CH    3.782716
CU    3.775938
FC    3.200107
FF    2.745113
FT    3.079289
KC    4.864813
SL    3.585143
Name: delta_z, dtype: float64
petco movement
pitch_type
CH    1.728160
CU    2.610424
FC    2.737756
FF    2.177034
FT    2.050956
KC    0.719020
SI    1.951157
SL    2.666175
UN    7.416400
Name: delta_x, dtype: float64
pitch_type
CH    3.743383
CU    4.466805
FC    4.151028
FF    3.811435
FT    3.382456
KC    5.579040
SI    3.423659
SL    4.235492
UN    0.160300
Name: delta_z, dtype: float64
