# Statcast - Recording every pitch #
Statcast is a high-speed camera and analytics system installed in MLB parks to analyze player movements. Since 2015, Statcast has been installed in every MLB park; before that, Statcast data was available for select parks only. The data gives us information on every pitch in a game. We'll use pybaseball to get Statcast pitching data. There is even more Statcast data available, but we'll get to that later.

A few things to note: pybaseball is open source and appears to be maintained by one person who loves baseball and Python. It's sometimes a little buggy. When queries fail or time out, there isn't always a good explanation of why. The documentation isn't perfect either: sometimes queries listed in the docs don't work exactly the way they're described.
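
Because queries can fail or time out without much explanation, it can help to wrap them in a small retry loop. The sketch below is one way to do that. The helper `fetch_with_retry` and its parameters are hypothetical and not part of pybaseball; it accepts any zero-argument callable, so it makes no assumptions about which errors a query raises.

```python
import time


def fetch_with_retry(fetch, retries=3, delay_seconds=2.0):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable. This helper is hypothetical;
    it is not part of pybaseball.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: failure modes vary
            last_error = error
            print(f"attempt {attempt} of {retries} failed: {error!r}")
            if attempt < retries:
                time.sleep(delay_seconds)  # pause before trying again
    raise RuntimeError(f"query failed after {retries} attempts") from last_error
```

With pybaseball imported, a call might look like `data = fetch_with_retry(lambda: statcast(start_dt='2017-06-25', end_dt='2017-06-26'))`.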

First, import statcast and get some data.

In [1]:
from pybaseball import statcast
data = statcast(start_dt='2017-06-25', end_dt='2017-06-26')
data.head(2)

Unnamed: 0,index,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,...,home_score,away_score,bat_score,fld_score,post_away_score,post_home_score,post_bat_score,post_fld_score,if_fielding_alignment,of_fielding_alignment
0,0,SL,2017-06-26,83.8,-1.434,5.6087,Nick Goody,518902.0,580792.0,strikeout,...,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0,Standard,Strategic
1,1,FF,2017-06-26,92.7,-0.9435,5.7528,Nick Goody,518902.0,580792.0,,...,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0,Standard,Standard


### Documentation on Statcast data ###
The query returns a considerable amount of data. There are 90 separate variables recorded for each pitch. There are multiple sources of documentation for Statcast data:
* <a href="https://baseballsavant.mlb.com/csv-docs" target="_blank">Statcast docs on Baseball Savant</a>
* <a href="https://fastballs.wordpress.com/category/pitchfx-glossary/" target="_blank">PITCH f/X</a>

### Examples of pitch types ###
It's easier to visualize movement on a pitch if you've seen examples of different types of pitches. Here are a few YouTube videos that show examples of pitches.
* <a href="https://www.youtube.com/watch?v=v5oDTxcex0k" target="_blank">Fastballs</a>
* <a href="https://www.youtube.com/watch?v=1jSRELmxaqM" target="_blank">Breaking balls</a>
* <a href="https://www.youtube.com/watch?v=AAcupVhr8Kc" target="_blank">Sliders</a>

We can look at the list of variable names first. Then it might be easier to think of them in different categories, such as variables that describe the state of the game, what happened on the play, the pitch, the pitcher, and the batter.

In [2]:
list(data)

['index',
 'pitch_type',
 'game_date',
 'release_speed',
 'release_pos_x',
 'release_pos_z',
 'player_name',
 'batter',
 'pitcher',
 'events',
 'description',
 'spin_dir',
 'spin_rate_deprecated',
 'break_angle_deprecated',
 'break_length_deprecated',
 'zone',
 'des',
 'game_type',
 'stand',
 'p_throws',
 'home_team',
 'away_team',
 'type',
 'hit_location',
 'bb_type',
 'balls',
 'strikes',
 'game_year',
 'pfx_x',
 'pfx_z',
 'plate_x',
 'plate_z',
 'on_3b',
 'on_2b',
 'on_1b',
 'outs_when_up',
 'inning',
 'inning_topbot',
 'hc_x',
 'hc_y',
 'tfs_deprecated',
 'tfs_zulu_deprecated',
 'fielder_2',
 'umpire',
 'sv_id',
 'vx0',
 'vy0',
 'vz0',
 'ax',
 'ay',
 'az',
 'sz_top',
 'sz_bot',
 'hit_distance_sc',
 'launch_speed',
 'launch_angle',
 'effective_speed',
 'release_spin_rate',
 'release_extension',
 'game_pk',
 'pitcher.1',
 'fielder_2.1',
 'fielder_3',
 'fielder_4',
 'fielder_5',
 'fielder_6',
 'fielder_7',
 'fielder_8',
 'fielder_9',
 'release_pos_y',
 'estimated_ba_using_speedangle',
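
One way to keep the categories straight is to write them down as a dictionary and check each list against the columns the query actually returned, since selecting a column that isn't there raises a `KeyError`. This is only a sketch: the category names and assignments in `CATEGORIES` are a judgment call, and `split_by_availability` is a hypothetical helper, not a pandas or pybaseball function.

```python
# Hypothetical grouping of a few of the column names listed above.
CATEGORIES = {
    "game_state": ["game_date", "home_team", "away_team", "balls", "strikes",
                   "on_1b", "on_2b", "on_3b", "outs_when_up", "inning",
                   "inning_topbot"],
    "event": ["events", "description", "des", "hit_location", "bb_type",
              "hit_distance_sc"],
    "pitch": ["pitch_type", "release_speed", "release_pos_x", "release_pos_y",
              "release_pos_z", "zone", "pfx_x", "pfx_z", "plate_x", "plate_z"],
}


def split_by_availability(columns, wanted):
    """Return (present, missing) lists of the wanted names, keeping their order."""
    available = set(columns)
    present = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return present, missing
```

For example, `present, missing = split_by_availability(list(data), CATEGORIES["pitch"])` followed by `data[present]` selects only the pitch columns that exist, and `missing` shows which names to double-check.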

### Game-state variables ###
There are some variables related to the state of the game at the time of the pitch. Game state variables include information such as:
* Who is batting
* Who is pitching
* How many outs
* What inning
* Runners on base
* Score

In [3]:
gameState = data.copy()
gameState = gameState[['index',
 'game_date',
 'player_name',
 'batter',
 'pitcher',
 'home_team',
 'away_team',
 'balls',
 'strikes',
 'on_3b',
 'on_2b',
 'on_1b',
 'outs_when_up',
 'inning',
 'inning_topbot',
 'home_score',
 'away_score',
 'bat_score',
 'fld_score',
 'if_fielding_alignment',
 'of_fielding_alignment']]

gameState.head(10)

Unnamed: 0,index,game_date,player_name,batter,pitcher,home_team,away_team,balls,strikes,on_3b,...,on_1b,outs_when_up,inning,inning_topbot,home_score,away_score,bat_score,fld_score,if_fielding_alignment,of_fielding_alignment
0,0,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,2.0,2.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Strategic
1,1,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,2.0,2.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
2,2,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,1.0,2.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
3,3,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,1.0,2.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
4,4,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,1.0,2.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,,
5,5,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,1.0,1.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
6,6,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,1.0,0.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
7,7,2017-06-26,Nick Goody,518902.0,580792.0,CLE,TEX,0.0,0.0,,...,462101.0,2.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
8,8,2017-06-26,Nick Goody,608577.0,580792.0,CLE,TEX,1.0,1.0,,...,462101.0,1.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
9,9,2017-06-26,Nick Goody,608577.0,580792.0,CLE,TEX,0.0,1.0,,...,462101.0,1.0,9.0,Top,15.0,9.0,9.0,15.0,Standard,Standard
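
The `on_1b`, `on_2b`, and `on_3b` columns appear to hold the runner's player ID when a base is occupied and NaN when it is empty, so a count of runners can be derived from them. The sketch below assumes that reading and uses a few synthetic rows rather than real query results; the ID `500000.0` is made up.

```python
import numpy as np
import pandas as pd

# Synthetic rows shaped like the game-state columns (not real query output).
game_state = pd.DataFrame({
    "on_1b": [462101.0, np.nan, 462101.0],
    "on_2b": [np.nan, np.nan, 500000.0],  # 500000.0 is a made-up ID
    "on_3b": [np.nan, np.nan, np.nan],
    "outs_when_up": [2.0, 0.0, 1.0],
})

base_columns = ["on_1b", "on_2b", "on_3b"]

# A base counts as occupied when its column is not NaN.
game_state["runners_on"] = game_state[base_columns].notna().sum(axis=1)
game_state["bases_empty"] = game_state["runners_on"] == 0

print(game_state[["runners_on", "bases_empty", "outs_when_up"]])
```

The same two lines should work on the `gameState` frame above, as long as the three base columns are present.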


### Event variables ###
Event variables describe what happened on the play. It's debatable what goes in this category. I've included variables related to the outcome of the pitch and a little bit of information about the contact.
* What was the game state after the pitch happened?
* Did the number of outs change?
* Did the batter get on base?
* Did the score change?

In [4]:
eventVariables = data.copy()
eventVariables = eventVariables[['index',
 'batter',
 'pitcher',
 'events',
 'description',
 'des',
 'hit_location',
 'bb_type',
 'hit_distance_sc',
 'home_score',
 'away_score',
 'bat_score',
 'fld_score',
 'post_away_score',
 'post_home_score',
 'post_bat_score',
 'post_fld_score',
 ]]
eventVariables.head(10)

Unnamed: 0,index,batter,pitcher,events,description,des,hit_location,bb_type,hit_distance_sc,home_score,away_score,bat_score,fld_score,post_away_score,post_home_score,post_bat_score,post_fld_score
0,0,518902.0,580792.0,strikeout,swinging_strike,Pete Kozma strikes out swinging.,2.0,,,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
1,1,518902.0,580792.0,,foul,,,,9.0,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
2,2,518902.0,580792.0,,blocked_ball,,,,,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
3,3,518902.0,580792.0,,foul,,,,60.0,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
4,4,518902.0,580792.0,,foul,,,,152.0,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
5,5,518902.0,580792.0,,swinging_strike,,,,,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
6,6,518902.0,580792.0,,foul,,,,7.0,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
7,7,518902.0,580792.0,,ball,,,,,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
8,8,608577.0,580792.0,field_out,hit_into_play,Nomar Mazara flies out to center fielder Bradl...,8.0,fly_ball,310.0,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
9,9,608577.0,580792.0,,blocked_ball,,,,,15.0,9.0,9.0,15.0,9.0,15.0,9.0,15.0
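
To answer a question like "did the score change?", we can compare the score columns before and after the pitch. The sketch below assumes that `post_bat_score` is the batting team's score after the pitch and that `events` is filled in only on the pitch that ends a plate appearance, which matches the sample output above. The rows are synthetic, and the derived column names are my own.

```python
import pandas as pd

# Synthetic rows for illustration; the third row is a made-up two-run homer.
events = pd.DataFrame({
    "events": ["strikeout", None, "home_run"],
    "bat_score": [9.0, 9.0, 9.0],
    "post_bat_score": [9.0, 9.0, 11.0],
})

# Runs scored by the batting team on this pitch.
events["runs_on_pitch"] = events["post_bat_score"] - events["bat_score"]

# A non-missing `events` value marks the final pitch of a plate appearance.
events["ends_plate_appearance"] = events["events"].notna()

scoring = events[events["runs_on_pitch"] > 0]
print(scoring)
```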


### Pitch variables ###
Pitch variables describe what the pitch looked like, including:
* release position (x, y, z) from the pitcher
* the position (x, z) where the pitch crossed the plate
* zone
* speed
* spin
* pitch type

In [5]:
pitchVariables = data.copy()
pitchVariables = pitchVariables[['index',
 'pitch_type',
 'release_speed',
 'release_pos_x',
 'release_pos_y',                                
 'release_pos_z',
 'player_name',
 'pitcher',
 'zone',
 'type',
 'pfx_x',
 'pfx_z',
 'plate_x',
 'plate_z',
 'hc_x',
 'hc_y',
 'sv_id',
 'vx0',
 'vy0',
 'vz0',
 'ax',
 'ay',
 'az',
 'effective_speed',
 'release_spin_rate',
 'release_extension',
 'sz_top',
 'sz_bot'
 ]]
print(pitchVariables.loc[2])

index                            2
pitch_type                      SL
release_speed                 83.1
release_pos_x              -1.5837
release_pos_y              54.5722
release_pos_z               5.6419
player_name             Nick Goody
pitcher                     580792
zone                            14
type                             B
pfx_x                       0.4888
pfx_z                        0.219
plate_x                     0.9235
plate_z                     0.6739
hc_x                           NaN
hc_y                           NaN
sv_id                170627_024621
vx0                         4.7625
vy0                        -120.67
vz0                        -5.2814
ax                          3.7911
ay                         23.2674
az                        -29.2793
effective_speed             82.425
release_spin_rate             2285
release_extension            5.928
sz_top                      3.5145
sz_bot                      1.5375
Name: 2, dtype: object
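
The plate position and strike-zone columns can be combined into a rough check of whether a pitch crossed inside the zone. This is a simplified sketch, not how Statcast assigns `zone`: it assumes `plate_x`, `plate_z`, `sz_top`, and `sz_bot` are all in feet, with `plate_x` measured from the center of the plate, and it ignores the radius of the ball. The function name is hypothetical.

```python
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


def in_rulebook_zone(plate_x, plate_z, sz_top, sz_bot):
    """Return True if the ball's center crosses inside the strike zone.

    Assumes all values are in feet and ignores the ball's radius, so
    pitches that only clip the edge are treated as outside.
    """
    return abs(plate_x) <= PLATE_HALF_WIDTH_FT and sz_bot <= plate_z <= sz_top


# Values from the row printed above (zone 14, type B).
print(in_rulebook_zone(0.9235, 0.6739, 3.5145, 1.5375))  # False
```

The pitch is both too far off the center of the plate and below `sz_bot`, which agrees with it being recorded as type `B`.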

### Questions: ###
Select two games that the Rockies played in 2017, one at home and one at Petco Park in San Diego.
1. How many of each type of pitch did Rockies pitchers throw in each game?
2. If time permits, how did the movement on pitches at Coors Field compare to the movement at Petco? It's enough to compare the means of the movement in the x and z directions on different pitch types.

In [6]:
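# Road games only: the home team bats in the bottom of each inning,
# so "Bot" rows with COL as the away team are pitches thrown by the Rockies.
gameCOL = data.loc[(data["away_team"] == 'COL') & (data["inning_topbot"] == "Bot")].copy()
p = gameCOL.groupby(['game_date', 'pitch_type']).size()  # pitch counts per game date
print(p)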

game_date   pitch_type
2017-06-25  CH              7
            CU              8
            FC             14
            FF            112
            FT             14
            SL             42
2017-06-26  CH              3
            CU             23
            FF            112
            FT              4
            SI              9
            SL             19
dtype: int64
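
For the second question, one possible approach is to group by park and pitch type and take the mean of `pfx_x` and `pfx_z`. The sketch below uses made-up numbers, so the output says nothing about real pitch movement, and it assumes `home_team` identifies the park (`COL` for Coors Field, `SD` for Petco Park). With real data we would first filter to Rockies pitchers. At home the Rockies pitch in the top of the inning, so that filter would pair `home_team == 'COL'` with `inning_topbot == 'Top'`.

```python
import pandas as pd

# Made-up pitches for illustration only; real rows would come from statcast().
pitches = pd.DataFrame({
    "home_team": ["COL", "COL", "COL", "SD", "SD", "SD"],
    "pitch_type": ["FF", "FF", "SL", "FF", "FF", "SL"],
    "pfx_x": [-0.50, -0.30, 0.20, -0.70, -0.50, 0.40],
    "pfx_z": [1.00, 1.20, 0.10, 1.40, 1.60, 0.30],
})

# Mean movement per park and pitch type, with the parks side by side.
mean_movement = (
    pitches.groupby(["home_team", "pitch_type"])[["pfx_x", "pfx_z"]]
    .mean()
    .unstack("home_team")
)
print(mean_movement)
```

Each row of `mean_movement` is a pitch type, and the columns pair each movement direction with a park, which makes the Coors-versus-Petco comparison easy to read across a row.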
