Statcast pitcher spin rate fix (#64)
Add statcast_pitcher_spin method with testing

statcast_pitcher_spin extends statcast_pitcher by adding spin data back into the file, replacing the deprecated spin columns. The math and physics behind the calculations were modeled on Professor Alan Nathan's work at the University of Illinois. I do not know what information was in the original spin columns before they were deprecated, but the output now has the magnitude of movement caused by spin in the X and Z directions ('Mx' and 'Mz') as well as the axis of rotation ('phi').
Showing 8 changed files with 9,648 additions and 0 deletions.
@@ -0,0 +1,43 @@
# Statcast Pitcher Spin
`statcast_pitcher_spin(start_dt=[yesterday's date], end_dt=None, player_id)`

The `statcast_pitcher_spin` function retrieves pitch-level Statcast data for a given pitcher over a date or range of dates and calculates spin-related metrics.

## Arguments
`start_dt:` first day for which you want to retrieve data. Defaults to yesterday's date if nothing is entered. If you only want data for one date, supply a `start_dt` value but not an `end_dt` value. Format: YYYY-MM-DD.

`end_dt:` last day for which you want to retrieve data. Defaults to None. If you want to retrieve data for more than one day, both a `start_dt` and `end_dt` value must be given. Format: YYYY-MM-DD.

`player_id:` MLBAM player ID for the pitcher you want to retrieve data for. To find a player's MLBAM ID, see the function [playerid_lookup](http://github.com/jldbc/pybaseball/docs/playerid_lookup.md) or the examples below.

### Added Return Columns
`Mx`: The amount of movement in the x-direction due to the Magnus effect alone. (Positive is towards first base/catcher's right)

`Mz`: The amount of movement in the z-direction due to the Magnus effect alone. (Positive is upwards)

`theta`: The angle of the spin axis with respect to the pitch's direction of movement, in degrees between 0 and 90. A 0 angle means the spin axis is perpendicular to the direction of movement (it's all 'useful' spin with regards to the Magnus effect); 90 means the spin axis is parallel to the direction of movement (like a gyroball). Pitches whose calculated spin efficiency falls outside the range -1 to 1 are assigned NaN.

`phi`: The angle of the spin axis in the x-z plane, measured relative to the x-axis, in degrees. More colloquially, the axis about which the ball is spinning as seen from the catcher's eye.
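
As a rough illustration of how `theta` relates to spin efficiency (the ratio of transverse spin to total spin), the sketch below converts a few efficiency values to angles with an arccosine. The helper name `theta_from_efficiency` and the input values are hypothetical and are not part of pybaseball.

```python
import numpy as np


def theta_from_efficiency(spin_efficiency):
    """Hypothetical helper: convert spin efficiency to an angle in degrees."""
    eff = np.asarray(spin_efficiency, dtype=float)
    # Efficiencies outside [-1, 1] have no real arccosine; mark them NaN.
    valid = (eff >= -1.0) & (eff <= 1.0)
    theta = np.full(eff.shape, np.nan)
    theta[valid] = np.degrees(np.arccos(eff[valid]))
    return theta


# Made-up efficiencies: fully useful spin, half, pure gyro spin, invalid
angles = theta_from_efficiency([1.0, 0.5, 0.0, 1.2])
print(angles)
```

An efficiency of 1 maps to an angle of 0 (all spin contributes to movement), while an efficiency of 0 maps to 90 (gyro spin).
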

### Notes
- This method piggybacks off of the `statcast_pitcher` method and is therefore prone to any issue or bug in it.
- The method's calculations were modeled from the work of Professor Alan Nathan of the University of Illinois.
- The axes referred to are those of the PITCHf/x coordinate system, where the origin is home plate, the x-axis points to the catcher's right, the y-axis points towards the mound, and the z-axis points upward. So a pitch generally moves in the -y direction.
- These calculations are sensitive to the environment (temperature, barometric pressure, humidity, altitude, wind speed, etc.). They are all done as if the pitches were thrown at Tropicana Field, which has no wind and a constant temperature of 70 degrees.
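
To make the coordinate system in the notes more concrete, here is a minimal sketch of the constant-acceleration kinematics along the y-axis: it solves for the time a pitch takes to travel from its release point to the front of home plate. The function name `time_to_reach` and all numeric values are illustrative assumptions, not values taken from Statcast or from this library.

```python
import numpy as np


def time_to_reach(y_start, vy, ay, y_target):
    """Hypothetical helper: time for the ball to travel from y_start to y_target.

    Solves y_target = y_start + vy*t + 0.5*ay*t**2 for t, assuming constant
    acceleration along y.
    """
    discriminant = vy**2 - 2.0 * ay * (y_start - y_target)
    # The pitch moves in the -y direction (vy < 0, ay > 0 here), so this
    # root is the first time the ball reaches y_target.
    return (-vy - np.sqrt(discriminant)) / ay


# Illustrative, made-up values in feet, ft/s and ft/s^2
y_release = 54.5
vy_release = -130.0
ay = 25.0
y_plate = 17 / 12

t = time_to_reach(y_release, vy_release, ay, y_plate)
# Plug t back into the equation of motion as a sanity check
y_check = y_release + vy_release * t + 0.5 * ay * t**2
print(round(t, 3), round(y_check, 3))
```

Because velocity along y is negative and drag makes the acceleration along y positive, the smaller root of the quadratic is the physically meaningful one.
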

## Examples of valid queries

```python
from pybaseball import statcast_pitcher_spin
from pybaseball import playerid_lookup

# find Yu Darvish's player id (key_mlbam)
playerid_lookup('darvish','yu')

# get all available data within date range
data = statcast_pitcher_spin('2019-07-01', '2019-07-31', player_id = 506433)

# get data for May 3rd, 2019
data = statcast_pitcher_spin('2019-05-03', player_id = 543294)
```
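
Once data has been retrieved, the added columns can be summarised like any other DataFrame columns. The sketch below uses a small, made-up DataFrame in place of a live query so that it runs offline, and it assumes the returned frame also carries Statcast's `pitch_type` column; the numbers are invented for illustration only.

```python
import pandas as pd

# Made-up rows standing in for the DataFrame returned by statcast_pitcher_spin
data = pd.DataFrame({
    "pitch_type": ["FF", "FF", "SL", "SL"],
    "Mx": [-6.0, -8.0, 4.0, 6.0],
    "Mz": [16.0, 18.0, 1.0, 3.0],
    "theta": [10.0, 14.0, 60.0, 70.0],
})

# Average Magnus movement and spin-axis angle for each pitch type
summary = data.groupby("pitch_type")[["Mx", "Mz", "theta"]].mean()
print(summary)
```
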
@@ -0,0 +1,193 @@
"""Statcast Pitcher Spin
These calculations are based on the work by Prof. Alan Nathan of the University
of Illinois.
Article: http://baseball.physics.illinois.edu/trackman/SpinAxis.pdf
Excel Workbook: http://baseball.physics.illinois.edu/trackman/MovementSpinEfficiencyTemplate-v2.xlsx
"""

from pybaseball import statcast_pitcher
import pandas as pd
import numpy as np

K = .005383  # Environmental Constant
DISTANCE_FROM_HOME_TO_MOUND = 60.5  # feet
DISTANCE_TO_PLATE_AT_VELOCITY_CAPTURE = 50  # feet
Y_VALUE_AT_FINAL_MEASUREMENT = 17/12  # feet (17 inches)
GRAVITATIONAL_ACCELERATION = 32.174  # ft/s^2


def statcast_pitcher_spin(start_dt=None, end_dt=None, player_id=None):
    """Return statcast_pitcher data with 'Mx', 'Mz', 'phi' and 'theta' added."""
    pitcher_data = statcast_pitcher(start_dt, end_dt, player_id)

    spin_df = pitcher_data[[
        'release_extension', 'vx0', 'vy0', 'vz0', 'ax',
        'ay', 'az', 'release_spin_rate']].copy()

    spin_df = find_intermediate_values(spin_df)

    pitcher_data[['Mx', 'Mz', 'phi', 'theta']] = spin_df[[
        'Mx', 'Mz', 'phi', 'theta']].copy()

    return pitcher_data


# def get_statcast_pitcher_test_data():
#     df = pd.read_csv("tests/statcast_pitching_test_data.csv")
#     return df


def find_intermediate_values(spin_df):
    """Calls each intermediate function in sequence"""
    spin_df = find_release_point(spin_df)
    spin_df = find_release_time(spin_df)
    spin_df = find_release_velocity_components(spin_df)
    spin_df = find_flight_time(spin_df)
    spin_df = find_average_velocity_components(spin_df)
    spin_df = find_average_velocity(spin_df)
    spin_df = find_average_drag(spin_df)
    spin_df = find_magnus_acceleration_magnitude(spin_df)
    spin_df = find_average_magnus_acceleration(spin_df)
    spin_df = find_magnus_magnitude(spin_df)
    spin_df = find_phi(spin_df)
    spin_df = find_lift_coefficient(spin_df)
    spin_df = find_spin_factor(spin_df)
    spin_df = find_transverse_spin(spin_df)
    spin_df = find_spin_efficiency(spin_df)
    spin_df = find_theta(spin_df)

    return spin_df


def find_release_point(df):
    df['yR'] = (DISTANCE_FROM_HOME_TO_MOUND - df['release_extension'])
    return df


def find_release_time(df):
    df['tR'] = time_duration(
        df['yR'],
        df['vy0'],
        df['ay'],
        DISTANCE_TO_PLATE_AT_VELOCITY_CAPTURE,
        False)
    return df


def find_release_velocity_components(df):
    df['vxR'] = (df['vx0'] + (df['ax'] * df['tR']))
    df['vyR'] = (df['vy0'] + (df['ay'] * df['tR']))
    df['vzR'] = (df['vz0'] + (df['az'] * df['tR']))
    return df


def find_flight_time(df):
    df['tf'] = time_duration(
        df['yR'],
        df['vyR'],
        df['ay'],
        Y_VALUE_AT_FINAL_MEASUREMENT,
        True)
    return df


def find_average_velocity_components(df):
    df['vxbar'] = (2*df['vxR'] + df['ax']*df['tf'])/2
    df['vybar'] = (2*df['vyR'] + df['ay']*df['tf'])/2
    df['vzbar'] = (2*df['vzR'] + df['az']*df['tf'])/2
    return df


def find_average_velocity(df):
    df['vbar'] = three_comp_average(df['vxbar'], df['vybar'], df['vzbar'])
    return df


def find_average_drag(df):
    df['adrag'] = (
        -(df['ax']*df['vxbar']
          + df['ay']*df['vybar']
          + (df['az'] + GRAVITATIONAL_ACCELERATION)*df['vzbar'])
        / df['vbar'])
    return df


def find_magnus_acceleration_magnitude(df):
    df['amagx'] = (df['ax'] + df['adrag']*df['vxbar']/df['vbar'])
    df['amagy'] = (df['ay'] + df['adrag']*df['vybar']/df['vbar'])
    df['amagz'] = (df['az'] + df['adrag']*df['vzbar']/df['vbar'] + GRAVITATIONAL_ACCELERATION)
    return df


def find_average_magnus_acceleration(df):
    df['amag'] = three_comp_average(df['amagx'], df['amagy'], df['amagz'])
    return df


def find_magnus_magnitude(df):
    df['Mx'] = (6 * df['amagx'] * (df['tf']**2))
    df['Mz'] = (6 * df['amagz'] * (df['tf']**2))
    return df


def find_phi(df):
    df['phi'] = np.where(
        df['amagz'] > 0,
        np.arctan2(df['amagz'], df['amagx'])*180/np.pi,
        360 + np.arctan2(df['amagz'], df['amagx'])*180/np.pi) + 90

    df['phi'] = df['phi'].round(0).astype('int64')
    return df


def find_lift_coefficient(df):
    df['Cl'] = (df['amag']/(K*df['vbar']**2))
    return df


def find_spin_factor(df):
    """Function to find spin factor
    Spin Factor formula was derived from a regression of experimental data. The
    formula below appears in the Excel worksheet cited at the top of the file.
    No explanation is given for the constant values included.
    """
    df['S'] = (0.166*np.log(0.336/(0.336-df['Cl'])))
    return df


def find_transverse_spin(df):
    df['spinT'] = (78.92*df['S']*df['vbar'])
    return df


def find_spin_efficiency(df):
    df['spin eff'] = df['spinT']/df['release_spin_rate']
    return df


def find_theta(df):
    # Keep spin efficiency only where it is a valid cosine; otherwise NaN
    df['theta'] = df['spin eff'].where(
        (df['spin eff'] >= -1.0) & (df['spin eff'] <= 1.0),
        np.nan)
    # Leave NaN entries as NaN; convert the rest to an angle in degrees
    df['theta'] = df['theta'].where(
        df['theta'].isna(),
        np.arccos(df['theta']) * 180/np.pi).round(0)
    return df


# HELPERS
def time_duration(s, v, acc, adj, forward):
    """
    Finds flight time given an original position, velocity, acceleration, and target position.
    Direction does not affect the time duration; it only determines whether the
    flight time is assigned a positive or negative value.
    s = (pd.Series) spatial point at known time
    v = (pd.Series) velocity at known time
    acc = (pd.Series) acceleration
    adj = (float) y position of the second point; (s - adj) is the spatial
        difference between the two points
    forward = (bool) sign applied to that difference: True uses +(s - adj),
        False uses -(s - adj)
    """
    return (-v - np.sqrt(v**2 - 2*acc*((1 if forward else -1) * (s-adj)))) / acc


def three_comp_average(comp1, comp2, comp3):
    """Returns the magnitude of a vector from its three components."""
    return np.sqrt(comp1**2 + comp2**2 + comp3**2)