In [2]:
import pandas as pd
import numpy as np
import seaborn as sns


#### Goal

For our final project, I would like to look at baseball data obtained from the Trackman radar system at Pomona-Pitzer baseball games. The data covers every pitch tracked in every Pomona-Pitzer home game, plus some pitches from scrimmages, for a total of about 7000 pitches.

The point of interest in these datasets is predicting what pitch was thrown based on its movement, velocity, spin rate, and a few other determining features. Currently, a person works the Trackman at every game and writes in the pitch they believe was thrown from watching the game. This is not terrible in terms of accuracy, since the person tends to get the pitch type correct most of the time, but it is a task that can be automated. The Trackman also has a built-in AutoPitchType detector that labels what it "thinks" was thrown, but these labels are even less accurate.


Using the data with the human-provided labels, we would like to test decision trees, random forests, KNN, and neural networks to find the best way to label a pitch.
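The comparison described above could be set up roughly as follows. This is only a sketch: it assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for the Trackman features, and the hyperparameters shown are arbitrary placeholders rather than tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pitch features and labels (not Trackman data)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

models = {
    'decision tree': DecisionTreeClassifier(random_state=0),
    'random forest': RandomForestClassifier(n_estimators=50, random_state=0),
    # KNN and neural networks are sensitive to feature scale, so standardize first
    'KNN': make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    'neural network': make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)),
}

# Mean 3-fold cross-validated accuracy for each model
scores = {name: cross_val_score(model, X, y, cv=3).mean()
          for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Scoring every model with the same cross-validation splits keeps the comparison fair; with the real data, `X` and `y` would be the selected pitch features and the human-provided labels.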

In [11]:
bsbl = pd.read_csv('bsbl.csv')

#### Initial Cleaning

bsbl is a dataframe that contains data for every pitch from Pomona-Pitzer home games. The features include who is pitching and hitting; the result of each pitch (i.e. ball, strike, or ball in play); how fast the ball was pitched and how much it moved after release; and even the velocity of a hit ball and the rate at which the ball was spinning. In total, 166 features are tracked per pitch.

For our purposes we will look at only:
- *RelSpeed* - Speed of pitch, reported in miles per hour, when it leaves the pitcher’s hand
- *SpinRate* - How fast the ball is spinning as it leaves the pitcher’s hand, reported in revolutions per minute.
- *RelHeight* -  Height, reported in feet, above home plate at which the pitcher releases the ball.
- *ZoneSpeed* - Speed of the pitch, measured in miles per hour, as it crosses the front of home plate.
- *ZoneTime* - Amount of time elapsed from the pitcher’s release until the ball crosses the front of home plate. It may also be referred to as “batter reaction time.”
- *Extension* - The distance, reported in feet, from which the pitcher releases the ball relative to the pitching rubber.
- *VertRelAngle* - Initial vertical (up-down) direction of the ball when it leaves the pitcher’s hand, reported in degrees. A positive number means the ball is released upward.
- *HorzRelAngle* -  Initial horizontal (left-right) direction of the ball when it leaves the pitcher’s hand, reported in degrees. A positive number means the ball is released to the right from the pitcher’s perspective.
- *SpinAxis* - Direction the ball is spinning, reported in degrees of tilt. A ball thrown with a spin axis of 0 has pure top spin and would break downward.
- *RelSide* - Distance from the center of the rubber, reported in feet, at which the pitcher releases the ball. Balls thrown from the right side of the mound from the pitcher’s perspective will have a positive number.
- *VertBreak* - Distance, measured in inches, between where the pitch actually crosses the front of home plate height-wise, and where it would have crossed home plate height-wise if it had traveled in a perfectly straight line from release, completely unaffected by gravity.
- *InducedVertBreak* - Distance, measured in inches, between where the pitch actually crosses the front of home plate height-wise, and where it would have crossed home plate height-wise if it had traveled in a perfectly straight line from release, but affected by gravity.
- *HorzBreak* - Distance, measured in inches, between where the pitch actually crosses the front of home plate side-wise, and where it would have crossed home plate side-wise if it had traveled in a perfectly straight line from release. A positive number means the break was to the right from the pitcher’s perspective.
- *PlateLocHeight* - The height of the ball relative to home plate, measured in feet, as the ball crosses the front of the plate. 
- *PlateLocSide* - Distance from the center of the plate to the ball, measured in feet, as it crosses the front of the plate. Negative numbers are to the left of center from the pitcher’s perspective (outside to a right-handed batter).
- *VertApprAngle* -  How steeply up or down the ball enters the zone, reported as the angle in degrees, as the pitch crosses the front of home plate. A negative number means it is sloping downward.
- *HorzApprAngle* - Left-right direction at which a pitched ball crosses the front of home plate, reported as an angle. A negative number means that the ball is moving from right to left from the pitcher’s perspective (away from a right-handed batter) as it enters the zone.
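As a rough illustration of how several of these features relate, ZoneTime can be approximated from RelSpeed, ZoneSpeed, and Extension. The sketch below is not Trackman's calculation. It assumes a straight-line path, a constant deceleration (so the average speed is the mean of the release and zone speeds), and a front edge of home plate that sits 60.5 ft minus 17 in from the pitching rubber. The function name and the sample values are hypothetical.

```python
MPH_TO_FPS = 5280 / 3600  # 1 mph expressed in feet per second
RUBBER_TO_PLATE_FRONT_FT = 60.5 - 17 / 12  # assumed: rubber to front edge of home plate


def approx_zone_time(rel_speed_mph, zone_speed_mph, extension_ft):
    """Estimate seconds from release to the front of home plate (rough sketch)."""
    # Extension moves the release point closer to the plate
    distance_ft = RUBBER_TO_PLATE_FRONT_FT - extension_ft
    # Constant deceleration assumption: average speed is the mean of the two speeds
    avg_speed_fps = (rel_speed_mph + zone_speed_mph) / 2 * MPH_TO_FPS
    return distance_ft / avg_speed_fps


# Hypothetical pitch: released at 85 mph, 78 mph at the plate, 6 ft of extension
estimate = approx_zone_time(85.0, 78.0, 6.0)
print(round(estimate, 3))
```

For these made-up values the estimate is roughly 0.44 seconds, which shows how a longer Extension or a higher RelSpeed shortens the batter's reaction time.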




In [12]:




#filter = bsbl[['Date', 'Pitcher', 'RelSpeed', 'SpinRate', 'RelHeight', 'ZoneSpeed','ZoneTime', 'EffectiveVelo', 'SpeedDrop', 'Extension','PitcherThrows', 'TaggedPitchType', 'PitchCall', 'KorBB', 'TaggedHitType', 'PlayResult', 'VertRelAngle',  'HorzRelAngle', 'SpinAxis', 'Tilt',  'RelSide', 'VertBreak', 'InducedVertBreak','HorzBreak', 'PlateLocHeight', 'PlateLocSide', 'VertApprAngle', 'HorzApprAngle', 'HitSpinAxis', 'HangTime', 'Bearing', 'Distance', 'HitSpinRate','Direction', 'Angle', 'ExitSpeed',]]
# Keep the pitcher's handedness, the pitch-tracking features, and the human-tagged label
filter = bsbl[['PitcherThrows', 'RelSpeed', 'SpinRate', 'RelHeight', 'ZoneSpeed',
               'ZoneTime', 'Extension', 'VertRelAngle', 'HorzRelAngle', 'SpinAxis',
               'RelSide', 'VertBreak', 'InducedVertBreak', 'HorzBreak',
               'PlateLocHeight', 'PlateLocSide', 'VertApprAngle', 'HorzApprAngle',
               'TaggedPitchType']]

We then separate left-handed pitchers from right-handed pitchers. We do this because opposite-handed pitchers spin the ball on opposite axes, so the same pitch breaks in different directions depending on the pitcher's throwing hand. We also keep track of which pitch types are being tagged.

In [13]:
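# Split the pitches by throwing hand
lhp = filter.loc[filter['PitcherThrows'] == 'Left']
rhp = filter.loc[filter['PitcherThrows'] == 'Right']

# All tagged pitch labels, used below to build the label-to-number mapping
pitches = filter.loc[:, "TaggedPitchType"]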
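To see why the split by throwing hand matters, one can compare the average HorzBreak of the same pitch type across hands. The sketch below uses a small made-up dataframe (the values are hypothetical, not taken from bsbl.csv) with the same column names as above.

```python
import pandas as pd

# Hypothetical toy data; real values come from bsbl.csv
toy = pd.DataFrame({
    'PitcherThrows': ['Left', 'Left', 'Right', 'Right'],
    'TaggedPitchType': ['Fastball', 'Fastball', 'Fastball', 'Fastball'],
    'HorzBreak': [-9.0, -11.0, 8.0, 12.0],
})

# Mean horizontal break for each (handedness, pitch type) pair
mean_break = toy.groupby(['PitcherThrows', 'TaggedPitchType'])['HorzBreak'].mean()
print(mean_break)

# How many pitches of each type were tagged for each throwing hand
tag_counts = toy.groupby('PitcherThrows')['TaggedPitchType'].value_counts()
print(tag_counts)
```

In this toy example the fastball averages have opposite signs for the two hands, which is the pattern that would confuse a single model trained on both groups at once.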

We then make a function that gives every unique pitch type a unique number, e.g. Fastball : 0, with each new pitch type receiving the next integer in the order it first appears. The changeLabel function then uses this mapping to convert each string label to its corresponding number.

In [14]:
def makePitchSet():
    """ return a dict mapping each tagged pitch name to a unique integer """
    pitchSet = {'Fastball' : 0}
    count = 1
    for pitch in pitches:
        # only assign a number the first time a pitch type is seen
        if pitch not in pitchSet:
            pitchSet[pitch] = count
            count += 1

    return pitchSet

pitchSet = makePitchSet()

def changeLabel(pitch):
    """ return the pitch index (a unique integer/category) """
    return pitchSet[pitch]
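When inspecting model predictions later, it helps to translate the numbers back into pitch names. A minimal sketch, assuming a mapping shaped like pitchSet (the example mapping and the names `numToPitch` and `decodeLabel` are hypothetical):

```python
# Hypothetical mapping with the same shape as pitchSet
examplePitchSet = {'Fastball': 0, 'Slider': 1, 'Curveball': 2}

# Invert the dict so integers map back to pitch names
numToPitch = {num: pitch for pitch, num in examplePitchSet.items()}


def decodeLabel(num):
    """ return the pitch name for a numeric label """
    return numToPitch[num]


print(decodeLabel(1))  # → Slider
```

Because every pitch name receives a distinct integer, the inverted dict is guaranteed to be a one-to-one lookup.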

Finally, we apply our changeLabel function and write our left- and right-handed pitcher dataframes to CSV files.

In [15]:
# assign returns a new dataframe, which avoids pandas' SettingWithCopyWarning
# (lhp and rhp were created as slices of filter)
lhp = lhp.assign(TaggedPitchNum=lhp['TaggedPitchType'].apply(changeLabel))
rhp = rhp.assign(TaggedPitchNum=rhp['TaggedPitchType'].apply(changeLabel))

# The string label is no longer needed once the numeric label exists
lhp = lhp.drop('TaggedPitchType', axis=1)
rhp = rhp.drop('TaggedPitchType', axis=1)

rhp.to_csv('rhp.csv', index_label=False)
lhp.to_csv('lhp.csv', index_label=False)
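The saved files can later be read back and split into features and labels for the models. The sketch below is a hedged illustration using a tiny made-up dataframe and an in-memory buffer in place of rhp.csv; the names `toy`, `loaded`, `X`, and `y` are hypothetical. It also shows that a file written with `index_label=False` still contains the row index, which `read_csv` picks up as the index because the header has one fewer field than the data rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for rhp; the index is deliberately non-default
toy = pd.DataFrame(
    {'PitcherThrows': ['Right', 'Right'],
     'RelSpeed': [84.1, 76.3],
     'TaggedPitchNum': [0, 1]},
    index=[3, 7],
)

# Write to an in-memory buffer using the same to_csv option as the notebook
buffer = io.StringIO()
toy.to_csv(buffer, index_label=False)
buffer.seek(0)

# The header has one fewer field than the rows, so read_csv uses the
# first column as the index
loaded = pd.read_csv(buffer)

# Separate the numeric features from the label column
X = loaded.drop(columns=['PitcherThrows', 'TaggedPitchNum'])
y = loaded['TaggedPitchNum']
print(X)
print(y)
```

PitcherThrows is dropped from the features here because each file already contains a single throwing hand.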
