# Optimal Launch Angle and Barrels
For our 2024 WISD Hackathon project, we analyzed the ideal launch angle in the provided data and how it relates to barrels. We did this by graphing the bat path and trajectory for ideal swings.

In [1]:
import os
import json
import pandas as pd

### Load data

In [2]:
data_dir = 'anonymized-files-wisd'
data_list = []

# each file is read as JSON Lines: one JSON record per line
for filename in os.listdir(data_dir):
    with open(os.path.join(data_dir, filename), 'r') as file:
        for line in file:
            data_list.append(json.loads(line))

In [3]:
df = pd.json_normalize(data_list)
df.head()

Unnamed: 0,events,samples_ball,samples_bat,units.length,units.velocity,units.acceleration,units.angle,summary_acts.pitch.eventId,summary_acts.pitch.type,summary_acts.pitch.result,...,summary_score.runs.game.team2,summary_score.runs.innings,summary_score.runs.play,summary_score.outs.inning,summary_score.outs.play,summary_score.count.balls.plateAppearance,summary_score.count.balls.play,summary_score.count.strikes.plateAppearance,summary_score.count.strikes.play,summary_acts.hit.eventId
0,[],"[{'time': 0.0206462, 'pos': [-2.78340062072433...",[{'event': 'No'}],foot,mph,mph/s,degree,54e48da8-67a0-46a4-bf51-c75170a07411,Sinker,Strike,...,0,"[{'team1': 0, 'team2': 0}, {'team1': 0, 'team2...",0,2,0,0,0,0,1,
1,[],"[{'time': 0.00387, 'pos': [-3.0194018498888253...","[{'event': 'First', 'time': -0.4092641, 'head'...",foot,mph,mph/s,degree,f1c5834c-ba80-419b-883f-56b665cb2e79,Sinker,Strike,...,0,"[{'team1': 0, 'team2': 0}, {'team1': 0, 'team2...",0,0,0,0,0,1,1,
2,"[{'start': {'angle': [27.482588965436427, 65.8...","[{'time': 0.0069402, 'pos': [-2.59684931578240...","[{'event': 'First', 'time': -0.3827868, 'head'...",foot,mph,mph/s,degree,ac29b4ab-63bc-4672-a29c-f4517fd03c85,Changeup,HitIntoPlay,...,0,"[{'team1': 0, 'team2': 0}, {'team1': 0, 'team2...",0,1,1,2,0,1,0,daf9742c-869c-4370-9d66-59e217be1c89
3,"[{'start': {'angle': [37.20027897884712, -9.14...","[{'time': 0.0226918, 'pos': [-2.26856730788878...","[{'event': 'First', 'time': -0.4404741, 'head'...",foot,mph,mph/s,degree,8efde6c7-6ab0-40aa-a197-c1ad42bb7ee7,FourSeamFastball,HitIntoPlay,...,0,"[{'team1': 0, 'team2': 0}, {'team1': 0, 'team2...",1,1,0,2,0,2,0,62848ee8-bff2-4410-9c61-eb672c283a60
4,[],"[{'time': 0.0197676, 'pos': [-3.41602776535463...","[{'event': 'First', 'time': -0.3568664, 'head'...",foot,mph,mph/s,degree,b4727ec0-5df2-48ae-baab-1dea20f53f15,Curveball,Strike,...,0,"[{'team1': 0, 'team2': 0}]",0,2,0,0,0,1,1,


In [None]:
len(df)

1251

In [None]:
df.dtypes

events                                          object
samples_ball                                    object
samples_bat                                     object
units.length                                    object
units.velocity                                  object
units.acceleration                              object
units.angle                                     object
summary_acts.pitch.eventId                      object
summary_acts.pitch.result                       object
summary_acts.pitch.action                       object
summary_acts.pitch.speed.mph                   float64
summary_acts.pitch.speed.kph                   float64
summary_acts.pitch.speed.mps                   float64
summary_acts.pitch.spin.rpm                    float64
summary_acts.hit.speed.mph                     float64
summary_acts.hit.speed.kph                     float64
summary_acts.hit.speed.mps                     float64
summary_acts.hit.spin.rpm                      float64
summary_sc

In [None]:
# confirm every row uses the same units (foot, mph, mph/s, degree)
print('length: ', df['units.length'].unique())
print('velocity: ', df['units.velocity'].unique())
print('acceleration: ', df['units.acceleration'].unique())
print('angle: ', df['units.angle'].unique())

length:  ['foot']
velocity:  ['mph']
acceleration:  ['mph/s']
angle:  ['degree']


In [None]:
# df.to_csv('all_data.csv', index=False)

### Find groups of hits

In [21]:
save_dir = 'grouped_ids'

In [4]:
# filter for pitches with a hit
hit_df = df.dropna(subset=['summary_acts.hit.eventId'], ignore_index=True)
print(len(hit_df))

325


In [5]:
# expand events column (hit details) & remove unnecessary columns
events_df = pd.json_normalize(hit_df['events'].explode().to_list())
events_df = events_df.add_prefix('events.')
hit_df = hit_df.join(events_df)
hit_df = hit_df.drop(columns=['events', 'units.length', 'units.velocity', 'units.acceleration', 'units.angle',
                              'summary_acts.pitch.speed.kph', 'summary_acts.pitch.speed.mps', 
                              'summary_acts.hit.speed.kph', 'summary_acts.hit.speed.mps'])

In [6]:
# events.start.angle holds two values per hit; split them into spray angle and launch angle
angle_df = pd.DataFrame(hit_df['events.start.angle'].tolist(), columns=['events.spray_angle', 'events.launch_angle'])
hit_df = hit_df.drop(columns=['events.start.angle'])
hit_df = pd.concat([hit_df, angle_df], axis=1)

In [None]:
hit_df.iloc[0]

samples_ball                                   [{'time': -0.0046734, 'pos': [-1.0477735799342...
samples_bat                                    [{'event': 'First', 'time': -0.3944424, 'head'...
summary_acts.pitch.eventId                                  b120cf14-305c-442c-a739-c499bf61eec8
summary_acts.pitch.result                                                            HitIntoPlay
summary_acts.pitch.action                                                                    NaN
summary_acts.pitch.speed.mph                                                                84.0
summary_acts.pitch.spin.rpm                                                               2720.0
summary_acts.hit.speed.mph                                                                  84.0
summary_acts.hit.spin.rpm                                                                 1560.0
summary_score.runs.game.team1                                                                  4
summary_score.runs.game.team2 

#### Group by barrel

*"To be Barreled, a batted ball requires an exit velocity of at least 98 mph. At that speed, balls struck with a launch angle between 26-30 degrees always garner Barreled classification. For every mph over 98, the range of launch angles expands."* - mlb.com

In [None]:
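# filter to hits in the base barrel window quoted above (98+ mph, 26-30 degrees);
# the wider launch-angle range allowed at higher exit velocities is not applied here
# - launch angle: events.launch_angle
# - exit velocity: summary_acts.hit.speed.mph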

barrel_df = hit_df[(hit_df['events.launch_angle'] >= 26) 
                & (hit_df['events.launch_angle'] <= 30)
                & (hit_df['summary_acts.hit.speed.mph'] >= 98)]
print(len(barrel_df))
print(barrel_df[['events.launch_angle', 'summary_acts.hit.speed.mph']])

7
     events.launch_angle  summary_acts.hit.speed.mph
26             28.158941                        98.0
30             28.101540                       100.0
111            28.688241                       105.0
188            29.320294                       100.0
242            28.543504                        99.0
275            28.753227                        99.0
298            29.489962                       104.0
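
The filter above keeps only the 26-30 degree window, so hits above 98 mph that fall just outside it are counted as non-barrels even though the quoted definition says the window expands with exit velocity. The sketch below shows one way that expansion could be approximated. The function name `is_barrel_approx` and its linear bounds are assumptions, chosen so that the window is exactly 26-30 degrees at 98 mph; they are not MLB's published rule.

```python
def is_barrel_approx(exit_velo_mph: float, launch_angle_deg: float) -> bool:
    """Approximate barrel test (hypothetical helper, linear bounds are an assumption)."""
    return (
        exit_velo_mph >= 98
        and launch_angle_deg <= 50                         # assumed upper cap
        and exit_velo_mph * 1.5 - launch_angle_deg >= 117  # upper edge: 30 deg at 98 mph
        and exit_velo_mph + launch_angle_deg >= 124        # lower edge: 26 deg at 98 mph
    )


# hypothetical batted balls, not rows from the dataset
for velo, angle in [(98.0, 28.0), (98.0, 31.0), (100.0, 32.0), (97.0, 28.0)]:
    print(velo, angle, is_barrel_approx(velo, angle))
```

Under these assumed bounds, a 100 mph ball at 32 degrees counts as a barrel, while the fixed 26-30 filter would exclude it.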


In [None]:
barrel_ids = list(barrel_df['events.eventId'])
barrel_ids

['57d6343f-cdae-4517-acb4-ea73b838e2e9',
 'bee9aa56-bfdd-4871-ace1-178db56aa19a',
 'a44f0611-618d-41dd-bb9b-089140c3f317',
 '5fbf979d-ac7a-4f41-9498-2f94507ecba1',
 'f2f58c66-ea90-42cb-8d64-2ab98fe5c64a',
 '687e2c12-dff4-4580-9226-c111366746e5',
 'af219680-da54-4e43-8ea5-3ea020f3bc2d']

In [None]:
non_barrel_df = hit_df[(hit_df['events.launch_angle'] < 26) 
                    | (hit_df['events.launch_angle'] > 30)
                    | (hit_df['summary_acts.hit.speed.mph'] < 98)]
non_barrel_ids = list(non_barrel_df['events.eventId'])
len(non_barrel_ids)

318
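
The two groups add up to the 325 hits (7 + 318), which suggests no hit is missing a launch angle or exit velocity. If a value were missing, every comparison on it would be false and the row would land in neither group. A minimal sketch of an alternative, using a single boolean mask and its complement on made-up data (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# hypothetical hits; the last row has a missing launch angle
toy = pd.DataFrame({
    'events.launch_angle': [28.0, 12.0, 29.0, np.nan],
    'summary_acts.hit.speed.mph': [101.0, 105.0, 95.0, 99.0],
})

# one mask for the barrel condition; between() is inclusive on both ends
is_barrel = (
    toy['events.launch_angle'].between(26, 30)
    & (toy['summary_acts.hit.speed.mph'] >= 98)
)

barrel_toy = toy[is_barrel]
other_toy = toy[~is_barrel]  # complement: every row not flagged, NaN rows included

print(len(barrel_toy), len(other_toy))
```

Because the second group is defined as the complement of the first, the two always partition the frame.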

In [None]:
barrel_group = {
    'barrel': barrel_ids,
    'other': non_barrel_ids
}
with open(os.path.join(save_dir, 'by_barrel.json'), 'w') as file:
    json.dump(barrel_group, file, indent=4)

#### Group by pitch count (strikes)

In [18]:
hit_df[['summary_score.count.balls.plateAppearance', 
        'summary_score.count.strikes.plateAppearance']]

Unnamed: 0,summary_score.count.balls.plateAppearance,summary_score.count.strikes.plateAppearance
0,2,1
1,2,2
2,2,2
3,1,2
4,0,1
...,...,...
320,0,0
321,2,1
322,0,0
323,1,2


In [35]:
# group by strike counts at the time of plate appearance
strikes_col = 'summary_score.count.strikes.plateAppearance'
event_id_col = 'events.eventId'

strike_ct_df = hit_df.groupby(strikes_col)[event_id_col].apply(list).reset_index()
strike_ct_df

Unnamed: 0,summary_score.count.strikes.plateAppearance,events.eventId
0,0,"[1c2ac927-d7e3-475d-a121-ae837139f3f8, b5ae071..."
1,1,"[daf9742c-869c-4370-9d66-59e217be1c89, 54ca7cb..."
2,2,"[62848ee8-bff2-4410-9c61-eb672c283a60, 49f533e..."


In [48]:
# key the dict by strike count rather than by row position
strike_ct_dict = strike_ct_df.set_index(strikes_col)[event_id_col].to_dict()
with open(os.path.join(save_dir, 'by_strike_ct.json'), 'w') as file:
    json.dump(strike_ct_dict, file, indent=4)

## Analysis
MLB considers the ideal launch angle to be the “sweet spot”: a launch angle between 8 and 32 degrees.
https://www.mlb.com/glossary/statcast/launch-angle

We grouped our data into three ranges:
1. Low launch angle: < 8 degrees
2. Ideal launch angle: 8-32 degrees
3. High launch angle: > 32 degrees
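
The three ranges can be expressed as a small labeling function. This is a sketch only: the name `launch_angle_range` is hypothetical, and treating both 8 and 32 degrees as part of the ideal range is an assumption about how the boundaries are handled.

```python
import math


def launch_angle_range(angle_deg):
    """Label a launch angle as 'low', 'ideal', or 'high' (8 and 32 count as ideal)."""
    if angle_deg is None or math.isnan(angle_deg):
        return None  # no launch angle recorded
    if angle_deg < 8:
        return 'low'
    if angle_deg <= 32:
        return 'ideal'
    return 'high'


# hypothetical angles covering each range and both boundaries
for angle in (-9.14, 8.0, 28.16, 32.0, 65.8):
    print(angle, launch_angle_range(angle))
```

On the notebook's frame this could be applied as `hit_df['events.launch_angle'].apply(launch_angle_range)`.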

A pitch has three possible results: Strike, HitIntoPlay, or Ball. Strikes are undesirable for the batter because three strikes result in a strikeout. HitIntoPlay and Ball results are desirable: a ball hit into play gives the batter a chance to reach base, and four balls allow the batter to walk to first base.

Method:
1. Import the JSON files into Excel Power Query.
2. Parse the JSON, creating columns for pitch result and angle.
   - At-bats without hits have an empty “Events” field, so we filtered out the null rows and then expanded “Events” to get the angles.
3. Since the launch angle is the second element of the angles list, create a new column holding the row indices.
4. Filter out the rows with odd indices, leaving only the launch angle in the angles column.
5. Determine the percentage of strikes, hits into play, and balls for each launch angle range.
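
The last step was done in Excel, but the same percentages could be computed in pandas. The sketch below uses made-up rows, and the column names `angle_range` and `result` are illustrative rather than taken from the dataset.

```python
import pandas as pd

# hypothetical rows: one launch-angle range and one pitch result per row
toy = pd.DataFrame({
    'angle_range': ['low', 'low', 'ideal', 'ideal', 'ideal', 'high'],
    'result': ['Strike', 'HitIntoPlay', 'HitIntoPlay', 'HitIntoPlay', 'Ball', 'Strike'],
})

# normalize='index' makes each row (one launch-angle range) sum to 1,
# so multiplying by 100 gives the percentage of each result within that range
pct = pd.crosstab(toy['angle_range'], toy['result'], normalize='index') * 100
print(pct.round(1))
```

Normalizing by row rather than over the whole table keeps the ranges comparable even when they contain very different numbers of pitches.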

The results from the data are: 
<div><img src="./Picture1.png" alt="Figure 1. Pitch results of low-high launch angles" /></div>