# <p style="text-align: right;"> 2025 Data Viz Championship</p>

### <p style="text-align: right;"> &#9989; Jackson Reel</p>

### <p style="text-align: right;"> &#9989; 1/16/25</p>
---

Instructions for running the notebook:
1. Download the associated CSV file titled 'Dylan Carlson Data 1.csv'

The Results and Methodology both progress gradually throughout the notebook. There are no designated sections; I feel it is easier to follow in this manner.

# *Dylan Carlson Swing Analysis*

**Background and Motivation**

The follow-through is one of the first and most basic skills taught to young athletes. Quarterbacks in football are taught to finish throws with their arms out forward and their wrists pronated downward. The "flick of the wrist" is a phrase that often describes the follow-through of a throw. Imagine trying to throw a football but freezing the moment the ball is no longer contacting your hand. In tennis, the follow-through is extremely important to success. Pronation is again a key part of the effectiveness of a serve, and the highlights of Rafael Nadal hitting seemingly impossible shots on the court are due in part to his ability to generate massive spin through his exaggerated follow-through. Basketball is another great example of the importance of the follow-through. Coaches preach B.E.E.F. (balance, eyes, elbow, follow-through) when teaching the fundamentals of the sport, as players are taught to repeat the exact same motion on every shot. Being able to replicate the same form over and over again is a very important step in the development of a young player. The viral clips of Stephen Curry making 100 three-point shots in a row are only possible because he, and the other elite players of his kind, are able to produce identical motions on every single shot.


The follow-through in baseball and softball is just as important as in any other sport. In simple terms, the follow-through allows the athlete to impart the maximum amount of force on the object, which leads to greater velocity of the object. In bat-and-ball sports, the velocity of the object is paramount. Sustaining high velocity is key to both pitchers and hitters because the quicker the object moves, the harder it is for the opponent to handle. The pitching deliveries in baseball and softball are very different and very complex movements and will not be covered in much detail in this notebook. Instead, this notebook will focus on the hitter, particularly the baseball hitter. However, the keys to being a successful baseball hitter and a successful softball hitter are similar enough that this will still have relevance to the latter.


What separates baseball from other sports that rely on following through is the inherent lack of control the batter possesses. The duel between the batter and the pitcher is largely determined by the pitcher. At the highest level of the game, if the pitcher executes his plan, he will be successful a majority of the time. The batter, in a way, is at the mercy of the pitcher, hoping that he throws a mistake. To combat this, the batter must make sacrifices. The batting practice swing that gets him in a rhythm before the game is quickly disrupted by the fastball/slider combination during the game. The adjustments that hitters make to increase their chance of success are intricate and often ambiguous to the viewer. One such adjustment, though, has intrigued me: the decision to follow through with one hand or two. Physically, it would appear that to maximize success at the plate, hitters should try to repeat the exact same swing every time, and that throughout the league we would see an overarching trend of one method being preferred. That, however, is not the case. There are superstar hitters who follow through with one hand, there are superstar hitters who follow through with two hands, and there are superstar hitters who do both. Overall, the idea that the follow-through has any impact on success is not prominent. NC State University has written about this idea. They claim that "It does not matter as long as they keep both hands on the bat while contact is being made with the ball." Baseball is a sport of numbers, and if a relationship exists, it is being tracked. As of April 2024, there has not been any mainstream recording or documenting of the swing follow-through and its tendencies. In a sport where everything is tracked, the fact that this is not implies that the follow-through does not impact success. The baseball swing appears to be a rare instance of true preference in style.
This notebook aims to analyze one particular player, and how that seemingly inconsequential preference just might impact his performance. 

**Who is Dylan Carlson?**

(This was written before Carlson was traded to the Tampa Bay Rays in July 2024.)

Dylan Carlson is an outfielder for the St. Louis Cardinals. Drafted out of high school in the first round of the 2016 MLB draft, Carlson quickly became one of the key pieces of the organization. His expectations continued to rise as he played well at each level of the minor leagues, at one point reaching the status of 13th best prospect in all of baseball according to MLB.com. Before his debut season of 2020, Cardinals president of baseball operations John Mozeliak declared Carlson to be of "...the Albert Pujols or Oscar Taveras type." The former is universally recognized as one of the greatest hitters of all time, and the latter was a former number 1 overall prospect who tragically died after his debut season. In what were clearly unrealistic expectations, both the organization and fans had envisioned Carlson as the solution to seemingly every problem they had. While the Cardinals and their fans are certainly still hoping for Carlson to become what many projected him to be, both groups, now 4 years into his big league career, are coming to grips with the idea that a 'New Carlson' will have to replace the old, broken one.

While the magic key to unlocking Carlson's full potential is elusive and unobtainable, analyzing the source of his struggles is a good place to start. One of the great, or perhaps not so great, aspects of baseball is the abundance of information. Players and fans alike have a bevy of knowledge at their fingertips through the likes of FanGraphs, Baseball Savant and many other online databases that have an answer to just about any question. So, in theory, Carlson and his coaches should easily know the areas in which he needs to improve, and I'm sure they do. Any attempt by an inexperienced third party to mend his deficiencies is not constructive, as the internal efforts are surely more precise. However, approaching these problems from a new perspective is almost always a healthy exercise, and one that this notebook hopes to accomplish. Having followed Carlson's career for several years, I have noticed certain trends that may be affecting his performance. The primary trend, and the one on which we will hopefully reach a definitive answer, is whether his swing follow-through has an impact on his success. From watching Dylan Carlson, I have formed a narrative that when he bats left handed, he experiences more success when his follow-through has two hands on the bat instead of one. This may be a great example of confirmation bias, or it may be a step toward unlocking that potential.

**Methodology**

To answer the question of whether or not his follow-through impacts his performance, we must first try to interpret his decision making. Ideally, a baseball hitter wants to replicate the same swing on every pitch, though, as mentioned, the hitter does not have control of the situation. The 'ideal swing' may be heavily compromised as a result of several different factors. On a random pitch or random swing, it is impossible to know whether he chooses to swing a certain way because he feels comfortable doing so, or because he feels forced to do so. Understanding these extenuating conditions, and when Dylan Carlson employs each type of swing, will help us know if there is a correlation between his follow-through and his success.

**Understanding his Swing**

The best approach to this would be to gather data on every swing Carlson has taken in the major leagues dating back to his debut. However, since the follow-through of the swing is not seen as a contributing factor, no data has been collected on the subject for Carlson, or anyone in MLB. Therefore, all data used and seen in this notebook has been collected first hand. This is a time consuming process leading to the sample size being limited to just the 2023 season. This is not enough games to make a definitive conclusion one way or the other, but enough to see the patterns and trends that may be present. 

In [None]:
%pip install seaborn
%pip install --upgrade matplotlib

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



In [None]:
#Initialize data
dt = pd.read_csv('Dylan Carlson Data 1.csv')
dt1 = dt.drop(columns=['Index','Unnamed: 13','Unnamed: 14','Unnamed: 15', 'Unnamed: 16', 'Unnamed: 17'])
dt1.rename(columns={'Plate Appearance (season)': 'Plate Appearance'}, inplace=True)

In [None]:
#Subsets of data

#Swing Type
one_hand = dt1[dt1['Hands on Swing']=='1']
two_hand = dt1[dt1['Hands on Swing']=='2']
bunt = dt1[dt1['Hands on Swing']=='Bunt']
check = dt1[dt1['Hands on Swing']=='Check']
wrong_hand = dt1[dt1['Hands on Swing']=='-1']

#Handedness
left = dt1[dt1['Handedness']=='L']
left_full = left[(left['Hands on Swing'] == '1') | (left['Hands on Swing'] == '2')]
right = dt1[dt1['Handedness']=='R']
right_full = right[(right['Hands on Swing'] == '1') | (right['Hands on Swing'] == '2')]

#Handedness and Swing type
right_one = right[right['Hands on Swing']=='1']
right_two = right[right['Hands on Swing']=='2']
left_one = left[left['Hands on Swing']=='1']
left_two = left[left['Hands on Swing']=='2']

#Remove certain outcomes
no_foul = dt1[dt1['Outcome'] != 'Foul']
no_miss = no_foul[no_foul['Outcome'] != 'Miss']
in_play = no_miss[no_miss['Outcome'] != 'K']
in_play_left = in_play[in_play['Handedness']=='L']
in_play_right = in_play[in_play['Handedness']=='R']
in_play_left = in_play_left[in_play_left['Hands on Swing']!='Bunt']
in_play_right = in_play_right[in_play_right['Hands on Swing']!='Bunt']

in_play_left1 = in_play_left[in_play_left['Hands on Swing']=='1']
in_play_left2 = in_play_left[in_play_left['Hands on Swing']=='2']
in_play_right1 = in_play_right[in_play_right['Hands on Swing']=='1']
in_play_right2 = in_play_right[in_play_right['Hands on Swing']=='2']

leftpa = left['Plate Appearance'].unique()
rightpa = right['Plate Appearance'].unique()
left1pa = left_one['Plate Appearance'].unique()
left2pa = left_two['Plate Appearance'].unique()
right1pa = right_one['Plate Appearance'].unique()
right2pa = right_two['Plate Appearance'].unique()

# Note: each ratio is unique plate appearances divided by full swings (PAs per swing)
sppal = len(leftpa) / len(left_full)
sppar = len(rightpa) / len(right_full)

# 2 strikes
two_strike = dt1[dt1['2 strikes']=='y']
left12k = left_one[left_one['2 strikes']=='y']
left22k = left_two[left_two['2 strikes']=='y']
right12k = right_one[right_one['2 strikes']=='y']
right22k = right_two[right_two['2 strikes']=='y']


In [None]:

#This plot was inspired from 
#https://python-graph-gallery.com/163-donut-plot-with-subgroups/

#AI was used to aid in coding

# Establish data
group_names=['Left Handed', 'Right Handed']
group_size=[len(left_full), len(right_full)]
subgroup_names=['L-1', 'L-2', 'R-1', 'R-2']
subgroup_size=[len(left_one), len(left_two), len(right_one), len(right_two)]
 
# Create colors
a, b=[plt.cm.Blues, plt.cm.Reds]
 
# First Ring (outside)
fig, ax = plt.subplots()
ax.axis('equal')
mypie, _ = ax.pie(group_size, radius=1.3, labels=group_names, colors=[a(0.6), b(0.6)] )
plt.setp( mypie, width=0.3, edgecolor='white')
 
# Second Ring (Inside)
mypie2, _ = ax.pie(subgroup_size, radius=1.3-0.3, labels=subgroup_names, labeldistance=0.7, colors=[a(0.4), a(0.4), b(0.4), b(0.4)])
plt.setp( mypie2, width=0.4, edgecolor='white')
plt.margins(0,0)
 
# show it
plt.show()



|              | Left Handed | Right Handed|
| :----------- | :---------: | ----------: |
| **One Hand**     |    148     |       25 |
| **Two Hands**    |    132     |       152 |





*Note: This graph includes all swings that Dylan Carlson took in 2023 that were not check swings or bunts.*

Because roughly 25% of pitchers in the league are left handed, it is expected that Carlson has more swings left handed (against right handed pitchers). However, Carlson took roughly 40% of his swings against left handed pitchers, and his ratio of plate appearances to full swings was higher batting left handed (56%) than right handed (46.3%). This indicates that Carlson played in a higher percentage of games batting right handed than left. His career marks for wRC+ (a statistic where 100 is league average) batting left and right handed are 86 and 135 respectively, explaining the rates of handedness.
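
As a rough check on the 40% figure, the split can be recomputed from the swing counts in the table above. This is only an illustrative sketch: the dictionary and variable names are made up for this example, and the counts are copied by hand from the table rather than read from the CSV.

```python
# Full-swing counts copied from the table above: (batting side, hands on follow-through)
swing_counts = {
    ("L", 1): 148,
    ("L", 2): 132,
    ("R", 1): 25,
    ("R", 2): 152,
}

total_swings = sum(swing_counts.values())
right_swings = sum(n for (side, _), n in swing_counts.items() if side == "R")
left_swings = total_swings - right_swings

# Batting right handed means facing a left handed pitcher
share_vs_lhp = right_swings / total_swings
left_one_hand_share = swing_counts[("L", 1)] / left_swings

print(f"Swings vs. left handed pitchers: {share_vs_lhp:.1%}")
print(f"One-handed share when batting left: {left_one_hand_share:.1%}")
```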


In an interview with MLB in 2016, Carlson explained that from the age of 5 he was a 'natural right-handed hitter' who slowly learned to become a switch hitter. That, along with a 2014 scouting report by Perfect Game, which described Dylan Carlson as 'much more advanced right handed at present, power approach with strength in his swing from the right side, leans back and extends through contact, loose swing with bat speed. Left handed swing has a late pull back load and more length, hits from a crouch left handed and gets very uphill at times.', confirms that he has always been more comfortable swinging right handed.

This is important to consider when comparing his one-handed swing rates in 2023. The vast majority of swings from the right side were with two hands. The comfort of a two handed swing may contribute to those rates, though the massive shift in success when hitting left vs. right handed indicates that a choice, although second nature to him, could be contributing to his success. When hitting left handed, we see less success and close to a 50/50 split between one and two handed swings in 2023.


The next step is to see what the outcomes of those swings were. This will allow us to get a sense of his success, and any differences that occurred in 2023 when he varied his swing.

In [None]:
outcome_count = dt1['Outcome'].value_counts()
l1count = left_one['Outcome'].value_counts()
r1count = right_one['Outcome'].value_counts()
l2count = left_two['Outcome'].value_counts()
r2count = right_two['Outcome'].value_counts()


| Outcome    | Overall         | Right Handed (1H) | Left Handed (1H)| Right Handed (2H) | Left Handed (2H) 
| :----------|:------: | :------: | :----: | :----: | :---:
| Foul       |       192  |             13   |            48 |              68 |                63   |
| Miss (including K)  |97|             6   |            41 |              31 |                 19  |
| Out (on contact)|  122  |            5   |            46 |          35 |                36   |
| Hit |                48 |            1   |            13 |               18 |              16     |
|Total|459 |25| 148| 152 | 132



| Outcome    | Overall         | Right Handed (1H) | Left Handed (1H)| Right Handed (2H) | Left Handed (2H) 
| :----------|:------: | :------: | :----: | :----: | :---:
| Foul Rate       |       42%  |                |            33% |        45%       |                47%   |
| Miss Rate  |21%|                |            28% |              20% |                 14%  |
| In play Rate|  37%  |               |            39% |          35% |                39%   |



*The tables above only account for the full swings made where the follow-through could have had an impact on the result.*
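
The rates in the second table follow from the counts in the first. The sketch below shows one way to derive them; the helper name `outcome_rates` is invented for this example and the counts are transcribed by hand from the table. Each rate is divided by the sum of the four outcome rows in its column, so a result may differ by a point from the rounded figures shown above.

```python
# Outcome counts copied by hand from the first table above
counts = {
    "Right (1H)": {"Foul": 13, "Miss": 6, "Out": 5, "Hit": 1},
    "Left (1H)": {"Foul": 48, "Miss": 41, "Out": 46, "Hit": 13},
    "Right (2H)": {"Foul": 68, "Miss": 31, "Out": 35, "Hit": 18},
    "Left (2H)": {"Foul": 63, "Miss": 19, "Out": 36, "Hit": 16},
}

def outcome_rates(group):
    """Return foul, miss and in-play rates for one swing group."""
    total = sum(group.values())
    return {
        "Foul Rate": group["Foul"] / total,
        "Miss Rate": group["Miss"] / total,
        "In play Rate": (group["Out"] + group["Hit"]) / total,
    }

rates = {name: outcome_rates(group) for name, group in counts.items()}
for name, group_rates in rates.items():
    print(name, {label: f"{value:.0%}" for label, value in group_rates.items()})
```

The three rates for each group sum to 1 by construction, which is a quick sanity check when transcribing counts by hand.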

Comparing the right handed data, especially the rates, is not a productive exercise, as the sample size of him swinging with one hand is far too low. When Carlson batted left handed, he experienced a higher miss rate when following through with one hand compared to two, as well as a lower overall contact rate (foul and in-play combined). In fact, based on these tables, one could argue that his left handed two hand follow-through swing was more productive than his right handed swings, which we know to be above average, as he swings and misses less and makes contact more.

Anyone with baseball experience, however, would say that the data presented in these tables does not paint the full picture at all. While limiting misses overall is a good idea, often we see an increase in power when miss rate is higher. This can be interpreted as the batter swinging harder to get better results when he does make contact, in exchange for more swings where he doesn't. Similarly, putting the ball in play is a good idea. However, not every ball in play has the same level of productiveness, if any at all.

Overall, there is a difference between his one handed and two handed follow-through. Quantifying that difference will require data that is independent of result and just a reflection of the swing. 


Exit Velocity and Launch Angle are two of the better raw metrics to use when evaluating talent, especially at a high level. As their names suggest, they are how hard the ball is hit off the bat and the angle at which the ball is hit. These are useful because they ignore the inherent randomness of baseball. The Consortium for the Advancement of Undergraduate Statistics Education (CAUSE) goes into depth about the relationship between exit velocity, launch angle and the success they bring. Overall, exit velocity sees a simple positive relationship: as exit velocity increases, so does the probability of a hit. Launch Angle generally sees a negative relationship, with hit probability going down as the angle increases. While balls on the ground (LA < 10) do result in hits more often than fly balls (LA > 30), those hits are overwhelmingly singles, while balls not on the ground have more of a chance to be extra base hits or home runs. The ideal zone for launch angle would be between 10 and 30 degrees, representing a line drive. This, coupled with high exit velocity, sees the most sustained success at the plate.
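
The launch angle cutoffs described above can be written down as a small helper. This is a minimal sketch, not part of the original analysis: the function names are hypothetical, the sample batted balls are invented, and the 95 mph threshold for hard contact is an assumption borrowed from the filtering cell later in this notebook.

```python
def classify_launch_angle(la):
    """Bucket a launch angle (degrees) using the cutoffs described above."""
    if la < 10:
        return "ground ball"
    if la <= 30:
        return "line drive"
    return "fly ball"

def is_ideal_contact(ev, la, min_ev=95):
    """True for hard contact (assumed >= 95 mph) in the 10 to 30 degree window."""
    return ev >= min_ev and 10 <= la <= 30

# Hypothetical batted balls as (exit velocity in mph, launch angle in degrees)
batted_balls = [(101.2, 18), (88.5, -4), (96.0, 42), (79.3, 25)]
for ev, la in batted_balls:
    print(ev, la, classify_launch_angle(la), is_ideal_contact(ev, la))
```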


In [None]:
# AI was used to aid in coding

# Set the Seaborn theme and palette
sns.set_theme(style="ticks", palette="pastel")

# Create boxplots
sns.boxplot(x="Handedness", y="EV", hue="Hands on Swing", palette=["r", "b"], data=in_play_left)
sns.boxplot(x="Handedness", y="EV", hue="Hands on Swing", palette=["b", "r"], data=in_play_right)

# Remove spines
sns.despine(offset=10, trim=False)

# Create a custom legend
legend_labels = ['One Hand', 'Two Hands']  # Custom legend labels
legend_colors = ['r', 'b']  # Custom legend colors
legend_handles = [plt.Line2D([0], [0], color=color, lw=4) for color in legend_colors]
plt.legend(legend_handles, legend_labels, title='Category', loc='upper right')

plt.show()


In [None]:

#AI was used to aid in coding

#Plot 1 
sns.displot(in_play_left, x="LA", hue="Hands on Swing", kind="kde", fill=True)

plt.axvspan(xmin=10, xmax = 30, color = 'r', alpha = 0.5, label = 'Ideal LA')
plt.xlabel('Launch Angle')
plt.legend();
plt.title('Left Handed Launch Angle')

#plot 2
sns.displot(in_play_right, x="LA", hue="Hands on Swing", kind="kde", fill=True)

plt.axvspan(xmin=10, xmax = 30, color = 'r', alpha = 0.5, label = 'Ideal LA')
plt.xlabel('Launch Angle')
plt.legend();
plt.title('Right Handed Launch Angle')

*The charts above use only swings in which a ball was put in play.*

When batting right handed, Carlson's follow-through with two hands is significantly better than with one hand, and occurs much more frequently. He does a great job in that respect of maximizing the potential of every swing. The same cannot be said about his lefty swings.


While a high miss rate can be excused if hard contact is made frequently, the box plot above shows that Carlson had a much higher exit velocity with a two handed follow-through. When batting left handed, over 50% of his swings with two hands resulted in a higher exit velocity than his mean exit velocity with one hand. His max exit velocities are similar, which indicates he is capable of hitting the ball hard with one hand, though replicating that is clearly more challenging. When looking at launch angle, there is a similar narrative. Carlson's two handed swings result in considerably more balls hit with favorable launch angles, allowing him a better chance at line drives and extra base hits. With one hand, he hits the majority of his balls with a launch angle of less than 10 degrees, which are ground balls. Even with identical exit velocity numbers, his success would still be limited with one hand because he has trouble elevating the ball.
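
The comparison of two handed exit velocities against the one handed mean can be expressed as a single number. The sketch below shows the idea on invented values; the function name `share_above_mean` and both lists are hypothetical and are not Carlson's actual data.

```python
from statistics import mean

def share_above_mean(values, reference):
    """Fraction of `values` that exceed the mean of `reference`."""
    ref_mean = mean(reference)
    return sum(v > ref_mean for v in values) / len(values)

# Hypothetical exit velocities (mph); NOT taken from the data set
one_hand_ev = [72.0, 81.5, 84.0, 90.5, 99.0]
two_hand_ev = [80.0, 88.5, 93.0, 97.5, 104.0]

print(share_above_mean(two_hand_ev, one_hand_ev))
```

With the notebook's data, the same function could be applied to the `EV` columns of the left handed in-play subsets after dropping missing values.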



In [None]:
dt1_good = dt1[(dt1['EV'] >= 95) & (dt1['LA'] >= 10) & (dt1['LA'] <= 30)]
left1_good = left_one[(left_one['EV'] >= 95) & (left_one['LA'] >= 10) & (left_one['LA'] <= 30)]
left2_good = left_two[(left_two['EV'] >= 95) & (left_two['LA'] >= 10) & (left_two['LA'] <= 30)]
right1_good = right_one[(right_one['EV'] >= 95) & (right_one['LA'] >= 10) & (right_one['LA'] <= 30)]
right2_good = right_two[(right_two['EV'] >= 95) & (right_two['LA'] >= 10) & (right_two['LA'] <= 30)]

dt1_ev95 = dt1[(dt1['EV'] >= 95)]
left1_ev95 = left_one[(left_one['EV'] >= 95)]
left2_ev95 = left_two[(left_two['EV'] >= 95)]
right1_ev95 = right_one[(right_one['EV'] >= 95)]
right2_ev95 = right_two[(right_two['EV'] >= 95)]

dt1_la10 = dt1[(dt1['LA'] < 10)]
left1_la10 = left_one[(left_one['LA'] < 10)]
left2_la10 = left_two[(left_two['LA'] < 10)]
right1_la10 = right_one[(right_one['LA'] < 10)]
right2_la10 = right_two[(right_two['LA'] < 10)]

dt1_lagood = dt1[(dt1['LA'] >= 10) & (dt1['LA'] <= 30)]
left1_lagood = left_one[(left_one['LA'] >= 10) & (left_one['LA'] <= 30)]
left2_lagood = left_two[(left_two['LA'] >= 10) & (left_two['LA'] <= 30)]
right1_lagood = right_one[(right_one['LA'] >= 10) & (right_one['LA'] <= 30)]
right2_lagood = right_two[(right_two['LA'] >= 10) & (right_two['LA'] <= 30)]

dt1_la30 = dt1[(dt1['LA'] > 30)]
left1_la30 = left_one[(left_one['LA'] > 30)]
left2_la30 = left_two[(left_two['LA'] > 30)]
right1_la30 = right_one[(right_one['LA'] > 30)]
right2_la30 = right_two[(right_two['LA'] > 30)]



| Item                   | Overall  |Left (1H)|Left (2H)|Right (1H)|Right (2H) |
| :----------------      | :------: | :----: | :---:     | :-----:  | :-----: |
| EV ≥ 95                |     72   | 12    |      32   |     0    | 28 |
| LA < 10                |   122    | 53    |     24    |    10    |  33 |
| 10 ≤ LA ≤ 30           |  87      | 21    |      31   |     2    |  31 |
| LA > 30                |141       | 27    |     51    |     6    |  55 |
| EV ≥ 95 & 10 ≤ LA ≤ 30 |     32   |  4    |     15    |     0    |  13 |

| Item                   | Overall  |Left (1H)|Left (2H)|Right (1H)|Right (2H) 
| :----------------      | :------: | :----: | :---:     | :-----:  | :-----:
| Average Exit Velocity  |     89.3        |     84.9   |  93  |     74.9    |     92.3    | 


Overall in 2023, Carlson had an average exit velocity of 89 mph, which is right at league average. That figure is really just an average of his exit velocities by follow-through, though. If we were to isolate each swing type as if it belonged to a different player, the results are much different. An average exit velocity of 84.9 mph would put him around the 5th percentile, comparable to backup catcher Austin Barnes. An average exit velocity of 93 mph would put him in the 95th percentile, ahead of Mookie Betts and Mike Trout. This is not claiming that one simple change turns him into a multi-time MVP, but only showcasing the noticeable decline in miss rate, exit velocity and launch angle results when he swings with one hand.
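
To illustrate how the overall figure is a blend of the four groups, the sketch below weights each group's average exit velocity by its number of balls in play. It rests on a stated assumption: the number of batted balls with a recorded exit velocity in each group is taken to be the outs on contact plus hits from the outcome table earlier in the notebook. The variable names are made up for this example.

```python
# Average exit velocity (mph) per group, copied from the table above
avg_ev = {"Left (1H)": 84.9, "Left (2H)": 93.0, "Right (1H)": 74.9, "Right (2H)": 92.3}

# ASSUMPTION: balls in play per group = outs on contact + hits from the outcome table
balls_in_play = {
    "Left (1H)": 46 + 13,
    "Left (2H)": 36 + 16,
    "Right (1H)": 5 + 1,
    "Right (2H)": 35 + 18,
}

total = sum(balls_in_play.values())
weighted_ev = sum(avg_ev[group] * balls_in_play[group] for group in avg_ev) / total
print(f"Weighted average exit velocity: {weighted_ev:.1f} mph")
```

Under that assumption the weighted figure comes out at about 89.3 mph, in line with the overall average in the table.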

Now that we know he experiences more success with two hands, the question becomes: why is he swinging with one hand 50% of the time when he's hitting left handed?

To answer that, we need to see what influences him to swing with either one hand or two. If everything were in Carlson's control, he should be swinging with a two handed follow-through exclusively. However, that is rarely the case. The pitcher is often the one dictating how at-bats progress, and if Carlson finds himself at the mercy of the pitcher, he may feel he has a better shot at being productive by abandoning his "A swing" and doing whatever is necessary to stay alive. In theory, that should only happen when he is in a two strike count. Once the pitcher has two strikes, overall success for batters decreases greatly. This would be a logical scenario in which Carlson employs his one-handed swing if he feels he can make contact and avoid the strikeout. In any other count, the batter has strikes to play with and should aim to do as much damage as possible.

Before continuing, it is important to recognize that the health of the batter plays a part in what they feel comfortable doing at the plate. A history of injury can severely limit the capabilities of the batter, even if they are cleared to play. In Carlson's case, while he did suffer an ankle injury in mid May, he did not experience any physical limitations that can be claimed to impact his follow-through. 


**Why does Dylan Carlson swing with one hand?**

In [None]:
#AI was used to aid in coding

swing_datasets = [dt1, left_one, left_two, right_one, right_two]
dataset_names = ['Overall', 'Left (1H)', 'Left (2H)', 'Right (1H)', 'Right (2H)']

# Initialize lists to store percentages
with_two_strikes_percentages = []
without_two_strikes_percentages = []

# Iterate over each dataset
for df in swing_datasets:
    # Calculate the percentage of swings with two strikes
    swings_with_two_strikes = (df['2 strikes'] == 'y').sum()
    total_swings = len(df)
    with_two_strikes_percentage = (swings_with_two_strikes / total_swings) * 100
    with_two_strikes_percentages.append(with_two_strikes_percentage)

    # Calculate the percentage of swings without two strikes
    without_two_strikes_percentage = 100 - with_two_strikes_percentage
    without_two_strikes_percentages.append(without_two_strikes_percentage)


# Plot the percentages
plt.figure(figsize=(7, 5))
plt.bar(range(len(swing_datasets)), with_two_strikes_percentages, label='With Two Strikes')
plt.bar(range(len(swing_datasets)), without_two_strikes_percentages, bottom=with_two_strikes_percentages, label='Without Two Strikes')
plt.xlabel('Swing Type')
plt.ylabel('Percentage')
plt.title('Percentage of Swings with and without Two Strikes')
plt.xticks(range(len(swing_datasets)), ['Overall', 'Left (1H)', 'Left (2H)', 'Right (1H)', 'Right (2H)'])
plt.legend()
plt.tight_layout()
plt.show()

#Pie Chart 
group_names=['Left Handed', 'Right Handed']
group_size=[len(left12k)+len(left22k), len(right12k)+len(right22k)]
subgroup_names=['L-1', 'L-2', 'R-1', 'R-2']
subgroup_size=[len(left12k), len(left22k), len(right12k), len(right22k)]
 
# Create colors
a, b=[plt.cm.Blues, plt.cm.Reds]
 
# First Ring (outside)
fig, ax = plt.subplots()
ax.axis('equal')
mypie, _ = ax.pie(group_size, radius=1.3, labels=group_names, colors=[a(0.6), b(0.6)] )
plt.setp( mypie, width=0.3, edgecolor='white')
 
# Second Ring (Inside)
mypie2, _ = ax.pie(subgroup_size, radius=1.3-0.3, labels=subgroup_names, labeldistance=0.7, colors=[a(0.4), a(0.4), b(0.4), b(0.4)])
plt.setp( mypie2, width=0.4, edgecolor='white')
plt.margins(0,0)
 
# show it
plt.show()


In [None]:
# Mean exit velocity on balls in play in two strike counts, overall and per swing group
filtered_data = in_play[in_play['2 strikes']=='y']
k2_ev = np.mean(filtered_data['EV'])
filtered_data = in_play_left1[in_play_left1['2 strikes']=='y']
k2_evl1 = np.mean(filtered_data['EV'])
filtered_data = in_play_right1[in_play_right1['2 strikes']=='y']
k2_evr1 = np.mean(filtered_data['EV'])
filtered_data = in_play_left2[in_play_left2['2 strikes']=='y']
k2_evl2 = np.mean(filtered_data['EV'])
filtered_data = in_play_right2[in_play_right2['2 strikes']=='y']
k2_evr2 = np.mean(filtered_data['EV'])

k2 = [k2_ev, k2_evl1, k2_evl2, k2_evr1, k2_evr2]


Of Carlson's one handed swings, roughly 40% came with two strikes, and he was slightly more likely to use a one handed swing in those counts. A possible explanation for this trend is that he feels he has a better chance of making contact with the ball (which we now know to be false). The exit velocities in two strike counts are below.
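
The sentence above mixes two different conditional rates: the share of one handed swings that came with two strikes, and the share of two strike swings that were one handed. The sketch below separates them. The function name and all four counts are hypothetical and are not taken from the data set.

```python
def conditional_rates(one_hand_2k, one_hand_other, two_hand_2k, two_hand_other):
    """Return three rates:
    P(two strikes | one hand), P(one hand | two strikes), P(one hand | other counts).
    """
    p_2k_given_1h = one_hand_2k / (one_hand_2k + one_hand_other)
    p_1h_given_2k = one_hand_2k / (one_hand_2k + two_hand_2k)
    p_1h_given_other = one_hand_other / (one_hand_other + two_hand_other)
    return p_2k_given_1h, p_1h_given_2k, p_1h_given_other

# Hypothetical swing counts; NOT taken from the data set
p_2k_given_1h, p_1h_given_2k, p_1h_given_other = conditional_rates(
    one_hand_2k=40, one_hand_other=60, two_hand_2k=35, two_hand_other=65
)
print(f"Share of one handed swings that came with two strikes: {p_2k_given_1h:.0%}")
print(f"One handed share with two strikes: {p_1h_given_2k:.0%}")
print(f"One handed share in other counts: {p_1h_given_other:.0%}")
```

Comparing the last two rates is what supports a statement like "more likely to use a one handed swing with two strikes".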


| Item                   | Overall  |Left (1H)|Left (2H)|Right (1H)|Right (2H) 
| :----------------      | :------: | :----: | :---:     | :-----:  | :-----:
| Average Exit Velocity  |     88.3        |     85.7   |  90.7  |     75.4    |     90.3    | 


The left handed exit velocities still have a rather large difference, and the one handed exit velocity with 2 strikes is very close to his exit velocity with one hand overall. 2 strike counts have some correlation to swing usage, though not to the extent of explaining the lack of performance with one hand or his decision to use his one handed swing as much as he does.

Outside of 2 strike counts, it is possible there are other factors that influence his swing rates, the biggest being pitch location and pitch velocity. Even in non two strike counts, these factors could influence his decision making.

In [None]:
#AI was used to aid in coding


dataset_names = ['Overall', 'Left (1H)', 'Left (2H)', 'Right (1H)', 'Right (2H)']
# Initialize lists to store percentages
outside_strike_zone_percentages = []

# Iterate over each dataset
for df in swing_datasets:
    # Filter swings outside the strike zone
    outside_strike_zone = df[df['Pitch Location'].isin([10, 11, 12, 13, 14])]
    
    # Calculate the percentage of swings outside the strike zone
    total_swings = len(df)
    swings_outside_strike_zone = len(outside_strike_zone)
    outside_strike_zone_percentage = (swings_outside_strike_zone / total_swings) * 100
    outside_strike_zone_percentages.append(outside_strike_zone_percentage)

# Plot the percentages
plt.figure(figsize=(7, 5))
plt.bar(dataset_names, outside_strike_zone_percentages)
plt.xlabel('Swing Type')
plt.ylabel('Percentage')
plt.title('Percentage of Swings Outside the Strike Zone')
plt.tight_layout()
plt.show()


In [None]:
# AI was used to aid in coding


dataset_names = ['Overall', 'Left (1H)', 'Left (2H)', 'Right (1H)', 'Right (2H)']
# Initialize lists to store percentages
outside_strike_zone_with_two_strikes_percentages = []

# Iterate over each dataset
for df in swing_datasets:
    # Filter swings with two strikes and outside the strike zone
    two_strikes_outside_strike_zone = df[(df['2 strikes'] == 'y') & df['Pitch Location'].isin([10, 11, 12, 13, 14])]
    
    # Calculate the percentage of swings with two strikes outside the strike zone
    total_swings_with_two_strikes = len(df[df['2 strikes'] == 'y'])
    swings_with_two_strikes_outside_strike_zone = len(two_strikes_outside_strike_zone)
    outside_strike_zone_with_two_strikes_percentage = (swings_with_two_strikes_outside_strike_zone / total_swings_with_two_strikes) * 100
    outside_strike_zone_with_two_strikes_percentages.append(outside_strike_zone_with_two_strikes_percentage)

# Plot the percentages
plt.figure(figsize=(7, 5))
plt.bar(dataset_names, outside_strike_zone_with_two_strikes_percentages)
plt.xlabel('Swing Type')
plt.ylabel('Percentage')
plt.title('Percentage of Swings Outside the Strike Zone with Two Strikes')
plt.tight_layout()
plt.show()


Allowing the pitcher to have more strikes than he deserves is one of the biggest things for a hitter to avoid in baseball. The advantage the pitcher has over the batter is already large from the first pitch of the at bat, so swinging at pitches that would not be called strikes heavily reduces the batter's chance of success. Baseball Savant has league wide metrics on each region where a ball can be thrown, and, in general, the further the pitch is from the middle of the strike zone, the less likely batters are to get hits.

The biggest difference we have seen so far is Carlson's chase rate at pitches outside of the strike zone. He is much more prone to swing at pitches outside the strike zone with one hand. Nearly 50% of his swings with one hand are against pitches outside the strike zone. Improving that number alone will benefit Carlson greatly. We can analyze his different swings and how he uses them, but his success will come a lot sooner when he is able to stop swinging at so many pitches out of the zone. Over 40% of those swings did come with two strikes, where it is easier to forgive poor swings because of the position the batter is in, but chase rate is still an area for improvement.

It is hard to say that because he swings with one hand, he chases more. However, we can certainly see that he does chase more with one hand. The massive different in chase rate between one hand and two is surprising, especially since near 60% of those swings happened in a count where he, the batter, still had some control. Knowing everything we know so far, it is safe to say that if Carlson feels the need to swing with one hand, he just shouldn't swing at all. 
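The chase-rate comparison above comes down to one ratio per swing type: swings at zones 10 through 14 divided by all swings. Below is a minimal sketch of that calculation, assuming the same `Pitch Location` and `2 strikes` columns used in this notebook. The `chase_summary` helper and the two small sample frames are made up for illustration and are not Carlson's data.

```python
import pandas as pd

OUT_OF_ZONE = [10, 11, 12, 13, 14]  # zone numbers that fall outside the strike zone

def chase_summary(swings):
    """Return the chase rate and the share of chases that came with two strikes."""
    if swings.empty:
        return {'chase_pct': float('nan'), 'two_strike_share_pct': float('nan')}
    chased = swings[swings['Pitch Location'].isin(OUT_OF_ZONE)]
    chase_pct = 100 * len(chased) / len(swings)
    # Share of the chased pitches that came in a two-strike count
    two_strike_share = 100 * (chased['2 strikes'] == 'y').mean() if len(chased) else float('nan')
    return {'chase_pct': chase_pct, 'two_strike_share_pct': two_strike_share}

# Made-up swings, for illustration only
one_hand = pd.DataFrame({'Pitch Location': [10, 11, 5, 2], '2 strikes': ['y', 'n', 'n', 'y']})
two_hand = pd.DataFrame({'Pitch Location': [5, 5, 2, 13], '2 strikes': ['n', 'n', 'y', 'y']})

summary = pd.DataFrame({'One hand': chase_summary(one_hand),
                        'Two hands': chase_summary(two_hand)}).T
print(summary)
```

Reporting both numbers side by side keeps the two questions separate: how often he chases, and how many of those chases came in a defensive count.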


The following heat maps allow for a more comprehensive look at what he is swinging at. The orientation of every map is from the view of the pitcher.

In [None]:
# AI was used to aid in coding

# Plot coordinates (x, y) assigned to each numbered pitch-location zone.
# Zones 1-9 are inside the strike zone; zones 10-14 are outside it.
ZONE_COORDINATES = {
    1: [-0.66, 1.5], 2: [0, 1.5], 3: [0.66, 1.5],
    4: [-0.66, 2.5], 5: [0, 2.5], 6: [0.66, 2.5],
    7: [-0.66, 3.5], 8: [0, 3.5], 9: [0.66, 3.5],
    10: [-1.25, 0.75], 11: [1.25, 0.75],
    12: [-1.25, 4.25], 13: [1.25, 4.25],
    14: [0, 4.75],
}

def new_pitch_loc(data):
    # Convert each zone number to its coordinates, skipping unrecognized values
    return [ZONE_COORDINATES[num] for num in data['Pitch Location'] if num in ZONE_COORDINATES]

# Define datasets and their names
datasets = [dt1, left_one, left_two, right_one, right_two, two_strike,
                       left12k, left22k, right12k, right22k]
dataset_names = ['Overall', 'Left (1H)', 'Left (2H)', 'Right (1H)', 'Right (2H)',
               'Overall (2K)','Left (1H 2K)', 'Left (2H 2K)', 'Right (1H 2K)', 'Right (2H 2K)']

# Create subplots
num_datasets = len(datasets)
num_cols = 3
num_rows = (num_datasets + num_cols - 1) // num_cols
fig, axes = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(15, 5 * num_rows))

# Iterate over datasets and plot strike zone heatmap for each
for i, (dataset, name) in enumerate(zip(datasets, dataset_names)):
    row = i // num_cols
    col = i % num_cols
    
    # Calculate pitch coordinates for the current dataset
    coordinates = new_pitch_loc(dataset)
    x_coordinates = np.array([coord[0] for coord in coordinates])
    y_coordinates = np.array([coord[1] for coord in coordinates])
    
    # Create a 2D histogram (heatmap)
    heatmap = axes[row, col].hist2d(x_coordinates, y_coordinates, bins=8, cmap='YlGnBu')
    
    # Plot strike zone boundaries
    axes[row, col].plot([-1, 1], [1, 1], color='red', linewidth=2)  # Top of strike zone
    axes[row, col].plot([-1, 1], [4, 4], color='red', linewidth=2)  # Bottom of strike zone
    axes[row, col].plot([-1, -1], [1, 4], color='red', linewidth=2) # Left side of strike zone
    axes[row, col].plot([1, 1], [1, 4], color='red', linewidth=2)   # Right side of strike zone
    axes[row, col].plot([-1, 1], [2, 2], color='red', linewidth=2)  # top 1/3
    axes[row, col].plot([-1, 1], [3, 3], color='red', linewidth=2)  # Bottom 1/3
    axes[row, col].plot([-.33, -.33], [1, 4], color='red', linewidth=2) # left 1/3
    axes[row, col].plot([.33, .33], [1, 4], color='red', linewidth=2) # right 1/3

    axes[row, col].set_xticklabels([])
    axes[row, col].set_yticklabels([])
    
    # Add colorbar with label
    plt.colorbar(heatmap[3], ax=axes[row, col], label='Pitch Count')
    
    # Set title
    axes[row, col].set_title(name)
    
    # Invert y-axis to match baseball strike zone
    axes[row, col].invert_yaxis()

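# Hide the unused subplots left over in the grid
for ax in axes.flat[num_datasets:]:
    ax.set_visible(False)

# Adjust layout and display
plt.tight_layout()
plt.show()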


In [None]:

# Zones 10-14 are outside the strike zone; exclude them to get in-zone exit velocity
out_of_zone = [10, 11, 12, 13, 14]

def in_zone_avg_ev(data):
    # Average exit velocity on balls in play against pitches inside the strike zone
    return data.loc[~data['Pitch Location'].isin(out_of_zone), 'EV'].mean()

kzone_ev = in_zone_avg_ev(in_play)
kzone_evl1 = in_zone_avg_ev(in_play_left1)
kzone_evr1 = in_zone_avg_ev(in_play_right1)
kzone_evl2 = in_zone_avg_ev(in_play_left2)
kzone_evr2 = in_zone_avg_ev(in_play_right2)

kzones = [kzone_ev, kzone_evl1, kzone_evl2, kzone_evr1, kzone_evr2]



Pitches below the zone, particularly down and away, gave Carlson the most trouble. In a defensive position, it is feasible that the one-handed swing gives the batter a longer reach on pitches very far outside. While valid, that still does not explain all of his one-handed swings.

Carlson did a good job of attacking pitches in the center of the zone regardless of how he was swinging. It is possible that, due to his high chase rate with one hand, the average exit velocities calculated earlier are unfairly skewed down. We are only trying to see whether his swing impacts his exit velocity, not whether the location of the pitch does. If we remove the swings outside the strike zone and calculate the average exit velocities on pitches in the zone, we get the following.

| Item                        | Overall | Left (1H) | Left (2H) | Right (1H) | Right (2H) |
| :----------------           | :-----: | :-------: | :-------: | :--------: | :--------: |
| Average Exit Velocity (mph) |  90.9   |    86     |   94.2    |    79.4    |    93.4    |


In-zone exit velocities continue to tell the same story: Carlson struggles with a one-handed swing. Even after eliminating the pitches that would explain a drop in exit velocity, his two-handed swing remains considerably better, by about 8 mph left-handed and 14 mph right-handed.
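With samples this small, it helps to attach a range to a gap in average exit velocity rather than reading the two means alone. The sketch below uses a percentile bootstrap, which is a technique brought in here for illustration and not something used elsewhere in this notebook. The `bootstrap_mean_diff` helper and both lists of exit velocities are made up.

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_boot=5000, seed=0):
    """Percentile 95% interval for mean(a) - mean(b), resampling each group with replacement."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group to its original size and record the difference in means
        diffs[i] = rng.choice(a, size=a.size).mean() - rng.choice(b, size=b.size).mean()
    low, high = np.percentile(diffs, [2.5, 97.5])
    return a.mean() - b.mean(), low, high

# Made-up exit velocities (mph), for illustration only
two_hand_ev = [98.1, 92.4, 95.0, 101.3, 89.7, 94.2, 96.8, 90.5]
one_hand_ev = [84.0, 79.5, 88.2, 91.0, 82.3, 86.1]

diff, low, high = bootstrap_mean_diff(two_hand_ev, one_hand_ev)
print(f"Mean difference: {diff:.1f} mph (95% interval {low:.1f} to {high:.1f})")
```

If the interval stays well above zero, the gap is unlikely to be a product of which handful of swings happened to land in each group.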

The final possible contributor to his swing usage that we will look at is pitch velocity. The speed of a pitch can impact a swing by disrupting the timing of the batter. If the batter begins his swing too soon or too late, the number of hands that he decides to leave on the bat may change. For instance, 80 mph curveballs may result in more one-handed swings than 95 mph fastballs because the batter loses balance or feels that a one-handed swing can cover more area.

In [None]:
# AI was used to aid in coding

from scipy.optimize import curve_fit

dt2 = dt1[dt1['Pitch Velo'] != 51]
left_2 = left_two[left_two['Pitch Velo'] != 51]

data = [dt2, left_one, left_2]
names = ['Overall', 'Left (1H)', 'Left (2H)']

for i, dataset in enumerate(data):
    velocity_counts = dataset['Pitch Velo'].value_counts()
    x = velocity_counts.index
    y = velocity_counts.values

    plt.figure(figsize=(6, 4))
    plt.scatter(x, y, label='Data')

    # Fit a quadratic curve to the data
    def quadratic_fit(x, a, b, c):
        return a * (x ** 2) + b * x + c

    params, _ = curve_fit(quadratic_fit, x, y)
    a, b, c = params

    # Plot the curve of best fit
    x_fit = np.linspace(min(x), max(x), 100)
    plt.plot(x_fit, quadratic_fit(x_fit, a, b, c), color='red', label='Quadratic Fit')

    plt.xlabel('Velocity of Pitch')
    plt.ylabel('Number of Swings')
    plt.title(f'Swings per Velo - {names[i]}')
    plt.legend()
    plt.show()



In [None]:
# AI was used to aid in coding

dt2 = dt1[dt1['Pitch Velo'] != 51]
left_2 = left_two[left_two['Pitch Velo'] != 51]

data = [dt2, left_one, left_2]
names = ['Overall', 'Left (1H)', 'Left (2H)']

plt.figure(figsize=(8, 6))  # Create the figure outside the loop

for i, dataset in enumerate(data):
    velocity_counts = dataset['Pitch Velo'].value_counts()
    x = velocity_counts.index
    y = velocity_counts.values

    # Fit a quadratic curve to the data
    def quadratic_fit(x, a, b, c):
        return a * (x ** 2) + b * x + c

    params, _ = curve_fit(quadratic_fit, x, y)
    a, b, c = params

    # Plot the curve of best fit
    x_fit = np.linspace(min(x), max(x), 100)
    plt.plot(x_fit, quadratic_fit(x_fit, a, b, c), label=f'{names[i]} - Quadratic Fit')

plt.xlabel('Pitch Velocity')
plt.ylabel('Number of Swings')
plt.title('Swings per Velocity with Quadratic Fit')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# AI was used to aid in coding

# Define the velocity ranges
slow_range = (70.0, 85.0)
medium_range = (85.1, 95.0)
fast_range = (95.1, 106.0)

# Iterate over each dataset
for i, dataset in enumerate([in_play, in_play_left1, in_play_left2, in_play_right1, in_play_right2]):
    #print(f"Dataset {i + 1}:")
    
    # Filter pitches in each velocity range
    slow_pitches = dataset[(dataset['Pitch Velo'] >= slow_range[0]) & (dataset['Pitch Velo'] < slow_range[1])]
    medium_pitches = dataset[(dataset['Pitch Velo'] >= medium_range[0]) & (dataset['Pitch Velo'] < medium_range[1])]
    fast_pitches = dataset[(dataset['Pitch Velo'] >= fast_range[0]) & (dataset['Pitch Velo'] < fast_range[1])]
    
    # Calculate average exit velocities for each velocity range
    slow_avg_ev = slow_pitches['EV'].mean()
    medium_avg_ev = medium_pitches['EV'].mean()
    fast_avg_ev = fast_pitches['EV'].mean()
    
    # Print the averages
    #print("Average Exit Velocity for Pitches < 85 mph:", slow_avg_ev)
    #print("Average Exit Velocity for Pitches 85-95 mph:", medium_avg_ev)
    #print("Average Exit Velocity for Pitches > 95 mph:", fast_avg_ev)
    #print()


The graph above plots how often Carlson uses each swing against each pitch velocity, with the lines representing the trends in the raw data shown in the scatter plots. A quadratic model was used because it is the closest match to the data. Overall, we see that most of Carlson's swings are against pitches around 90 mph. The league average fastball velocity is 93 mph. Only his left-handed swings are being compared because he already has a very good approach when right-handed. The curve for his two-handed swing appears considerably flatter, indicating that against pitches of extreme velocity (low or high), he uses more two-handed swings. The curve for one-handed swings shows a higher peak at pitches of average velocity, meaning that he swings more with one hand in those scenarios.

These findings are the opposite of what was expected. It is surprising that Carlson uses the lesser of his two swings more often against pitches of average velocity, the ones he sees the most and would theoretically be most comfortable against. This is based only on 2023, which is not a big enough sample to support a conclusion of this magnitude, but the trend was present and would be something to look for if expanding this analysis to past or future years.
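Because the groups contain different numbers of swings, comparing curve shapes is easier when each group is expressed as a share of its own swings instead of raw counts. The sketch below fits a quadratic to those shares and reports the curvature and the velocity at the fitted peak. It is one possible way to make the comparison, not the method used for the plots above, and the `fit_share_curve` helper and both velocity lists are made up.

```python
import numpy as np
import pandas as pd

def fit_share_curve(velocities):
    """Fit share-of-swings vs pitch velocity with a quadratic; return (a, peak_velocity).

    Assumes the fitted curve is not a straight line (a != 0).
    """
    shares = pd.Series(velocities).value_counts(normalize=True).sort_index()
    a, b, c = np.polyfit(shares.index.to_numpy(dtype=float), shares.to_numpy(), deg=2)
    peak = -b / (2 * a)  # vertex of a*x^2 + b*x + c
    return a, peak

# Made-up pitch velocities (mph), for illustration only
group_a = [84, 88, 88, 90, 90, 90, 90, 92, 92, 96]
group_b = [84, 84, 88, 88, 90, 90, 90, 92, 92, 96, 96]

for name, velos in [('Group A', group_a), ('Group B', group_b)]:
    a, peak = fit_share_curve(velos)
    print(f"{name}: curvature a = {a:.4f}, fitted peak near {peak:.1f} mph")
```

A more negative `a` means swings are more concentrated around the peak velocity, while a value closer to zero means a flatter spread across velocities.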

We will again compare exit velocities to establish which swing he should be using.


| Average Exit Velocity (mph) | Overall | Left (1H) | Left (2H) | Right (1H) | Right (2H) |
| :----------------           | :-----: | :-------: | :-------: | :--------: | :--------: |
| Pitch < 85 mph              |  87.1   |   78.1    |   87.7    |            |    90.7    |
| 85 < Pitch < 95 mph         |  89.7   |   86.4    |   94.1    |    71.6    |    94.2    |
| Pitch > 95 mph              |  91.1   |    85     |   92.8    |    91.6    |    98.3    |


It is clear that Carlson should be swinging with two hands regardless of pitch velocity. (Unrelated, but the 98.3 mph average exit velocity against pitches over 95 mph when hitting right-handed with a two-handed follow-through is higher than any other hitter's overall average exit velocity in 2023.)
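Splitting balls in play into velocity bands and averaging exit velocity within each band can also be done in one step with `pd.cut`, which guarantees that every pitch lands in exactly one band. The sketch below assumes the `Pitch Velo` and `EV` columns used in this notebook. The band edges, the labels, and the sample rows are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up batted balls, for illustration only
balls = pd.DataFrame({
    'Pitch Velo': [78.0, 85.0, 85.1, 90.3, 95.0, 97.2],
    'EV':         [80.0, 88.0, 92.0, 96.0, 90.0, 99.0],
})

# Band edges are an assumption; right=True places 85.0 in the slow band
# and 95.0 in the medium band, so no pitch falls between bands
bands = pd.cut(balls['Pitch Velo'],
               bins=[0, 85, 95, float('inf')],
               labels=['85 mph or slower', 'Over 85 to 95 mph', 'Over 95 mph'],
               right=True)

# Average exit velocity within each band
avg_ev_by_band = balls.groupby(bands, observed=False)['EV'].mean()
print(avg_ev_by_band)
```

Running the same grouping on each swing-type subset would produce one column of a table like the one above.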

We have shown that there is not much correlation between many of the factors that would influence his swings. For a full illustration of this, view the following correlation matrices.

In [None]:
# AI was used to aid in coding

# DataFrames from which each correlation matrix is computed
correlation_matrices = [dt1, left_one, left_two, right_one, right_two, two_strike,
                       left12k, left22k, right12k, right22k]
matrix_names = ['Overall', 'Left (1H)', 'Left (2H)', 'Right (1H)', 'Right (2H)',
               'Overall (2K)','Left (1H 2K)', 'Left (2H 2K)', 'Right (1H 2K)', 'Right (2H 2K)']

# Create subplots
num_matrices = len(correlation_matrices)
num_cols = num_matrices // 2  # Number of columns in each row
fig, axes = plt.subplots(nrows=2, ncols=num_cols, figsize=(15, 10))

# Plot each correlation matrix
for i, (matrix_data, matrix_name) in enumerate(zip(correlation_matrices, matrix_names)):
    row = i // num_cols  # Determine row number
    col = i % num_cols   # Determine column number
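    corr = matrix_data.corr(numeric_only=True)  # Pairwise correlations of the numeric columns
    mask = np.triu(np.ones_like(corr, dtype=bool))  # Hide the redundant upper triangle and diagonal
    # Fix the color scale to [-1, 1] so blue is always negative and red is always positive
    sns.heatmap(corr, ax=axes[row, col], cmap='coolwarm', vmin=-1, vmax=1, annot=False, cbar=False, mask=mask)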
    #if row == 1:  # Only remove y-axis labels for subplots in the second row
        #axes[row, col].set_yticklabels([])  # Remove y-axis labels
    axes[row, col].set_title(matrix_name)  # Set title with the corresponding matrix name

# Adjust layout and display
plt.tight_layout()
plt.show()



*The one-handed swing when batting right-handed can be effectively disregarded because the sample size is too small.*

The blue cells above represent a negative correlation coefficient (r-value), corresponding to a negative relationship between the two variables. The red cells represent a positive correlation between the two variables. The more intense the color in either direction, the stronger the correlation. Cells with a light color have little correlation, if any.
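To make the sign and strength of a correlation coefficient concrete, the sketch below computes a small correlation matrix and blanks out the redundant upper triangle, the same idea used for the heat maps. The column names mirror ones that appear in this notebook, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
sample = pd.DataFrame({
    'Launch Angle': [-10, 0, 10, 20, 30],
    'EV':           [85, 88, 95, 97, 103],
    'Pitch Velo':   [96, 93, 91, 88, 84],
})

corr = sample.corr(numeric_only=True)            # Pearson r for each pair of columns
mask = np.triu(np.ones(corr.shape, dtype=bool))  # True marks the upper triangle and diagonal
lower = corr.mask(mask)                          # masked cells become NaN

print(lower.round(2))
```

In this toy sample, `EV` rises with `Launch Angle`, so that cell is close to +1 and would be drawn red, while `Pitch Velo` falls as `Launch Angle` rises, so that cell is close to -1 and would be drawn blue.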

The strongest correlation, which is present in all the matrices, is between outcome direction and launch angle. This makes sense, as certain outcome directions heavily favor either fly balls or ground balls, and launch angle is essentially a numerical representation of that. Any relationship involving plate appearance does not hold much weight. The only contributing factor there would be injury, and that has already been ruled out as a possibility (to the best of an unaffiliated person's ability). There are other relationships that are not directly related to the current question of swing follow-through but can still be important for context.

**Conclusion**

The results of this notebook are clear. Dylan Carlson should swing with two hands more often. He is significantly better left-handed when he keeps both hands on the bat throughout the duration of his swing. His right-handed swing distributions are in line with this claim, and he is a proficient right-handed hitter. There does not appear to be one large area that needs to be improved or changed. Rather, Carlson should become more comfortable using a two-handed follow-through and lay off the pitches where he feels he needs to use one hand. Within that, simply gaining more plate discipline, even without changing his swing or approach, will lead to more two-handed swings and thus better results. Pitches in the strike zone were still an issue for Carlson with one hand, and he did not show a major preference for certain pitch velocities or counts. Overall, I believe this is a matter of comfort. He grew up as a natural right-handed hitter, learning to hit left-handed as he learned baseball in general. Over time, he has developed habits that have become second nature to him and that are likely a detriment to his success.

These results would hold more weight if the data included his entire career. Since this only looked at about 1/5 of his total swings, it is entirely possible that some or all of what was found in this notebook is an outlier in the overall picture. 480 swings is just not a big enough sample size to make wholesale claims about a player, especially one who has a history of injury. To solidify the suggestions made in this notebook, further data should be collected and further analysis done.


# Bibliography

Dylan Carlson Class of 2016 - Player Profile | Perfect Game USA. (n.d.). Perfect Game. https://www.perfectgame.org/players/playerprofile.aspx?ID=352408 

Gray, R., Hecht, H., Bertamini, M., Bailey, S. R., Sievert, C., & Mills, B. M. (2018). Predicting batted ball outcomes in major league baseball. In Sport, Exercise, and Performance Psychology (Vol. 7, Issue 3, p. 318). https://www.causeweb.org/usproc/sites/default/files/usclap/2019-1/Predicting%20Batted%20Ball%20Outcomes%20in%20Major%20League%20Baseball.pdf 

Langosch, J. (2023, February 27). Carlson 2nd prep star for Cards at No. 33. MLB.com. https://www.mlb.com/news/cardinals-draft-dylan-carlson-at-no-33-c183131938

LaRue, J. (2020, March 17). Cardinals 2020 player preview: The freewheelin’ Dylan Carlson. Viva El Birdos. https://www.vivaelbirdos.com/2020/3/17/21171362/st-louis-cardinals-2020-player-preview-the-freewheelin-dylan-carlson

Post-swing | Psychology of hitting. (n.d.). https://psychologyofhitting.wordpress.ncsu.edu/fundamentals/post-swing/

Statcast Leaderboard. (n.d.). baseballsavant.com. https://baseballsavant.mlb.com/statcast_leaderboard
