# Home Run Exit Velocity: 2015 vs 2017

Everyone who watches MLB Baseball is speculating that either the Bats or the Players are 'Juiced' this year.  We've seen an incredible 6105 Home Runs in the 2017 regular season vs. 4909 Home Runs in 2015.  I compared the mean Ball Exit Velocities for the two populations using SciPy's ttest_ind() function.  Note: Although it's appropriate to use a Z-test instead, since the data set is the entire population and the variance is known, I'll use the Independent Samples T-Test for the sake of the exercise.

The Null Hypothesis is that the Ball Exit Velocity Averages are identical between the 2015 Home Run population and the 2017 Home Run population.  

The Alternate Hypothesis is that the 2017 Ball Exit Velocity Average is higher than the 2015 average: either a physically changed ball or players on steroids would cause the ball to leave the bat with greater velocity (which would result in more Home Runs).  

In order to reject the Null Hypothesis at a 5% alpha, we'll need a p-value from the independent samples T-test below 0.05 (or 5%):

In [None]:
import pandas as pd
from scipy.stats import ttest_ind as TTest
import seaborn as sns
import matplotlib.pyplot as plt

# Load every tracked home run and split it by season
path = "../input/HR Exit Velocity.csv"
df = pd.read_csv(path)
df_2015 = df[df['Season'] == 2015]
df_2017 = df[df['Season'] == 2017]

# Independent samples t-test on exit velocity (two-sided by default)
TTest(df_2017['HR Exit Velocity'], df_2015['HR Exit Velocity'])

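We can see from the result that the p-value of 0.75 is far above 0.05, so we fail to reject the Null Hypothesis.  That doesn't prove the Home Run Ball Exit Velocity Means are identical in 2015 and 2017, but there is no statistically significant difference between them (and the IQRs are very similar too).  Note that ttest_ind() reports a two-sided p-value by default; the one-sided value is either half of it or one minus half of it, so it is nowhere near 0.05 either way.  The Histograms below tell the same story.  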
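Since the Alternate Hypothesis is one-sided, and the intro mentions a Z-test, here is a minimal sketch of both checks. It runs on synthetic stand-in data, not the real exit velocities: the mean of 103 MPH and standard deviation of 4 MPH are made-up assumptions, and only the sample sizes come from the Home Run counts above. The helper name `one_sided_tests` is hypothetical, and the `alternative` argument assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy.stats import norm, ttest_ind


def one_sided_tests(new, old):
    """Test whether mean(new) > mean(old) with a Welch t-test and a z-test."""
    new = np.asarray(new, dtype=float)
    old = np.asarray(old, dtype=float)

    # Welch's t-test: does not assume the two seasons share a variance.
    t_stat, t_p = ttest_ind(new, old, equal_var=False, alternative="greater")

    # Two-sample z statistic; the sample variances stand in for "known" ones.
    se = np.sqrt(new.var(ddof=1) / new.size + old.var(ddof=1) / old.size)
    z_stat = (new.mean() - old.mean()) / se
    z_p = norm.sf(z_stat)  # upper-tail probability
    return t_stat, t_p, z_stat, z_p


# Synthetic stand-in data (ASSUMPTION: not the real exit velocities).
rng = np.random.default_rng(0)
fake_2015 = rng.normal(103.0, 4.0, size=4909)
fake_2017 = rng.normal(103.0, 4.0, size=6105)

t_stat, t_p, z_stat, z_p = one_sided_tests(fake_2017, fake_2015)
print(f"Welch t = {t_stat:.3f}, one-sided p = {t_p:.3f}")
print(f"z       = {z_stat:.3f}, one-sided p = {z_p:.3f}")
```

With thousands of Home Runs per season the two statistics coincide and the p-values are practically the same, which is why swapping the Z-test for a T-test doesn't change the conclusion here.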

Note:  The Histogram scales are skewed due to a single 36 MPH HR by Bryce Harper in 2015.  I decided to leave it in because it's Bryce Harper ;-) 

### Histograms of 2015 Home Runs vs. 2017 Home Runs

In [None]:
df15hr = df_2015['HR Exit Velocity']
df17hr = df_2017['HR Exit Velocity']

f, axes = plt.subplots(2, figsize=(15, 10), sharex=True, sharey=True)

# sns.distplot is deprecated since seaborn 0.11; histplot draws the same plain histogram
sns.histplot(df15hr, bins=200, color="b", ax=axes[0]).set_title("2015")
sns.histplot(df17hr, bins=75, color="r", ax=axes[1]).set_title("2017")
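The two panels above use different bin counts, and the single low outlier stretches the 2015 range. One possible way to make the panels directly comparable is to compute one set of bin edges from both seasons combined. This is only a sketch on made-up data: the helper `shared_bin_edges`, its percentile cut-offs, and the synthetic values are all assumptions, not part of the original analysis.

```python
import numpy as np


def shared_bin_edges(a, b, n_bins=60, lo_pct=0.1, hi_pct=99.9):
    """Return bin edges spanning the central bulk of both samples combined."""
    combined = np.concatenate([np.asarray(a, dtype=float),
                               np.asarray(b, dtype=float)])
    lo, hi = np.percentile(combined, [lo_pct, hi_pct])
    return np.linspace(lo, hi, n_bins + 1)


# Synthetic stand-in data (ASSUMPTION), including one very low outlier.
rng = np.random.default_rng(1)
fake_2015 = np.append(rng.normal(103.0, 4.0, size=500), 36.0)
fake_2017 = rng.normal(103.0, 4.0, size=600)

edges = shared_bin_edges(fake_2015, fake_2017)

# density=True normalizes each season on its own, so the different
# Home Run totals do not distort the comparison of the two shapes.
dens_2015, _ = np.histogram(fake_2015, bins=edges, density=True)
dens_2017, _ = np.histogram(fake_2017, bins=edges, density=True)
print(f"bins run from {edges[0]:.1f} to {edges[-1]:.1f} MPH")
```

The same `edges` array can then be passed as `bins=edges` to the plotting call on both axes. The outlier is only left off the plot, not removed from the data.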

We can dive a little deeper by looking at the descriptive statistics for Home Run years 2015 & 2017.  You'll see that the Mean Exit Velocities, Standard Deviations, and Inter-Quartile Ranges are all very similar.  It appears at face value that the Balls are leaving the Bat at the same speed in 2015 & 2017.  So if the Balls are leaving at the same speed, then why are we getting more home runs in 2017?  Well, I still have to do the analysis, but I'm thinking that it's because the Pitching sucks.  Maybe a few too many balls are getting left out over the plate.

### Descriptive Statistics 2015 & 2017 Home Runs:
#### n, Average Exit Velocity (MPH), ...

In [None]:
df15hr.describe()

In [None]:
df17hr.describe()
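To read the two summaries side by side, one option is to group by season and add the IQR as its own column. The sketch below uses a tiny made-up frame that only borrows the column names from the CSV above; the values are purely illustrative.

```python
import pandas as pd

# Tiny made-up frame (ASSUMPTION: illustrative values, not real Home Runs).
toy = pd.DataFrame({
    "Season": [2015, 2015, 2015, 2015, 2017, 2017, 2017, 2017],
    "HR Exit Velocity": [98.0, 102.0, 104.0, 108.0,
                         99.0, 103.0, 105.0, 109.0],
})

# One describe() row per season instead of two separate outputs.
summary = toy.groupby("Season")["HR Exit Velocity"].describe()

# Inter-Quartile Range = 75th percentile minus 25th percentile.
summary["IQR"] = summary["75%"] - summary["25%"]

print(summary[["count", "mean", "std", "IQR"]])
```

On the real data, replacing `toy` with `df[df['Season'].isin([2015, 2017])]` should give the same numbers as the two describe() cells above, in a single table.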