# MLB Pitcher Analysis

## Modeling

### Called Strikes Model

## Project Goals:

- Determine why Pitcher 1 struggled after the all-star break.
- Determine why Pitchers 2 and 3 improved after the all-star break.
- For each pitcher, build a predictive model of called strikes to use for improving pitching performance.

## Summary of Data:

The data for this study contains all pitches that were taken by a batter (i.e., ball or called strike) when thrown by one of three pitchers. Pitchers 1, 2, and 3 are all considered elite, high-value players across the league. In the 2019 season, Pitcher 1 had an earned run average (ERA) of 2.30 before the all-star break, followed by an ERA of 4.80 after it. Pitchers 2 and 3 posted ERAs of 3.10 and 3.30, respectively, before the all-star break, and ERAs of 1.80 and 1.44, respectively, after it. We used this data for exploratory data analysis of all three pitchers and to produce initial findings for the project goals listed above. Those initial findings guided the construction of a predictive model of called strikes for each pitcher. Called strikes were identified as the response variable that could lead to improved pitching performance over the course of the baseball season (without the use of hit, strikeout, or walk data).

In [2]:
#import and run libraries and cleaned data from mlb_pitcher_analysis_data_cleaning notebook
%run ../../python_files/mlb_pitcher_analysis_libraries
%run ../../python_files/mlb_pitcher_analysis_data_cleaning
%matplotlib inline
sns.set(style="whitegrid")
pd.options.display.max_columns = 100
# from mlb_pitcher_analysis_libraries import *    #for use within .py file

In [4]:
x_train_pitcher1

Unnamed: 0,x,z,spin_rate,release_velo,release_x,release_y,release_z,pfx_x,pfx_z,extension,ump,catcherid,ball_count,strike_count,baserunner_count,out_count,baserunner_on_first,baserunner_on_second,baserunner_on_third,result_strike,pitch_type_CB,pitch_type_CH,pitch_type_CT,pitch_type_FF,pitch_type_FT,pitch_type_SL
0,-16.391585,29.099555,1553.866893,85.310139,-3.37,54.04,5.10,-11.96,1.74,6.458,7,6,1,1,0,1,0.0,0.0,0.0,0,0.0,1.0,0.0,0.0,0.0,0.0
1,7.092059,25.644536,2583.628084,94.969000,-3.29,54.25,5.42,-7.95,7.51,6.251,6,1,0,0,0,1,0.0,0.0,0.0,1,0.0,0.0,0.0,1.0,0.0,0.0
2,0.132016,9.792737,1437.224158,84.161239,-3.50,54.55,5.22,-9.51,2.82,5.949,4,6,2,1,1,0,1.0,0.0,0.0,0,0.0,1.0,0.0,0.0,0.0,0.0
3,7.070683,26.340875,2512.432340,94.659353,-3.53,54.56,5.37,-9.71,8.94,5.938,21,6,0,1,0,0,0.0,0.0,0.0,1,0.0,0.0,0.0,1.0,0.0,0.0
4,13.727402,16.222042,2554.533265,87.019283,-3.62,54.54,4.91,-0.17,1.27,5.956,2,1,1,1,1,1,0.0,1.0,0.0,0,0.0,0.0,0.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1202,11.906081,24.380337,2665.513628,94.800558,-3.60,54.36,5.00,-8.83,8.35,6.143,49,1,0,1,1,1,1.0,0.0,0.0,0,0.0,0.0,0.0,1.0,0.0,0.0
1203,-9.742490,21.467883,2433.162261,84.160375,-3.64,54.44,5.15,1.49,-1.09,6.057,33,1,1,2,1,0,1.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,1.0
1204,18.538083,12.764592,2362.765408,86.651315,-3.63,54.45,5.04,1.09,3.60,6.054,48,6,0,0,0,0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,1.0
1205,4.716835,29.987920,2324.824912,95.128822,-3.34,54.96,5.36,-7.10,8.40,5.536,13,1,0,0,0,1,0.0,0.0,0.0,1,0.0,0.0,0.0,1.0,0.0,0.0


## Modeling Strategies

##### Pre-Modeling Techniques

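- Scaling: We use StandardScaler to standardize our 'x' training and test datasets so that features measured in different units (for example, spin rate versus release velocity) are on a comparable scale and no feature dominates a model simply because of its magnitude.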
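
To make the scaling step concrete, here is a minimal sketch in plain Python of the arithmetic that standardization performs: subtract the column mean and divide by the population standard deviation. The input values are the first five `release_velo` and `spin_rate` entries from the `x_train_pitcher1` preview above. The `standardize` helper is a hypothetical name used only for illustration, and the notebook itself relies on `StandardScaler` inside the pipeline rather than this function.

```python
from statistics import fmean, pstdev

# First five release_velo and spin_rate values from the x_train_pitcher1 preview
release_velo = [85.310139, 94.969000, 84.161239, 94.659353, 87.019283]
spin_rate = [1553.866893, 2583.628084, 1437.224158, 2512.432340, 2554.533265]


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population std (ddof=0), the convention StandardScaler uses
    return [(v - mean) / std for v in values]


velo_z = standardize(release_velo)
spin_z = standardize(spin_rate)

# After standardization both columns are unitless, with mean 0 and std 1
for name, z in (("release_velo", velo_z), ("spin_rate", spin_z)):
    print(name, [round(v, 2) for v in z])
```

In the pipelines, the mean and standard deviation are learned from the training data when `pipe.fit` is called and then reused unchanged when the test data is scored, so the test set does not influence the scaling.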

##### Model Implementation and Model Performance

We used a pipeline to implement 8 different model types for each pitcher (3 pipelines, one per pitcher, for 24 models in total):

- Logistic Regression
- KNN
- SVC
- NuSVC
- Decision Tree
- Random Forest
- Ada Boost
- Gradient Boosting

After running each pipeline, we review model performance using accuracy as our primary metric. We also examine confusion matrices to compare correct and incorrect predictions.
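
The cells that follow score every model twice, once on the training data and once on the test data. The sketch below illustrates why both numbers are worth reading, using a hand-written 1-nearest-neighbour classifier on a few made-up points. Everything in it is an assumption for illustration: the data is hypothetical, and this classifier is not one of the eight configurations above (`KNeighborsClassifier()` uses 5 neighbours by default).

```python
import math

# Hypothetical, made-up points (not pitch data); one training label is deliberately noisy
train_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
train_labels = [0, 0, 1, 1, 1, 0]
test_points = [(0.2, 0.1), (3.1, 3.9), (3.9, 3.1), (0.1, 0.9)]
test_labels = [0, 1, 1, 0]


def predict_1nn(point):
    """Return the label of the closest training point (1-nearest-neighbour)."""
    nearest = min(range(len(train_points)),
                  key=lambda i: math.dist(train_points[i], point))
    return train_labels[nearest]


def accuracy(points, labels):
    """Fraction of points whose predicted label matches the actual label."""
    correct = sum(predict_1nn(p) == y for p, y in zip(points, labels))
    return correct / len(labels)


train_accuracy = accuracy(train_points, train_labels)
test_accuracy = accuracy(test_points, test_labels)
print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

Because each training point is its own nearest neighbour, this model memorizes the training set, so its training accuracy says little about how it will handle unseen pitches. Flexible models such as decision trees can behave similarly, which is why the test-set comparison is the one to rely on when choosing a model.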

## Pitcher 1: Called Strikes Model

In [None]:
# Pitcher 1 Called Strike Training Model Selection and Comparison

classifiers = [
    LogisticRegression(),
    KNeighborsClassifier(),
    SVC(),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
    ]

for classifier in classifiers:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher1, y_train_pitcher1)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_train_pitcher1, pipe.predict(x_train_pitcher1)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_train_pitcher1, y_train_pitcher1), '\n')

In [None]:
# Pitcher 1 Called Strike Test Model Selection and Comparison

for classifier in classifiers:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher1, y_train_pitcher1)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_test_pitcher1, pipe.predict(x_test_pitcher1)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_test_pitcher1, y_test_pitcher1), '\n')

## Pitcher 1 Best Model

### Our *NuSVC Model* had the best model performance.

#### Further analysis of this model will help us determine how effective our model is in predicting called strikes for pitchers.

##### Model Implementation

In [None]:
# re-run our Pitcher 1 Called Strike NuSVC model for further review of model performance

classifier_nusvc = [NuSVC(probability=True)]

for classifier in classifier_nusvc:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher1, y_train_pitcher1)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_test_pitcher1, pipe.predict(x_test_pitcher1)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_test_pitcher1, y_test_pitcher1), '\n')

##### Accuracy Calculation

The accuracy calculated above shows that the model correctly classifies approximately 70% of the test pitches as a called strike or a ball, which is promising for our first run of this model.

##### Precision Calculation and Confusion Matrix Results

A precision of approximately 74% is good for our purposes, as we identified precision as the metric to optimize in this study. Of the 475 predictions of the positive class (351 true positives + 124 false positives), 351 were correct.

The confusion matrix above shows that out of 620 total predictions, 431 were correct (351 true positives + 80 true negatives) and 189 were incorrect (124 false positives + 65 false negatives). The ratio of approximately 2.3 correct predictions to every 1 incorrect prediction is also a good sign for our first run of this model.
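
The arithmetic behind these figures can be checked with a few lines of plain Python. This is only a sketch that recomputes accuracy, precision, and the correct-to-incorrect ratio from the four counts quoted above. The variable names are illustrative and do not come from the notebook.

```python
# Confusion matrix counts quoted in the text above
true_positives = 351
true_negatives = 80
false_positives = 124
false_negatives = 65

total = true_positives + true_negatives + false_positives + false_negatives
correct = true_positives + true_negatives
incorrect = false_positives + false_negatives

# Accuracy: share of all predictions that were correct
accuracy = correct / total
# Precision: share of positive predictions that were actually positive
precision = true_positives / (true_positives + false_positives)
# Ratio of correct to incorrect predictions
correct_to_incorrect = correct / incorrect

print(f"total={total}, correct={correct}, incorrect={incorrect}")
print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, ratio={correct_to_incorrect:.1f}")
```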

##### ROC Curve
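
As background for reading the curve, the area under an ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes that quantity directly on a handful of made-up labels and scores. The data and the `pairwise_auc` helper are hypothetical and are not part of the notebook, which relies on `roc_auc_score` and `roc_curve` instead.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = called strike) and predicted probabilities
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to the dashed diagonal on the plot (random ranking), and 1.0 corresponds to a model that ranks every called strike above every ball.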

In [None]:
# ROC Curve for the Pitcher 1 NuSVC pipeline fitted in the previous cell
from sklearn.metrics import roc_auc_score, roc_curve

# predicted probability of class 1; NuSVC(probability=True) makes predict_proba available
nusvc_probs = pipe.predict_proba(x_test_pitcher1)[:, 1]

nusvc_roc_auc = roc_auc_score(y_test_pitcher1, nusvc_probs)
fpr, tpr, thresholds = roc_curve(y_test_pitcher1, nusvc_probs)
plt.figure()
plt.plot(fpr, tpr, label='NuSVC (area = %0.2f)' % nusvc_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')   # reference line for a random classifier
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Pitcher 1 Called Strikes ROC Curve')
plt.legend(loc="lower right")
plt.show()

## Which Pitcher 1 pitching variables are most important in predicting called strikes?

### We use our *Logistic Regression model* to answer this question.

#### Further analysis of this model will help determine which pitching variables are most important in predicting called strikes.

##### Model Implementation

Through the review of our logistic regression model results, we can use the coefficient values of our feature variables to identify which pitching variables are most important in predicting called strikes.
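
One way to read logistic regression coefficients is to convert them to odds ratios: a coefficient b multiplies the odds of the positive outcome by exp(b) for each one-unit increase in that feature. The sketch below uses made-up coefficients purely for illustration. They are not results from this study, and the feature names are borrowed from the dataset only as labels.

```python
import math

# Hypothetical coefficients for standardized features (NOT results from this study)
coefficients = {"release_velo": 0.40, "spin_rate": -0.15, "pfx_z": 0.90}

# Odds ratio: multiplicative change in the odds of a called strike per one-unit increase
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

# Rank features by the absolute size of their coefficient
ranked = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

for name in ranked:
    print(f"{name}: coef={coefficients[name]:+.2f}, odds ratio={odds_ratios[name]:.2f}")
```

Ranking by coefficient size is only meaningful when the features share a scale. If the model is fitted on unscaled features, a feature measured in large units such as spin rate will tend to have a small coefficient regardless of its importance, so the z-statistics and p-values in the summary output are a safer guide than raw coefficient magnitudes.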

In [None]:
# re-run our Pitcher 1 Called Strike Logistic Regression model for further review of feature performance

logit_model = sm.Logit(y_train_pitcher1, x_train_pitcher1)
logit_result = logit_model.fit()
print(logit_result.summary())

##### Evaluation of Model Coefficients

After running our logistic regression model, we can see which feature variables have an impact on draft_status. It is clear that the 40_yard_dash is by far the most statistically significant, implying that this combine drill has the strongest influence on draft_status. The next most significant feature variables are 3_cone_drill and 3_cone_drill_missed.

## Pitcher 1 Conclusions and Recommendations

- The NFL Combine proves that strong athleticism can solely get a player drafted.
- From a team's perspective, excluding a player's college statistics, teams display a high willingness to take a chance on a player who performs well in the NFL Combine.
- From a player's perspective, if a player is lacking a strong volume of college statistics, the NFL Combine offers them a big opportunity to get drafted. The NFL Combine matters enough that if a player performs well in the event, it is likely to get that player drafted.
- From an agent or draft evaluator's perspective, precision is our preferred metric when offering a player a projection on his potential draft status. In our model, false positives are worse than false negatives, as we don’t want to inform a player that they will get drafted and then they actually don’t. We want to be conservative and very sure in our recommendations to players. We want to avoid offering incorrect projections as much as possible.
- The drill that has the most impact on a lineman's draft status is by far the 40-Yard Dash. The 40-Yard Dash shows strong predictive power in that a good performance in this drill can actually help boost a lineman's draft status, while a bad performance in this drill can actually hurt a player's draft status. The second most important drill is the 3-Cone Drill. Similarly to the 40-Yard Dash, a good performance in this drill can slightly improve a player's draft status, while a bad performance can slightly harm a player's draft_status. It is also notable that players who skipped the 3-Cone Drill entirely had a higher draft status. We suspect that this is the case because most of the time, the players who skip this drill are already highly likely to be drafted. 

## Pitcher 2: Called Strikes Model

In [None]:
# Pitcher 2 Called Strike Training Model Selection and Comparison

classifiers = [
    LogisticRegression(),
    KNeighborsClassifier(),
    SVC(),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
    ]

for classifier in classifiers:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher2, y_train_pitcher2)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_train_pitcher2, pipe.predict(x_train_pitcher2)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_train_pitcher2, y_train_pitcher2), '\n')

In [None]:
# Pitcher 2 Called Strike Test Model Selection and Comparison

for classifier in classifiers:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher2, y_train_pitcher2)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_test_pitcher2, pipe.predict(x_test_pitcher2)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_test_pitcher2, y_test_pitcher2), '\n')

## Pitcher 2 Best Model

### Our *NuSVC Model* had the best model performance.

#### Further analysis of this model will help us determine how effective our model is in predicting called strikes for pitchers.

##### Model Implementation

In [None]:
# re-run our Pitcher 2 Called Strike NuSVC model for further review of model performance

classifier_nusvc = [NuSVC(probability=True)]

for classifier in classifier_nusvc:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher2, y_train_pitcher2)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_test_pitcher2, pipe.predict(x_test_pitcher2)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_test_pitcher2, y_test_pitcher2), '\n')

##### Accuracy Calculation

The accuracy calculated above shows that the model correctly classifies approximately 70% of the test pitches as a called strike or a ball, which is promising for our first run of this model.

##### Precision Calculation and Confusion Matrix Results

A precision of approximately 74% is good for our purposes, as we identified precision as the metric to optimize in this study. Of the 475 predictions of the positive class (351 true positives + 124 false positives), 351 were correct.

The confusion matrix above shows that out of 620 total predictions, 431 were correct (351 true positives + 80 true negatives) and 189 were incorrect (124 false positives + 65 false negatives). The ratio of approximately 2.3 correct predictions to every 1 incorrect prediction is also a good sign for our first run of this model.

##### ROC Curve

In [None]:
# ROC Curve for the Pitcher 2 NuSVC pipeline fitted in the previous cell
from sklearn.metrics import roc_auc_score, roc_curve

# predicted probability of class 1; NuSVC(probability=True) makes predict_proba available
nusvc_probs = pipe.predict_proba(x_test_pitcher2)[:, 1]

nusvc_roc_auc = roc_auc_score(y_test_pitcher2, nusvc_probs)
fpr, tpr, thresholds = roc_curve(y_test_pitcher2, nusvc_probs)
plt.figure()
plt.plot(fpr, tpr, label='NuSVC (area = %0.2f)' % nusvc_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')   # reference line for a random classifier
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Pitcher 2 Called Strikes ROC Curve')
plt.legend(loc="lower right")
plt.show()

## Which Pitcher 2 pitching variables are most important in predicting called strikes?

### We use our *Logistic Regression model* to answer this question.

#### Further analysis of this model will help determine which pitching variables are most important in predicting more called strikes.

##### Model Implementation

Through the review of our logistic regression model results, we can use the coefficient values of our feature variables to identify which pitching variables are most important in predicting called strikes.

In [None]:
# re-run our Pitcher 2 Called Strike Logistic Regression model for further review of feature performance

logit_model = sm.Logit(y_train_pitcher2, x_train_pitcher2)
logit_result = logit_model.fit()
print(logit_result.summary())

##### Evaluation of Model Coefficients

After running our logistic regression model, we can see which feature variables have an impact on draft_status. It is clear that the 40_yard_dash is by far the most statistically significant, implying that this combine drill has the strongest influence on draft_status. The next most significant feature variables are 3_cone_drill and 3_cone_drill_missed.

## Pitcher 2 Conclusions and Recommendations

- The NFL Combine proves that strong athleticism can solely get a player drafted.
- From a team's perspective, excluding a player's college statistics, teams display a high willingness to take a chance on a player who performs well in the NFL Combine.
- From a player's perspective, if a player is lacking a strong volume of college statistics, the NFL Combine offers them a big opportunity to get drafted. The NFL Combine matters enough that if a player performs well in the event, it is likely to get that player drafted.
- From an agent or draft evaluator's perspective, precision is our preferred metric when offering a player a projection on his potential draft status. In our model, false positives are worse than false negatives, as we don’t want to inform a player that they will get drafted and then they actually don’t. We want to be conservative and very sure in our recommendations to players. We want to avoid offering incorrect projections as much as possible.
- The drill that has the most impact on a lineman's draft status is by far the 40-Yard Dash. The 40-Yard Dash shows strong predictive power in that a good performance in this drill can actually help boost a lineman's draft status, while a bad performance in this drill can actually hurt a player's draft status. The second most important drill is the 3-Cone Drill. Similarly to the 40-Yard Dash, a good performance in this drill can slightly improve a player's draft status, while a bad performance can slightly harm a player's draft_status. It is also notable that players who skipped the 3-Cone Drill entirely had a higher draft status. We suspect that this is the case because most of the time, the players who skip this drill are already highly likely to be drafted. 

## Pitcher 3: Called Strikes Model

In [None]:
# Pitcher 3 Called Strike Training Model Selection and Comparison

classifiers = [
    LogisticRegression(),
    KNeighborsClassifier(),
    SVC(),
    NuSVC(probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    AdaBoostClassifier(),
    GradientBoostingClassifier()
    ]

for classifier in classifiers:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher3, y_train_pitcher3)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_train_pitcher3, pipe.predict(x_train_pitcher3)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_train_pitcher3, y_train_pitcher3), '\n')

In [None]:
# Pitcher 3 Called Strike Test Model Selection and Comparison

for classifier in classifiers:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher3, y_train_pitcher3)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_test_pitcher3, pipe.predict(x_test_pitcher3)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_test_pitcher3, y_test_pitcher3), '\n')

## Pitcher 3 Best Model

### Our *NuSVC Model* had the best model performance.

#### Further analysis of this model will help us determine how effective our model is in predicting called strikes for pitchers.

##### Model Implementation

In [None]:
# re-run our Pitcher 3 Called Strike NuSVC model for further review of model performance

classifier_nusvc = [NuSVC(probability=True)]

for classifier in classifier_nusvc:
    pipe = Pipeline([
                     ('ss', StandardScaler()),
                     ('classifier', classifier)])
    pipe.fit(x_train_pitcher3, y_train_pitcher3)   
    print(classifier, '\n')
    conf_matrix = pd.DataFrame(confusion_matrix(y_test_pitcher3, pipe.predict(x_test_pitcher3)),
                           index = ['actual 0', 'actual 1'], 
                           columns = ['predicted 0', 'predicted 1'])
    display(conf_matrix)
    print("Accuracy Score: ",(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,0])/(conf_matrix.iloc[1,1]+conf_matrix.iloc[0,1]+conf_matrix.iloc[1,0]+conf_matrix.iloc[0,0]))
    print("Model Score: %.3f" % pipe.score(x_test_pitcher3, y_test_pitcher3), '\n')

##### Accuracy Calculation

The accuracy calculated above shows that the model correctly classifies approximately 70% of the test pitches as a called strike or a ball, which is promising for our first run of this model.

##### Precision Calculation and Confusion Matrix Results

A precision of approximately 74% is good for our purposes, as we identified precision as the metric to optimize in this study. Of the 475 predictions of the positive class (351 true positives + 124 false positives), 351 were correct.

The confusion matrix above shows that out of 620 total predictions, 431 were correct (351 true positives + 80 true negatives) and 189 were incorrect (124 false positives + 65 false negatives). The ratio of approximately 2.3 correct predictions to every 1 incorrect prediction is also a good sign for our first run of this model.

##### ROC Curve

In [None]:
# ROC Curve for the Pitcher 3 NuSVC pipeline fitted in the previous cell
from sklearn.metrics import roc_auc_score, roc_curve

# predicted probability of class 1; NuSVC(probability=True) makes predict_proba available
nusvc_probs = pipe.predict_proba(x_test_pitcher3)[:, 1]

nusvc_roc_auc = roc_auc_score(y_test_pitcher3, nusvc_probs)
fpr, tpr, thresholds = roc_curve(y_test_pitcher3, nusvc_probs)
plt.figure()
plt.plot(fpr, tpr, label='NuSVC (area = %0.2f)' % nusvc_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')   # reference line for a random classifier
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Pitcher 3 Called Strikes ROC Curve')
plt.legend(loc="lower right")
plt.show()

## Which Pitcher 3 pitching variables are most important in predicting called strikes?

### We use our *Logistic Regression model* to answer this question.

#### Further analysis of this model will help determine which pitching variables are most important in predicting more called strikes.

##### Model Implementation

Through the review of our logistic regression model results, we can use the coefficient values of our feature variables to identify which pitching variables are most important in predicting called strikes.

In [None]:
# re-run our Pitcher 3 Called Strike Logistic Regression model for further review of feature performance

logit_model = sm.Logit(y_train_pitcher3, x_train_pitcher3)
logit_result = logit_model.fit()
print(logit_result.summary())

##### Evaluation of Model Coefficients

After running our logistic regression model, we can see which feature variables have an impact on draft_status. It is clear that the 40_yard_dash is by far the most statistically significant, implying that this combine drill has the strongest influence on draft_status. The next most significant feature variables are 3_cone_drill and 3_cone_drill_missed.

## Pitcher 3 Conclusions and Recommendations

- The NFL Combine proves that strong athleticism can solely get a player drafted.
- From a team's perspective, excluding a player's college statistics, teams display a high willingness to take a chance on a player who performs well in the NFL Combine.
- From a player's perspective, if a player is lacking a strong volume of college statistics, the NFL Combine offers them a big opportunity to get drafted. The NFL Combine matters enough that if a player performs well in the event, it is likely to get that player drafted.
- From an agent or draft evaluator's perspective, precision is our preferred metric when offering a player a projection on his potential draft status. In our model, false positives are worse than false negatives, as we don’t want to inform a player that they will get drafted and then they actually don’t. We want to be conservative and very sure in our recommendations to players. We want to avoid offering incorrect projections as much as possible.
- The drill that has the most impact on a lineman's draft status is by far the 40-Yard Dash. The 40-Yard Dash shows strong predictive power in that a good performance in this drill can actually help boost a lineman's draft status, while a bad performance in this drill can actually hurt a player's draft status. The second most important drill is the 3-Cone Drill. Similarly to the 40-Yard Dash, a good performance in this drill can slightly improve a player's draft status, while a bad performance can slightly harm a player's draft_status. It is also notable that players who skipped the 3-Cone Drill entirely had a higher draft status. We suspect that this is the case because most of the time, the players who skip this drill are already highly likely to be drafted. 