# <u>Model Insights</u>
---

## Objective
This final notebook extracts meaningful results from the tuned models. Comparing R-squared and coefficients across the two models makes it possible to compare players based on their Club Head Speed. The conclusion captures the realities of professional golf and ways to improve upo

-----
#### External Libraries Import

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pickle
import warnings
warnings.filterwarnings('ignore')

#### Read Data and Models

In [2]:
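df_pga = pd.read_csv('../Data/Sets/final_model.csv')
# load the tuned models; the context managers close each file after reading
with open('../Best_Models/slow_model.pk', 'rb') as f:
    slow_model = pickle.load(f)
with open('../Best_Models/fast_model.pk', 'rb') as f:
    fast_model = pickle.load(f)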

#### Prepare Data

In [3]:
features = [
    col for col in df_pga.columns if col not in ['date', 'finish', 
                                                  'player', 'event', 
                                                  'sg:_off-the-tee',
                                                  'sg:_approach-the-green',
                                                  'sg:_around-the-green',
                                                  'sg:_putting',
                                                  'sg:_total']]

X = df_pga[features]
y = df_pga['sg:_total']

# train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=77)

# standardize the data using StandardScaler
ss = StandardScaler()
X_train_sc = ss.fit_transform(X_train)
X_test_sc = ss.transform(X_test)
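
The scaler is fit on the training split only and then reused on the test split. A minimal sketch of what that does, using made-up numbers (the toy arrays and variable names below are illustrative assumptions, not values from `final_model.csv`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values only, not taken from the dataset)
X_train_toy = np.array([[110.0, 60.0], [114.0, 65.0], [118.0, 70.0], [122.0, 75.0]])
X_test_toy = np.array([[120.0, 62.0]])

scaler = StandardScaler()
X_train_toy_sc = scaler.fit_transform(X_train_toy)  # learns mean and std from the training rows
X_test_toy_sc = scaler.transform(X_test_toy)        # reuses those training statistics

# Equivalent manual computation; StandardScaler uses the population std (ddof=0)
manual = (X_test_toy - X_train_toy.mean(axis=0)) / X_train_toy.std(axis=0)

print(np.allclose(X_test_toy_sc, manual))
print(X_train_toy_sc.mean(axis=0), X_train_toy_sc.std(axis=0))
```

After scaling, each training column has mean 0 and standard deviation 1, so a model coefficient can be read as the change in the target per one standard deviation of that feature.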

## <u>Compare Models</u>

### R-squared
- The slow model was trained only on tournaments where the player averaged a club head speed below 115.51 mph.
- The fast model was trained only on tournaments where the player averaged a club head speed above 115.51 mph.

In [4]:
print(f'The slow model explains {round((slow_model.score(X_test_sc, y_test)*100), 2)}%\
 of variation in total strokes gained.')
print(f'The fast model explains {round((fast_model.score(X_test_sc, y_test)*100), 2)}%\
 of variation in total strokes gained.')

The slow model explains 58.41% of variation in total strokes gained.
The fast model explains 61.32% of variation in total strokes gained.


- Remember that a LASSO regression trained on the entire dataset explained 60.32% of the variation in Total Strokes Gained around its mean.
- The slow model explains 1.91 percentage points less than the original model.
- The fast model explains 1.00 percentage point more than the original model.
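
For reference, the `.score` method of a scikit-learn regressor returns R-squared, which compares the model's squared error with that of always predicting the mean. The sketch below checks this on synthetic data; the `Lasso` settings and the toy arrays are assumptions for illustration only and are not the tuned models from this project. The last lines redo the percentage-point arithmetic using the scores reported above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustrative only)
rng = np.random.default_rng(77)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([0.5, -0.3, 0.0]) + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.01).fit(X_toy[:150], y_toy[:150])
X_te, y_te = X_toy[150:], y_toy[150:]

# R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y
ss_res = np.sum((y_te - model.predict(X_te)) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(np.isclose(r2_manual, model.score(X_te, y_te)))

# Differences between the reported scores, in percentage points
full_r2, slow_r2, fast_r2 = 60.32, 58.41, 61.32
print(round(slow_r2 - full_r2, 2), round(fast_r2 - full_r2, 2))  # -1.91 1.0
```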

#### Insight
By training a model on golf tournaments where the player averaged below the mean Club Head Speed, the model loses predictive power. That is, there is something about the performance of a golfer with a slower swing speed that is less explainable by these metrics than the performance of a golfer with a faster swing speed.
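
One way to probe this insight further is to score each model only on held-out rows from its own speed group. The sketch below shows the mechanics on synthetic data. The 115.51 mph threshold and the column names come from this notebook; the toy data, the `fit_and_score` helper, and the use of `LinearRegression` are assumptions made for illustration, so the printed numbers say nothing about the real models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

THRESHOLD = 115.51  # club head speed cutoff quoted above, in mph

# Synthetic stand-in for the tournament-level data (illustrative only)
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    'club_head_speed': rng.normal(115.5, 4.2, n),
    'greens_in_regulation_percentage': rng.normal(65, 7.5, n),
})
toy['sg:_total'] = (0.05 * (toy['greens_in_regulation_percentage'] - 65)
                    + rng.normal(0, 0.5, n))

feature_cols = ['club_head_speed', 'greens_in_regulation_percentage']
slow_mask = toy['club_head_speed'] < THRESHOLD

def fit_and_score(group):
    """Fit on the first 70% of a group's rows and return R^2 on the rest."""
    cut = int(len(group) * 0.7)
    train, test = group.iloc[:cut], group.iloc[cut:]
    model = LinearRegression().fit(train[feature_cols], train['sg:_total'])
    return model.score(test[feature_cols], test['sg:_total'])

r2_slow = fit_and_score(toy[slow_mask])
r2_fast = fit_and_score(toy[~slow_mask])
print(f'slow group R^2: {r2_slow:.3f}, fast group R^2: {r2_fast:.3f}')
```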

### Interpret Coefficients

#### Below Average Club Head Speed Players

In [5]:
slow = pd.DataFrame(slow_model.coef_, columns=['slow_coefs'])
slow['slow_abs_coefs'] = abs(slow_model.coef_)
slow.index = X_train.columns
slow = slow.sort_values('slow_abs_coefs', ascending=False).head(12)

# create standard deviation column (pandas aligns the result on the feature names)
slow['std._dev'] = df_pga[slow.index].std()
slow

| | slow_coefs | slow_abs_coefs | std._dev |
|---|---|---|---|
| `greens_in_regulation_percentage` | 0.547266 | 0.547266 | 7.533094 |
| `scrambling` | 0.401197 | 0.401197 | 10.716159 |
| `putting_average` | -0.353627 | 0.353627 | 0.078363 |
| `overall_putting_average` | -0.175324 | 0.175324 | 0.074604 |
| `going_for_the_green_-_hit_green_pct.` | 0.100728 | 0.100728 | 18.173266 |
| `going_for_the_green_-_birdie_or_better` | 0.072928 | 0.072928 | 20.238183 |
| `putting_from_-_10-25'` | 0.070319 | 0.070319 | 27.231458 |
| `club_head_speed` | 0.060038 | 0.060038 | 4.235433 |
| `3-putt_avoidance` | -0.048774 | 0.048774 | 2.034111 |
| `sand_save_percentage` | 0.043033 | 0.043033 | 22.914026 |


#### Above Average Club Head Speed Players

In [6]:
fast = pd.DataFrame(fast_model.coef_, columns=['fast_coefs'])
fast['fast_abs_coefs'] = abs(fast_model.coef_)
fast.index = X_train.columns
fast = fast.sort_values('fast_abs_coefs', ascending=False).head(12)

# create standard deviation column (pandas aligns the result on the feature names)
fast['std._dev'] = df_pga[fast.index].std()
fast

| | fast_coefs | fast_abs_coefs | std._dev |
|---|---|---|---|
| `greens_in_regulation_percentage` | 0.785991 | 0.785991 | 7.533094 |
| `overall_putting_average` | -0.497467 | 0.497467 | 0.074604 |
| `scrambling` | 0.471522 | 0.471522 | 10.716159 |
| `putting_average` | -0.339779 | 0.339779 | 0.078363 |
| `going_for_the_green_-_hit_green_pct.` | 0.18505 | 0.18505 | 18.173266 |
| `one-putt_percentage` | -0.108674 | 0.108674 | 6.335913 |
| `fairway_proximity` | 0.107791 | 0.107791 | 52.124938 |
| `putts_per_round` | -0.09894 | 0.09894 | 1.342851 |
| `going_for_the_green` | -0.094355 | 0.094355 | 19.28453 |
| `scrambling_from_10-30_yards` | -0.082979 | 0.082979 | 33.867141 |
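
To compare the two models feature by feature, the coefficients can be placed side by side. The sketch below hard-codes the five features that appear in both outputs above; the `comparison` frame and the `abs_ratio_fast_to_slow` column are names introduced here for illustration.

```python
import pandas as pd

# Coefficients copied from the two outputs above (features present in both)
slow_coefs = pd.Series({
    'greens_in_regulation_percentage': 0.547266,
    'scrambling': 0.401197,
    'putting_average': -0.353627,
    'overall_putting_average': -0.175324,
    'going_for_the_green_-_hit_green_pct.': 0.100728,
}, name='slow_coefs')
fast_coefs = pd.Series({
    'greens_in_regulation_percentage': 0.785991,
    'scrambling': 0.471522,
    'putting_average': -0.339779,
    'overall_putting_average': -0.497467,
    'going_for_the_green_-_hit_green_pct.': 0.185050,
}, name='fast_coefs')

# Align on feature name and compare magnitudes
comparison = pd.concat([slow_coefs, fast_coefs], axis=1)
comparison['abs_ratio_fast_to_slow'] = (comparison['fast_coefs'].abs()
                                        / comparison['slow_coefs'].abs())
print(comparison.round(3).sort_values('abs_ratio_fast_to_slow', ascending=False))
```

Among these shared features, `overall_putting_average` shows the largest relative gap (roughly 2.8 times larger in magnitude in the fast model), while `putting_average` is slightly smaller in the fast model than in the slow one.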


## Comparison of Slow Swing Golfers to Fast Swing Golfers

- **Size of coefficients:**
    - The first thing to notice in this comparison is the magnitude of the coefficients. The fast model produced larger coefficients than the slow model for most of the top features.
    - For example, a one-standard-deviation increase (0.0746 putts) in a player's overall putting average lowers a fast-swing player's strokes gained by 0.497, but lowers a slow-swing player's strokes gained by only 0.175.
    - The coefficients of the slow model suggest that Strokes Gained for a player with a slower swing speed is affected by more aspects of the game than for a player with a faster swing speed.
<br><br>
- **Club head speed**
    - The strength of club head speed as a predictor is similar for both models. A player's strokes gained increases by 0.06 - 0.07 for every 4.24 mph increase (one standard deviation).
<br><br>
- **Interesting takeaways**
    - A higher Going for the Green percentage hurts a golfer with a fast swing speed. For every one-standard-deviation increase in Going for the Green (about 19 percentage points), a player can expect to lose 0.094 Strokes Gained. This suggests that there is a sweet spot for going for the green percentage, as mentioned in the EDA; there are times when a player should not go for the green.
    <br><br>
    - Putting features hurt golfers with a fast swing speed much more than golfers with a slow swing speed. Overall putting average is a stronger predictor of Strokes Gained than Scrambling for players with a faster swing speed. Keeping their overall putting average low is key to their success.
    <br><br>
    - Important putts from 6 feet, 10 feet, and 10-25 feet are stronger predictors for slower swing players. A likely reason is that these players have to make up for the strokes they lose before getting to the green.
    <br><br>
    - For players with a slow swing speed, the contributions to total strokes gained are spread more evenly across the metrics. This is consistent with their Total Strokes Gained being more difficult to explain.
    <br><br>
    - For players with a fast swing speed, it is clear that their success comes from getting to the green in regulation, getting close to the hole off the fairway, and getting up and down around the green.
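
Because the features were standardized, each coefficient is the change in Strokes Gained per one standard deviation of the feature. Dividing by the standard deviation gives an approximate effect per raw unit. The sketch below uses values copied from the outputs above; `per_unit_effect` is a hypothetical helper introduced for this example. The conversion is approximate because the `std._dev` column comes from pandas' sample standard deviation over the full dataset, while `StandardScaler` was fit on the training split with the population standard deviation.

```python
def per_unit_effect(coef, std_dev):
    """Approximate change in strokes gained per one raw unit of the feature."""
    return coef / std_dev

# Slow model: club head speed (std. dev. in mph)
print(round(per_unit_effect(0.060038, 4.235433), 4))        # 0.0142 per mph

# Fast model: going for the green (std. dev. in percentage points)
print(round(per_unit_effect(-0.094355, 19.28453), 4))       # -0.0049 per point

# Fast model: overall putting average, per 0.01 putts
print(round(per_unit_effect(-0.497467, 0.074604) * 0.01, 4))  # -0.0667
```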

## Conclusion

In the impossible game of golf, especially at a competitive level, players are always looking for ways to improve. According to the models, one thing that makes the sport easier is increasing Club Head Speed. It allows for more GIR and less backlash from missing those mid-range putts. With a slower swing speed, a player has to be excellent at every aspect of the game. The results of this study underscore the difficulty of what golfers are doing, especially those with a slower Club Head Speed. One important thing to note is that every golfer included in this study is among the best in the world. With the influx of powerful swings coming to the PGA in recent years, the ability of certain golfers with slow swing speeds to keep competing is not explained by the numbers.

<br><br>
Increasing your club head speed as a golfer is difficult because there are physical limits to how much it can improve. So how do you improve Strokes Gained?
- Avoiding 3-Putts
- Going for the Green Successfully
- Sand Saves
- Clutch Putting