# Punt Commons
I wanted to take on the challenge of understanding punts better. To be honest, I did not know anything about American football before this challenge, so I first had to read and watch a lot to understand what happens on the field. As you will see in the following sections, I am probably still missing some domain knowledge, and I would welcome corrections to any of my interpretations.

In this notebook, I build a model with a weighted F1 score of 0.87 from a very small number of features by defining a custom success label. Then I try to understand why the model behaves the way it does and where it fails.

In [None]:
!pip install swifter pandarallel xgboost shap 

In [None]:
import pandas as pd
import numpy as np

import seaborn as sns 
import matplotlib.animation as anim
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
import ipywidgets as wg
from IPython.display import HTML

from pandarallel import pandarallel
import swifter

from sklearn.pipeline import Pipeline
from sklearn import preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split, TimeSeriesSplit, StratifiedKFold, cross_val_score, GridSearchCV
from sklearn.metrics import classification_report, accuracy_score, f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

from xgboost.sklearn import XGBClassifier
from xgboost import plot_importance

import shap

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.min_rows', 100)
pd.set_option('display.max_rows', 200)
plt.style.use('ggplot')
pandarallel.initialize(progress_bar=False)

In [None]:
games = pd.read_csv('../input/nfl-big-data-bowl-2022/games.csv')
players = pd.read_csv('../input/nfl-big-data-bowl-2022/players.csv')
plays = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')

## Tracking data. Dtype tricks to save some memory
# Read each season once and concatenate; DataFrame.append was removed in pandas 2.0
tracking = pd.concat(
    [
        pd.read_csv('../input/nfl-big-data-bowl-2022/tracking' + str(year) + '.csv')
        for year in range(2018, 2021)
    ],
    ignore_index=True,
)
#tracking = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2018.csv')
tracking.fillna({"nflId":-1, "jerseyNumber":-1},inplace=True)
tracking = tracking.astype({'nflId':np.int32, 'jerseyNumber':np.int32})
float_cols = tracking.select_dtypes(include=["float64"]).columns.to_list()
tracking[float_cols] = tracking[float_cols].astype(np.float16)

pff = pd.read_csv("../input/nfl-big-data-bowl-2022/PFFScoutingData.csv")
plays["gameClockMinute"]= plays.apply(lambda row: int(row["gameClock"].split(":")[0]), axis=1)

plays = plays.merge(games, on=["gameId"],how="left",validate="m:1")
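
The `float16` downcast in the cell above trades precision for memory. As a rough illustration, the sketch below compares `float64`, `float32` and `float16` on a few made-up coordinates (these values are assumptions for the example, not taken from the tracking data). It suggests that `float16` rounds positions near the far end of the field to the nearest 1/16 of a yard, which may or may not matter for a given analysis.

```python
import numpy as np

# Toy coordinates in yards, similar in scale to the tracking x values (0-120).
x64 = np.array([0.53, 23.71, 100.03, 119.97], dtype=np.float64)

x16 = x64.astype(np.float16)
x32 = x64.astype(np.float32)

# float16 needs a quarter of the memory of float64 ...
print("bytes:", x64.nbytes, x32.nbytes, x16.nbytes)

# ... but between 64 and 128 it can only represent multiples of 0.0625.
err16 = np.abs(x16.astype(np.float64) - x64).max()
err32 = np.abs(x32.astype(np.float64) - x64).max()
print("max rounding error float16:", err16)
print("max rounding error float32:", err32)
```

If the rounding error is a concern, `float32` would be a middle ground at half the memory of `float64`.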

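Standardize the tracking coordinates so that every play runs toward the left: for plays recorded with `playDirection == "right"`, mirror `x` along the field length (120 yards) and `y` across the field width (160/3 yards).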

In [None]:
tracking.loc[tracking["playDirection"] == "right", "x"] = 120-tracking.loc[tracking["playDirection"] == "right", "x"]
tracking.loc[tracking["playDirection"] == "right", "y"] = 160.0/3-tracking.loc[tracking["playDirection"] == "right", "y"]
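
To make the mirroring concrete, here is a minimal sketch on a toy frame. The three rows and the constant names `FIELD_LENGTH` and `FIELD_WIDTH` are made up for illustration; only the arithmetic mirrors the cell above.

```python
import pandas as pd

FIELD_LENGTH = 120.0      # yards, including both end zones
FIELD_WIDTH = 160.0 / 3   # yards

# Toy rows: one play already going left, two going right.
toy = pd.DataFrame({
    "playDirection": ["left", "right", "right"],
    "x": [30.0, 30.0, 110.0],
    "y": [10.0, 10.0, 53.0],
})

flip = toy["playDirection"] == "right"
standardized = toy.copy()
# Mirror only the plays that run to the right.
standardized.loc[flip, "x"] = FIELD_LENGTH - toy.loc[flip, "x"]
standardized.loc[flip, "y"] = FIELD_WIDTH - toy.loc[flip, "y"]
print(standardized)
```

Rows with `playDirection == "left"` are left untouched, so after this step every play can be read in the same orientation.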

In [None]:
fig = px.pie(plays.query("specialTeamsPlayType=='Punt'"), names="specialTeamsResult")
fig.show()

In [None]:
end_plays = plays.merge(
    plays.groupby("gameId").playId.max().reset_index(), 
    on=["gameId", "playId"], 
    how = "right")[["gameId","playId","preSnapHomeScore","preSnapVisitorScore","homeTeamAbbr","visitorTeamAbbr"]]

## Features for Each Punt
The team statistics stay unchanged during a game, but the following features change with each punt:
+ [x] Mean of Y and count for Vises on the left and right side
+ [x] Mean of Y and count for Gunners on the left and right side
+ [x] X and Y of football
+ [x] X and Y position of the punter and the returner
+ [x] Quarter and minute (Available on plays)
+ [x] Punt Landed
+ [x] Punt Received

In [None]:
punt_plays = plays.query("specialTeamsPlayType =='Punt'").copy()
punt_plays

In [None]:
punt_plays["touchdown"] = punt_plays.playDescription.apply(lambda descr: False if "TOUCHDOWN NULLIFIED" in descr else "TOUCHDOWN" in descr )
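
The same flag can also be computed without a row-wise lambda. The sketch below uses vectorized string matching on a few invented play descriptions (the strings are hypothetical, not real rows from `plays.csv`).

```python
import pandas as pd

# Hypothetical play descriptions, for illustration only.
descriptions = pd.Series([
    "J.Doe punts 45 yards, returned for 70 yards, TOUCHDOWN.",
    "J.Doe punts 50 yards, TOUCHDOWN NULLIFIED by penalty.",
    "J.Doe punts 40 yards, fair catch.",
])

has_td = descriptions.str.contains("TOUCHDOWN", regex=False)
nullified = descriptions.str.contains("TOUCHDOWN NULLIFIED", regex=False)

# A touchdown only counts when it was not nullified.
touchdown = has_td & ~nullified
print(touchdown.tolist())
```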

## Create Labels
First we need the starting yard line of the possessing team. However, `yardlineNumber` is not always counted from the possessing team's side of the field.

In [None]:
(punt_plays.possessionTeam == punt_plays.yardlineSide).value_counts()

So a new column, `poss_yardlineNumber`, is created to hold the yard line counted from the possessing team's side.
When `yardlineSide` is not the possessing team, the yard line is counted from the other end, i.e. `100 - yardlineNumber`.

In addition, the `poss_yardlineNumber_toGo` column holds the first-down line (`poss_yardlineNumber + yardsToGo`) counted from the punting team's side.

In [None]:
punt_plays["poss_yardlineNumber"] = punt_plays.yardlineNumber
punt_plays.loc[punt_plays.possessionTeam != punt_plays.yardlineSide, ["poss_yardlineNumber"]] = 100 - punt_plays.loc[punt_plays.possessionTeam != punt_plays.yardlineSide, ["poss_yardlineNumber"]]
punt_plays["poss_yardlineNumber_toGo"] = punt_plays.poss_yardlineNumber + punt_plays.yardsToGo
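
As a small worked example of this conversion, the sketch below applies the same rule to three made-up plays (team abbreviations and numbers are assumptions for illustration).

```python
import pandas as pd

toy = pd.DataFrame({
    "possessionTeam": ["BAL", "BAL", "PIT"],
    "yardlineSide":   ["BAL", "PIT", "PIT"],
    "yardlineNumber": [30, 40, 20],
    "yardsToGo":      [10, 5, 10],
})

own_side = toy["possessionTeam"] == toy["yardlineSide"]

# Keep the number on the team's own side, otherwise count from the other end.
toy["poss_yardlineNumber"] = toy["yardlineNumber"].where(
    own_side, 100 - toy["yardlineNumber"]
)
toy["poss_yardlineNumber_toGo"] = toy["poss_yardlineNumber"] + toy["yardsToGo"]
print(toy)
```

The second play starts on the opponent's 40, so it sits 60 yards from the punting team's own goal line.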

Now let us calculate the yard line where the play ends, counted from the receiving team's side.

In [None]:
punt_plays["left_to_end"] = 100-(punt_plays.poss_yardlineNumber + punt_plays.playResult)

Below, we see that:
- Almost all touchbacks end at the 20-yard line, which makes sense because the ball is placed there after it crosses the goal line.
- On average, returns end between the 30- and 40-yard lines, but we don't know whether this is due to the punt returners' abilities or because the punters aimed at (or near) that area.
- Fair catch calls drop sharply inside the receiving team's 10-yard line.
- The ball is downed more often in that same zone. This is effectively the plot of punt returners' unfulfilled hopes.
- Muffed punts are spread over a large area on the returning team's side, with no practical difference in counts. Considering that a muff is a momentary slip of the ball from the hands, a white-noise-like distribution makes sense.

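When it comes to labeling (with `left_to_end` counted from the receiving team's goal line):
- If the play ends inside the receiving team's 20-yard line (`left_to_end < 20`), it is **good progress**.
- If it ends at or beyond the 20-yard line, it is **bad progress**.
- When the result is downed or muffed (and the first-down line `poss_yardlineNumber_toGo` is reached), or it is a touchdown with `left_to_end` equal to 0, it is a **good result**.
- When `left_to_end` is 100 and it is a touchdown, or the play ends behind its starting yard line, it is a **bad result**.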

In [None]:
plt.figure(figsize = (15,8))
sns.histplot(
    x = "left_to_end" 
    ,hue = "specialTeamsResult"
    ,multiple = "stack"
    ,data = punt_plays
).set(title="Special Teams Result Distribution")
plt.show()

To check that the `touchdown` and `left_to_end` information is correct, we look at the plot below.
Everything looks as expected: whenever we have a touchdown, `left_to_end` is either 0 or 100.

In [None]:
plt.figure(figsize = (15,8))
sns.histplot(
    x = "left_to_end" 
    ,hue = "specialTeamsResult"
    ,multiple = "stack"
    ,bins=100
    ,data = punt_plays.query("touchdown==True")
).set(title="Special Teams Result Distribution")
plt.show()


In [None]:
def label_it(play_row):
    if (play_row.left_to_end == 100 and play_row.touchdown) or 100-play_row.left_to_end < play_row.poss_yardlineNumber:
        return "bad result"
    
    if (play_row.left_to_end == 0 and play_row.touchdown) or (play_row.specialTeamsResult in ["Muffed", "Downed"] and 100-play_row.left_to_end >=play_row.poss_yardlineNumber_toGo):
        return "good result"
    
    if  play_row.left_to_end < 20:
        return "good progress"
    else:
        return "bad progress"

In [None]:
punt_plays["punt_label"] = punt_plays.parallel_apply(label_it, axis=1)
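
For reference, the same rules can be written in vectorized form. This is only a sketch of an alternative, assuming `np.select` (which picks the first matching condition, like the `if` chain in `label_it`); the five toy rows are made up to hit each branch.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "left_to_end":              [25, 10, 100, 15, 0],
    "touchdown":                [False, False, True, False, True],
    "specialTeamsResult":       ["Return", "Fair Catch", "Return", "Downed", "Return"],
    "poss_yardlineNumber":      [30, 40, 35, 45, 40],
    "poss_yardlineNumber_toGo": [40, 45, 45, 50, 48],
})

# Final yard line counted from the punting team's side.
end_line = 100 - toy["left_to_end"]

conditions = [
    # bad result: touchdown against the punting team, or a net loss of yards
    ((toy["left_to_end"] == 100) & toy["touchdown"])
    | (end_line < toy["poss_yardlineNumber"]),
    # good result: punting-team touchdown, or downed/muffed past the first-down line
    ((toy["left_to_end"] == 0) & toy["touchdown"])
    | (toy["specialTeamsResult"].isin(["Muffed", "Downed"])
       & (end_line >= toy["poss_yardlineNumber_toGo"])),
    # good progress: play ends inside the receiving team's 20
    toy["left_to_end"] < 20,
]
choices = ["bad result", "good result", "good progress"]

toy["punt_label"] = np.select(conditions, choices, default="bad progress")
print(toy[["left_to_end", "specialTeamsResult", "punt_label"]])
```

On the full data this would avoid the per-row Python call, although `parallel_apply` as used above is perfectly adequate for a few thousand punts.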

In [None]:
plt.figure(figsize = (15,8))
sns.histplot(
    x = "left_to_end" 
    ,hue = "punt_label"
    ,multiple = "stack"
    ,data = punt_plays
).set(title="Punt Label Distribution")
plt.show()

In [None]:
plt.figure(figsize = (15,8))
sns.countplot(x="punt_label", data=punt_plays)
plt.show()

Since *good result* and *bad result* do not have many representatives, let us just blend them into *good progress* and *bad progress*, respectively.

In [None]:
# Merge the rare "result" labels into the corresponding "progress" labels
punt_plays["punt_label"] = punt_plays["punt_label"].replace(
    {"bad result": "bad progress", "good result": "good progress"}
)

Final label distribution

In [None]:
plt.figure(figsize = (15,8))
sns.countplot(x="punt_label", data=punt_plays)
plt.show()

Extract punt statistics from each row of punt plays

In [None]:
def punt_stats(play_row):
    attacker,returner,returnerTeam = ("home","away", play_row.visitorTeamAbbr) if play_row.possessionTeam == play_row.homeTeamAbbr else ("away", "home",play_row.homeTeamAbbr)
    pff_row = pff.query(
        "gameId=={0} and playId=={1}".format(
            play_row.gameId,
            play_row.playId
        )
    ).iloc[0]
    
    the_play =  tracking.query(
        "gameId=={0} and playId=={1}".format(
            play_row.gameId,
            play_row.playId
        )
    )
    
    play_first_frame = the_play.query("frameId==1")
    
    football = play_first_frame.query("team=='football'").iloc[0]
    
    # Gunners Features
    if not pd.isna(pff_row.gunners):
        gunners_jersey = [int(full_jersey.strip().split(" ")[1]) for full_jersey in pff_row.gunners.split(";")]
        gunners = play_first_frame[play_first_frame.jerseyNumber.isin(gunners_jersey)].query("team=='{0}'".format(attacker))
        left_gunners = gunners.query("y <= {0}".format(football.y))
        right_gunners = gunners.query("y > {0}".format(football.y))
        
        if left_gunners.shape[0]>0:
            left_gunners_y = left_gunners.y.mean()
            num_left_gunners = left_gunners.shape[0]
        else:
            left_gunners_y=-25
            num_left_gunners = 0
        
        if right_gunners.shape[0]>0:
            right_gunners_y = right_gunners.y.mean()
            num_right_gunners = right_gunners.shape[0]
        else:
            right_gunners_y=-25
            num_right_gunners = 0
        
    else:
        left_gunners_y=-25
        num_left_gunners = 0
        right_gunners_y=-25
        num_right_gunners = 0 
    
    # Vises Features
    if not pd.isna(pff_row.vises):
        vises_jersey = [int(full_jersey.strip().split(" ")[1]) for full_jersey in pff_row.vises.split(";")]
        vises = play_first_frame[play_first_frame.jerseyNumber.isin(vises_jersey)].query("team=='{0}'".format(returner))
        left_vises = vises.query("y <= {0}".format(football.y))
        right_vises = vises.query("y > {0}".format(football.y))
        
        if left_vises.shape[0]>0:
            left_vises_y = left_vises.y.mean()
            num_left_vises = left_vises.shape[0]
        else:
            left_vises_y=-25
            num_left_vises = 0
        
        if right_vises.shape[0]>0:
            right_vises_y = right_vises.y.mean()
            num_right_vises = right_vises.shape[0]
        else:
            right_vises_y=-25
            num_right_vises = 0
        
    else:
        left_vises_y=-25
        num_left_vises = 0
        right_vises_y=-25
        num_right_vises = 0
        
    
    # Punter
    #punter = play_first_frame[play_first_frame.y == play_first_frame.y.max()].iloc[0]
    
    # Punt returner
    #punt_returner = play_first_frame[play_first_frame.y == play_first_frame.y.min()].iloc[0]
    
    punt_received = 1 if the_play.query("event=='punt_received' or event=='fair_catch'").shape[0]>0 else 0
    
    punt_landed = 1 if the_play.query("event=='punt_land'").shape[0]>0 else 0
    
    
    # Get score difference : Punting Team - Returning Team
    score_diff = play_row.preSnapHomeScore - play_row.preSnapVisitorScore
    if attacker != "home":
        score_diff *= -1
    
    return [
        play_row.gameId 
        ,play_row.playId
        #,punter.x
        #,punter.y
        ,left_gunners_y
        ,num_left_gunners
        ,right_gunners_y
        ,num_right_gunners
        ,left_vises_y
        ,num_left_vises
        ,right_vises_y
        ,num_right_vises
        #,punt_returner.nflId
        #,punt_returner.x
        #,punt_returner.y
        ,punt_received
        ,punt_landed
        ,football.x
        ,football.y
        ,score_diff
        ,returnerTeam
    ]
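
The gunner and vise columns are parsed the same way inside `punt_stats`. The isolated sketch below shows just that step; the helper name `parse_jerseys` and the sample strings are hypothetical, and the `"TEAM NN; TEAM NN"` layout is inferred from the parsing code above rather than from the data documentation.

```python
def parse_jerseys(raw):
    """Turn a string like 'BAL 15; BAL 32' into a list of jersey numbers."""
    # Split players on ';', then take the number after the team abbreviation.
    return [int(item.strip().split(" ")[1]) for item in raw.split(";")]


print(parse_jerseys("BAL 15; BAL 32"))
print(parse_jerseys("PIT 7"))
```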

In [None]:
punt_feats = punt_plays.parallel_apply(punt_stats,axis=1,result_type ='expand')
punt_feats.columns=[
    "gameId"
    ,"playId"
    #,"punter_x"
    #,"punter_y"
    ,"left_gunners_y"
    ,"num_left_gunners"
    ,"right_gunners_y"
    ,"num_right_gunners"
    ,"left_vises_y"
    ,"num_left_vises"
    ,"right_vises_y"
    ,"num_right_vises"
    #,"punt_returner_x"
    #,"punt_returner_y"
    ,"punt_received"
    ,"punt_landed"
    ,"football_x"
    ,"football_y"
    ,"score_diff"
    ,"returnerTeam"
]

In [None]:
punt_feats

In [None]:
punt_data = punt_plays.reset_index()\
        .merge(punt_feats, how="inner", on=["gameId","playId"])\
        .merge(pff[["gameId", "playId", "hangTime"]])[
        [
            "index"
            #,"possessionTeam"
            #,"returnerTeam"
            #,"home_num_wins" 
            #,"home_num_draws"
            #,"visitor_num_wins"
            #,"visitor_num_draws"
            ,"hangTime"
            ,"left_gunners_y"
            ,"num_left_gunners"
            ,"right_gunners_y"
            ,"num_right_gunners"
            ,"left_vises_y"
            ,"right_vises_y"
            ,"punt_received"
            ,"punt_landed"
            #,"football_x"
            ,"football_y"
            ,"score_diff"
            ,"gameClockMinute"
            ,"kickLength"
            ,"yardsToGo"
            ,"poss_yardlineNumber"
            ,"punt_label"
        ]
    ]
punt_data.set_index("index", inplace=True)
punt_data["hangTime"] = punt_data.hangTime.fillna(0)
punt_data["kickLength"] = punt_data.kickLength.fillna(0.0000001)
punt_data["hang_per_length"] = punt_data.hangTime / punt_data.kickLength
punt_data["kick_end"] = punt_data.kickLength + punt_data.poss_yardlineNumber
# dropna() is not in-place, so assign the result before displaying it
punt_data = punt_data.dropna()
punt_data

To save some time in the next runs, save the `punt_data` to a file.

In [None]:
punt_data.to_csv("punt_data.csv")
#punt_data = pd.read_csv("punt_data.csv", index_col=0)

## Preprocessing
Because the models are XGBoost and random forest, feature scaling adds no value: tree-based models split on thresholds, so they are invariant to monotonic transformations of the features. The only preprocessing is encoding the labels.
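
A quick way to convince yourself of the scaling claim is the sketch below. It uses a single `DecisionTreeClassifier` on synthetic integer features (both the data and the choice of a single tree instead of a forest are assumptions made to keep the example small) and checks that min-max scaling leaves the predictions unchanged.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic yard-like features and a synthetic binary label.
X = rng.integers(0, 100, size=(300, 3)).astype(float)
y = (X[:, 0] + 0.5 * X[:, 1] > 80).astype(int)

X_scaled = MinMaxScaler().fit_transform(X)

# Same tree settings, once on raw and once on scaled features.
tree_raw = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_scaled, y)

same = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print("Identical predictions:", same)
```

The split thresholds differ numerically, but they partition the samples in the same way, which is why the scaler in the pipeline can stay commented out.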

In [None]:
label_pipeline = Pipeline(steps=[
  ("label_encoder", preprocessing.OrdinalEncoder())
])
#features_pipeline = Pipeline(steps=[
#    ("feature_scaler", preprocessing.MinMaxScaler())
#])

Divide the data to train and test sets, then run the pipelines

In [None]:
X = punt_data.drop(["punt_label"], axis=1)
y = punt_data[["punt_label"]]
X_train, _, y_train_raw, _ = train_test_split(X,y, shuffle=False, test_size=0.2)

y_train = label_pipeline.fit_transform(y_train_raw)

## Short EDA Before Training

In [None]:
train = X_train.merge(pd.get_dummies(y_train_raw["punt_label"], prefix="label"), left_index=True, right_index=True)

In [None]:
train.info()

In [None]:
train.describe()

In [None]:
plt.figure(figsize = (15,8))
sns.heatmap(train.corr())
plt.show()

## Models

In [None]:
cv_split  = StratifiedKFold(shuffle=True, random_state=10)
result_models=dict()

models = {    
    "random_forest":[
        RandomForestClassifier(n_jobs=4),
        {
            "max_depth":range(3,15,1)
            ,"min_samples_split":range(2,10,2)
        }
    ],
    "xgboost":[
        XGBClassifier(eval_metric="logloss", use_label_encoder=False, objective="binary:logistic"), {
            "max_depth":range(3,17,2)
            ,"min_child_weight":range(1,6,1)
            ,"eta" : np.linspace(0.1,0.2,2)
    }],
    
}

for model in models:
    gridcv = GridSearchCV(
        estimator=models[model][0], 
        scoring="f1_weighted", 
        param_grid=models[model][1], 
        cv=cv_split,
        refit=True
    )
    
    %time gridcv.fit(X_train, y_train[:,0])
    result_models[model] = gridcv.best_estimator_
    print("Model:",model)
    print("Best Params:")
    print(gridcv.best_params_)
    print("Best Score:")
    print(gridcv.best_score_)
    print("###########################################################################")

In [None]:
result_models["xgboost"].get_booster().feature_names = X.columns.to_list()
plot_importance(result_models["xgboost"])

In [None]:
label_pipeline["label_encoder"].categories_

SHAP for Random Forest model

In [None]:
rf_explainer = shap.Explainer(result_models["random_forest"], output_names=label_pipeline["label_encoder"].categories_)
rf_shap_values = rf_explainer(pd.DataFrame(X_train, columns=X.columns.to_list()))
rf_explanation = shap.Explanation(rf_shap_values[:, :, 1], data=X_train, feature_names=X.columns.to_list())
shap.plots.beeswarm(rf_explanation,max_display=20)

SHAP for XGBoost model

In [None]:
xgb_explainer = shap.Explainer(result_models["xgboost"], output_names=label_pipeline["label_encoder"].categories_)
xgb_shap_values = xgb_explainer(pd.DataFrame(X_train, columns=X.columns.to_list()))
shap.plots.beeswarm(xgb_shap_values, max_display=20)

## After a Simpler Model
Before explaining the model, I would like to simplify it, hopefully without losing any significant accuracy or F1 score. For this purpose I will focus on the random forest model: the SHAP values of the XGBoost model depend much more heavily on a single feature, **kick end**, while the random forest shows the same symptom to a smaller extent, so I think it is less prone to overfitting. In any case, the features with the smallest mean absolute SHAP values are mostly the same in both models, with minor differences.

Starting from the bottom of the graph, I will put every feature up to (but excluding) kick end through a simple feature selection process. Let's start with it for a final model!

Please note that when I ran this locally, the order was slightly different. It would be better to set a random state for reproducibility, but I did not change it since it does not affect the end result.
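
For readers who do want a stable ordering, the sketch below shows the usual approach of fixing `random_state` on the forest. The synthetic data from `make_classification` and the helper name `importances` are assumptions for this example only; the models in this notebook were trained without a fixed seed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, standing in for the punt features.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)


def importances(seed):
    """Fit a small forest with a fixed seed and return its feature importances."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X_demo, y_demo).feature_importances_


first_run = importances(10)
second_run = importances(10)

# With the same seed, repeated fits give the same importances and ranking.
same_seed = np.allclose(first_run, second_run)
print("Reproducible:", same_seed)
print("Ranking:", np.argsort(first_run)[::-1])
```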

In [None]:
features_to_select = [
    "num_right_gunners"
    ,"num_left_gunners"
    ,"gameClockMinute"
    ,"score_diff"
    ,"yardsToGo"
    ,"football_y"
    ,"right_gunners_y"
    ,"right_vises_y"
    ,"left_vises_y"
    ,"left_gunners_y"
    ,"hangTime"
    ,"kickLength"
    ,"hang_per_length"
    ,"punt_landed"
    ,"punt_received"
    ,"poss_yardlineNumber"
]
cv_split  = StratifiedKFold(shuffle=True, random_state=10)
model_results=[]

models = {    
    "random_forest":[
        RandomForestClassifier(n_jobs=4),
        {
            "max_depth":range(3,15,1)
            ,"min_samples_split":range(2,10,2)
        }
    ],
    "xgboost":[
        XGBClassifier(eval_metric="logloss", use_label_encoder=False, objective="binary:logistic"), {
            "max_depth":range(3,17,2)
            ,"min_child_weight":range(1,6,1)
            ,"eta" : np.linspace(0.1,0.2,2)
    }],
    
}
for n_drop in range(1,len(features_to_select)+1):
    X = punt_data.drop(
    [
        "punt_label",
        *features_to_select[:n_drop]
        
    ], axis=1
    )
    y = punt_data[["punt_label"]]
    X_train, _, y_train_raw, _ = train_test_split(X,y, shuffle=False, test_size=0.2)

    y_train = label_pipeline.fit_transform(y_train_raw)

    for model in models:

        gridcv = GridSearchCV(
            estimator=models[model][0], 
            scoring="f1_weighted", 
            param_grid=models[model][1], 
            cv=cv_split,
            refit=True
        )

        %time gridcv.fit(X_train, y_train[:,0])
        train_preds = gridcv.best_estimator_.predict(X_train)
        model_results.append(
            (
                model 
                ,n_drop
                ,gridcv
                ,gridcv.best_score_
            )
        )
        print("Model:",model, "Num to Drop:", n_drop)
        print("Best Params:")
        print(gridcv.best_params_)
        print("Best Score:")
        print(gridcv.best_score_)
        print("###########################################################################")

Let us have a closer look at the results.

In [None]:
model_res = pd.DataFrame(model_results, columns=["model", "features_dropped", "GridSearchObj", "cv_f1"])
model_scores = model_res.drop(["GridSearchObj"],axis=1)
model_scores

In [None]:
plt.figure(figsize=(14,10))
sns.lineplot(x="features_dropped",y="cv_f1", style="model", data=model_scores)
plt.show()

As the graph above shows, dropping the first 13 features barely changes the validation results. One obvious point is that with the `kick_end` feature alone, F1 is already close to 0.865; the additional features give us only about 0.03 F1.

The random forest model performs better in general, including in the 13-features-dropped setting, so I will continue with this model.

## Model Test Results
Let us now check how this model performs and makes its decisions on the test set!

In [None]:
n_drop=13
X = punt_data.drop(
    [
        "punt_label",
        *features_to_select[:n_drop]
        
    ], axis=1
    )
y = punt_data[["punt_label"]]
_, X_test, _, y_test_raw = train_test_split(X,y, shuffle=False, test_size=0.2)

y_test = label_pipeline.transform(y_test_raw)

In [None]:
final_model = model_res.query("features_dropped==13 and model=='random_forest'").iloc[0]["GridSearchObj"].best_estimator_
y_pred = final_model.predict(X_test)
print(classification_report(y_test, y_pred))

We have a similar score on the test set, awesome! Let us now check how these decisions are made with `shap`.
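
Since the comparison relies on the weighted F1 score, here is a small sketch of what that metric computes: the per-class F1 scores averaged with class supports as weights. The label arrays are made up for illustration and have nothing to do with the actual test predictions.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels: 0 = "bad progress", 1 = "good progress" (illustrative only).
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

per_class = f1_score(y_true_demo, y_pred_demo, average=None)
support = np.bincount(y_true_demo)

# Support-weighted mean of the per-class scores.
manual = np.sum(per_class * support) / support.sum()
weighted = f1_score(y_true_demo, y_pred_demo, average="weighted")

print("per-class F1:", per_class)
print("weighted F1:", weighted, "manual:", manual)
```

Because the majority class dominates the weights, a high weighted F1 can hide weaker performance on the minority label, so the per-class rows of `classification_report` are worth reading too.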

## Exploring the Model with SHAP

In [None]:
rf_explainer = shap.Explainer(final_model, output_names=label_pipeline["label_encoder"].categories_)
rf_shap_values = rf_explainer(X_test)
rf_explanation = shap.Explanation(rf_shap_values[:, :, 1], data=X_test, feature_names=X.columns.to_list())
shap.plots.beeswarm(rf_explanation)

### Kick End
It is clear that `kick_end` is the most important feature of the model. As a reminder, `kick_end` is the yard line where the ball comes down (whether received, landed, or out of bounds), counted from the punting team's end zone. In general, higher is better.

As seen in the graph, the higher the `kick_end`, the more likely a good result for the punting team, up to a point. Beyond that, the effect reverses. Let us now look at the distribution in a better graph.

In [None]:
#fig = plt.figure(figsize=(12,10))
shap.plots.scatter(rf_explanation[:,"kick_end"], show=False)
fig = plt.gcf() 
fig.set_size_inches(12, 8)
plt.show()

So the model is very realistic: if the punter does not punt the ball inside the receiving team's 20-yard line, this works against the punting team. Given the definition of *good progress*, this makes sense.

Between 80 and 100, the SHAP values increase at a constant rate. But very close to 100 and at 100, the SHAP values decrease, because the probability of a touchback increases and the punt returner has more room to advance the ball.

###  End of Punt Combined with the Start
Let us check the effect of `kick_end` together with `poss_yardlineNumber`.

In [None]:
shap.plots.scatter(rf_explanation[:,"kick_end"], color=rf_explanation[:,"poss_yardlineNumber"], show=False, cmap=plt.get_cmap("cool"))
fig = plt.gcf() 
fig.set_size_inches(12, 8)
plt.show()

On the right part of the graph, we can see that SHAP value is not really dependent on the start of the punt. 

The story is different where `kick_end < 80`. The lower the `poss_yardlineNumber`, the less negative the SHAP value. I believe this is caused by the hang time: it is higher when the punt starts closer to the punting team's end zone, which gives more chance to tackle the punt returner. The SHAP value becomes very negative with higher `poss_yardlineNumber`, which could be regarded as a failure since the starting point is close to the kick end. I would expect the kick to be more precise, and closer to the end zone, when punted from a closer point. In addition, the hang time is lower when the punt is kicked from a point closer to the receiving team, leaving a shorter time window for the gunners to reach the punt returner before the ball comes back to the ground. According to the model, the punting team is better off punting the ball as far as possible, of course short of the 100th yard line :) This becomes even more critical when punting from farther away.

### Punt Landed and Received

In [None]:
fig_main, axes = plt.subplots(ncols=2)
shap.plots.scatter(rf_explanation[:,"kick_end"], color=rf_explanation[:,"punt_landed"], show=False, ax = axes[0], cmap=plt.get_cmap("cool"))
shap.plots.scatter(rf_explanation[:,"kick_end"], color=rf_explanation[:,"punt_received"], show=False, ax = axes[1], cmap=plt.get_cmap("cool"))
fig_main.set_size_inches(16, 6)
fig_main.show()

For the part between 80 and 100, the top line, where the punt is neither received nor landed, would be punts that went out of bounds. This is a direct gain with no other possible outcome, which is why we see this small line with higher SHAP values. For the other two groups, where the punt is either received or landed, punt received has a lower SHAP value than punt landed near `kick_end = 80`. As `kick_end` increases, the SHAP value for `punt_received=True` increases more. So the model prefers that the punt returner let the ball bounce within the last 10 yards, giving a slightly lower SHAP value toward the *good result*.

Between 0 and 55 on `kick_end`, the lower line is mostly neither received nor landed. I believe the model treats these as unsuccessful executions, since the SHAP value is very low. My guess is that they are short punts, blocked punts, or even fake punts.

For the cluster between 40 and 80, it is clear that punt landed is preferable to punt received for the punting team, as the negative SHAP value is closer to 0. The lower cluster, where the punt is received, could be explained by `poss_yardlineNumber`: if you give the gunners more time, your chances increase. It is also explained by `kick_end` itself, since closer to the 80-yard line is of course better.

### Focus on the Start of Punt

In [None]:
plt.figure(figsize=(14,8))
norm = plt.Normalize(rf_explanation[:,"kick_end"].data.min(), rf_explanation[:,"kick_end"].data.max())
sm = plt.cm.ScalarMappable(cmap=plt.get_cmap("tab20"), norm=norm)
sm.set_array([])
ax = sns.scatterplot(x=rf_explanation[:,"poss_yardlineNumber"].data, y=rf_explanation[:,"poss_yardlineNumber"].values, hue=rf_explanation[:,"kick_end"].data, palette=plt.get_cmap("tab20"))
ax.get_legend().remove()
ax.figure.colorbar(sm, label="kick_end")
ax.set_ylabel('SHAP Value for poss_yardlineNumber')
ax.set_xlabel('poss_yardlineNumber')

plt.show()


fig_main, axes = plt.subplots(ncols=2)
shap.plots.scatter(rf_explanation[:,"poss_yardlineNumber"], color=rf_explanation[:,"punt_landed"], show=False, ax = axes[0], cmap=plt.get_cmap("cool"))
shap.plots.scatter(rf_explanation[:,"poss_yardlineNumber"], color=rf_explanation[:,"punt_received"], show=False, ax = axes[1], cmap=plt.get_cmap("cool"))
fig_main.set_size_inches(16, 6)
fig_main.show()

I am actually fascinated that you can draw a dinosaur with the SHAP values of a random forest model. What really grabbed my attention here is that when `kick_end` is between 80 and 90, the SHAP value from `poss_yardlineNumber` is even lower than the tail of the dinosaur (where `kick_end` is mostly below 80). For the values between 80 and 90, we have an almost linear function, or tanh to be more precise. The closer the starting point is to the receiving team's end zone, the better the chances. A `poss_yardlineNumber` of around 40 is the turning point from negative to positive SHAP values. Maybe this is related to the fact that these punts are mostly received, not landed. I might be missing some field knowledge, so I would appreciate your help in the comments.

### Another Check on the Interpretations
I tried to interpret the model, but doing so assumes that all the trends it learned are correct. Now I would like to make one more visualization to see where the model failed.

In [None]:
X_check = X_test.copy()
X_check["result_correct"] = (y_pred == y_test.squeeze())

In [None]:
#plt.figure(figsize=(14,8))
fig, axes = plt.subplots(ncols=2,figsize=(16,6))
sns.scatterplot(x=rf_explanation[:,"kick_end"].data, y=rf_explanation[:,"kick_end"].values, hue=X_check["result_correct"], ax=axes[0])
axes[0].set_ylabel('SHAP Value for kick_end')
axes[0].set_xlabel('kick_end')

shap.plots.scatter(rf_explanation[:,"kick_end"], color=rf_explanation[:,"poss_yardlineNumber"], show=False, cmap=plt.get_cmap("cool"), ax=axes[1])

plt.show()

fig_landed, axes_l = plt.subplots(ncols=2)
shap.plots.scatter(rf_explanation[:,"kick_end"], color=rf_explanation[:,"punt_landed"], show=False, ax = axes_l[0], cmap=plt.get_cmap("cool"))
shap.plots.scatter(rf_explanation[:,"kick_end"], color=rf_explanation[:,"punt_received"], show=False, ax = axes_l[1], cmap=plt.get_cmap("cool"))
fig_landed.set_size_inches(16, 6)
fig_landed.show()

A clear pattern appears in the cluster between 40 and 80 on `kick_end`, where I said that landed is preferred to received: a big portion of the mispredictions are in this section. Therefore that interpretation is not quite correct and should be improved with more or better data or features.

Another suspicious part is between 80 and 100: it looks like mostly the received punts are mispredicted, even though it does not look as bad as the previous part. For this one, it is better to look at the numbers.

In [None]:
X_check.query("kick_end>80").groupby(["punt_received", "result_correct"]).count()

In [None]:
92/(92+290)*100

The suspicion is correct: beyond the 80th yard line, around 24% of the predictions are wrong when the punt is received. However, this does not change the finding that the model prefers the punt returner to let the ball bounce. I believe the model does not account for the punt returner's options or capabilities. More features, such as the number of gunners close to the ball when it is received or player features such as the speed and acceleration of the punt returner, could help here.
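
The percentage above was computed by hand as `92/(92+290)*100`. As a sketch of how the same kind of figure could be read directly from the frame, the example below computes the misprediction rate per group on a made-up stand-in for `X_check` (the seven rows are invented, so the numbers differ from the real ones).

```python
import pandas as pd

# Toy stand-in for X_check with only the columns needed here.
toy_check = pd.DataFrame({
    "kick_end":       [85, 88, 92, 95, 83, 90, 70],
    "punt_received":  [1, 1, 1, 1, 0, 0, 1],
    "result_correct": [True, False, True, True, True, True, False],
})

error_rate = (
    toy_check.query("kick_end > 80")
    .groupby("punt_received")["result_correct"]
    # Share of wrong predictions per group, in percent.
    .agg(n_plays="size", pct_wrong=lambda s: 100 * (1 - s.mean()))
)
print(error_rate)
```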

## Future Work
I believe the success label could be made more sophisticated. With more data, I would have liked to keep *good result* and *bad result* as separate labels to get more insight.
More features could also improve the model. As mentioned, there is a clear pattern in the model's mispredictions, which good features could eliminate.

Thank you for reading :)

If you want to reach me, here it is https://www.linkedin.com/in/akyazi/