# Fatigue Factors and Sprint Speed on Deep Kickoffs (2020 Season)
<img src="https://c.tenor.com/gAIIE5tIz4MAAAAd/superbowllii-superbowl.gif" style="width:500px;">

## During each NFL kickoff (at the beginning of the half or after a score), each player on the kickoff coverage unit sprints down the field to tackle the returner. Dhriti Yandapally's [notebook](https://www.kaggle.com/dhritiyandapally/speed-on-kickoff-plays-across-surfaces-python) determined each player's max speed during the initial sprint (before they are blocked or obstructed by the return team). Building on that work, we perform some basic EDA on the impact of game duration and cumulative kickoffs on a player's max speed on the kickoff coverage unit. We explore differences across positions and suggest further statistical analysis that could be performed with the given data.

<a id="ReadData"></a>
# 1. Read Data

In [None]:
import numpy as np 
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

In [None]:
# Function to reduce a DataFrame's memory usage by downcasting numeric columns (from Baek Kyun Shin)
def downcast(df, verbose=True):
    start_mem = df.memory_usage().sum() / 1024**2
    for col in df.columns:
        dtype_name = df[col].dtype.name
        if dtype_name == 'object':
            pass
        elif dtype_name == 'bool':
            df[col] = df[col].astype('int8')
        elif dtype_name.startswith('int') or (df[col].round() == df[col]).all():
            df[col] = pd.to_numeric(df[col], downcast='integer')
        else:
            df[col] = pd.to_numeric(df[col], downcast='float')
    end_mem = df.memory_usage().sum() / 1024**2
    if verbose:
        print('{:.1f}% Compressed'.format(100 * (start_mem - end_mem) / start_mem))
    
    return df

In [None]:
#includes info on games
df_games = pd.read_csv("../input/nfl-big-data-bowl-2022/games.csv")
df_games = downcast(df_games);

#includes play-by-play info on specific plays
df_plays = pd.read_csv("../input/nfl-big-data-bowl-2022/plays.csv")
df_plays = downcast(df_plays);

#includes background info for players
df_players = pd.read_csv("../input/nfl-big-data-bowl-2022/players.csv")

#Reading tracking data (2020 only for this case)
df_tracking = pd.read_csv("../input/nfl-big-data-bowl-2022/tracking2020.csv")
df_tracking = downcast(df_tracking);

#includes scouting info on specific plays
df_PFFScouting = pd.read_csv("../input/nfl-big-data-bowl-2022/PFFScoutingData.csv")

#loading data from Lee Sharpe's public GitHub repository. It includes info on field surface.
df_leeSharpeGames = pd.read_csv("https://raw.githubusercontent.com/nflverse/nfldata/master/data/games.csv")

In [None]:
df_tracking.head()

# 2. Compute the elapsed (actual) time as a potential factor in fatigue

In [None]:
#convert to datetime
df_tracking['time'] = pd.to_datetime(df_tracking['time'])

#find the start of the game by actual time (earliest tracking timestamp in each game)
df_gameStartTimes = df_tracking.groupby('gameId', as_index=False)['time'].min()
df_gameStartTimes.rename(columns={'time':'gameStartTime'}, inplace=True)

#add the game start time to the tracking data
df_tracking = pd.merge(left=df_tracking, right=df_gameStartTimes, how='left', on=['gameId'])

#compute the elapsed actual time in whole minutes
#(.dt.total_seconds() // 60 also works in newer pandas, where astype('timedelta64[m]') is not supported)
df_tracking['elapsedTime'] = (df_tracking['time'] - df_tracking['gameStartTime']).dt.total_seconds() // 60

#create a table with the elapsed time and wall-clock time at the start of each play in each game
df_playElapseTime = (df_tracking.groupby(['gameId','playId'], as_index=False)
                     .agg(elapsedTime=('elapsedTime', 'min'), time=('time', 'min')))

print(df_playElapseTime.head())
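To make the elapsed-time logic easy to check, here is a minimal sketch on a few hypothetical timestamps. Like the cell above, it assumes a game's earliest tracking timestamp stands in for its start (tracking data only covers plays, so this is the first tracked play rather than an official start time). `play_elapsed_minutes` and the toy values are illustrative, not part of the original notebook.

```python
import pandas as pd

# Toy tracking rows (hypothetical values): two plays in one game, two frames each
toy = pd.DataFrame({
    "gameId": [1, 1, 1, 1],
    "playId": [10, 10, 55, 55],
    "time": pd.to_datetime([
        "2020-09-13T17:05:00.0", "2020-09-13T17:05:00.1",
        "2020-09-13T17:47:30.0", "2020-09-13T17:47:30.1",
    ]),
})

def play_elapsed_minutes(tracking: pd.DataFrame) -> pd.DataFrame:
    """Whole minutes from each game's first tracking timestamp to the start of each play."""
    out = tracking.copy()
    # transform('min') broadcasts each game's earliest timestamp back onto every row
    game_start = out.groupby("gameId")["time"].transform("min")
    out["elapsedTime"] = (out["time"] - game_start).dt.total_seconds() // 60
    # one row per play: the earliest frame marks the start of the play
    return (out.groupby(["gameId", "playId"], as_index=False)
               .agg(elapsedTime=("elapsedTime", "min"), time=("time", "min")))

result = play_elapsed_minutes(toy)
print(result)
```

Using `transform` avoids the separate start-time table and merge; on the full tracking data the merge-based version above is equivalent.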

# 3. Clean Data (from Dhriti Yandapally's [notebook](https://www.kaggle.com/dhritiyandapally/speed-on-kickoff-plays-across-surfaces-python))

### To start the cleaning process, we must first use PFF scouting data to identify special teams safeties who are not actively advancing downfield. To be able to merge this information, we will use tracking data to get a map of a player's jersey number to their `nflId` in each game.
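As a sketch of the same mapping idea, assuming `specialTeamsSafeties` holds entries like `"NE 07; NE 32"` separated by `"; "` (the delimiter the split in this section uses), splitting into a list and calling `explode` handles any number of safeties without fixing six columns in advance. This is a swapped-in technique, not the original notebook's split-and-melt approach; `safeties_to_nflids` and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical PFF rows: one "TEAM NN" entry per safety, separated by "; "
pff = pd.DataFrame({
    "gameId": [1, 1],
    "playId": [10, 55],
    "specialTeamsSafeties": ["NE 07; NE 32", None],
})
# Hypothetical jersey map built from tracking data: one nflId per team/jersey per game
jersey_map = pd.DataFrame({
    "gameId": [1, 1, 1],
    "teamJersey": ["NE 07", "NE 32", "NE 50"],
    "nflId": [111, 222, 333],
})

def safeties_to_nflids(pff: pd.DataFrame, jersey_map: pd.DataFrame) -> pd.DataFrame:
    """One row per (gameId, playId, nflId) for every listed special teams safety."""
    long = (pff.assign(teamJersey=pff["specialTeamsSafeties"].str.split("; "))
               .explode("teamJersey")            # one jersey per row
               .dropna(subset=["teamJersey"]))   # plays with no listed safeties drop out
    merged = long.merge(jersey_map, on=["gameId", "teamJersey"])
    return merged[["gameId", "playId", "nflId"]].sort_values(["gameId", "playId", "nflId"])

print(safeties_to_nflids(pff, jersey_map))
```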

In [None]:
#using df_tracking to merge to jersey numbers

#selecting variables of interest & dropping duplicates - jersey # is constant throughout game
df_jerseyMap = df_tracking.drop_duplicates(subset = ["gameId", "team", "jerseyNumber", "nflId"])

#joining to games
df_jerseyMap = pd.merge(df_jerseyMap, df_games, left_on=['gameId'], right_on =['gameId'])

#getting name of team: home rows get the home abbreviation, all others the visitor abbreviation
df_jerseyMap['team'] = np.where(df_jerseyMap['team'] == 'home', df_jerseyMap['homeTeamAbbr'], df_jerseyMap['visitorTeamAbbr'])

#adjusting jersey number so that it includes a leading 0 when < 10 (e.g. 7.0 -> "07")
df_jerseyMap['jerseyNumber'] = df_jerseyMap['jerseyNumber'].astype(str).str.split('.').str[0].str.zfill(2)

#getting team and jersey (e.g. "NE 07") to match the entries in PFF's specialTeamsSafeties
df_jerseyMap['teamJersey'] = df_jerseyMap['team'] + ' ' + df_jerseyMap['jerseyNumber']

#map to merge nflId to teamJersey
df_jerseyMap = df_jerseyMap[['gameId', 'nflId', 'teamJersey']]

df_jerseyMap = df_jerseyMap.sort_values(['gameId', 'nflId', 'teamJersey'])

In [None]:
#dataframe will include gameId, playId and nflId for each special teams safety
df_PFF_specialTeamSafeties = df_PFFScouting.copy()

#splitting into a column for each special teams safety
df_PFF_specialTeamSafeties[['teamJersey1', 'teamJersey2', 'teamJersey3', 'teamJersey4', 'teamJersey5', 'teamJersey6']] = df_PFF_specialTeamSafeties['specialTeamsSafeties'].str.split('; ',expand=True)

#selecting jersey numbers for each team
df_PFF_specialTeamSafeties = df_PFF_specialTeamSafeties[['gameId', 'playId', 'teamJersey1', 'teamJersey2', 'teamJersey3', 'teamJersey4', 'teamJersey5', 'teamJersey6']]

#gathering data
df_PFF_specialTeamSafeties = pd.melt(df_PFF_specialTeamSafeties, id_vars =['gameId', 'playId'], value_vars =['teamJersey1', 'teamJersey2', 'teamJersey3', 'teamJersey4', 'teamJersey5', 'teamJersey6'],
               value_name = 'teamJersey')

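#dropping NA rows (plays with fewer than six listed safeties); the result must be reassigned
df_PFF_specialTeamSafeties = df_PFF_specialTeamSafeties.dropna(subset=['teamJersey'])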

#joining to jersey map
df_PFF_specialTeamSafeties = pd.merge(df_PFF_specialTeamSafeties, df_jerseyMap, on = ['gameId', 'teamJersey'])

#selecting variables of interest
df_PFF_specialTeamSafeties = df_PFF_specialTeamSafeties[['gameId', 'playId', 'nflId']]

df_PFF_specialTeamSafeties = df_PFF_specialTeamSafeties.sort_values(['gameId', 'playId', 'nflId'])

### Next, we will create a dataframe to only include deep kickoffs so that we can remove all unnecessary rows in the tracking data.

In [None]:
#creating data frame that will only include deep kickoffs
df_deepKickoffs = df_plays.copy()

#joining the scouting data
df_deepKickoffs = pd.merge(df_deepKickoffs, df_PFFScouting, on = ['gameId', 'playId'])

#filtering for kickoff plays only & deep kickoffs only
df_deepKickoffs = df_deepKickoffs[(df_deepKickoffs['specialTeamsPlayType'] == 'Kickoff') & (df_deepKickoffs['kickType'] == 'D')]

#selecting variables of interest
df_deepKickoffs = df_deepKickoffs[['gameId', 'playId', 'kickerId', 'possessionTeam']]

### Now, we will use the data frames we created to filter the tracking data. We will use `df_PFF_specialTeamSafeties` to filter out players who were special teams safeties and `df_deepKickoffs` to remove plays that were not deep kickoffs; we also drop the kicker and the receiving team. For each remaining player on each play, we keep the first 40 frames from the kick onward, which at the tracking data's 10 frames per second (about 4 seconds) approximately corresponds to the initial sprint. Over that interval, we calculate the maximum speed reached by each player on the play.
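The windowing step can be checked on a single toy player. This minimal sketch assumes speeds `s` in yards per second and kick events tagged `kickoff` or `free_kick`, as in the cells below; `max_sprint_speed` and the frame values are hypothetical. It counts kick events per player-play, so each player's window starts at the kick independently of the others.

```python
import pandas as pd

# Hypothetical tracking frames for one player on one play (speeds in yd/s)
frames = pd.DataFrame({
    "gameId": 1, "playId": 10, "nflId": 111,
    "frameId": range(1, 8),
    "event": ["None", "None", "kickoff", "None", "None", "None", "first_contact"],
    "s": [0.5, 9.9, 1.0, 6.0, 8.5, 9.1, 7.0],
})

def max_sprint_speed(tracking: pd.DataFrame, n_frames: int = 40) -> pd.DataFrame:
    """Max speed per player over the first n_frames frames from the kick onward."""
    keys = ["gameId", "playId", "nflId"]
    df = tracking.sort_values(keys + ["frameId"])
    # 1 on frames tagged as the kick, 0 elsewhere
    kicked = df["event"].isin(["kickoff", "free_kick"]).astype(int)
    # running count of kick events per player-play; >= 1 means "at or after the kick"
    after_kick = kicked.groupby([df[k] for k in keys]).cumsum() >= 1
    # keep the first n_frames rows (already in frame order) for each player-play
    window = df[after_kick].groupby(keys).head(n_frames)
    return (window.groupby(keys, as_index=False)["s"].max()
                  .rename(columns={"s": "maxSpeed"}))

print(max_sprint_speed(frames, n_frames=3))
```

With a 3-frame window the pre-kick reading of 9.9 and the later 9.1 are both excluded, so the result is 8.5; the window length is the main tuning choice.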

In [None]:
df_maxSpeeds = df_tracking.copy()

#joining games
df_maxSpeeds = pd.merge(df_maxSpeeds, df_games, on = 'gameId')

#using a join to remove special teams safeties
df_maxSpeeds = pd.merge(left = df_maxSpeeds, right = df_PFF_specialTeamSafeties, how='left', indicator=True, on = ['gameId', 'playId', 'nflId'])

df_maxSpeeds.loc[df_maxSpeeds._merge != 'left_only', :].head()

In [None]:
df_maxSpeeds = df_maxSpeeds.loc[df_maxSpeeds._merge == 'left_only', :].drop(columns = '_merge')

#joining deep kickoffs
df_maxSpeeds = pd.merge(df_maxSpeeds, df_deepKickoffs, on = ['gameId', 'playId'])

#removing the kicker from the tracking data
df_maxSpeeds = df_maxSpeeds[(df_maxSpeeds['kickerId'] != df_maxSpeeds['nflId']) &
                            
                            #player is on home team and kicking team is home
                            ((df_maxSpeeds['team']=='home') & (df_maxSpeeds['possessionTeam'] == df_maxSpeeds['homeTeamAbbr']) |
                             
                             #or player is on away team and kicking team is away
                            (df_maxSpeeds['team']=='away') & (df_maxSpeeds['possessionTeam'] == df_maxSpeeds['visitorTeamAbbr']))]

#select variables of interest
df_maxSpeeds = df_maxSpeeds[['gameId', 'playId', 'frameId', 'nflId', 'event', 's','time']]

#arranging by gameId, playId, frameId and nflId
df_maxSpeeds = df_maxSpeeds.sort_values(['gameId', 'playId', 'frameId', 'nflId'])

#grouping by gameId and playId & keeping frames from the kickoff (or free kick) onward
df_maxSpeeds = df_maxSpeeds.loc[df_maxSpeeds.groupby(['gameId', 'playId']).event.transform(lambda z: np.cumsum(z.isin(['kickoff', 'free_kick'])) >= 1)]

#grouping by gameId, playId and nflId & keeping the first 40 frames (~4 s) for each player
df_maxSpeeds = df_maxSpeeds.groupby(['gameId', 'playId', 'nflId']).head(40)

#calculating max speed for given play / player
df_maxSpeeds = df_maxSpeeds.groupby(['gameId', 'playId', 'nflId'], as_index=False)['s'].max()

#renaming speed column as maxSpeed
df_maxSpeeds = df_maxSpeeds.rename(columns={"s" : "maxSpeed"})

In [None]:
#merge the play information of interest
df_maxSpeedPlays = pd.merge(left = df_maxSpeeds, right = df_plays, how='left', on = ['gameId', 'playId'])

#select variables of interest
df_maxSpeedPlays = df_maxSpeedPlays[['gameId', 'playId', 'nflId', 'maxSpeed', 'quarter', 'possessionTeam', 'preSnapHomeScore', 'preSnapVisitorScore','gameClock']]

#compute the score differential (home - away)
df_maxSpeedPlays['scoreDiff'] = df_maxSpeedPlays['preSnapHomeScore'] - df_maxSpeedPlays['preSnapVisitorScore'] 

#merge the home team abbr and week of game
df_maxSpeedPlays = pd.merge(left = df_maxSpeedPlays, right = df_games[['gameId','week','homeTeamAbbr','visitorTeamAbbr']], how='left', on = ['gameId'])

# Add a column whether the home team is kicking off
df_maxSpeedPlays['homeYes'] = df_maxSpeedPlays.possessionTeam.eq(df_maxSpeedPlays['homeTeamAbbr']).astype('int')

In [None]:
#merging max speeds and play/game situation info with Lee Sharpe's data to get surface type
df_maxSpeeds2 = pd.merge(df_maxSpeedPlays, df_leeSharpeGames[['old_game_id', 'surface']], left_on = ['gameId'], right_on = ['old_game_id'])
df_maxSpeeds2 = df_maxSpeeds2.drop(columns = 'old_game_id')

#stripping the surface column to remove extra spaces at the end
df_maxSpeeds2['surface'] = df_maxSpeeds2['surface'].str.strip()

#convert speed to mph: 1 yd/s = 3600 yd/h = 3600/1760 mph ≈ 2.04545 mph
ypsToMph = 3600 / 1760
df_maxSpeeds2['maxSpeedMph'] = df_maxSpeeds2['maxSpeed']*ypsToMph

#add player name and position to data
df_maxSpeeds2 = pd.merge(df_maxSpeeds2, df_players[['nflId', 'Position', 'displayName']], how= 'left', left_on = ['nflId'], right_on = ['nflId'])

df_maxSpeeds2.head()

# 4. Kickoff Counts as potential factor in fatigue
### Whether a kickoff coverage player is on their 1st kickoff of the game or their 7th may affect their max speed. We compute each player's kickoff sequence number within a game, `koPlayerSeq`.
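Here is a minimal sketch of the sequence numbering on hypothetical rows. Like the cell below, it assumes `playId` increases through a game, so sorting by it puts each player's kickoffs in chronological order; `add_ko_sequence` is an illustrative name.

```python
import pandas as pd

# Hypothetical rows: player 111 covers three kickoffs in game 1 and one in game 2
kos = pd.DataFrame({
    "gameId": [1, 1, 1, 2, 1],
    "playId": [300, 10, 1500, 20, 900],
    "nflId":  [111, 111, 111, 111, 222],
})

def add_ko_sequence(df: pd.DataFrame) -> pd.DataFrame:
    """Number each player's kickoffs 1, 2, 3, ... within a game, in playId order."""
    out = df.sort_values(["gameId", "playId", "nflId"]).copy()
    # cumcount() is 0-based, so add 1 to get the 1st, 2nd, ... kickoff
    out["koPlayerSeq"] = out.groupby(["nflId", "gameId"]).cumcount() + 1
    return out

print(add_ko_sequence(kos))
```

The counter restarts for each (player, game) pair, so player 111's kickoff in game 2 is numbered 1 again.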

In [None]:
#arranging by gameId, playId, nflID
df_maxSpeeds2 = df_maxSpeeds2.sort_values(['gameId', 'playId', 'nflId'])

# Sequence of the kick off for a player in a game
df_maxSpeeds2['koPlayerSeq'] = df_maxSpeeds2.groupby(['nflId','gameId']).cumcount()+1
df_maxSpeeds2.sort_values(['maxSpeedMph'])

# 5. Elapsed time at the start of the kickoff

## While the game clock measures the duration of the game, the real (wall-clock) time elapsed is also a potential factor in player fatigue. We record the total elapsed time (not the game clock) at the start of each kickoff in `elapsedTime`.

In [None]:
#get the elapsed time from start of the game (minutes) for each play
df_maxSpeeds2 = pd.merge(left=df_maxSpeeds2, right=df_playElapseTime, how= 'left', left_on = ['gameId', 'playId'], right_on = ['gameId', 'playId'])

#save the data file
df_maxSpeeds2.to_csv('maxSpeeds2.csv')
df_maxSpeeds2.head()

# 6. EDA for max speed in different game situations


In [None]:
# (from Baek Kyun Shin)
def resumetable(df):
    print(f'Shape : {df.shape}')
    summary = pd.DataFrame(df.dtypes, columns=['Data Type'])
    summary = summary.reset_index()
    summary = summary.rename(columns={'index': 'Feature'})
    summary['Num of null'] = df.isnull().sum().values
    summary['Num of unique'] = df.nunique().values
    summary['First value'] = df.loc[0].values
    summary['Second value'] = df.loc[1].values
    summary['Third value'] = df.loc[2].values
    return summary

resumetable(df_maxSpeeds2)

In [None]:
df_maxSpeeds2.describe().T

In [None]:
#import graphics packages
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')

# 7. What is the distribution of max speeds on deep kickoffs?
### The fastest speed measured for a human is Usain Bolt's roughly 27.8 mph, reached between 60 and 80 meters of his 2009 world-record 100m, so players reporting a max speed above that are clearly measurement errors. I leave them in the data but account for these outliers, along with those below 10 mph, when reading the charts.
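A minimal sketch of how these cutoffs could be made explicit, assuming the `maxSpeedMph` column built above; `flag_speed_outliers` and the toy speeds are hypothetical. Labelling rather than dropping keeps the data intact while making the suspect rows easy to count or filter.

```python
import pandas as pd

BOLT_TOP_MPH = 27.8  # Usain Bolt's peak speed cited above
LOW_MPH = 10.0       # lower cutoff used when reading the charts

def flag_speed_outliers(df: pd.DataFrame, col: str = "maxSpeedMph") -> pd.Series:
    """Label each row 'too_fast', 'too_slow', or 'ok' relative to the two cutoffs."""
    labels = pd.Series("ok", index=df.index)
    labels.loc[df[col] > BOLT_TOP_MPH] = "too_fast"
    labels.loc[df[col] < LOW_MPH] = "too_slow"
    return labels

toy = pd.DataFrame({"maxSpeedMph": [19.4, 31.2, 6.5, 21.0]})  # hypothetical values
print(flag_speed_outliers(toy).value_counts())
```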

In [None]:
mpl.rc('font', size=15) 
plt.figure(figsize=(10, 5))

g = sns.histplot(data=df_maxSpeeds2, x='maxSpeedMph',bins=50)

#clean up the chart
g.set_title('Kickoff Coverage Player Max Speeds')
g.set_ylabel('')
g.set_xlabel('Player Max MPH on Kickoff Coverage');

# 8. What is the total number of kickoff coverage sprints for a player in a game?
### A deep kickoff typically means a full sprint from the 35-yard line for 40+ yards. Some players do this as many as 10 times in a game, while six or fewer is more typical.
### Here we see the distribution of kickoff coverage sprints per player per game. There are 910 player-games in which a player covered exactly 4 deep kickoffs, and 80 player-games in which a player completed 8 kickoff coverage sprints.
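The counts above are per (game, player) pair, so the same player can contribute once per game. A minimal sketch of that tally on hypothetical rows (one row per player per deep kickoff covered); `sprints_per_player_game` is an illustrative name, and counting rows gives the same totals as taking the max of `koPlayerSeq`.

```python
import pandas as pd

# Hypothetical kickoff-coverage rows: one row per player per deep kickoff covered
kos = pd.DataFrame({
    "gameId": [1, 1, 1, 1, 1, 2, 2],
    "nflId":  [111, 111, 111, 222, 222, 111, 333],
})

def sprints_per_player_game(df: pd.DataFrame) -> pd.Series:
    """How many player-games had 1, 2, 3, ... kickoff coverage sprints."""
    totals = df.groupby(["gameId", "nflId"]).size()  # sprints per (game, player)
    return totals.value_counts().sort_index()         # distribution of those totals

print(sprints_per_player_game(kos))
```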

In [None]:
#find the total number of kickoffs in a game for each player
df_totalKos = pd.DataFrame(df_maxSpeeds2.groupby(['gameId','nflId'])['koPlayerSeq'].max())
df_totalKos.reset_index(inplace=True)

df_totalKos.rename(columns={'koPlayerSeq':'totalKos'}, inplace=True)

mpl.rc('font', size=15) 
plt.figure(figsize=(10, 5))

g = sns.histplot(data=df_totalKos, x='totalKos',discrete=True)

#clean up the chart
g.set_title('Total Kickoff Sprints per Player in a Game')
g.set_xticks(range(1,11))
g.set_ylabel('Number of Players')
g.set_xlabel('Total Kickoff Coverage Player Sprints in Game');

In [None]:
df_totalKos.totalKos.value_counts().sort_index()

# 9. How do cumulative kickoff sprints impact player max sprint speed?

### Here we look at the max speed of players on their first, second, third, etc. kickoff coverage of a game. While the range narrows, likely due more to sample size, the distribution does not appear to be substantially different, with the average dropping by only roughly 0.5 mph from the first kickoff of the game to the eighth.
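The ~0.5 mph figure is a difference of group means, which can be computed directly. A minimal sketch, assuming the `koPlayerSeq` and `maxSpeedMph` columns built above; `mean_speed_drop` and the toy values are hypothetical. One caveat: players who reach an 8th kickoff are a subset of those with a 1st, so this compares different groups of players; restricting to player-games with at least eight kickoffs would give a within-player comparison.

```python
import pandas as pd

def mean_speed_drop(df: pd.DataFrame, first: int = 1, last: int = 8) -> float:
    """Mean maxSpeedMph on the `first` kickoff minus the mean on the `last` one."""
    means = df.groupby("koPlayerSeq")["maxSpeedMph"].mean()
    return float(means.loc[first] - means.loc[last])

# Hypothetical values only, to show the calculation
toy = pd.DataFrame({
    "koPlayerSeq": [1, 1, 8, 8],
    "maxSpeedMph": [20.0, 19.0, 19.0, 19.0],
})
print(mean_speed_drop(toy))  # 19.5 - 19.0 = 0.5
```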

In [None]:
mpl.rc('font', size=15) 
plt.figure(figsize=(10, 5))

g = sns.violinplot(data=df_maxSpeeds2, y='maxSpeedMph', x='koPlayerSeq',color='#30a2da')

#clean up the chart
g.set_title('Player Max Speed on Kickoff')
g.set_ylabel('Max Speed (mph)')
g.set_xlabel('Cumulative Kickoff Sprints for Player in Game');

mpl.rc('font', size=15) 
plt.figure(figsize=(10, 5))

h = sns.pointplot(data=df_maxSpeeds2, y='maxSpeedMph', x='koPlayerSeq',color='#30a2da', errwidth= 1, scale= 0.5)

#clean up the chart
h.set_title('Average Player Max Speed on Kickoff')
h.set_ylabel('Max Speed (mph)')
h.set_xlabel('Cumulative Kickoff Sprints for Player in Game');

# 10. Do some positions feel the cumulative effects of repeated kickoff sprints in a game more than others?

### We look at the 10 most common positions among kickoff coverage players and visualize the impact of cumulative kickoffs on their average max speed. As would be expected, some positions are faster than others. The largest dropoffs in max speed appear to occur for MLB.
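To put a number on "largest dropoff" per position, one option is a least-squares slope of max speed on kickoff sequence for each position. This is a swapped-in summary, not the original analysis: a minimal sketch assuming the `Position`, `koPlayerSeq`, and `maxSpeedMph` columns, with a hypothetical function name and toy data. A negative slope means speed falls with each extra kickoff.

```python
import numpy as np
import pandas as pd

def speed_slope_by_position(df: pd.DataFrame) -> pd.Series:
    """Least-squares slope of maxSpeedMph on koPlayerSeq (mph per extra kickoff), per position."""
    def slope(g: pd.DataFrame) -> float:
        if g["koPlayerSeq"].nunique() < 2:
            return np.nan  # a line needs at least two distinct x values
        return np.polyfit(g["koPlayerSeq"], g["maxSpeedMph"], deg=1)[0]
    return pd.Series({pos: slope(g) for pos, g in df.groupby("Position")}).sort_values()

toy = pd.DataFrame({  # hypothetical values
    "Position": ["MLB"] * 3 + ["CB"] * 3,
    "koPlayerSeq": [1, 2, 3, 1, 2, 3],
    "maxSpeedMph": [19.0, 18.5, 18.0, 20.0, 20.0, 20.0],
})
print(speed_slope_by_position(toy))
```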

In [None]:
top10pos = df_maxSpeeds2.Position.value_counts().head(10).index.tolist()
df_top = df_maxSpeeds2[(df_maxSpeeds2.Position.isin(top10pos)) & (df_maxSpeeds2.koPlayerSeq < 9) & (df_maxSpeeds2.quarter < 5)].copy()
g = sns.FacetGrid(data=df_top, col="Position", col_wrap=5, height=5, ylim=(15, 22))
g = g.map(sns.pointplot, "koPlayerSeq", "maxSpeedMph",color='#30a2da', errwidth= 1, scale= 0.5);

#clean up the chart
g.set_axis_labels('Cumulative Kickoff Sprints', 'Max Speed (mph)');

# 11. How much do players' kickoff sprints slow as the game progresses (by quarter)?

### The effect is less clear here, likely due to a variety of factors (e.g., the limited number of kickoffs).

In [None]:
g = sns.FacetGrid(data=df_top, col="Position", col_wrap=5, height=5, ylim=(15, 22))
g = g.map(sns.pointplot, "quarter", "maxSpeedMph",color='#30a2da', errwidth= 1, scale= 0.5);

#clean up the chart
#g.set(xticks=range(1,5))
g.set_axis_labels('Quarter', 'Max Speed (mph)');

# 12. How much do players' kickoff sprints slow as the game progresses (by actual time)?

### The picture is again muddied by a variety of factors (e.g., the limited number of kickoffs), but there does appear to be a dropoff as the game progresses, for some positions more than others.
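A minimal sketch of how the 20-minute bins in this section behave, on hypothetical elapsed times. With `closed="left"`, a kickoff at exactly 20 minutes falls in `[20, 40)`, and anything at or beyond 220 minutes lands outside every bin and becomes NaN, so it drops out of the plots.

```python
import pandas as pd

# Left-closed 20-minute bins from 0 to 220 minutes, as used in this section
bins = pd.interval_range(start=0, end=220, freq=20, closed="left")

elapsed = pd.Series([0, 19.9, 20, 143, 230])  # hypothetical elapsed minutes
binned = pd.cut(elapsed, bins=bins)
print(binned)
```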

In [None]:
#create left-closed 20-minute bins from 0 to 220 minutes
interval_range = pd.interval_range(start=0,end=220,freq=20,closed='left')
df_top.loc[:,'elapsedTimeBinned'] = pd.cut(df_top['elapsedTime'], bins=interval_range)

g = sns.FacetGrid(data=df_top, col="Position", col_wrap=5, height=5, ylim=(15, 22))
g = g.map(sns.pointplot, "elapsedTimeBinned", "maxSpeedMph", errwidth= 1, scale= 0.5);

#clean up the chart
g.set_xticklabels(rotation=90)
g.set_axis_labels('Game Duration (min)', 'Max Speed (mph)');

# 13. Does the kicking team's score margin affect sprint effort?

### Perhaps the kicking team's current score margin affects motivation, so that players give less than a full-effort sprint?

In [None]:
df_top['kickTeamMargin'] = np.where(df_top['homeYes'] == 1, df_top['scoreDiff'], -df_top['scoreDiff'])
df_top.describe()

### Looking at all the positions combined, there does not seem to be much dropoff based on the score, even when the kicking team is down by 30 (-30) or up by 30.

In [None]:
mpl.rc('font', size=15) 
plt.figure(figsize=(10, 5))

h = sns.scatterplot(data=df_top, y='maxSpeedMph', x='kickTeamMargin', color='#30a2da')

#clean up the chart
h.set_title('Player Max Speed vs. Kicking Team Score Margin')
h.set_ylabel('Max Speed (mph)')
h.set_xlabel('Kick Team Score Margin');

### Let's break this down by position for the 10 most frequent positions (regulation only, fewer than 9 cumulative kickoffs in a game). Some of the effect we might expect (a dropoff in the tails, where a large lead or deficit could reduce motivation) does appear.
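One way to check the tails more directly is to compare close and lopsided games by position. This is a minimal sketch, not the notebook's method: it assumes the `Position`, `kickTeamMargin`, and `maxSpeedMph` columns, and the 17-point threshold (roughly three scores) is an arbitrary, hypothetical choice, as are the function name and toy values. The row counts matter, since lopsided kickoffs are much rarer.

```python
import pandas as pd

def close_vs_lopsided(df: pd.DataFrame, threshold: int = 17) -> pd.DataFrame:
    """Mean maxSpeedMph and row count per position for close vs. lopsided score margins."""
    lopsided = df["kickTeamMargin"].abs() > threshold
    state = lopsided.map({True: "lopsided", False: "close"})
    return (df.assign(gameState=state)
              .groupby(["Position", "gameState"])["maxSpeedMph"]
              .agg(["mean", "size"])
              .unstack("gameState"))

toy = pd.DataFrame({  # hypothetical values
    "Position": ["RB", "RB", "RB", "OLB", "OLB"],
    "kickTeamMargin": [3, -21, 24, 0, 30],
    "maxSpeedMph": [20.0, 18.0, 19.0, 19.5, 18.5],
})
print(close_vs_lopsided(toy))
```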

In [None]:
g = sns.FacetGrid(data=df_top, col="Position", col_wrap=5, height=5, ylim=(10, 22))
g = g.map(sns.lineplot, "kickTeamMargin", "maxSpeedMph");
#g = g.map(sns.pointplot, "kickTeamMargin", "maxSpeedMph",color='#30a2da', errwidth= 1, scale= 0.5);
g.axes[0].legend()
#clean up the chart
g.set_axis_labels('Kick Team Score Margin', 'Max Speed (mph)');

### Does the picture change late in the game? Here we look only at the fourth quarter.

In [None]:
g = sns.FacetGrid(data=df_top[df_top['quarter'] == 4], col="Position", col_wrap=5, height=5, ylim=(10, 22))
g = g.map(sns.lineplot, "kickTeamMargin", "maxSpeedMph");
#g = g.map(sns.pointplot, "kickTeamMargin", "maxSpeedMph",color='#30a2da', errwidth= 1, scale= 0.5);
g.axes[0].legend()
#clean up the chart
g.set_axis_labels('Kick Team Score Margin', 'Max Speed (mph)');

### Bin the margin into 5-point intervals and repeat, again for the fourth quarter only. Some of the dropoffs at the high and low ends are more apparent (RB and OLB).

In [None]:
#create left-closed 5-point score-margin bins from -35 to 50
interval_range = pd.interval_range(start=-35,end=50,freq=5,closed='left')
df_top.loc[:,'kickTeamMargin_bin'] = pd.cut(df_top['kickTeamMargin'], bins=interval_range)

g = sns.FacetGrid(data=df_top[df_top['quarter'] == 4], col="Position", col_wrap=5, height=5, ylim=(14, 22))
g = g.map(sns.pointplot, "kickTeamMargin_bin", "maxSpeedMph", errwidth= 1, scale= 0.5);

#clean up the chart
g.set_xticklabels(rotation=90)
g.set_axis_labels('Kick Team Score Margin', 'Max Speed (mph)');

# 14. Conclusion

### This was just a quick EDA; further exploration with this data would look at the potential impact of the score differential (do players "relax" when the game is in hand?). I was intending to do a regression analysis with the effects:
* Position
* Quarter
* Score Differential
* Cumulative KOs
* Real-time since the last KO

### I did not have time for the full analysis and mainly wanted to get my feet wet with the data and the Big Data Bowl process. Thanks to those who shared their starter code.
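As a starting point for the regression listed above, here is a minimal sketch of the one effect not yet computed, the real time since a player's last kickoff, plus a possible formula. It assumes the per-kickoff `time` column from `df_playElapseTime`; `add_minutes_since_last_ko`, `minSinceLastKo`, and `FORMULA` are hypothetical names, and `kickTeamMargin` stands in for the score differential. The formula string could be passed to statsmodels, e.g. `smf.ols(FORMULA, data=...).fit()`.

```python
import pandas as pd

def add_minutes_since_last_ko(df: pd.DataFrame) -> pd.DataFrame:
    """Minutes since the same player's previous kickoff in the same game (NaN for the first)."""
    out = df.sort_values(["gameId", "nflId", "time"]).copy()
    gap = out.groupby(["gameId", "nflId"])["time"].diff()  # Timedelta, NaT on the first kickoff
    out["minSinceLastKo"] = gap.dt.total_seconds() / 60
    return out

# Hypothetical regression formula covering the listed effects (Patsy syntax)
FORMULA = ("maxSpeedMph ~ C(Position) + C(quarter) + kickTeamMargin"
           " + koPlayerSeq + minSinceLastKo")

toy = pd.DataFrame({  # hypothetical kickoff times for one player
    "gameId": [1, 1, 1],
    "nflId": [111, 111, 111],
    "time": pd.to_datetime(["2020-09-13 17:05", "2020-09-13 17:47", "2020-09-13 18:30"]),
})
print(add_minutes_since_last_ko(toy)["minSinceLastKo"].tolist())  # [nan, 42.0, 43.0]
```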

In [None]:
df_top.head()

In [None]:
#sort data by time
df_topTimeSort = df_top.sort_values(by=['time','nflId'],ascending=True)

#for each player, get their max speed on their previous kickoff in the same game (NaN on their first)
df_topTimeSort['lagSpeedMph'] = df_topTimeSort.groupby(['gameId','nflId'])['maxSpeedMph'].shift(1)
df_topTimeSort[df_topTimeSort['nflId'] == 37211].head(20)

In [None]:
#import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [None]:
#regression with position, lagged speed, quarter, and the lagged speed x quarter interaction
preds = '+'.join(['C(Position)','lagSpeedMph*C(quarter)'])
my_formula = "maxSpeedMph~" + preds
mod = smf.ols(formula=my_formula, data= df_topTimeSort).fit()
print(mod.summary())

In [None]:
g = sns.FacetGrid(data=df_topTimeSort[df_topTimeSort['lagSpeedMph'] > 10].dropna(), col="Position", col_wrap=5, height=5, ylim=(0, 32),xlim=(10, 25))
g = g.map(sns.lineplot, 'lagSpeedMph',"maxSpeedMph", color='#30a2da');

#clean up the chart
#g.set(xticks=range(1,5))
g.set_axis_labels('Previous Kickoff Max Speed (mph)', 'Max Speed (mph)');