When a quarterback takes a snap and drops back to pass, what happens next may seem like chaos. As offensive players move in various patterns, the defense works together to prevent successful pass completions and then to quickly tackle receivers that do catch the ball. In this year’s Kaggle competition, your goal is to use data science to better understand the schemes and players that make for a successful defense against passing plays.

![https://storage.googleapis.com/kaggle-competitions/kaggle/15696/logos/thumb76_76.png?t=2019-10-04-16-17-46](https://storage.googleapis.com/kaggle-competitions/kaggle/15696/logos/thumb76_76.png?t=2019-10-04-16-17-46)

In American football, there is a plethora of defensive strategies and outcomes. The National Football League (NFL) has used previous Kaggle competitions to focus on offensive plays, but as the old proverb goes, “defense wins championships.” Though metrics for analyzing quarterbacks, running backs, and wide receivers are consistently a part of public discourse, techniques for analyzing the defensive part of the game lag behind. Identifying player, team, or strategic advantages on the defensive side of the ball would be a significant breakthrough for the game.

This competition uses the NFL’s Next Gen Stats data, which includes the position and speed of every player on the field during each play. You’ll employ player tracking data for all drop-back pass plays from the 2018 regular season. The goal of submissions is to identify unique and impactful approaches to measure defensive performance on these plays. There are several directions for participants to ‘tackle’ (ha), each of which may require different levels of football savvy, data aptitude, and creativity. As examples:

* What coverage schemes (man, zone, etc.) does the defense employ? Which coverage options tend to perform better?
* Which players are the best at closely tracking receivers as they try to get open?
* Which players are the best at closing on receivers when the ball is in the air?
* Which players are the best at defending pass plays when the ball arrives?
* Is there any way to use player tracking data to predict whether or not certain penalties – for example, defensive pass interference – will be called?
* Who are the NFL’s best players against the pass?
* How does a defense react to certain types of offensive plays?
* Is there anything about a player – for example, their height, weight, experience, speed, or position – that can be used to predict their performance on defense?

In [None]:
import numpy as np
import pylab as pl
import pandas as pd
import matplotlib.pyplot as plt 
%matplotlib inline
import seaborn as sns
from sklearn.utils import shuffle
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,classification_report
from sklearn.model_selection import cross_val_score, GridSearchCV
# Input data files are available in the "../input/" directory.

In [None]:
# Load the 17 weekly tracking files (week1.csv ... week17.csv)
weeks = [pd.read_csv(f'../input/nfl-big-data-bowl-2021/week{i}.csv') for i in range(1, 18)]

# Stack the weekly frames into one table; each week appears exactly once
Data = pd.concat(weeks, ignore_index=True)
x = Data.iloc[:, [3]].values

Tracking data
Each of the 17 week[week].csv files contains player tracking data from all passing plays during Week [week] of the 2018 regular season. Nearly all plays from each [gameId] are included; certain plays or games with insufficient data are dropped. Each team and player plays no more than 1 game in a given week.

* time: Time stamp of play (time, yyyy-mm-dd, hh:mm:ss)

* x: Player position along the long axis of the field, 0 - 120 yards. See Figure 1 below. (numeric)

* y: Player position along the short axis of the field, 0 - 53.3 yards. See Figure 1 below. (numeric)

* s: Speed in yards/second (numeric)

* a: Acceleration in yards/second^2 (numeric)
 
* dis: Distance traveled from prior time point, in yards (numeric)

* o: Player orientation (deg), 0 - 360 degrees (numeric)

* dir: Angle of player motion (deg), 0 - 360 degrees (numeric)

* event: Tagged play details, including moment of ball snap, pass release, pass catch, tackle, etc (text)

* nflId: Player identification number, unique across players (numeric)

* displayName: Player name (text)

* jerseyNumber: Jersey number of player (numeric)

* position: Player position group (text)

* team: Team (away or home) of corresponding player (text)

* frameId: Frame identifier for each play, starting at 1 (numeric)

* gameId: Game identifier, unique (numeric)

* playId: Play identifier, not unique across games (numeric)

* playDirection: Direction that the offense is moving (text, left or right)

* route: Route run by offensive player (text)
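
Because `playDirection` flips from play to play, it can be convenient to mirror the coordinates so that the offense always moves toward increasing `x`. The sketch below is one possible way to do that, using the field dimensions from the column descriptions above (120 by 53.3 yards). The helper name `standardize_direction` and the toy rows are illustrative assumptions, not part of the dataset or of this notebook's pipeline.

```python
import pandas as pd

FIELD_LENGTH = 120.0   # yards, long axis (from the column description above)
FIELD_WIDTH = 53.3     # yards, short axis

def standardize_direction(tracking: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every play runs left-to-right (hypothetical helper)."""
    out = tracking.copy()
    left = out["playDirection"] == "left"
    # Mirror positions through the centre of the field for left-moving plays.
    out.loc[left, "x"] = FIELD_LENGTH - out.loc[left, "x"]
    out.loc[left, "y"] = FIELD_WIDTH - out.loc[left, "y"]
    # Mirroring both axes is a 180-degree rotation, so rotate the angles too,
    # wrapping the result back into [0, 360).
    for col in ("o", "dir"):
        out.loc[left, col] = (out.loc[left, col] + 180.0) % 360.0
    out["playDirection"] = "right"
    return out

# Made-up rows: one play moving right, one moving left.
toy = pd.DataFrame({
    "x": [30.0, 90.0],
    "y": [10.0, 40.0],
    "o": [90.0, 270.0],
    "dir": [45.0, 350.0],
    "playDirection": ["right", "left"],
})
standardized = standardize_direction(toy)
print(standardized)
```

Working on a copy keeps the raw tracking table untouched, so the original coordinates remain available for plotting plays as they actually happened.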





## What does the data tell us about defending the pass play? Let's find out.

In [None]:
Data.head(50)

In [None]:
Data.groupby(['position']).count()[['playDirection']]


In [None]:
Data.groupby(['playDirection']).count()[['position']]


In [None]:
Data.groupby(['position']).count()[['playDirection']].plot(kind='bar')
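
Note that `count()` on the tracking table counts rows, and the table has one row per player per frame, so these totals reflect time on the field rather than the number of players. If distinct players per position are of interest, `nunique` on `nflId` is one option. The toy frame below stands in for `Data`; its values are made up.

```python
import pandas as pd

# Toy stand-in for the tracking table: one row per player per frame.
toy_tracking = pd.DataFrame({
    "nflId":    [1, 1, 1, 2, 2, 3],
    "position": ["CB", "CB", "CB", "CB", "CB", "WR"],
    "frameId":  [1, 2, 3, 1, 2, 1],
})

# Tracking rows (frames) per position.
rows_per_position = toy_tracking.groupby("position").size()
# Distinct players per position.
players_per_position = toy_tracking.groupby("position")["nflId"].nunique()

print(rows_per_position)
print(players_per_position)
```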


# plays

In [None]:
plays = pd.read_csv('../input/nfl-big-data-bowl-2021/plays.csv')
plays


Unique yardlineNumber values

In [None]:
print(plays.yardlineNumber.unique())


playType

In [None]:
play_Type = plays['playType'].value_counts()[:50]
plt.figure(figsize=(6,4))
# seaborn expects x and y as keyword arguments
sns.barplot(x=play_Type.index, y=play_Type.values, alpha=0.8)
plt.ylabel('Number of plays', fontsize=12)
plt.xlabel('playType', fontsize=9)
plt.xticks(rotation=90)
plt.show()

## Looking at play type by down


In [None]:
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, FactorRange, FixedTicker
from bokeh.io import output_notebook
from collections import Counter
from bokeh.transform import factor_cmap
from bokeh.palettes import Paired, Spectral
import itertools
pd.set_option('display.max_columns', 150)
output_notebook()

# filter the plays down to just third downs
third_down = plays.loc[plays['down'] == 3]

# define the range of yards to go I want to look at (1 up to the longest third down)
y2g = range(1, int(third_down['yardsToGo'].max()) + 1)

# create a list of Counters of playType for each yard in my range of interest
play_type_counts = [Counter(third_down.loc[third_down['yardsToGo'] == yrd, 'playType']) for yrd in y2g]

# x-axis is y2g, defined above
x = list(y2g)

# extract the sack count for each yard
y_sack = [play['play_type_sack'] for play in play_type_counts]

# extract the pass count for each yard
y_pass = [play['play_type_pass'] for play in play_type_counts]

# get the figure ready and put my lines on it
# (recent Bokeh releases use height/width and legend_label)
p = figure(title='Third Down Play Type by Yard to Go', toolbar_location=None, tools='',
           height=350, width=750)
p.line(x, y_pass, color='#2b83ba', legend_label='Pass', line_width=4)
p.line(x, y_sack, color='#abdda4', legend_label='Sack', line_width=4)
p.legend.location = 'top_left'
show(p)
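
The per-yard counts built with `Counter` above can also be obtained in one step with `pandas.crosstab`. This is an alternative sketch, not the approach of the referenced Bokeh post; the toy rows are invented and only mimic the `down`, `yardsToGo`, and `playType` columns of `plays`.

```python
import pandas as pd

# Made-up plays: five third downs and one first down.
toy_plays = pd.DataFrame({
    "down":      [3, 3, 3, 3, 1, 3],
    "yardsToGo": [1, 1, 5, 5, 10, 5],
    "playType":  ["play_type_pass", "play_type_sack", "play_type_pass",
                  "play_type_pass", "play_type_pass", "play_type_sack"],
})

third_down = toy_plays[toy_plays["down"] == 3]
# Rows: yards to go; columns: play type; cells: number of plays.
counts = pd.crosstab(third_down["yardsToGo"], third_down["playType"])
print(counts)
```

The resulting table has one column per play type, so `counts["play_type_pass"]` and `counts["play_type_sack"]` could be passed to a line plot directly.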

In [None]:
# drop overtime (copy so new columns can be added without pandas warnings)
plays = plays[~(plays['quarter'] == 5)].copy()
print(len(plays))

# convert the game clock (mm:ss:00) to seconds left in the half
def translate_game_clock(row):
    raw_game_clock = row['gameClock']
    quarter = row['quarter']
    # guard against a missing clock value
    if pd.isna(raw_game_clock):
        return np.nan
    minutes, seconds_raw = raw_game_clock.partition(':')[::2]

    seconds = seconds_raw.partition(':')[0]

    total_seconds_left_in_quarter = int(seconds) + (int(minutes) * 60)

    # quarters 1 and 3 are followed by another full 15-minute quarter in the half
    if quarter == 3 or quarter == 1:
        return total_seconds_left_in_quarter + 900
    elif quarter == 4 or quarter == 2:
        return total_seconds_left_in_quarter

# plays.csv in this dataset names the clock column 'gameClock'
if 'gameClock' in plays.columns:
    plays['secondsLeftInHalf'] = plays.apply(translate_game_clock, axis=1)

if 'quarter' in plays.columns:
    plays['half'] = plays['quarter'].map(lambda q: 2 if q > 2 else 1)
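
As a worked check of the conversion above: a clock reading of `07:30:00` in the first quarter is 450 seconds left in the quarter, plus the 900 seconds of the second quarter still to come, so 1350 seconds left in the half. The sketch below does the same arithmetic in vectorized form. It assumes the clock strings look like `mm:ss:00` and uses a made-up column name `clock` on toy data.

```python
import pandas as pd

toy = pd.DataFrame({
    "quarter": [1, 2, 3, 4],
    "clock":   ["07:30:00", "15:00:00", "00:05:00", "02:00:00"],
})

# Split "mm:ss:00" into its parts; column 0 is minutes, column 1 is seconds.
parts = toy["clock"].str.split(":", expand=True)
seconds_left_in_quarter = parts[0].astype(int) * 60 + parts[1].astype(int)

# Quarters 1 and 3 still have a full 15-minute quarter left in the half.
extra = toy["quarter"].isin([1, 3]) * 900
toy["secondsLeftInHalf"] = seconds_left_in_quarter + extra
print(toy)
```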

In [None]:
#filter rows
plays = plays[(plays.possessionTeam=='ATL') & (plays.down.isin([1.0, 2.0, 3.0, 4.0])) & ((plays.playType=='play_type_sack') | (plays.playType == 'play_type_pass'))]

In [None]:
#filter columns
plays = plays[['down','yardsToGo','possessionTeam','playType','yardlineSide','yardlineNumber','offenseFormation','passResult','playResult']]
plays.head()

# players

In [None]:
players = pd.read_csv('../input/nfl-big-data-bowl-2021/players.csv')
players

In [None]:
players['birthDate'] = pd.to_datetime(players['birthDate'])
players

# Computing each player's age

In [None]:
now = pd.Timestamp('now')
# whole years elapsed, using an average year length of 365.25 days
players['Age'] = ((now - players['birthDate']).dt.days / 365.25).astype(int)
players
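
Ages computed from `now` change every time the notebook is rerun and do not reflect how old players were when the tracked games were played. A minimal sketch of computing age at a fixed reference date follows; the date `2018-09-06` is an assumed stand-in for the start of the 2018 season, the column name `ageAtReference` is hypothetical, and the birth dates are invented.

```python
import pandas as pd

REFERENCE_DATE = pd.Timestamp("2018-09-06")  # assumed reference date, adjust as needed

toy_players = pd.DataFrame({
    "displayName": ["Player A", "Player B"],
    "birthDate": pd.to_datetime(["1990-09-06", "1990-09-07"]),
})

birth = toy_players["birthDate"]
# True when the player's birthday falls on or before the reference day of the year.
had_birthday = (
    (birth.dt.month < REFERENCE_DATE.month)
    | ((birth.dt.month == REFERENCE_DATE.month) & (birth.dt.day <= REFERENCE_DATE.day))
)
# Subtract one year when the birthday has not yet occurred in the reference year.
toy_players["ageAtReference"] = (
    REFERENCE_DATE.year - birth.dt.year - (~had_birthday).astype(int)
)
print(toy_players)
```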


In [None]:
players_age = players['Age'].value_counts()[:50]
plt.figure(figsize=(6,4))
# seaborn expects x and y as keyword arguments
sns.barplot(x=players_age.index, y=players_age.values, alpha=0.8)
plt.ylabel('Number of players', fontsize=12)
plt.xlabel('Age', fontsize=9)
plt.xticks(rotation=90)
plt.show()

# Players: Age Groupings

Here we compare player weight and age across colleges: group the players by collegeName, then take the max, mean, and min of weight and Age within each group.

In [None]:
display(players[["weight","collegeName","Age",]].groupby(["collegeName"]).agg(["max",'mean',"min"]).style.background_gradient(cmap="cool"))

In [None]:
display(players[["nflId","height","weight","collegeName","displayName","Age"]].groupby(["nflId","collegeName","displayName",
                                                        "Age", ]).agg("sum").sort_values(by="Age",
                                                          ascending = False).head(100).style.background_gradient(cmap='autumn'))

# Players: Position Groupings

In [None]:
def NFL_2021(x):
    y = players[["nflId","height","weight","birthDate","collegeName","position","displayName","Age"]][players["position"] == x]
    y = y.sort_values(by="nflId",ascending=False)
    return y.head(100)

In [None]:
NFL_2021("CB")

In [None]:
NFL_2021("SS")

In [None]:
NFL_2021("MLB")

In [None]:
NFL_2021("OLB")

In [None]:
NFL_2021("FS")

In [None]:
NFL_2021("WR")

In [None]:
NFL_2021("QB")

In [None]:
NFL_2021("TE")

In [None]:
NFL_2021("RB")

In [None]:
NFL_2021("DE")

In [None]:
NFL_2021("LB")

In [None]:
NFL_2021("FB")

In [None]:
NFL_2021("ILB")

In [None]:
NFL_2021("DB")

In [None]:
NFL_2021("S")

In [None]:
NFL_2021("HB")

In [None]:

NFL_2021("NT")

In [None]:
NFL_2021("P")

In [None]:

NFL_2021("LS")

In [None]:

NFL_2021("K")

In [None]:

NFL_2021("DT")

## References

* Bokeh visualisation: https://j253.github.io/blog/fun-with-nfl-stats.html

# Work in progress :)