# 05 - KNN
- Assigns labels to ~10% of receivers
  - Receivers who are well-known representatives of their labels 
- Filters for the features I think are most important
  - Based on my domain expertise, I think utilization metrics are more relevant to a player's style than success metrics.
    - For example, a receiver can be considered a speedster, but not a statistically productive one. His lack of impressive stats should not exclude him from the "Speedster" category.
- Scales the features
- Performs PCA on the features
- Runs KNN with k = 6 on the principal components
- Produces labels that look accurate when reviewed with domain expertise (there are no ground-truth labels to score against)

In [1]:
import numpy as np
import pandas as pd
import warnings
import copy

from sklearn.model_selection import train_test_split

# Column and row display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_seq_items', None)

# Max column width so we can read play descriptions
pd.set_option('display.max_colwidth', None)

np.set_printoptions(threshold=np.inf)


# Notebook cell width display
from IPython.display import display, HTML
display(HTML("<style>:root { --jp-notebook-max-width: 98% !important; }</style>"))

# Float appearance, Pandas and NumPy
pd.set_option('display.float_format', '{:.2f}'.format)
np.set_printoptions(suppress=True, precision = 2)

# Suppress warnings
warnings.filterwarnings('ignore')

In [2]:
import sys
sys.path.append('../functions/')
import functions as fn

In [3]:
aggregate = pd.read_csv('../working_exports/aggregate.csv')

# DATA PREPARATION

## Labeling select players
- To "train" the model, I am assigning playing style labels to receivers I think are representative of each style
- The labels are:
  - Versatile
    - A receiver who possesses good speed, route running skills, hands, ball tracking ability, mid-air body control, and catch radius.
  - Speedster
    - Generally a smaller receiver who relies mainly on his speed to get open.
    - May also be a good route runner, but still relies more on speed.
    - May also have superb acceleration and deceleration.
  - Big Speedster
    - A big, strong receiver who also has great speed.
    - Sometimes has good route running skills, but relies more on pure athleticism.
  - Possession
    - A big, strong receiver who is usually targeted for short/intermediate yardage just past the line to gain, particularly on 3rd or 4th down, to keep possession.
    - Targeted in possession-saving situations because he has the physicality to make highly contested catches, as the line to gain is usually tightly defended.
    - May have good speed, but is typically a better route runner than a track star.
  - Route Technician
    - An expert route runner with elite footwork, agility, acceleration, deceleration, and understanding of defensive movements.
    - Can run all types of routes very well, whether they are straight-line routes or require a sudden change of direction.
    - May also have good speed.
    - Can be counted on for short, intermediate, and deep passes.
  - YAC Specialist
    - Not a very refined route runner, but great at catching short passes and gaining extra yards after the catch.
    - Has elite agility, acceleration, and deceleration to quickly change directions and weave through defenders.
  - Slot
    - A smaller receiver who typically does not have good speed, but has good route running skills and can get open in crowded areas.
    - Typically lines up in the slot (more inward, rather than near the sideline).
    - Can still sometimes free himself for deep passes.
- With these labels assigned to representative receivers, the model labels each unlabeled receiver according to the most similar labeled receivers (its nearest neighbors in principal-component space).

In [68]:
aggregate['playing_style'] = None

aggregate.loc[aggregate['player_name'] == 'Tyreek Hill', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Tyler Lockett', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Marquise Brown', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Marquez Valdes-Scantling', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Justin Jefferson', 'playing_style'] = 'Versatile'
aggregate.loc[aggregate['player_name'] == 'Ja\'Marr Chase', 'playing_style'] = 'Versatile'
aggregate.loc[aggregate['player_name'] == 'Nico Collins', 'playing_style'] = 'Versatile'
aggregate.loc[aggregate['player_name'] == 'DK Metcalf', 'playing_style'] = 'Big Speedster'
aggregate.loc[aggregate['player_name'] == 'A.J. Brown', 'playing_style'] = 'Big Speedster'
aggregate.loc[aggregate['player_name'] == 'Chase Claypool', 'playing_style'] = 'Big Speedster'
# aggregate.loc[aggregate['player_name'] == 'Josh Gordon', 'playing_style'] = 'Big Speedster'
aggregate.loc[aggregate['player_name'] == 'Tee Higgins', 'playing_style'] = 'Possession'
aggregate.loc[aggregate['player_name'] == 'Mike Evans', 'playing_style'] = 'Possession'
aggregate.loc[aggregate['player_name'] == 'Michael Thomas', 'playing_style'] = 'Possession'
aggregate.loc[aggregate['player_name'] == 'DeAndre Hopkins', 'playing_style'] = 'Possession'
aggregate.loc[aggregate['player_name'] == 'Stefon Diggs', 'playing_style'] = 'Route Technician'
aggregate.loc[aggregate['player_name'] == 'Davante Adams', 'playing_style'] = 'Route Technician'
aggregate.loc[aggregate['player_name'] == 'Jaylen Waddle', 'playing_style'] = 'Route Technician'
aggregate.loc[aggregate['player_name'] == 'Deebo Samuel', 'playing_style'] = 'YAC Specialist'
aggregate.loc[aggregate['player_name'] == 'Kadarius Toney', 'playing_style'] = 'YAC Specialist'
aggregate.loc[aggregate['player_name'] == 'Brandon Powell', 'playing_style'] = 'YAC Specialist'
aggregate.loc[aggregate['player_name'] == 'CeeDee Lamb', 'playing_style'] = 'Slot'
aggregate.loc[aggregate['player_name'] == 'Amon-Ra St. Brown', 'playing_style'] = 'Slot'
aggregate.loc[aggregate['player_name'] == 'Christian Kirk', 'playing_style'] = 'Slot'
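The 23 `.loc` assignments above can be collapsed into a single dictionary lookup with pandas' `Series.map`. This is a sketch on a toy DataFrame; `style_labels` is a hypothetical name, and only three of the labels are shown. One difference from the original: unlabeled players end up as `NaN` rather than `None`, which `notnull()` and `isnull()` treat the same way.

```python
import pandas as pd

# Hypothetical stand-in for the `aggregate` DataFrame (only the column we need)
aggregate = pd.DataFrame({
    'player_name': ['Tyreek Hill', 'Justin Jefferson', 'DK Metcalf', 'Some Unlabeled WR'],
})

# One dict entry per hand-labeled receiver (a subset of the labels above)
style_labels = {
    'Tyreek Hill': 'Speedster',
    'Justin Jefferson': 'Versatile',
    'DK Metcalf': 'Big Speedster',
}

# map() looks up each player_name in the dict; names not in the dict become NaN
aggregate['playing_style'] = aggregate['player_name'].map(style_labels)
print(aggregate)
```

Keeping the labels in one dict also makes it easy to add or comment out a receiver (as was done with Josh Gordon) without touching the assignment logic.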

In [69]:
aggregate.columns

Index(['player_name', 'player_position', 'season_year', 'player_game_count',
       'receptions', 'target', 'yards', 'att_yards', 'yards_after_catch',
       'yards_after_contact', 'touchdown', 'routes', 'pass_plays',
       'contested_receptions', 'contested_targets', 'weather_attempt',
       'difficult_attempt', 'difficult_catch', 'difficult_success_rate',
       'difficult_pct', 'weather_catch', 'qb_bf_attempt', 'qb_bf_catch',
       'hurry_up_attempt', 'hurry_up_catch', 'possession_saver_attempt',
       'possession_saver_catch', 'conversion_attempt', 'conversion_catch',
       'redzone_attempt', 'redzone_catch', 'clutch_catch', 'deep_attempt',
       'deep_catch', 'deep_sideline_attempt', 'deep_sideline_catch',
       'large_yac_catch', 'tackle_breaker_catch', 'beast_catch',
       'play_action_attempt', 'play_action_catch', 'rpo_attempt', 'rpo_catch',
       'cross_attempt', 'cross_catch', 'corner_attempt', 'corner_catch',
       'out_attempt', 'out_catch', 'curl_attempt', 'curl

In [70]:
aggregate.isnull().sum()

player_name                         0
player_position                     0
season_year                         0
player_game_count                   0
receptions                          0
target                              0
yards                               0
att_yards                           0
yards_after_catch                   0
yards_after_contact                 0
touchdown                           0
routes                              0
pass_plays                          0
contested_receptions                0
contested_targets                   0
weather_attempt                     0
difficult_attempt                   0
difficult_catch                     0
difficult_success_rate              0
difficult_pct                       0
weather_catch                       0
qb_bf_attempt                       0
qb_bf_catch                         0
hurry_up_attempt                    0
hurry_up_catch                      0
possession_saver_attempt            0
possession_s

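No missing values appear in the columns shown above. Next, we drop the identifier and categorical columns, along with raw counting stats and success-rate metrics, keeping the rate-based utilization features and combine measurements.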

## Column filtering

In [71]:
# Keep only rate-based utilization features and combine measurements for PCA.
# Dropped: identifiers and the label column, raw counting stats, and success-rate metrics.

# FRINGE COLUMNS - Consider adding back
# 'corner_success_rate', 'out_success_rate',
drop_columns = [
    # Identifiers and the label column
    'player_name', 'player_position', 'season_year', 'playing_style',
    # Volume and production totals
    'player_game_count', 'receptions', 'target', 'yards', 'att_yards', 'yards_after_catch',
    'yards_after_contact', 'touchdown', 'routes', 'pass_plays', 'contested_receptions', 'contested_targets',
    'clutch_catch', 'large_yac_catch', 'tackle_breaker_catch', 'beast_catch',
    # Alignment snap counts and route rate
    'slot_snaps', 'wide_snaps', 'route_rate',
    # Situational attempts and catches
    'weather_attempt', 'weather_catch', 'difficult_attempt', 'difficult_catch', 'qb_bf_attempt', 'qb_bf_catch',
    'hurry_up_attempt', 'hurry_up_catch', 'possession_saver_attempt', 'possession_saver_catch',
    'conversion_attempt', 'conversion_catch', 'redzone_attempt', 'redzone_catch', 'deep_attempt', 'deep_catch',
    'deep_sideline_attempt', 'deep_sideline_catch', 'play_action_attempt', 'play_action_catch', 'rpo_attempt', 'rpo_catch',
    # Route-type attempts and catches
    'cross_attempt', 'cross_catch', 'corner_attempt', 'corner_catch', 'out_attempt', 'out_catch',
    'curl_attempt', 'curl_catch', 'post_attempt', 'post_catch', 'underneath_screen_attempt', 'underneath_screen_catch',
    'flat_attempt', 'flat_catch', 'slant_attempt', 'slant_catch', 'wr_screen_attempt', 'wr_screen_catch',
    'comeback_attempt', 'comeback_catch', 'go_attempt', 'go_catch', 'in_attempt', 'in_catch',
    # Success rates (utilization metrics are favored over success metrics)
    'difficult_success_rate', 'cross_success_rate', 'curl_success_rate', 'post_success_rate', 'underneath_screen_success_rate',
    'flat_success_rate', 'slant_success_rate', 'wr_screen_success_rate', 'comeback_success_rate', 'go_success_rate',
    'corner_success_rate', 'out_success_rate', 'in_success_rate', 'deep_success_rate', 'play_action_success_rate',
    'rpo_success_rate', 'hurry_up_success_rate', 'deep_sideline_success_rate', 'possession_saver_success_rate',
]
scalable_features_df = aggregate.drop(drop_columns, axis=1)
scalable_features_df.head()

Unnamed: 0,difficult_pct,slot_rate,wide_rate,contested_catch_rate,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,deep_sideline_pct,possession_saver_pct,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone
0,0.18,0.3,0.69,0.56,0.03,0.1,0.21,0.1,0.07,0.0,0.09,0.07,0.07,0.1,0.11,0.08,0.15,0.24,0.02,0.18,0.11,0.58,0.21,0.1,10.1,4.88,1.03,0.7,2.62,73.25,202,4.43,14,37.5,126,4.27,7.02
1,0.16,0.42,0.54,0.52,0.09,0.04,0.13,0.11,0.11,0.0,0.04,0.11,0.08,0.1,0.14,0.05,0.22,0.41,0.15,0.03,0.09,0.65,0.2,0.04,12.39,4.05,0.49,0.7,3.2,68.13,185,4.29,13,40.5,129,4.06,6.53
2,0.29,0.3,0.7,0.44,0.03,0.06,0.14,0.12,0.08,0.0,0.04,0.08,0.04,0.09,0.19,0.11,0.2,0.2,0.04,0.05,0.11,0.56,0.22,0.07,11.83,4.93,0.95,0.56,2.45,72.88,212,4.56,14,39.5,123,4.3,6.82
3,0.26,0.26,0.74,0.5,0.03,0.06,0.12,0.11,0.01,0.01,0.04,0.24,0.04,0.09,0.15,0.1,0.19,0.32,0.3,0.17,0.14,0.58,0.17,0.04,12.1,6.23,2.18,0.61,2.59,72.5,226,4.49,19,36.5,120,4.25,7.0
4,0.18,0.34,0.66,0.5,0.03,0.08,0.11,0.13,0.08,0.01,0.1,0.14,0.06,0.09,0.1,0.06,0.15,0.3,0.17,0.1,0.09,0.56,0.18,0.12,11.23,3.88,0.93,0.7,2.49,72.0,195,4.46,11,35.0,115,4.32,7.03


In [72]:
scalable_features_df.columns

Index(['difficult_pct', 'slot_rate', 'wide_rate', 'contested_catch_rate',
       'cross_pct', 'corner_pct', 'out_pct', 'curl_pct', 'post_pct',
       'underneath_screen_pct', 'flat_pct', 'slant_pct', 'wr_screen_pct',
       'comeback_pct', 'go_pct', 'in_pct', 'deep_pct', 'play_action_pct',
       'rpo_pct', 'hurry_up_pct', 'deep_sideline_pct', 'possession_saver_pct',
       'conversion_pct', 'redzone_pct', 'adot', 'avg_yac', 'avg_yacon',
       'catch_rate', 'yprr', 'height_in', 'weight_lbs', '40', 'bench',
       'vertical', 'broad_jump', 'shuttle', '3_cone'],
      dtype='object')

In [73]:
scalable_features_df.loc[2, ['cross_pct', 'corner_pct', 'out_pct', 'curl_pct',  'post_pct', 'underneath_screen_pct', 'flat_pct', 'slant_pct', 'wr_screen_pct', 'comeback_pct', 'go_pct', 'in_pct']].sum()

0.9999999999999996
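The cell above checks the route-type shares for one player. A sketch that checks every row at once, comparing with a tolerance so that floating-point results like `0.9999999999999996` still count as 1; the random toy data is a stand-in for `scalable_features_df`.

```python
import numpy as np
import pandas as pd

route_cols = ['cross_pct', 'corner_pct', 'out_pct', 'curl_pct', 'post_pct',
              'underneath_screen_pct', 'flat_pct', 'slant_pct', 'wr_screen_pct',
              'comeback_pct', 'go_pct', 'in_pct']

# Toy rows standing in for scalable_features_df: each row's route shares sum to 1
rng = np.random.default_rng(0)
shares = rng.random((5, len(route_cols)))
shares = shares / shares.sum(axis=1, keepdims=True)
df = pd.DataFrame(shares, columns=route_cols)

# Sum across the route columns for every player, then compare with a tolerance
row_sums = df[route_cols].sum(axis=1)
all_sum_to_one = np.allclose(row_sums, 1.0)
print(all_sum_to_one)
```

On the real data, any row that fails the check (for example, a player with no targets and all-zero shares) can be listed with `row_sums[~np.isclose(row_sums, 1.0)]`. Because these 12 shares add up to 1 for every player with targets, they are linearly dependent, which may be why the smallest eigenvalue further down rounds to 0.00.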

# PCA

## Feature scaling
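- Standardizing each feature (z-score: subtract the mean, divide by the standard deviation) so that features measured in very different units, such as inches, pounds, and seconds, contribute on a comparable scale to PCA
  - Standardization does not remove outliers: extreme values (e.g., the `broad_jump` minimum of 10.0 in the output below) still pull on the means and standard deviations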

In [74]:
scalable_features_df.describe()

Unnamed: 0,difficult_pct,slot_rate,wide_rate,contested_catch_rate,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,deep_sideline_pct,possession_saver_pct,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone
count,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0
mean,0.22,0.38,0.6,0.36,0.07,0.06,0.11,0.16,0.05,0.01,0.07,0.1,0.08,0.08,0.13,0.08,0.17,0.19,0.09,0.13,0.1,0.51,0.19,0.05,10.73,4.24,1.12,0.61,1.35,72.33,199.08,4.48,14.34,35.97,122.5,4.26,6.98
std,0.15,0.23,0.24,0.29,0.12,0.07,0.09,0.18,0.09,0.07,0.11,0.12,0.13,0.1,0.13,0.07,0.15,0.17,0.12,0.16,0.13,0.22,0.12,0.06,5.65,3.2,1.72,0.19,0.98,2.43,15.92,0.1,3.94,2.7,9.41,0.14,0.18
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-4.0,0.0,0.0,0.0,0.0,66.38,155.0,4.25,2.0,28.5,10.0,3.8,6.51
25%,0.15,0.2,0.41,0.0,0.0,0.0,0.03,0.08,0.0,0.0,0.0,0.02,0.0,0.0,0.05,0.0,0.07,0.1,0.0,0.03,0.02,0.41,0.13,0.0,7.87,2.6,0.4,0.51,0.81,70.38,186.0,4.42,12.0,34.5,120.0,4.19,6.88
50%,0.2,0.31,0.67,0.39,0.04,0.04,0.11,0.13,0.04,0.0,0.04,0.08,0.03,0.06,0.11,0.07,0.16,0.19,0.07,0.09,0.08,0.54,0.2,0.04,10.73,3.93,0.86,0.62,1.23,72.63,201.0,4.49,14.0,35.5,123.0,4.25,7.0
75%,0.27,0.56,0.79,0.54,0.08,0.08,0.15,0.2,0.07,0.0,0.1,0.14,0.09,0.11,0.17,0.11,0.23,0.25,0.14,0.18,0.14,0.65,0.26,0.08,13.01,5.07,1.25,0.7,1.71,74.0,211.0,4.55,16.0,37.5,125.0,4.32,7.05
max,1.0,1.0,1.0,1.0,1.0,0.5,0.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.33,1.0,1.0,1.0,1.0,1.0,1.0,0.75,0.44,45.0,34.0,18.0,1.0,10.0,77.38,238.0,4.75,29.0,45.0,140.0,5.01,7.64


In [75]:
mean = scalable_features_df.mean()
std = scalable_features_df.std()
scaled_features = (scalable_features_df - mean) / std
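A small NumPy sketch of the same z-score standardization on toy numbers (a few combine-style values, chosen only for illustration). pandas' `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), so the sketch passes `ddof=1` explicitly to match the cell above; scikit-learn's `StandardScaler` uses the population standard deviation (`ddof=0`) instead, so its output differs from this cell by a factor of sqrt(n / (n - 1)).

```python
import numpy as np

# Toy feature matrix: rows = players, columns = height (in), weight (lbs), 40 time (s)
X = np.array([[73.25, 202.0, 4.43],
              [68.13, 185.0, 4.29],
              [72.88, 212.0, 4.56],
              [72.50, 226.0, 4.49]])

# Match pandas' default sample standard deviation (ddof=1)
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
Z = (X - mean) / std

print(Z.mean(axis=0).round(6))         # each column is centered at (about) 0
print(Z.std(axis=0, ddof=1).round(6))  # each column has unit sample standard deviation
```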

In [76]:
# Convert the scaled data to a NumPy array
scaled_features_array = scaled_features.to_numpy()

## 1. Covariance Matrix

In [77]:
# Transpose so that each row holds one feature's values across all players
transposed_data = scaled_features_array.T

In [78]:
scaled_features_array.shape, transposed_data.shape

((225, 37), (37, 225))

In [79]:
# Initialize an empty covariance matrix
n_features = len(transposed_data)
cov_matrix = [[0 for _ in range(n_features)] for _ in range(n_features)]
n_features

37

In [80]:
# Calculate the covariance matrix
for i in range(n_features):     # Row of the covariance matrix: feature i (0 to 36)
    for j in range(n_features): # Column: feature j, paired with the fixed feature i
        cov_matrix[i][j] = fn.calculate_covariance(transposed_data[i], transposed_data[j])

In [81]:
cov_matrix_df = pd.DataFrame(cov_matrix)
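The double loop fills the matrix one entry at a time via `fn.calculate_covariance`, whose definition isn't shown here. For centered data, the whole sample covariance matrix is a single matrix product, Z^T Z / (n - 1). The sketch below uses random toy data and assumes `calculate_covariance` uses the n - 1 denominator, like `np.cov`. That assumption is consistent with the eigenvalues further down summing to about 37 (the number of features), since standardized features then have unit variance and the covariance matrix is also their correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(42)
Z = rng.standard_normal((225, 5))                 # toy stand-in: 225 players x 5 features
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)  # standardize like the notebook

n = Z.shape[0]
# For centered data, the sample covariance matrix is Z^T Z / (n - 1)
cov_matrix = Z.T @ Z / (n - 1)

# np.cov treats rows as variables by default, so rowvar=False for players-by-features data
reference = np.cov(Z, rowvar=False)
print(np.allclose(cov_matrix, reference))
```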

## 2. Eigenvalues and eigenvectors

In [82]:
# Step 2: Compute the Eigenvalues and Eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

In [83]:
eigenvalues_df = pd.DataFrame(eigenvalues)
eigenvalues_df.head()

Unnamed: 0,0
0,5.53
1,3.06
2,2.39
3,2.25
4,2.07


In [84]:
eigenvectors_df = pd.DataFrame(eigenvectors)

In [85]:
eigenvectors_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36
0,0.2,-0.1,-0.05,0.27,0.04,-0.2,-0.05,-0.15,0.0,-0.25,0.06,-0.32,0.11,-0.07,-0.05,0.12,-0.02,-0.02,-0.0,-0.03,0.05,0.11,0.18,0.28,-0.08,0.32,-0.14,0.35,-0.08,0.07,-0.1,0.17,-0.09,0.29,-0.08,-0.15,0.24
1,-0.22,-0.21,0.22,0.13,-0.02,-0.07,-0.3,-0.04,0.32,-0.07,0.06,0.05,0.05,-0.01,-0.18,0.25,0.13,0.02,-0.0,0.26,0.62,-0.02,-0.06,-0.04,-0.08,-0.03,0.14,-0.05,0.04,0.05,0.01,-0.1,0.12,-0.02,0.05,0.06,0.05
2,0.24,0.19,-0.18,-0.12,0.01,0.03,0.25,0.07,-0.36,0.02,-0.08,-0.03,-0.17,0.11,0.19,-0.15,-0.1,-0.01,-0.0,0.26,0.65,0.03,0.05,0.01,-0.08,0.06,-0.01,0.05,-0.02,-0.09,-0.07,0.01,-0.04,-0.04,-0.06,-0.1,-0.1
3,0.06,0.14,0.13,-0.22,-0.25,-0.03,0.06,0.03,0.07,0.18,-0.05,-0.26,-0.0,0.13,-0.32,0.29,-0.02,-0.49,0.0,0.05,-0.05,-0.02,-0.15,-0.1,0.05,-0.1,-0.05,0.11,-0.02,-0.2,-0.32,-0.12,-0.2,-0.01,-0.19,-0.08,-0.01
4,-0.02,-0.08,-0.15,0.08,0.11,-0.14,0.22,-0.41,-0.09,0.05,0.37,-0.14,-0.03,0.22,0.12,0.32,-0.25,-0.14,0.31,-0.01,-0.02,0.08,0.01,-0.13,-0.07,-0.11,0.01,-0.33,0.09,0.01,0.15,0.01,0.18,0.06,-0.02,0.03,0.01
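Because the covariance matrix is symmetric, `np.linalg.eigh` is a natural alternative to `np.linalg.eig`: it guarantees real eigenvalues and returns them in ascending order, whereas `eig` gives no ordering guarantee (the `sorted_index` output below is not simply 0 to 36). A sketch on toy data; note that eigenvector signs are arbitrary, so individual PCs may come out flipped compared with `eig`, which does not affect KNN distances.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.standard_normal((100, 4))
cov = np.cov(data, rowvar=False)   # symmetric by construction

# eigh is specialized for symmetric matrices: real eigenvalues in ascending order,
# with the matching eigenvectors stored as the columns of `vecs`
vals, vecs = np.linalg.eigh(cov)

# Reverse to descending order, as the notebook does with argsort()[::-1]
vals, vecs = vals[::-1], vecs[:, ::-1]

# Each column v_k satisfies cov @ v_k = vals[k] * v_k
print(np.allclose(cov @ vecs, vecs * vals))
```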


## 3. Sort eigenvectors and eigenvalues by eigenvalue magnitude

In [86]:
sorted_index = np.argsort(eigenvalues)[::-1] # argsort returns the indices that would sort the eigenvalues in ascending order.
                                             # [::-1] is start:stop:step with start and stop omitted and a step of -1, which reverses the array.
                                             # The indices now run from the largest eigenvalue to the smallest.
sorted_eigenvalues = eigenvalues[sorted_index]
sorted_eigenvectors = eigenvectors[:,sorted_index] # Reorders the columns (each column is an eigenvector) to match the sorted eigenvalues. Rows (features) stay the same.
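A tiny worked example of the sorting step, with made-up eigenvalues and the identity matrix standing in for the eigenvectors, so it is easy to see which column moves where:

```python
import numpy as np

eigenvalues = np.array([0.5, 3.0, 0.0, 1.2])
eigenvectors = np.eye(4)   # column k is the (toy) eigenvector for eigenvalues[k]

order = np.argsort(eigenvalues)[::-1]  # indices from largest to smallest eigenvalue
sorted_vals = eigenvalues[order]
sorted_vecs = eigenvectors[:, order]   # reorder columns, keep rows (features)

print(order)  # → [1 3 0 2]
```

The first column of `sorted_vecs` is the eigenvector that belonged to the largest eigenvalue (3.0), which is exactly what the `[:, sorted_index]` indexing above does for all 37 columns.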

In [87]:
sorted_index

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 27, 28, 29, 30, 31, 34, 36, 35, 33, 32, 26, 25, 24, 23, 22, 21,
       20, 19, 18])

In [88]:
sorted_eigenvalues

array([5.53, 3.06, 2.39, 2.25, 2.07, 1.84, 1.54, 1.51, 1.48, 1.39, 1.2 ,
       1.15, 1.1 , 1.02, 0.95, 0.89, 0.82, 0.77, 0.69, 0.66, 0.62, 0.55,
       0.53, 0.45, 0.42, 0.4 , 0.37, 0.34, 0.22, 0.2 , 0.17, 0.15, 0.12,
       0.1 , 0.05, 0.04, 0.  ])

In [89]:
sorted_eigenvectors_df = pd.DataFrame(sorted_eigenvectors)
sorted_eigenvectors_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36
0,0.2,-0.1,-0.05,0.27,0.04,-0.2,-0.05,-0.15,0.0,-0.25,0.06,-0.32,0.11,-0.07,-0.05,0.12,-0.02,-0.02,0.35,-0.08,0.07,-0.1,0.17,-0.08,0.24,-0.15,0.29,-0.09,-0.14,0.32,-0.08,0.28,0.18,0.11,0.05,-0.03,-0.0
1,-0.22,-0.21,0.22,0.13,-0.02,-0.07,-0.3,-0.04,0.32,-0.07,0.06,0.05,0.05,-0.01,-0.18,0.25,0.13,0.02,-0.05,0.04,0.05,0.01,-0.1,0.05,0.05,0.06,-0.02,0.12,0.14,-0.03,-0.08,-0.04,-0.06,-0.02,0.62,0.26,-0.0
2,0.24,0.19,-0.18,-0.12,0.01,0.03,0.25,0.07,-0.36,0.02,-0.08,-0.03,-0.17,0.11,0.19,-0.15,-0.1,-0.01,0.05,-0.02,-0.09,-0.07,0.01,-0.06,-0.1,-0.1,-0.04,-0.04,-0.01,0.06,-0.08,0.01,0.05,0.03,0.65,0.26,-0.0
3,0.06,0.14,0.13,-0.22,-0.25,-0.03,0.06,0.03,0.07,0.18,-0.05,-0.26,-0.0,0.13,-0.32,0.29,-0.02,-0.49,0.11,-0.02,-0.2,-0.32,-0.12,-0.19,-0.01,-0.08,-0.01,-0.2,-0.05,-0.1,0.05,-0.1,-0.15,-0.02,-0.05,0.05,0.0
4,-0.02,-0.08,-0.15,0.08,0.11,-0.14,0.22,-0.41,-0.09,0.05,0.37,-0.14,-0.03,0.22,0.12,0.32,-0.25,-0.14,-0.33,0.09,0.01,0.15,0.01,-0.02,0.01,0.03,0.06,0.18,0.01,-0.11,-0.07,-0.13,0.01,0.08,-0.02,-0.01,0.31


## 4. Select subset of eigenvectors to form principal components

In [90]:
# Cumulative sum divided by the total.
# Element k is the proportion of total variance explained by the first k+1 principal components.
cumulative_var_explained = np.cumsum(sorted_eigenvalues) / np.sum(sorted_eigenvalues)
cumulative_var_explained

array([0.15, 0.23, 0.3 , 0.36, 0.41, 0.46, 0.5 , 0.55, 0.59, 0.62, 0.66,
       0.69, 0.72, 0.74, 0.77, 0.79, 0.82, 0.84, 0.86, 0.87, 0.89, 0.9 ,
       0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.98, 0.99, 0.99, 1.  ,
       1.  , 1.  , 1.  , 1.  ])

In [91]:
# Finds the indices where cumulative variance explained is at least 95%.
# These indices determine how many PCs are needed to explain at least 95% of the total variance.
# [0][0] To access the first index from the first array
# +1 because Python is 0-indexed
# Returns the number of PCs needed to explain at least 95% of the variance.
num_components = np.where(cumulative_var_explained >= 0.95)[0][0] + 1 

In [92]:
np.where(cumulative_var_explained >= 0.95) # Returns a tuple holding one index array per dimension

(array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]),)

In [93]:
num_components

26
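The same selection rule on a toy eigenvalue spectrum, using `np.argmax` on the boolean mask as a shorter equivalent of `np.where(...)[0][0]`. One caveat: if no element qualifies, `np.where(...)[0][0]` raises an `IndexError`, while `argmax` silently returns 0, so the threshold should stay below 1.

```python
import numpy as np

# Toy eigenvalues, already sorted in descending order (they sum to 10)
sorted_eigenvalues = np.array([4.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3])

# Cumulative share of total variance explained by the first k components
cumulative = np.cumsum(sorted_eigenvalues) / sorted_eigenvalues.sum()

# argmax returns the index of the first True; +1 turns a 0-based index into a count
threshold = 0.95
num_components = int(np.argmax(cumulative >= threshold)) + 1
print(num_components)  # → 6
```

Here the first five components explain 93% of the variance and the sixth brings the total to 97%, so six are kept, mirroring how 26 of the 37 components were chosen above.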

In [94]:
pca_components = sorted_eigenvectors[:, :num_components]

In [95]:
pca_components_df = pd.DataFrame(pca_components)
pca_components_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25
0,0.2,-0.1,-0.05,0.27,0.04,-0.2,-0.05,-0.15,0.0,-0.25,0.06,-0.32,0.11,-0.07,-0.05,0.12,-0.02,-0.02,0.35,-0.08,0.07,-0.1,0.17,-0.08,0.24,-0.15
1,-0.22,-0.21,0.22,0.13,-0.02,-0.07,-0.3,-0.04,0.32,-0.07,0.06,0.05,0.05,-0.01,-0.18,0.25,0.13,0.02,-0.05,0.04,0.05,0.01,-0.1,0.05,0.05,0.06
2,0.24,0.19,-0.18,-0.12,0.01,0.03,0.25,0.07,-0.36,0.02,-0.08,-0.03,-0.17,0.11,0.19,-0.15,-0.1,-0.01,0.05,-0.02,-0.09,-0.07,0.01,-0.06,-0.1,-0.1
3,0.06,0.14,0.13,-0.22,-0.25,-0.03,0.06,0.03,0.07,0.18,-0.05,-0.26,-0.0,0.13,-0.32,0.29,-0.02,-0.49,0.11,-0.02,-0.2,-0.32,-0.12,-0.19,-0.01,-0.08
4,-0.02,-0.08,-0.15,0.08,0.11,-0.14,0.22,-0.41,-0.09,0.05,0.37,-0.14,-0.03,0.22,0.12,0.32,-0.25,-0.14,-0.33,0.09,0.01,0.15,0.01,-0.02,0.01,0.03


## 5. Transform the original data

In [96]:
pca_transformed_data = np.dot(scaled_features_array, pca_components)

In [97]:
# Creating a DataFrame of the PCA-transformed data
pca_df = pd.DataFrame(pca_transformed_data, columns=[f'PC{i+1}' for i in range(num_components)])

In [98]:
pca_df.head()

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25,PC26
0,0.36,0.56,0.5,-1.57,-0.33,-0.53,0.61,0.69,0.44,0.18,-0.67,-0.28,0.26,-0.29,0.65,0.15,-0.25,-0.54,-0.14,0.07,0.96,-0.05,0.14,-0.25,-0.03,0.08
1,0.17,-2.78,-0.27,-3.3,-0.85,0.11,1.11,-1.11,0.81,0.02,0.16,0.46,0.01,-0.64,-0.15,0.18,0.38,-0.01,0.7,0.35,0.0,0.96,-0.77,-0.45,-0.18,-0.99
2,1.07,0.52,0.12,-0.69,0.08,-0.44,0.24,-0.25,0.32,0.3,0.45,0.09,0.14,-0.73,-0.2,-0.31,0.28,-0.2,0.17,0.29,0.89,-0.24,0.54,0.13,-0.77,-1.13
3,0.86,1.14,1.24,-1.51,0.8,0.68,-0.54,-1.21,-0.16,0.14,-0.06,-0.77,0.0,0.35,-0.33,-0.55,-0.6,0.34,1.11,0.32,0.01,-0.12,0.41,-0.09,-0.35,0.07
4,-0.07,0.06,1.68,-0.75,-0.66,0.06,0.49,-0.24,-0.64,0.23,-0.69,-0.15,-0.26,-0.09,0.19,-0.14,-0.05,-0.29,-0.23,-0.83,0.21,0.51,-0.05,0.06,0.14,0.09


In [99]:
# Display the shape of the original and the PCA-transformed data
original_shape = scaled_features_array.shape
pca_shape = pca_transformed_data.shape
original_shape, pca_shape, pca_components.shape

((225, 37), (225, 26), (37, 26))
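A sanity check for the projection: if the kept components are orthonormal eigenvectors of the covariance matrix, the PC scores should be uncorrelated, and each PC's variance should equal its eigenvalue. A sketch of that check on toy data; on the real data, the analogous comparison would be between `np.cov(pca_transformed_data, rowvar=False)` and `np.diag(sorted_eigenvalues[:num_components])`.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((225, 6)) @ rng.standard_normal((6, 6))  # correlated toy features
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)                 # standardize

cov = np.cov(Z, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
vals, vecs = vals[::-1], vecs[:, ::-1]   # descending order

k = 3
scores = Z @ vecs[:, :k]                 # (225, 6) @ (6, 3) -> (225, 3)

# Covariance of the PC scores: diagonal, with the top-k eigenvalues on the diagonal
score_cov = np.cov(scores, rowvar=False)
print(scores.shape)  # → (225, 3)
print(np.allclose(score_cov, np.diag(vals[:k])))
```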

# KNN

In [100]:
pca_df['playing_style'] = aggregate['playing_style'].values

In [101]:
pca_df.head()

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25,PC26,playing_style
0,0.36,0.56,0.5,-1.57,-0.33,-0.53,0.61,0.69,0.44,0.18,-0.67,-0.28,0.26,-0.29,0.65,0.15,-0.25,-0.54,-0.14,0.07,0.96,-0.05,0.14,-0.25,-0.03,0.08,Versatile
1,0.17,-2.78,-0.27,-3.3,-0.85,0.11,1.11,-1.11,0.81,0.02,0.16,0.46,0.01,-0.64,-0.15,0.18,0.38,-0.01,0.7,0.35,0.0,0.96,-0.77,-0.45,-0.18,-0.99,Speedster
2,1.07,0.52,0.12,-0.69,0.08,-0.44,0.24,-0.25,0.32,0.3,0.45,0.09,0.14,-0.73,-0.2,-0.31,0.28,-0.2,0.17,0.29,0.89,-0.24,0.54,0.13,-0.77,-1.13,Route Technician
3,0.86,1.14,1.24,-1.51,0.8,0.68,-0.54,-1.21,-0.16,0.14,-0.06,-0.77,0.0,0.35,-0.33,-0.55,-0.6,0.34,1.11,0.32,0.01,-0.12,0.41,-0.09,-0.35,0.07,Big Speedster
4,-0.07,0.06,1.68,-0.75,-0.66,0.06,0.49,-0.24,-0.64,0.23,-0.69,-0.15,-0.26,-0.09,0.19,-0.14,-0.05,-0.29,-0.23,-0.83,0.21,0.51,-0.05,0.06,0.14,0.09,Route Technician


### Train and test split
- Train - Consists of the 23 receivers who were manually labeled with playing styles
- Test - The receivers who are not yet labeled with a playing style

In [102]:
train = pca_df[pca_df['playing_style'].notnull()]
test = pca_df[pca_df['playing_style'].isnull()]

In [103]:
train.shape, test.shape

((23, 27), (202, 27))

### Separating features from target variable

In [104]:
X_train = train.drop('playing_style', axis = 1)
y_train = train['playing_style']

X_test = test.drop('playing_style', axis = 1)
# y_test = test['playing_style']

### Features

In [105]:
pca_df.columns

Index(['PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'PC6', 'PC7', 'PC8', 'PC9', 'PC10',
       'PC11', 'PC12', 'PC13', 'PC14', 'PC15', 'PC16', 'PC17', 'PC18', 'PC19',
       'PC20', 'PC21', 'PC22', 'PC23', 'PC24', 'PC25', 'PC26',
       'playing_style'],
      dtype='object')

In [106]:
features = ['PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'PC6', 'PC7', 'PC8', 'PC9', 'PC10',
       'PC11', 'PC12', 'PC13', 'PC14', 'PC15', 'PC16', 'PC17', 'PC18', 'PC19',
       'PC20', 'PC21', 'PC22', 'PC23', 'PC24', 'PC25', 'PC26']

### Run the model on the test dataset
- With 6 nearest neighbors (the last argument to `fn.knn`)

In [107]:
# Classify each unlabeled receiver (one row at a time) from its nearest labeled neighbors
X_test['playing_style'] = X_test.apply(lambda row: fn.knn(features, X_train, row, y_train, 6), axis = 1)
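`fn.knn` lives in `../functions/functions.py` and isn't shown, so here is a hypothetical minimal version of what such a helper typically does, assuming Euclidean distance and an unweighted majority vote. The real function may differ, and `knn_predict` is an illustrative name. One consequence worth keeping in mind: no style has more than 4 labeled receivers, so if the last argument is k = 6, every vote necessarily mixes at least two styles.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Hypothetical stand-in for fn.knn: majority vote among the k nearest
    labeled points, using Euclidean distance."""
    distances = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    # On ties, most_common keeps first-encountered order, i.e. the closer neighbor's label
    return votes.most_common(1)[0][0]

# Usage on toy 2-D "principal components"
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = np.array(['Slot', 'Slot', 'Speedster', 'Speedster', 'Speedster'])
print(knn_predict(train_X, train_y, np.array([4.9, 5.0]), k=3))  # → Speedster
```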

## Checking results

In [108]:
combined_playing_style = pd.concat([train['playing_style'], X_test['playing_style']])

# Copy so that adding playing_style to final does not alter the original aggregate df
final = aggregate.copy()
final['playing_style'] = combined_playing_style
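Plain assignment such as `final = aggregate` binds a second name to the same DataFrame, so overwriting `playing_style` through `final` also changes `aggregate`; `DataFrame.copy()` (a deep copy by default) avoids that. A toy demonstration with made-up player names:

```python
import pandas as pd

aggregate = pd.DataFrame({'player_name': ['A', 'B'], 'playing_style': [None, None]})
alias = aggregate                    # same object: nothing is copied
alias['playing_style'] = ['Slot', 'Speedster']
print(aggregate['playing_style'].tolist())   # the "original" changed too

aggregate2 = pd.DataFrame({'player_name': ['A', 'B'], 'playing_style': [None, None]})
independent = aggregate2.copy()      # deep copy by default
independent['playing_style'] = ['Slot', 'Speedster']
print(aggregate2['playing_style'].tolist())  # unchanged
```

The assignment itself still lines up correctly because `train` and `X_test` keep the original row indices from `pca_df`, which match `aggregate`'s index.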

In [109]:
pd.set_option('display.float_format', '{:.2f}'.format)

In [110]:
final = final[['player_name', 'playing_style', 'height_in', 'weight_lbs', '40', 'bench',
       'vertical', 'broad_jump', 'shuttle', '3_cone', 'cross_pct', 'corner_pct', 'out_pct', 'curl_pct', 'post_pct', 'underneath_screen_pct',
       'flat_pct', 'slant_pct', 'wr_screen_pct', 'comeback_pct', 'go_pct', 'in_pct', 'deep_pct', 'play_action_pct', 'rpo_pct',
       'hurry_up_pct', 'difficult_pct', 'deep_sideline_pct', 'possession_saver_pct', 'clutch_catch', 'conversion_pct', 'redzone_pct',
               'adot', 'avg_yac', 'avg_yacon', 'catch_rate', 'yprr', 'slot_rate', 'wide_rate', 'contested_catch_rate'
       ]]

In [111]:
final.head(10)

Unnamed: 0,player_name,playing_style,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,difficult_pct,deep_sideline_pct,possession_saver_pct,clutch_catch,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,slot_rate,wide_rate,contested_catch_rate
0,Justin Jefferson,Versatile,73.25,202,4.43,14,37.5,126,4.27,7.02,0.03,0.1,0.21,0.1,0.07,0.0,0.09,0.07,0.07,0.1,0.11,0.08,0.15,0.24,0.02,0.18,0.18,0.11,0.58,6,0.21,0.1,10.1,4.88,1.03,0.7,2.62,0.3,0.69,0.56
1,Tyreek Hill,Speedster,68.13,185,4.29,13,40.5,129,4.06,6.53,0.09,0.04,0.13,0.11,0.11,0.0,0.04,0.11,0.08,0.1,0.14,0.05,0.22,0.41,0.15,0.03,0.16,0.09,0.65,3,0.2,0.04,12.39,4.05,0.49,0.7,3.2,0.42,0.54,0.52
2,Davante Adams,Route Technician,72.88,212,4.56,14,39.5,123,4.3,6.82,0.03,0.06,0.14,0.12,0.08,0.0,0.04,0.08,0.04,0.09,0.19,0.11,0.2,0.2,0.04,0.05,0.29,0.11,0.56,1,0.22,0.07,11.83,4.93,0.95,0.56,2.45,0.3,0.7,0.44
3,A.J. Brown,Big Speedster,72.5,226,4.49,19,36.5,120,4.25,7.0,0.03,0.06,0.12,0.11,0.01,0.01,0.04,0.24,0.04,0.09,0.15,0.1,0.19,0.32,0.3,0.17,0.26,0.14,0.58,2,0.17,0.04,12.1,6.23,2.18,0.61,2.59,0.26,0.74,0.5
4,Stefon Diggs,Route Technician,72.0,195,4.46,11,35.0,115,4.32,7.03,0.03,0.08,0.11,0.13,0.08,0.01,0.1,0.14,0.06,0.09,0.1,0.06,0.15,0.3,0.17,0.1,0.18,0.09,0.56,4,0.18,0.12,11.23,3.88,0.93,0.7,2.49,0.34,0.66,0.5
5,CeeDee Lamb,Slot,73.63,198,4.5,11,34.5,124,4.24,7.0,0.08,0.08,0.11,0.15,0.08,0.0,0.1,0.13,0.06,0.06,0.09,0.06,0.17,0.3,0.11,0.1,0.18,0.08,0.52,2,0.16,0.04,10.08,4.54,1.38,0.69,2.38,0.62,0.36,0.46
6,Jaylen Waddle,Route Technician,69.5,180,4.55,11,34.0,122,4.22,6.99,0.08,0.02,0.1,0.07,0.15,0.0,0.01,0.17,0.07,0.12,0.09,0.14,0.15,0.33,0.15,0.06,0.18,0.07,0.68,2,0.23,0.04,11.8,6.8,1.99,0.64,2.59,0.25,0.74,0.25
7,DeVonta Smith,Route Technician,72.25,170,4.53,9,34.0,131,4.22,6.95,0.07,0.07,0.15,0.15,0.03,0.02,0.05,0.09,0.13,0.13,0.07,0.04,0.15,0.14,0.17,0.18,0.15,0.1,0.45,1,0.2,0.05,9.68,5.16,1.03,0.7,1.98,0.25,0.75,0.42
8,Terry McLaurin,Versatile,72.13,208,4.35,18,37.5,125,4.15,7.01,0.08,0.07,0.12,0.1,0.03,0.0,0.05,0.11,0.09,0.08,0.16,0.1,0.23,0.27,0.16,0.08,0.22,0.14,0.61,2,0.26,0.04,12.81,5.12,1.86,0.64,2.04,0.21,0.79,0.65
9,Amon-Ra St. Brown,Slot,71.5,197,4.61,20,38.5,127,4.26,6.9,0.08,0.03,0.26,0.15,0.06,0.02,0.05,0.11,0.09,0.04,0.01,0.11,0.04,0.23,0.03,0.05,0.15,0.03,0.38,2,0.18,0.05,6.47,4.87,0.9,0.73,2.4,0.6,0.39,0.38


In [112]:
final['playing_style'].value_counts()

playing_style
Route Technician    58
Possession          50
Slot                45
Speedster           27
Versatile           20
Big Speedster       15
YAC Specialist      10
Name: count, dtype: int64
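Beyond eyeballing individual players, one way to check the predicted groups is to average a few telling features within each style and confirm the groups differ in the expected direction (e.g., Speedsters posting faster 40 times, Possession receivers being heavier). A sketch with a toy stand-in for `final` and made-up values:

```python
import pandas as pd

# Toy stand-in for `final` with a few of its columns (values are made up)
final = pd.DataFrame({
    'player_name': ['A', 'B', 'C', 'D'],
    'playing_style': ['Speedster', 'Speedster', 'Possession', 'Possession'],
    '40': [4.30, 4.36, 4.55, 4.51],
    'weight_lbs': [180, 186, 220, 216],
    'contested_catch_rate': [0.40, 0.36, 0.60, 0.58],
})

# Mean of each feature within each predicted style
profile = final.groupby('playing_style')[['40', 'weight_lbs', 'contested_catch_rate']].mean()
print(profile)
```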

### Versatile

In [132]:
final[final['playing_style'] == 'Versatile'][['player_name', 'playing_style', '40', 'slant_pct', 'go_pct', 'post_pct', 'corner_pct', 'out_pct', 'in_pct', 'comeback_pct', 'curl_pct', 'flat_pct', 'contested_catch_rate', 'wide_rate']]

Unnamed: 0,player_name,playing_style,40,slant_pct,go_pct,post_pct,corner_pct,out_pct,in_pct,comeback_pct,curl_pct,flat_pct,contested_catch_rate,wide_rate
0,Justin Jefferson,Versatile,4.43,0.07,0.11,0.07,0.1,0.21,0.08,0.1,0.1,0.09,0.56,0.69
8,Terry McLaurin,Versatile,4.35,0.11,0.16,0.03,0.07,0.12,0.1,0.08,0.1,0.05,0.65,0.79
15,Ja'Marr Chase,Versatile,4.34,0.13,0.16,0.01,0.04,0.13,0.13,0.11,0.09,0.09,0.39,0.76
25,DJ Moore,Versatile,4.42,0.07,0.16,0.07,0.06,0.14,0.1,0.08,0.11,0.04,0.57,0.71
28,Donovan Peoples-Jones,Versatile,4.48,0.1,0.15,0.01,0.09,0.17,0.14,0.07,0.17,0.06,0.38,0.7
31,Zay Jones,Versatile,4.45,0.12,0.07,0.04,0.07,0.18,0.03,0.07,0.17,0.06,0.42,0.66
39,Darius Slayton,Versatile,4.39,0.1,0.15,0.03,0.01,0.1,0.15,0.1,0.21,0.04,0.53,0.71
52,Alec Pierce,Versatile,4.41,0.1,0.19,0.03,0.01,0.06,0.21,0.13,0.19,0.01,0.43,0.93
63,DJ Chark Jr.,Versatile,4.34,0.04,0.23,0.1,0.08,0.1,0.19,0.06,0.1,0.0,0.5,0.75
65,Terrace Marshall Jr.,Versatile,4.4,0.13,0.28,0.11,0.04,0.04,0.02,0.13,0.11,0.04,0.37,0.91


Looks good. Most of these guys are fast. Notable inclusions are:
- Julio Jones, despite being very big and fast, is a truly versatile receiver. I'm impressed that the model didn't classify him as "Big Speedster."
- Terry McLaurin, DJ Moore, and DJ Chark are great picks.

### Route technicians

In [114]:
final[final['playing_style'] == 'Route Technician'][['player_name', 'playing_style', 'cross_pct', 'curl_pct', 'flat_pct', 'out_pct', 'slant_pct', 'comeback_pct', 'corner_pct', 'post_pct', 'go_pct', 'wr_screen_pct', 'underneath_screen_pct']]

Unnamed: 0,player_name,playing_style,cross_pct,curl_pct,flat_pct,out_pct,slant_pct,comeback_pct,corner_pct,post_pct,go_pct,wr_screen_pct,underneath_screen_pct
2,Davante Adams,Route Technician,0.03,0.12,0.04,0.14,0.08,0.09,0.06,0.08,0.19,0.04,0.0
4,Stefon Diggs,Route Technician,0.03,0.13,0.1,0.11,0.14,0.09,0.08,0.08,0.1,0.06,0.01
6,Jaylen Waddle,Route Technician,0.08,0.07,0.01,0.1,0.17,0.12,0.02,0.15,0.09,0.07,0.0
7,DeVonta Smith,Route Technician,0.07,0.15,0.05,0.15,0.09,0.13,0.07,0.03,0.07,0.13,0.02
10,Amari Cooper,Route Technician,0.03,0.12,0.04,0.12,0.16,0.14,0.13,0.06,0.14,0.02,0.0
13,Garrett Wilson,Route Technician,0.05,0.1,0.07,0.09,0.14,0.1,0.1,0.05,0.14,0.08,0.01
20,Brandon Aiyuk,Route Technician,0.05,0.1,0.04,0.16,0.15,0.05,0.02,0.07,0.08,0.06,0.0
21,Jerry Jeudy,Route Technician,0.05,0.11,0.11,0.12,0.03,0.05,0.11,0.07,0.15,0.11,0.01
26,Diontae Johnson,Route Technician,0.03,0.17,0.03,0.17,0.05,0.19,0.03,0.03,0.16,0.05,0.0
34,George Pickens,Route Technician,0.0,0.11,0.02,0.13,0.04,0.27,0.1,0.04,0.17,0.01,0.0


Looks good for the most part. Some notable inclusions are:
- Amari Cooper - Highly praised for his crisp route running.
- DeSean Jackson - A questionable pick. He is more of a speedster than a refined route runner: 42% of his routes are vertical (go, corner, and post) and another 29% are curls, which does not make for a varied route tree.
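To put numbers on "varied route tree," here is a minimal sketch that computes each receiver's vertical share (go + corner + post). It also computes one extra measure that I'm assuming is useful here: the Shannon entropy of the route distribution, where higher means a more even spread across route types. The rows are copied from the route outputs in this notebook, and `route_tree_summary` is a hypothetical helper, not something in `functions.py`.

```python
import numpy as np
import pandas as pd

# Route-share columns used in the Route Technician table above
ROUTE_COLS = ['cross_pct', 'curl_pct', 'flat_pct', 'out_pct', 'slant_pct',
              'comeback_pct', 'corner_pct', 'post_pct', 'go_pct',
              'wr_screen_pct', 'underneath_screen_pct']
VERTICAL_COLS = ['go_pct', 'corner_pct', 'post_pct']


def route_tree_summary(df):
    """Hypothetical helper: vertical share and route-tree entropy per player."""
    routes = df[ROUTE_COLS].to_numpy(dtype=float)
    # Renormalize each row to sum to 1: these columns omit some route types
    # (e.g. in_pct) and the displayed shares are rounded
    probs = routes / routes.sum(axis=1, keepdims=True)
    # Shannon entropy in nats, treating 0 * log(0) as 0
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(probs > 0, probs * np.log(probs), 0.0)
    return pd.DataFrame({
        'player_name': df['player_name'].to_numpy(),
        'vertical_pct': df[VERTICAL_COLS].sum(axis=1).to_numpy(),
        'route_entropy': -terms.sum(axis=1),
    })


# Rows copied from the Route Technician outputs in this notebook
sample = pd.DataFrame(
    [['Stefon Diggs', 0.03, 0.13, 0.1, 0.11, 0.14, 0.09, 0.08, 0.08, 0.1, 0.06, 0.01],
     ['Amari Cooper', 0.03, 0.12, 0.04, 0.12, 0.16, 0.14, 0.13, 0.06, 0.14, 0.02, 0.0],
     ['DeSean Jackson', 0.0, 0.29, 0.0, 0.12, 0.06, 0.0, 0.12, 0.06, 0.24, 0.06, 0.0]],
    columns=['player_name'] + ROUTE_COLS,
)

print(route_tree_summary(sample).round(2))
```

On these rows, Jackson has the highest vertical share and the lowest entropy, which matches the reasoning above. Run against `final` to rank every receiver the same way.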

### Possession

In [134]:
final[final['playing_style'] == 'Possession'][['player_name', 'playing_style', 'height_in', 'weight_lbs', 'slant_pct', 'in_pct', 'out_pct', 'post_pct', 'curl_pct', 'comeback_pct', 'deep_pct', 'possession_saver_pct', 'conversion_pct', 'redzone_pct', 'contested_catch_rate', 'avg_yacon', 'avg_yac']]

Unnamed: 0,player_name,playing_style,height_in,weight_lbs,slant_pct,in_pct,out_pct,post_pct,curl_pct,comeback_pct,deep_pct,possession_saver_pct,conversion_pct,redzone_pct,contested_catch_rate,avg_yacon,avg_yac
11,Mike Evans,Possession,76.75,231,0.09,0.07,0.14,0.09,0.16,0.13,0.24,0.68,0.23,0.08,0.65,0.79,2.81
18,Tee Higgins,Possession,75.63,216,0.13,0.11,0.17,0.06,0.13,0.14,0.15,0.57,0.13,0.07,0.62,1.36,3.86
23,Michael Pittman Jr.,Possession,76.0,223,0.21,0.18,0.09,0.02,0.21,0.06,0.04,0.42,0.19,0.06,0.5,1.13,3.61
24,Mike Williams,Possession,75.75,218,0.04,0.1,0.03,0.03,0.24,0.1,0.17,0.6,0.25,0.08,0.58,0.86,5.06
27,Drake London,Possession,75.88,219,0.13,0.09,0.16,0.07,0.1,0.09,0.13,0.61,0.18,0.12,0.54,0.6,3.21
29,Gabe Davis,Possession,74.0,216,0.08,0.11,0.12,0.1,0.17,0.19,0.26,0.72,0.29,0.09,0.35,0.44,3.04
30,Courtland Sutton,Possession,75.38,218,0.12,0.11,0.08,0.06,0.19,0.12,0.18,0.58,0.23,0.08,0.42,0.58,2.38
36,Joshua Palmer,Possession,73.25,210,0.04,0.07,0.21,0.04,0.21,0.12,0.1,0.48,0.21,0.05,0.54,0.68,3.54
40,DeAndre Hopkins,Possession,73.0,214,0.08,0.03,0.12,0.01,0.29,0.12,0.15,0.55,0.19,0.03,0.52,0.86,2.58
44,Mack Hollins,Possession,75.75,221,0.04,0.2,0.14,0.04,0.16,0.04,0.18,0.55,0.26,0.09,0.4,0.61,3.44


Looks good. Michael Pittman, Drake London, and Courtland Sutton are commonly thought of as possession receivers.

### Big Speedster

In [135]:
final[final['playing_style'] == 'Big Speedster'][['player_name', 'playing_style', 'weight_lbs', '40', 'bench', 'adot', 'avg_yac', 'go_pct', 'post_pct', 'corner_pct', 'deep_pct', 'play_action_pct', 'deep_sideline_pct']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,bench,adot,avg_yac,go_pct,post_pct,corner_pct,deep_pct,play_action_pct,deep_sideline_pct
3,A.J. Brown,Big Speedster,226,4.49,19,12.1,6.23,0.15,0.01,0.06,0.19,0.32,0.14
14,DK Metcalf,Big Speedster,228,4.33,27,11.22,2.41,0.18,0.1,0.02,0.12,0.21,0.08
35,Allen Lazard,Big Speedster,227,4.55,17,12.49,4.13,0.21,0.07,0.07,0.24,0.25,0.19
71,Chase Claypool,Big Speedster,238,4.42,19,10.39,3.24,0.12,0.04,0.07,0.21,0.15,0.09
94,Equanimeous St. Brown,Big Speedster,214,4.48,20,13.03,4.24,0.11,0.03,0.16,0.21,0.32,0.16
95,Justin Watson,Big Speedster,215,4.44,20,18.74,4.0,0.26,0.06,0.09,0.41,0.32,0.21
129,Sterling Shepard,Big Speedster,194,4.48,20,9.38,4.23,0.08,0.0,0.04,0.17,0.29,0.08
131,Zach Pascal,Big Speedster,219,4.55,14,6.05,5.73,0.05,0.05,0.0,0.05,0.53,0.0
140,N'Keal Harry,Big Speedster,228,4.53,27,17.0,2.29,0.33,0.22,0.0,0.22,0.0,0.11
141,Breshad Perriman,Big Speedster,212,4.52,18,14.16,5.78,0.26,0.16,0.0,0.37,0.05,0.16


Looks to be fairly accurate, based on 40 time, bench press reps, average depth of target, average YAC, and percentage of vertical routes.
- Receivers with 40 times in the upper 4.5s should not be in this category.
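That rule can be applied as a simple filter on 40 time. This is a minimal sketch using the ten Big Speedster rows shown above. The `flag_slow_forty` name and the default cutoff are my assumptions; a cutoff of 4.55 matches "upper 4.5s" more literally than 4.5.

```python
import pandas as pd

# Player, weight, and 40 time copied from the Big Speedster output above
big_speedsters = pd.DataFrame({
    'player_name': ['A.J. Brown', 'DK Metcalf', 'Allen Lazard', 'Chase Claypool',
                    'Equanimeous St. Brown', 'Justin Watson', 'Sterling Shepard',
                    'Zach Pascal', "N'Keal Harry", 'Breshad Perriman'],
    'weight_lbs': [226, 228, 227, 238, 214, 215, 194, 219, 228, 212],
    '40': [4.49, 4.33, 4.55, 4.42, 4.48, 4.44, 4.48, 4.55, 4.53, 4.52],
})


def flag_slow_forty(df, cutoff=4.5):
    """Hypothetical helper: rows whose 40 time is at or above `cutoff`, slowest first."""
    return df[df['40'] >= cutoff].sort_values('40', ascending=False)


print(flag_slow_forty(big_speedsters)[['player_name', '40']])
print(flag_slow_forty(big_speedsters, cutoff=4.55)[['player_name', '40']])
```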

### Speedster

In [117]:
final[final['playing_style'] == 'Speedster'][['player_name', 'playing_style', 'weight_lbs', '40', 'adot', 'go_pct', 'post_pct', 'corner_pct', 'deep_pct', 'deep_sideline_pct']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,adot,go_pct,post_pct,corner_pct,deep_pct,deep_sideline_pct
1,Tyreek Hill,Speedster,185,4.29,12.39,0.14,0.11,0.04,0.22,0.09
16,Chris Olave,Speedster,187,4.39,14.17,0.17,0.04,0.06,0.24,0.14
17,Tyler Lockett,Speedster,182,4.4,10.6,0.09,0.08,0.06,0.16,0.06
42,Marquise Brown,Speedster,166,4.27,11.26,0.16,0.04,0.13,0.21,0.12
43,Brandin Cooks,Speedster,189,4.33,10.73,0.17,0.05,0.04,0.19,0.15
45,Marquez Valdes-Scantling,Speedster,206,4.37,13.9,0.19,0.07,0.06,0.21,0.06
64,Darnell Mooney,Speedster,176,4.38,11.98,0.18,0.03,0.1,0.23,0.2
72,Elijah Moore,Speedster,178,4.35,11.6,0.08,0.06,0.06,0.17,0.09
78,Isaiah McKenzie,Speedster,173,4.42,8.71,0.05,0.05,0.08,0.12,0.11
79,Trent Sherfield,Speedster,203,4.45,10.86,0.1,0.12,0.02,0.12,0.06


- The expected traits are well-represented: mostly sub-4.4 40 times and sub-200 lb weights, high ADOT, and high utilization in deep passes and vertical routes.
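Rather than eyeballing the thresholds, you can measure how much of the group actually meets them. Below is a minimal sketch on the ten Speedster rows above. The 4.4 and 200 lb cutoffs come from the bullet, and the `trait_coverage` helper is hypothetical.

```python
import pandas as pd

# Weight and 40 time copied from the Speedster output above
speedsters = pd.DataFrame({
    'player_name': ['Tyreek Hill', 'Chris Olave', 'Tyler Lockett', 'Marquise Brown',
                    'Brandin Cooks', 'Marquez Valdes-Scantling', 'Darnell Mooney',
                    'Elijah Moore', 'Isaiah McKenzie', 'Trent Sherfield'],
    'weight_lbs': [185, 187, 182, 166, 189, 206, 176, 178, 173, 203],
    '40': [4.29, 4.39, 4.4, 4.27, 4.33, 4.37, 4.38, 4.35, 4.42, 4.45],
})


def trait_coverage(df, rules):
    """Hypothetical helper: share of rows satisfying each boolean rule."""
    return pd.Series({name: rule(df).mean() for name, rule in rules.items()})


coverage = trait_coverage(speedsters, {
    'sub_4.4_forty': lambda d: d['40'] < 4.4,
    'sub_200_lbs': lambda d: d['weight_lbs'] < 200,
})
print(coverage)
```

Passing the rules as functions makes it easy to reuse the helper for other categories, such as the Big Speedster weight and 40 checks.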

### Slot

In [118]:
final[final['playing_style'] == 'Slot'][['player_name', 'playing_style', 'adot', 'avg_yac', 'slant_pct', 'out_pct', 'in_pct', 'curl_pct', 'flat_pct', 'comeback_pct', 'wr_screen_pct', 'rpo_pct', 'slot_rate']]

Unnamed: 0,player_name,playing_style,adot,avg_yac,slant_pct,out_pct,in_pct,curl_pct,flat_pct,comeback_pct,wr_screen_pct,rpo_pct,slot_rate
5,CeeDee Lamb,Slot,10.08,4.54,0.13,0.11,0.06,0.15,0.1,0.06,0.06,0.11,0.62
9,Amon-Ra St. Brown,Slot,6.47,4.87,0.11,0.26,0.11,0.15,0.05,0.04,0.09,0.03,0.6
12,Christian Kirk,Slot,9.17,4.42,0.12,0.27,0.1,0.11,0.11,0.05,0.07,0.08,0.75
19,Chris Godwin,Slot,5.76,5.07,0.11,0.1,0.16,0.15,0.1,0.06,0.18,0.06,0.73
22,JuJu Smith-Schuster,Slot,7.35,5.96,0.08,0.13,0.17,0.22,0.05,0.1,0.03,0.15,0.43
32,Cooper Kupp,Slot,7.18,5.63,0.02,0.09,0.07,0.2,0.08,0.05,0.2,0.07,0.56
33,Jakobi Meyers,Slot,9.92,3.52,0.07,0.18,0.08,0.09,0.07,0.03,0.08,0.05,0.7
37,Tyler Boyd,Slot,9.21,4.29,0.16,0.2,0.05,0.13,0.1,0.11,0.01,0.13,0.84
38,Keenan Allen,Slot,8.58,3.98,0.1,0.19,0.04,0.19,0.16,0.07,0.07,0.06,0.64
46,Curtis Samuel,Slot,6.68,4.52,0.08,0.07,0.05,0.17,0.12,0.05,0.15,0.17,0.71


- Chris Godwin, Cooper Kupp, Tyler Boyd, and Keenan Allen are here, as expected.
- The model seems to key off the high slot rate, high utilization in WR screens and direction-changing routes, lower ADOT, and higher YAC.
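One way to check which features the model keys off is to compare per-style means. Against the full data, this would be `final.groupby('playing_style')[cols].mean()`. The sketch below runs the same idea on the Slot and Big Speedster rows copied from the outputs above, so it is only a small excerpt, not the full population.

```python
import pandas as pd

# ADOT and average YAC copied from the Slot and Big Speedster outputs above
rows = {
    'Slot': ([10.08, 6.47, 9.17, 5.76, 7.35, 7.18, 9.92, 9.21, 8.58, 6.68],
             [4.54, 4.87, 4.42, 5.07, 5.96, 5.63, 3.52, 4.29, 3.98, 4.52]),
    'Big Speedster': ([12.1, 11.22, 12.49, 10.39, 13.03, 18.74, 9.38, 6.05, 17.0, 14.16],
                      [6.23, 2.41, 4.13, 3.24, 4.24, 4.0, 4.23, 5.73, 2.29, 5.78]),
}
excerpt = pd.concat(
    [pd.DataFrame({'playing_style': style, 'adot': adot, 'avg_yac': yac})
     for style, (adot, yac) in rows.items()],
    ignore_index=True,
)

# Per-style averages; on the full data, add any other feature columns of interest
style_means = excerpt.groupby('playing_style')[['adot', 'avg_yac']].mean()
print(style_means.round(2))
```

Even on this excerpt, the Slot group shows a lower average ADOT and a higher average YAC than the Big Speedster group.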

### YAC Specialist

In [119]:
final[final['playing_style'] == 'YAC Specialist'][['player_name', 'playing_style', '40', 'adot', 'avg_yac', 'slant_pct', 'in_pct', 'out_pct', 'wr_screen_pct', 'underneath_screen_pct']]

Unnamed: 0,player_name,playing_style,40,adot,avg_yac,slant_pct,in_pct,out_pct,wr_screen_pct,underneath_screen_pct
48,Deebo Samuel,YAC Specialist,4.48,4.26,8.8,0.12,0.23,0.05,0.21,0.02
105,Laviska Shenault Jr.,YAC Specialist,4.58,-0.69,12.22,0.03,0.06,0.03,0.34,0.12
123,Kadarius Toney,YAC Specialist,4.38,3.75,6.75,0.05,0.1,0.05,0.2,0.05
128,Brandon Powell,YAC Specialist,4.59,1.5,7.67,0.03,0.06,0.09,0.38,0.06
179,Tim Jones,YAC Specialist,4.47,1.75,7.0,0.0,0.0,0.0,0.5,0.0
184,Dareke Young,YAC Specialist,4.44,0.5,11.5,0.0,0.0,0.0,0.5,0.0
207,Tyron Johnson,YAC Specialist,4.36,-2.0,10.0,0.0,0.0,0.0,1.0,0.0
208,Maurice Alexander,YAC Specialist,4.55,3.0,4.0,0.0,0.0,0.0,0.0,0.0
214,Erik Ezukanma,YAC Specialist,4.54,-4.0,7.0,0.0,0.0,0.0,1.0,0.0
218,DJ Turner,YAC Specialist,4.26,-3.0,0.0,0.0,0.0,0.0,0.0,1.0


- Not a lot of receivers, but that's expected. A receiver who isn't a strong route runner but can still make big yardage gains once the ball is in his hands is usually asked to play running back.
- The model appears to be looking for very low ADOT (sometimes negative, meaning "behind the line of scrimmage"), high average YAC, and high utilization in screen plays.
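That rule of thumb can be expressed as derived columns. Below is a minimal sketch on the YAC Specialist rows above. It adds a combined screen share (`wr_screen_pct + underneath_screen_pct`) and a flag for a negative ADOT, meaning the receiver is targeted behind the line of scrimmage on average. The derived column names `screen_pct` and `behind_los` are my own.

```python
import pandas as pd

# Columns copied from the YAC Specialist output above
yac = pd.DataFrame({
    'player_name': ['Deebo Samuel', 'Laviska Shenault Jr.', 'Kadarius Toney', 'Brandon Powell',
                    'Tim Jones', 'Dareke Young', 'Tyron Johnson', 'Maurice Alexander',
                    'Erik Ezukanma', 'DJ Turner'],
    'adot': [4.26, -0.69, 3.75, 1.5, 1.75, 0.5, -2.0, 3.0, -4.0, -3.0],
    'avg_yac': [8.8, 12.22, 6.75, 7.67, 7.0, 11.5, 10.0, 4.0, 7.0, 0.0],
    'wr_screen_pct': [0.21, 0.34, 0.2, 0.38, 0.5, 0.5, 1.0, 0.0, 1.0, 0.0],
    'underneath_screen_pct': [0.02, 0.12, 0.05, 0.06, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
})

# Derived features: total screen usage, and "targeted behind the line of scrimmage on average"
yac['screen_pct'] = yac['wr_screen_pct'] + yac['underneath_screen_pct']
yac['behind_los'] = yac['adot'] < 0

print(yac.sort_values('adot')[['player_name', 'adot', 'avg_yac', 'screen_pct', 'behind_los']])
```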

### Spot checking select WRs

In [120]:
final[final['player_name'] == 'Nico Collins']['playing_style']

67    Versatile
Name: playing_style, dtype: object

In [121]:
final[final['player_name'] == 'Terry McLaurin']['playing_style']

8    Versatile
Name: playing_style, dtype: object

In [122]:
final[final['player_name'] == 'Josh Gordon']['playing_style']

221    Possession
Name: playing_style, dtype: object

In [123]:
final[final['player_name'] == 'Quez Watkins']['playing_style']

90    Slot
Name: playing_style, dtype: object

In [124]:
final[final['player_name'] == 'Chase Claypool']['playing_style']

71    Big Speedster
Name: playing_style, dtype: object

In [125]:
final[final['player_name'] == 'Michael Thomas']['playing_style']

124    Possession
Name: playing_style, dtype: object

In [126]:
final[final['player_name'] == 'DeAndre Hopkins']['playing_style']

40    Possession
Name: playing_style, dtype: object

In [127]:
final[final['player_name'] == 'Marquise Brown']['playing_style']

42    Speedster
Name: playing_style, dtype: object

In [128]:
final[final['player_name'] == 'Marquez Valdes-Scantling']['playing_style']

45    Speedster
Name: playing_style, dtype: object

In [129]:
final[final['player_name'] == 'Devin Duvernay']['playing_style']

83    Speedster
Name: playing_style, dtype: object

In [130]:
final[final['player_name'] == 'Cooper Kupp'][['playing_style', 'weight_lbs', 'slot_rate']]

Unnamed: 0,playing_style,weight_lbs,slot_rate
32,Slot,204,0.56


In [131]:
final[final['player_name'] == 'DeSean Jackson'][['player_name', 'playing_style', 'cross_pct', 'curl_pct', 'flat_pct', 'out_pct', 'slant_pct', 'comeback_pct', 'corner_pct', 'post_pct', 'go_pct', 'wr_screen_pct', 'underneath_screen_pct']]

Unnamed: 0,player_name,playing_style,cross_pct,curl_pct,flat_pct,out_pct,slant_pct,comeback_pct,corner_pct,post_pct,go_pct,wr_screen_pct,underneath_screen_pct
130,DeSean Jackson,Route Technician,0.0,0.29,0.0,0.12,0.06,0.0,0.12,0.06,0.24,0.06,0.0
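The one-player-per-cell pattern above works, but a small helper can look up several players at once. This is a minimal sketch: `spot_check` is a hypothetical name, and it assumes `player_name` is unique. Names that are not found, such as a misspelling, come back as `NaN` rows instead of an empty result. The sample frame holds a few of the labels shown above.

```python
import pandas as pd


def spot_check(df, names, cols=('playing_style',)):
    """Hypothetical helper: requested columns for each name, in the order given.

    Unknown names come back as NaN rows. Assumes player_name is unique,
    since reindex raises on duplicate index labels.
    """
    return df.set_index('player_name')[list(cols)].reindex(names)


# A few labels copied from the spot checks above
final_sample = pd.DataFrame({
    'player_name': ['Nico Collins', 'Terry McLaurin', 'Josh Gordon', 'Quez Watkins',
                    'Michael Thomas', 'Devin Duvernay'],
    'playing_style': ['Versatile', 'Versatile', 'Possession', 'Slot',
                      'Possession', 'Speedster'],
})

print(spot_check(final_sample, ['Nico Collins', 'Josh Gordon', 'Quez Watkins', 'Cooker Kupp']))
```

On the real data, something like `spot_check(final, ['Cooper Kupp'], cols=('playing_style', 'weight_lbs', 'slot_rate'))` would reproduce the Kupp check in a single call.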
