# 05 - KNN
- Assigns labels to ~10% of receivers
  - Receivers who are well-known representatives of their labels 
- Filters for the features I think are most important
  - Based on my domain expertise, I think utilization metrics are more relevant to a player's style than success metrics.
    - For example, a receiver can be considered a speedster, but not a statistically productive one. His lack of impressive stats should not exclude him from the "Speedster" category.
- Scales the features
- Performs PCA on the features
- Runs KNN with k = 5 on the principal components
- Produces labels that look largely accurate when spot-checked against domain expertise (there is no labeled hold-out set to score against)

In [1]:
import numpy as np
import pandas as pd
import warnings
import copy

from sklearn.model_selection import train_test_split

# Column and row display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_seq_items', None)

# Max column width so we can read play descriptions
pd.set_option('display.max_colwidth', None)

np.set_printoptions(threshold=np.inf)


# Notebook cell width display
from IPython.display import display, HTML
display(HTML("<style>:root { --jp-notebook-max-width: 98% !important; }</style>"))

# Float appearance, Pandas and NumPy
pd.set_option('display.float_format', '{:.2f}'.format)
np.set_printoptions(suppress=True, precision = 2)

# Suppress warnings
warnings.filterwarnings('ignore')

In [2]:
import sys
sys.path.append('../')
import functions as fn

In [3]:
aggregate = pd.read_csv('../working_exports/aggregate.csv')

# DATA PREPARATION

## Labeling select players
- To "train" the model, I am assigning playing style labels to receivers I think are representative of each style
- The labels are:
  - Versatile
    - A receiver who possesses good speed, route running skills, hands, ball tracking ability, mid-air body control, and catch radius.
  - Speedster
    - Generally a smaller receiver who relies mainly on his speed to get open.
    - May also be a good route runner, but still relies more on speed.
    - May also have superb acceleration and deceleration.
  - Physical - Speedster
    - A big, strong receiver who also has great speed.
    - Sometimes has good route running skills, but relies more on pure athleticism.
  - Physical - Possession
    - A big, strong receiver who is usually targeted for short/intermediate yardage just past the line to gain, particularly on 3rd or 4th down to save the team's possession.
    - Targeted in possession-saving situations because they have the physicality to make highly contested catches, as the line to gain is usually tightly defended.
    - May have good speed, but typically is a better route runner than track star.
  - Route Technician
    - An expert route runner with elite footwork, agility, acceleration, deceleration, and understanding of defensive movements.
    - Can run all types of routes very well, whether they are more straight-line or require sudden change of direction.
    - May also have good speed.
    - Can be counted on for short, intermediate, and deep passes.
  - YAC Specialist
    - Not a very refined route runner, but great at catching short passes and gaining more yards after the catch.
    - Has elite agility, acceleration, and deceleration to quickly change directions and weave through defenders.
  - Slot
    - A smaller receiver who typically does not have good speed, but has good route running skills and can get open in crowded areas.
    - Typically lines up in the slot (more inward, rather than near the sideline).
    - Can still sometimes free himself for deep passes.
- With these labels assigned to representative receivers, the model should look for unlabeled receivers with similar attributes and assign the correct labels

In [4]:
aggregate['playing_style'] = None

aggregate.loc[aggregate['player_name'] == 'Tyreek Hill', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Tyler Lockett', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Marquise Brown', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Marquez Valdes-Scantling', 'playing_style'] = 'Speedster'
aggregate.loc[aggregate['player_name'] == 'Justin Jefferson', 'playing_style'] = 'Versatile'
aggregate.loc[aggregate['player_name'] == 'Ja\'Marr Chase', 'playing_style'] = 'Versatile'
aggregate.loc[aggregate['player_name'] == 'Nico Collins', 'playing_style'] = 'Versatile'
aggregate.loc[aggregate['player_name'] == 'DK Metcalf', 'playing_style'] = 'Physical - Speedster'
aggregate.loc[aggregate['player_name'] == 'A.J. Brown', 'playing_style'] = 'Physical - Speedster'
aggregate.loc[aggregate['player_name'] == 'Chase Claypool', 'playing_style'] = 'Physical - Speedster'
aggregate.loc[aggregate['player_name'] == 'Josh Gordon', 'playing_style'] = 'Physical - Speedster'
aggregate.loc[aggregate['player_name'] == 'Tee Higgins', 'playing_style'] = 'Physical - Possession'
aggregate.loc[aggregate['player_name'] == 'Mike Evans', 'playing_style'] = 'Physical - Possession'
# aggregate.loc[aggregate['player_name'] == 'Michael Thomas', 'playing_style'] = 'Physical - Possession'
aggregate.loc[aggregate['player_name'] == 'DeAndre Hopkins', 'playing_style'] = 'Physical - Possession'
aggregate.loc[aggregate['player_name'] == 'Stefon Diggs', 'playing_style'] = 'Route Technician'
aggregate.loc[aggregate['player_name'] == 'Davante Adams', 'playing_style'] = 'Route Technician'
aggregate.loc[aggregate['player_name'] == 'Jaylen Waddle', 'playing_style'] = 'Route Technician'
aggregate.loc[aggregate['player_name'] == 'Deebo Samuel', 'playing_style'] = 'YAC Specialist'
aggregate.loc[aggregate['player_name'] == 'Kadarius Toney', 'playing_style'] = 'YAC Specialist'
aggregate.loc[aggregate['player_name'] == 'Brandon Powell', 'playing_style'] = 'YAC Specialist'
aggregate.loc[aggregate['player_name'] == 'CeeDee Lamb', 'playing_style'] = 'Slot'
aggregate.loc[aggregate['player_name'] == 'Amon-Ra St. Brown', 'playing_style'] = 'Slot'
aggregate.loc[aggregate['player_name'] == 'Christian Kirk', 'playing_style'] = 'Slot'
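
The one-assignment-per-player cell above could also be written as a single lookup. The following is a minimal sketch, not the notebook's actual code: `aggregate_demo` and `style_to_players` are hypothetical stand-ins, and only a subset of the labels is reproduced.

```python
import pandas as pd

# Hypothetical miniature of `aggregate`; only `player_name` matters here.
aggregate_demo = pd.DataFrame(
    {"player_name": ["Tyreek Hill", "Justin Jefferson", "DeVonta Smith"]}
)

# Style -> representative receivers (a subset of the assignments above).
style_to_players = {
    "Speedster": ["Tyreek Hill", "Tyler Lockett"],
    "Versatile": ["Justin Jefferson", "Ja'Marr Chase"],
}

# Invert to player -> style so a single `map` call labels every row.
player_to_style = {
    player: style
    for style, players in style_to_players.items()
    for player in players
}

# Players missing from the mapping get NaN, i.e. they stay unlabeled.
aggregate_demo["playing_style"] = aggregate_demo["player_name"].map(player_to_style)
print(aggregate_demo)
```

Unlabeled rows end up as `NaN` rather than `None`, but `notnull()` and `isnull()` treat both the same way, so a later train/test split on the label column would behave identically.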

In [5]:
aggregate.columns

Index(['player_name', 'player_position', 'season_year', 'player_game_count',
       'receptions', 'target', 'yards', 'att_yards', 'yards_after_catch',
       'yards_after_contact', 'touchdown', 'routes', 'pass_plays',
       'contested_receptions', 'contested_targets', 'weather_attempt',
       'difficult_attempt', 'difficult_catch', 'difficult_success_rate',
       'difficult_pct', 'weather_catch', 'qb_bf_attempt', 'qb_bf_catch',
       'hurry_up_attempt', 'hurry_up_catch', 'possession_saver_attempt',
       'possession_saver_catch', 'conversion_attempt', 'conversion_catch',
       'redzone_attempt', 'redzone_catch', 'clutch_catch', 'deep_attempt',
       'deep_catch', 'deep_sideline_attempt', 'deep_sideline_catch',
       'large_yac_catch', 'tackle_breaker_catch', 'beast_catch',
       'play_action_attempt', 'play_action_catch', 'rpo_attempt', 'rpo_catch',
       'cross_attempt', 'cross_catch', 'corner_attempt', 'corner_catch',
       'out_attempt', 'out_catch', 'curl_attempt', 'curl

In [6]:
aggregate.isnull().sum()

player_name                         0
player_position                     0
season_year                         0
player_game_count                   0
receptions                          0
target                              0
yards                               0
att_yards                           0
yards_after_catch                   0
yards_after_contact                 0
touchdown                           0
routes                              0
pass_plays                          0
contested_receptions                0
contested_targets                   0
weather_attempt                     0
difficult_attempt                   0
difficult_catch                     0
difficult_success_rate              0
difficult_pct                       0
weather_catch                       0
qb_bf_attempt                       0
qb_bf_catch                         0
hurry_up_attempt                    0
hurry_up_catch                      0
possession_saver_attempt            0
possession_s

We'll drop the identifier and categorical columns, the raw attempt/catch counts, and most of the success-rate columns

In [7]:
# Keep only the numeric feature columns that go into PCA
# Drops identifiers, raw counts, most success-rate columns, and the target ('playing_style')

scalable_features_df = aggregate.drop(['player_name', 'player_position', 'season_year', 'player_game_count','receptions', 'target', 'yards', 'att_yards', 'yards_after_catch',
                                       'yards_after_contact', 'touchdown', 'routes', 'pass_plays', 'contested_receptions', 'contested_targets', 'weather_attempt', 
                                       'weather_catch', 'difficult_attempt', 'difficult_catch', 'qb_bf_attempt', 'qb_bf_catch', 'hurry_up_attempt','hurry_up_catch', 'possession_saver_attempt', 'possession_saver_catch',
                                       'conversion_attempt', 'conversion_catch', 'redzone_attempt', 'redzone_catch', 'deep_attempt', 'deep_catch', 'deep_sideline_attempt', 'deep_sideline_catch', 'clutch_catch', 'conversion_catch', 'redzone_catch',
                                       'difficult_success_rate', 'cross_success_rate', 'curl_success_rate', 'post_success_rate', 'underneath_screen_success_rate',
                                        'flat_success_rate', 'slant_success_rate', 'wr_screen_success_rate', 'comeback_success_rate', 'go_success_rate',
                                        'in_success_rate', 'deep_success_rate', 'play_action_success_rate', 'rpo_success_rate', 'hurry_up_success_rate',
                                        'deep_sideline_success_rate', 'possession_saver_success_rate', 'route_rate', 'large_yac_catch', 'tackle_breaker_catch', 'beast_catch', 'play_action_attempt', 'play_action_catch', 'rpo_attempt', 'rpo_catch',
                                       'cross_attempt', 'cross_catch', 'corner_attempt', 'corner_catch', 'curl_attempt', 'out_attempt', 'out_catch', 'curl_catch', 'post_attempt', 'post_catch', 'underneath_screen_attempt',
                                       'underneath_screen_catch', 'flat_attempt', 'flat_catch', 'slant_attempt', 'slant_catch', 'wr_screen_attempt', 'wr_screen_catch',
                                       'comeback_attempt', 'comeback_catch', 'go_attempt', 'go_catch', 'in_attempt', 'in_catch', 'slot_snaps', 'wide_snaps', 'route_rate', 'playing_style'], axis=1)
scalable_features_df.head()

Unnamed: 0,difficult_pct,corner_success_rate,out_success_rate,slot_rate,wide_rate,contested_catch_rate,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,deep_sideline_pct,possession_saver_pct,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone
0,0.18,0.47,0.79,0.3,0.69,0.56,0.03,0.1,0.21,0.1,0.07,0.0,0.09,0.07,0.07,0.1,0.11,0.08,0.15,0.24,0.02,0.18,0.11,0.58,0.21,0.1,10.1,4.88,1.03,0.7,2.62,73.25,202,4.43,14,37.5,126,4.27,7.02
1,0.16,0.43,0.59,0.42,0.54,0.52,0.09,0.04,0.13,0.11,0.11,0.0,0.04,0.11,0.08,0.1,0.14,0.05,0.22,0.41,0.15,0.03,0.09,0.65,0.2,0.04,12.39,4.05,0.49,0.7,3.2,68.13,185,4.29,13,40.5,129,4.06,6.53
2,0.29,0.27,0.62,0.3,0.7,0.44,0.03,0.06,0.14,0.12,0.08,0.0,0.04,0.08,0.04,0.09,0.19,0.11,0.2,0.2,0.04,0.05,0.11,0.56,0.22,0.07,11.83,4.93,0.95,0.56,2.45,72.88,212,4.56,14,39.5,123,4.3,6.82
3,0.26,0.38,0.56,0.26,0.74,0.5,0.03,0.06,0.12,0.11,0.01,0.01,0.04,0.24,0.04,0.09,0.15,0.1,0.19,0.32,0.3,0.17,0.14,0.58,0.17,0.04,12.1,6.23,2.18,0.61,2.59,72.5,226,4.49,19,36.5,120,4.25,7.0
4,0.18,0.42,0.71,0.34,0.66,0.5,0.03,0.08,0.11,0.13,0.08,0.01,0.1,0.14,0.06,0.09,0.1,0.06,0.15,0.3,0.17,0.1,0.09,0.56,0.18,0.12,11.23,3.88,0.93,0.7,2.49,72.0,195,4.46,11,35.0,115,4.32,7.03


In [8]:
scalable_features_df.columns

Index(['difficult_pct', 'corner_success_rate', 'out_success_rate', 'slot_rate',
       'wide_rate', 'contested_catch_rate', 'cross_pct', 'corner_pct',
       'out_pct', 'curl_pct', 'post_pct', 'underneath_screen_pct', 'flat_pct',
       'slant_pct', 'wr_screen_pct', 'comeback_pct', 'go_pct', 'in_pct',
       'deep_pct', 'play_action_pct', 'rpo_pct', 'hurry_up_pct',
       'deep_sideline_pct', 'possession_saver_pct', 'conversion_pct',
       'redzone_pct', 'adot', 'avg_yac', 'avg_yacon', 'catch_rate', 'yprr',
       'height_in', 'weight_lbs', '40', 'bench', 'vertical', 'broad_jump',
       'shuttle', '3_cone'],
      dtype='object')

In [9]:
scalable_features_df.loc[2, ['cross_pct', 'corner_pct', 'out_pct', 'curl_pct',  'post_pct', 'underneath_screen_pct', 'flat_pct', 'slant_pct', 'wr_screen_pct', 'comeback_pct', 'go_pct', 'in_pct']].sum()

0.9999999999999996

In [10]:
scalable_features_df.loc[6, scalable_features_df.columns[6:18]].sum()

0.9999999999999994
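
The two spot checks above look at one row each. As a rough sketch of checking every row at once, the example below uses a hypothetical `route_shares` array in place of the twelve route `_pct` columns and compares row sums to 1 with a floating-point tolerance.

```python
import numpy as np

# Hypothetical route-share rows: each row is one receiver,
# each column is the share of targets on one route type.
route_shares = np.array([
    [0.25, 0.25, 0.50],
    [0.10, 0.20, 0.70],
    [0.00, 0.00, 0.00],   # a receiver with no charted routes
])

row_sums = route_shares.sum(axis=1)

# np.isclose tolerates floating-point error such as 0.9999999999999996.
sums_to_one = np.isclose(row_sums, 1.0)
print(sums_to_one)
```

If every row sums to one, the route columns are linearly dependent, which would be consistent with the smallest eigenvalue in the PCA section being approximately 0.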

# PCA

## Feature scaling
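- Using standardization (z-scores) so that features measured on very different scales, such as rates, pounds, and seconds, contribute comparably to the PCA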

In [11]:
scalable_features_df.describe()

Unnamed: 0,difficult_pct,corner_success_rate,out_success_rate,slot_rate,wide_rate,contested_catch_rate,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,deep_sideline_pct,possession_saver_pct,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone
count,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0
mean,0.22,0.28,0.47,0.38,0.6,0.36,0.07,0.06,0.11,0.16,0.05,0.01,0.07,0.1,0.08,0.08,0.13,0.08,0.17,0.19,0.09,0.13,0.1,0.51,0.19,0.05,10.73,4.24,1.12,0.61,1.35,72.33,199.08,4.48,14.34,35.97,122.5,4.26,6.98
std,0.15,0.34,0.38,0.23,0.24,0.29,0.12,0.07,0.09,0.18,0.09,0.07,0.11,0.12,0.13,0.1,0.13,0.07,0.15,0.17,0.12,0.16,0.13,0.22,0.12,0.06,5.65,3.2,1.72,0.19,0.98,2.43,15.92,0.1,3.94,2.7,9.41,0.14,0.18
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-4.0,0.0,0.0,0.0,0.0,66.38,155.0,4.25,2.0,28.5,10.0,3.8,6.51
25%,0.15,0.0,0.0,0.2,0.41,0.0,0.0,0.0,0.03,0.08,0.0,0.0,0.0,0.02,0.0,0.0,0.05,0.0,0.07,0.1,0.0,0.03,0.02,0.41,0.13,0.0,7.87,2.6,0.4,0.51,0.81,70.38,186.0,4.42,12.0,34.5,120.0,4.19,6.88
50%,0.2,0.0,0.57,0.31,0.67,0.39,0.04,0.04,0.11,0.13,0.04,0.0,0.04,0.08,0.03,0.06,0.11,0.07,0.16,0.19,0.07,0.09,0.08,0.54,0.2,0.04,10.73,3.93,0.86,0.62,1.23,72.63,201.0,4.49,14.0,35.5,123.0,4.25,7.0
75%,0.27,0.5,0.77,0.56,0.79,0.54,0.08,0.08,0.15,0.2,0.07,0.0,0.1,0.14,0.09,0.11,0.17,0.11,0.23,0.25,0.14,0.18,0.14,0.65,0.26,0.08,13.01,5.07,1.25,0.7,1.71,74.0,211.0,4.55,16.0,37.5,125.0,4.32,7.05
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.5,0.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.33,1.0,1.0,1.0,1.0,1.0,1.0,0.75,0.44,45.0,34.0,18.0,1.0,10.0,77.38,238.0,4.75,29.0,45.0,140.0,5.01,7.64


In [12]:
mean = scalable_features_df.mean()
std = scalable_features_df.std()
scaled_features = (scalable_features_df - mean) / std

In [13]:
# Convert the scaled data to a NumPy array
scaled_features_array = scaled_features.to_numpy()
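
As a small worked example of the standardization step, the sketch below applies the same `(x - mean) / std` formula to a hypothetical matrix `X` and checks the result. It assumes the sample standard deviation (`ddof=1`), which is the pandas default for `.std()`.

```python
import numpy as np

# Hypothetical feature matrix: rows are receivers, columns are features
# on very different scales (a rate, a weight in pounds, a 40-yard time).
X = np.array([
    [0.30, 202.0, 4.43],
    [0.42, 185.0, 4.29],
    [0.26, 226.0, 4.49],
    [0.62, 198.0, 4.50],
])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)   # ddof=1 matches the pandas default for .std()

Z = (X - mean) / std

print(Z.mean(axis=0))          # approximately 0 for every column
print(Z.std(axis=0, ddof=1))   # approximately 1 for every column
```

A column with zero variance would produce a division by zero here, so constant columns would need to be removed first.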

## 1. Covariance Matrix

In [14]:
# Transpose the data so that each row holds one feature's values across all receivers
transposed_data = scaled_features_array.T

In [15]:
scaled_features_array.shape, transposed_data.shape

((225, 39), (39, 225))

In [16]:
# Initialize an empty covariance matrix
n_features = len(transposed_data)
cov_matrix = [[0 for _ in range(n_features)] for _ in range(n_features)]
n_features

39

In [17]:
# Calculate the covariance matrix
for i in range(n_features): # Iterates over all 39 features (indices 0 to 38)
    for j in range(n_features): # Holding feature i fixed, pairs it with each of the 39 features j
        cov_matrix[i][j] = fn.calculate_covariance(transposed_data[i], transposed_data[j])

In [18]:
cov_matrix_df = pd.DataFrame(cov_matrix)
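
The helper `fn.calculate_covariance` lives in a separate module and is not shown here. The sketch below is one plausible version, assuming it computes the sample covariance with an `n - 1` denominator; the names `sample_covariance` and `covariance_matrix` are hypothetical.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def covariance_matrix(rows):
    """Covariance matrix where each element of `rows` is one feature."""
    k = len(rows)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):          # the matrix is symmetric,
            value = sample_covariance(rows[i], rows[j])
            cov[i][j] = value          # so compute the upper triangle
            cov[j][i] = value          # and mirror it
    return cov


# Two hypothetical features; the second is exactly twice the first.
features_demo = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
cov_demo = covariance_matrix(features_demo)
print(cov_demo)
```

Because the inputs in this notebook are already standardized, the diagonal of the resulting matrix should be approximately 1 and the off-diagonal entries are correlations.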

## 2. Eigenvalues and eigenvectors

In [19]:
# Step 2: Compute the Eigenvalues and Eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
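
`np.linalg.eig` makes no symmetry assumption and does not guarantee any ordering of the eigenvalues. Since a covariance matrix is symmetric, `np.linalg.eigh` is a common alternative; it returns real eigenvalues in ascending order. The sketch below uses a hypothetical 3x3 matrix `cov_small` and is an illustration, not a replacement for the cell above.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (symmetric by construction).
cov_small = np.array([
    [2.0, 0.5, 0.0],
    [0.5, 1.0, 0.3],
    [0.0, 0.3, 0.5],
])

# eigh assumes a symmetric input and returns real eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(cov_small)

# Reverse both so the largest eigenvalue (and its eigenvector column) comes first.
vals = vals[::-1]
vecs = vecs[:, ::-1]

# Each column v of vecs satisfies cov_small @ v = lambda * v.
print(np.allclose(cov_small @ vecs, vecs * vals))
print(vals)
```

The eigenvalues sum to the trace of the matrix, which for standardized features equals the number of features.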

In [20]:
eigenvalues_df = pd.DataFrame(eigenvalues)
eigenvalues_df.head()

Unnamed: 0,0
0,5.55
1,3.07
2,2.79
3,2.32
4,2.11


In [21]:
eigenvectors_df = pd.DataFrame(eigenvectors)

In [22]:
eigenvectors_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38
0,0.2,-0.09,-0.14,0.14,-0.11,-0.25,0.1,-0.14,-0.04,-0.23,0.01,0.34,0.06,0.12,-0.05,-0.11,0.06,0.0,0.03,0.05,-0.12,-0.22,0.26,-0.08,0.27,0.15,-0.03,-0.19,0.1,-0.27,0.21,0.1,0.28,-0.16,0.04,0.11,0.24,-0.15,-0.04
1,0.01,-0.07,0.31,-0.1,-0.06,-0.07,-0.25,-0.0,-0.01,-0.01,-0.33,0.02,0.2,-0.09,-0.04,-0.15,-0.16,0.0,0.01,-0.03,0.01,-0.05,0.04,-0.03,0.04,-0.05,0.17,0.2,0.07,0.05,-0.05,-0.16,-0.29,-0.12,-0.02,0.24,0.39,-0.26,0.34
2,-0.06,-0.03,0.39,-0.04,-0.09,-0.15,0.04,0.12,-0.03,0.07,0.16,0.03,-0.13,0.14,0.05,-0.13,0.2,0.0,0.0,-0.02,0.01,0.07,-0.09,-0.01,-0.11,-0.02,0.16,-0.27,0.43,0.03,0.07,-0.08,0.05,-0.08,0.4,-0.24,-0.21,-0.21,0.17
3,-0.22,-0.21,0.07,0.24,-0.03,-0.04,0.23,0.12,-0.33,-0.1,-0.07,-0.04,0.13,-0.02,-0.19,-0.22,-0.12,0.0,-0.28,0.61,0.03,0.06,-0.03,-0.09,-0.03,-0.14,-0.07,0.08,0.08,0.01,-0.03,0.02,-0.01,0.01,-0.0,0.01,-0.04,0.16,-0.03
4,0.24,0.19,-0.04,-0.21,0.02,0.01,-0.18,-0.1,0.37,0.05,0.06,0.01,-0.19,-0.12,0.21,0.11,0.11,0.0,-0.28,0.64,-0.03,-0.06,0.01,-0.09,0.06,0.01,0.05,0.02,-0.07,0.04,0.05,-0.03,0.02,-0.03,0.02,-0.09,0.03,-0.11,0.11


## 3. Sort eigenvectors and eigenvalues by eigenvalue magnitude

In [23]:
# argsort returns the indices that would sort the eigenvalues in ascending order.
# [::-1] is start:stop:step with start and stop omitted and a step of -1, so it reverses
# the index array and gives descending order.
sorted_index = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_index]
# Reorders the eigenvector columns to match the sorted eigenvalues. Leaves rows the same.
sorted_eigenvectors = eigenvectors[:, sorted_index]

In [24]:
sorted_index

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 12, 11, 13, 14, 15, 16,
       26, 32, 33, 34, 35, 38, 37, 36, 31, 30, 29, 28, 27, 25, 24, 23, 22,
       21, 20, 19, 18, 17])

In [25]:
sorted_eigenvalues

array([5.55, 3.07, 2.79, 2.32, 2.11, 1.94, 1.6 , 1.53, 1.49, 1.39, 1.26,
       1.15, 1.15, 1.06, 0.95, 0.91, 0.85, 0.79, 0.72, 0.68, 0.65, 0.61,
       0.55, 0.52, 0.5 , 0.44, 0.4 , 0.37, 0.33, 0.32, 0.22, 0.19, 0.17,
       0.14, 0.12, 0.1 , 0.04, 0.04, 0.  ])

In [26]:
sorted_eigenvectors_df = pd.DataFrame(sorted_eigenvectors)
sorted_eigenvectors_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38
0,0.2,-0.09,-0.14,0.14,-0.11,-0.25,0.1,-0.14,-0.04,-0.23,0.01,0.06,0.34,0.12,-0.05,-0.11,0.06,-0.03,0.28,-0.16,0.04,0.11,-0.04,-0.15,0.24,0.1,0.21,-0.27,0.1,-0.19,0.15,0.27,-0.08,0.26,-0.22,-0.12,0.05,0.03,0.0
1,0.01,-0.07,0.31,-0.1,-0.06,-0.07,-0.25,-0.0,-0.01,-0.01,-0.33,0.2,0.02,-0.09,-0.04,-0.15,-0.16,0.17,-0.29,-0.12,-0.02,0.24,0.34,-0.26,0.39,-0.16,-0.05,0.05,0.07,0.2,-0.05,0.04,-0.03,0.04,-0.05,0.01,-0.03,0.01,0.0
2,-0.06,-0.03,0.39,-0.04,-0.09,-0.15,0.04,0.12,-0.03,0.07,0.16,-0.13,0.03,0.14,0.05,-0.13,0.2,0.16,0.05,-0.08,0.4,-0.24,0.17,-0.21,-0.21,-0.08,0.07,0.03,0.43,-0.27,-0.02,-0.11,-0.01,-0.09,0.07,0.01,-0.02,0.0,0.0
3,-0.22,-0.21,0.07,0.24,-0.03,-0.04,0.23,0.12,-0.33,-0.1,-0.07,0.13,-0.04,-0.02,-0.19,-0.22,-0.12,-0.07,-0.01,0.01,-0.0,0.01,-0.03,0.16,-0.04,0.02,-0.03,0.01,0.08,0.08,-0.14,-0.03,-0.09,-0.03,0.06,0.03,0.61,-0.28,0.0
4,0.24,0.19,-0.04,-0.21,0.02,0.01,-0.18,-0.1,0.37,0.05,0.06,-0.19,0.01,-0.12,0.21,0.11,0.11,0.05,0.02,-0.03,0.02,-0.09,0.11,-0.11,0.03,-0.03,0.05,0.04,-0.07,0.02,0.01,0.06,-0.09,0.01,-0.06,-0.03,0.64,-0.28,0.0


## 4. Select subset of eigenvectors to form principal components

In [27]:
# Cumulative sum divided by sum.
# Element k is the proportion of total variance explained by the first k + 1 principal components.
cumulative_var_explained = np.cumsum(sorted_eigenvalues) / np.sum(sorted_eigenvalues)
cumulative_var_explained

array([0.14, 0.22, 0.29, 0.35, 0.41, 0.46, 0.5 , 0.54, 0.57, 0.61, 0.64,
       0.67, 0.7 , 0.73, 0.75, 0.78, 0.8 , 0.82, 0.84, 0.85, 0.87, 0.89,
       0.9 , 0.91, 0.93, 0.94, 0.95, 0.96, 0.97, 0.97, 0.98, 0.98, 0.99,
       0.99, 1.  , 1.  , 1.  , 1.  , 1.  ])

In [28]:
# Finds the indices where cumulative variance explained is at least 95%.
# These indices determine how many PCs are needed to explain at least 95% of the total variance.
# [0][0] To access the first index from the first array
# +1 because Python is 0-indexed
# Returns the number of PCs needed to explain at least 95% of the variance.
num_components = np.where(cumulative_var_explained >= 0.95)[0][0] + 1 

In [29]:
np.where(cumulative_var_explained >= 0.95) # Actually a tuple containing one array of indices

(array([27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]),)

In [30]:
num_components

28
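
To make the selection rule concrete, here is a small worked example with hypothetical eigenvalues. It uses `np.argmax` on the boolean mask, which returns the position of the first `True`, as an equivalent to the `np.where(...)[0][0]` indexing above.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted from largest to smallest.
demo_eigenvalues = np.array([4.0, 3.0, 2.0, 0.6, 0.4])

# Proportion of total variance explained by the first 1, 2, ... components:
# roughly 0.40, 0.70, 0.90, 0.96, 1.00
demo_cumulative = np.cumsum(demo_eigenvalues) / np.sum(demo_eigenvalues)

threshold = 0.95
# First position where the cumulative share reaches the threshold,
# plus 1 to turn a 0-based index into a component count.
demo_num_components = int(np.argmax(demo_cumulative >= threshold)) + 1
print(demo_num_components)
```

Here the first three components explain about 90% of the variance, so a fourth is needed to pass 95%.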

In [31]:
pca_components = sorted_eigenvectors[:, :num_components]

In [32]:
pca_components_df = pd.DataFrame(pca_components)
pca_components_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
0,0.2,-0.09,-0.14,0.14,-0.11,-0.25,0.1,-0.14,-0.04,-0.23,0.01,0.06,0.34,0.12,-0.05,-0.11,0.06,-0.03,0.28,-0.16,0.04,0.11,-0.04,-0.15,0.24,0.1,0.21,-0.27
1,0.01,-0.07,0.31,-0.1,-0.06,-0.07,-0.25,-0.0,-0.01,-0.01,-0.33,0.2,0.02,-0.09,-0.04,-0.15,-0.16,0.17,-0.29,-0.12,-0.02,0.24,0.34,-0.26,0.39,-0.16,-0.05,0.05
2,-0.06,-0.03,0.39,-0.04,-0.09,-0.15,0.04,0.12,-0.03,0.07,0.16,-0.13,0.03,0.14,0.05,-0.13,0.2,0.16,0.05,-0.08,0.4,-0.24,0.17,-0.21,-0.21,-0.08,0.07,0.03
3,-0.22,-0.21,0.07,0.24,-0.03,-0.04,0.23,0.12,-0.33,-0.1,-0.07,0.13,-0.04,-0.02,-0.19,-0.22,-0.12,-0.07,-0.01,0.01,-0.0,0.01,-0.03,0.16,-0.04,0.02,-0.03,0.01
4,0.24,0.19,-0.04,-0.21,0.02,0.01,-0.18,-0.1,0.37,0.05,0.06,-0.19,0.01,-0.12,0.21,0.11,0.11,0.05,0.02,-0.03,0.02,-0.09,0.11,-0.11,0.03,-0.03,0.05,0.04


## 5. Transform the original data

In [33]:
pca_transformed_data = np.dot(scaled_features_array, pca_components)

In [34]:
# Creating a DataFrame of the PCA-transformed data
pca_df = pd.DataFrame(pca_transformed_data, columns=[f'PC{i+1}' for i in range(num_components)])

In [35]:
pca_df.head()

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25,PC26,PC27,PC28
0,0.29,0.37,2.0,-1.01,0.12,-0.15,-0.63,0.43,-0.14,0.06,-0.25,-0.47,0.22,0.51,0.65,-0.09,0.22,0.38,-0.01,0.03,0.37,0.49,-0.63,0.26,-0.1,0.34,-0.13,-0.64
1,0.14,-2.95,1.54,-2.64,0.46,1.13,-0.42,-1.29,-0.98,0.01,0.62,-0.18,-0.49,0.49,-0.27,-0.05,-0.46,-0.16,0.43,-0.62,-0.41,0.45,0.64,0.23,-0.95,0.29,0.82,-0.54
2,1.04,0.46,0.77,-0.47,0.21,-0.31,0.06,-0.22,-0.36,0.31,0.65,0.05,-0.07,0.65,-0.3,0.2,-0.24,0.41,0.04,-0.47,0.76,0.16,-0.59,-0.36,-0.16,-0.57,0.68,-0.77
3,0.84,1.04,1.28,-0.05,1.69,0.95,0.82,-0.88,-0.28,0.26,-0.2,-0.01,0.79,-0.23,-0.21,0.52,0.75,-0.19,0.42,-1.03,0.1,0.03,0.04,-0.58,0.11,-0.15,-0.16,0.44
4,-0.12,-0.1,1.88,0.48,-0.01,0.7,-0.16,-0.61,0.58,0.25,-0.28,-0.7,0.09,0.03,0.18,0.18,0.03,0.17,0.18,0.78,0.03,0.63,0.11,-0.05,-0.18,-0.02,-0.02,-0.46


In [36]:
# Display the shape of the original and the PCA-transformed data
original_shape = scaled_features_array.shape
pca_shape = pca_transformed_data.shape
original_shape, pca_shape, pca_components.shape

((225, 39), (225, 28), (39, 28))

# KNN

In [37]:
pca_df['playing_style'] = aggregate['playing_style'].values

In [38]:
pca_df.head()

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25,PC26,PC27,PC28,playing_style
0,0.29,0.37,2.0,-1.01,0.12,-0.15,-0.63,0.43,-0.14,0.06,-0.25,-0.47,0.22,0.51,0.65,-0.09,0.22,0.38,-0.01,0.03,0.37,0.49,-0.63,0.26,-0.1,0.34,-0.13,-0.64,Versatile
1,0.14,-2.95,1.54,-2.64,0.46,1.13,-0.42,-1.29,-0.98,0.01,0.62,-0.18,-0.49,0.49,-0.27,-0.05,-0.46,-0.16,0.43,-0.62,-0.41,0.45,0.64,0.23,-0.95,0.29,0.82,-0.54,Speedster
2,1.04,0.46,0.77,-0.47,0.21,-0.31,0.06,-0.22,-0.36,0.31,0.65,0.05,-0.07,0.65,-0.3,0.2,-0.24,0.41,0.04,-0.47,0.76,0.16,-0.59,-0.36,-0.16,-0.57,0.68,-0.77,Route Technician
3,0.84,1.04,1.28,-0.05,1.69,0.95,0.82,-0.88,-0.28,0.26,-0.2,-0.01,0.79,-0.23,-0.21,0.52,0.75,-0.19,0.42,-1.03,0.1,0.03,0.04,-0.58,0.11,-0.15,-0.16,0.44,Physical - Speedster
4,-0.12,-0.1,1.88,0.48,-0.01,0.7,-0.16,-0.61,0.58,0.25,-0.28,-0.7,0.09,0.03,0.18,0.18,0.03,0.17,0.18,0.78,0.03,0.63,0.11,-0.05,-0.18,-0.02,-0.02,-0.46,Route Technician


### Train and test split
- Train - Consists of the 23 receivers who were manually labeled with playing styles
- Test - The receivers who are not yet labeled with a playing style

In [39]:
train = pca_df[pca_df['playing_style'].notnull()]
test = pca_df[pca_df['playing_style'].isnull()]

In [40]:
train.shape, test.shape

((23, 29), (202, 29))

### Separating features from target variable

In [41]:
X_train = train.drop('playing_style', axis = 1)
y_train = train['playing_style']

X_test = test.drop('playing_style', axis = 1)
# y_test = test['playing_style']

### Features

In [42]:
pca_df.columns

Index(['PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'PC6', 'PC7', 'PC8', 'PC9', 'PC10',
       'PC11', 'PC12', 'PC13', 'PC14', 'PC15', 'PC16', 'PC17', 'PC18', 'PC19',
       'PC20', 'PC21', 'PC22', 'PC23', 'PC24', 'PC25', 'PC26', 'PC27', 'PC28',
       'playing_style'],
      dtype='object')

In [43]:
features = ['PC1', 'PC2', 'PC3', 'PC4', 'PC5', 'PC6', 'PC7', 'PC8', 'PC9', 'PC10',
       'PC11', 'PC12', 'PC13', 'PC14', 'PC15', 'PC16', 'PC17', 'PC18', 'PC19',
       'PC20', 'PC21', 'PC22', 'PC23', 'PC24', 'PC25', 'PC26', 'PC27', 'PC28']

### Run the model on the test dataset
- With 5 nearest neighbors, matching the value passed to `fn.knn` below

In [44]:
# `row` is one unlabeled receiver; the lambda parameter is named so it does not shadow X_test
X_test['playing_style'] = X_test.apply(lambda row: fn.knn(features, X_train, row, y_train, 5), axis = 1)
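
The helper `fn.knn` is defined in a separate module and is not shown here. The sketch below is one plausible version of the idea, assuming Euclidean distance and a simple majority vote; the name `knn_predict` and its signature are hypothetical and differ from `fn.knn`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every labeled point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest; the sort is stable, so equal distances keep input order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common returns the label encountered first,
    # which here is the label of the closer neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points standing in for rows of principal components.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = ["Slot", "Slot", "Speedster", "Speedster", "Speedster"]

print(knn_predict(points, labels, (0.5, 0.5), k=3))
print(knn_predict(points, labels, (5.0, 5.5), k=3))
```

Each style here has only three or four labeled receivers, so with five neighbors no single style can supply all of the votes and the winner may be decided by a narrow plurality. How ties are broken then matters more than it would with a larger labeled set.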

## Checking results

In [45]:
combined_playing_style = pd.concat([train['playing_style'], X_test['playing_style']])

# Copy into a separate dataframe to avoid altering the original aggregate df
final = aggregate.copy()
final['playing_style'] = combined_playing_style

In [46]:
pd.set_option('display.float_format', '{:.2f}'.format)

In [47]:
final = final[['player_name', 'playing_style', 'height_in', 'weight_lbs', '40', 'bench',
       'vertical', 'broad_jump', 'shuttle', '3_cone', 'cross_pct', 'corner_pct', 'out_pct', 'curl_pct', 'post_pct', 'underneath_screen_pct',
       'flat_pct', 'slant_pct', 'wr_screen_pct', 'comeback_pct', 'go_pct', 'in_pct', 'deep_pct', 'play_action_pct', 'rpo_pct',
       'hurry_up_pct', 'difficult_pct', 'deep_sideline_pct', 'possession_saver_pct', 'clutch_catch', 'conversion_pct', 'redzone_pct',
               'adot', 'avg_yac', 'avg_yacon', 'catch_rate', 'yprr', 'slot_rate', 'wide_rate', 'contested_catch_rate'
       ]]

In [48]:
final.head(10)

Unnamed: 0,player_name,playing_style,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,difficult_pct,deep_sideline_pct,possession_saver_pct,clutch_catch,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,slot_rate,wide_rate,contested_catch_rate
0,Justin Jefferson,Versatile,73.25,202,4.43,14,37.5,126,4.27,7.02,0.03,0.1,0.21,0.1,0.07,0.0,0.09,0.07,0.07,0.1,0.11,0.08,0.15,0.24,0.02,0.18,0.18,0.11,0.58,6,0.21,0.1,10.1,4.88,1.03,0.7,2.62,0.3,0.69,0.56
1,Tyreek Hill,Speedster,68.13,185,4.29,13,40.5,129,4.06,6.53,0.09,0.04,0.13,0.11,0.11,0.0,0.04,0.11,0.08,0.1,0.14,0.05,0.22,0.41,0.15,0.03,0.16,0.09,0.65,3,0.2,0.04,12.39,4.05,0.49,0.7,3.2,0.42,0.54,0.52
2,Davante Adams,Route Technician,72.88,212,4.56,14,39.5,123,4.3,6.82,0.03,0.06,0.14,0.12,0.08,0.0,0.04,0.08,0.04,0.09,0.19,0.11,0.2,0.2,0.04,0.05,0.29,0.11,0.56,1,0.22,0.07,11.83,4.93,0.95,0.56,2.45,0.3,0.7,0.44
3,A.J. Brown,Physical - Speedster,72.5,226,4.49,19,36.5,120,4.25,7.0,0.03,0.06,0.12,0.11,0.01,0.01,0.04,0.24,0.04,0.09,0.15,0.1,0.19,0.32,0.3,0.17,0.26,0.14,0.58,2,0.17,0.04,12.1,6.23,2.18,0.61,2.59,0.26,0.74,0.5
4,Stefon Diggs,Route Technician,72.0,195,4.46,11,35.0,115,4.32,7.03,0.03,0.08,0.11,0.13,0.08,0.01,0.1,0.14,0.06,0.09,0.1,0.06,0.15,0.3,0.17,0.1,0.18,0.09,0.56,4,0.18,0.12,11.23,3.88,0.93,0.7,2.49,0.34,0.66,0.5
5,CeeDee Lamb,Slot,73.63,198,4.5,11,34.5,124,4.24,7.0,0.08,0.08,0.11,0.15,0.08,0.0,0.1,0.13,0.06,0.06,0.09,0.06,0.17,0.3,0.11,0.1,0.18,0.08,0.52,2,0.16,0.04,10.08,4.54,1.38,0.69,2.38,0.62,0.36,0.46
6,Jaylen Waddle,Route Technician,69.5,180,4.55,11,34.0,122,4.22,6.99,0.08,0.02,0.1,0.07,0.15,0.0,0.01,0.17,0.07,0.12,0.09,0.14,0.15,0.33,0.15,0.06,0.18,0.07,0.68,2,0.23,0.04,11.8,6.8,1.99,0.64,2.59,0.25,0.74,0.25
7,DeVonta Smith,Route Technician,72.25,170,4.53,9,34.0,131,4.22,6.95,0.07,0.07,0.15,0.15,0.03,0.02,0.05,0.09,0.13,0.13,0.07,0.04,0.15,0.14,0.17,0.18,0.15,0.1,0.45,1,0.2,0.05,9.68,5.16,1.03,0.7,1.98,0.25,0.75,0.42
8,Terry McLaurin,Route Technician,72.13,208,4.35,18,37.5,125,4.15,7.01,0.08,0.07,0.12,0.1,0.03,0.0,0.05,0.11,0.09,0.08,0.16,0.1,0.23,0.27,0.16,0.08,0.22,0.14,0.61,2,0.26,0.04,12.81,5.12,1.86,0.64,2.04,0.21,0.79,0.65
9,Amon-Ra St. Brown,Slot,71.5,197,4.61,20,38.5,127,4.26,6.9,0.08,0.03,0.26,0.15,0.06,0.02,0.05,0.11,0.09,0.04,0.01,0.11,0.04,0.23,0.03,0.05,0.15,0.03,0.38,2,0.18,0.05,6.47,4.87,0.9,0.73,2.4,0.6,0.39,0.38


In [49]:
final['playing_style'].value_counts()

playing_style
Route Technician         60
Slot                     46
Speedster                35
Physical - Possession    34
Versatile                24
Physical - Speedster     18
YAC Specialist            8
Name: count, dtype: int64

### Route technicians

In [50]:
final[final['playing_style'] == 'Route Technician']

Unnamed: 0,player_name,playing_style,height_in,weight_lbs,40,bench,vertical,broad_jump,shuttle,3_cone,cross_pct,corner_pct,out_pct,curl_pct,post_pct,underneath_screen_pct,flat_pct,slant_pct,wr_screen_pct,comeback_pct,go_pct,in_pct,deep_pct,play_action_pct,rpo_pct,hurry_up_pct,difficult_pct,deep_sideline_pct,possession_saver_pct,clutch_catch,conversion_pct,redzone_pct,adot,avg_yac,avg_yacon,catch_rate,yprr,slot_rate,wide_rate,contested_catch_rate
2,Davante Adams,Route Technician,72.88,212,4.56,14,39.5,123,4.3,6.82,0.03,0.06,0.14,0.12,0.08,0.0,0.04,0.08,0.04,0.09,0.19,0.11,0.2,0.2,0.04,0.05,0.29,0.11,0.56,1,0.22,0.07,11.83,4.93,0.95,0.56,2.45,0.3,0.7,0.44
4,Stefon Diggs,Route Technician,72.0,195,4.46,11,35.0,115,4.32,7.03,0.03,0.08,0.11,0.13,0.08,0.01,0.1,0.14,0.06,0.09,0.1,0.06,0.15,0.3,0.17,0.1,0.18,0.09,0.56,4,0.18,0.12,11.23,3.88,0.93,0.7,2.49,0.34,0.66,0.5
6,Jaylen Waddle,Route Technician,69.5,180,4.55,11,34.0,122,4.22,6.99,0.08,0.02,0.1,0.07,0.15,0.0,0.01,0.17,0.07,0.12,0.09,0.14,0.15,0.33,0.15,0.06,0.18,0.07,0.68,2,0.23,0.04,11.8,6.8,1.99,0.64,2.59,0.25,0.74,0.25
7,DeVonta Smith,Route Technician,72.25,170,4.53,9,34.0,131,4.22,6.95,0.07,0.07,0.15,0.15,0.03,0.02,0.05,0.09,0.13,0.13,0.07,0.04,0.15,0.14,0.17,0.18,0.15,0.1,0.45,1,0.2,0.05,9.68,5.16,1.03,0.7,1.98,0.25,0.75,0.42
8,Terry McLaurin,Route Technician,72.13,208,4.35,18,37.5,125,4.15,7.01,0.08,0.07,0.12,0.1,0.03,0.0,0.05,0.11,0.09,0.08,0.16,0.1,0.23,0.27,0.16,0.08,0.22,0.14,0.61,2,0.26,0.04,12.81,5.12,1.86,0.64,2.04,0.21,0.79,0.65
10,Amari Cooper,Route Technician,72.88,211,4.42,16,33.0,120,3.98,6.71,0.03,0.13,0.12,0.12,0.06,0.0,0.04,0.16,0.02,0.14,0.14,0.04,0.17,0.18,0.05,0.09,0.2,0.12,0.68,3,0.33,0.1,12.06,4.17,1.19,0.59,2.06,0.25,0.75,0.56
13,Garrett Wilson,Route Technician,71.75,183,4.38,12,36.0,123,4.36,6.99,0.05,0.1,0.09,0.1,0.05,0.01,0.07,0.14,0.08,0.1,0.14,0.09,0.12,0.17,0.1,0.09,0.24,0.08,0.59,4,0.22,0.1,10.53,4.63,2.3,0.56,1.85,0.36,0.63,0.36
20,Brandon Aiyuk,Route Technician,71.63,205,4.5,11,40.0,128,4.27,7.02,0.05,0.02,0.16,0.1,0.07,0.0,0.04,0.15,0.06,0.05,0.08,0.23,0.12,0.19,0.11,0.04,0.14,0.08,0.52,3,0.18,0.05,9.86,4.97,1.36,0.68,1.91,0.24,0.76,0.41
21,Jerry Jeudy,Route Technician,73.0,193,4.45,13,35.0,120,4.53,7.0,0.05,0.11,0.12,0.11,0.07,0.01,0.11,0.03,0.11,0.05,0.15,0.08,0.23,0.21,0.06,0.04,0.19,0.11,0.54,1,0.19,0.07,11.5,5.76,1.53,0.68,2.15,0.54,0.46,0.27
26,Diontae Johnson,Route Technician,70.5,183,4.53,15,33.5,123,4.45,7.09,0.03,0.03,0.17,0.17,0.03,0.0,0.03,0.05,0.05,0.19,0.16,0.08,0.15,0.16,0.07,0.09,0.24,0.08,0.53,3,0.2,0.07,10.44,2.73,0.85,0.58,1.44,0.13,0.87,0.36


Based on my understanding of football, most of these receivers are correctly labeled as route technicians. Two questionable inclusions are:
- Devin Duvernay - More of a speedster and not known for good route running.
- DeSean Jackson - Also more of a speedster than a refined route runner.

### Big and fast receivers

In [51]:
final[(final['weight_lbs'] >= 220) & (final['40'] < 4.5)][['player_name', 'playing_style', 'weight_lbs', '40', 'slant_pct', 'possession_saver_pct', 'conversion_pct', 'redzone_pct', 'contested_catch_rate']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,slant_pct,possession_saver_pct,conversion_pct,redzone_pct,contested_catch_rate
3,A.J. Brown,Physical - Speedster,226,4.49,0.24,0.58,0.17,0.04,0.5
14,DK Metcalf,Physical - Speedster,228,4.33,0.12,0.55,0.19,0.16,0.48
71,Chase Claypool,Physical - Speedster,238,4.42,0.07,0.49,0.25,0.04,0.47
97,Julio Jones,Versatile,220,4.34,0.23,0.58,0.21,0.05,0.29
183,Dezmon Patmon,Physical - Speedster,228,4.48,0.0,0.67,0.33,0.17,0.0
184,Dareke Young,Physical - Possession,223,4.44,0.0,0.0,0.0,0.0,0.0
185,Simi Fehoko,Physical - Possession,222,4.43,0.25,0.25,0.25,0.0,0.5
191,Keith Kirkwood,Physical - Speedster,221,4.45,0.25,0.75,0.75,0.0,0.5
204,Miles Boykin,Versatile,220,4.42,0.0,0.67,0.33,0.0,1.0
209,Jalen Camp,Physical - Speedster,226,4.48,0.0,1.0,0.0,0.0,1.0


- I don't know enough about the receivers labeled "Physical - Possession" to determine the accuracy of their labels.
  - They all possess great speed, so they may belong in the "Physical - Speedster" category.
- Julio Jones, despite being very big and fast, is a truly versatile receiver. I'm impressed that the model didn't classify him as "Physical - Speedster."
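The observation above can be turned into a repeatable check. The following is a minimal sketch, not part of the original notebook: it rebuilds a few rows from the table above into a small frame and lists the big, fast receivers whose label is something other than "Physical - Speedster". The helper name `big_fast_outliers` and its parameters are hypothetical. The cutoffs reuse the 220 lbs and 4.5 s filter from cell 51.

```python
import pandas as pd

# A few rows copied from the table above (weight in lbs, 40-yard dash in seconds).
sample = pd.DataFrame(
    {
        "player_name": [
            "A.J. Brown", "DK Metcalf", "Julio Jones",
            "Dareke Young", "Simi Fehoko", "Miles Boykin",
        ],
        "playing_style": [
            "Physical - Speedster", "Physical - Speedster", "Versatile",
            "Physical - Possession", "Physical - Possession", "Versatile",
        ],
        "weight_lbs": [226, 228, 220, 223, 222, 220],
        "40": [4.49, 4.33, 4.34, 4.44, 4.43, 4.42],
    }
)


def big_fast_outliers(df, min_weight=220, max_forty=4.5,
                      expected="Physical - Speedster"):
    """Return big, fast receivers whose label differs from `expected`.

    Hypothetical helper: the thresholds mirror the filter in cell 51.
    """
    big_fast = (df["weight_lbs"] >= min_weight) & (df["40"] < max_forty)
    return df[big_fast & (df["playing_style"] != expected)]


outliers = big_fast_outliers(sample)
print(outliers[["player_name", "playing_style", "weight_lbs", "40"]])
```

In the notebook, the same call on `final` would presumably return every row worth a second look, rather than only the sampled ones.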

### Physical - Possession

In [52]:
final[final['playing_style'] == 'Physical - Possession'][['player_name', 'playing_style', 'weight_lbs', '40', 'slant_pct', 'adot', 'possession_saver_pct', 'conversion_pct', 'redzone_pct', 'contested_catch_rate']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,slant_pct,adot,possession_saver_pct,conversion_pct,redzone_pct,contested_catch_rate
11,Mike Evans,Physical - Possession,231,4.53,0.09,12.91,0.68,0.23,0.08,0.65
18,Tee Higgins,Physical - Possession,216,4.59,0.13,10.87,0.57,0.13,0.07,0.62
24,Mike Williams,Physical - Possession,218,4.54,0.04,12.03,0.6,0.25,0.08,0.58
27,Drake London,Physical - Possession,219,4.55,0.13,10.36,0.61,0.18,0.12,0.54
29,Gabe Davis,Physical - Possession,216,4.54,0.08,15.33,0.72,0.29,0.09,0.35
30,Courtland Sutton,Physical - Possession,218,4.54,0.12,12.28,0.58,0.23,0.08,0.42
40,DeAndre Hopkins,Physical - Possession,214,4.57,0.08,10.17,0.55,0.19,0.03,0.52
54,Noah Brown,Physical - Possession,222,4.56,0.09,10.81,0.54,0.3,0.09,0.41
60,Marvin Jones Jr.,Physical - Possession,199,4.46,0.05,13.35,0.75,0.25,0.12,0.67
77,Michael Gallup,Physical - Possession,205,4.51,0.15,11.07,0.7,0.35,0.12,0.42


### Physical - Speedster

In [53]:
final[final['playing_style'] == 'Physical - Speedster'][['player_name', 'playing_style', 'weight_lbs', '40', 'slant_pct', 'adot', 'possession_saver_pct', 'conversion_pct', 'redzone_pct', 'contested_catch_rate']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,slant_pct,adot,possession_saver_pct,conversion_pct,redzone_pct,contested_catch_rate
3,A.J. Brown,Physical - Speedster,226,4.49,0.24,12.1,0.58,0.17,0.04,0.5
14,DK Metcalf,Physical - Speedster,228,4.33,0.12,11.22,0.55,0.19,0.16,0.48
35,Allen Lazard,Physical - Speedster,227,4.55,0.18,12.49,0.69,0.29,0.1,0.39
71,Chase Claypool,Physical - Speedster,238,4.42,0.07,10.39,0.49,0.25,0.04,0.47
94,Equanimeous St. Brown,Physical - Speedster,214,4.48,0.08,13.03,0.55,0.24,0.03,0.29
95,Justin Watson,Physical - Speedster,215,4.44,0.18,18.74,0.71,0.21,0.06,0.25
131,Zach Pascal,Physical - Speedster,219,4.55,0.11,6.05,0.47,0.26,0.0,0.67
140,N'Keal Harry,Physical - Speedster,228,4.53,0.22,17.0,0.67,0.22,0.11,0.75
141,Breshad Perriman,Physical - Speedster,212,4.52,0.05,14.16,0.58,0.37,0.0,0.0
144,Jalen Reagor,Physical - Speedster,206,4.47,0.0,13.54,0.62,0.15,0.0,0.0


### Versatile

In [54]:
final[final['playing_style'] == 'Versatile'][['player_name', 'playing_style', 'weight_lbs', '40', 'slant_pct', 'adot', 'possession_saver_pct', 'conversion_pct', 'redzone_pct', 'contested_catch_rate']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,slant_pct,adot,possession_saver_pct,conversion_pct,redzone_pct,contested_catch_rate
0,Justin Jefferson,Versatile,202,4.43,0.07,10.1,0.58,0.21,0.1,0.56
15,Ja'Marr Chase,Versatile,201,4.34,0.13,8.99,0.54,0.18,0.1,0.39
23,Michael Pittman Jr.,Versatile,223,4.52,0.21,6.89,0.42,0.19,0.06,0.5
25,DJ Moore,Versatile,210,4.42,0.07,13.01,0.64,0.22,0.06,0.57
28,Donovan Peoples-Jones,Versatile,212,4.48,0.1,11.67,0.54,0.24,0.08,0.38
34,George Pickens,Versatile,195,4.47,0.04,14.76,0.69,0.31,0.07,0.68
44,Mack Hollins,Versatile,221,4.53,0.04,12.68,0.55,0.26,0.09,0.4
52,Alec Pierce,Versatile,211,4.41,0.1,11.72,0.58,0.22,0.05,0.43
56,DeVante Parker,Versatile,209,4.45,0.15,15.91,0.74,0.17,0.06,0.53
58,Corey Davis,Versatile,209,4.54,0.12,13.45,0.73,0.36,0.08,0.47


### Speedster

In [55]:
final[final['playing_style'] == 'Speedster'][['player_name', 'playing_style', 'weight_lbs', '40', 'slant_pct', 'adot', 'possession_saver_pct', 'conversion_pct', 'redzone_pct', 'contested_catch_rate']]

Unnamed: 0,player_name,playing_style,weight_lbs,40,slant_pct,adot,possession_saver_pct,conversion_pct,redzone_pct,contested_catch_rate
1,Tyreek Hill,Speedster,185,4.29,0.11,12.39,0.65,0.2,0.04,0.52
16,Chris Olave,Speedster,187,4.39,0.08,14.17,0.67,0.29,0.05,0.33
17,Tyler Lockett,Speedster,182,4.4,0.06,10.6,0.56,0.22,0.06,0.38
42,Marquise Brown,Speedster,166,4.27,0.05,11.26,0.5,0.23,0.05,0.65
43,Brandin Cooks,Speedster,189,4.33,0.05,10.73,0.54,0.25,0.06,0.56
45,Marquez Valdes-Scantling,Speedster,206,4.37,0.06,13.9,0.74,0.3,0.11,0.38
64,Darnell Mooney,Speedster,176,4.38,0.0,11.98,0.52,0.21,0.05,0.83
72,Elijah Moore,Speedster,178,4.35,0.05,11.6,0.58,0.26,0.08,0.17
78,Isaiah McKenzie,Speedster,173,4.42,0.17,8.71,0.43,0.17,0.05,0.44
79,Trent Sherfield,Speedster,203,4.45,0.12,10.86,0.61,0.31,0.12,0.62


- Jason Moore is rather big and has middling speed. He should not be in this category.
- A few other receivers are small enough but ran 40-yard dashes slower than 4.5 seconds.
- All other receivers are great examples of speedsters.
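The manual review above could be backed by a simple filter. This is only a sketch under stated assumptions: the 4.5 s cutoff comes from the note above, while the 200 lbs weight cutoff and the helper name `flag_speedsters` are illustrative choices of mine, not values used by the notebook. The rows are copied from the Speedster table. Flagged rows are candidates for a second look, not automatic relabels.

```python
import pandas as pd

# Rows copied from the Speedster table above.
speedsters = pd.DataFrame(
    {
        "player_name": [
            "Tyreek Hill", "Marquise Brown", "Marquez Valdes-Scantling",
            "Isaiah McKenzie", "Trent Sherfield",
        ],
        "weight_lbs": [185, 166, 206, 173, 203],
        "40": [4.29, 4.27, 4.37, 4.42, 4.45],
    }
)


def flag_speedsters(df, max_weight=200, max_forty=4.5):
    """Flag labeled speedsters who look too big or too slow for the label.

    Hypothetical helper: max_weight is an assumed cutoff; max_forty follows
    the "> 4.5" remark in the notes above.
    """
    checked = df.assign(
        too_big=df["weight_lbs"] > max_weight,
        too_slow=df["40"] > max_forty,
    )
    return checked[checked["too_big"] | checked["too_slow"]]


flagged = flag_speedsters(speedsters)
print(flagged)
```

With this assumed weight cutoff, the sample flags only players above 200 lbs, and none for speed. Raising `max_weight` narrows the list to clearer mismatches.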

### Spot checking select WRs

In [56]:
final[final['player_name'] == 'Nico Collins']['playing_style']

67    Versatile
Name: playing_style, dtype: object

In [57]:
final[final['player_name'] == 'Terry McLaurin']['playing_style']

8    Route Technician
Name: playing_style, dtype: object

In [58]:
final[final['player_name'] == 'Josh Gordon']['playing_style']

221    Physical - Speedster
Name: playing_style, dtype: object

In [59]:
final[final['player_name'] == 'Quez Watkins']['playing_style']

90    Slot
Name: playing_style, dtype: object

In [60]:
final[final['player_name'] == 'Chase Claypool']['playing_style']

71    Physical - Speedster
Name: playing_style, dtype: object

In [61]:
final[final['player_name'] == 'Michael Thomas']['playing_style']

124    Physical - Possession
Name: playing_style, dtype: object

In [62]:
final[final['player_name'] == 'DeAndre Hopkins']['playing_style']

40    Physical - Possession
Name: playing_style, dtype: object

In [63]:
final[final['player_name'] == 'Marquise Brown']['playing_style']

42    Speedster
Name: playing_style, dtype: object

In [64]:
final[final['player_name'] == 'Marquez Valdes-Scantling']['playing_style']

45    Speedster
Name: playing_style, dtype: object

In [65]:
final[final['player_name'] == 'Devin Duvernay']['playing_style']

83    Speedster
Name: playing_style, dtype: object
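The one-player-per-cell lookups above can also be run in a single pass. The sketch below is a possible consolidation, not the notebook's own code. It rebuilds the ten spot-checked labels from the outputs above, and the helper name `spot_check` is hypothetical. It assumes `player_name` is unique; if names repeat, `reindex` raises an error.

```python
import pandas as pd

# Labels copied from the spot-check outputs above.
labels = pd.DataFrame(
    {
        "player_name": [
            "Nico Collins", "Terry McLaurin", "Josh Gordon", "Quez Watkins",
            "Chase Claypool", "Michael Thomas", "DeAndre Hopkins",
            "Marquise Brown", "Marquez Valdes-Scantling", "Devin Duvernay",
        ],
        "playing_style": [
            "Versatile", "Route Technician", "Physical - Speedster", "Slot",
            "Physical - Speedster", "Physical - Possession",
            "Physical - Possession", "Speedster", "Speedster", "Speedster",
        ],
    }
)


def spot_check(df, names):
    """Return each requested player's label, in the order requested.

    Names missing from `df` come back as NaN instead of an empty result.
    """
    by_name = df.set_index("player_name")["playing_style"]
    return by_name.reindex(names)


# "Not A. Receiver" is a made-up name used only to show the missing case.
checked = spot_check(labels, ["Terry McLaurin", "Devin Duvernay", "Not A. Receiver"])
print(checked)
```

Returning NaN for an unknown name makes typos visible, whereas the boolean filter used in the cells above silently returns an empty Series.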