# Weighted Average Combine Approach

Using 2020 NFL Combine data to understand positional strengths and develop a data-based athleticism score.

In [7]:
import math
import numpy as np
import pandas as pd
import zipfile
import os

Our first task is to read the CSV export of our Combine data from Pro Football Reference and convert it to a usable DataFrame.

In [8]:
combine = pd.read_csv("combine_data/2020combine.csv")
combine

Unnamed: 0,Player,Pos,School,College,Ht,Wt,40yd,Vertical,Bench,Broad Jump,3Cone,Shuttle,Drafted (tm/rnd/yr)
0,Trey Adams\AdamTr00,OL,Washington,College Stats,6-8,318,5.60,24.5,,92.0,,,
1,Hakeem Adeniji\AdenHa00,OL,Kansas,College Stats,6-4,302,5.17,34.0,26.0,115.0,,,Cincinnati Bengals / 6th / 180th pick / 2020
2,McTelvin Agim\AgimMc00,DL,Arkansas,College Stats,6-3,309,4.98,,27.0,,,,Denver Broncos / 3rd / 95th pick / 2020
3,Salvon Ahmed\AhmeSa00,RB,Washington,College Stats,5-11,197,4.62,34.5,,120.0,,,
4,Brandon Aiyuk\AiyuBr00,WR,Arizona State,College Stats,6-0,205,4.50,40.0,11.0,128.0,,,San Francisco 49ers / 1st / 25th pick / 2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...
332,D.J. Wonnum\WonnDJ00,DL,South Carolina,College Stats,6-5,258,4.73,34.5,20.0,123.0,7.25,4.44,Minnesota Vikings / 4th / 117th pick / 2020
333,Dom Wood-Anderson\WoodDo01,TE,Tennessee,College Stats,6-4,261,4.92,35.0,,119.0,,,
334,David Woodward\WoodDa04,LB,Utah State,College Stats,6-2,230,4.79,33.5,16.0,114.0,7.34,4.37,
335,Chase Young\YounCh04,DL,Ohio State,College Stats,6-5,264,,,,,,,Washington Football Team / 1st / 2nd pick / 2020


Since height is stored as a feet-inches string (for example, `6-8`) rather than a number, we define a function `parse_ht` that converts each entry to a float, in inches, that we can use for our analysis.

In [9]:
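def parse_ht(ht):
    # Split a "feet-inches" string such as "6-8" into its two parts
    ht_ = ht.split("-")
    ft_ = float(ht_[0])
    in_ = float(ht_[1])
    # Return the total height in inches
    return (12*ft_) + in_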

In [10]:
combine['HtNum'] = combine['Ht'].apply(parse_ht)
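
`parse_ht` assumes every entry is a well-formed feet-inches string. As a minimal sketch, a more defensive variant could return NaN for missing or malformed entries instead of raising. The name `parse_ht_safe` and the sample entries below are hypothetical and not part of the original notebook.

```python
import math


def parse_ht_safe(ht):
    """Convert a 'feet-inches' string such as '6-8' to inches.

    Hypothetical variant of parse_ht: returns NaN instead of raising
    when the entry is missing or not in 'feet-inches' form.
    """
    if not isinstance(ht, str) or "-" not in ht:
        return math.nan
    feet, inches = ht.split("-", 1)
    try:
        return 12 * float(feet) + float(inches)
    except ValueError:
        return math.nan


# Made-up entries for illustration, not rows from the Combine export.
heights = [parse_ht_safe(h) for h in ["6-8", "5-11", None, "n/a"]]
```

If this variant were used, it could be applied the same way as the original, for example `combine['Ht'].apply(parse_ht_safe)`.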

To simplify the process and avoid hard-coding every position for each event, we create an array of all the unique positions except Kickers (stored as `unique_positions`) and a list of the events (stored as `events`).

In [11]:
events = ["HtNum", "Wt", "40yd", "Vertical", "Broad Jump", "3Cone", "Shuttle"]
unique_positions = combine.Pos.unique()
# Drop the Kicker entry, which sits at index 7 of the unique positions in this dataset
unique_positions = np.delete(unique_positions, 7)
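
Removing the Kicker entry by index depends on the order in which positions first appear in the file. A possible alternative, sketched below on a made-up DataFrame, filters by label instead. It assumes kickers are labelled `"K"` in the `Pos` column, which is not confirmed by the output shown above; `combine_demo` and `positions_demo` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the `combine` DataFrame; the rows are made up.
combine_demo = pd.DataFrame({"Pos": ["OL", "QB", "K", "WR", "OL"]})

# Assumption: kickers carry the label "K" in the Pos column.
positions_demo = combine_demo["Pos"].unique()
positions_demo = [p for p in positions_demo if p != "K"]
```

Filtering by label keeps working if the row order of the export changes, at the cost of relying on the exact label.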

We define a function `create_position_averages` that takes a string argument (the position) and calculates the average of each Combine event for that position.

In [12]:
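def create_position_averages(pos):
    # Keep only the rows for the requested position
    position = combine[combine["Pos"] == pos]
    # Mean of each event, in the same order as `events` (missing values are skipped)
    return [position[event].mean() for event in events]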

An example of the function for Quarterbacks is shown below.

In [13]:
create_position_averages("QB")

[75.23529411764706,
 222.76470588235293,
 4.786153846153846,
 31.923076923076923,
 116.0,
 7.244166666666668,
 4.5075]
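
The same per-position averages can also be computed for every position at once with a pandas `groupby`. The sketch below uses a small made-up table with two events rather than the real Combine data; `demo`, `events_demo`, and `position_means` are hypothetical names.

```python
import pandas as pd

# Made-up rows for illustration; NaN marks an athlete who skipped an event.
demo = pd.DataFrame({
    "Pos": ["QB", "QB", "WR", "WR"],
    "Wt": [220.0, 230.0, 200.0, 210.0],
    "40yd": [4.8, float("nan"), 4.4, 4.5],
})
events_demo = ["Wt", "40yd"]

# One row per position, one column per event; NaN entries are skipped by mean().
position_means = demo.groupby("Pos")[events_demo].mean()
```

A row such as `position_means.loc["QB"]` then plays the role of the list returned by `create_position_averages("QB")`.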

Our next task is to convert the raw averages into values that can be compared against each other. To do this, we calculate the mean and standard deviation of each event across the population (all Combine athletes) and compute a z-score for each event by position. Each z-score is then converted to a percentile that can be plotted on a radar graph.

We create two lists, similar to the one returned by our `create_position_averages` function, holding the mean and standard deviation of each event across all Combine athletes. They are stored in `combine_mean` and `combine_std` respectively.

In [14]:
combine_mean = []
combine_std = []
for i in events:
    combine_mean.append(combine[i].mean())
    combine_std.append(combine[i].std())

In [15]:
combine_std

[2.7168168292625676,
 44.852984694172754,
 0.2744519423513249,
 4.04320094672307,
 8.415460664781534,
 0.3810437652996868,
 0.2524083291467251]
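
One detail worth knowing: pandas' `std()` divides by `n - 1` by default (`ddof=1`, the sample standard deviation), whereas a population standard deviation divides by `n` (`ddof=0`). The sketch below shows the difference on made-up values; with a few hundred athletes the two are nearly identical, so this is a note on convention rather than a correction.

```python
import pandas as pd

# Made-up values for illustration only.
values = pd.Series([1.0, 2.0, 3.0, 4.0])

sample_std = values.std()            # default ddof=1, divides by n - 1
population_std = values.std(ddof=0)  # divides by n
```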

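The `compute_z` function takes the position as a string argument and uses the formula below to calculate the z-score for each event by position. Here $x_i$ is the position's average for event $i$, and $\mu$ and $\sigma$ are the mean and standard deviation of that event across all Combine athletes:
$$ z =\frac{x_i-\mu}{\sigma} $$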

In [16]:
def compute_z(pos):
    avg = create_position_averages(pos)
    z = []
    for i in range(len(events)):
        z.append((avg[i] - combine_mean[i])/(combine_std[i]))
    return z

An example of the function for Quarterbacks is shown below.

In [17]:
compute_z("QB")

[0.5442463193902192,
 -0.39974276534111314,
 0.17826575142885678,
 -0.4033742320536672,
 -0.34604568985754813,
 -0.21588571249081145,
 0.18694906051444668]
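
The z-score formula can also be applied to all positions and events in one step, since pandas aligns the per-event mean and standard deviation with the columns of a table of position averages. This is a sketch on made-up data, not the notebook's method; `demo`, `events_demo`, and `z_by_pos` are hypothetical names.

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Pos": ["QB", "QB", "WR", "WR"],
    "Wt": [220.0, 230.0, 200.0, 210.0],
    "40yd": [4.8, 4.9, 4.4, 4.5],
})
events_demo = ["Wt", "40yd"]

pos_means = demo.groupby("Pos")[events_demo].mean()  # x_i for each position
overall_mean = demo[events_demo].mean()              # mu for each event
overall_std = demo[events_demo].std()                # sigma for each event (ddof=1)

# Subtraction and division align on the event columns.
z_by_pos = (pos_means - overall_mean) / overall_std
```

Each row of `z_by_pos` corresponds to what `compute_z` returns for one position.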

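The `z_to_tile` function converts a z-score to the corresponding percentile using the standard normal cumulative distribution function, which treats each event as approximately normally distributed.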

In [18]:
def z_to_tile(z_score):
    return (.5 * (math.erf(z_score / 2 ** .5) + 1)) * 100
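
The expression inside `z_to_tile` is the standard normal CDF written in terms of the error function, $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}\left(z/\sqrt{2}\right)\right]$, scaled to 0-100. As a quick sanity check, it can be compared with `statistics.NormalDist` from the Python standard library. The helper name `z_to_tile_normaldist` is hypothetical.

```python
import math
from statistics import NormalDist


def z_to_tile(z_score):
    # Same formula as in the notebook: standard normal CDF scaled to 0-100.
    return (.5 * (math.erf(z_score / 2 ** .5) + 1)) * 100


def z_to_tile_normaldist(z_score):
    # Hypothetical cross-check using the standard library's normal distribution.
    return NormalDist().cdf(z_score) * 100


# A z-score of 0 sits exactly at the population mean, i.e. the 50th percentile.
midpoint = z_to_tile(0.0)
```

The two functions should agree to within floating-point error for any z-score.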

The last thing we do is apply these functions to each position in `unique_positions` to get our final percentiles by position. We print these out and manually add them to our Google Sheets for further analysis.

In [19]:
for i in unique_positions:
    tile = []
    z = compute_z(i)
    for j in z:
        tile.append(z_to_tile(j))
    print(tile)

[84.53840154231358, 95.21086716980048, 95.42932163760082, 12.499345679627, 7.8958677124236765, 93.2649217817442, 92.00210455758344]
[71.55470102482073, 84.05014719133568, 75.93785463008909, 21.13585497765763, 28.804052236835187, 75.39110953069952, 73.00109994595556]
[6.76960009573771, 26.45638433348837, 26.090790055527545, 67.21902318649876, 62.99493463321713, 28.098505769852107, 23.98019369119286]
[41.62124358233166, 21.44884585191573, 20.940063177123854, 73.88860221481143, 71.70795297999535, 30.64095091880974, 37.45715128188315]
[17.60459345248237, 14.569719154652333, 18.991969942953478, 74.0921093554199, 78.990108198, 20.01954589189907, 15.7716558128355]
[80.58007609022046, 58.609163512563015, 52.36810697760181, 54.391769755096774, 41.32101848206165, 37.690229330494084, 37.17100901092647]
[54.23249662797572, 49.30158083017649, 39.808179624942895, 55.65476855252203, 62.150273513992325, 27.229180101764623, 27.02044935643758]
[25.43861639165418, 21.834334425291363, 25.686481357211548, 