<a href="https://colab.research.google.com/github/runstats21/CFBSportStats/blob/main/preseason_ranking_predictor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NCAAF Preseason Ranking**

133 FBS teams will participate in the 2023 NCAAF season. Your goal is to use preseason data to predict the AP poll in Week 6 (October 8th).

Submissions can be any mathematical transformation of the features that ranks the 133 teams. Each feature is represented by a variable (a, b, c, etc.), and these are the only variables allowed in your formula. If your formula produces the same score for multiple teams, we will use alphabetical order as a tie-breaker.
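
As a minimal sketch of how a formula becomes a ranking, the snippet below scores three made-up teams and orders them by score, breaking ties alphabetically. The team names, feature values, and the `score` function are hypothetical and only illustrate the tie-break rule.

```python
# Hypothetical teams with made-up feature values (not from the real dataset)
teams = {
    "Team C": {"b": 100, "d": 0.5},
    "Team A": {"b": 100, "d": 0.5},
    "Team B": {"b": 400, "d": 0.75},
}

def score(features):
    # Example formula written only in terms of the allowed variables b and d
    return features["b"] * features["d"]

# Sort by score (highest first), then by team name (alphabetical) to break ties
ranked = sorted(teams, key=lambda name: (-score(teams[name]), name))
print(ranked)
```

Team A and Team C have the same score here, so Team A is listed first.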

This Colab helps you get started by importing the data, explaining the features, normalizing the data, and providing a space to experiment with feature weights.

Each ranking will be compared with the official Week 6 AP Top 25 poll. The ranking with the lowest squared error against the official poll wins. If a tiebreaker is needed, we will include other teams receiving votes that were just outside the top 25.
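
The exact scoring formula is not spelled out here. One plausible reading, assumed for this sketch, is the sum of squared differences between each team's position in your ranking and its position in the official poll, taken over the teams in the official poll. The function name `squared_error` and the team names are hypothetical.

```python
def squared_error(predicted_order, official_top):
    """Sum of squared rank differences over the teams in the official poll.

    Assumption: this is one plausible reading of the scoring rule,
    not a confirmed formula.
    """
    # Map each team to its 1-based position in the submitted ranking
    predicted_rank = {team: pos for pos, team in enumerate(predicted_order, start=1)}
    total = 0
    for official_pos, team in enumerate(official_top, start=1):
        total += (predicted_rank[team] - official_pos) ** 2
    return total

# Made-up example with a 3-team "poll" and a 4-team submission
official = ["Team A", "Team B", "Team C"]
submission = ["Team B", "Team A", "Team D", "Team C"]
print(squared_error(submission, official))
```

Under this reading, a submission that matches the official order exactly scores 0, and every misplaced team adds the square of how far off it is.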


Features

(a) Enrollment - School's student enrollment starting in fall 2023

(b) AP Votes 2022 - Final AP Poll votes after the January 2023 championship game.

(c) New Head Coach - Boolean variable indicating whether the school has a new head coach: 1 means a new head coach, 0 means the same head coach as last year.

(d) Last Year Win Percentage - Each team's win percentage in the 2022/2023 season

(e) Transfer Portal - Transfer Portal Points, pulled from https://247sports.com/season/2023-football/transferteamrankings/

(f) Recruiting Portal

(g) Football Budget

(h) Returning Offensive Production

(i) Returning Defensive Production

(j) Offensive Yards Per Game

(k) Yards Allowed Per Game

# Imports and Data

In [None]:
import pandas as pd
import numpy as np
from math import log10, floor
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [None]:
from google.colab import drive
drive.mount('/drive')

Mounted at /drive


In [None]:
# EDIT this cell so you can get the preseason data from your drive
# you will need to copy and save the preseason data csv to your drive
# then replace the path in the following code with the file path to your csv
# for example, if you save the csv to a folder in your drive called SportsAnalytics,
# your path would be '/drive/My Drive/SportsAnalytics/preseason_data.csv'

data = pd.read_csv('/drive/My Drive/Work/SportsAnalytics/September Meeting/preseason_data.csv')
data.index = data.index + 1

In [None]:
# you can look through the data by running this cell and changing number of teams you see

num_teams = 10
num_teams = min(num_teams, data.shape[0])
data.head(num_teams)

Unnamed: 0,AA Team Name,Enrollment,AP Votes 2022,New Head Coach,Last Year Win Percentage,Transfer Portal,Recruiting Portal,Football Budget,Returning Offense Production,Returning Defense Production,Off Yards Per Game,Yards Per Game Allowed
1,Air Force,4181,40,0,0.769,0.0,130.58,67422052,0.47,0.7,397.2,254.4
2,Akron,14516,0,0,0.167,5.01,121.18,30290134,0.78,0.48,372.7,406.2
3,Alabama,38316,1303,0,0.846,34.64,328.0,195881911,0.43,0.38,477.1,318.2
4,Appalachian State,20641,0,0,0.5,3.93,161.99,38565701,0.36,0.3,455.3,347.2
5,Arizona,49471,0,0,0.417,36.16,204.72,124944926,0.72,0.42,461.9,467.7
6,Arizona State,77881,0,1,0.25,56.46,201.23,124008192,0.53,0.61,387.2,421.5
7,Arkansas,29068,0,0,0.538,62.13,228.3,144319041,0.58,0.51,471.4,465.2
8,Arkansas State,12863,0,0,0.25,0.0,168.05,32376515,0.49,0.62,314.8,405.1
9,Army,4594,0,0,0.5,0.0,110.02,62174875,0.64,0.64,366.1,359.0
10,Auburn,31526,0,1,0.417,73.98,242.9,151590763,0.72,0.71,378.5,395.2


# Normalizing Data

In [None]:
# in this cell, we will assign each feature to a letter
# YOUR FINAL SUBMISSION WILL NEED TO BE A FUNCTION IN TERMS OF THESE VARIABLES
a = data['Enrollment'].values
b = data['AP Votes 2022'].values
c = data['New Head Coach'].values
d = data['Last Year Win Percentage'].values
e = data['Transfer Portal'].values
f = data['Recruiting Portal'].values
g = data['Football Budget'].values
h = data['Returning Offense Production'].values
i = data['Returning Defense Production'].values
j = data['Off Yards Per Game'].values
k = data['Yards Per Game Allowed'].values

In [None]:
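# a good start would be to normalize all of your variables
# one way of doing that is to subtract the lowest value from each list and then
# divide the list by the range (max - min)
# this will make it so each value is between 0 and 1
# this cell only prints each feature's min and range (range rounded to
# 3 significant figures); the examples below plug these numbers into the formulas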
for index, current_list in enumerate([a, b, c, d, e, f, g, h, i, j, k]):
  letter = chr(ord('a') + index)
  current_min = np.min(current_list)
  current_range = abs(np.max(current_list) - current_min)
  if current_range == 0:
    current_range = 'Not Applicable'
  else:
    current_range = round(current_range, 2 - floor(log10(abs(current_range))))
  print(letter + '_min: ' + str(current_min) + ', ' + letter + '_range: ' + str(current_range))

a_min: 3832, a_range: 91300
b_min: 0, b_range: 1580
c_min: 0, c_range: 1
d_min: 0.083, d_range: 0.917
e_min: 0.0, e_range: 78.1
f_min: 13.85, f_range: 314.0
g_min: 20288342, g_range: 205000000
h_min: 0.19, h_range: 0.72
i_min: 0.26, i_range: 0.73
j_min: 228.1, j_range: 297.0
k_min: 254.4, k_range: 262.0
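
The min and range values printed above can also be applied directly. The following is a minimal sketch of min-max normalization as described in the comments; the helper name `min_max_normalize` and the sample numbers are hypothetical.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    values = np.asarray(values, dtype=float)
    value_range = values.max() - values.min()
    if value_range == 0:
        # No spread in the data, so there is nothing to scale
        return np.zeros_like(values)
    return (values - values.min()) / value_range

# Made-up enrollment figures
example = min_max_normalize([4000, 14000, 24000])
print(example)
```

This helper uses the exact min and max, while the examples in this notebook use the rounded ranges printed above, so the two can differ slightly.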


# Example 1

In [None]:
# using the min and range values, we will make a simple ranking by normalizing all stats, adding the good ones, and subtracting the bad ones
rankings = np.zeros(data.shape[0])

# enrollment
rankings = rankings + (a - 3832) / 91300.0
# ap votes
rankings = rankings + (b - 0) / 1580.0
# new head coach
rankings = rankings - (c - 0) / 1
# last year wins
rankings = rankings + (d - 0.083) / 0.917
# transfer portal
rankings = rankings + (e - 0) / 78.1
# recruiting portal
rankings = rankings + (f - 13.85) / 314.0
# football budget
rankings = rankings + (g - 20288342) / 205000000.0
# returning offensive production
rankings = rankings + (h - 0.19) / 0.72
# returning defensive production
rankings = rankings + (i - 0.26) / 0.73
# offensive yards
rankings = rankings + (j - 228.1) / 297.0
# yards allowed
rankings = rankings - (k - 254.4) / 262.0

In [None]:
# now we will add these rankings to the dataframe and sort the teams by rank
ranked_data = data.copy()
ranked_data['Ranking Score'] = rankings
ranked_data.sort_values(by=['Ranking Score', 'AA Team Name'], ascending=[False, True], inplace=True)
ranked_data.reset_index(drop=True, inplace=True)
ranked_data.index = ranked_data.index + 1

In [None]:
# you can look through the ranked data by running this cell and changing number of teams

num_teams = 25
num_teams = min(num_teams, ranked_data.shape[0])
ranked_data.head(num_teams)

Unnamed: 0,AA Team Name,Enrollment,AP Votes 2022,New Head Coach,Last Year Win Percentage,Transfer Portal,Recruiting Portal,Football Budget,Returning Offense Production,Returning Defense Production,Off Yards Per Game,Yards Per Game Allowed,Ranking Score
1,Michigan,50278,1438,0,0.929,56.29,245.13,193559375,0.84,0.78,458.8,292.1,6.891927
2,Ohio State,61677,1394,0,0.846,46.9,290.72,225733418,0.57,0.77,490.7,321.5,6.686821
3,Florida State,45130,814,0,0.769,72.33,237.15,150777734,0.8,0.94,484.2,321.8,6.373182
4,Georgia,40118,1575,0,1.0,33.38,315.68,169026503,0.52,0.7,501.1,296.8,6.3269
5,LSU,35912,757,0,0.714,76.63,288.7,192770399,0.81,0.6,453.1,354.6,5.918466
6,Washington,52439,1097,0,0.846,40.34,223.92,149458923,0.74,0.73,515.8,372.7,5.799268
7,Penn State,47450,1200,0,0.846,34.65,270.1,170542050,0.55,0.75,433.6,323.5,5.6614
8,Tennessee,31701,1294,0,0.846,43.73,277.0,157108637,0.57,0.69,525.5,405.3,5.563903
9,USC,49318,795,0,0.786,74.14,280.44,62174875,0.75,0.78,506.6,423.9,5.551502
10,Alabama,38316,1303,0,0.846,34.64,328.0,195881911,0.43,0.38,477.1,318.2,5.427599


# Example 2

In [None]:
# we will do the same thing as the last one, but give different weights to each feature,
# depending on what we think is most important
rankings = np.zeros(data.shape[0])

# enrollment
rankings = rankings + 1 * (a - 3832) / 91300
# ap votes
rankings = rankings + 5 * (b - 0) / 1580
# new head coach
rankings = rankings - 1.5 * (c - 0) / 1
# last year wins
rankings = rankings + 3 * (d - 0.083) / 0.917
# transfer portal
rankings = rankings + 2 * (e - 0) / 78.1
# recruiting portal
rankings = rankings + 2 * (f - 13.85) / 314
# football budget
rankings = rankings + 1.5 * (g - 20288342) / 205000000
# returning offensive production
rankings = rankings + 3 * (h - 0.19) / 0.72
# returning defensive production
rankings = rankings + 3 * (i - 0.26) / 0.73
# offensive yards
rankings = rankings + 3.5 * (j - 228.1) / 297
# yards allowed
rankings = rankings - 3.5 * (k - 254.4) / 262

In [None]:
# now we will add these rankings to the dataframe and sort the teams by rank
ranked_data = data.copy()
ranked_data['Ranking Score'] = rankings
ranked_data.sort_values(by=['Ranking Score', 'AA Team Name'], ascending=[False, True], inplace=True)
ranked_data.reset_index(drop=True, inplace=True)
ranked_data.index = ranked_data.index + 1

In [None]:
# you can look through the ranked data by running this cell and changing number of teams

num_teams = 25
num_teams = min(num_teams, ranked_data.shape[0])
ranked_data.head(num_teams)

Unnamed: 0,AA Team Name,Enrollment,AP Votes 2022,New Head Coach,Last Year Win Percentage,Transfer Portal,Recruiting Portal,Football Budget,Returning Offense Production,Returning Defense Production,Off Yards Per Game,Yards Per Game Allowed,Ranking Score
1,Michigan,50278,1438,0,0.929,56.29,245.13,193559375,0.84,0.78,458.8,292.1,19.069896
2,Georgia,40118,1575,0,1.0,33.38,315.68,169026503,0.52,0.7,501.1,296.8,18.081207
3,Ohio State,61677,1394,0,0.846,46.9,290.72,225733418,0.57,0.77,490.7,321.5,17.886393
4,Florida State,45130,814,0,0.769,72.33,237.15,150777734,0.8,0.94,484.2,321.8,16.95571
5,Washington,52439,1097,0,0.846,40.34,223.92,149458923,0.74,0.73,515.8,372.7,15.849535
6,Tennessee,31701,1294,0,0.846,43.73,277.0,157108637,0.57,0.69,525.5,405.3,15.532782
7,Penn State,47450,1200,0,0.846,34.65,270.1,170542050,0.55,0.75,433.6,323.5,15.402626
8,LSU,35912,757,0,0.714,76.63,288.7,192770399,0.81,0.6,453.1,354.6,15.079895
9,Alabama,38316,1303,0,0.846,34.64,328.0,195881911,0.43,0.38,477.1,318.2,14.745359
10,USC,49318,795,0,0.786,74.14,280.44,62174875,0.75,0.78,506.6,423.9,14.705015
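
Examples 1 and 2 differ only in the weight given to each normalized feature. A rough sketch of that pattern, assuming a dictionary of weights keyed by feature letter, is shown below. The function `weighted_score` and the sample values are hypothetical, and it normalizes with the exact min and max rather than the rounded constants used above.

```python
import numpy as np

def weighted_score(features, weights):
    """Sum of weight * min-max-normalized feature; negative weights penalize."""
    total = 0.0
    for name, weight in weights.items():
        values = np.asarray(features[name], dtype=float)
        value_range = values.max() - values.min()
        if value_range == 0:
            normalized = np.zeros_like(values)
        else:
            normalized = (values - values.min()) / value_range
        total = total + weight * normalized
    return total

# Made-up values for three teams: AP votes (b) and yards allowed (k)
features = {"b": [0, 500, 1000], "k": [300, 400, 350]}
weights = {"b": 5, "k": -3.5}  # negative weight: more yards allowed lowers the score
scores = weighted_score(features, weights)
print(scores)
```

Keeping the weights in one dictionary makes it easier to try different combinations without editing every line of the formula.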


# Example 3

In [None]:
# in this example, we only use some of the features, which is allowed
# here we multiply features together instead of adding them
rankings = np.zeros(data.shape[0])

# ap votes
rankings = rankings + ((b - 0) / 1580) ** 10
# last year wins
rankings = rankings * ((d - 0.083) / 0.917) ** 5
# transfer portal
rankings = rankings * ((e - 0) / 78.1) ** 3
# recruiting portal
rankings = rankings * ((f - 13.85) / 314) ** 3
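# returning offensive production combined with offensive yards per game
rankings = rankings * (((h - 0.19) / 0.72) * ((j - 228.1) / 297)) ** 2
# returning defensive production combined with yards allowed per game
# note: as written, this factor grows with yards allowed, so it rewards teams that allow
# more yards; use (1 - (k - 254.4) / 262) if fewer yards allowed should score higher
rankings = rankings * (((i - 0.26) / 0.73) * ((k - 254.4) / 262)) ** 2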

In [None]:
# now we will add these rankings to the dataframe and sort the teams by rank
ranked_data = data.copy()
ranked_data['Ranking Score'] = rankings
ranked_data.sort_values(by=['Ranking Score', 'AA Team Name'], ascending=[False, True], inplace=True)
ranked_data.reset_index(drop=True, inplace=True)
ranked_data.index = ranked_data.index + 1

In [None]:
# you can look through the ranked data by running this cell and changing number of teams

num_teams = 25
num_teams = min(num_teams, ranked_data.shape[0])
ranked_data.head(num_teams)

Unnamed: 0,AA Team Name,Enrollment,AP Votes 2022,New Head Coach,Last Year Win Percentage,Transfer Portal,Recruiting Portal,Football Budget,Returning Offense Production,Returning Defense Production,Off Yards Per Game,Yards Per Game Allowed,Ranking Score
1,Michigan,50278,1438,0,0.929,56.29,245.13,193559375,0.84,0.78,458.8,292.1,0.0002014532
2,Tennessee,31701,1294,0,0.846,43.73,277.0,157108637,0.57,0.69,525.5,405.3,0.0001798427
3,Ohio State,61677,1394,0,0.846,46.9,290.72,225733418,0.57,0.77,490.7,321.5,0.0001179649
4,Georgia,40118,1575,0,1.0,33.38,315.68,169026503,0.52,0.7,501.1,296.8,0.0001134496
5,TCU,11938,1484,0,0.867,53.68,235.07,62174875,0.33,0.71,455.0,408.2,8.007685e-05
6,Washington,52439,1097,0,0.846,40.34,223.92,149458923,0.74,0.73,515.8,372.7,1.982318e-05
7,USC,49318,795,0,0.786,74.14,280.44,62174875,0.75,0.78,506.6,423.9,1.629011e-05
8,Penn State,47450,1200,0,0.846,34.65,270.1,170542050,0.55,0.75,433.6,323.5,4.534481e-06
9,Florida State,45130,814,0,0.769,72.33,237.15,150777734,0.8,0.94,484.2,321.8,2.702196e-06
10,Oregon,22257,758,0,0.769,66.24,278.44,140565297,0.65,0.65,500.5,381.2,1.26796e-06
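
Because Example 3 multiplies its factors, any factor equal to zero forces the whole score to zero. Every team with 0 AP votes, for instance, ends up tied at 0 and is ordered alphabetically. The sketch below illustrates that effect with made-up, already-normalized values; it also inverts a lower-is-better feature with `1 - x`, which is one possible choice and not something the example above does.

```python
import numpy as np

# Made-up normalized features for three teams (already scaled to [0, 1])
ap_votes = np.array([0.9, 0.0, 0.5])
win_pct = np.array([0.8, 1.0, 0.6])
yards_allowed = np.array([0.2, 0.1, 0.9])  # lower is better

# Invert the lower-is-better feature so a larger factor means a better defense
defense = 1 - yards_allowed

# A single zero factor (team 2 has no AP votes) zeroes out that team's score
product = ap_votes * win_pct * defense
print(product)
```

If many teams collapsing to the same score is not what you want, adding a small constant to each factor or mixing sums with products are options to experiment with.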
