# QB Stat Predictor

## Intro

<p>
I have a large dataset from Pro Football Reference that holds game-by-game stats for almost every player. <br>
From it I have taken Aaron Rodgers' stats for every game. <br>
The aim is to see whether I can accurately predict his upcoming stats. <br>
The model will be trained on data from the beginning of his career. <br>
It will then be tested on this season's (2025) games to see how it performs.
</p>
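A minimal sketch of the chronological train/test split described above. The toy frame, its made-up values and the `SEASON_START` cutoff are assumptions for illustration only; the real split would run on the cleaned game log.

```python
import pandas as pd

# Toy stand-in for the game log (made-up values, not real stats).
games = pd.DataFrame({
    "date": pd.to_datetime([
        "2025-09-14", "2023-09-11", "2024-12-22", "2024-09-09", "2025-09-07",
    ]),
    "pass_yds": [100, 200, 300, 400, 500],
})

# Assumed cutoff: any date that falls between the 2024 and 2025 seasons.
SEASON_START = pd.Timestamp("2025-08-01")

# Sort first so both subsets stay in chronological order.
games = games.sort_values("date").reset_index(drop=True)

train = games[games["date"] < SEASON_START]   # everything before 2025
test = games[games["date"] >= SEASON_START]   # the 2025 games only

print(len(train), len(test))
```

Splitting on a date rather than at random keeps future games out of the training data, which matters when the goal is to predict upcoming stats.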

<p>
The dataset needs its column names defined, for my own sake.<br>
I'll only cover the technical ones; the rest are self-explanatory.
<br><br>
Yds         = passing yards gained
<br><br>
Y/A         = yards gained per pass attempt
<br><br>
AY/A        = adjusted yards per pass attempt. It gives a more comprehensive view of a quarterback's performance than plain yards per attempt by including touchdowns and interceptions: touchdowns are weighted positively and interceptions negatively, which yields a more robust single-number metric for evaluating a quarterback.
<br><br>
Rate        = passer rating
<br><br>
Sk          = sacks
<br><br>
Yds         = yards LOST to sacks (rename to sk_yds)
<br><br>
Att         = rushing attempts (rename to rush_att)
<br><br>
Yds         = total rushing yards (rename to rush_yds)
<br><br>
TD          = rushing TD (rename to rush_td)
<br><br>
Y/A         = rushing yards per attempt (rename to y/a_rush)
<br><br>

</p>
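To make the AY/A and Rate definitions concrete, here is a small sketch of both formulas. The function names are my own (hypothetical). The AY/A weights (+20 per touchdown, -45 per interception) and the four-component passer rating are the standard published formulas, and the check uses the second game in the sample output (8 of 15 for 65 yards, 0 TD, 1 INT).

```python
def adjusted_yards_per_attempt(yds, td, ints, att):
    """AY/A = (yards + 20*TD - 45*INT) / attempts."""
    if att == 0:
        return float("nan")  # undefined when no passes were thrown
    return (yds + 20 * td - 45 * ints) / att


def passer_rating(cmp, att, yds, td, ints):
    """NFL passer rating: four components, each clamped to [0, 2.375]."""
    if att == 0:
        return float("nan")

    def clamp(value):
        return max(0.0, min(value, 2.375))

    a = clamp((cmp / att - 0.3) * 5)       # completion percentage
    b = clamp((yds / att - 3) * 0.25)      # yards per attempt
    c = clamp(td / att * 20)               # touchdown rate
    d = clamp(2.375 - ints / att * 25)     # interception rate
    return (a + b + c + d) / 6 * 100


print(round(adjusted_yards_per_attempt(65, 0, 1, 15), 2))  # 1.33
print(round(passer_rating(8, 15, 65, 0, 1), 1))            # 36.8
```

Both results match the AY/A and Rate values listed for that game, which is a useful sanity check on how the columns are defined.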

## Import Libraries

In [None]:
#import necessary libraries

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Import Datasets

In [44]:
#define paths to the datasets

#regular season
path_reg = r'C:\Users\ronan\Downloads\Datasets\NFLdata\Aaron_rodgers.xlsx'
df_reg = pd.read_excel(path_reg, header=1)    #header=1: take column names from the second row

#playoffs
path_post = r'C:\Users\ronan\Downloads\Datasets\NFLdata\aaron_rodgers_poff.xlsx'
df_post = pd.read_excel(path_post, header=1)

In [45]:
df_reg.head()

Unnamed: 0,Rk,Gcar,Gtm,Week,Date,Team,Unnamed: 6,Opp,Result,GS,Cmp,Att,Cmp%,Yds,TD,Int,Y/A,AY/A,Rate,Sk,Yds.1,Att.1,Yds.2,TD.1,Y/A.1,Tgt,Rec,Yds.3,Y/R,TD.2,Ctch%,Y/Tgt,Sk.1,Comb,Solo,Ast,TFL,QBHits,Sfty,Fmb,FL,FF,FR,Yds.4,FRTD,OffSnp,Off%,DefSnp,Def%,STSnp,ST%
0,1.0,1.0,5.0,5,2005-10-09,GNB,,NOR,"W, 52-3",,1.0,1.0,100.0,0.0,0.0,0.0,0.0,0.0,79.2,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
1,2.0,2.0,14.0,15,2005-12-19,GNB,@,BAL,"L, 3-48",,8.0,15.0,53.3,65.0,0.0,1.0,4.3,1.33,36.8,3.0,28.0,1.0,8.0,0.0,8.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,2.0,2.0,0.0,0.0,0.0,0.0,,,,,,
2,3.0,3.0,16.0,17,2006-01-01,GNB,,SEA,"W, 23-17",,0.0,0.0,,0.0,0.0,0.0,,,,0.0,,1.0,-1.0,0.0,-1.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
3,4.0,4.0,4.0,4,2006-10-02,GNB,@,PHI,"L, 9-31",,2.0,3.0,66.7,14.0,0.0,0.0,4.7,4.67,77.1,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
4,5.0,5.0,10.0,11,2006-11-19,GNB,,NWE,"L, 0-35",,4.0,12.0,33.3,32.0,0.0,0.0,2.7,2.67,42.4,3.0,18.0,2.0,11.0,0.0,5.5,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,1.0,0.0,0.0,0.0,0.0,,,,,,


## Prep the dataframes

In [46]:
#add a column saying whether it's a regular or post season game,
#placed directly after the 'Date' column

#regular season
df_reg.insert(df_reg.columns.get_loc('Date') + 1, 'phase', 'reg')

#post season
df_post.insert(df_post.columns.get_loc('Date') + 1, 'phase', 'post')

In [47]:
# join the regular and post season dataframes together
# sort by date so it's in chronological order

df = pd.concat([df_reg, df_post], ignore_index=True)
df = df.sort_values('Date').reset_index(drop=True)

In [48]:
#rename the 'Unnamed: 6' column to 'Home/Away'

df = df.rename(columns={'Unnamed: 6' : 'Home/Away'})


# convert the values in Home/Away to 'home', 'away' or 'neutral'
# '@' marks an away game and a missing value marks a home game
# the Super Bowl is down as 'N' for neutral, so account for that too

df['Home/Away'] = df['Home/Away'].apply(
    lambda x: 'away' if x == '@' 
              else 'home' if pd.isna(x) 
              else 'neutral' if x == 'N' 
              else x
)
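As an alternative to the chained lambda, the same mapping can be sketched with `Series.replace` and `fillna`. The sample series below is made up, and `raw` and `location` are hypothetical names; any value not listed in the mapping passes through unchanged, just as in the `else x` branch.

```python
import pandas as pd

# Made-up sample of the raw column: '@' = away, missing = home, 'N' = neutral.
raw = pd.Series(["@", None, "N", "@"], dtype="object")

# Map the known markers first, then treat whatever is still missing as a home game.
location = raw.replace({"@": "away", "N": "neutral"}).fillna("home")

print(location.tolist())  # ['away', 'home', 'neutral', 'away']
```

Either version also labels any row with a missing marker as `home`, so rows that are not real games would need filtering separately.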



<p>
Many columns share the same name because they sit under different sub-headings, <br>
such as passing, rushing and receiving. <br>
I'm going to drop the columns I don't think I need and then rename the rest so they are easier to understand.

</p>
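The `.1`, `.2` suffixes come from pandas itself: when a header row repeats a name, the reader appends a counter to each repeat. A small sketch, assuming the spreadsheet's first row holds group headings (the `Passing`, `Sacks` and `Rushing` labels here are illustrative) and using `read_csv` on an in-memory string in place of the Excel file:

```python
import io

import pandas as pd

# Two header rows: group headings first, then stat names that repeat.
# The data row reuses the second game's figures from the sample output.
csv_text = (
    "Passing,Passing,Sacks,Sacks,Rushing,Rushing\n"
    "Att,Yds,Sk,Yds,Att,Yds\n"
    "15,65,3,28,1,8\n"
)

# header=1 skips the group row and takes column names from the second row.
sample = pd.read_csv(io.StringIO(csv_text), header=1)

print(list(sample.columns))
# ['Att', 'Yds', 'Sk', 'Yds.1', 'Att.1', 'Yds.2']
```

This is why `Yds.1` holds sack yards and `Yds.2` holds rushing yards in the real dataframe: the suffix only records the order in which the repeated name appeared.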

In [49]:
df.head()

Unnamed: 0,Rk,Gcar,Gtm,Week,Date,phase,Team,Home/Away,Opp,Result,GS,Cmp,Att,Cmp%,Yds,TD,Int,Y/A,AY/A,Rate,Sk,Yds.1,Att.1,Yds.2,TD.1,Y/A.1,Tgt,Rec,Yds.3,Y/R,TD.2,Ctch%,Y/Tgt,Sk.1,Comb,Solo,Ast,TFL,QBHits,Sfty,Fmb,FL,FF,FR,Yds.4,FRTD,OffSnp,Off%,DefSnp,Def%,STSnp,ST%
0,1.0,1.0,5.0,5,2005-10-09,reg,GNB,home,NOR,"W, 52-3",,1.0,1.0,100.0,0.0,0.0,0.0,0.0,0.0,79.2,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
1,2.0,2.0,14.0,15,2005-12-19,reg,GNB,away,BAL,"L, 3-48",,8.0,15.0,53.3,65.0,0.0,1.0,4.3,1.33,36.8,3.0,28.0,1.0,8.0,0.0,8.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,2.0,2.0,0.0,0.0,0.0,0.0,,,,,,
2,3.0,3.0,16.0,17,2006-01-01,reg,GNB,home,SEA,"W, 23-17",,0.0,0.0,,0.0,0.0,0.0,,,,0.0,,1.0,-1.0,0.0,-1.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
3,4.0,4.0,4.0,4,2006-10-02,reg,GNB,away,PHI,"L, 9-31",,2.0,3.0,66.7,14.0,0.0,0.0,4.7,4.67,77.1,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
4,5.0,5.0,10.0,11,2006-11-19,reg,GNB,home,NWE,"L, 0-35",,4.0,12.0,33.3,32.0,0.0,0.0,2.7,2.67,42.4,3.0,18.0,2.0,11.0,0.0,5.5,0.0,0.0,0.0,,0.0,,,0.0,0.0,0.0,0.0,0.0,,0.0,1.0,1.0,0.0,0.0,0.0,0.0,,,,,,


In [50]:
cols_to_drop = [
    'Gcar',         #career game number - not needed
    'Rk',           #rank - not needed
    'Gtm',          #season game number for team - we have other trackers
    'GS',           #games started - empty column
    'Tgt',          #receiving stat
    'Rec',          #receiving stat
    'Yds.3',        #receiving stat
    'Y/R',          #receiving stat
    'TD.2',         #receiving stat
    'Ctch%',        #receiving stat
    'Y/Tgt',        #receiving stat
    'Sk.1',         #defensive stat
    'Comb',         #defensive stat
    'Solo',         #defensive stat
    'Ast',          #defensive stat
    'TFL',          #defensive stat
    'QBHits',       #defensive stat
    'Sfty',         #team stat
    'OffSnp',       #n/a
    'Off%',         #n/a
    'DefSnp',       #n/a
    'Def%',         #n/a
    'STSnp',        #n/a
    'ST%'           #n/a
]

df = df.drop(columns=cols_to_drop)

In [51]:
#make column titles lower case
df.columns = df.columns.str.lower()

In [52]:
# now rename the columns
df = df.rename(columns={
    'cmp' : 'cmp_pass',
    'att' : 'pass_att',
    'cmp%' : 'cmp%_pass',
    'y/a' : 'y/a_pass',
    'ay/a' : 'ay/a_pass',
    'yds' : 'pass_yds',
    'td' : 'pass_td',
    'yds.1' : 'sk_yds',
    'att.1' : 'rush_att',
    'yds.2' : 'rush_yds',
    'td.1' : 'rush_td',
    'y/a.1' : 'y/a_rush',
    'yds.4' : 'yds_fr'    
})

In [54]:
df.head()

Unnamed: 0,week,date,phase,team,home/away,opp,result,cmp_pass,pass_att,cmp%_pass,pass_yds,pass_td,int,y/a_pass,ay/a_pass,rate,sk,sk_yds,rush_att,rush_yds,rush_td,y/a_rush,fmb,fl,ff,fr,yds_fr,frtd
0,5,2005-10-09,reg,GNB,home,NOR,"W, 52-3",1.0,1.0,100.0,0.0,0.0,0.0,0.0,0.0,79.2,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0
1,15,2005-12-19,reg,GNB,away,BAL,"L, 3-48",8.0,15.0,53.3,65.0,0.0,1.0,4.3,1.33,36.8,3.0,28.0,1.0,8.0,0.0,8.0,2.0,2.0,0.0,0.0,0.0,0.0
2,17,2006-01-01,reg,GNB,home,SEA,"W, 23-17",0.0,0.0,,0.0,0.0,0.0,,,,0.0,,1.0,-1.0,0.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,2006-10-02,reg,GNB,away,PHI,"L, 9-31",2.0,3.0,66.7,14.0,0.0,0.0,4.7,4.67,77.1,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,0.0
4,11,2006-11-19,reg,GNB,home,NWE,"L, 0-35",4.0,12.0,33.3,32.0,0.0,0.0,2.7,2.67,42.4,3.0,18.0,2.0,11.0,0.0,5.5,1.0,1.0,0.0,0.0,0.0,0.0


In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 276 entries, 0 to 275
Data columns (total 28 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   week       274 non-null    object        
 1   date       272 non-null    datetime64[ns]
 2   phase      276 non-null    object        
 3   team       272 non-null    object        
 4   home/away  276 non-null    object        
 5   opp        272 non-null    object        
 6   result     274 non-null    object        
 7   cmp_pass   274 non-null    float64       
 8   pass_att   274 non-null    float64       
 9   cmp%_pass  272 non-null    float64       
 10  pass_yds   274 non-null    float64       
 11  pass_td    274 non-null    float64       
 12  int        274 non-null    float64       
 13  y/a_pass   272 non-null    float64       
 14  ay/a_pass  272 non-null    float64       
 15  rate       272 non-null    float64       
 16  sk         274 non-null    float64       
 1

In [43]:
# run before 'Gcar' was added to cols_to_drop, so the column still existed here
print(df['gcar'])

0        1.0
1        2.0
2        3.0
3        4.0
4        5.0
5        6.0
6        7.0
7        1.0
8        8.0
9        9.0
10      10.0
11      11.0
12      12.0
13      13.0
14      14.0
15      15.0
16      16.0
17      17.0
18      18.0
19      19.0
20      20.0
21      21.0
22      22.0
23      23.0
24      24.0
25      25.0
26      26.0
27      27.0
28      28.0
29      29.0
30      30.0
31      31.0
32      32.0
33      33.0
34      34.0
35      35.0
36      36.0
37      37.0
38      38.0
39      39.0
40       2.0
41      40.0
42      41.0
43      42.0
44      43.0
45      44.0
46      45.0
47      46.0
48      47.0
49      48.0
50      49.0
51      50.0
52      51.0
53      52.0
54      53.0
55      54.0
56       3.0
57       4.0
58       5.0
59       6.0
60      55.0
61      56.0
62      57.0
63      58.0
64      59.0
65      60.0
66      61.0
67      62.0
68      63.0
69      64.0
70      65.0
71      66.0
72      67.0
73      68.0
74      69.0
75       7.0
76      70.0