# NFL Sports Betting Capstone: Automated In-Season Data Scraper
_By Justin Tunley_

My first notebook for this project pulled historical team data from ESPN for the 2004-2022 seasons. This notebook pulls the same statistics from ESPN, but only for the current, in-progress season.

IMPORTANT FOR ACCURATE RESULTS: as the season continues and new games are played, the data on ESPN changes. This notebook will pull and format the new information properly, but only if you re-run it from top to bottom: re-scrape from ESPN, re-run all of the preprocessing and cleaning steps, and re-import the new CSV. Remember: if you do not re-run the whole notebook, you will have outdated data.

If this notebook looks familiar, it is because I duplicated the historical data notebook and changed the scraping step to pull the current season instead. Everything else is essentially the same.

In [43]:
%pip install beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [44]:
%pip install html5lib

Note: you may need to restart the kernel to use updated packages.


In [45]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sportsreference as csr
import html5lib

import requests
from bs4 import BeautifulSoup

## Scraping from ESPN
#### This will be used for the bulk of our information

In [46]:
jets23 = pd.read_html('https://www.espn.com/nfl/team/stats/_/type/team/name/nyj')
# scraping data for THIS SEASON for the New York Jets

In [47]:
jets23[1].index = list(str(x[0]) for x in jets23[0].values)
jets23[1]
# 4 Goals: 
#    1. Split up offensive and defensive stats
#    2. Get each into a single row
#    3. Renaming columns
#    4. Dropping header rows

Unnamed: 0,NY Jets,Opponents
Total Points Per Game,14.8,21.6
Total Points,163,238
Total Touchdowns,13,22
1st Downs,,
Total 1st downs,159,213
Rushing 1st downs,44,80
Passing 1st downs,98,109
1st downs by penalty,17,24
3rd down efficiency,35-145,70-165
3rd down %,24.14,42.42


**Link to in-season data:**<br>

Jets 2023: https://www.espn.com/nfl/team/stats/_/type/team/name/nyj
    
Template: https://www.espn.com/nfl/team/stats/_/type/team/name/ ... **abbrev**
                                                            

**Link to historical data:**<br>

Jets 2022: https://www.espn.com/nfl/team/stats/_/type/team/name/nyj/season/2022/seasontype/2
    
Template: https://www.espn.com/nfl/team/stats/_/type/team/name/ **team** /season/  **year**  /seasontype/2
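
The two templates above differ only in the optional season suffix, so both can be produced by one small function. This is a minimal sketch: `BASE_URL` and `build_espn_url` are hypothetical names introduced here for illustration and are not used elsewhere in the notebook.

```python
# Hypothetical helper: builds either the in-season or the historical ESPN URL.
BASE_URL = 'https://www.espn.com/nfl/team/stats/_/type/team/name/'

def build_espn_url(team, season=None):
    """Return the team stats URL; leave `season` out for the current season."""
    if season is None:
        return BASE_URL + team
    # '/seasontype/2' is the suffix used in the historical link above
    return '{}{}/season/{}/seasontype/2'.format(BASE_URL, team, season)

print(build_espn_url('nyj'))
print(build_espn_url('nyj', 2022))
```

The two printed URLs should match the Jets 2023 and Jets 2022 links listed above.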


In [48]:
big_string='''
Cardinals - ari <br>
Falcons - atl <br>
Ravens - bal<br>
Bills - buf<br>
Panthers - car<br>
Bears - chi<br>
Bengals - cin<br>
Browns - cle<br>
Cowboys - dal<br>
Broncos - den<br>
Lions - det<br>
Packers - gb<br>
Texans - hou<br>
Colts - ind<br>
Jaguars - jax<br>
Chiefs - kc<br>
Raiders - lv<br>
Chargers - lac<br>
Rams - lar<br>
Dolphins - mia<br>
Vikings - min<br>
Patriots - ne<br>
Saints - no<br>
Giants - nyg<br>
Jets - nyj<br>
Eagles - phi<br>
Steelers - pit<br>
49ers - sf<br>
Seahawks - sea<br>
Buccaneers - tb<br>
Titans - ten<br>
Commanders - wsh
'''

In [49]:
abbrev = big_string.replace('<br>', '').split()[2::3]
abbrev

['ari',
 'atl',
 'bal',
 'buf',
 'car',
 'chi',
 'cin',
 'cle',
 'dal',
 'den',
 'det',
 'gb',
 'hou',
 'ind',
 'jax',
 'kc',
 'lv',
 'lac',
 'lar',
 'mia',
 'min',
 'ne',
 'no',
 'nyg',
 'nyj',
 'phi',
 'pit',
 'sf',
 'sea',
 'tb',
 'ten',
 'wsh']
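
The `split()[2::3]` slice works because every line has exactly three whitespace-separated tokens (name, dash, abbreviation). If a team name ever contained a space, the slice would drift. A possible alternative, sketched below, splits each line on the ` - ` separator and also keeps the team names. `parse_team_codes` is a hypothetical helper, and the sample string is a shortened copy of `big_string`.

```python
def parse_team_codes(text):
    """Map each team nickname to its ESPN abbreviation."""
    codes = {}
    for line in text.replace('<br>', '').splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank first and last lines
        name, code = line.split(' - ')
        codes[name.strip()] = code.strip()
    return codes

# Shortened sample in the same format as big_string
sample = '''
Cardinals - ari <br>
Falcons - atl <br>
49ers - sf<br>
Commanders - wsh
'''
team_codes = parse_team_codes(sample)
print(list(team_codes.values()))
```

Calling `list(parse_team_codes(big_string).values())` should give the same list as `abbrev`, in the same order.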

In [50]:
# def get_season_df(team,season):
#         template = 'https://www.espn.com/nfl/team/stats/_/type/team/name/nyj/season/2022/seasontype/2'
#         url = template.format(team,season)
#         df_list = pd.read_html(url)
#         df_list[1].index=list(str(x[0]) for x in df_list[0].values)
#         new_columns = ['Offense--{}-{}'.format(team,season),'Opposing-offense-{}-{}'.format(team,season)]
#         df_list[1].columns = new_columns
#         return df_list[1]

# Repeats Arizona 2004 in every single row: break up string




# https://www.espn.com/nfl/team/stats/_/type/team/name/nyj

In [51]:
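def get_season_df(team, season):
    # With no year in the URL, ESPN serves the current season for the team
    template1 = 'https://www.espn.com/nfl/team/stats/_/type/team/name/'
    url = template1 + team
    print(url)
    df_list = pd.read_html(url)
    # Possible improvement: pull the current year from datetime instead of passing it in

    # Table 0 holds the stat names; table 1 holds the team and opponent values
    df_list[1].index = list(str(x[0]) for x in df_list[0].values)
    # Use the season argument in the labels instead of a hard-coded 2023
    new_columns = ['Offense--{}-{}'.format(team, season), 'Opposing-offense-{}-{}'.format(team, season)]
    df_list[1].columns = new_columns
    return df_list[1]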

In [52]:
year = 2023

In [53]:
all_df = []
for team in abbrev:
        new_df = get_season_df(team, year)
        all_df.append(new_df)

https://www.espn.com/nfl/team/stats/_/type/team/name/ari
https://www.espn.com/nfl/team/stats/_/type/team/name/atl
https://www.espn.com/nfl/team/stats/_/type/team/name/bal
https://www.espn.com/nfl/team/stats/_/type/team/name/buf
https://www.espn.com/nfl/team/stats/_/type/team/name/car
https://www.espn.com/nfl/team/stats/_/type/team/name/chi
https://www.espn.com/nfl/team/stats/_/type/team/name/cin
https://www.espn.com/nfl/team/stats/_/type/team/name/cle
https://www.espn.com/nfl/team/stats/_/type/team/name/dal
https://www.espn.com/nfl/team/stats/_/type/team/name/den
https://www.espn.com/nfl/team/stats/_/type/team/name/det
https://www.espn.com/nfl/team/stats/_/type/team/name/gb
https://www.espn.com/nfl/team/stats/_/type/team/name/hou
https://www.espn.com/nfl/team/stats/_/type/team/name/ind
https://www.espn.com/nfl/team/stats/_/type/team/name/jax
https://www.espn.com/nfl/team/stats/_/type/team/name/kc
https://www.espn.com/nfl/team/stats/_/type/team/name/lv
https://www.espn.com/nfl/team/stat
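
If a single request fails partway through, the loop above stops and `all_df` is left incomplete. One way to make that visible is to record which teams failed and pause briefly between requests. The sketch below is an assumption-laden illustration, not the notebook's method: `scrape_all` and `fake_fetch` are hypothetical names, and the stand-in fetcher avoids any network access.

```python
import time

def scrape_all(teams, fetch, pause_seconds=1.0):
    """Call fetch(team) for every team; return (results, failed_teams)."""
    results, failed = [], []
    for team in teams:
        try:
            results.append(fetch(team))
        except Exception as err:  # e.g. a network error or a missing table
            print('Failed for {}: {}'.format(team, err))
            failed.append(team)
        time.sleep(pause_seconds)  # brief pause between requests
    return results, failed

# Stand-in fetcher for illustration only; in the notebook this could be
# something like: lambda team: get_season_df(team, year)
def fake_fetch(team):
    if team == 'xyz':
        raise ValueError('no tables found')
    return team.upper()

results, failed = scrape_all(['ari', 'xyz', 'atl'], fake_fetch, pause_seconds=0)
print(results, failed)
```

An empty `failed` list is a quick signal that every team was scraped before concatenating.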

In [54]:
current_season_df = pd.concat(all_df, axis=1).T
current_season_df.head()

Unnamed: 0,Total Points Per Game,Total Points,Total Touchdowns,1st Downs,Total 1st downs,Rushing 1st downs,Passing 1st downs,1st downs by penalty,3rd down efficiency,3rd down %,...,FG: Good-Attempts,Touchback Percentage,Penalties,Total-Yards,Avg. Per Game (YDS),Time of Possession,Possession Time Seconds,Miscellaneous,Fumbles-Lost,Turnover Ratio
Offense--ari-2023,17.2,206,22,,218,76,118,24,54-153,35.29,...,19-22,84,,79-720,60.0,,27:46,,12-6,-1
Opposing-offense-ari-2023,26.8,321,38,,270,101,146,23,71-151,47.02,...,19-23,82,,73-599,49.917,,32:13,,7-5,161
Offense--atl-2023,19.4,213,21,,224,90,119,15,62-148,41.89,...,22-23,85,,59-528,48.0,,30:17,,11-9,-6
Opposing-offense-atl-2023,21.1,232,22,,204,61,120,23,50-142,35.21,...,26-27,81,,73-600,54.545,,29:42,,12-6,140
Offense--bal-2023,27.0,324,37,,251,108,116,27,66-152,43.42,...,21-26,84,,73-684,57.0,,31:49,,19-9,5


## Data Preprocessing and Cleaning

In [55]:
current_season_df.shape
# MAKE SURE YOU ARE COMPLETELY DONE SCRAPING BEFORE CONCATENATING
# YOU WILL NOT GET AN ERROR MESSAGE, YOU WILL JUST BE MISSING ROWS. MAKE SURE SHAPE IS 64 X 50 (32 TEAMS X 2 ROWS EACH)

(64, 50)
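
Beyond checking the shape by eye, the row labels can be compared against the labels expected for every team (two rows per team). The following is a minimal sketch; `missing_rows` is a hypothetical helper, and the label format is taken from the `Offense--...` and `Opposing-offense-...` names built in `get_season_df`.

```python
def missing_rows(row_labels, teams, season):
    """Return the expected row labels that are absent from row_labels."""
    expected = []
    for team in teams:
        expected.append('Offense--{}-{}'.format(team, season))
        expected.append('Opposing-offense-{}-{}'.format(team, season))
    present = set(row_labels)
    return [label for label in expected if label not in present]

# Toy example: the Atlanta opponent row is missing
labels = ['Offense--ari-2023', 'Opposing-offense-ari-2023', 'Offense--atl-2023']
print(missing_rows(labels, ['ari', 'atl'], 2023))
```

In the notebook, `missing_rows(current_season_df.index, abbrev, year)` should return an empty list when the scrape is complete.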

In [56]:
pd.set_option('display.max_columns',None)
# so that we can view all columns

In [57]:
current_season_df.tail()
# checking the last rows; every row here belongs to the current season

Unnamed: 0,Total Points Per Game,Total Points,Total Touchdowns,1st Downs,Total 1st downs,Rushing 1st downs,Passing 1st downs,1st downs by penalty,3rd down efficiency,3rd down %,4th down efficiency,4th down %,Passing,Comp-Att,Net Passing Yards,Yards Per Pass Attempt,Net Passing Yards Per Game,Passing Touchdowns,Interceptions,Sacks-Yards Lost,Rushing,Rushing Attempts,Rushing Yards,Yards Per Rush Attempt,Rushing Yards Per Game,Rushing Touchdowns,Offense,Total Offensive Plays,Total Yards,Yards Per Game,Returns,Kickoffs: Total,Average Kickoff Return Yards,Punt: Total,Average Punt Return Yards,INT: Total,Average Interception Yards,Kicking,Net Average Punt Yards,Punt: Total Yards,FG: Good-Attempts,Touchback Percentage,Penalties,Total-Yards,Avg. Per Game (YDS),Time of Possession,Possession Time Seconds,Miscellaneous,Fumbles-Lost,Turnover Ratio
Opposing-offense-tb-2023,20.6,227,23,,227,66,146,15,65-146,44.52,7-14,50.0,,270-405,2946,7.8,267.8,16,9,31-206,,268,1055,3.9,95.9,7,,704,4207,382.5,,15-277,18.5,23-280,9.2,7-31,4.4,,44.3,"38-1,831",22-25,70,,70-588,53.455,,30:47,,13-8,157
Offense--ten-2023,16.8,185,17,,184,59,95,30,42-132,31.82,7-12,58.33,,194-317,1994,7.1,181.3,9,8,36-252,,265,1130,4.3,102.7,8,,618,3376,306.9,,13-264,20.3,27-214,11.1,3-38,12.7,,48.4,"49-2,594",22-23,53,,69-558,50.0,,29:04,,8-4,-4
Opposing-offense-ten-2023,20.4,224,20,,213,68,120,25,60-151,39.74,10-15,66.67,,237-349,2470,7.7,224.5,12,3,29-210,,323,1219,3.8,110.8,8,,701,3899,354.5,,19-415,21.8,20-221,7.9,8-67,8.4,,41.6,"46-2,128",28-28,67,,66-526,47.818,,31:22,,11-5,175
Offense--wsh-2023,20.5,246,27,,254,75,154,25,60-157,38.22,8-17,47.06,,323-486,2964,6.9,247.0,18,13,55-375,,257,1161,4.5,96.8,9,,798,4500,375.0,,17-455,26.8,28-207,8.5,6-26,4.3,,42.6,"52-2,444",17-21,96,,64-495,41.0,,31:41,,16-9,-9
Opposing-offense-wsh-2023,29.2,350,39,,236,71,149,16,64-162,39.51,9-15,60.0,,269-423,3175,7.9,264.6,28,6,35-155,,305,1357,4.4,113.1,7,,763,4687,390.6,,0-0,0.0,27-230,7.4,13-198,15.2,,44.3,"48-2,334",26-27,76,,73-586,48.833,,28:48,,12-7,132


In [58]:
current_season_df.columns

Index(['Total Points Per Game', 'Total Points', 'Total Touchdowns',
       '1st Downs', 'Total 1st downs', 'Rushing 1st downs',
       'Passing 1st downs', '1st downs by penalty', '3rd down efficiency',
       '3rd down %', '4th down efficiency', '4th down %', 'Passing',
       'Comp-Att', 'Net Passing Yards', 'Yards Per Pass Attempt',
       'Net Passing Yards Per Game', 'Passing Touchdowns', 'Interceptions',
       'Sacks-Yards Lost', 'Rushing', 'Rushing Attempts', 'Rushing Yards',
       'Yards Per Rush Attempt', 'Rushing Yards Per Game',
       'Rushing Touchdowns', 'Offense', 'Total Offensive Plays', 'Total Yards',
       'Yards Per Game', 'Returns', 'Kickoffs: Total',
       'Average Kickoff Return Yards', 'Punt: Total',
       'Average Punt Return Yards', 'INT: Total', 'Average Interception Yards',
       'Kicking', 'Net Average Punt Yards', 'Punt: Total Yards',
       'FG: Good-Attempts', 'Touchback Percentage', 'Penalties', 'Total-Yards',
       'Avg. Per Game (YDS)', 'Time of

In [59]:
#setting index as a column instead for easy manipulation 
current_season_df = current_season_df.reset_index()
current_season_df.rename(columns={'index':'SeasonID'}, inplace=True) #renaming new column 

In [60]:
current_season_df

Unnamed: 0,SeasonID,Total Points Per Game,Total Points,Total Touchdowns,1st Downs,Total 1st downs,Rushing 1st downs,Passing 1st downs,1st downs by penalty,3rd down efficiency,3rd down %,4th down efficiency,4th down %,Passing,Comp-Att,Net Passing Yards,Yards Per Pass Attempt,Net Passing Yards Per Game,Passing Touchdowns,Interceptions,Sacks-Yards Lost,Rushing,Rushing Attempts,Rushing Yards,Yards Per Rush Attempt,Rushing Yards Per Game,Rushing Touchdowns,Offense,Total Offensive Plays,Total Yards,Yards Per Game,Returns,Kickoffs: Total,Average Kickoff Return Yards,Punt: Total,Average Punt Return Yards,INT: Total,Average Interception Yards,Kicking,Net Average Punt Yards,Punt: Total Yards,FG: Good-Attempts,Touchback Percentage,Penalties,Total-Yards,Avg. Per Game (YDS),Time of Possession,Possession Time Seconds,Miscellaneous,Fumbles-Lost,Turnover Ratio
0,Offense--ari-2023,17.2,206,22,,218,76,118,24,54-153,35.29,8-25,32.00,,245-394,2109,6.0,175.8,10,9,33-241,,305,1461,4.8,121.8,11,,732,3811,317.6,,10-180,18.0,21-207,10.2,9-98,10.9,,42.0,"48-2,343",19-22,84,,79-720,60,,27:46,,12-6,-1
1,Opposing-offense-ari-2023,26.8,321,38,,270,101,146,23,71-151,47.02,6-13,46.15,,261-372,2621,7.7,218.4,21,9,32-233,,377,1681,4.5,140.1,16,,781,4535,377.9,,3-63,21.0,32-325,9.9,9-41,4.6,,43.2,"42-2,020",19-23,82,,73-599,49.917,,32:13,,7-5,161
2,Offense--atl-2023,19.4,213,21,,224,90,119,15,62-148,41.89,6-14,42.86,,216-343,2217,7.1,201.5,10,9,30-211,,352,1532,4.4,139.3,10,,725,3960,360.0,,11-193,17.5,16-88,12.3,6-124,20.7,,42.4,"46-2,196",22-23,85,,59-528,48,,30:17,,11-9,-6
3,Opposing-offense-atl-2023,21.1,232,22,,204,61,120,23,50-142,35.21,5-10,50.00,,227-361,2300,6.8,209.1,17,6,22-149,,304,1232,4.1,112.0,4,,687,3681,334.6,,7-153,21.9,20-245,5.5,9-139,15.4,,45.9,"48-2,292",26-27,81,,73-600,54.545,,29:42,,12-6,140
4,Offense--bal-2023,27.0,324,37,,251,108,116,27,66-152,43.42,4-10,40.00,,233-342,2490,7.8,207.5,14,5,29-166,,390,1903,4.9,158.6,22,,761,4559,379.9,,8-153,19.1,22-276,13.8,11-171,15.5,,40.8,"47-2,260",21-26,84,,73-684,57,,31:49,,19-9,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,Opposing-offense-tb-2023,20.6,227,23,,227,66,146,15,65-146,44.52,7-14,50.00,,270-405,2946,7.8,267.8,16,9,31-206,,268,1055,3.9,95.9,7,,704,4207,382.5,,15-277,18.5,23-280,9.2,7-31,4.4,,44.3,"38-1,831",22-25,70,,70-588,53.455,,30:47,,13-8,157
60,Offense--ten-2023,16.8,185,17,,184,59,95,30,42-132,31.82,7-12,58.33,,194-317,1994,7.1,181.3,9,8,36-252,,265,1130,4.3,102.7,8,,618,3376,306.9,,13-264,20.3,27-214,11.1,3-38,12.7,,48.4,"49-2,594",22-23,53,,69-558,50,,29:04,,8-4,-4
61,Opposing-offense-ten-2023,20.4,224,20,,213,68,120,25,60-151,39.74,10-15,66.67,,237-349,2470,7.7,224.5,12,3,29-210,,323,1219,3.8,110.8,8,,701,3899,354.5,,19-415,21.8,20-221,7.9,8-67,8.4,,41.6,"46-2,128",28-28,67,,66-526,47.818,,31:22,,11-5,175
62,Offense--wsh-2023,20.5,246,27,,254,75,154,25,60-157,38.22,8-17,47.06,,323-486,2964,6.9,247.0,18,13,55-375,,257,1161,4.5,96.8,9,,798,4500,375.0,,17-455,26.8,28-207,8.5,6-26,4.3,,42.6,"52-2,444",17-21,96,,64-495,41,,31:41,,16-9,-9


In [61]:
current_season_df['Games'] = 17
# Set every current-season row to 17 games (the full regular-season length, not the number of games played so far)
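
Partway through a season, teams have played fewer than 17 games. In the rows shown above, Arizona's 206 total points at 17.2 points per game works out to about 12 games. If a later step needs games actually played, one possible approach is to derive it from those two columns. This is a sketch on a toy frame; the `GamesPlayed` column name is an assumption, and it presumes every team has scored at least once so the per-game figure is not zero.

```python
import pandas as pd

# Toy frame using values from the rows shown above
toy = pd.DataFrame({
    'SeasonID': ['Offense--ari-2023', 'Offense--atl-2023'],
    'Total Points': ['206', '213'],
    'Total Points Per Game': ['17.2', '19.4'],
})

# Scraped values may arrive as strings, so convert before dividing
points = pd.to_numeric(toy['Total Points'])
per_game = pd.to_numeric(toy['Total Points Per Game'])

# ESPN rounds the per-game figure, so round the ratio to the nearest whole game
toy['GamesPlayed'] = (points / per_game).round().astype(int)
print(toy[['SeasonID', 'GamesPlayed']])
```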

In [62]:
# Currently, each team has 2 rows per season, one displaying offense and one displaying opposing offense (defense)
# We will be combining these into one row per team.

In [63]:
# New dataframe for just the offense
offDF = current_season_df.loc[current_season_df['SeasonID'].str.contains('Offense--'),:]

In [64]:
# New dataframe for just the defense
defDF = current_season_df.loc[current_season_df['SeasonID'].str.contains('Opposing-'),:]

In [65]:
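offDF.reset_index(inplace=True)
offDF.drop(columns='index')
# drop() returns a new dataframe and is not assigned here, so this only displays offDF without 'index';
# offDF itself keeps the column (it shows up later as OFF_index)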

Unnamed: 0,SeasonID,Total Points Per Game,Total Points,Total Touchdowns,1st Downs,Total 1st downs,Rushing 1st downs,Passing 1st downs,1st downs by penalty,3rd down efficiency,3rd down %,4th down efficiency,4th down %,Passing,Comp-Att,Net Passing Yards,Yards Per Pass Attempt,Net Passing Yards Per Game,Passing Touchdowns,Interceptions,Sacks-Yards Lost,Rushing,Rushing Attempts,Rushing Yards,Yards Per Rush Attempt,Rushing Yards Per Game,Rushing Touchdowns,Offense,Total Offensive Plays,Total Yards,Yards Per Game,Returns,Kickoffs: Total,Average Kickoff Return Yards,Punt: Total,Average Punt Return Yards,INT: Total,Average Interception Yards,Kicking,Net Average Punt Yards,Punt: Total Yards,FG: Good-Attempts,Touchback Percentage,Penalties,Total-Yards,Avg. Per Game (YDS),Time of Possession,Possession Time Seconds,Miscellaneous,Fumbles-Lost,Turnover Ratio,Games
0,Offense--ari-2023,17.2,206,22,,218,76,118,24,54-153,35.29,8-25,32.0,,245-394,2109,6.0,175.8,10,9,33-241,,305,1461,4.8,121.8,11,,732,3811,317.6,,10-180,18.0,21-207,10.2,9-98,10.9,,42.0,"48-2,343",19-22,84,,79-720,60,,27:46,,12-6,-1,17
1,Offense--atl-2023,19.4,213,21,,224,90,119,15,62-148,41.89,6-14,42.86,,216-343,2217,7.1,201.5,10,9,30-211,,352,1532,4.4,139.3,10,,725,3960,360.0,,11-193,17.5,16-88,12.3,6-124,20.7,,42.4,"46-2,196",22-23,85,,59-528,48,,30:17,,11-9,-6,17
2,Offense--bal-2023,27.0,324,37,,251,108,116,27,66-152,43.42,4-10,40.0,,233-342,2490,7.8,207.5,14,5,29-166,,390,1903,4.9,158.6,22,,761,4559,379.9,,8-153,19.1,22-276,13.8,11-171,15.5,,40.8,"47-2,260",21-26,84,,73-684,57,,31:49,,19-9,5,17
3,Offense--buf-2023,27.3,328,39,,271,104,149,18,77-155,49.68,7-12,58.33,,295-433,3131,7.4,260.9,24,13,15-83,,332,1468,4.4,122.3,14,,780,4682,390.2,,11-239,21.7,21-186,12.4,11-84,7.6,,39.4,"34-1,514",18-23,57,,84-696,58,,31:20,,12-7,1,17
4,Offense--car-2023,15.7,173,17,,205,67,111,27,60-161,37.27,15-26,57.69,,251-409,1906,5.5,173.3,11,8,43-339,,267,1019,3.8,92.6,3,,719,3264,296.7,,15-447,29.8,25-231,6.7,5-202,40.4,,44.7,"52-2,456",18-21,59,,74-590,53,,30:54,,13-6,-7,17
5,Offense--chi-2023,20.2,242,25,,233,98,125,10,71-164,43.29,10-19,52.63,,237-365,2227,6.7,185.6,15,12,35-219,,376,1652,4.4,137.7,9,,776,4098,341.5,,14-299,21.4,11-79,14.7,13-117,9.0,,38.3,"42-1,919",23-25,80,,80-699,58,,31:43,,18-9,-4,17
6,Offense--cin-2023,19.3,212,23,,202,52,129,21,48-139,34.53,5-11,45.46,,271-406,2375,6.4,215.9,17,7,31-229,,220,834,3.8,75.8,4,,657,3438,312.5,,13-302,23.2,21-272,6.9,12-110,9.2,,41.9,"54-2,401",17-21,78,,55-463,42,,29:33,,7-2,10,17
7,Offense--cle-2023,21.7,239,22,,216,88,103,25,52-166,31.33,9-18,50.0,,217-391,1993,5.7,181.2,9,13,33-225,,367,1534,4.2,139.5,11,,791,3752,341.1,,7-162,23.1,30-247,10.7,9-78,8.7,,45.4,"55-2,773",28-31,64,,71-572,52,,33:18,,20-10,-7,17
8,Offense--dal-2023,32.3,388,44,,280,82,166,32,80-165,48.49,8-17,47.06,,303-431,3161,7.8,263.4,26,7,27-187,,341,1404,4.1,117.0,11,,799,4752,396.0,,13-290,22.3,16-92,8.1,13-249,19.2,,46.8,"31-1,580",26-26,94,,90-764,63,,32:03,,9-3,8,17
9,Offense--den-2023,22.4,246,25,,194,69,99,26,53-137,38.69,4-9,44.44,,218-319,2035,6.9,185.0,20,4,33-164,,286,1271,4.4,115.5,3,,638,3470,315.5,,7-235,33.6,10-188,6.5,10-81,8.1,,42.3,"44-1,999",24-26,91,,76-566,51,,28:55,,14-10,8,17
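
If the leftover `index` column is not wanted, one option is to discard the old index while resetting it and to add the prefix in the same chain. The sketch below runs on a toy frame and is an alternative, not what the cells in this notebook do. `off_toy` is a hypothetical name.

```python
import pandas as pd

toy = pd.DataFrame({
    'SeasonID': ['Offense--ari-2023', 'Opposing-offense-ari-2023',
                 'Offense--atl-2023', 'Opposing-offense-atl-2023'],
    'Total Points': [206, 321, 213, 232],
})

# .copy() avoids modifying a slice of the original frame;
# drop=True discards the old index instead of adding an 'index' column
off_toy = (toy.loc[toy['SeasonID'].str.contains('Offense--')]
              .copy()
              .reset_index(drop=True)
              .add_prefix('OFF_'))
print(off_toy)
```

With this approach the frame would have one fewer column than the shapes reported later in this notebook, since no `OFF_index` column is created.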


In [66]:
offNames = list('OFF_{}'.format(col) for col in offDF.columns)
# renaming all column headers in this dataframe to begin with OFF_. This is important for distinguishing between columns
# of the same name when we re-concatenate.

In [67]:
offDF.columns = offNames

In [68]:
offDF.head()

Unnamed: 0,OFF_index,OFF_SeasonID,OFF_Total Points Per Game,OFF_Total Points,OFF_Total Touchdowns,OFF_1st Downs,OFF_Total 1st downs,OFF_Rushing 1st downs,OFF_Passing 1st downs,OFF_1st downs by penalty,OFF_3rd down efficiency,OFF_3rd down %,OFF_4th down efficiency,OFF_4th down %,OFF_Passing,OFF_Comp-Att,OFF_Net Passing Yards,OFF_Yards Per Pass Attempt,OFF_Net Passing Yards Per Game,OFF_Passing Touchdowns,OFF_Interceptions,OFF_Sacks-Yards Lost,OFF_Rushing,OFF_Rushing Attempts,OFF_Rushing Yards,OFF_Yards Per Rush Attempt,OFF_Rushing Yards Per Game,OFF_Rushing Touchdowns,OFF_Offense,OFF_Total Offensive Plays,OFF_Total Yards,OFF_Yards Per Game,OFF_Returns,OFF_Kickoffs: Total,OFF_Average Kickoff Return Yards,OFF_Punt: Total,OFF_Average Punt Return Yards,OFF_INT: Total,OFF_Average Interception Yards,OFF_Kicking,OFF_Net Average Punt Yards,OFF_Punt: Total Yards,OFF_FG: Good-Attempts,OFF_Touchback Percentage,OFF_Penalties,OFF_Total-Yards,OFF_Avg. Per Game (YDS),OFF_Time of Possession,OFF_Possession Time Seconds,OFF_Miscellaneous,OFF_Fumbles-Lost,OFF_Turnover Ratio,OFF_Games
0,0,Offense--ari-2023,17.2,206,22,,218,76,118,24,54-153,35.29,8-25,32.0,,245-394,2109,6.0,175.8,10,9,33-241,,305,1461,4.8,121.8,11,,732,3811,317.6,,10-180,18.0,21-207,10.2,9-98,10.9,,42.0,"48-2,343",19-22,84,,79-720,60,,27:46,,12-6,-1,17
1,2,Offense--atl-2023,19.4,213,21,,224,90,119,15,62-148,41.89,6-14,42.86,,216-343,2217,7.1,201.5,10,9,30-211,,352,1532,4.4,139.3,10,,725,3960,360.0,,11-193,17.5,16-88,12.3,6-124,20.7,,42.4,"46-2,196",22-23,85,,59-528,48,,30:17,,11-9,-6,17
2,4,Offense--bal-2023,27.0,324,37,,251,108,116,27,66-152,43.42,4-10,40.0,,233-342,2490,7.8,207.5,14,5,29-166,,390,1903,4.9,158.6,22,,761,4559,379.9,,8-153,19.1,22-276,13.8,11-171,15.5,,40.8,"47-2,260",21-26,84,,73-684,57,,31:49,,19-9,5,17
3,6,Offense--buf-2023,27.3,328,39,,271,104,149,18,77-155,49.68,7-12,58.33,,295-433,3131,7.4,260.9,24,13,15-83,,332,1468,4.4,122.3,14,,780,4682,390.2,,11-239,21.7,21-186,12.4,11-84,7.6,,39.4,"34-1,514",18-23,57,,84-696,58,,31:20,,12-7,1,17
4,8,Offense--car-2023,15.7,173,17,,205,67,111,27,60-161,37.27,15-26,57.69,,251-409,1906,5.5,173.3,11,8,43-339,,267,1019,3.8,92.6,3,,719,3264,296.7,,15-447,29.8,25-231,6.7,5-202,40.4,,44.7,"52-2,456",18-21,59,,74-590,53,,30:54,,13-6,-7,17


In [69]:
defDF.reset_index(inplace=True)
defDF.drop(columns='index')
# do it again for the defense (as with the offense, the drop is not assigned, so defDF keeps 'index')

Unnamed: 0,SeasonID,Total Points Per Game,Total Points,Total Touchdowns,1st Downs,Total 1st downs,Rushing 1st downs,Passing 1st downs,1st downs by penalty,3rd down efficiency,3rd down %,4th down efficiency,4th down %,Passing,Comp-Att,Net Passing Yards,Yards Per Pass Attempt,Net Passing Yards Per Game,Passing Touchdowns,Interceptions,Sacks-Yards Lost,Rushing,Rushing Attempts,Rushing Yards,Yards Per Rush Attempt,Rushing Yards Per Game,Rushing Touchdowns,Offense,Total Offensive Plays,Total Yards,Yards Per Game,Returns,Kickoffs: Total,Average Kickoff Return Yards,Punt: Total,Average Punt Return Yards,INT: Total,Average Interception Yards,Kicking,Net Average Punt Yards,Punt: Total Yards,FG: Good-Attempts,Touchback Percentage,Penalties,Total-Yards,Avg. Per Game (YDS),Time of Possession,Possession Time Seconds,Miscellaneous,Fumbles-Lost,Turnover Ratio,Games
0,Opposing-offense-ari-2023,26.8,321,38,,270,101,146,23,71-151,47.02,6-13,46.15,,261-372,2621,7.7,218.4,21,9,32-233,,377,1681,4.5,140.1,16,,781,4535,377.9,,3-63,21.0,32-325,9.9,9-41,4.6,,43.2,"42-2,020",19-23,82,,73-599,49.917,,32:13,,7-5,161,17
1,Opposing-offense-atl-2023,21.1,232,22,,204,61,120,23,50-142,35.21,5-10,50.0,,227-361,2300,6.8,209.1,17,6,22-149,,304,1232,4.1,112.0,4,,687,3681,334.6,,7-153,21.9,20-245,5.5,9-139,15.4,,45.9,"48-2,292",26-27,81,,73-600,54.545,,29:42,,12-6,140,17
2,Opposing-offense-bal-2023,15.6,187,16,,209,66,117,26,64-177,36.16,8-22,36.36,,270-446,2060,5.5,171.7,10,11,47-374,,286,1227,4.3,102.3,4,,779,3661,305.1,,10-220,22.0,22-304,12.5,5-54,10.8,,41.6,"58-2,687",25-26,67,,76-596,49.667,,28:54,,14-8,133,17
3,Opposing-offense-buf-2023,18.9,227,25,,228,65,136,27,60-156,38.46,9-16,56.25,,261-389,2439,6.9,203.3,16,11,41-235,,300,1400,4.7,116.7,8,,730,4074,339.5,,23-470,20.4,14-173,8.9,13-71,5.5,,45.2,"49-2,401",19-20,72,,75-582,48.5,,29:20,,19-10,161,17
4,Opposing-offense-car-2023,26.5,292,35,,215,84,112,19,48-137,35.04,6-8,75.0,,211-322,1976,6.5,179.6,13,5,18-126,,326,1374,4.2,124.9,18,,666,3476,316.0,,13-322,24.8,20-134,9.2,8-165,20.6,,44.5,"51-2,498",16-16,75,,67-491,44.636,,29:05,,5-2,151,17
5,Opposing-offense-chi-2023,24.7,296,34,,225,58,147,20,70-152,46.05,8-15,53.33,,297-434,2874,6.9,239.5,23,13,17-105,,275,948,3.4,79.0,6,,726,3927,327.3,,9-226,25.1,21-309,7.2,12-131,10.9,,44.3,"40-1,849",19-25,74,,51-395,32.917,,28:16,,13-4,145,17
6,Opposing-offense-cin-2023,22.0,242,25,,241,88,133,20,61-138,44.2,2-8,25.0,,230-361,2746,8.1,249.6,13,12,28-169,,307,1536,5.0,139.6,12,,696,4451,404.6,,7-120,17.1,20-138,13.0,7-57,8.1,,40.7,"42-1,982",22-23,75,,60-552,50.182,,30:26,,12-7,135,17
7,Opposing-offense-cle-2023,19.0,209,25,,142,55,71,16,38-140,27.14,4-7,57.14,,159-286,1562,6.3,142.0,10,9,34-237,,281,1165,4.1,105.9,12,,601,2964,269.5,,15-272,18.1,26-277,8.2,13-216,16.6,,44.4,"73-3,487",11-16,83,,77-659,59.909,,26:41,,13-7,140,17
8,Opposing-offense-dal-2023,18.3,220,27,,211,68,112,31,58-157,36.94,13-27,48.15,,218-366,2173,6.6,181.1,17,13,38-237,,313,1272,4.1,106.0,10,,717,3682,306.8,,3-93,31.0,16-130,5.8,7-99,14.1,,44.3,"48-2,263",11-15,72,,78-717,59.75,,27:56,,13-5,139,17
9,Opposing-offense-den-2023,25.5,280,31,,238,84,129,25,46-127,36.22,12-24,50.0,,250-365,2563,7.5,233.0,19,10,23-174,,316,1707,5.4,155.2,11,,704,4444,404.0,,3-53,17.7,21-136,18.8,4-20,5.0,,42.8,"26-1,302",20-23,85,,81-747,67.909,,31:04,,25-12,128,17


In [70]:
defNames = list('DEF_{}'.format(col) for col in defDF.columns)

In [71]:
defDF.columns = defNames

In [72]:
defDF.head()

Unnamed: 0,DEF_index,DEF_SeasonID,DEF_Total Points Per Game,DEF_Total Points,DEF_Total Touchdowns,DEF_1st Downs,DEF_Total 1st downs,DEF_Rushing 1st downs,DEF_Passing 1st downs,DEF_1st downs by penalty,DEF_3rd down efficiency,DEF_3rd down %,DEF_4th down efficiency,DEF_4th down %,DEF_Passing,DEF_Comp-Att,DEF_Net Passing Yards,DEF_Yards Per Pass Attempt,DEF_Net Passing Yards Per Game,DEF_Passing Touchdowns,DEF_Interceptions,DEF_Sacks-Yards Lost,DEF_Rushing,DEF_Rushing Attempts,DEF_Rushing Yards,DEF_Yards Per Rush Attempt,DEF_Rushing Yards Per Game,DEF_Rushing Touchdowns,DEF_Offense,DEF_Total Offensive Plays,DEF_Total Yards,DEF_Yards Per Game,DEF_Returns,DEF_Kickoffs: Total,DEF_Average Kickoff Return Yards,DEF_Punt: Total,DEF_Average Punt Return Yards,DEF_INT: Total,DEF_Average Interception Yards,DEF_Kicking,DEF_Net Average Punt Yards,DEF_Punt: Total Yards,DEF_FG: Good-Attempts,DEF_Touchback Percentage,DEF_Penalties,DEF_Total-Yards,DEF_Avg. Per Game (YDS),DEF_Time of Possession,DEF_Possession Time Seconds,DEF_Miscellaneous,DEF_Fumbles-Lost,DEF_Turnover Ratio,DEF_Games
0,1,Opposing-offense-ari-2023,26.8,321,38,,270,101,146,23,71-151,47.02,6-13,46.15,,261-372,2621,7.7,218.4,21,9,32-233,,377,1681,4.5,140.1,16,,781,4535,377.9,,3-63,21.0,32-325,9.9,9-41,4.6,,43.2,"42-2,020",19-23,82,,73-599,49.917,,32:13,,7-5,161,17
1,3,Opposing-offense-atl-2023,21.1,232,22,,204,61,120,23,50-142,35.21,5-10,50.0,,227-361,2300,6.8,209.1,17,6,22-149,,304,1232,4.1,112.0,4,,687,3681,334.6,,7-153,21.9,20-245,5.5,9-139,15.4,,45.9,"48-2,292",26-27,81,,73-600,54.545,,29:42,,12-6,140,17
2,5,Opposing-offense-bal-2023,15.6,187,16,,209,66,117,26,64-177,36.16,8-22,36.36,,270-446,2060,5.5,171.7,10,11,47-374,,286,1227,4.3,102.3,4,,779,3661,305.1,,10-220,22.0,22-304,12.5,5-54,10.8,,41.6,"58-2,687",25-26,67,,76-596,49.667,,28:54,,14-8,133,17
3,7,Opposing-offense-buf-2023,18.9,227,25,,228,65,136,27,60-156,38.46,9-16,56.25,,261-389,2439,6.9,203.3,16,11,41-235,,300,1400,4.7,116.7,8,,730,4074,339.5,,23-470,20.4,14-173,8.9,13-71,5.5,,45.2,"49-2,401",19-20,72,,75-582,48.5,,29:20,,19-10,161,17
4,9,Opposing-offense-car-2023,26.5,292,35,,215,84,112,19,48-137,35.04,6-8,75.0,,211-322,1976,6.5,179.6,13,5,18-126,,326,1374,4.2,124.9,18,,666,3476,316.0,,13-322,24.8,20-134,9.2,8-165,20.6,,44.5,"51-2,498",16-16,75,,67-491,44.636,,29:05,,5-2,151,17


In [73]:
defDF.shape

(32, 53)

In [74]:
offDF.shape

(32, 53)

In [75]:
final_df = pd.merge(offDF, defDF, left_index=True, right_index=True)
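
Merging on the positional index assumes that row `i` of `offDF` and row `i` of `defDF` describe the same team. A quick check is to pull the team code out of each `SeasonID` and compare the two. This is a sketch on toy frames with the same label format; `off_toy`, `def_toy`, and `rows_aligned` are hypothetical names.

```python
import pandas as pd

off_toy = pd.DataFrame({'OFF_SeasonID': ['Offense--ari-2023', 'Offense--atl-2023']})
def_toy = pd.DataFrame({'DEF_SeasonID': ['Opposing-offense-ari-2023',
                                         'Opposing-offense-atl-2023']})

# The team code is the second-to-last piece when splitting on '-'
off_team = off_toy['OFF_SeasonID'].str.split('-').str[-2]
def_team = def_toy['DEF_SeasonID'].str.split('-').str[-2]

merged = pd.merge(off_toy, def_toy, left_index=True, right_index=True)
rows_aligned = bool((off_team == def_team).all())
print(rows_aligned)
```

If `rows_aligned` were `False`, merging on an explicit team column instead of the index would be the safer choice.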

In [76]:
final_df.head()

Unnamed: 0,OFF_index,OFF_SeasonID,OFF_Total Points Per Game,OFF_Total Points,OFF_Total Touchdowns,OFF_1st Downs,OFF_Total 1st downs,OFF_Rushing 1st downs,OFF_Passing 1st downs,OFF_1st downs by penalty,OFF_3rd down efficiency,OFF_3rd down %,OFF_4th down efficiency,OFF_4th down %,OFF_Passing,OFF_Comp-Att,OFF_Net Passing Yards,OFF_Yards Per Pass Attempt,OFF_Net Passing Yards Per Game,OFF_Passing Touchdowns,OFF_Interceptions,OFF_Sacks-Yards Lost,OFF_Rushing,OFF_Rushing Attempts,OFF_Rushing Yards,OFF_Yards Per Rush Attempt,OFF_Rushing Yards Per Game,OFF_Rushing Touchdowns,OFF_Offense,OFF_Total Offensive Plays,OFF_Total Yards,OFF_Yards Per Game,OFF_Returns,OFF_Kickoffs: Total,OFF_Average Kickoff Return Yards,OFF_Punt: Total,OFF_Average Punt Return Yards,OFF_INT: Total,OFF_Average Interception Yards,OFF_Kicking,OFF_Net Average Punt Yards,OFF_Punt: Total Yards,OFF_FG: Good-Attempts,OFF_Touchback Percentage,OFF_Penalties,OFF_Total-Yards,OFF_Avg. Per Game (YDS),OFF_Time of Possession,OFF_Possession Time Seconds,OFF_Miscellaneous,OFF_Fumbles-Lost,OFF_Turnover Ratio,OFF_Games,DEF_index,DEF_SeasonID,DEF_Total Points Per Game,DEF_Total Points,DEF_Total Touchdowns,DEF_1st Downs,DEF_Total 1st downs,DEF_Rushing 1st downs,DEF_Passing 1st downs,DEF_1st downs by penalty,DEF_3rd down efficiency,DEF_3rd down %,DEF_4th down efficiency,DEF_4th down %,DEF_Passing,DEF_Comp-Att,DEF_Net Passing Yards,DEF_Yards Per Pass Attempt,DEF_Net Passing Yards Per Game,DEF_Passing Touchdowns,DEF_Interceptions,DEF_Sacks-Yards Lost,DEF_Rushing,DEF_Rushing Attempts,DEF_Rushing Yards,DEF_Yards Per Rush Attempt,DEF_Rushing Yards Per Game,DEF_Rushing Touchdowns,DEF_Offense,DEF_Total Offensive Plays,DEF_Total Yards,DEF_Yards Per Game,DEF_Returns,DEF_Kickoffs: Total,DEF_Average Kickoff Return Yards,DEF_Punt: Total,DEF_Average Punt Return Yards,DEF_INT: Total,DEF_Average Interception Yards,DEF_Kicking,DEF_Net Average Punt Yards,DEF_Punt: Total Yards,DEF_FG: Good-Attempts,DEF_Touchback Percentage,DEF_Penalties,DEF_Total-Yards,DEF_Avg. Per Game (YDS),DEF_Time of Possession,DEF_Possession Time Seconds,DEF_Miscellaneous,DEF_Fumbles-Lost,DEF_Turnover Ratio,DEF_Games
0,0,Offense--ari-2023,17.2,206,22,,218,76,118,24,54-153,35.29,8-25,32.0,,245-394,2109,6.0,175.8,10,9,33-241,,305,1461,4.8,121.8,11,,732,3811,317.6,,10-180,18.0,21-207,10.2,9-98,10.9,,42.0,"48-2,343",19-22,84,,79-720,60,,27:46,,12-6,-1,17,1,Opposing-offense-ari-2023,26.8,321,38,,270,101,146,23,71-151,47.02,6-13,46.15,,261-372,2621,7.7,218.4,21,9,32-233,,377,1681,4.5,140.1,16,,781,4535,377.9,,3-63,21.0,32-325,9.9,9-41,4.6,,43.2,"42-2,020",19-23,82,,73-599,49.917,,32:13,,7-5,161,17
1,2,Offense--atl-2023,19.4,213,21,,224,90,119,15,62-148,41.89,6-14,42.86,,216-343,2217,7.1,201.5,10,9,30-211,,352,1532,4.4,139.3,10,,725,3960,360.0,,11-193,17.5,16-88,12.3,6-124,20.7,,42.4,"46-2,196",22-23,85,,59-528,48,,30:17,,11-9,-6,17,3,Opposing-offense-atl-2023,21.1,232,22,,204,61,120,23,50-142,35.21,5-10,50.0,,227-361,2300,6.8,209.1,17,6,22-149,,304,1232,4.1,112.0,4,,687,3681,334.6,,7-153,21.9,20-245,5.5,9-139,15.4,,45.9,"48-2,292",26-27,81,,73-600,54.545,,29:42,,12-6,140,17
2,4,Offense--bal-2023,27.0,324,37,,251,108,116,27,66-152,43.42,4-10,40.0,,233-342,2490,7.8,207.5,14,5,29-166,,390,1903,4.9,158.6,22,,761,4559,379.9,,8-153,19.1,22-276,13.8,11-171,15.5,,40.8,"47-2,260",21-26,84,,73-684,57,,31:49,,19-9,5,17,5,Opposing-offense-bal-2023,15.6,187,16,,209,66,117,26,64-177,36.16,8-22,36.36,,270-446,2060,5.5,171.7,10,11,47-374,,286,1227,4.3,102.3,4,,779,3661,305.1,,10-220,22.0,22-304,12.5,5-54,10.8,,41.6,"58-2,687",25-26,67,,76-596,49.667,,28:54,,14-8,133,17
3,6,Offense--buf-2023,27.3,328,39,,271,104,149,18,77-155,49.68,7-12,58.33,,295-433,3131,7.4,260.9,24,13,15-83,,332,1468,4.4,122.3,14,,780,4682,390.2,,11-239,21.7,21-186,12.4,11-84,7.6,,39.4,"34-1,514",18-23,57,,84-696,58,,31:20,,12-7,1,17,7,Opposing-offense-buf-2023,18.9,227,25,,228,65,136,27,60-156,38.46,9-16,56.25,,261-389,2439,6.9,203.3,16,11,41-235,,300,1400,4.7,116.7,8,,730,4074,339.5,,23-470,20.4,14-173,8.9,13-71,5.5,,45.2,"49-2,401",19-20,72,,75-582,48.5,,29:20,,19-10,161,17
4,8,Offense--car-2023,15.7,173,17,,205,67,111,27,60-161,37.27,15-26,57.69,,251-409,1906,5.5,173.3,11,8,43-339,,267,1019,3.8,92.6,3,,719,3264,296.7,,15-447,29.8,25-231,6.7,5-202,40.4,,44.7,"52-2,456",18-21,59,,74-590,53,,30:54,,13-6,-7,17,9,Opposing-offense-car-2023,26.5,292,35,,215,84,112,19,48-137,35.04,6-8,75.0,,211-322,1976,6.5,179.6,13,5,18-126,,326,1374,4.2,124.9,18,,666,3476,316.0,,13-322,24.8,20-134,9.2,8-165,20.6,,44.5,"51-2,498",16-16,75,,67-491,44.636,,29:05,,5-2,151,17


In [77]:
final_df.shape

(32, 106)

In [78]:
final_df.duplicated().sum()
# thank god

0

In [79]:
final_df.columns

Index(['OFF_index', 'OFF_SeasonID', 'OFF_Total Points Per Game',
       'OFF_Total Points', 'OFF_Total Touchdowns', 'OFF_1st Downs',
       'OFF_Total 1st downs', 'OFF_Rushing 1st downs', 'OFF_Passing 1st downs',
       'OFF_1st downs by penalty',
       ...
       'DEF_Touchback Percentage', 'DEF_Penalties', 'DEF_Total-Yards',
       'DEF_Avg. Per Game (YDS)', 'DEF_Time of Possession',
       'DEF_Possession Time Seconds', 'DEF_Miscellaneous', 'DEF_Fumbles-Lost',
       'DEF_Turnover Ratio', 'DEF_Games'],
      dtype='object', length=106)

In [80]:
pd.set_option('display.max_rows',None)
final_df.isna().sum()
# this is also good. 1st downs, passing, rushing, offense, returns and all other columns with nulls are headers
# on the original dataset. They describe the columns that follow, and are therefore unnecessary. They can be dropped.

OFF_index                            0
OFF_SeasonID                         0
OFF_Total Points Per Game            0
OFF_Total Points                     0
OFF_Total Touchdowns                 0
OFF_1st Downs                       32
OFF_Total 1st downs                  0
OFF_Rushing 1st downs                0
OFF_Passing 1st downs                0
OFF_1st downs by penalty             0
OFF_3rd down efficiency              0
OFF_3rd down %                       0
OFF_4th down efficiency              0
OFF_4th down %                       0
OFF_Passing                         32
OFF_Comp-Att                         0
OFF_Net Passing Yards                0
OFF_Yards Per Pass Attempt           0
OFF_Net Passing Yards Per Game       0
OFF_Passing Touchdowns               0
OFF_Interceptions                    0
OFF_Sacks-Yards Lost                 0
OFF_Rushing                         32
OFF_Rushing Attempts                 0
OFF_Rushing Yards                    0
OFF_Yards Per Rush Attemp

In [81]:
# We'll start by dropping the header columns (columns used to classify the columns that follow them on ESPN). 
final_df = final_df.drop(columns=['OFF_1st Downs', 'OFF_Passing', 'OFF_Offense', 'OFF_Returns', 'OFF_Kicking', 'OFF_Penalties', 'OFF_Time of Possession', 'OFF_Miscellaneous'])
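
Since the header columns are exactly the ones that are null in every row, they can also be found programmatically instead of being listed by hand, which avoids missing one (as happened with the Rushing headers further down). A minimal sketch on a toy frame; `demo`, `header_cols` and `cleaned` are illustrative names, not part of the notebook:

```python
import numpy as np
import pandas as pd

# Toy stand-in for final_df: 'OFF_Passing' plays the role of an ESPN section header.
demo = pd.DataFrame({
    "OFF_Total Points": [206, 213],
    "OFF_Passing": [np.nan, np.nan],
    "OFF_Comp-Att": ["245-394", "216-343"],
})

# A column is treated as a header when every one of its values is missing.
header_cols = demo.columns[demo.isna().all()].tolist()

# Drop all detected header columns in one step.
cleaned = demo.drop(columns=header_cols)
print(header_cols)
print(cleaned.columns.tolist())
```

This assumes no real stat column is ever entirely null; if that could happen early in a season, checking `header_cols` by eye before dropping would be safer.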

In [82]:
final_df.head()
# check to make sure it worked

Unnamed: 0,OFF_index,OFF_SeasonID,OFF_Total Points Per Game,OFF_Total Points,OFF_Total Touchdowns,OFF_Total 1st downs,OFF_Rushing 1st downs,OFF_Passing 1st downs,OFF_1st downs by penalty,OFF_3rd down efficiency,OFF_3rd down %,OFF_4th down efficiency,OFF_4th down %,OFF_Comp-Att,OFF_Net Passing Yards,OFF_Yards Per Pass Attempt,OFF_Net Passing Yards Per Game,OFF_Passing Touchdowns,OFF_Interceptions,OFF_Sacks-Yards Lost,OFF_Rushing,OFF_Rushing Attempts,OFF_Rushing Yards,OFF_Yards Per Rush Attempt,OFF_Rushing Yards Per Game,OFF_Rushing Touchdowns,OFF_Total Offensive Plays,OFF_Total Yards,OFF_Yards Per Game,OFF_Kickoffs: Total,OFF_Average Kickoff Return Yards,OFF_Punt: Total,OFF_Average Punt Return Yards,OFF_INT: Total,OFF_Average Interception Yards,OFF_Net Average Punt Yards,OFF_Punt: Total Yards,OFF_FG: Good-Attempts,OFF_Touchback Percentage,OFF_Total-Yards,OFF_Avg. Per Game (YDS),OFF_Possession Time Seconds,OFF_Fumbles-Lost,OFF_Turnover Ratio,OFF_Games,DEF_index,DEF_SeasonID,DEF_Total Points Per Game,DEF_Total Points,DEF_Total Touchdowns,DEF_1st Downs,DEF_Total 1st downs,DEF_Rushing 1st downs,DEF_Passing 1st downs,DEF_1st downs by penalty,DEF_3rd down efficiency,DEF_3rd down %,DEF_4th down efficiency,DEF_4th down %,DEF_Passing,DEF_Comp-Att,DEF_Net Passing Yards,DEF_Yards Per Pass Attempt,DEF_Net Passing Yards Per Game,DEF_Passing Touchdowns,DEF_Interceptions,DEF_Sacks-Yards Lost,DEF_Rushing,DEF_Rushing Attempts,DEF_Rushing Yards,DEF_Yards Per Rush Attempt,DEF_Rushing Yards Per Game,DEF_Rushing Touchdowns,DEF_Offense,DEF_Total Offensive Plays,DEF_Total Yards,DEF_Yards Per Game,DEF_Returns,DEF_Kickoffs: Total,DEF_Average Kickoff Return Yards,DEF_Punt: Total,DEF_Average Punt Return Yards,DEF_INT: Total,DEF_Average Interception Yards,DEF_Kicking,DEF_Net Average Punt Yards,DEF_Punt: Total Yards,DEF_FG: Good-Attempts,DEF_Touchback Percentage,DEF_Penalties,DEF_Total-Yards,DEF_Avg. Per Game (YDS),DEF_Time of Possession,DEF_Possession Time Seconds,DEF_Miscellaneous,DEF_Fumbles-Lost,DEF_Turnover Ratio,DEF_Games
0,0,Offense--ari-2023,17.2,206,22,218,76,118,24,54-153,35.29,8-25,32.0,245-394,2109,6.0,175.8,10,9,33-241,,305,1461,4.8,121.8,11,732,3811,317.6,10-180,18.0,21-207,10.2,9-98,10.9,42.0,"48-2,343",19-22,84,79-720,60,27:46,12-6,-1,17,1,Opposing-offense-ari-2023,26.8,321,38,,270,101,146,23,71-151,47.02,6-13,46.15,,261-372,2621,7.7,218.4,21,9,32-233,,377,1681,4.5,140.1,16,,781,4535,377.9,,3-63,21.0,32-325,9.9,9-41,4.6,,43.2,"42-2,020",19-23,82,,73-599,49.917,,32:13,,7-5,161,17
1,2,Offense--atl-2023,19.4,213,21,224,90,119,15,62-148,41.89,6-14,42.86,216-343,2217,7.1,201.5,10,9,30-211,,352,1532,4.4,139.3,10,725,3960,360.0,11-193,17.5,16-88,12.3,6-124,20.7,42.4,"46-2,196",22-23,85,59-528,48,30:17,11-9,-6,17,3,Opposing-offense-atl-2023,21.1,232,22,,204,61,120,23,50-142,35.21,5-10,50.0,,227-361,2300,6.8,209.1,17,6,22-149,,304,1232,4.1,112.0,4,,687,3681,334.6,,7-153,21.9,20-245,5.5,9-139,15.4,,45.9,"48-2,292",26-27,81,,73-600,54.545,,29:42,,12-6,140,17
2,4,Offense--bal-2023,27.0,324,37,251,108,116,27,66-152,43.42,4-10,40.0,233-342,2490,7.8,207.5,14,5,29-166,,390,1903,4.9,158.6,22,761,4559,379.9,8-153,19.1,22-276,13.8,11-171,15.5,40.8,"47-2,260",21-26,84,73-684,57,31:49,19-9,5,17,5,Opposing-offense-bal-2023,15.6,187,16,,209,66,117,26,64-177,36.16,8-22,36.36,,270-446,2060,5.5,171.7,10,11,47-374,,286,1227,4.3,102.3,4,,779,3661,305.1,,10-220,22.0,22-304,12.5,5-54,10.8,,41.6,"58-2,687",25-26,67,,76-596,49.667,,28:54,,14-8,133,17
3,6,Offense--buf-2023,27.3,328,39,271,104,149,18,77-155,49.68,7-12,58.33,295-433,3131,7.4,260.9,24,13,15-83,,332,1468,4.4,122.3,14,780,4682,390.2,11-239,21.7,21-186,12.4,11-84,7.6,39.4,"34-1,514",18-23,57,84-696,58,31:20,12-7,1,17,7,Opposing-offense-buf-2023,18.9,227,25,,228,65,136,27,60-156,38.46,9-16,56.25,,261-389,2439,6.9,203.3,16,11,41-235,,300,1400,4.7,116.7,8,,730,4074,339.5,,23-470,20.4,14-173,8.9,13-71,5.5,,45.2,"49-2,401",19-20,72,,75-582,48.5,,29:20,,19-10,161,17
4,8,Offense--car-2023,15.7,173,17,205,67,111,27,60-161,37.27,15-26,57.69,251-409,1906,5.5,173.3,11,8,43-339,,267,1019,3.8,92.6,3,719,3264,296.7,15-447,29.8,25-231,6.7,5-202,40.4,44.7,"52-2,456",18-21,59,74-590,53,30:54,13-6,-7,17,9,Opposing-offense-car-2023,26.5,292,35,,215,84,112,19,48-137,35.04,6-8,75.0,,211-322,1976,6.5,179.6,13,5,18-126,,326,1374,4.2,124.9,18,,666,3476,316.0,,13-322,24.8,20-134,9.2,8-165,20.6,,44.5,"51-2,498",16-16,75,,67-491,44.636,,29:05,,5-2,151,17


In [83]:
final_df = final_df.drop(columns=['DEF_1st Downs', 'DEF_Passing', 'DEF_Offense', 'DEF_Returns', 'DEF_Kicking', 'DEF_Penalties', 'DEF_Time of Possession', 'DEF_Miscellaneous'])
# same header columns as before, but for the defensive stats

In [84]:
final_df = final_df.drop(columns=['OFF_Rushing', 'DEF_Rushing', 'DEF_SeasonID'])
# The Rushing header was missed in both earlier drops, so it is dropped here for offense and defense.
# DEF_SeasonID is dropped because it is redundant: defensive stats are already identified by their DEF_ column prefix.

In [85]:
final_df.head()

Unnamed: 0,OFF_index,OFF_SeasonID,OFF_Total Points Per Game,OFF_Total Points,OFF_Total Touchdowns,OFF_Total 1st downs,OFF_Rushing 1st downs,OFF_Passing 1st downs,OFF_1st downs by penalty,OFF_3rd down efficiency,OFF_3rd down %,OFF_4th down efficiency,OFF_4th down %,OFF_Comp-Att,OFF_Net Passing Yards,OFF_Yards Per Pass Attempt,OFF_Net Passing Yards Per Game,OFF_Passing Touchdowns,OFF_Interceptions,OFF_Sacks-Yards Lost,OFF_Rushing Attempts,OFF_Rushing Yards,OFF_Yards Per Rush Attempt,OFF_Rushing Yards Per Game,OFF_Rushing Touchdowns,OFF_Total Offensive Plays,OFF_Total Yards,OFF_Yards Per Game,OFF_Kickoffs: Total,OFF_Average Kickoff Return Yards,OFF_Punt: Total,OFF_Average Punt Return Yards,OFF_INT: Total,OFF_Average Interception Yards,OFF_Net Average Punt Yards,OFF_Punt: Total Yards,OFF_FG: Good-Attempts,OFF_Touchback Percentage,OFF_Total-Yards,OFF_Avg. Per Game (YDS),OFF_Possession Time Seconds,OFF_Fumbles-Lost,OFF_Turnover Ratio,OFF_Games,DEF_index,DEF_Total Points Per Game,DEF_Total Points,DEF_Total Touchdowns,DEF_Total 1st downs,DEF_Rushing 1st downs,DEF_Passing 1st downs,DEF_1st downs by penalty,DEF_3rd down efficiency,DEF_3rd down %,DEF_4th down efficiency,DEF_4th down %,DEF_Comp-Att,DEF_Net Passing Yards,DEF_Yards Per Pass Attempt,DEF_Net Passing Yards Per Game,DEF_Passing Touchdowns,DEF_Interceptions,DEF_Sacks-Yards Lost,DEF_Rushing Attempts,DEF_Rushing Yards,DEF_Yards Per Rush Attempt,DEF_Rushing Yards Per Game,DEF_Rushing Touchdowns,DEF_Total Offensive Plays,DEF_Total Yards,DEF_Yards Per Game,DEF_Kickoffs: Total,DEF_Average Kickoff Return Yards,DEF_Punt: Total,DEF_Average Punt Return Yards,DEF_INT: Total,DEF_Average Interception Yards,DEF_Net Average Punt Yards,DEF_Punt: Total Yards,DEF_FG: Good-Attempts,DEF_Touchback Percentage,DEF_Total-Yards,DEF_Avg. Per Game (YDS),DEF_Possession Time Seconds,DEF_Fumbles-Lost,DEF_Turnover Ratio,DEF_Games
0,0,Offense--ari-2023,17.2,206,22,218,76,118,24,54-153,35.29,8-25,32.0,245-394,2109,6.0,175.8,10,9,33-241,305,1461,4.8,121.8,11,732,3811,317.6,10-180,18.0,21-207,10.2,9-98,10.9,42.0,"48-2,343",19-22,84,79-720,60,27:46,12-6,-1,17,1,26.8,321,38,270,101,146,23,71-151,47.02,6-13,46.15,261-372,2621,7.7,218.4,21,9,32-233,377,1681,4.5,140.1,16,781,4535,377.9,3-63,21.0,32-325,9.9,9-41,4.6,43.2,"42-2,020",19-23,82,73-599,49.917,32:13,7-5,161,17
1,2,Offense--atl-2023,19.4,213,21,224,90,119,15,62-148,41.89,6-14,42.86,216-343,2217,7.1,201.5,10,9,30-211,352,1532,4.4,139.3,10,725,3960,360.0,11-193,17.5,16-88,12.3,6-124,20.7,42.4,"46-2,196",22-23,85,59-528,48,30:17,11-9,-6,17,3,21.1,232,22,204,61,120,23,50-142,35.21,5-10,50.0,227-361,2300,6.8,209.1,17,6,22-149,304,1232,4.1,112.0,4,687,3681,334.6,7-153,21.9,20-245,5.5,9-139,15.4,45.9,"48-2,292",26-27,81,73-600,54.545,29:42,12-6,140,17
2,4,Offense--bal-2023,27.0,324,37,251,108,116,27,66-152,43.42,4-10,40.0,233-342,2490,7.8,207.5,14,5,29-166,390,1903,4.9,158.6,22,761,4559,379.9,8-153,19.1,22-276,13.8,11-171,15.5,40.8,"47-2,260",21-26,84,73-684,57,31:49,19-9,5,17,5,15.6,187,16,209,66,117,26,64-177,36.16,8-22,36.36,270-446,2060,5.5,171.7,10,11,47-374,286,1227,4.3,102.3,4,779,3661,305.1,10-220,22.0,22-304,12.5,5-54,10.8,41.6,"58-2,687",25-26,67,76-596,49.667,28:54,14-8,133,17
3,6,Offense--buf-2023,27.3,328,39,271,104,149,18,77-155,49.68,7-12,58.33,295-433,3131,7.4,260.9,24,13,15-83,332,1468,4.4,122.3,14,780,4682,390.2,11-239,21.7,21-186,12.4,11-84,7.6,39.4,"34-1,514",18-23,57,84-696,58,31:20,12-7,1,17,7,18.9,227,25,228,65,136,27,60-156,38.46,9-16,56.25,261-389,2439,6.9,203.3,16,11,41-235,300,1400,4.7,116.7,8,730,4074,339.5,23-470,20.4,14-173,8.9,13-71,5.5,45.2,"49-2,401",19-20,72,75-582,48.5,29:20,19-10,161,17
4,8,Offense--car-2023,15.7,173,17,205,67,111,27,60-161,37.27,15-26,57.69,251-409,1906,5.5,173.3,11,8,43-339,267,1019,3.8,92.6,3,719,3264,296.7,15-447,29.8,25-231,6.7,5-202,40.4,44.7,"52-2,456",18-21,59,74-590,53,30:54,13-6,-7,17,9,26.5,292,35,215,84,112,19,48-137,35.04,6-8,75.0,211-322,1976,6.5,179.6,13,5,18-126,326,1374,4.2,124.9,18,666,3476,316.0,13-322,24.8,20-134,9.2,8-165,20.6,44.5,"51-2,498",16-16,75,67-491,44.636,29:05,5-2,151,17


In [86]:
final_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 87 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   OFF_index                         32 non-null     int64 
 1   OFF_SeasonID                      32 non-null     object
 2   OFF_Total Points Per Game         32 non-null     object
 3   OFF_Total Points                  32 non-null     object
 4   OFF_Total Touchdowns              32 non-null     object
 5   OFF_Total 1st downs               32 non-null     object
 6   OFF_Rushing 1st downs             32 non-null     object
 7   OFF_Passing 1st downs             32 non-null     object
 8   OFF_1st downs by penalty          32 non-null     object
 9   OFF_3rd down efficiency           32 non-null     object
 10  OFF_3rd down %                    32 non-null     object
 11  OFF_4th down efficiency           32 non-null     object
 12  OFF_4th down %          

In [87]:
final_df.to_csv('In-Season_Wk13.csv', index=False)
# Change the export name each week, or it will overwrite the previous week's data. I name the files
# by week so I can tell what I'm looking at in my folder.

# A file named for a given week includes all games played UP TO AND DURING that week.
# For example, the Week 11 file is used to make predictions for Week 12.
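
One way to make accidental overwrites harder is to build the file name from a week number and refuse to write when the file already exists. This is only a sketch: the helper names `week_filename` and `safe_export` and the naming pattern are assumptions, and the demo writes to a temporary folder rather than the project folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def week_filename(week: int, season: int = 2023, folder: str = ".") -> Path:
    """Build a week-stamped CSV path. The naming pattern is an assumption."""
    return Path(folder) / f"In-Season_{season}_Week{week}.csv"


def safe_export(df: pd.DataFrame, week: int, folder: str = ".", overwrite: bool = False) -> Path:
    """Write df to the week-stamped path, refusing to replace an existing file."""
    path = week_filename(week, folder=folder)
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    df.to_csv(path, index=False)
    return path


# Demo in a temporary folder so nothing in the project folder is touched.
demo = pd.DataFrame({"SeasonID": ["Offense--ari-2023"], "Games": [17]})
with tempfile.TemporaryDirectory() as tmp:
    written = safe_export(demo, week=13, folder=tmp)
    written_name = written.name
    try:
        safe_export(demo, week=13, folder=tmp)  # second write to the same week
        blocked = False
    except FileExistsError:
        blocked = True
```

Re-running the notebook for the same week would then need an explicit `overwrite=True`, which makes replacing a previous export a deliberate choice.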

## Feature Engineering Steps

In [88]:
fun = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/In-Season_2023_Week11.csv')
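# I felt like it was a fitting name....
# Note: this path loads a Week 11 file, not the file exported in the previous cell. As the column names in the
# next output show, it has already been through the feature engineering steps, so any cell below that
# still refers to a raw OFF_ column name will raise a KeyError when run against it.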

In [89]:
fun.head()

Unnamed: 0,SeasonID,PPG,Tot_TDs_PG,1st_Downs_PG,Rush_1st_Downs_PG,Pass_1st_Downs_PG,OFF_1st_by_pen_PG,3rd_Conv_Rate,4th_Conv_Rate,Pass_Comp_Rate,Pass_Yds_PG,Pass_Yds_Per_Attempt,Pass_Tds_PG,Off_Int_PG,Rush_Att_PG,Yds_Per_Rush,Rush_Yds_PG,Rush_Tds_PG,Off_Plays_PG,Tot_Yds_PG,Kickoffs_Returned_PG,Avg_K_Return_Yds,Punts_Returned_PG,Avg_P_Return_Yds,Int_Forced_PG,Avg_I_Return_Yds,Yds_Per_Punt,Punts_PG,FG_Conv_Rate,Touchback_Rate,Penalties_PG,Avg_Pen_Yds_PG,Avg_TOP,Fum_Lost_PG,Games,DEF_PPG_Against,DEF_Tot_Tds_PG_Against,DEF_1st_Downs_PG_Against,DEF_Rush_1st_Downs_PG_Against,DEF_Pass_1st_Downs_PG_Against,DEF_1st_by_pen_PG,DEF_3rd_Conv_Rate,DEF_4th_Conv_Rate,DEF_Pass_Comp_Rate,DEF_Pass_Yds_Per_Attempt,DEF_Pass_Yds_PG,DEF_Pass_Tds_PG,DEF_Int_PG,DEF_Rush_Att_PG,DEF_Yds_Per_Rush,DEF_Rush_Yds_PG,DEF_Rush_Tds_PG,DEF_Tot_Plays_PG,DEF_YPG_Against,DEF_Kickoffs_Returned_PG,DEF_Avg_K_Return_Yds,DEF_Punts_Returned_PG,DEF_Avg_P_Return_Yds,DEF_Avg_I_Return_Yds,DEF_Yds_Per_Punt_Against,DEF_Punts_PG,DEF_FG_Conv_Rate,DEF_Touchback_Rate,DEF_Penalties_PG,DEF_Avg_Pen_Yds_PG,DEF_Avg_TOP,DEF_Fum_Lost_PG,Sacks_Taken_PG,Sack_Yds_Lost_PG,FG_Att_PG,FG_Good_PG,Pass_Att_PG,DEF_Pass_Att_PG,DEF_Sacks_PG,DEF_Sack_Yds_PG,DEF_FG_Att_PG,DEF_FG_Good_PG
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,6.0,0.529412,0.529412,17.058824,4.8,126.2,0.588235,39.294118,316.5,0.588235,18.0,1.235294,10.1,0.470588,10.5,42.1,2.529412,0.904762,0.85,4.352941,62,0.466833,0.352941,17,25.8,1.941176,14.411765,5.411765,7.647059,1.352941,46.76,36.36,0.696165,7.7,217.5,1.0,0.470588,20.235294,4.2,132.1,0.882353,42.058824,370.7,0.176471,21.0,1.764706,9.9,4.6,43.5,2.352941,0.857143,0.8,4.0,52.364,0.526333,0.294118,1.705882,12.0,1.235294,1.117647,20.529412,19.941176,1.882353,13.705882,1.235294,1.058824
1,Offense--atl-2023,18.9,1.058824,11.705882,4.411765,6.470588,0.823529,41.43,46.15,0.630435,120.529412,7.0,0.529412,0.411765,18.294118,4.2,130.4,0.529412,39.0,356.4,0.529412,16.7,0.882353,12.6,0.294118,6.4,42.4,2.588235,0.954545,0.83,3.176471,49,0.504833,0.529412,17,21.7,1.294118,10.705882,3.235294,6.294118,1.176471,34.38,50.0,0.628483,6.6,200.4,1.0,0.294118,16.235294,3.9,108.4,0.235294,36.470588,322.9,0.411765,21.9,1.117647,5.3,16.9,46.1,2.764706,1.0,0.83,3.941176,55.0,0.488333,0.294118,1.764706,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294
2,Offense--bal-2023,27.6,2.058824,13.588235,5.764706,6.294118,1.529412,44.6,42.86,0.693548,136.823529,8.0,0.764706,0.294118,20.882353,4.8,155.1,1.235294,40.705882,380.5,0.470588,19.1,1.235294,13.4,0.588235,17.1,41.3,2.588235,0.826087,0.85,4.0,58,0.523333,0.529412,17,16.1,0.882353,11.352941,3.588235,6.235294,1.529412,35.19,36.84,0.599502,5.5,169.7,0.529412,0.588235,15.705882,4.3,103.7,0.235294,41.941176,305.3,0.529412,22.1,1.235294,13.0,10.8,41.1,3.235294,0.96,0.65,4.176471,50.455,0.4845,0.294118,1.588235,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765
3,Offense--buf-2023,26.7,2.058824,14.235294,5.352941,7.882353,1.0,48.12,54.55,0.696335,164.647059,7.5,1.294118,0.705882,17.176471,4.4,117.7,0.705882,40.470588,379.1,0.588235,21.4,0.941176,11.9,0.588235,8.2,39.6,1.764706,0.842105,0.58,4.294118,56,0.505,0.411765,17,17.3,1.176471,12.0,3.294118,7.294118,1.411765,38.62,56.25,0.678771,6.9,204.2,0.764706,0.588235,15.764706,4.5,110.5,0.352941,39.117647,335.4,1.352941,20.4,0.764706,9.3,5.5,45.1,2.588235,0.947368,0.7,4.176471,50.182,0.489,0.529412,0.823529,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824
4,Offense--car-2023,16.3,0.941176,10.941176,3.470588,6.117647,1.352941,37.24,56.52,0.616402,102.588235,5.4,0.647059,0.470588,13.823529,3.9,92.3,0.117647,38.352941,297.4,0.705882,31.3,1.352941,6.6,0.294118,40.4,44.7,2.705882,0.85,0.63,4.117647,55,0.506,0.294118,17,27.5,1.941176,11.764706,4.588235,6.058824,1.117647,36.51,71.43,0.656463,6.5,179.2,0.764706,0.294118,17.764706,4.3,129.4,0.941176,36.058824,321.1,0.588235,24.7,1.058824,9.7,20.6,43.2,2.588235,1.0,0.79,3.470588,43.8,0.487167,0.117647,2.294118,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353


In [90]:
fun.shape

(32, 77)

In [91]:
fun = fun.replace(',','', regex=True)

In [92]:
fun.rename(columns={'OFF_SeasonID':'SeasonID'}, inplace=True)

In [93]:
fun.rename(columns={'OFF_Games': 'Games'}, inplace=True)

In [94]:
fun.head()

Unnamed: 0,SeasonID,PPG,Tot_TDs_PG,1st_Downs_PG,Rush_1st_Downs_PG,Pass_1st_Downs_PG,OFF_1st_by_pen_PG,3rd_Conv_Rate,4th_Conv_Rate,Pass_Comp_Rate,Pass_Yds_PG,Pass_Yds_Per_Attempt,Pass_Tds_PG,Off_Int_PG,Rush_Att_PG,Yds_Per_Rush,Rush_Yds_PG,Rush_Tds_PG,Off_Plays_PG,Tot_Yds_PG,Kickoffs_Returned_PG,Avg_K_Return_Yds,Punts_Returned_PG,Avg_P_Return_Yds,Int_Forced_PG,Avg_I_Return_Yds,Yds_Per_Punt,Punts_PG,FG_Conv_Rate,Touchback_Rate,Penalties_PG,Avg_Pen_Yds_PG,Avg_TOP,Fum_Lost_PG,Games,DEF_PPG_Against,DEF_Tot_Tds_PG_Against,DEF_1st_Downs_PG_Against,DEF_Rush_1st_Downs_PG_Against,DEF_Pass_1st_Downs_PG_Against,DEF_1st_by_pen_PG,DEF_3rd_Conv_Rate,DEF_4th_Conv_Rate,DEF_Pass_Comp_Rate,DEF_Pass_Yds_Per_Attempt,DEF_Pass_Yds_PG,DEF_Pass_Tds_PG,DEF_Int_PG,DEF_Rush_Att_PG,DEF_Yds_Per_Rush,DEF_Rush_Yds_PG,DEF_Rush_Tds_PG,DEF_Tot_Plays_PG,DEF_YPG_Against,DEF_Kickoffs_Returned_PG,DEF_Avg_K_Return_Yds,DEF_Punts_Returned_PG,DEF_Avg_P_Return_Yds,DEF_Avg_I_Return_Yds,DEF_Yds_Per_Punt_Against,DEF_Punts_PG,DEF_FG_Conv_Rate,DEF_Touchback_Rate,DEF_Penalties_PG,DEF_Avg_Pen_Yds_PG,DEF_Avg_TOP,DEF_Fum_Lost_PG,Sacks_Taken_PG,Sack_Yds_Lost_PG,FG_Att_PG,FG_Good_PG,Pass_Att_PG,DEF_Pass_Att_PG,DEF_Sacks_PG,DEF_Sack_Yds_PG,DEF_FG_Att_PG,DEF_FG_Good_PG
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,6.0,0.529412,0.529412,17.058824,4.8,126.2,0.588235,39.294118,316.5,0.588235,18.0,1.235294,10.1,0.470588,10.5,42.1,2.529412,0.904762,0.85,4.352941,62,0.466833,0.352941,17,25.8,1.941176,14.411765,5.411765,7.647059,1.352941,46.76,36.36,0.696165,7.7,217.5,1.0,0.470588,20.235294,4.2,132.1,0.882353,42.058824,370.7,0.176471,21.0,1.764706,9.9,4.6,43.5,2.352941,0.857143,0.8,4.0,52.364,0.526333,0.294118,1.705882,12.0,1.235294,1.117647,20.529412,19.941176,1.882353,13.705882,1.235294,1.058824
1,Offense--atl-2023,18.9,1.058824,11.705882,4.411765,6.470588,0.823529,41.43,46.15,0.630435,120.529412,7.0,0.529412,0.411765,18.294118,4.2,130.4,0.529412,39.0,356.4,0.529412,16.7,0.882353,12.6,0.294118,6.4,42.4,2.588235,0.954545,0.83,3.176471,49,0.504833,0.529412,17,21.7,1.294118,10.705882,3.235294,6.294118,1.176471,34.38,50.0,0.628483,6.6,200.4,1.0,0.294118,16.235294,3.9,108.4,0.235294,36.470588,322.9,0.411765,21.9,1.117647,5.3,16.9,46.1,2.764706,1.0,0.83,3.941176,55.0,0.488333,0.294118,1.764706,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294
2,Offense--bal-2023,27.6,2.058824,13.588235,5.764706,6.294118,1.529412,44.6,42.86,0.693548,136.823529,8.0,0.764706,0.294118,20.882353,4.8,155.1,1.235294,40.705882,380.5,0.470588,19.1,1.235294,13.4,0.588235,17.1,41.3,2.588235,0.826087,0.85,4.0,58,0.523333,0.529412,17,16.1,0.882353,11.352941,3.588235,6.235294,1.529412,35.19,36.84,0.599502,5.5,169.7,0.529412,0.588235,15.705882,4.3,103.7,0.235294,41.941176,305.3,0.529412,22.1,1.235294,13.0,10.8,41.1,3.235294,0.96,0.65,4.176471,50.455,0.4845,0.294118,1.588235,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765
3,Offense--buf-2023,26.7,2.058824,14.235294,5.352941,7.882353,1.0,48.12,54.55,0.696335,164.647059,7.5,1.294118,0.705882,17.176471,4.4,117.7,0.705882,40.470588,379.1,0.588235,21.4,0.941176,11.9,0.588235,8.2,39.6,1.764706,0.842105,0.58,4.294118,56,0.505,0.411765,17,17.3,1.176471,12.0,3.294118,7.294118,1.411765,38.62,56.25,0.678771,6.9,204.2,0.764706,0.588235,15.764706,4.5,110.5,0.352941,39.117647,335.4,1.352941,20.4,0.764706,9.3,5.5,45.1,2.588235,0.947368,0.7,4.176471,50.182,0.489,0.529412,0.823529,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824
4,Offense--car-2023,16.3,0.941176,10.941176,3.470588,6.117647,1.352941,37.24,56.52,0.616402,102.588235,5.4,0.647059,0.470588,13.823529,3.9,92.3,0.117647,38.352941,297.4,0.705882,31.3,1.352941,6.6,0.294118,40.4,44.7,2.705882,0.85,0.63,4.117647,55,0.506,0.294118,17,27.5,1.941176,11.764706,4.588235,6.058824,1.117647,36.51,71.43,0.656463,6.5,179.2,0.764706,0.294118,17.764706,4.3,129.4,0.941176,36.058824,321.1,0.588235,24.7,1.058824,9.7,20.6,43.2,2.588235,1.0,0.79,3.470588,43.8,0.487167,0.117647,2.294118,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353


In [95]:
# Here we go feature by feature, doing one of four things:
#     1. If it is already a rate or a per-game value, we leave it as is
#     2. If it is a season total with no per-game version, we divide by the Games column
#     3. If it is a header or otherwise unnecessary, we drop it
#     4. In some cases, we split the info into two new columns to get more valuable information
# For all of the columns we keep, we will rename the header
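
Rule 2 depends on the Games column holding games played so far. The final_df preview earlier in this notebook lists Arizona at 17.2 points per game on 206 total points, which implies about 12 games, while its Games value reads 17. If Games holds the season length instead, every division under rule 2 will understate the per-game value. A quick check, sketched on a toy frame with the Arizona and Atlanta rows copied in (`implied` and `mismatch` are illustrative names):

```python
import pandas as pd

# Two rows copied from the final_df preview; only the columns needed for the check.
demo = pd.DataFrame({
    "SeasonID": ["Offense--ari-2023", "Offense--atl-2023"],
    "OFF_Total Points Per Game": [17.2, 19.4],
    "OFF_Total Points": [206, 213],
    "Games": [17, 17],
})

# Games played implied by ESPN's own total and per-game figures.
implied = (demo["OFF_Total Points"] / demo["OFF_Total Points Per Game"]).round().astype(int)

# Rows where the stored Games value disagrees with the implied count.
mismatch = demo.loc[implied != demo["Games"], ["SeasonID", "Games"]].assign(Implied_Games=implied)
print(mismatch)
```

If the mismatch table is not empty, one option is to divide by the implied count instead; that choice is left open here.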

In [96]:
# Note: We can do a lot of these together in single steps, but for ease of understanding we went one-by-one
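
For reference, the combined version could look roughly like the sketch below. The mappings cover only a few of the columns handled in this notebook, and the function name `engineer` is hypothetical. It skips columns that are missing, so running it on a frame that has already been processed changes nothing instead of raising a KeyError.

```python
import pandas as pd

# Partial, illustrative mappings: raw ESPN name -> engineered name.
PER_GAME = {
    "OFF_Total Touchdowns": "Tot_TDs_PG",
    "OFF_Total 1st downs": "1st_Downs_PG",
    "OFF_Rushing 1st downs": "Rush_1st_Downs_PG",
}
RENAME_ONLY = {"OFF_Total Points Per Game": "PPG"}
DROP = ["OFF_Total Points"]


def engineer(df: pd.DataFrame, games_col: str = "Games") -> pd.DataFrame:
    """Apply divide-by-games, rename and drop steps in one pass."""
    out = df.copy()
    # Only touch season-total columns that are still present under their raw names.
    present = [col for col in PER_GAME if col in out.columns]
    if present:
        out[present] = out[present].apply(pd.to_numeric).div(out[games_col], axis=0)
    # rename() silently ignores keys that are not in the frame.
    out = out.rename(columns={**PER_GAME, **RENAME_ONLY})
    # errors="ignore" makes the drop safe when the column is already gone.
    return out.drop(columns=DROP, errors="ignore")


demo = pd.DataFrame({
    "OFF_Total Points Per Game": [20.0, 10.0],
    "OFF_Total Points": [40, 40],
    "OFF_Total Touchdowns": [4, 8],
    "Games": [2, 4],
})
once = engineer(demo)
twice = engineer(once)  # second pass is a no-op
print(once)
```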

In [97]:
# Note to self: remember our target variables will be PPG and PA_PG

In [98]:
fun.rename(columns={'OFF_Total Points Per Game':'PPG'}, inplace=True)

In [100]:
fun = fun.drop(columns=['OFF_Total Points'])
# this is the same as PPG but multiplied by number of games. Therefore redundant and can drop

KeyError: "['OFF_Total Points'] not found in axis"

In [109]:
fun.head()

Unnamed: 0,SeasonID,PPG,Tot_TDs_PG,1st_Downs_PG,Rush_1st_Downs_PG,Pass_1st_Downs_PG,OFF_1st_by_pen_PG,3rd_Conv_Rate,4th_Conv_Rate,Pass_Comp_Rate,Pass_Yds_PG,Pass_Yds_Per_Attempt,Pass_Tds_PG,Off_Int_PG,Rush_Att_PG,Yds_Per_Rush,Rush_Yds_PG,Rush_Tds_PG,Off_Plays_PG,Tot_Yds_PG,Kickoffs_Returned_PG,Avg_K_Return_Yds,Punts_Returned_PG,Avg_P_Return_Yds,Int_Forced_PG,Avg_I_Return_Yds,Yds_Per_Punt,Punts_PG,FG_Conv_Rate,Touchback_Rate,Penalties_PG,Avg_Pen_Yds_PG,Avg_TOP,Fum_Lost_PG,Games,DEF_PPG_Against,DEF_Tot_Tds_PG_Against,DEF_1st_Downs_PG_Against,DEF_Rush_1st_Downs_PG_Against,DEF_Pass_1st_Downs_PG_Against,DEF_1st_by_pen_PG,DEF_3rd_Conv_Rate,DEF_4th_Conv_Rate,DEF_Pass_Comp_Rate,DEF_Pass_Yds_Per_Attempt,DEF_Pass_Yds_PG,DEF_Pass_Tds_PG,DEF_Int_PG,DEF_Rush_Att_PG,DEF_Yds_Per_Rush,DEF_Rush_Yds_PG,DEF_Rush_Tds_PG,DEF_Tot_Plays_PG,DEF_YPG_Against,DEF_Kickoffs_Returned_PG,DEF_Avg_K_Return_Yds,DEF_Punts_Returned_PG,DEF_Avg_P_Return_Yds,DEF_Avg_I_Return_Yds,DEF_Yds_Per_Punt_Against,DEF_Punts_PG,DEF_FG_Conv_Rate,DEF_Touchback_Rate,DEF_Penalties_PG,DEF_Avg_Pen_Yds_PG,DEF_Avg_TOP,DEF_Fum_Lost_PG,Sacks_Taken_PG,Sack_Yds_Lost_PG,FG_Att_PG,FG_Good_PG,Pass_Att_PG,DEF_Pass_Att_PG,DEF_Sacks_PG,DEF_Sack_Yds_PG,DEF_FG_Att_PG,DEF_FG_Good_PG
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,6.0,0.529412,0.529412,17.058824,4.8,126.2,0.588235,39.294118,316.5,0.588235,18.0,1.235294,10.1,0.470588,10.5,42.1,2.529412,0.904762,0.85,4.352941,62,0.466833,0.352941,17,25.8,1.941176,14.411765,5.411765,7.647059,1.352941,46.76,36.36,0.696165,7.7,217.5,1.0,0.470588,20.235294,4.2,132.1,0.882353,42.058824,370.7,0.176471,21.0,1.764706,9.9,4.6,43.5,2.352941,0.857143,0.8,4.0,52.364,0.526333,0.294118,1.705882,12.0,1.235294,1.117647,20.529412,19.941176,1.882353,13.705882,1.235294,1.058824
1,Offense--atl-2023,18.9,1.058824,11.705882,4.411765,6.470588,0.823529,41.43,46.15,0.630435,120.529412,7.0,0.529412,0.411765,18.294118,4.2,130.4,0.529412,39.0,356.4,0.529412,16.7,0.882353,12.6,0.294118,6.4,42.4,2.588235,0.954545,0.83,3.176471,49,0.504833,0.529412,17,21.7,1.294118,10.705882,3.235294,6.294118,1.176471,34.38,50.0,0.628483,6.6,200.4,1.0,0.294118,16.235294,3.9,108.4,0.235294,36.470588,322.9,0.411765,21.9,1.117647,5.3,16.9,46.1,2.764706,1.0,0.83,3.941176,55.0,0.488333,0.294118,1.764706,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294
2,Offense--bal-2023,27.6,2.058824,13.588235,5.764706,6.294118,1.529412,44.6,42.86,0.693548,136.823529,8.0,0.764706,0.294118,20.882353,4.8,155.1,1.235294,40.705882,380.5,0.470588,19.1,1.235294,13.4,0.588235,17.1,41.3,2.588235,0.826087,0.85,4.0,58,0.523333,0.529412,17,16.1,0.882353,11.352941,3.588235,6.235294,1.529412,35.19,36.84,0.599502,5.5,169.7,0.529412,0.588235,15.705882,4.3,103.7,0.235294,41.941176,305.3,0.529412,22.1,1.235294,13.0,10.8,41.1,3.235294,0.96,0.65,4.176471,50.455,0.4845,0.294118,1.588235,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765
3,Offense--buf-2023,26.7,2.058824,14.235294,5.352941,7.882353,1.0,48.12,54.55,0.696335,164.647059,7.5,1.294118,0.705882,17.176471,4.4,117.7,0.705882,40.470588,379.1,0.588235,21.4,0.941176,11.9,0.588235,8.2,39.6,1.764706,0.842105,0.58,4.294118,56,0.505,0.411765,17,17.3,1.176471,12.0,3.294118,7.294118,1.411765,38.62,56.25,0.678771,6.9,204.2,0.764706,0.588235,15.764706,4.5,110.5,0.352941,39.117647,335.4,1.352941,20.4,0.764706,9.3,5.5,45.1,2.588235,0.947368,0.7,4.176471,50.182,0.489,0.529412,0.823529,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824
4,Offense--car-2023,16.3,0.941176,10.941176,3.470588,6.117647,1.352941,37.24,56.52,0.616402,102.588235,5.4,0.647059,0.470588,13.823529,3.9,92.3,0.117647,38.352941,297.4,0.705882,31.3,1.352941,6.6,0.294118,40.4,44.7,2.705882,0.85,0.63,4.117647,55,0.506,0.294118,17,27.5,1.941176,11.764706,4.588235,6.058824,1.117647,36.51,71.43,0.656463,6.5,179.2,0.764706,0.294118,17.764706,4.3,129.4,0.941176,36.058824,321.1,0.588235,24.7,1.058824,9.7,20.6,43.2,2.588235,1.0,0.79,3.470588,43.8,0.487167,0.117647,2.294118,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353


In [110]:
fun['OFF_Total Touchdowns'] = (fun['OFF_Total Touchdowns'] / fun['Games'])
# we want touchdowns per game

KeyError: 'OFF_Total Touchdowns'

In [None]:
fun.rename(columns={'OFF_Total Touchdowns':'Tot_TDs_PG'}, inplace=True)
# rename column accordingly

In [None]:
fun.head()
# check to make sure it worked

In [None]:
fun['OFF_Total 1st downs'] = (fun['OFF_Total 1st downs'] / fun['Games'])

In [None]:
fun.rename(columns={'OFF_Total 1st downs':'1st_Downs_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['OFF_Rushing 1st downs'] = (fun['OFF_Rushing 1st downs'] / fun['Games'])

In [None]:
fun.rename(columns={'OFF_Rushing 1st downs':'Rush_1st_Downs_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['OFF_Passing 1st downs'] = (fun['OFF_Passing 1st downs'] / fun['Games'])

In [None]:
fun.rename(columns={'OFF_Passing 1st downs':'Pass_1st_Downs_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['OFF_1st downs by penalty'] = (fun['OFF_1st downs by penalty'] / fun['Games'])

In [None]:
fun.rename(columns={'OFF_1st downs by penalty':'OFF_1st_by_pen_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='OFF_3rd down efficiency')

In [None]:
fun.rename(columns={'OFF_3rd down %':'3rd_Conv_Rate'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='OFF_4th down efficiency')

In [None]:
fun.rename(columns={'OFF_4th down %':'4th_Conv_Rate'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Comp-Att':'Pass_Comp_Rate'}, inplace=True)
# MUST ALSO TURN THIS INTO A RATE

In [None]:
fun.head()

In [None]:
# New problem: we need to split the string into two integers and divide one by the other to get a rate
# We will do this a few times, so remember this method

In [None]:
test1 = fun['Pass_Comp_Rate'].str.split('-', expand=True)

In [None]:
type(test1[0][1])

In [None]:
test1 = test1.rename(columns={0:'Completed', 1:'Attempts'})

In [None]:
test1['Completed'] = test1['Completed'].astype(int)
test1['Attempts'] = test1['Attempts'].astype(int)

In [None]:
fun['Pass_Comp_Rate'] = test1['Completed'] / test1['Attempts']

In [None]:
fun.head()
# LETS GOOOOOO

In [None]:
# if len(testing) == 2:
#     comps = int(testing[0])
#     atts = int(testing[1])
#     if atts != 0:
#         done = comps / atts
#     else:
#         print('Error: division by zero')
# else:
#     print('Error: Invalid input string format')

# Not worth writing the loop, let's just do it
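
Since the split-and-divide pattern comes up several times, it could also be wrapped in one vectorized helper with no loop. This is a sketch, not what the notebook runs; `ratio_from_pair` is a hypothetical name. It uses a regex extract so malformed strings become NaN, and it turns zero attempts into NaN instead of dividing by zero.

```python
import numpy as np
import pandas as pd


def ratio_from_pair(series: pd.Series) -> pd.Series:
    """Turn 'made-attempted' strings such as '245-394' into made / attempted."""
    # Remove thousands separators, then capture the two integers around the dash.
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    parts = cleaned.str.extract(r"^(\d+)-(\d+)$")
    made = pd.to_numeric(parts[0])
    attempts = pd.to_numeric(parts[1])
    # Zero attempts would divide by zero, so treat them as missing.
    return made / attempts.replace(0, np.nan)


demo = pd.Series(["245-394", "0-0", "not a pair"])
rates = ratio_from_pair(demo)
print(rates)
```

The same helper would apply to the other dash-separated columns that need a rate, such as field goals made and attempted.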

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Net Passing Yards':'Pass_Yds_PG'}, inplace=True)

In [None]:
fun['Pass_Yds_PG'] = (fun['Pass_Yds_PG'] / fun['Games'])

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Yards Per Pass Attempt':'Pass_Yds_Per_Attempt'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='OFF_Net Passing Yards Per Game')

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Passing Touchdowns':'Pass_Tds_PG'}, inplace=True)

In [None]:
fun['Pass_Tds_PG'] = (fun['Pass_Tds_PG'] / fun['Games'])

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Interceptions':'Off_Int_PG'}, inplace=True)

In [None]:
fun['Off_Int_PG'] = (fun['Off_Int_PG'] / fun['Games'])

In [None]:
fun.head()

In [None]:
# REMEMBER TO COME BACK FOR OFF_SACKS-YARDS LOST
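
This column is a case for rule 4: the string holds two season totals (sacks and yards lost), so it can become two per-game columns. A minimal sketch on a toy frame, using the Sacks_Taken_PG and Sack_Yds_Lost_PG names that appear in the engineered data; the two rows and their Games values are copied from the earlier preview for illustration only.

```python
import pandas as pd

# Toy rows: 'sacks-yards lost' strings plus the Games column.
demo = pd.DataFrame({
    "OFF_Sacks-Yards Lost": ["33-241", "30-211"],
    "Games": [17, 17],
})

# Split once on the dash and convert both halves to integers.
pair = demo["OFF_Sacks-Yards Lost"].str.split("-", n=1, expand=True).astype(int)

# Each half is a season total, so divide by Games to get per-game values.
demo["Sacks_Taken_PG"] = pair[0] / demo["Games"]
demo["Sack_Yds_Lost_PG"] = pair[1] / demo["Games"]

# The combined string column is no longer needed.
demo = demo.drop(columns="OFF_Sacks-Yards Lost")
print(demo)
```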

In [None]:
fun['OFF_Rushing Attempts'] = (fun['OFF_Rushing Attempts'] / fun['Games'])

In [None]:
fun.rename(columns={'OFF_Rushing Attempts':'Rush_Att_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='OFF_Rushing Yards')

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Yards Per Rush Attempt':'Yds_Per_Rush'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Rushing Yards Per Game':'Rush_Yds_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['OFF_Rushing Touchdowns'] = (fun['OFF_Rushing Touchdowns'] / fun['Games'])

In [111]:
fun.rename(columns={'OFF_Rushing Touchdowns':'Rush_Tds_PG'}, inplace=True)

In [112]:
fun.head()

Unnamed: 0,SeasonID,PPG,Tot_TDs_PG,1st_Downs_PG,Rush_1st_Downs_PG,Pass_1st_Downs_PG,OFF_1st_by_pen_PG,3rd_Conv_Rate,4th_Conv_Rate,Pass_Comp_Rate,Pass_Yds_PG,Pass_Yds_Per_Attempt,Pass_Tds_PG,Off_Int_PG,Rush_Att_PG,Yds_Per_Rush,Rush_Yds_PG,Rush_Tds_PG,Off_Plays_PG,Tot_Yds_PG,Kickoffs_Returned_PG,Avg_K_Return_Yds,Punts_Returned_PG,Avg_P_Return_Yds,Int_Forced_PG,Avg_I_Return_Yds,Yds_Per_Punt,Punts_PG,FG_Conv_Rate,Touchback_Rate,Penalties_PG,Avg_Pen_Yds_PG,Avg_TOP,Fum_Lost_PG,Games,DEF_PPG_Against,DEF_Tot_Tds_PG_Against,DEF_1st_Downs_PG_Against,DEF_Rush_1st_Downs_PG_Against,DEF_Pass_1st_Downs_PG_Against,DEF_1st_by_pen_PG,DEF_3rd_Conv_Rate,DEF_4th_Conv_Rate,DEF_Pass_Comp_Rate,DEF_Pass_Yds_Per_Attempt,DEF_Pass_Yds_PG,DEF_Pass_Tds_PG,DEF_Int_PG,DEF_Rush_Att_PG,DEF_Yds_Per_Rush,DEF_Rush_Yds_PG,DEF_Rush_Tds_PG,DEF_Tot_Plays_PG,DEF_YPG_Against,DEF_Kickoffs_Returned_PG,DEF_Avg_K_Return_Yds,DEF_Punts_Returned_PG,DEF_Avg_P_Return_Yds,DEF_Avg_I_Return_Yds,DEF_Yds_Per_Punt_Against,DEF_Punts_PG,DEF_FG_Conv_Rate,DEF_Touchback_Rate,DEF_Penalties_PG,DEF_Avg_Pen_Yds_PG,DEF_Avg_TOP,DEF_Fum_Lost_PG,Sacks_Taken_PG,Sack_Yds_Lost_PG,FG_Att_PG,FG_Good_PG,Pass_Att_PG,DEF_Pass_Att_PG,DEF_Sacks_PG,DEF_Sack_Yds_PG,DEF_FG_Att_PG,DEF_FG_Good_PG
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,6.0,0.529412,0.529412,17.058824,4.8,126.2,0.588235,39.294118,316.5,0.588235,18.0,1.235294,10.1,0.470588,10.5,42.1,2.529412,0.904762,0.85,4.352941,62,0.466833,0.352941,17,25.8,1.941176,14.411765,5.411765,7.647059,1.352941,46.76,36.36,0.696165,7.7,217.5,1.0,0.470588,20.235294,4.2,132.1,0.882353,42.058824,370.7,0.176471,21.0,1.764706,9.9,4.6,43.5,2.352941,0.857143,0.8,4.0,52.364,0.526333,0.294118,1.705882,12.0,1.235294,1.117647,20.529412,19.941176,1.882353,13.705882,1.235294,1.058824
1,Offense--atl-2023,18.9,1.058824,11.705882,4.411765,6.470588,0.823529,41.43,46.15,0.630435,120.529412,7.0,0.529412,0.411765,18.294118,4.2,130.4,0.529412,39.0,356.4,0.529412,16.7,0.882353,12.6,0.294118,6.4,42.4,2.588235,0.954545,0.83,3.176471,49,0.504833,0.529412,17,21.7,1.294118,10.705882,3.235294,6.294118,1.176471,34.38,50.0,0.628483,6.6,200.4,1.0,0.294118,16.235294,3.9,108.4,0.235294,36.470588,322.9,0.411765,21.9,1.117647,5.3,16.9,46.1,2.764706,1.0,0.83,3.941176,55.0,0.488333,0.294118,1.764706,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294
2,Offense--bal-2023,27.6,2.058824,13.588235,5.764706,6.294118,1.529412,44.6,42.86,0.693548,136.823529,8.0,0.764706,0.294118,20.882353,4.8,155.1,1.235294,40.705882,380.5,0.470588,19.1,1.235294,13.4,0.588235,17.1,41.3,2.588235,0.826087,0.85,4.0,58,0.523333,0.529412,17,16.1,0.882353,11.352941,3.588235,6.235294,1.529412,35.19,36.84,0.599502,5.5,169.7,0.529412,0.588235,15.705882,4.3,103.7,0.235294,41.941176,305.3,0.529412,22.1,1.235294,13.0,10.8,41.1,3.235294,0.96,0.65,4.176471,50.455,0.4845,0.294118,1.588235,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765
3,Offense--buf-2023,26.7,2.058824,14.235294,5.352941,7.882353,1.0,48.12,54.55,0.696335,164.647059,7.5,1.294118,0.705882,17.176471,4.4,117.7,0.705882,40.470588,379.1,0.588235,21.4,0.941176,11.9,0.588235,8.2,39.6,1.764706,0.842105,0.58,4.294118,56,0.505,0.411765,17,17.3,1.176471,12.0,3.294118,7.294118,1.411765,38.62,56.25,0.678771,6.9,204.2,0.764706,0.588235,15.764706,4.5,110.5,0.352941,39.117647,335.4,1.352941,20.4,0.764706,9.3,5.5,45.1,2.588235,0.947368,0.7,4.176471,50.182,0.489,0.529412,0.823529,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824
4,Offense--car-2023,16.3,0.941176,10.941176,3.470588,6.117647,1.352941,37.24,56.52,0.616402,102.588235,5.4,0.647059,0.470588,13.823529,3.9,92.3,0.117647,38.352941,297.4,0.705882,31.3,1.352941,6.6,0.294118,40.4,44.7,2.705882,0.85,0.63,4.117647,55,0.506,0.294118,17,27.5,1.941176,11.764706,4.588235,6.058824,1.117647,36.51,71.43,0.656463,6.5,179.2,0.764706,0.294118,17.764706,4.3,129.4,0.941176,36.058824,321.1,0.588235,24.7,1.058824,9.7,20.6,43.2,2.588235,1.0,0.79,3.470588,43.8,0.487167,0.117647,2.294118,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353


In [113]:
fun['OFF_Total Offensive Plays'] = (fun['OFF_Total Offensive Plays'] / fun['Games'])

KeyError: 'OFF_Total Offensive Plays'

In [None]:
fun.rename(columns={'OFF_Total Offensive Plays':'Off_Plays_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='OFF_Total Yards')

In [None]:
fun.rename(columns={'OFF_Yards Per Game':'Tot_Yds_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
# ignore 'OFF_Kickoffs: Total' again. Come back later
# ignore 'OFF_Punt: Total' for now and come back later

In [None]:
fun.rename(columns={'OFF_Average Kickoff Return Yards':'Avg_K_Return_Yds'}, inplace=True)

In [None]:
fun.rename(columns={'OFF_Average Punt Return Yards':'Avg_P_Return_Yds'}, inplace=True)

In [None]:
fun.head()

In [None]:
# COME BACK TO OFF_INT: TOTAL

In [None]:
fun.rename(columns={'OFF_Average Interception Yards':'Avg_I_Return_Yds'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Net Average Punt Yards':'Yds_Per_Punt'}, inplace=True)

In [None]:
fun.head()

In [None]:
# REMEMBER OFF_PUNT:TOTAL YARDS
# REMEMBER OFF_FG: Good-Attempts

In [None]:
fun.rename(columns={'OFF_Touchback Percentage':'Touchback_Rate'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Total-Yards':'Total_Penalties-Yds'}, inplace=True)

In [None]:
# REMEMBER TOTAL_PENALTIES-YDS

In [None]:
fun.head()

In [None]:
fun.rename(columns={'OFF_Avg. Per Game (YDS)':'Avg_Pen_Yds_PG'}, inplace=True)

In [None]:
fun.rename(columns={'OFF_Possession Time Seconds':'Avg_TOP'}, inplace=True)

In [None]:
fun.head()

In [None]:
# REMEMBER OFF_FUMBLES-LOST

In [None]:
fun = fun.drop(columns='OFF_Turnover Ratio')
# this column genuinely makes no sense according to ESPN's explanation;
# because it is just a calculation of ints + fumbles on offense and defense,
# we can drop it since it would be redundant

In [None]:
fun.rename(columns={'OFF_Year':'Year'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun.rename(columns={'3rd_Conv_rate':'3rd_Conv_Rate'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Total Points Per Game':'DEF_PPG_Against'}, inplace=True)

In [None]:
fun = fun.drop(columns='DEF_Total Points')

In [None]:
fun.head()

In [None]:
fun['DEF_Total Touchdowns'] = (fun['DEF_Total Touchdowns'] / fun['Games'])

In [None]:
fun.rename(columns={'DEF_Total Touchdowns':'DEF_Tot_Tds_PG_Against'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Total 1st downs'] = (fun['DEF_Total 1st downs'] / fun['Games'])

In [None]:
fun.rename(columns={'DEF_Total 1st downs':'DEF_1st_Downs_PG_Against'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Rushing 1st downs'] = (fun['DEF_Rushing 1st downs'] / fun['Games'])

In [None]:
fun.rename(columns={'DEF_Rushing 1st downs':'DEF_Rush_1st_Downs_PG_Against'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Passing 1st downs'] = (fun['DEF_Passing 1st downs'] / fun['Games'])

In [None]:
fun.rename(columns={'DEF_Passing 1st downs':'DEF_Pass_1st_Downs_PG_Against'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_1st downs by penalty'] = (fun['DEF_1st downs by penalty'] / fun['Games'])

In [None]:
fun.rename(columns={'DEF_1st downs by penalty':'DEF_1st_by_pen_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='DEF_3rd down efficiency')

In [None]:
fun.rename(columns={'DEF_3rd down %':'DEF_3rd_Conv_Rate'}, inplace=True)

In [None]:
fun = fun.drop(columns='DEF_4th down efficiency')

In [None]:
fun.rename(columns={'DEF_4th down %':'DEF_4th_Conv_Rate'}, inplace=True)

In [None]:
fun.head()

In [None]:
# COME BACK TO DEF_COMP-ATT

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns = 'DEF_Net Passing Yards')

In [None]:
fun.rename(columns={'DEF_Yards Per Pass Attempt':'DEF_Pass_Yds_Per_Attempt'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun.rename(columns={'DEF_Net Passing Yards Per Game':'DEF_Pass_Yds_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Passing Touchdowns'] = fun['DEF_Passing Touchdowns'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Passing Touchdowns':'DEF_Pass_Tds_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Interceptions'] = fun['DEF_Interceptions'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Interceptions':'DEF_Int_PG'}, inplace=True)

In [None]:
fun['DEF_Rushing Attempts'] = fun['DEF_Rushing Attempts'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Rushing Attempts':'DEF_Rush_Att_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
def_cols = fun['DEF_Int_PG']
def_cols.head()

In [114]:
fun['DEF_Int_PG'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 32 entries, 0 to 31
Series name: DEF_Int_PG
Non-Null Count  Dtype  
--------------  -----  
32 non-null     float64
dtypes: float64(1)
memory usage: 388.0 bytes


In [115]:
fun = fun.drop(columns='DEF_Rushing Yards')

KeyError: "['DEF_Rushing Yards'] not found in axis"
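
The KeyError above is what pandas raises when the column is no longer in the frame, which is consistent with `fun` already carrying the cleaned column names shown in the earlier output. A minimal sketch of making these steps safe to rerun, assuming a made-up two-row frame and a hypothetical helper called `per_game` (neither is part of the original notebook):

```python
import pandas as pd

# Made-up frame that has already been cleaned: the raw ESPN columns are gone.
demo = pd.DataFrame({'DEF_Rush_Yds_PG': [132.1, 108.4], 'Games': [17, 17]})

# errors='ignore' turns a missing column into a no-op instead of a KeyError.
demo = demo.drop(columns='DEF_Rushing Yards', errors='ignore')


def per_game(df, raw_col, new_col):
    """Hypothetical helper: divide raw_col by Games and store it as new_col,
    but only if raw_col still exists in the frame."""
    if raw_col in df.columns:
        df[new_col] = df[raw_col] / df['Games']
        df = df.drop(columns=raw_col)
    return df


# The raw column is absent here, so the frame comes back unchanged.
demo = per_game(demo, 'DEF_Rushing Touchdowns', 'DEF_Rush_Tds_PG')
print(list(demo.columns))
```

`rename` already ignores missing keys by default, so only the `drop` calls and the direct column lookups need this kind of guard.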

In [None]:
fun.rename(columns={'DEF_Rushing Yards Per Game':'DEF_Rush_Yds_PG'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Yards Per Rush Attempt':'DEF_Yds_Per_Rush'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Rushing Touchdowns'] = fun['DEF_Rushing Touchdowns'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Rushing Touchdowns':'DEF_Rush_Tds_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_Total Offensive Plays'] = fun['DEF_Total Offensive Plays'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Total Offensive Plays':'DEF_Tot_Plays_PG'}, inplace=True)

In [None]:
fun = fun.drop(columns = 'DEF_Total Yards')

In [None]:
fun.rename(columns={'DEF_Yards Per Game':'DEF_YPG_Against'}, inplace=True)

In [None]:
fun.head()

In [None]:
# Avg_K_Return_Yds

In [None]:
fun.rename(columns={'DEF_Average Kickoff Return Yards':'DEF_Avg_K_Return_Yds'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Average Punt Return Yards':'DEF_Avg_P_Return_Yds'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Average Interception Yards':'DEF_Avg_I_Return_Yds'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns = 'DEF_Games')

In [None]:
fun = fun.drop(columns = 'DEF_Turnover Ratio')

In [None]:
fun.head()

In [None]:
fun.rename(columns={'DEF_Net Average Punt Yards':'DEF_Yds_Per_Punt_Against'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Touchback Percentage':'DEF_Touchback_Rate'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Total-Yards':'DEF_Total_Penalties-Yds'}, inplace=True)

In [None]:
fun.head()
# Avg_Pen_Yds_PG

In [None]:
fun.rename(columns={'DEF_Avg. Per Game (YDS)':'DEF_Avg_Pen_Yds_PG'}, inplace=True)

In [None]:
fun.rename(columns={'DEF_Possession Time Seconds':'DEF_Avg_TOP'}, inplace=True)

In [None]:
fun.head()
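
The one-column-per-cell pattern above (divide by `Games`, then rename) could be collapsed into two lookup tables. This is only a sketch on a made-up one-row frame; the dictionary names `PER_GAME` and `RENAMES` are my own and the raw totals are illustrative values:

```python
import pandas as pd

# Illustrative single-team frame using a few of the raw ESPN column names.
demo = pd.DataFrame({
    'DEF_Total Touchdowns': [33],
    'DEF_Total 1st downs': [245],
    'DEF_Total Points Per Game': [25.8],
    'DEF_Yards Per Game': [370.7],
    'Games': [17],
})

# Season totals that need to become per-game values: raw name -> final name.
PER_GAME = {
    'DEF_Total Touchdowns': 'DEF_Tot_Tds_PG_Against',
    'DEF_Total 1st downs': 'DEF_1st_Downs_PG_Against',
}

# Columns that are already per-game and only need a new name.
RENAMES = {
    'DEF_Total Points Per Game': 'DEF_PPG_Against',
    'DEF_Yards Per Game': 'DEF_YPG_Against',
}

for raw_col, new_col in PER_GAME.items():
    demo[new_col] = demo[raw_col] / demo['Games']

demo = demo.drop(columns=list(PER_GAME)).rename(columns=RENAMES)
print(demo.round(3))
```

Keeping the mappings in one place makes it easier to see which columns are still untouched.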

### We've left a number of columns untouched because they have multiple values within them. In some cases, both numbers provide valuable team information (example: sacks and yards lost by sacks). In other cases, calculations including both numbers provide valuable team information (completions and attempts can calculate completion percentage).
Below is my process for dealing with these 'weird' columns.

In [None]:
# So what's left?

# OFFENSE:
# 'OFF_Sacks-Yards Lost'
# 'OFF_Kickoffs: Total'
# 'OFF_Punt: Total'
# 'OFF_INT: Total'
# 'OFF_Punt: Total Yards'
# 'OFF_FG: Good-Attempts'
# 'Total_Penalties-Yds'
# 'OFF_Fumbles-Lost'

# DEFENSE
# 'DEF_Comp-Att'
# 'DEF_Sacks-Yards Lost'
# 'DEF_Kickoffs: Total'
# 'DEF_Punt: Total'
# 'DEF_INT: Total'
# 'DEF_Punt: Total Yards'
# 'DEF_FG: Good-Attempts'
# 'DEF_Total_Penalties-Yds'
# 'DEF_Fumbles-Lost'


# For all of these, we will do what we did to OFF_Comp-Att. The following is a basic template...
#     tempDF = DF['Metric'].str.split('-', expand=True)
#     rename columns in tempDF accordingly
#     change type from string to integer
#     set each column in tempDF to column in funDF
#     remove old column
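
The template above can be sketched as one reusable function. The helper name `split_pair` and the two sample rows are hypothetical, made up only to show the shape of the steps:

```python
import pandas as pd


def split_pair(series, names, sep='-'):
    """Hypothetical helper: split 'A-B' strings into two integer columns."""
    parts = series.str.split(sep, n=1, expand=True)
    parts.columns = list(names)
    return parts.astype(int)


# Made-up values in the same 'count-yards' layout as the ESPN column.
demo = pd.DataFrame({
    'OFF_Sacks-Yards Lost': ['29-190', '41-287'],
    'Games': [17, 17],
})

sacks = split_pair(demo['OFF_Sacks-Yards Lost'], ['Sacks_Taken', 'Sack_Yds_Lost'])

# Copy the pieces back as per-game values, then remove the original column.
demo['Sacks_Taken_PG'] = sacks['Sacks_Taken'] / demo['Games']
demo['Sack_Yds_Lost_PG'] = sacks['Sack_Yds_Lost'] / demo['Games']
demo = demo.drop(columns='OFF_Sacks-Yards Lost')
print(demo)
```

Passing `n=1` splits only on the first separator, so a value with an extra hyphen fails loudly at `astype(int)` instead of silently producing a third column.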

In [None]:
pd.set_option('display.max_columns',None)
# displaying all columns for ease of manipulation. Not necessary

In [None]:
test2 = fun['OFF_Sacks-Yards Lost'].str.split('-', expand=True)
# creating a new dataframe with both sacks and yards lost from sacks

In [None]:
test2 = test2.rename(columns={0:'Sacks_Taken', 1:'Sack_Yds_Lost'})
# kind of unnecessary, but renames columns in new DF

In [None]:
test2.head()
# sanity check to make sure everything is good

In [None]:
test2['Sacks_Taken'] = test2['Sacks_Taken'].astype(int)
test2['Sack_Yds_Lost'] = test2['Sack_Yds_Lost'].astype(int)
# changing string to integer for later manipulation

In [None]:
fun['Sacks_Taken'] = test2['Sacks_Taken']
fun['Sack_Yds_Lost'] = test2['Sack_Yds_Lost']
# new columns in old df = columns from tempDF

In [None]:
fun = fun.drop(columns='OFF_Sacks-Yards Lost')

In [None]:
fun.head()

In [None]:
test3 = fun['OFF_Kickoffs: Total'].str.split('-', expand=True)

In [None]:
test3 = test3.rename(columns={0:'Kickoffs_Returned', 1:'Kickoff_Yds'})
test3.head()

In [None]:
test3['Kickoffs_Returned'] = test3['Kickoffs_Returned'].astype(int)
test3['Kickoff_Yds'] = test3['Kickoff_Yds'].astype(int)

In [None]:
fun['OFF_Kickoffs: Total'] = test3['Kickoffs_Returned']

In [None]:
fun['OFF_Kickoffs: Total'] = fun['OFF_Kickoffs: Total'] / fun['Games']

In [None]:
fun.rename(columns={'OFF_Kickoffs: Total':'Kickoffs_Returned'}, inplace=True)

In [None]:
fun.rename(columns={'Kickoffs_Returned':'Kickoffs_Returned_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test4 = fun['OFF_Punt: Total'].str.split('-', expand=True)

In [None]:
test4 = test4.rename(columns={0:'Punts_Returned', 1:'Punt_Yds'})

In [None]:
test4['Punts_Returned'] = test4['Punts_Returned'].astype(int)

In [116]:
fun['OFF_Punt: Total'] = test4['Punts_Returned']

NameError: name 'test4' is not defined

In [None]:
fun['OFF_Punt: Total'] = fun['OFF_Punt: Total'] / fun['Games']

In [None]:
fun.rename(columns={'OFF_Punt: Total': 'Punts_Returned_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test5 = fun['OFF_INT: Total'].str.split('-', expand=True)

In [None]:
test5 = test5.rename(columns={0:'Int_Forced_PG', 1:'WhoCares'})

In [None]:
test5['Int_Forced_PG'] = test5['Int_Forced_PG'].astype(int)

In [None]:
test5.head()

In [None]:
fun['OFF_INT: Total'] = test5['Int_Forced_PG']

In [None]:
fun.rename(columns={'OFF_INT: Total': "Int_Forced_PG"}, inplace=True)

In [None]:
fun['Int_Forced_PG'] = fun['Int_Forced_PG'] / fun['Games']

In [None]:
fun.head()

In [None]:
test6 = fun['OFF_Punt: Total Yards'].str.split('-', expand=True)

In [None]:
test6 = test6.rename(columns={0:'Punts', 1:'Yds'})

In [None]:
test6['Punts'] = test6['Punts'].astype(int)

In [None]:
fun['OFF_Punt: Total Yards'] = test6['Punts']

In [None]:
fun['OFF_Punt: Total Yards'] = fun['OFF_Punt: Total Yards'] / fun['Games']

In [None]:
fun.rename(columns={'OFF_Punt: Total Yards': 'Punts_PG'}, inplace = True)

In [None]:
fun.head()

In [None]:
test7 = fun['OFF_FG: Good-Attempts'].str.split('-', expand=True)

In [None]:
test7 = test7.rename(columns={0:'FG_Good', 1:'FG_Attempted'})
test7.head()

In [None]:
test7['FG_Good'] = test7['FG_Good'].astype(int)
test7['FG_Attempted'] = test7['FG_Attempted'].astype(int)

In [None]:
test7['FG_Conv_Rate'] = test7['FG_Good'] / test7['FG_Attempted']
test7.head()

In [None]:
fun['OFF_FG: Good-Attempts'] = test7['FG_Conv_Rate']

In [None]:
fun.rename(columns={'OFF_FG: Good-Attempts':'FG_Conv_Rate'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['FG_Att_PG'] = test7['FG_Attempted']

In [117]:
fun['FG_Good'] = test7['FG_Good']

NameError: name 'test7' is not defined

In [None]:
fun['FG_Att_PG'] = fun['FG_Att_PG'] / fun['Games']

In [None]:
fun['FG_Good'] = fun['FG_Good'] / fun['Games']

In [None]:
fun.rename(columns={'FG_Good': 'FG_Good_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test8 = fun['Total_Penalties-Yds'].str.split('-', expand=True)

In [None]:
test8 = test8.rename(columns={0:'Pen', 1:'Pen_Yds'})

In [None]:
test8.head()

In [None]:
test8['Pen'] = test8['Pen'].astype(int)

In [None]:
fun['Total_Penalties-Yds'] = test8['Pen']

In [None]:
fun['Total_Penalties-Yds'] = fun['Total_Penalties-Yds'] / fun['Games']

In [None]:
fun.rename(columns={'Total_Penalties-Yds': 'Penalties_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns = 'OFF_index')

In [None]:
fun = fun.drop(columns = 'DEF_index')

In [None]:
# Still need to deal with:
# OFF_Fumbles-Lost
# DEF_Comp-Att
# DEF_Sacks-Yards Lost
# DEF_Kickoffs: Total
# DEF_Punt: Total
# DEF_INT: Total
# DEF_Punt: Total Yards
# DEF_FG: Good-Attempts
# DEF_Total_Penalties-Yds
# DEF_Fumbles-Lost

In [118]:
fun.head()

Unnamed: 0,SeasonID,PPG,Tot_TDs_PG,1st_Downs_PG,Rush_1st_Downs_PG,Pass_1st_Downs_PG,OFF_1st_by_pen_PG,3rd_Conv_Rate,4th_Conv_Rate,Pass_Comp_Rate,Pass_Yds_PG,Pass_Yds_Per_Attempt,Pass_Tds_PG,Off_Int_PG,Rush_Att_PG,Yds_Per_Rush,Rush_Yds_PG,Rush_Tds_PG,Off_Plays_PG,Tot_Yds_PG,Kickoffs_Returned_PG,Avg_K_Return_Yds,Punts_Returned_PG,Avg_P_Return_Yds,Int_Forced_PG,Avg_I_Return_Yds,Yds_Per_Punt,Punts_PG,FG_Conv_Rate,Touchback_Rate,Penalties_PG,Avg_Pen_Yds_PG,Avg_TOP,Fum_Lost_PG,Games,DEF_PPG_Against,DEF_Tot_Tds_PG_Against,DEF_1st_Downs_PG_Against,DEF_Rush_1st_Downs_PG_Against,DEF_Pass_1st_Downs_PG_Against,DEF_1st_by_pen_PG,DEF_3rd_Conv_Rate,DEF_4th_Conv_Rate,DEF_Pass_Comp_Rate,DEF_Pass_Yds_Per_Attempt,DEF_Pass_Yds_PG,DEF_Pass_Tds_PG,DEF_Int_PG,DEF_Rush_Att_PG,DEF_Yds_Per_Rush,DEF_Rush_Yds_PG,DEF_Rush_Tds_PG,DEF_Tot_Plays_PG,DEF_YPG_Against,DEF_Kickoffs_Returned_PG,DEF_Avg_K_Return_Yds,DEF_Punts_Returned_PG,DEF_Avg_P_Return_Yds,DEF_Avg_I_Return_Yds,DEF_Yds_Per_Punt_Against,DEF_Punts_PG,DEF_FG_Conv_Rate,DEF_Touchback_Rate,DEF_Penalties_PG,DEF_Avg_Pen_Yds_PG,DEF_Avg_TOP,DEF_Fum_Lost_PG,Sacks_Taken_PG,Sack_Yds_Lost_PG,FG_Att_PG,FG_Good_PG,Pass_Att_PG,DEF_Pass_Att_PG,DEF_Sacks_PG,DEF_Sack_Yds_PG,DEF_FG_Att_PG,DEF_FG_Good_PG
0,Offense--ari-2023,17.5,1.176471,11.705882,4.176471,6.294118,1.235294,35.29,27.27,0.624642,111.176471,6.0,0.529412,0.529412,17.058824,4.8,126.2,0.588235,39.294118,316.5,0.588235,18.0,1.235294,10.1,0.470588,10.5,42.1,2.529412,0.904762,0.85,4.352941,62,0.466833,0.352941,17,25.8,1.941176,14.411765,5.411765,7.647059,1.352941,46.76,36.36,0.696165,7.7,217.5,1.0,0.470588,20.235294,4.2,132.1,0.882353,42.058824,370.7,0.176471,21.0,1.764706,9.9,4.6,43.5,2.352941,0.857143,0.8,4.0,52.364,0.526333,0.294118,1.705882,12.0,1.235294,1.117647,20.529412,19.941176,1.882353,13.705882,1.235294,1.058824
1,Offense--atl-2023,18.9,1.058824,11.705882,4.411765,6.470588,0.823529,41.43,46.15,0.630435,120.529412,7.0,0.529412,0.411765,18.294118,4.2,130.4,0.529412,39.0,356.4,0.529412,16.7,0.882353,12.6,0.294118,6.4,42.4,2.588235,0.954545,0.83,3.176471,49,0.504833,0.529412,17,21.7,1.294118,10.705882,3.235294,6.294118,1.176471,34.38,50.0,0.628483,6.6,200.4,1.0,0.294118,16.235294,3.9,108.4,0.235294,36.470588,322.9,0.411765,21.9,1.117647,5.3,16.9,46.1,2.764706,1.0,0.83,3.941176,55.0,0.488333,0.294118,1.764706,12.411765,1.294118,1.235294,18.941176,19.0,1.235294,8.294118,1.235294,1.235294
2,Offense--bal-2023,27.6,2.058824,13.588235,5.764706,6.294118,1.529412,44.6,42.86,0.693548,136.823529,8.0,0.764706,0.294118,20.882353,4.8,155.1,1.235294,40.705882,380.5,0.470588,19.1,1.235294,13.4,0.588235,17.1,41.3,2.588235,0.826087,0.85,4.0,58,0.523333,0.529412,17,16.1,0.882353,11.352941,3.588235,6.235294,1.529412,35.19,36.84,0.599502,5.5,169.7,0.529412,0.588235,15.705882,4.3,103.7,0.235294,41.941176,305.3,0.529412,22.1,1.235294,13.0,10.8,41.1,3.235294,0.96,0.65,4.176471,50.455,0.4845,0.294118,1.588235,9.0,1.352941,1.117647,18.235294,23.647059,2.588235,20.588235,1.470588,1.411765
3,Offense--buf-2023,26.7,2.058824,14.235294,5.352941,7.882353,1.0,48.12,54.55,0.696335,164.647059,7.5,1.294118,0.705882,17.176471,4.4,117.7,0.705882,40.470588,379.1,0.588235,21.4,0.941176,11.9,0.588235,8.2,39.6,1.764706,0.842105,0.58,4.294118,56,0.505,0.411765,17,17.3,1.176471,12.0,3.294118,7.294118,1.411765,38.62,56.25,0.678771,6.9,204.2,0.764706,0.588235,15.764706,4.5,110.5,0.352941,39.117647,335.4,1.352941,20.4,0.764706,9.3,5.5,45.1,2.588235,0.947368,0.7,4.176471,50.182,0.489,0.529412,0.823529,4.470588,1.117647,0.941176,22.470588,21.058824,2.294118,13.411765,1.117647,1.058824
4,Offense--car-2023,16.3,0.941176,10.941176,3.470588,6.117647,1.352941,37.24,56.52,0.616402,102.588235,5.4,0.647059,0.470588,13.823529,3.9,92.3,0.117647,38.352941,297.4,0.705882,31.3,1.352941,6.6,0.294118,40.4,44.7,2.705882,0.85,0.63,4.117647,55,0.506,0.294118,17,27.5,1.941176,11.764706,4.588235,6.058824,1.117647,36.51,71.43,0.656463,6.5,179.2,0.764706,0.294118,17.764706,4.3,129.4,0.941176,36.058824,321.1,0.588235,24.7,1.058824,9.7,20.6,43.2,2.588235,1.0,0.79,3.470588,43.8,0.487167,0.117647,2.294118,18.058824,1.176471,1.0,22.235294,17.294118,1.0,7.352941,0.882353,0.882353


In [119]:
test9 = fun['OFF_Fumbles-Lost'].str.split('-', expand=True)

KeyError: 'OFF_Fumbles-Lost'

In [None]:
test9 = test9.rename(columns={0:'Tot_Fum', 1:'Fum_Lost'})
test9.head()

In [None]:
test9['Fum_Lost'] = test9['Fum_Lost'].astype(int)


In [None]:
fun['OFF_Fumbles-Lost'] = test9['Fum_Lost']

In [None]:
fun['OFF_Fumbles-Lost'] = fun['OFF_Fumbles-Lost'] / fun['Games']

In [None]:
fun.rename(columns={'OFF_Fumbles-Lost': 'Fum_Lost_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test1.head()
# upon viewing defensive statistics, I decided that passing attempts per
# game was valuable information, so I am going back and adding it...

In [None]:
fun['Pass_Att_PG'] = test1['Attempts']

In [None]:
fun['Pass_Att_PG'] = fun['Pass_Att_PG'] / fun['Games']

In [None]:
fun.head()

In [None]:
test10 = fun['DEF_Comp-Att'].str.split('-', expand=True)

In [None]:
test10 = test10.rename(columns={0:'Completed', 1:'Attempts'})

In [None]:
test10.head()

In [None]:
test10['Completed'] = test10['Completed'].astype(int)
test10['Attempts'] = test10['Attempts'].astype(int)

In [None]:
fun['DEF_Comp-Att'] = test10['Completed'] / test10['Attempts']

In [None]:
fun.rename(columns={'DEF_Comp-Att':'DEF_Pass_Comp_Rate'}, inplace=True)

In [None]:
fun['DEF_Pass_Att_PG'] = test10['Attempts']

In [None]:
fun['DEF_Pass_Att_PG'] = fun['DEF_Pass_Att_PG'] / fun['Games']

In [None]:
fun.head()

In [None]:
test12 = fun['DEF_Sacks-Yards Lost'].str.split('-', expand=True)

In [None]:
test12 = test12.rename(columns={0:'Sacks_Taken', 1:'Sack_Yds_Lost'})

In [None]:
test12.head()

In [None]:
test12['Sacks_Taken'] = test12['Sacks_Taken'].astype(int)
test12['Sack_Yds_Lost'] = test12['Sack_Yds_Lost'].astype(int)

In [None]:
fun['DEF_Sacks_PG'] = test12['Sacks_Taken']
fun['DEF_Sack_Yds_PG'] = test12['Sack_Yds_Lost']

In [None]:
fun['DEF_Sacks_PG'] = fun['DEF_Sacks_PG'] / fun['Games']

In [None]:
fun['DEF_Sack_Yds_PG'] = fun['DEF_Sack_Yds_PG'] / fun['Games']

In [None]:
fun = fun.drop(columns='DEF_Sacks-Yards Lost')

In [None]:
# I realized that I forgot to divide the OFF columns by games, so let's
# do that now....

In [None]:
fun.head()

In [None]:
fun['Sacks_Taken'] = fun['Sacks_Taken'] / fun['Games']

In [None]:
fun['Sack_Yds_Lost'] = fun['Sack_Yds_Lost'] / fun['Games']

In [None]:
fun.rename(columns={'Sacks_Taken':'Sacks_Taken_PG'}, inplace=True)

In [None]:
fun.rename(columns={'Sack_Yds_Lost':'Sack_Yds_Lost_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test13 = fun['DEF_Kickoffs: Total'].str.split('-', expand=True)

In [None]:
test13 = test13.rename(columns={0:'Kickoffs_Returned', 1:'Kickoff_Yds'})
test13.head()

In [None]:
test13['Kickoffs_Returned'] = test13['Kickoffs_Returned'].astype(int)
test13['Kickoff_Yds'] = test13['Kickoff_Yds'].astype(int)

In [None]:
fun['DEF_Kickoffs: Total'] = test13['Kickoffs_Returned']

In [None]:
fun['DEF_Kickoffs: Total'] = fun['DEF_Kickoffs: Total'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Kickoffs: Total':'DEF_Kickoffs_Returned_PG'}, inplace=True)

In [None]:
fun.tail()
# wow, interesting that teams are returning significantly more kickoffs now than in 2004

In [None]:
test14 = fun['DEF_Punt: Total'].str.split('-', expand=True)

In [None]:
test14 = test14.rename(columns={0:'Punts_Returned', 1:'Punt_Yds'})

In [None]:
test14['Punts_Returned'] = test14['Punts_Returned'].astype(int)

In [None]:
fun['DEF_Punt: Total'] = test14['Punts_Returned']

In [None]:
fun['DEF_Punt: Total'] = fun['DEF_Punt: Total'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Punt: Total': 'DEF_Punts_Returned_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
# After naming issues, I found out this column is redundant, so I'm just going to drop it
# test15 = fun['DEF_INT: Total'].str.split('-', expand=True)

In [None]:
# test15 = test15.rename(columns={0:'Int_Forced_PG', 1:'Nope'})

In [None]:
# test15.head()

In [None]:
# test15['Int_Forced_PG'] = test15['Int_Forced_PG'].astype(int)

In [None]:
# test15['Int_Forced_PG'].info()

In [None]:
# fun.head()

In [None]:
# test15.head()

In [None]:
# fun['DEF_INT: Total'] = test15['Int_Forced_PG']

In [None]:
# fun['Int_Forced_PG'].head()

In [None]:
# fun.rename(columns={'DEF_INT: Total': "DEF_Int_PG"}, inplace=True)

In [None]:
# fun.head()

In [None]:
# fun['DEF_Int_PG'] = fun['DEF_Int_PG'] / fun['Games']

In [None]:
fun.head()

In [None]:
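fun.drop(columns='DEF_INT: Total')
# preview only: the result is not assigned back to fun here; the actual drop happens a few cells below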

In [None]:
test16 = fun['DEF_Punt: Total Yards'].str.split('-', expand=True)

In [None]:
test16 = test16.rename(columns={0:'APunts', 1:'AYds'})

In [None]:
test16['APunts'] = test16['APunts'].astype(int)

In [None]:
fun['DEF_Punt: Total Yards'] = test16['APunts']

In [120]:
fun['DEF_Punt: Total Yards'] = fun['DEF_Punt: Total Yards'] / fun['Games']

KeyError: 'DEF_Punt: Total Yards'

In [None]:
fun.rename(columns={'DEF_Punt: Total Yards': 'DEF_Punts_PG'}, inplace = True)

In [None]:
fun.head()

In [None]:
fun = fun.drop(columns='DEF_INT: Total')

In [None]:
fun.head()
# if you are reading this, this is around the part of the feature engineering where I lost my mind

In [None]:
test17 = fun['DEF_FG: Good-Attempts'].str.split('-', expand=True)

In [None]:
test17 = test17.rename(columns={0:'FG_Good', 1:'FG_Attempted'})
test17.head()

In [None]:
test17['FG_Good'] = test17['FG_Good'].astype(int)
test17['FG_Attempted'] = test17['FG_Attempted'].astype(int)

In [None]:
test17['FG_Conv_Rate'] = test17['FG_Good'] / test17['FG_Attempted']
test17.head()

In [None]:
fun['DEF_FG: Good-Attempts'] = test17['FG_Conv_Rate']

In [None]:
fun.rename(columns={'DEF_FG: Good-Attempts':'DEF_FG_Conv_Rate'}, inplace=True)

In [None]:
fun.head()

In [None]:
fun['DEF_FG_Att_PG'] = test17['FG_Attempted']

In [None]:
fun['DEF_FG_Good'] = test17['FG_Good']

In [None]:
fun['DEF_FG_Att_PG'] = fun['DEF_FG_Att_PG'] / fun['Games']

In [None]:
fun['DEF_FG_Good'] = fun['DEF_FG_Good'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_FG_Good': 'DEF_FG_Good_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test18 = fun['DEF_Total_Penalties-Yds'].str.split('-', expand=True)

In [None]:
test18 = test18.rename(columns={0:'Pen', 1:'Pen_Yds'})

In [None]:
test18['Pen'] = test18['Pen'].astype(int)

In [None]:
fun['DEF_Total_Penalties-Yds'] = test18['Pen']

In [None]:
fun['DEF_Total_Penalties-Yds'] = fun['DEF_Total_Penalties-Yds'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Total_Penalties-Yds': 'DEF_Penalties_PG'}, inplace=True)

In [None]:
fun.head()

In [None]:
test19 = fun['DEF_Fumbles-Lost'].str.split('-', expand=True)

In [None]:
test19 = test19.rename(columns={0:'Tot_Fum', 1:'Fum_Lost'})
test19.head()

In [None]:
test19['Fum_Lost'] = test19['Fum_Lost'].astype(int)

In [None]:
fun['DEF_Fumbles-Lost'] = test19['Fum_Lost']

In [None]:
fun['DEF_Fumbles-Lost'] = fun['DEF_Fumbles-Lost'] / fun['Games']

In [None]:
fun.rename(columns={'DEF_Fumbles-Lost': 'DEF_Fum_Lost_PG'}, inplace=True)

In [None]:
# omg am i done?
fun.head()

In [None]:
fun['Avg_TOP'] = fun['Avg_TOP'].replace(':','', regex=True)

In [None]:
fun['Avg_TOP'].info()

In [None]:
fun['Avg_TOP'] = fun['Avg_TOP'].astype(int)

In [None]:
fun['Avg_TOP'] = fun['Avg_TOP'] / 6000

In [None]:
fun['Avg_TOP'].head()
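
The cells above strip the colon from the possession clock, read the remaining digits as one integer, and divide by 6000. That is close to "share of a 60-minute game" but not exact, because the seconds are treated as hundredths of a minute. A small sketch comparing it with a true seconds-based conversion, assuming made-up `MM:SS` strings:

```python
import pandas as pd

top = pd.Series(['28:01', '31:24'])  # made-up clock strings in MM:SS form

# Notebook approach: '28:01' -> 2801 -> 2801 / 6000
approx = top.str.replace(':', '', regex=False).astype(int) / 6000

# Alternative: convert to real seconds, then divide by the 3600 seconds of regulation.
parts = top.str.split(':', expand=True).astype(int)
exact = (parts[0] * 60 + parts[1]) / 3600

print(pd.DataFrame({'approx': approx, 'exact': exact}))
```

The two versions differ by less than half a percentage point here, so the choice mostly matters for consistency with the historical notebook. Overtime games run past 60 minutes, so neither version is guaranteed to make a team's share and its opponents' share sum to exactly 1.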

In [None]:
fun.head()

In [None]:
fun['DEF_Avg_TOP'] = fun['DEF_Avg_TOP'].replace(':','', regex=True)

In [None]:
fun['DEF_Avg_TOP'] = fun['DEF_Avg_TOP'].astype(int)

In [None]:
fun['DEF_Avg_TOP'] = fun['DEF_Avg_TOP'] / 6000

In [None]:
fun.head()

In [None]:
fun['Touchback_Rate'] = fun['Touchback_Rate'] / 100

In [None]:
fun['DEF_Touchback_Rate'] = fun['DEF_Touchback_Rate'] / 100

In [None]:
fun.head()

In [None]:
fun.shape

In [None]:
fun.info()

In [None]:
fun.isna().sum()

In [None]:
fun.head()

In [None]:
import pandas as pd
import numpy as np
from itertools import combinations
from itertools import permutations
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
fun.to_csv('In-Season_2023_Week13.csv', index=False)
# this is the CSV for EDA

In [None]:
fun = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/In-Season_2023_Week11.csv')
# THIS IS A DIFFERENT DATASET SO BE WARNED
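
The export above writes a Week 13 file, while this cell reloads a Week 11 file from an absolute path. One way to keep the two in step is to build both file names from a single week variable. This is a sketch only: `WEEK`, `season_csv_name`, and the temporary folder are hypothetical stand-ins, not names from this project:

```python
import tempfile
from pathlib import Path

import pandas as pd

WEEK = 13  # hypothetical: the only value to change on each weekly rerun


def season_csv_name(week, season=2023):
    """Hypothetical helper that builds the in-season CSV file name."""
    return f'In-Season_{season}_Week{week}.csv'


# Made-up one-row frame standing in for the cleaned data.
demo = pd.DataFrame({'SeasonID': ['Offense--ari-2023'], 'PPG': [17.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / season_csv_name(WEEK)
    demo.to_csv(path, index=False)   # write this week's file
    reloaded = pd.read_csv(path)     # read back the same week

print(season_csv_name(WEEK), reloaded.shape)
```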

In [None]:
yearList = ['2023']

In [None]:
# this is for modeling predictions
final_list = []

for year in yearList:
    temp_df = fun.loc[fun['SeasonID'].str.contains(year),:]

    combinations_list = list(permutations(temp_df.iterrows(), 2))

    pairs_data = []
    for pair in combinations_list:
        index1, row1 = pair[0]
        index2, row2 = pair[1]

        # Combine rows and add a distinction between sides
        pair_data = list(row1) + list(row2)
        #pair_data.extend([f'{col}_Team2' for col in y04.columns])

        pairs_data.append(pair_data)

    # Create column names for the new dataframe
    columns = [f'{col}_Team1' for col in temp_df.columns] + [f'{col}_Team2' for col in temp_df.columns]

    # Create the new dataframe
    pairs_df2 = pd.DataFrame(pairs_data, columns=columns)

    final_list.append(pairs_df2)
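
The loop above uses `permutations`, so every ordered pair of teams gets its own row: Team A vs. Team B and Team B vs. Team A both appear. For n teams that is n * (n - 1) rows, so 32 teams give 992. A minimal sketch of the same idea on three of the teams shown earlier, with the frame trimmed to two columns for readability:

```python
from itertools import permutations

import pandas as pd

teams = pd.DataFrame({
    'SeasonID': ['Offense--ari-2023', 'Offense--atl-2023', 'Offense--bal-2023'],
    'PPG': [17.5, 18.9, 27.6],
})

# Each element from iterrows() is (index, row); pair every row with every other row.
rows = [
    list(row1) + list(row2)
    for (_, row1), (_, row2) in permutations(teams.iterrows(), 2)
]

columns = [f'{col}_Team1' for col in teams.columns] + [f'{col}_Team2' for col in teams.columns]
pairs = pd.DataFrame(rows, columns=columns)

print(len(pairs))  # 3 teams * 2 opponents each
print(pairs)
```

Swapping `permutations` for `combinations` would keep each matchup once, which halves the row count but loses the Team1/Team2 ordering.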

In [None]:
final_list

In [None]:
merged = pd.concat(final_list)
merged.head()

In [None]:
merged['season'] = 2023

In [None]:
merged.head()

In [None]:
merged.shape

In [None]:
merged.to_csv('Week13_modeling.csv', index=False)

This is the end of this notebook. There is additional EDA below but further modeling steps can be found in my modeling notebooks.

# EDA on Current-Season Data

In [None]:
EDA = pd.read_csv('/Users/justintunley/Documents/BrainStation/Capstone/In-Season_2023_Week11.csv')

In [None]:
EDA.head()

In [None]:
EDA['PPG'].describe()
# Teams score 22.36 points per game on average,
# which means the points-total over/under should be around 45

In [None]:
EDA['DEF_PPG_Against'].describe()
# Shouldn't these be relatively similar?
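
On the question above: league-wide, every point scored is also a point allowed, so the two totals must match. The simple mean of each team's per-game average can still differ when teams have played different numbers of games (for example around bye weeks), and ESPN's rounding adds a little more noise. A toy sketch with a made-up three-team league (A beat B 30-10 and lost to C 20-24):

```python
import pandas as pd

league = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    'Games': [2, 1, 1],
    'Points_For': [50, 10, 24],
    'Points_Against': [34, 30, 20],
})
league['PPG'] = league['Points_For'] / league['Games']
league['DEF_PPG_Against'] = league['Points_Against'] / league['Games']

# Simple means treat every team equally, regardless of games played.
simple_for = league['PPG'].mean()
simple_against = league['DEF_PPG_Against'].mean()

# Weighting by games played recovers total points / total team-games.
total_games = league['Games'].sum()
weighted_for = (league['PPG'] * league['Games']).sum() / total_games
weighted_against = (league['DEF_PPG_Against'] * league['Games']).sum() / total_games

print(simple_for, simple_against)
print(weighted_for, weighted_against)
```

If every team has played the same number of games, the simple means should already agree apart from rounding.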

In [None]:
EDA['PPG'].plot(kind='hist')
plt.title('Distribution of PPG')
plt.xlabel('Average PPG')
plt.ylabel('Count')
# Roughly normal distribution, no clear skew

In [None]:
EDA.groupby('PPG')['Pass_Tds_PG'].value_counts(normalize=True).unstack().plot(kind='barh', stacked=True)

In [None]:
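# keep numeric columns only; SeasonID is a string and makes .corr() fail in pandas 2.x
corr_mat = EDA.select_dtypes(include='number').corr()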
corr_mat
# Correlation matrix demonstrating how features interact with each other and the target variable
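
A full correlation matrix across this many columns is hard to read, so one option is to pull out a single column of it and sort. The sketch below ranks features by their correlation with `PPG`, using only the five teams and three features visible in the earlier output, so the numbers are illustrative rather than a finding:

```python
import pandas as pd

demo = pd.DataFrame({
    'SeasonID': ['Offense--ari-2023', 'Offense--atl-2023', 'Offense--bal-2023',
                 'Offense--buf-2023', 'Offense--car-2023'],
    'PPG': [17.5, 18.9, 27.6, 26.7, 16.3],
    'Tot_Yds_PG': [316.5, 356.4, 380.5, 379.1, 297.4],
    'Off_Int_PG': [0.529412, 0.411765, 0.294118, 0.705882, 0.470588],
})

# Drop string columns such as SeasonID before computing correlations.
numeric = demo.select_dtypes(include='number')

# Correlation of every numeric feature with PPG, strongest positive first.
ppg_corr = numeric.corr()['PPG'].drop('PPG').sort_values(ascending=False)
print(ppg_corr)
```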

In [None]:
EDA['PPG'].describe()