## Practice

Now it’s your turn…

In the cell below, the variable `url` contains the web address of a CSV
file with the results of all NFL games from September 1920 to
February 2017.

Your task is to do the following:

- Use `pd.read_csv` to read this file into a DataFrame named `nfl`  
- Print the shape and column names of `nfl`  
- Save the DataFrame to a file named `nfl.xlsx`  
- Open the spreadsheet using Excel on your computer  


If you finish quickly, do some basic analysis of the data. Try to do
something interesting. If you get stuck, here are some suggestions for
what to try:

- Compute the average total points in each game (note, you will need to
  sum two of the columns to get total points).  
- Repeat the above calculation, but only for playoff games.  
- Compute the average score for your favorite team (you’ll need to
  consider when they were team1 vs team2).  
- Compute the ratio of “upsets” to total games played. An upset is
  defined as a game won by the team with the lower Elo rating.  
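
As a warm-up for the first two suggestions, here is a minimal sketch on a made-up four-game table. The `games` DataFrame and its values are hypothetical; only the column names (`playoff`, `score1`, `score2`) are taken from the NFL file. It shows one way to add two columns row by row and then filter with a boolean mask, not the only way to approach the exercise.

```python
import pandas as pd

# Hypothetical toy data that reuses the column names of the NFL file.
games = pd.DataFrame({
    "playoff": [0, 0, 1, 1],
    "score1": [10, 21, 30, 17],
    "score2": [7, 14, 20, 23],
})

# Row-wise total points per game, then the mean over all games.
games["total"] = games["score1"] + games["score2"]
avg_total = games["total"].mean()

# Same calculation restricted to playoff games via a boolean mask.
avg_playoff_total = games.loc[games["playoff"] == 1, "total"].mean()

print(avg_total, avg_playoff_total)
```

The same two steps should carry over to the real data once it is loaded.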

In [None]:
# Load the CSV from the URL and save it locally as an .xlsx file
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/nfl-elo-game/"
url += "3488b7d0b46c5f6583679bc40fb3a42d729abd39/data/nfl_games.csv"

# Use the first column (date) as the index
df = pd.read_csv(url, index_col=0)
print(f"{df.shape} is the shape of the loaded DF with {df.columns.size} columns")

try:
    df.to_excel("nfl.xlsx")  # needs an Excel writer engine such as openpyxl
    print("File successfully created")
except Exception as err:
    # Report why the write failed instead of silently swallowing every error
    print(f"Writing operation could not be completed: {err}")

(15740, 11) is the shape of the loaded DF with 11 columns
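
The task also asks for the column names, which the cell above only counts. A small sketch of printing them follows. To stay runnable offline it reads a two-row extract (copied from the first rows of the table shown further down) from an in-memory string rather than from `url`; the `csv_text` variable is an assumption made for illustration.

```python
import io
import pandas as pd

# Two-row stand-in for the real file, held in memory for illustration.
csv_text = """date,season,neutral,playoff,team1,team2,elo1,elo2,elo_prob1,score1,score2,result1
1920-09-26,1920,0,0,RII,STP,1503.947,1300.0,0.824651,48,0,1.0
1920-10-03,1920,0,0,AKR,WHE,1503.42,1300.0,0.824212,43,0,1.0
"""

nfl = pd.read_csv(io.StringIO(csv_text))

print(nfl.shape)           # (rows, columns)
print(list(nfl.columns))   # column names as a plain list
```

Here `date` is read as an ordinary column, so the frame has 12 columns. Passing `index_col=0`, as the cell above does, moves `date` into the index and leaves 11.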


In [61]:
df

| date | season | neutral | playoff | team1 | team2 | elo1 | elo2 | elo_prob1 | score1 | score2 | result1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1920-09-26 | 1920 | 0 | 0 | RII | STP | 1503.947000 | 1300.000000 | 0.824651 | 48 | 0 | 1.0 |
| 1920-10-03 | 1920 | 0 | 0 | AKR | WHE | 1503.420000 | 1300.000000 | 0.824212 | 43 | 0 | 1.0 |
| 1920-10-03 | 1920 | 0 | 0 | RCH | ABU | 1503.420000 | 1300.000000 | 0.824212 | 10 | 0 | 1.0 |
| 1920-10-03 | 1920 | 0 | 0 | DAY | COL | 1493.002000 | 1504.908000 | 0.575819 | 14 | 0 | 1.0 |
| 1920-10-03 | 1920 | 0 | 0 | RII | MUN | 1516.108000 | 1478.004000 | 0.644171 | 45 | 0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2017-01-15 | 2016 | 0 | 1 | DAL | GB | 1617.794683 | 1635.451172 | 0.567714 | 31 | 34 | 0.0 |
| 2017-01-15 | 2016 | 0 | 1 | KC | PIT | 1681.926463 | 1647.734179 | 0.638993 | 16 | 18 | 0.0 |
| 2017-01-22 | 2016 | 0 | 1 | ATL | GB | 1664.127266 | 1651.537731 | 0.609840 | 44 | 21 | 1.0 |
| 2017-01-22 | 2016 | 0 | 1 | NE | PIT | 1747.160321 | 1662.437215 | 0.703052 | 36 | 17 | 1.0 |


In [71]:

#Compute average total points
total_pts = df['score1'].sum() + df['score2'].sum()
print(f"A. The average total points of an NFL game is {total_pts/df.shape[0]}")

#Playoff game total points, playoff value should be 1
po_table = df.loc[df['playoff'] == 1]
po_total = po_table['score1'].sum() + po_table['score2'].sum()
print(f"B. The average total points of an NFL playoff game is {po_total/po_table.shape[0]}")

# Average score for a "favorite" team (chosen arbitrarily); it may appear as team1 or team2
# Designate the team here
fav_team = "RII"
# Sum the team's points from both sides: score1 when it is team1, score2 when it is team2
sum_team = df.loc[df['team1'] == fav_team, 'score1'].sum() + df.loc[df['team2'] == fav_team, 'score2'].sum()

# Select every game the team played on either side; the row count is the number of games
team_table = df.loc[(df['team1'] == fav_team) | (df['team2'] == fav_team)]
print(f"C. The average score of team '{fav_team}' is {sum_team/team_table.shape[0]}")

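# Upset ratio: an upset is elo1 > elo2 with result1 == 0 (team2 won), or elo1 < elo2 with result1 == 1 (team1 won)
# Games matching neither condition (for example, equal ratings) are not counted as upsets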
upset_table = df.loc[((df['elo1'] > df['elo2']) & (df['result1'] == 0)) | 
                     ((df['elo1'] < df['elo2']) & (df['result1'] == 1))]
print(f"D. The ratio of upsets to total games is {upset_table.shape[0]/df.shape[0]}")

A. The average total points of an NFL game is 39.95368487928844
B. The average total points of an NFL playoff game is 42.70404411764706
C. The average score of team 'RII' is 13.288461538461538
D. The ratio of upsets to total games is 0.34358322744599745
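
An alternative to summing the team1 and team2 cases separately is to stack both sides of every game into one long `(team, score)` table and group by team. The sketch below uses a made-up three-game table; the `games` data, the team codes, and the `side1`, `side2`, and `long_form` names are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the same team/score column names as the NFL file.
games = pd.DataFrame({
    "team1": ["AAA", "BBB", "AAA"],
    "team2": ["BBB", "CCC", "CCC"],
    "score1": [10, 20, 30],
    "score2": [14, 7, 3],
})

# Give both sides the same column names so they can be stacked.
side1 = games[["team1", "score1"]].rename(columns={"team1": "team", "score1": "score"})
side2 = games[["team2", "score2"]].rename(columns={"team2": "team", "score2": "score"})
long_form = pd.concat([side1, side2], ignore_index=True)

# One row per (team, game), so a plain groupby gives each team's average score.
avg_by_team = long_form.groupby("team")["score"].mean()
print(avg_by_team)
```

This yields the average for every team at once, at the cost of building a table with twice as many rows as the original.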


### Cleanup

If you want to remove the files we just created, run the following cell.

In [72]:
import os
def try_remove(file):
    if os.path.isfile(file):
        os.remove(file)

# Use a separate loop variable so the DataFrame `df` is not overwritten
for name in ["nfl"]:  # Add file base names to clean up here
    for extension in ["csv", "feather", "xlsx"]:
        filename = name + "." + extension
        try_remove(filename)
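
The same cleanup can also be written with `pathlib`. This is a sketch under stated assumptions: `remove_outputs` is a hypothetical helper, and the demonstration runs inside a temporary directory so that no real files are touched.

```python
import tempfile
from pathlib import Path

def remove_outputs(directory, stems, extensions):
    """Delete <stem>.<ext> files in `directory`; return the names removed."""
    removed = []
    for stem in stems:
        for ext in extensions:
            path = Path(directory) / f"{stem}.{ext}"
            if path.is_file():
                path.unlink()
                removed.append(path.name)
    return removed

# Demonstrate in a throwaway directory with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nfl.xlsx").write_text("placeholder")
    removed = remove_outputs(tmp, ["nfl"], ["csv", "feather", "xlsx"])
    print(removed)
```

Returning the list of removed names makes it easy to confirm what was actually deleted.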