## Online Lab: Pandas with Filtering II

In this online lab, we will use the NFL dataset. The data files are in "Data/Football_Data".

We are going to review the following topics:

1. Groupby with Transform
2. Groupby with Filtering
3. Merges

#### Ex1: Read in the games and teams datasets

In [7]:
import pandas as pd

games = pd.read_csv("Data/Football_Data/nflgames.csv")
teams = pd.read_csv("Data/Football_Data/nflteams.csv")
teams.head()

,TeamID,TeamName,TeamCapsAbrv,TeamAbrv
0,1,Baltimore Ravens,RAV,rav
1,2,Denver Broncos,DEN,den
2,3,Oakland Raiders,RAI,rai
3,4,Philadelphia Eagles,PHI,phi
4,5,Dallas Cowboys,DAL,dal


In [8]:
games.head()

,GameID,Week,HomeTeamID,AwayTeamID,HomeScore,AwayScore,DayOfWeek,TimeOfDay,FieldType,Temp,Wind
0,1,1,1,29,16,23,Sun,Day,sportturf,74,8
1,2,2,1,28,26,6,Thu,Night,sportturf,82,6
2,3,3,27,1,21,23,Sun,Day,grass,71,23
3,4,4,1,15,38,10,Sun,Day,sportturf,78,4
4,5,5,7,1,20,13,Sun,Day,fieldturf,0,0


#### Ex2: Compute the average home score across all games in this dataset

In [42]:
original_homescore = games["HomeScore"].mean()
original_homescore

24.973958333333332

#### Ex3: Create a categorical variable called temp_cate, which is "high" if Temp > 80, "middle" if Temp is in [65, 80], and "low" if Temp < 65

In [14]:
def temp_cate(temp):
    if temp > 80:
        return "high"
    if temp < 65:
        return "low"
    return "middle"

games["temp_cate"] = games["Temp"].apply(temp_cate)
games.head()

,GameID,Week,HomeTeamID,AwayTeamID,HomeScore,AwayScore,DayOfWeek,TimeOfDay,FieldType,Temp,Wind,temp_cate
0,1,1,1,29,16,23,Sun,Day,sportturf,74,8,middle
1,2,2,1,28,26,6,Thu,Night,sportturf,82,6,high
2,3,3,27,1,21,23,Sun,Day,grass,71,23,middle
3,4,4,1,15,38,10,Sun,Day,sportturf,78,4,middle
4,5,5,7,1,20,13,Sun,Day,fieldturf,0,0,low
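
For larger datasets, the same three-way labelling can be done without a row-by-row `apply`. The block below is a minimal sketch using `np.select`, run on a small hypothetical `toy` frame (not the lab data) whose temperatures are chosen to hit both boundaries, 65 and 80.

```python
import numpy as np
import pandas as pd

# Hypothetical temperatures, including the boundary values 65 and 80.
toy = pd.DataFrame({"Temp": [0, 64, 65, 74, 80, 81]})

# Conditions are evaluated in order; rows matching none get the default.
conditions = [toy["Temp"] > 80, toy["Temp"] < 65]
labels = ["high", "low"]
toy["temp_cate"] = np.select(conditions, labels, default="middle")

print(toy)
```

Both 65 and 80 fall into "middle", matching the closed interval [65, 80] in the exercise statement.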


#### Ex4: Standardize each game's home score using the mean and standard deviation of home scores across all games in the same temperature category

In [19]:
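# Within each temperature category, convert home scores to z-scores.
# Series.std() uses the sample standard deviation (ddof=1) by default.
games["standardized_homescore"] = games.groupby(by="temp_cate")["HomeScore"]\
                                       .transform(lambda x: (x - x.mean()) / x.std())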
games.shape

(192, 13)
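
To see what `transform` does here, it can help to run the same standardization on a few rows and check the result by hand. The sketch below uses a hypothetical `toy` frame (not the lab data) and the built-in `"mean"` and `"std"` transforms, which should give the same values as the lambda above.

```python
import pandas as pd

# Hypothetical scores in two temperature categories.
toy = pd.DataFrame({
    "temp_cate": ["low", "low", "low", "high", "high"],
    "HomeScore": [10, 20, 30, 5, 15],
})

grouped = toy.groupby("temp_cate")["HomeScore"]

# transform returns one value per original row, aligned to toy's index,
# so the group mean and std can be combined with the raw scores directly.
toy["z"] = (toy["HomeScore"] - grouped.transform("mean")) / grouped.transform("std")

# Sanity check: each category should now have mean ~0 and std ~1.
check = toy.groupby("temp_cate")["z"].agg(["mean", "std"])
print(toy)
print(check)
```

Because `transform` preserves the original shape, the result can be assigned straight back as a new column, which is why `games.shape` still reports 192 rows.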

#### Ex5: Filter out the teams whose average standardized home score is below 0, keeping only the teams whose average is above 0

In [40]:
# Keep every game whose home team has a positive mean standardized home score.
games_2 = games.groupby(by="HomeTeamID").filter(lambda x: x["standardized_homescore"].mean() > 0)
games.shape, games_2.shape

((192, 13), (84, 13))
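
Note that `filter` keeps or drops whole groups, so every game of a retained team survives, including games with a negative standardized score. The sketch below illustrates this on a hypothetical `toy` frame (not the lab data) and compares it with an equivalent `transform` plus boolean mask.

```python
import pandas as pd

# Hypothetical standardized scores for three home teams.
toy = pd.DataFrame({
    "HomeTeamID": [1, 1, 2, 2, 3, 3],
    "standardized_homescore": [1.0, 2.0, -1.0, -3.0, 0.5, -0.2],
})

# filter calls the function once per group and keeps all rows of the
# groups for which it returns True.
kept = toy.groupby("HomeTeamID").filter(
    lambda g: g["standardized_homescore"].mean() > 0
)

# The same selection, written as a per-row group mean and a boolean mask.
team_mean = toy.groupby("HomeTeamID")["standardized_homescore"].transform("mean")
kept_mask = toy[team_mean > 0]

print(kept)
```

Team 3 is kept because its mean is positive, even though one of its games has a score of -0.2.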

#### Ex6: Compute the average (non-standardized) home score across all games in this new dataset

In [43]:
original_homescore_2 = games_2["HomeScore"].mean()
original_homescore, original_homescore_2

(24.973958333333332, 30.666666666666668)

#### Ex7: Create a dataset that contains two columns: the average (non-standardized) home score of each remaining (non-filtered) team, and the team's abbreviation

In [51]:
# Average home score per remaining team, then look up each team's abbreviation.
results = games_2.groupby(by="HomeTeamID")["HomeScore"].mean()\
       .rename("avg_homescore")\
       .reset_index()\
       .merge(teams, how="left", left_on="HomeTeamID", right_on="TeamID")\
       [["TeamAbrv", "avg_homescore"]]

results.head()

,TeamAbrv,avg_homescore
0,rav,27.166667
1,den,35.333333
2,phi,36.666667
3,clt,29.571429
4,sdg,25.666667
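
When merging on differently named keys, it can be worth checking that every row found a match and that the lookup table has no duplicate keys. The sketch below shows one way to do that with the `validate` and `indicator` parameters of `merge`. The frames `avg` and `toy_teams` and the scores in them are hypothetical stand-ins, not the lab data.

```python
import pandas as pd

# Hypothetical per-team averages and a small team lookup table.
avg = pd.DataFrame({"HomeTeamID": [1, 2, 4], "avg_homescore": [27.0, 35.0, 36.5]})
toy_teams = pd.DataFrame({
    "TeamID": [1, 2, 3, 4],
    "TeamAbrv": ["rav", "den", "rai", "phi"],
})

merged = avg.merge(
    toy_teams,
    how="left",                # keep every row of avg
    left_on="HomeTeamID",
    right_on="TeamID",
    validate="many_to_one",    # raise if TeamID is duplicated in toy_teams
    indicator=True,            # add a _merge column: both / left_only
)

print(merged[["TeamAbrv", "avg_homescore", "_merge"]])
```

A `left_only` value in `_merge` would flag a `HomeTeamID` with no matching `TeamID`, which a left join would otherwise silently fill with `NaN`.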
