# Analyzing NHL Stats: Home Ice Advantage

This notebook tests whether home ice advantage has a statistically significant effect on the outcome of NHL games. We compute the proportion of games won by the home team and run a one-sided, one-proportion z-test against the null hypothesis of no advantage (p = 0.5). If the z-value exceeds the critical value of 1.645 (one-sided test at the 0.05 significance level), we reject the null hypothesis and conclude that playing at home provides a statistically significant advantage. Otherwise, we fail to reject the null, meaning there is insufficient evidence to support the claim. For this dataset the z-value exceeds the threshold, so the null hypothesis is rejected. The home-win proportion quantifies the size of the effect, and the result motivates further investigation into contributing factors.
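
For reference, the test statistic computed in the code below is the standard one-proportion z-statistic. The symbols here are notation chosen for this note: $\hat{p}$ is the observed home-win proportion (`p_hat`), $p_0 = 0.5$ is the value under the null hypothesis (`p_not`), and $n$ is the number of games with a definitive outcome.

$$
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0\,(1 - p_0)}{n}}}
$$

Under the null hypothesis and for large $n$, $z$ is approximately standard normal, which is why a one-sided test at the 0.05 level compares it against 1.645.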

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn as sk



In [4]:
#Read information from csv datasets
games = pd.read_csv('./Data/game.csv')
teams = pd.read_csv('./Data/team_info.csv')

#Merging games and team_info tables to provide easier identifiers for each team
games = games.merge(teams[['team_id', 'shortName', 'teamName', 'abbreviation']], 
                    left_on='home_team_id', 
                    right_on='team_id', 
                    suffixes=('', '_home'))
games = games.drop(columns=['team_id'])
games = games.rename(columns={'shortName': 'home_city', 'teamName':'home_team', 'abbreviation':'home_abbr'})

games = games.merge(teams[['team_id', 'shortName', 'teamName', 'abbreviation']], 
                    left_on='away_team_id', 
                    right_on='team_id', 
                    suffixes=('', '_away'))
games = games.drop(columns=['team_id'])
games = games.rename(columns={'shortName': 'away_city', 'teamName':'away_team', 'abbreviation':'away_abbr'})

#removing unused columns
games = games.drop(columns=['venue_link', 'venue_time_zone_id', 'venue_time_zone_offset', 'venue_time_zone_tz', 'game_id'])

#removing rows that do not have a definitive win / loss (likely shootout)
games = games[~games['outcome'].str.contains('tbc', na=False)]

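#Hypothesis testing: H0: p = 0.5 (no home advantage) vs. H1: p > 0.5
total_games = len(games)

#count games whose outcome string contains 'home win'
total_home_wins = len(games[games['outcome'].str.contains('home win', na=False)])

#observed proportion of home wins
p_hat = total_home_wins / total_games

#proportion of home wins under the null hypothesis
p_not = 0.5

#sample size is the number of games kept after filtering
n = total_games

# Calculate the z-value
z_value = (p_hat - p_not) / np.sqrt((p_not * (1 - p_not)) / n)

#1.645 is the one-sided critical value at the 0.05 significance level
if z_value > 1.645:
    print('Reject Null, home ice advantage is significant')
else:
    print('Fail to Reject null, not enough evidence to support alternative hypothesis')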

Reject Null, home ice advantage is significant
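
The fixed 1.645 cutoff gives only a reject / fail-to-reject answer. A minimal sketch of how the same one-sided test could also report a p-value is shown here. The helper name `one_sided_z_test` and the counts in the usage lines are hypothetical, chosen only for illustration; they are not taken from the NHL data.

```python
import math

def one_sided_z_test(successes, n, p0=0.5):
    """Return (z, p_value) for H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Upper-tail probability of the standard normal, P(Z > z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only (not from the dataset)
z, p_value = one_sided_z_test(successes=550, n=1000)
print(f"z = {z:.3f}, p-value = {p_value:.5f}")
```

With the notebook's own variables, the call would be `one_sided_z_test(total_home_wins, total_games)`. Rejecting when `p_value < 0.05` is equivalent to the `z_value > 1.645` rule, up to rounding of the critical value.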
