<H1 align="center"> 🏒 Project: NHL 🏒 </H1>

Data for this project comes from [this Kaggle dataset](https://www.kaggle.com/datasets/martinellis/nhl-game-data). According to the author:

> The data represents all the official metrics measured for each game in the NHL (from 200 to 2019). I intend to update it semi-regularly depending on development progress of my database server.

We loaded the data into a SQLite database using [DB Browser for SQLite](https://sqlitebrowser.org/).
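The import itself was done by hand in DB Browser. For a scripted alternative, here is a minimal sketch of loading one CSV file into a SQLite table with only the standard library. The helper name `load_csv_into_sqlite`, the two sample rows, and the all-TEXT column types are assumptions made for illustration, not the schema actually used in this project.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(con, table, csv_file):
    """Create `table` from the CSV header and insert every row as TEXT."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    con.commit()


# Tiny in-memory stand-in for a CSV file of the dataset (illustrative rows).
sample_csv = io.StringIO(
    "team_id,shortName,teamName\n"
    "1,New Jersey,Devils\n"
    "20,Calgary,Flames\n"
)
demo_con = sqlite3.connect(":memory:")
load_csv_into_sqlite(demo_con, "team_info", sample_csv)

# Same name concatenation as the lookup query used later in the notebook.
demo_rows = demo_con.execute(
    "SELECT team_id, shortName || ' ' || teamName "
    "FROM team_info ORDER BY CAST(team_id AS INTEGER)"
).fetchall()
print(demo_rows)
```

With a real file, `csv_file` would be an open file handle and `con` a connection to the on-disk database.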

## Data preparation for Dash App

We'll set up the functions, data structures, and data needed by our interactive Dash app.

In [2]:
import sqlite3 as sql
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import base64
import pickle

In [3]:
import plotly.io as pio
pio.templates.default = "simple_white"

# Connecting to Database

In [4]:
con = sql.connect("nhl-data.db")
cur = con.cursor()

# Setting up lookup dictionaries

In [5]:
# team id to team name
query = """
SELECT
	team_id, shortName || ' ' || teamName AS name
FROM
	team_info
ORDER BY
	team_id ASC;
"""
df_teams = pd.read_sql(query, con)

In [6]:
df_teams

Unnamed: 0,team_id,name
0,1,New Jersey Devils
1,2,NY Islanders Islanders
2,3,NY Rangers Rangers
3,4,Philadelphia Flyers
4,5,Pittsburgh Penguins
5,6,Boston Bruins
6,7,Buffalo Sabres
7,8,Montreal Canadiens
8,9,Ottawa Senators
9,10,Toronto Maple Leafs


In [7]:
team_dict = dict(df_teams.values)

In [8]:
team_dict[20]

'Calgary Flames'

In [9]:
# SAVE THIS TO PICKLE FILE
with open('../nhl-dash/assets/team_dict.data', 'wb') as f:
    pickle.dump(team_dict, f)
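
On the Dash side, the dictionary has to be read back from the same file. A minimal sketch of the round trip follows, using a temporary directory in place of `../nhl-dash/assets/` and a two-entry stand-in dictionary (both are assumptions for the example):

```python
import os
import pickle
import tempfile

# Stand-in for the real team_dict built above.
sample_team_dict = {20: "Calgary Flames", 52: "Winnipeg Jets"}

with tempfile.TemporaryDirectory() as assets_dir:
    path = os.path.join(assets_dir, "team_dict.data")

    # Same write pattern as the notebook cell.
    with open(path, "wb") as f:
        pickle.dump(sample_team_dict, f)

    # What the app would do at start-up.
    with open(path, "rb") as f:
        loaded_team_dict = pickle.load(f)

print(loaded_team_dict[20])
```

Pickle files should only be loaded from trusted sources, which holds here because the app reads files this notebook wrote.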

In [10]:
# Setting up game lookup
query = """
SELECT
	game_id, home_team_id, away_team_id, venue,
	SUBSTR(season, 1, 4) AS season,
	SUBSTR(date_time_GMT, 1, 10) AS date
FROM
	game;
"""
df_games = pd.read_sql(query, con)


In [11]:
df_games.head()

Unnamed: 0,game_id,home_team_id,away_team_id,venue,season,date
0,2016020045,16,4,United Center,2016,2016-10-19
1,2017020812,7,24,KeyBank Center,2017,2018-02-07
2,2015020314,52,21,MTS Centre,2015,2015-11-24
3,2015020849,12,52,PNC Arena,2015,2016-02-17
4,2017020586,24,20,Honda Center,2017,2017-12-30


In [12]:
# SAVE THIS TO PICKLE FILE
with open('../nhl-dash/assets/df_games.data', 'wb') as f:
    pickle.dump(df_games, f)

# Plotting Goals Scored and Conceded

In [13]:
query = """
SELECT
	SUBSTR(game_plays.game_id, 1, 4) AS season,
	team_info.team_id, team_info.shortName, team_info.teamName,
	COUNT(game_plays.event) AS goals_conceded,
	game_plays.team_id_against
FROM
	game_plays
INNER JOIN
	team_info
		ON game_plays.team_id_against = team_info.team_id
WHERE
	event = 'Goal'
GROUP BY
	season, game_plays.team_id_against
ORDER BY
	season, goals_conceded DESC;
"""
df_teams_conceded = pd.read_sql(query, con)

In [14]:
df_teams_conceded.to_pickle("../nhl-dash/assets/df_teams_conceded.data")

In [15]:
query = """
SELECT
	SUBSTR(game_plays.game_id, 1, 4) AS season,
	team_info.team_id, team_info.shortName, team_info.teamName,
	COUNT(game_plays.event) AS number_of_goals,
	game_plays.team_id_for	
FROM
	game_plays
INNER JOIN
	team_info
		ON game_plays.team_id_for = team_info.team_id
WHERE
	event = 'Goal'
GROUP BY
	season, game_plays.team_id_for
ORDER BY
	season, number_of_goals DESC;
"""
df_teams_season = pd.read_sql(query, con)

In [16]:
df_teams_season.to_pickle("../nhl-dash/assets/df_teams_season.data")

In [17]:
def team_goals(team, df_pro=df_teams_season, df_con=df_teams_conceded):
    """Create plotly figure with line plots of goals scored and conceded
    for a given `team` across seasons"""

    fig = go.Figure()

    fig.add_scatter(x=df_pro[df_pro['teamName'] == team]['season'],
                    y=df_pro[df_pro['teamName'] == team]['number_of_goals'],
                    name='Scored',
                    line=dict(color="#0f3e66"))

    fig.add_scatter(x=df_con[df_con['teamName'] == team]['season'],
                    y=df_con[df_con['teamName'] == team]['goals_conceded'],
                    name='Conceded',
                    line=dict(color="#b53312"))

    fig.update_layout(title="Goals - " + team,
                      xaxis_title="Season",
                      yaxis_title="Goals")
    fig.update_layout(hovermode="x unified")
    fig.update_xaxes(tickangle=45)

    return fig

In [18]:
team_goals('Flames').show()

# Shot data

In [19]:
IMAGE_FILENAME1 = './images/NHL-rink-white.jpg'
# Read the rink image and base64-encode it for use as a plot background
with open(IMAGE_FILENAME1, 'rb') as f:
    image1 = base64.b64encode(f.read())

In [20]:
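def shot_data(con, season, team_id, return_fig=True):
    """Return goals, shots and missed shots with valid coordinates
    for one `team_id` in one `season` (start year) as a DataFrame.

    `return_fig` is currently unused; it is kept so existing calls still work."""
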
    query = """
    SELECT
        SUBSTR(game_id, 1, 4) AS season,
        game_id, team_id_for,
        event, secondaryType,
        st_x, st_y
    FROM
        game_plays
    WHERE
        event IN ('Goal', 'Shot', 'Missed Shot')
        AND
            (x <> 'NA' AND y <> 'NA')
        AND
            season = '{0}'
        AND
            team_id_for = {1};
    """
    
    df = pd.read_sql(query.format(season, team_id), con)
    
    df[["st_x", "st_y"]] = df[["st_x", "st_y"]].apply(pd.to_numeric)

    return df
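
`shot_data()` builds its SQL with `str.format`. A hedged alternative is to bind the values with `?` placeholders, which avoids quoting issues. The sketch below uses plain `sqlite3` and a three-row in-memory table so that it runs on its own; the function name `shot_rows` and the sample rows are made up for illustration. The same query and parameter tuple can be passed to `pd.read_sql(query, con, params=...)`.

```python
import sqlite3

SHOT_QUERY = """
SELECT
    SUBSTR(game_id, 1, 4) AS season,
    game_id, team_id_for, event, secondaryType, st_x, st_y
FROM
    game_plays
WHERE
    event IN ('Goal', 'Shot', 'Missed Shot')
    AND (x <> 'NA' AND y <> 'NA')
    AND SUBSTR(game_id, 1, 4) = ?
    AND team_id_for = ?;
"""


def shot_rows(con, season, team_id):
    """Return shot events for one team and season using bound parameters."""
    return con.execute(SHOT_QUERY, (str(season), team_id)).fetchall()


# Made-up rows that only mimic the columns used by the query.
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE game_plays (game_id TEXT, team_id_for TEXT, event TEXT, "
    "secondaryType TEXT, x TEXT, y TEXT, st_x TEXT, st_y TEXT)"
)
demo_con.executemany(
    "INSERT INTO game_plays VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("2014020003", "20", "Goal", "Wrist Shot", "80", "5", "80", "5"),
        ("2014020003", "20", "Shot", "Slap Shot", "NA", "NA", "NA", "NA"),
        ("2015020314", "20", "Shot", "Backhand", "60", "-3", "60", "-3"),
    ],
)
demo_shots = shot_rows(demo_con, 2014, 20)
print(demo_shots)
```

Only the first row survives: the second has `NA` coordinates and the third belongs to another season.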

In [21]:
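def plot_heatmap(con, season, team_id, event):
    """Create a density heatmap of `event` locations for one team and season,
    drawn over the rink image. Data is queried from the database via `con`."""
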
    df = shot_data(con, season, team_id, return_fig=False)

    fig = px.density_heatmap(df.query(f"event == '{event}'"),
                             x="st_x",
                             y="st_y",
                             nbinsx=50,
                             nbinsy=50,
                             range_x=[-100, 100],
                             range_y=[-45, 45],
                             color_continuous_scale="Reds",
                             title=team_dict[team_id]+' '+str(season)+' '+event+'s'
                             )

    fig.update_traces(opacity=0.6)

    fig.add_layout_image(dict(source='data:image/jpg;base64,{}'.format(image1.decode()),
                              xref="x",
                              yref="y",
                              x=-100, y=42.5,
                              sizex=200,
                              sizey=85, 
                              sizing="stretch",
                              opacity=1,
                              layer="below"))

    fig.update_layout(template="simple_white")
    
    #legend
    fig.update_layout(showlegend=False)

    #x axis
    fig.update_xaxes(visible=False)

    #y axis    
    fig.update_yaxes(visible=False)
    
    return fig

In [22]:
fig = plot_heatmap(con, 2014, 20, 'Goal')
fig.show()

In [23]:
fig = plot_heatmap(con, 2014, 20, 'Missed Shot')
fig.show()

Next, we test whether it is reasonable to keep the entire shot table in memory for these plots, instead of querying the database on every call:

In [73]:
query = """
SELECT
    SUBSTR(game_id, 1, 4) AS season,
    game_id, team_id_for, team_id_against,
    event, secondaryType,
    st_x, st_y
FROM
    game_plays
WHERE
    event IN ('Goal', 'Shot', 'Missed Shot')
    AND
        (x <> 'NA' AND y <> 'NA');
"""

shots_df = pd.read_sql(query, con)

shots_df[["st_x", "st_y"]] = shots_df[["st_x", "st_y"]].apply(pd.to_numeric)

In [74]:
shots_df.head()

Unnamed: 0,season,game_id,team_id_for,team_id_against,event,secondaryType,st_x,st_y
0,2016,2016020045,16,4,Shot,Wrist Shot,71,-9
1,2016,2016020045,16,4,Goal,Wrap-around,88,-5
2,2016,2016020045,4,16,Shot,Wrist Shot,56,-7
3,2016,2016020045,16,4,Shot,Slap Shot,37,24
4,2016,2016020045,4,16,Shot,Wrist Shot,57,-20


In [26]:
shots_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1062526 entries, 0 to 1062525
Data columns (total 7 columns):
 #   Column         Non-Null Count    Dtype 
---  ------         --------------    ----- 
 0   season         1062526 non-null  object
 1   game_id        1062526 non-null  int64 
 2   team_id_for    1062526 non-null  object
 3   event          1062526 non-null  object
 4   secondaryType  1062526 non-null  object
 5   st_x           1062526 non-null  int64 
 6   st_y           1062526 non-null  int64 
dtypes: int64(3), object(4)
memory usage: 56.7+ MB
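
At roughly a million rows the frame is small enough to pickle, and it can probably be made smaller. One option, sketched here under the assumption that the coordinates always fit in the int8 range (-128 to 127), is to store the repetitive string columns as categoricals and the coordinates as 8-bit integers. The helper name `compact_shots` is hypothetical, and the sample rows are taken from the `shots_df.head()` output.

```python
import pandas as pd


def compact_shots(df):
    """Return a copy with low-cardinality strings as categoricals and
    coordinates downcast to int8 (valid only for values in -128..127)."""
    out = df.copy()
    for col in ("season", "event", "secondaryType"):
        out[col] = out[col].astype("category")
    for col in ("st_x", "st_y"):
        out[col] = out[col].astype("int8")
    return out


sample = pd.DataFrame({
    "season": ["2016", "2016", "2016"],
    "game_id": [2016020045, 2016020045, 2016020045],
    "event": ["Shot", "Goal", "Shot"],
    "secondaryType": ["Wrist Shot", "Wrap-around", "Wrist Shot"],
    "st_x": [71, 88, 56],
    "st_y": [-9, -5, -7],
})
compact = compact_shots(sample)
print(compact.dtypes)

# Equality filters on the categorical columns still work as before.
goals = compact[(compact["season"] == "2016") & (compact["event"] == "Goal")]
print(len(goals))
```

Whether this is worth it depends on the measured size after conversion, which can be checked with `compact.memory_usage(deep=True)` on the full frame.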


In [75]:
shots_df.describe()

Unnamed: 0,game_id,st_x,st_y
count,1062526.0,1062526.0,1062526.0
mean,2014657000.0,58.8858,-0.1743393
std,2880373.0,21.92569,19.02316
min,2010020000.0,-99.0,-42.0
25%,2012021000.0,45.0,-14.0
50%,2015020000.0,62.0,0.0
75%,2017021000.0,76.0,14.0
max,2019041000.0,99.0,42.0


In [76]:
shots_df.to_pickle("../nhl-dash/assets/shots_df.data")

Now let's rewrite the heatmap function so that it filters this in-memory DataFrame instead of querying the database:

In [29]:
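def plot_heatmap_from_df(season, team_id, event, df=shots_df):
    """Create a density heatmap of `event` locations for one team and season,
    drawn over the rink image. Data is filtered from `df` (default: shots_df)."""
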
    df = df.query(f"season == '{season}' and team_id_for == '{team_id}'")

    fig = px.density_heatmap(df.query(f"event == '{event}'"),
                             x="st_x",
                             y="st_y",
                             nbinsx=50,
                             nbinsy=50,
                             range_x=[-100, 100],
                             range_y=[-45, 45],
                             color_continuous_scale="Reds",
                             title=team_dict[team_id]+' '+str(season)+' '+event+'s'
                             )

    fig.update_traces(opacity=0.6)

    fig.add_layout_image(dict(source='data:image/jpg;base64,{}'.format(image1.decode()),
                              xref="x",
                              yref="y",
                              x=-100, y=42.5,
                              sizex=200,
                              sizey=85, 
                              sizing="stretch",
                              opacity=1,
                              layer="below"))

    fig.update_layout(template="simple_white")
    
    #legend
    fig.update_layout(showlegend=False)

    #x axis
    fig.update_xaxes(visible=False)

    #y axis    
    fig.update_yaxes(visible=False)
    
    return fig

In [30]:
fig = plot_heatmap_from_df(2014, 20, 'Goal')
fig.show()
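
Both heatmap functions repeat the same `add_layout_image` arguments. A small helper that builds that dictionary once would keep the rink placement in a single place. The sketch below is one way to do it; the name `rink_layout_image` is hypothetical, the placement values are copied from the functions above, and the data URI uses the registered `image/jpeg` MIME type.

```python
import base64


def rink_layout_image(image_bytes, opacity=1.0):
    """Build the dict passed to `fig.add_layout_image` for the rink background.

    Placement values mirror the ones used in the notebook's plotting functions.
    """
    encoded = base64.b64encode(image_bytes).decode()
    return dict(
        source=f"data:image/jpeg;base64,{encoded}",
        xref="x",
        yref="y",
        x=-100,
        y=42.5,
        sizex=200,
        sizey=85,
        sizing="stretch",
        opacity=opacity,
        layer="below",
    )


# Placeholder bytes stand in for the contents of the rink JPEG.
rink = rink_layout_image(b"not-a-real-image", opacity=0.6)
print(sorted(rink))
```

Inside a plotting function this would be used as `fig.add_layout_image(rink_layout_image(raw_bytes))`, where `raw_bytes` is the image file read in binary mode.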

## Shot data with position

In [31]:
def plot_shot_type(season, team_id, shot_type, game_id=None, df=shots_df):
    """
    Plots shot positions over a background rink image (NHL official size).
    Arguments:
    - season: integer start year of the season (currently available: 2000 to 2019)
    - team_id: ID of the team as given by the table team_info
    - shot_type: secondary event type of shot events.
        * Available: 'Wrist Shot', 'Slap Shot', 'Snap Shot', 'Backhand', 'Tip-In', 'Deflected', 'Wrap-around'.
    - game_id (optional): if None, plots the entire season. Otherwise, plots only shots for the given game_id.
    - df: DataFrame of shot events with the columns of shots_df (default: shots_df).
    """

    df = df.query(f"season == '{season}' and team_id_for == '{team_id}'")
    
    if game_id:
        df = df.query(f"secondaryType == '{shot_type}' and game_id == {game_id}")
        title = team_dict[team_id] + ' ' + str(season) + ' Game ID: '+ str(game_id) + ' ' + shot_type +\
                f's <br><sup>{len(df)} shots</sup>'
    
    else:
        df = df.query(f"secondaryType == '{shot_type}'")
        title = team_dict[team_id] + ' ' + str(season) + ' ' + shot_type +\
                f's <br><sup>{len(df)} shots</sup>'

    marker_size = 12
    marker_width = 1

    fig = px.scatter(
        df,
        x='st_x',
        y='st_y',
        color='event',
        symbol='event',
        range_x=[-100, 100],
        range_y=[-45, 45],
        title=title,
        color_discrete_map={  # replaces default color mapping by value
            "Goal": "DarkRed",
            "Shot": "LawnGreen"
        },
        symbol_map={  # replaces default symbol mapping by value
            "Shot": "x",
            "Goal": "circle"
        })

    fig.update_traces(marker=dict(size=marker_size,
                                  line=dict(width=marker_width,
                                            color='DarkSlateGrey')),
                      selector=dict(mode='markers'),
                      opacity=0.6)

    fig.add_layout_image(
        dict(source='data:image/jpg;base64,{}'.format(image1.decode()),
             xref="x",
             yref="y",
             x=-100,
             y=42.5,
             sizex=200,
             sizey=85,
             sizing="stretch",
             opacity=0.6,
             layer="below"))

    #x axis
    fig.update_xaxes(visible=False)

    #y axis
    fig.update_yaxes(visible=False)

    # Set templates
    fig.update_layout(template="plotly_white")

    return fig

In [32]:
fig = plot_shot_type(2014, 20, 'Slap Shot')
fig.show()

In [33]:
# Individual game
fig = plot_shot_type(season=2014, team_id=20, shot_type='Wrist Shot', game_id=2014020003)
fig.show()

In [34]:
# df_games.head()

In [35]:
# df_games.info()

# Dashboard structure

* User can pick a team to visualize:
    - Team goals using the `team_goals()` function. This needs the dataframes `df_teams_season` and `df_teams_conceded`.

* User can pick a team, a season and an event to plot the corresponding heatmap on top of the rink figure using the `plot_heatmap_from_df()` function. This needs the dataframe `shots_df`. (**THE FIGURE NEEDS TO BE ADJUSTED TO THE PLOT**).

* User can pick a team, a season and a shot type to call `plot_shot_type()` and see:
    - Distribution of shots colored by shots or goals over the entire season;
    - Or, if given a game id (we can let user choose from date and location), distribution of shots colored by shots and goals of that single game.
    - These need the dataframe `shots_df`.
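
For the single-game view, the user's choice of date and venue has to be mapped back to a `game_id`. Here is a minimal sketch of that lookup against `df_games`, using the columns shown earlier. The function name `find_game_ids` and its signature are assumptions, and the sample frame repeats a few rows of the `df_games.head()` output. It assumes the team id columns hold integers; if they were loaded as strings, compare against `str(team_id)` instead.

```python
import pandas as pd


def find_game_ids(df_games, team_id, date=None, venue=None):
    """Return game_ids where `team_id` played (home or away),
    optionally narrowed by date ('YYYY-MM-DD') and venue."""
    mask = (df_games["home_team_id"] == team_id) | (df_games["away_team_id"] == team_id)
    if date is not None:
        mask &= df_games["date"] == date
    if venue is not None:
        mask &= df_games["venue"] == venue
    return df_games.loc[mask, "game_id"].tolist()


sample_games = pd.DataFrame({
    "game_id": [2016020045, 2017020812, 2017020586],
    "home_team_id": [16, 7, 24],
    "away_team_id": [4, 24, 20],
    "venue": ["United Center", "KeyBank Center", "Honda Center"],
    "season": ["2016", "2017", "2017"],
    "date": ["2016-10-19", "2018-02-07", "2017-12-30"],
})
all_games = find_game_ids(sample_games, team_id=24)
one_game = find_game_ids(sample_games, team_id=24, date="2017-12-30")
print(all_games, one_game)
```

The returned id can then be passed as `game_id` to `plot_shot_type()`.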

In [37]:
for team_id, team_name in team_dict.items():
    print(team_id, team_name)

1 New Jersey Devils
2 NY Islanders Islanders
3 NY Rangers Rangers
4 Philadelphia Flyers
5 Pittsburgh Penguins
6 Boston Bruins
7 Buffalo Sabres
8 Montreal Canadiens
9 Ottawa Senators
10 Toronto Maple Leafs
11 Atlanta Thrashers
12 Carolina Hurricanes
13 Florida Panthers
14 Tampa Bay Lightning
15 Washington Capitals
16 Chicago Blackhawks
17 Detroit Red Wings
18 Nashville Predators
19 St Louis Blues
20 Calgary Flames
21 Colorado Avalanche
22 Edmonton Oilers
23 Vancouver Canucks
24 Anaheim Ducks
25 Dallas Stars
26 Los Angeles Kings
27 Phoenix Coyotes
28 San Jose Sharks
29 Columbus Blue Jackets
30 Minnesota Wild
52 Winnipeg Jets
53 Arizona Coyotes
54 Vegas Golden Knights
