# Port Performance Dashboard App

This notebook develops the visualizations and Dash app for the [Port Performance Project](https://github.com/epistemetrica/Port-Performance-Project).

Basic draft of app:
- National-level visualizations are displayed by default
    - users can select the date range via sliders
- Users may select a given port
    - drop down selection to start
    - hopefully clicking on a port from the national map could serve the same UI function
- For a selected port:
    - default visualizations are presented
    - users have the option to generate custom visualizations based on their selection of date range, metric (hours at berth, vessel size, etc.), statistic (mean, median, etc.), and other options to be developed.
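
The click-to-select idea in the draft above could be sketched as a small helper that reads the port name out of a map click event. This is only a sketch under assumptions: `ports_from_click` is a hypothetical name, and it assumes the national map is built with `hover_name='port_name'`, in which case Plotly is expected to report the clicked point's name under the `hovertext` key of the graph's `clickData`.

```python
def ports_from_click(click_data, current_ports):
    """Return the port selection after a click on the national map.

    click_data: the dict Dash passes for a Graph's clickData property (or None).
    current_ports: the list of ports currently selected in the dropdown.
    """
    # no click yet, or an event without points: keep the current selection
    if not click_data or not click_data.get('points'):
        return current_ports
    # assumption: hover_name='port_name' puts the port name under 'hovertext'
    clicked = click_data['points'][0].get('hovertext')
    if clicked is None:
        return current_ports
    # put the clicked port first and keep the other selected ports after it
    others = [p for p in (current_ports or []) if p != clicked]
    return [clicked] + others


# simulated click event, shaped like a clickData payload
fake_click = {'points': [{'hovertext': 'Seattle, WA'}]}
print(ports_from_click(fake_click, ['Wilmington, NC']))
```

In the app, a function like this could be registered as a callback that takes the national map's `clickData` as its input and writes to the `value` of the port dropdown, so a click and a dropdown selection would feed the same downstream charts.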

In [3]:
#prelims
import numpy as np
import pandas as pd
import geopandas as gpd
import polars as pl
import plotly.express as px
import datetime as dt
from dateutil.relativedelta import relativedelta
import dash
from dash import dcc, html
from dash.dependencies import Input, Output

#display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pl.Config(tbl_rows=100);

## Load Data

The data used in the app is a dataframe in which each row corresponds to a port call, including data such as port, dock, and vessel info, time of arrival, hours at berth and at anchor, time in port waters, etc.

NOTE this data is currently processed in the geodata_prep notebook followed by the port_stats notebook; final versions of this project may distill that down to a single data preparation step. 

In [4]:
#create main dataframe
calls_df = (
    #read in data
    pl.read_parquet('calls.parquet')
    #get year month and date from arrival time
    .with_columns(
        pl.col('time_arrival').dt.year().alias('year'),
        pl.col('time_arrival').dt.date().dt.month_start().alias('month'),
        pl.col('time_arrival').dt.date().alias('date')
    )
)
#inspect data
calls_df.describe()

statistic,call_id,port_name,port_lat,port_lon,dock_name,dock_id,facility_type,dock_lat,dock_lon,imo,vessel_size,time_port_entry,time_arrival,time_departure,time_port_exit,hrs_at_berth,hrs_at_anchor,hrs_to_dock,hrs_in_port_after_dock,hrs_in_port_waters,year,month,date
str,str,str,f64,f64,str,str,str,f64,f64,f64,f64,str,str,str,str,f64,f64,f64,f64,f64,f64,str,str
"""count""","""152339""","""152339""",152339.0,152339.0,"""152339""","""152339""","""151966""",152339.0,152339.0,152339.0,152339.0,"""152339""","""152339""","""152339""","""152339""",152339.0,152339.0,152339.0,152339.0,152339.0,152339.0,"""152339""","""152339"""
"""null_count""","""0""","""0""",0.0,0.0,"""0""","""0""","""373""",0.0,0.0,0.0,0.0,"""0""","""0""","""0""","""0""",0.0,0.0,0.0,0.0,0.0,0.0,"""0""","""0"""
"""mean""",,,32.314243,-94.595966,,,,32.314355,-94.594993,10138000.0,207.282114,"""2021-07-25 12:09:45.520457""","""2021-07-25 23:42:49.418494""","""2021-07-28 04:42:50.440307""","""2021-07-28 09:58:48.028312""",47.902378,8.398775,11.543167,5.258262,69.809185,2021.068459,"""2021-07-10 17:16:44.551000""","""2021-07-25 11:11:14.325000"""
"""std""",,,6.905596,20.709946,,,,6.905846,20.710309,26423000.0,58.862874,,,,,62.483046,28.999441,35.440416,41.451654,96.098973,2.030807,,
"""min""","""0_Corpus Christi, TX_2020-04-0…","""Albany Port District, NY""",17.938939,-166.549916,"""ADM Corpus Christi Grain Eleva…","""00XE""","""Anchorage""",17.936081,-166.53444,0.0,101.0,"""2018-01-01 00:35:19""","""2018-01-01 00:35:19""","""2018-01-01 04:23:54""","""2018-01-01 09:15:57""",0.083333,0.0,0.0,0.0,0.133333,2018.0,"""2018-01-01""","""2018-01-01"""
"""25%""",,,28.629389,-118.2095,,,,28.645767,-118.21083,9298636.0,176.0,"""2019-09-23 05:36:19""","""2019-09-23 14:53:58""","""2019-09-25 11:07:36""","""2019-09-25 15:04:34""",16.616667,0.0,2.666667,2.15,25.866667,2019.0,"""2019-09-01""","""2019-09-23"""
"""50%""",,,30.69123,-90.085256,,,,30.706768,-90.112537,9403451.0,190.0,"""2021-09-06 01:03:18""","""2021-09-06 20:50:23""","""2021-09-09 01:06:40""","""2021-09-09 05:20:12""",31.116667,0.0,3.516667,2.783333,43.933333,2021.0,"""2021-09-01""","""2021-09-06"""
"""75%""",,,36.86642,-80.05267,,,,36.875896,-80.053495,9619426.0,230.0,"""2023-05-07 10:13:53""","""2023-05-07 22:10:37""","""2023-05-10 09:57:31""","""2023-05-10 13:19:05""",57.183333,0.0,5.383333,3.533333,80.966667,2023.0,"""2023-05-01""","""2023-05-07"""
"""max""","""9993808_Honolulu, O'ahu, HI_20…","""Wilmington, NC""",61.23778,-66.096678,"""YUSEN TERMINALS BERTHS 212-221""","""1JHK""","""Tie Off""",61.24306,-66.086926,980002500.0,667.0,"""2024-12-31 18:55:48""","""2024-12-31 22:30:59""","""2024-12-31 23:37:00""","""2024-12-31 23:37:00""",1398.416667,244.45,2152.533333,3653.166667,4453.8,2024.0,"""2024-12-01""","""2024-12-31"""


#### Init variables

In [5]:
#get date bounds
earliest_date = calls_df['time_arrival'].min().date()
latest_date = calls_df['time_arrival'].max().date()

## Visualization Functions

For the dashboard, we define a handful of visualization types (scatter map, line plot, bar chart, etc.) and code that supports both default visualizations and custom user-generated visualizations.

In [6]:
#define zoom level function for plotly express scatter_mapbox
def mapbox_zoom_finder(lons, lats, lon_pad=0, lat_pad=0):
    """
    Calculates the optimal zoom level for a Plotly Mapbox plot.
    Args:
        lons (list): List of longitudes.
        lats (list): List of latitudes.
        lon_pad (float, optional): Padding to add to the longitude range. Defaults to 0.
        lat_pad (float, optional): Padding to add to the latitude range. Defaults to 0.
    Returns:
        zoom (float): the calculated zoom level
    """
    # Check if the lengths of lons and lats are equal and not empty
    if len(lons) != len(lats) or len(lons) == 0:
        return 10
    # Calculate the maximum and minimum longitude and latitude
    max_lon, min_lon = max(lons), min(lons)
    max_lat, min_lat = max(lats), min(lats)
    # Calculate the longitude and latitude ranges
    lon_range = max_lon - min_lon
    lat_range = max_lat - min_lat
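    # Calculate the zoom level based on the ranges
    max_range = max(lon_range + lon_pad, lat_range + lat_pad)
    # A single point (or identical points) has zero range, and log2(0) is -inf
    if max_range <= 0:
        return 10
    zoom = 7 - np.log2(max_range)
    return zoom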
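
As a worked example of the zoom formula: each whole zoom level on a web map halves the visible span, which is why the zoom level falls with the base-2 log of the widest coordinate range. The sketch below applies the same `7 - log2(range)` rule to precomputed ranges. `zoom_for_span` is a hypothetical name used only for illustration, and the fallback of 10 for a zero span is an assumption borrowed from the default used for empty input.

```python
import math

def zoom_for_span(lon_range, lat_range, default=10):
    """Apply the 7 - log2(range) rule to precomputed coordinate ranges."""
    span = max(lon_range, lat_range)
    # a single point has zero span, and log2(0) is undefined, so fall back
    if span <= 0:
        return default
    return 7 - math.log2(span)

print(zoom_for_span(8, 4))    # 8-degree span  -> 7 - 3 = 4.0
print(zoom_for_span(64, 30))  # 64-degree span -> 7 - 6 = 1.0
print(zoom_for_span(0, 0))    # degenerate case -> 10
```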

In [7]:
def plot_mapbox(df, cat_group,  lat_col, lon_col, 
                size_col, size_col_alias, color_col, color_col_alias, 
                title, filter_col=None, filter=None, time_col='date', 
                time_range=[earliest_date, latest_date], 
                zoom=None, center=None, width=800, height=600, 
                size_max=30, range_color=None, hover_name=None, hover_data=None, 
                mapbox_style='carto-positron', labels=None, 
                color_continuous_scale=None, color_outlier_z=None):
    """
    Plots a Mapbox scatter plot using Plotly.
    """
    #filter if specified
    if filter_col:
        df = df.filter(pl.col(filter_col).is_in(filter))
    #generate df
    df = (
        df
        #filter by time range
        .filter(pl.col(time_col).is_between(time_range[0], time_range[1]))
        .group_by(cat_group)
        .agg(
            #get the lat and lon columns
            pl.col(lat_col).first().alias(lat_col),
            pl.col(lon_col).first().alias(lon_col),
            #get hover name
            #may be different or same as cat_group
            #get stats
            pl.col(size_col).mean().alias(size_col_alias),
            pl.col(color_col).mean().alias(color_col_alias),
        )
    )

    #Set default color scale if not provided
    if not color_continuous_scale:
        color_continuous_scale = px.colors.sequential.Viridis

    # Set the zoom level automatically if not provided
    if not zoom:
        zoom = mapbox_zoom_finder(df[lon_col], df[lat_col])

    #Set the center of the map to the midpoint of the bounding box if not provided
    #NOTE center is computed here but not yet passed to the figure
    if not center:
        center = {
            'lat': ((df[lat_col].max() + df[lat_col].min()) / 2),
            'lon': ((df[lon_col].max() + df[lon_col].min()) / 2)
        }

    #clip the color scale at z standard deviations if specified (outliers are still plotted)
    if color_outlier_z:
        #get color_col upper and lower limits based on z score
        color_col_mean, color_col_std = df[color_col_alias].mean(), df[color_col_alias].std()
        color_col_upper = color_col_mean + (color_col_std * color_outlier_z)
        color_col_lower = color_col_mean - (color_col_std * color_outlier_z)
        #set range color
        range_color = [color_col_lower, color_col_upper]

    # Create a scatter mapbox figure
    fig = px.scatter_mapbox(
        #data
        df, lat=lat_col, lon=lon_col,
        #categories
        size=size_col_alias, color=color_col_alias,
        #hover info
        hover_name=hover_name, hover_data=hover_data,
        #display settings
        range_color=range_color, size_max=size_max,
        color_continuous_scale=color_continuous_scale, mapbox_style=mapbox_style,
        width=width, height=height,
        #title and labels
        title=title, labels=labels
    )
    # Set the zoom level
    fig.update_layout(mapbox_zoom=zoom)

    #NOTE Add annotation if specified

    return fig
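
The `color_outlier_z` option can be illustrated with plain numbers. The sketch below computes the same mean ± z·std limits that `plot_mapbox` passes as `range_color`. `color_range` is a hypothetical helper name and the input values are made up. Points outside the limits are not removed from the map; they are drawn with the color at the end of the scale.

```python
import statistics

def color_range(values, z):
    """Return [lower, upper] color limits at z standard deviations from the mean."""
    mean = statistics.mean(values)
    # sample standard deviation (ddof=1), which is also the polars default for .std()
    std = statistics.stdev(values)
    return [mean - z * std, mean + z * std]

# made-up mean hours at berth for four ports, one of them an outlier
mean_hours = [20.0, 30.0, 40.0, 150.0]
lower, upper = color_range(mean_hours, z=1)
print(lower, upper)
# the outlier lies above the upper limit, so it would get the top color
print(150.0 > upper)
```

Note that the lower limit can fall below zero even when the metric itself cannot, which only affects where the color scale starts.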

In [8]:
def plot_line(df, cat_group, time_col, y_col, y_col_alias, title,
              filter_col=None, filter=None, cat_limit=None, cat_limit_col=None, 
              time_range=[earliest_date, latest_date], 
              width=800, height=600, hover_name=None, 
              hover_data=None, labels=None, color_continuous_scale=None, 
              highlight=None, highlight_color=None):
    """
    Plots a line chart using Plotly Express.
    """
    #initialize df
    df = (
        df
        #filter by time range
        .filter(pl.col(time_col).is_between(time_range[0], time_range[1]))
    )
    #filter if specified
    if filter_col:
        df = df.filter(pl.col(filter_col).is_in(filter))
    #limit categories if specified
    if cat_limit:
        #get the top n categories
        top_cats = (
            df.group_by(cat_group)
            .agg(pl.col(cat_limit_col).sum())
            .sort(pl.col(cat_limit_col), descending=True)
            .limit(cat_limit)
            .to_series()
        )
        #filter to only top n categories or highlight category
        df = df.filter(pl.col(cat_group)
                       .is_in(top_cats.append(pl.Series([highlight]))))
    #generate df
    df = (
        df
        .group_by(cat_group, time_col)
        .agg(
            #compute y col mean
            pl.col(y_col).mean().alias(y_col_alias)
        )
        .sort(time_col)
    )

    # Set default color scale if not provided
    if not color_continuous_scale:
        color_continuous_scale = px.colors.sequential.Viridis

    # Create a line figure
    fig = px.line(
        df,
        x=time_col,
        y=y_col_alias,
        color=cat_group,
        title=title,
        labels=labels,
        hover_name=hover_name,
        hover_data=hover_data,
    )
    # Set the width and height of the figure
    fig.update_layout(width=width, height=height)

    #highlight given lines if specified
    if highlight:
        fig.update_traces(line_color='lightgray')
        fig.update_traces(patch={'line': {'color': highlight_color}}, 
                          selector=dict(name=highlight))

    return fig
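
The category-limiting step in `plot_line` keeps the top `cat_limit` categories by total and always retains the highlighted one. A plain-Python stand-in for that polars logic is sketched below. `top_categories` is a hypothetical name and the rows are made up.

```python
from collections import defaultdict

def top_categories(rows, n, highlight=None):
    """Return the n categories with the largest totals, plus the highlight if given."""
    totals = defaultdict(float)
    for category, value in rows:
        totals[category] += value
    ranked = sorted(totals, key=totals.get, reverse=True)[:n]
    # always keep the highlighted category, even when it ranks below the cutoff
    if highlight is not None and highlight not in ranked:
        ranked.append(highlight)
    return ranked

# made-up (port, hours at berth) rows
rows = [('A', 50.0), ('B', 30.0), ('A', 20.0), ('C', 10.0), ('D', 5.0)]
print(top_categories(rows, n=2, highlight='D'))  # ['A', 'B', 'D']
```

Appending the highlight only when one is given avoids adding an empty entry to the category list when no highlight is requested.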

In [9]:
def bar_ranking(df, cat_group, stat_col, stat_alias, title, limit=20, 
                filter_col=None, filter=None, 
                time_col='month', time_range=[earliest_date, latest_date],
                labels=None, width=800, height=600, 
                highlight=None):
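    '''
    Plots a horizontal bar chart of the top categories, ranked by the mean of stat_col, using Plotly Express.
    '''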
    #initialize df
    df = (
        df
        #filter by time range
        .filter(pl.col(time_col).is_between(time_range[0], time_range[1]))
    )
    #filter if specified
    if filter_col:
        df = df.filter(pl.col(filter_col).is_in(filter))
    #get top n categories
    top_cats = (
        df
        .group_by(cat_group)
        .agg(pl.col(stat_col).mean().alias(stat_alias))
        .sort(pl.col(stat_alias), descending=True)
        .limit(limit)
        .to_series()
    )
    #create df
    df = (
        df
        .filter(pl.col(cat_group).is_in(top_cats.append(pl.Series([highlight]))))
        .group_by(cat_group)
        .agg(pl.col(stat_col).mean().alias(stat_alias))
        .sort(pl.col(stat_alias))
    )
    # Create a bar figure
    fig = px.bar(
        df,
        x=stat_alias, y=cat_group,
        title=title, labels=labels,
        width=width, height=height,
    )

    #set highlight if specified
    if highlight:
        fig["data"][0]["marker"]["color"] = (
            [fig["data"][0]["marker"]['color'] if c == highlight 
             else "lightgrey" for c in fig["data"][0]["y"]]
        )
    
    return fig


## Static Visualizations

In [10]:
plot_mapbox(df=calls_df, cat_group='port_name', lat_col='port_lat', 
            lon_col='port_lon', size_col='vessel_size', size_col_alias='Mean Vessel Size',
            color_col='hrs_at_berth', color_col_alias='Mean Hours at Berth',
            zoom=2.1, color_outlier_z=1, #size_max=15,
            title='Vessel Size vs. Hours at Berth at Principal Ports', hover_name='port_name')

In [11]:
plot_mapbox(df=calls_df, cat_group='dock_name', filter_col='port_name', 
            filter=['Seattle, WA'],
            lat_col='dock_lat', lon_col='dock_lon', size_col='hrs_at_berth', size_col_alias='Mean Hours at Berth',
            color_col='hrs_at_anchor', color_col_alias='Mean Hours at Anchor',
            size_max=20, color_outlier_z=2,
            title='Hours at Berth vs. Hours at Anchor', hover_name='dock_name')

In [12]:
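#NOTE plot_line aggregates y_col with mean(); call_id is a string ID, so this does not yet produce a call count
plot_line(df=calls_df, cat_group='port_name', time_col='month',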
          y_col='call_id', y_col_alias='Visits per Month',
          title='Call volume at Principal Ports', width=1200, height=600,
          labels={'port_name':'Port', 'month':''},
          cat_limit=10, cat_limit_col='hrs_at_berth', 
          highlight='Seattle, WA', highlight_color='blue')

In [13]:
plot_line(df=calls_df, filter_col='port_name', filter=['Port of Los Angeles, CA'], cat_group='dock_name', time_col='month',
          y_col='vessel_size', y_col_alias='Mean Vessel Length (m)',
          title='Mean Vessel Size', width=1200, height=600,
          labels={'dock_name':'Dock', 'month':''})

In [14]:
bar_ranking(df=calls_df, cat_group='dock_name', filter_col='port_name', filter=['Port of Los Angeles, CA', 'Port of Long Beach, CA'], 
            stat_col='hrs_at_berth',
           stat_alias='Mean Hours at Berth', title='Mean Hours at Berth for Top 10 San Pedro Bay Docks', 
           limit=10, width=800, height=500)

In [15]:
bar_ranking(df=calls_df, cat_group='port_name', stat_col='hrs_in_port_waters',
           stat_alias='Hrs in Port Waters', title='Average Time in Port Waters', 
           labels={'port_name':'Port'}, highlight='Seattle, WA',
           width=800, height=500)

In [16]:
#get top ports
top_ports = (
    calls_df
    .group_by('port_name')
    .agg(
        (pl.col('vessel_size')/pl.col('hrs_at_berth')).mean().alias('Efficiency Score')
    )
    .sort(pl.col('Efficiency Score'), descending=True)
    .limit(20)
    .to_series()
)

#create bar chart
fig = px.bar(
    data_frame=(
        calls_df
        .filter(pl.col('port_name').is_in(top_ports))
        .group_by('port_name')
        .agg(
            (pl.col('vessel_size')/pl.col('hrs_at_berth')).mean().alias('Efficiency Score')
        )
        .sort(pl.col('Efficiency Score'))
    ),
    labels={'port_name':'Port'},
    x='Efficiency Score', y='port_name', title='Efficiency Rankings',
    width=800, height=600,
)

#annotate
fig.add_annotation(
    text="Vessel-Feet per Hour at Berth",
    xref="paper", yref="paper",
    x=0.5, y=-0.15,
    showarrow=False,
    font=dict(size=12),
    align="center"
)

fig["data"][0]["marker"]["color"] = [fig["data"][0]["marker"]['color'] if 
                                     c == 'Wilmington, NC' else 
                                     "lightgrey" for c in fig["data"][0]["y"]]

fig.show()
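
The efficiency score above averages the per-call ratio of vessel size to hours at berth, so very short calls carry a lot of weight (the shortest call in the data is under a tenth of an hour at berth). The arithmetic below uses two made-up calls to contrast that mean of ratios with an alternative, the ratio of totals. Which one better reflects port efficiency is a design choice that this sketch does not settle.

```python
# made-up calls at one port: (vessel size, hours at berth)
calls = [(200.0, 0.5), (200.0, 40.0)]

# mean of per-call ratios, as in the cell above
mean_of_ratios = sum(size / hrs for size, hrs in calls) / len(calls)

# alternative: total vessel size divided by total hours at berth
ratio_of_totals = sum(size for size, _ in calls) / sum(hrs for _, hrs in calls)

print(round(mean_of_ratios, 1))   # 202.5
print(round(ratio_of_totals, 1))  # 9.9
```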


## Dash App

In [17]:
#metric dropdown options
metric_options = [
    {'label': 'Hours at Berth', 'value': 'hrs_at_berth'},
    {'label': 'Hours at Anchor', 'value': 'hrs_at_anchor'},
    {'label': 'Hours in Port Waters', 'value': 'hrs_in_port_waters'},
    {'label': 'Vessel Size', 'value': 'vessel_size'}
]

[option['label'] for option in metric_options if 
                           option['value'] == 'vessel_size'][0]

'Vessel Size'
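
The label lookup tested above scans the options list on every call. A dictionary keyed by value gives the same result with a single lookup. This is a small sketch; `metric_labels` is a hypothetical name.

```python
metric_options = [
    {'label': 'Hours at Berth', 'value': 'hrs_at_berth'},
    {'label': 'Hours at Anchor', 'value': 'hrs_at_anchor'},
    {'label': 'Hours in Port Waters', 'value': 'hrs_in_port_waters'},
    {'label': 'Vessel Size', 'value': 'vessel_size'}
]

# map each column name to its display label
metric_labels = {option['value']: option['label'] for option in metric_options}

print(metric_labels['vessel_size'])  # Vessel Size
```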

In [18]:
#init app
app = dash.Dash(__name__)

#metric dropdown options
metric_options = [
    {'label': 'Hours at Berth', 'value': 'hrs_at_berth'},
    {'label': 'Hours at Anchor', 'value': 'hrs_at_anchor'},
    {'label': 'Hours in Port Waters', 'value': 'hrs_in_port_waters'},
    {'label': 'Vessel Size', 'value': 'vessel_size'}
]

#layout
app.layout = (
    html.Div([
        html.H1('Port Performance Dashboard v0.2'),
        html.H2('https://github.com/epistemetrica/Port-Performance-Project'),
        html.Div([    
            #NOTE prefer date slider but need to dev slider with date strings
            dcc.DatePickerRange(
                id='date-picker-range',
                #allowed dates
                min_date_allowed=earliest_date, max_date_allowed=latest_date,
                #initial start and end dates
                start_date=earliest_date, end_date=latest_date,
                #display setting
                display_format='YYYY-MM-DD'
            )
        ]),
        html.Div([
            #dropdown for port selection
            dcc.Dropdown(
                id='port-dropdown',
                options=[{'label': port, 'value': port} for 
                         port in calls_df['port_name'].unique()],
                value=['Seattle, WA'], #default to view Seattle 
                multi=True,
                placeholder='Select Ports',
                style={'display':'inline-block','width':'50%'}
            ),
            #dropdown for primary metric selection
            dcc.Dropdown(
                id='metric-dropdown-primary',
                options=metric_options,
                value='hrs_at_berth',
                placeholder='Select Primary Metric',
                style={'display':'inline-block','width':'25%'}
            ),
            #dropdown for secondary metric selection
            dcc.Dropdown(
                id='metric-dropdown-secondary',
                options=metric_options,
                value='vessel_size', 
                placeholder='Select Secondary Metric',
                style={'display':'inline-block','width':'25%'}
            ),
        ]),
        html.Div([
            dcc.Graph(id='mapbox-graph_national', 
                      style={'display':'inline-block', 'width':'50%'}),
            dcc.Graph(id='mapbox-graph_port', 
                      style={'display':'inline-block', 'width':'50%'})
        ]),
        html.Div([
            dcc.Graph(id='line-graph', 
                      style={'display':'inline-block', 'width':'50%'}),
            dcc.Graph(id='bar-graph', 
                      style={'display':'inline-block', 'width':'50%'})
        ])
    ])
)
#decorators
@app.callback(
    Output('mapbox-graph_national', 'figure'),
    Output('mapbox-graph_port', 'figure'),
    Output('line-graph', 'figure'),
    Output('bar-graph', 'figure'),
    Input('date-picker-range', 'start_date'),
    Input('date-picker-range', 'end_date'),
    Input('port-dropdown', 'value'),
    Input('metric-dropdown-primary', 'value'),
    Input('metric-dropdown-secondary', 'value')
)
def update_graphs(start_date, end_date, selected_ports, 
                  primary_metric, secondary_metric):
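    #skip the update if the dates or port selection have been cleared
    if not (start_date and end_date and selected_ports):
        raise dash.exceptions.PreventUpdate
    #convert date strings to date objects
    start_date = dt.datetime.strptime(start_date, '%Y-%m-%d').date()
    end_date = dt.datetime.strptime(end_date, '%Y-%m-%d').date()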
    #get metric labels
    primary_alias = [option['label'] for option in metric_options if 
                           option['value'] == primary_metric][0]
    secondary_alias = [option['label'] for option in metric_options if 
                           option['value'] == secondary_metric][0]
    #make national ports map
    mapbox_national = plot_mapbox(
        df=calls_df, cat_group='port_name', lat_col='port_lat', lon_col='port_lon',
        size_col=secondary_metric, 
        size_col_alias=secondary_alias,
        color_col=primary_metric, 
        color_col_alias=primary_alias,
        color_outlier_z=1, width=None, height=None, size_max=20,
        title=f'{primary_alias} vs. {secondary_alias} at Principal Ports', 
        hover_name='port_name',
        time_range=[start_date, end_date]
    )
    #make port map
    mapbox_port = plot_mapbox(
        df=calls_df, cat_group='dock_name', filter_col='port_name', filter=selected_ports,
        lat_col='dock_lat', lon_col='dock_lon', size_col='hrs_at_berth', size_col_alias='Mean Hours at Berth',
        color_col='hrs_at_anchor', color_col_alias='Mean Hours at Anchor',
        size_max=20, color_outlier_z=3, width=None, height=None,
        title='Hours at Berth vs. Hours at Anchor', hover_name='dock_name',
        time_range=[start_date, end_date]
    )
    #line graph of hours at berth (metric dropdowns not yet wired in)
    line_graph = plot_line(
        df=calls_df, cat_group='port_name', time_col='month',
        y_col='hrs_at_berth', y_col_alias='Average Hours at Berth',
        title='Hours at Berth per call at Principal Ports', width=None, height=None,
        labels={'port_name':'Port', 'month':''},
        cat_limit=10, cat_limit_col='hrs_at_berth',
        highlight=selected_ports[0], highlight_color='blue',
        time_range=[start_date, end_date]
    )
    #bar graph of hours at berth (metric dropdowns not yet wired in)
    bar_graph = bar_ranking(
        df=calls_df, cat_group='dock_name', filter_col='port_name', filter=selected_ports,
        stat_col='hrs_at_berth', stat_alias='Mean Hours at Berth', 
        title=f'Mean Hours at Berth for {selected_ports[0]} Docks',
        limit=10, width=None, height=None,
        time_range=[start_date, end_date]
    )
    #return figures
    return mapbox_national, mapbox_port, line_graph, bar_graph


#run - NOTE dev only; delete for production
if __name__ == '__main__':
    app.run(jupyter_mode='external')

Dash app running on http://127.0.0.1:8050/
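
The NOTE in the layout mentions a preference for a date slider. Dash sliders work on numeric values, so one possible approach is to let the slider move over month positions and translate those back to dates. The sketch below builds that mapping with the standard library only. `month_starts` and `slider_marks` are hypothetical names, and labeling one mark per year is an assumption about the desired display.

```python
import datetime as dt

def month_starts(start, end):
    """List the first day of every month from start's month through end's month."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(dt.date(year, month, 1))
        # roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

def slider_marks(months):
    """Label each January so the slider shows one mark per year."""
    return {i: str(m.year) for i, m in enumerate(months) if m.month == 1}

# date bounds taken from the describe() output above
months = month_starts(dt.date(2018, 1, 1), dt.date(2024, 12, 31))
marks = slider_marks(months)
print(len(months))  # 84
print(months[5])    # 2018-06-01
print(marks[12])    # 2019
```

The integer positions `0` through `len(months) - 1` could then serve as the bounds of a `dcc.RangeSlider` with `marks=marks`, and the callback would convert the two selected positions back to dates with `months[i]` before filtering.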


## Run App

In [None]:
%%script echo skip

#run
if __name__ == '__main__':
    app.run()

skip
