# Building Value-driven Dashboards in Python

In this notebook, we will walk through the process of building a dashboard in Python for exploring the City of Melbourne's [Pedestrian Traffic Hourly Counts Dataset](https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-2009-to-Present-counts-/b2ak-trbp).

We will primarily be using the following Python libraries/tools:
* [JupyterLab](https://jupyterlab.readthedocs.io/): an interactive development environment that is well suited to exploratory data analysis.
* [Dash](https://plotly.com/dash/): a framework for building analytical web apps in Python.
* [Pandas](https://pandas.pydata.org): a library for analysing and processing tabular data.

See the README at the base of this repository for instructions on how to set up the environment you'll need for working through this notebook.

_Note:_ You should also be able to complete this workshop using the original Jupyter Notebook (as opposed to JupyterLab). However, I recommend JupyterLab, both because it is what I used while developing the dashboard and because it generally offers a better experience.

## Goals and Motivation

The goals of this notebook are as follows:

* Understand the options available when constructing dashboards, and the contexts in which Python might be a good choice.
* Become familiar with the steps involved in the end-to-end process of producing a Python-based dashboard.
* Become familiar with the components of a tech stack that is well suited to this task.
* Answer some questions about pedestrian traffic patterns in Melbourne's CBD.

_Note:_ There is no single best-practice methodology for developing dashboards in Python. This is my attempt to distil some useful strategies and processes from what I've learnt through experience.

## The Dataset
The dataset we'll be using is the Melbourne City Council's Pedestrian Counting System dataset, which is part of the council's [Open Data Portal](https://data.melbourne.vic.gov.au):

_This dataset contains hourly pedestrian counts since 2009 from pedestrian sensor devices located across the city. The data is updated on a monthly basis and can be used to determine variations in pedestrian activity throughout the day._

The data that we will be using comes from two separate datasets:

1. The [Pedestrian Counting System](https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-2009-to-Present-counts-/b2ak-trbp) dataset, which contains the hourly traffic data.
2. The [Pedestrian Sensor Locations](https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-Sensor-Locations/h57g-5234) dataset, which contains data about the sensors that collect the above data.

In [9]:
from pathlib import Path
import pandas as pd

# change this if needed
data_path = Path("..") / "data"

sensor_csv_path = data_path / "Pedestrian_Counting_System_-_Sensor_Locations.csv"
counts_csv_path = data_path / "Pedestrian_Counting_System___2009_to_Present__counts_per_hour_.csv"

sensors_df = pd.read_csv(sensor_csv_path)
counts_df = pd.read_csv(counts_csv_path)

The sensor dataset contains a range of information about each sensor. We'll only be using the geographical coordinates.
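As a minimal sketch of narrowing the sensor table down to the columns needed for mapping, the snippet below builds a small stand-in frame from two rows of the sensor data (so it runs without the CSV) and keeps only `sensor_id`, `latitude` and `longitude`. The names `sample_sensors_df` and `sensor_coords_df` are illustrative and are not defined elsewhere in this notebook.

```python
import pandas as pd

# Stand-in for sensors_df: two rows copied from the sensor locations data
sample_sensors_df = pd.DataFrame(
    {
        "sensor_id": [59, 23],
        "sensor_description": ["Building 80 RMIT", "Spencer St-Collins St (South)"],
        "latitude": [-37.808256, -37.819093],
        "longitude": [144.963049, 144.954527],
    }
)

# Keep only the sensor identifier and the geographical coordinates
sensor_coords_df = sample_sensors_df[["sensor_id", "latitude", "longitude"]]
print(sensor_coords_df)
```

The same column selection should apply unchanged to the full `sensors_df` once it has been loaded.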

In [12]:
sensors_df.head()

| Unnamed: 0 | sensor_id | sensor_description | sensor_name | installation_date | status | note | direction_1 | direction_2 | latitude | longitude | location |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | Building 80 RMIT | RMIT_T | 2019/02/13 | A | | North | South | -37.808256 | 144.963049 | (-37.80825648, 144.96304859) |
| 1 | 23 | Spencer St-Collins St (South) | Col623_T | 2013/09/02 | A | | East | West | -37.819093 | 144.954527 | (-37.81909256, 144.95452749) |
| 2 | 20 | Chinatown-Lt Bourke St (South) | LtB170_T | 2013/09/06 | A | | East | West | -37.811729 | 144.968247 | (-37.81172913, 144.9682466) |
| 3 | 34 | Flinders St-Spark La | Fli32_T | 2014/06/08 | A | | East | West | -37.81538 | 144.97415 | (-37.81537985, 144.9741505) |
| 4 | 57 | Bourke St Bridge | BouBri_T | 2018/08/13 | A | | West | East | -37.817673 | 144.950256 | (-37.8176735, 144.95025595) |


In [15]:
# Number of sensors we have data on
len(sensors_df)

66

In [13]:
counts_df.head()

| Unnamed: 0 | ID | Date_Time | Year | Month | Mdate | Day | Time | Sensor_ID | Sensor_Name | Hourly_Counts |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2887628 | 11/01/2019 05:00:00 PM | 2019 | November | 1 | Friday | 17 | 34 | Flinders St-Spark La | 300 |
| 1 | 2887629 | 11/01/2019 05:00:00 PM | 2019 | November | 1 | Friday | 17 | 39 | Alfred Place | 604 |
| 2 | 2887630 | 11/01/2019 05:00:00 PM | 2019 | November | 1 | Friday | 17 | 37 | Lygon St (East) | 216 |
| 3 | 2887631 | 11/01/2019 05:00:00 PM | 2019 | November | 1 | Friday | 17 | 40 | Lonsdale St-Spring St (West) | 627 |
| 4 | 2887632 | 11/01/2019 05:00:00 PM | 2019 | November | 1 | Friday | 17 | 36 | Queen St (West) | 774 |
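Because the counts were loaded with a plain `pd.read_csv` call, the `Date_Time` column is stored as text. A minimal sketch of converting it to proper timestamps is shown below. The format string is an assumption inferred from the sample rows above (the `Month`, `Mdate` and `Time` columns suggest month-first dates and a 12-hour clock), so it is worth checking against the full dataset. The frame `sample_counts_df` is a stand-in built from two of those rows.

```python
import pandas as pd

# Stand-in for counts_df: two rows copied from the counts data
sample_counts_df = pd.DataFrame(
    {
        "Date_Time": ["11/01/2019 05:00:00 PM", "11/01/2019 05:00:00 PM"],
        "Sensor_ID": [34, 39],
        "Hourly_Counts": [300, 604],
    }
)

# Assumed layout: month/day/year followed by a 12-hour clock with AM/PM
sample_counts_df["Date_Time"] = pd.to_datetime(
    sample_counts_df["Date_Time"], format="%m/%d/%Y %I:%M:%S %p"
)
print(sample_counts_df.dtypes)
```

Passing an explicit `format` avoids relying on pandas to guess between day-first and month-first dates.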


## Your Mission

You will assume the role of a data specialist who has been tasked with developing a dashboard.

## Phase 0: Discovery

## Phase 1: Exploratory Data Analysis

* Get summary statistics about the data (crash course in pandas)
* Visualise your data, i.e. plot your damn data (crash course in plotly)
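The first step above might look something like the following sketch, which uses the five sample rows from the counts data rather than the full CSV. It assumes the `Sensor_Name` and `Hourly_Counts` column names shown earlier. The variable names are illustrative only.

```python
import pandas as pd

# Stand-in for counts_df: the five sample rows from the counts data
sample_counts_df = pd.DataFrame(
    {
        "Sensor_Name": [
            "Flinders St-Spark La",
            "Alfred Place",
            "Lygon St (East)",
            "Lonsdale St-Spring St (West)",
            "Queen St (West)",
        ],
        "Hourly_Counts": [300, 604, 216, 627, 774],
    }
)

# Count, mean, standard deviation, min, quartiles and max of the hourly counts
summary = sample_counts_df["Hourly_Counts"].describe()
print(summary)

# Total traffic per sensor, busiest first
totals_by_sensor = (
    sample_counts_df.groupby("Sensor_Name")["Hourly_Counts"]
    .sum()
    .sort_values(ascending=False)
)
print(totals_by_sensor)
```

On the full dataset, the same two calls give a quick sense of the overall distribution and of which sensors dominate the traffic.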


## Phase 2: Visualisation Development

### Plot 1: Visualising Traffic Size by Sensor

### Plot 2: Visualising Traffic Size by Month

### Plot 3: Visualising Traffic Geographically
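One possible starting point for this plot is to attach sensor coordinates to the counts, since the counts table has no location columns of its own. The sketch below assumes the join key is `Sensor_ID` in the counts data and `sensor_id` in the sensor data, as the column names shown earlier suggest. The two small frames are stand-ins built from sample rows, and in this sample sensor 36 has no matching location row, which is an artefact of the sample and not a claim about the full data.

```python
import pandas as pd

# Stand-ins built from sample rows of the two datasets
sample_counts_df = pd.DataFrame(
    {"Sensor_ID": [34, 36], "Hourly_Counts": [300, 774]}
)
sample_sensors_df = pd.DataFrame(
    {
        "sensor_id": [34, 57],
        "latitude": [-37.81538, -37.817673],
        "longitude": [144.97415, 144.950256],
    }
)

# Left join keeps every count row, even if no sensor location matches
geo_df = sample_counts_df.merge(
    sample_sensors_df, left_on="Sensor_ID", right_on="sensor_id", how="left"
)

# Rows without coordinates cannot be placed on a map, so count them
missing_coords = int(geo_df["latitude"].isna().sum())
print(geo_df)
print(f"Rows without coordinates: {missing_coords}")
```

Using a left join and checking for missing coordinates makes any mismatch between the two datasets visible before plotting.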

### Plot 4: Visualising Traffic Temporally

## Phase 3: Escaping the Notebook

1. Capturing repeated actions with abstractions:
    * loading and saving data
    * filtering data
    * making custom plots from filtered data
2. Converting code into a package

Why:
1. It gives you tools for performing later analysis faster, and it enables cleaner, more maintainable, and more extensible dashboard code.
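As a rough sketch of what such abstractions could look like, the two functions below wrap loading and filtering of the counts data. The function names, parameters and the choice of filters (`Sensor_ID` and `Year`) are hypothetical and are only meant to illustrate the idea. The sample frame is built from rows shown earlier in the notebook.

```python
from pathlib import Path

import pandas as pd


def load_counts(csv_path: Path) -> pd.DataFrame:
    """Load the hourly counts CSV (hypothetical helper)."""
    return pd.read_csv(csv_path)


def filter_counts(counts_df, sensor_ids=None, year=None):
    """Return the rows matching the given sensors and/or year (hypothetical helper)."""
    # Start by keeping every row, then narrow down for each filter supplied
    mask = pd.Series(True, index=counts_df.index)
    if sensor_ids is not None:
        mask &= counts_df["Sensor_ID"].isin(sensor_ids)
    if year is not None:
        mask &= counts_df["Year"] == year
    return counts_df[mask]


# Usage on a stand-in frame built from sample rows of the counts data
sample_counts_df = pd.DataFrame(
    {
        "Year": [2019, 2019, 2019],
        "Sensor_ID": [34, 39, 37],
        "Hourly_Counts": [300, 604, 216],
    }
)
two_sensors_df = filter_counts(sample_counts_df, sensor_ids=[34, 39])
print(two_sensors_df)
```

Once helpers like these live in a package, both the notebook and the dashboard code can import them instead of repeating the same pandas expressions.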

## Phase 4: Making the Dashboard

## Phase 5: Deploying the Dashboard