# Welcome to an introduction to dashboards with Plotly and Dash

-------------------------------------------------------------------------------------------------------------------------------

### Workshop facilitators: Maajid Husain -- created by Laura Gutierrez Funderburk

### About this workshop

In this workshop we will explore some characteristics of the housing market in Canada. 

This workshop assumes that:

1. Data cleaning and exploration were completed before developing the dashboard
2. You have some comfort with `pandas` and visualization
3. You are comfortable navigating the Jupyter environment


### Workshop schedule:

-------------------------------------------------------------------------------------------------------------------------------


#### 1. Part I: Data exploration

In this section, we will first spend time getting familiar with the data. We will use the `pandas` and `plotly` libraries, and we will also explore the `DEX` feature within Noteable to get a quick sense of what the data contains.

In this section, we will also explore the notion of factoring code into functions, and the notion of writing a Python script that we can use to easily recreate our results. 

#### 2. Part II: Dashboard components

In this section, we will take what we built together in part I and explore the main components in a Dash dashboard. 

## Part I: Data exploration


In [1]:
import pandas as pd
import plotly.express as px

In [2]:
# Read data
url = 'https://raw.githubusercontent.com/Vancouver-Datajam/dashboard-workshop-dash/main/data/delinquency_mortgage_population_2021_2020.csv'
data_pop_del_mort_df = pd.read_csv(url, index_col=0)
data_pop_del_mort_df.head(10)


Unnamed: 0,Geography,Time,DelinquencyRate,AverageMortgageAmount,PopulationSize
0,Newfoundland,2012Q3,0.24,188732,526345
1,Prince Edward Island,2012Q3,0.57,140279,144530
2,Nova Scotia,2012Q3,0.53,174688,943635
3,New Brunswick,2012Q3,0.63,133390,758378
4,Québec,2012Q3,0.33,159661,8061101
5,Ontario,2012Q3,0.31,247455,13390632
6,Manitoba,2012Q3,0.25,188298,1249975
7,Saskatchewan,2012Q3,0.37,217945,1083755
8,Alberta,2012Q3,0.6,282371,3874548
9,British Columbia,2012Q3,0.42,305427,4566769


## Exercise: Get familiar with the table

-------------------------------------------------------------------------------------------------------------------------------

Run the cell below.

#### Questions

a) What are relevant variables in the data?

b) What is the extent (range), mean and median of columns `DelinquencyRate`, `AverageMortgageAmount` and `PopulationSize`?

c) What is the time range and frequency of the data?

In [3]:
data_pop_del_mort_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 330 entries, 0 to 329
Data columns (total 5 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Geography              330 non-null    object 
 1   Time                   330 non-null    object 
 2   DelinquencyRate        330 non-null    float64
 3   AverageMortgageAmount  330 non-null    int64  
 4   PopulationSize         330 non-null    int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 15.5+ KB


In [4]:
data_pop_del_mort_df.describe()

Unnamed: 0,DelinquencyRate,AverageMortgageAmount,PopulationSize
count,330.0,330.0,330.0
mean,0.443121,221437.539394,3612792.0
std,0.175417,69710.969622,4240269.0
min,0.13,122572.0,143948.0
25%,0.32,172836.0,763666.8
50%,0.42,196802.0,1214796.0
75%,0.5875,278598.5,4852476.0
max,0.84,417004.0,14734010.0
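
One way to approach questions b) and c) with `pandas` is sketched below. This is a minimal sketch, not the workshop's official answer: `sample_df` is a hypothetical three-row excerpt copied from the 2012Q3 rows displayed earlier, so the numbers describe only that excerpt. In the notebook you would run the same calls on `data_pop_del_mort_df`.

```python
import pandas as pd

# Three rows copied from the table preview above (illustration only);
# in the notebook, use data_pop_del_mort_df instead.
sample_df = pd.DataFrame({
    "Geography": ["Ontario", "Alberta", "British Columbia"],
    "Time": ["2012Q3", "2012Q3", "2012Q3"],
    "DelinquencyRate": [0.31, 0.60, 0.42],
    "AverageMortgageAmount": [247455, 282371, 305427],
    "PopulationSize": [13390632, 3874548, 4566769],
})

numeric_cols = ["DelinquencyRate", "AverageMortgageAmount", "PopulationSize"]

# Question b): min, max, mean and median per column, plus the range (max - min)
summary = sample_df[numeric_cols].agg(["min", "max", "mean", "median"])
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]
print(summary)

# Question c): treat labels such as "2012Q3" as quarterly periods
# (an assumption based on how the Time column is written)
quarters = pd.PeriodIndex(sample_df["Time"], freq="Q")
print("First quarter:", quarters.min())
print("Last quarter:", quarters.max())
print("Distinct quarters:", quarters.nunique())
```

Because the excerpt only covers one quarter, the first and last quarter coincide here; on the full table the same calls show the whole time span.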


## Using Python and Plotly to generate interactive plots

-------------------------------------------------------------------------------------------------------------------------------


In this section we are going to write a few commands to get started with visualizations.

In [5]:
# First attempt
px.scatter(data_pop_del_mort_df, 
        x = "Time",
        y="DelinquencyRate")


The plot above is quite difficult to read. Let's switch to a line plot, colour the values by Geography, and add a title.

### Advantages to using Plotly
- You can isolate one line at a time by double-clicking its entry in the legend
- You can hover over the plot to see individual data points

In [6]:
# Second attempt
px.line(data_pop_del_mort_df, 
        x = "Time",
        y="DelinquencyRate",
       color="Geography",
       title = "Chart: line plot of Time and DelinquencyRate by Geography")


#### Exercise: Let's take a look at the average mortgage amount and population size

Complete the code below to visualize the average mortgage amount. 

Change the code to visualize changes in population size.

In [9]:
# AverageMortgageAmount DelinquencyRate PopulationSize
variable = "PopulationSize"

px.line(data_pop_del_mort_df, 
        x = "Time",
        y=variable,
       color="Geography",
       title = f"Chart: line plot of Time and {variable} by Geography")


Let's take a look at their distribution by using a box plot.

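A box plot shows the median, quartiles, spread, and outliers of the values for each region, which makes it easy to compare distributions across regions.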

In [10]:
px.box(data_pop_del_mort_df, 
       x = 'Geography', 
       y = 'DelinquencyRate',
      color = 'Geography',
      title  = 'Chart: box plot of Delinquency rate by Geography.')
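
To make the parts of a box concrete, the sketch below computes the median, quartiles, and the usual 1.5 × IQR fences with `pandas`. It is an illustration under stated assumptions: `rates_2012q3` is a hypothetical name holding the ten 2012Q3 delinquency rates from the table preview (one per province), pooled into a single box, whereas the chart above draws one box per province across all quarters. Plotly's own quartile computation can differ slightly from the `pandas` default.

```python
import pandas as pd

# DelinquencyRate for the ten 2012Q3 rows shown in the table preview
rates_2012q3 = pd.Series(
    [0.24, 0.57, 0.53, 0.63, 0.33, 0.31, 0.25, 0.37, 0.60, 0.42],
    name="DelinquencyRate",
)

# Quartiles: the edges of the box and the line inside it
q1, median, q3 = rates_2012q3.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points outside these fences are conventionally drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = rates_2012q3[(rates_2012q3 < lower_fence) | (rates_2012q3 > upper_fence)]

print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")
print(f"Number of outliers: {len(outliers)}")
```

In the notebook, `data_pop_del_mort_df.groupby("Geography")["DelinquencyRate"].describe()` gives the same kind of summary for each province.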

#### Exercise: Let's take a look at distribution of average mortgage amount and population size

Complete the code below to visualize the average mortgage amount and population size. 

In [11]:
# AverageMortgageAmount DelinquencyRate PopulationSize
variable = "AverageMortgageAmount"
px.box(data_pop_del_mort_df, 
       x = 'Geography', 
       y = variable,
      color = 'Geography',
      title  = f'Chart: box plot of {variable} by Geography.')

Let's work on a scatter plot to see if there is a relationship between average mortgage amount and delinquency.

In [12]:
px.scatter(data_frame=data_pop_del_mort_df,
          y = "AverageMortgageAmount",
          x = "DelinquencyRate",
          title="Average mortgage amount vs. delinquency rate")

#### Exercise: modify the code above to colour the dots by Geography, add hover name with Time

In [13]:
px.scatter(data_frame=data_pop_del_mort_df,
      y = "AverageMortgageAmount",
      x = "DelinquencyRate",
      title="Average mortgage amount vs. delinquency rate",
      color="Geography", 
      hover_name="Time")

## Using dictionaries to access different kinds of functions

-------------------------------------------------------------------------------------------------------------------------------


We need to do quite a bit of work refactoring our code in preparation for our dashboard.

We will use dictionaries to access different plotting functions.

Recall, a dictionary is a data structure with `keys` and `values`. The syntax of a dictionary is as follows:

    dictionary =  { key1 : value1,
                    key2 : value2,
                    key3 : value3}
                    
Keys are typically strings, and values can be any Python object, such as a string, list, set, tuple, or function.

In [21]:
sample_dictionary = {"list_numbers" : [1, 2, 3, 4, 5],
                     "set_numbers": set([1, 2, 3, 4, 5]),
                     "tuple_numbers": tuple([1, 2, 3, 4, 5]),
                     "function_sum": sum}

To access the values within a dictionary, we use the following notation

    dictionary[key]
    
For example

In [22]:
sample_dictionary['list_numbers']

[1, 2, 3, 4, 5]

In [23]:
sample_dictionary['set_numbers']

{1, 2, 3, 4, 5}

In [24]:
sample_dictionary['tuple_numbers']

(1, 2, 3, 4, 5)

In [25]:
sample_dictionary['function_sum']

<function sum(iterable, /, start=0)>

To use the function `sum`, simply pass a list of numbers you want to add.

In [26]:
sum([1,2,3])

6

We can obtain the same result with our dictionary as follows:

In [27]:
sample_dictionary['function_sum']([1,2,3])

6
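
Because functions are ordinary values, the key can also come from a variable, for example a choice supplied by a user. The sketch below illustrates that pattern with built-in functions only. The names `operations` and `apply_operation` are hypothetical and not part of the workshop code; raising a `ValueError` for an unknown key is one possible design choice.

```python
# Hypothetical lookup table of built-in functions, mirroring sample_dictionary
operations = {"sum": sum, "max": max, "min": min}


def apply_operation(name, numbers):
    """Look up a function by name and call it on a list of numbers."""
    if name not in operations:
        valid = ", ".join(sorted(operations))
        raise ValueError(f"Unknown operation {name!r}. Choose one of: {valid}")
    return operations[name](numbers)


total = apply_operation("sum", [1, 2, 3])
largest = apply_operation("max", [1, 2, 3])
print(total, largest)
```

Checking the key before the lookup gives a clearer error message than the bare `KeyError` that `operations[name]` would raise for a missing key.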

We can use the following dictionary to generate different kinds of plots.

In [14]:
# Dictionary
plot_dict = {'box': px.box,'violin': px.violin, 'scatter': px.scatter, 'line':px.line}

We can then use the dictionary to try different kinds of plots.

In [15]:
plot_dict['scatter'](data_pop_del_mort_df, 
        x = "Time",
        y="DelinquencyRate",
       color="Geography",
       title = "Chart: scatter plot of Time and DelinquencyRate by Geography")

#### Exercise 1: change the key `scatter` for `line` , `box` and `violin` and run the cell

#### Exercise 2: change the `x` variable to be one of `Geography` or `Time`

#### Exercise 3: Change the `y` variable to be one of `PopulationSize`, `DelinquencyRate` or `AverageMortgageAmount`

In [17]:
plot_dict['box'](data_pop_del_mort_df, 
        x = "Time",
        y="DelinquencyRate",
       color="Geography",
       title = "Playing with several kinds of charts")

## Refactoring code into functions

-------------------------------------------------------------------------------------------------------------------------------


In this section we will refactor our code to ease reproducibility and to keep our Dash app cleaner. 

We start by wrapping our function dictionary in a Python function. 

In [18]:
def graph_region(region_df, graph_type: str, dimension1: str, dimension2: str):
    """
    Parameters
    ----------
        region_df: (dataframe object) reshaped data frame object with mortgage, delinquency and population data
        graph_type: (str) one of "box", "violin", "scatter" or "line"
        dimension1: (str) column for the x axis, typically 'Time' or 'Geography'
        dimension2: (str) one of 'AverageMortgageAmount', 'DelinquencyRate' or 'PopulationSize'
        
    Returns:
    --------
        Plotly figure 
    """
    
    plot_dict = {'box': px.box,'violin': px.violin, 'scatter': px.scatter, 'line':px.line}
        
    try:
        # Initialize function
        fig = plot_dict[graph_type](region_df, 
                                    x=dimension1, 
                                    y=dimension2, 
                                    color = "Geography",
                                   hover_name = "Time")
        # Format figure 
        title_string = f'Chart: {graph_type} plot of {dimension1} and {dimension2} by Geography'
        fig.update_layout(title = title_string)
        fig.update_xaxes(tickangle=-45)
        return fig
    
    except KeyError:
        print("Key not found. Make sure that 'graph_type' is in ['box','violin', 'scatter', 'line']")
    except ValueError:
        print("Dimension is not valid. dimension1 is one of 'Time' or 'Geography'")
        print("dimension2 is one of 'AverageMortgageAmount', 'DelinquencyRate', 'PopulationSize'")

In [22]:
graph_region(data_pop_del_mort_df, 'line', "Time", "AverageMortgageAmount")

In [23]:
graph_region(data_pop_del_mort_df, 'box', "Geography", "PopulationSize")

In [34]:
graph_region(data_pop_del_mort_df, 'scatter', "AverageMortgageAmount", "DelinquencyRate")

## Bonus: incorporating animated plots over time

In [24]:
# Optional to have regions

fig = px.scatter(data_frame=data_pop_del_mort_df,
          y = "AverageMortgageAmount",
          x = "DelinquencyRate",
          size= "PopulationSize",
          color= "Geography",
          animation_frame="Time",
           animation_group="Geography",
           title = "Delinquency rate vs average mortgage over time"
          )
fig.update_layout(yaxis_range=[100000,500000])
fig.update_layout(xaxis_range=[0,1])
fig.show()

This concludes part I. In the next section, we will take what we built together and work on the components of our dashboard.