## Project Stage - IV (Dashboard)

## Goals

The final stage aims at developing a simple interactive dashboard based on the analysis you have done so far. For this, we will use Plotly (https://plotly.com/) together with Dash (https://plotly.com/dash/) as our framework.

Getting started with Dash: https://www.youtube.com/watch?v=hSPmj7mK6ng

*PS: The app can also be run from inside Jupyter; see: https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e*

### Tasks:

#### Task 1: (100 pts)
- Member:
    - Dashboard
        - **M1.1** Shows comparisons between a variable and the normalized mortality rate in a scatter plot
            - Incorporate your best model's prediction trend line (linear / non-linear). (20 pts)
            
        - **M1.2** Contains a data table (20 pts)
        - **M1.3** Contains a map displaying the values of either variable (20 pts)
        - **M1.4** Selectors (30 pts)
            - Allows for linear or log mode selection on both variables of the scatter plot.
            - Allows for selection of state.
            - Allows for linear or log mode selection on both the variables. 
            - Allows for selection of which variable to display on the map.
        
        - **M1.5** Selecting points on the graph or rows in the data table highlights the corresponding items in the other. (10 pts)
        
        - A partial example:
        
        <img src="../img/Dashboard1.png" width=800 height=800 />
     
***Extra Credit:*** Creative elements with the provided data and good design can earn up to 50 extra points.

**Deliverable**
- Take screenshots of the report and upload them to Canvas.
- Each member creates a separate notebook for their member tasks. Upload all notebooks to the GitHub repository.
- Upload the final presentation recordings to Canvas.

In [1]:
#pip install jupyter_dash

In [2]:
## pip install dash (version 2.0.0 or higher)
#pip install dash

In [3]:
#pip install dash-bootstrap-components

In [4]:
#pip install gif

In [5]:
import numpy as np
import pandas as pd
import plotly.express as px  # (version 4.7.0 or higher)
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output, dash_table
from jupyter_dash import JupyterDash
import dash_bootstrap_components as dbc
from urllib.request import urlopen
import json
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

import warnings
warnings.filterwarnings("ignore")

<b>Reading the super dataset</b>

In [6]:
data = pd.read_csv("../../../../data/stage_1/superdataset_merge.csv")

<b>Normalizing the required variables per 100,000 population</b>

In [7]:
data["Unemployment value"] = (data["Unemployment raw value"]/data["Population"])*100000

In [8]:
data["Drug Overdose value"] = (data["Drug overdose deaths raw value"]/data["Population"])*100000

In [9]:
data["Insufficient sleep value"] = (data["Insufficient sleep raw value"]/data["Population"])*100000

In [10]:
data["Excessive drinking value"] = (data["Excessive drinking raw value"]/data["Population"])*100000

In [11]:
data["Opiod dispensing value"] = (data["Opiod_Dispensing_Rate"]/data["Population"])*100000
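The five cells above apply the same formula, per-100,000 value = raw value / population × 100,000. A minimal sketch that applies it in one loop, assuming a mapping from each raw column to its output column (the helper `add_per_100k` and the toy numbers are hypothetical, for illustration only):

```python
import pandas as pd

# Raw column -> per-100,000 column, using the notebook's column names.
PER_100K = {
    "Unemployment raw value": "Unemployment value",
    "Drug overdose deaths raw value": "Drug Overdose value",
    "Insufficient sleep raw value": "Insufficient sleep value",
    "Excessive drinking raw value": "Excessive drinking value",
    "Opiod_Dispensing_Rate": "Opiod dispensing value",
}


def add_per_100k(frame: pd.DataFrame, mapping: dict, pop_col: str = "Population") -> pd.DataFrame:
    """Hypothetical helper: add a per-100,000 column for every raw column in `mapping` that `frame` has."""
    for raw_col, new_col in mapping.items():
        if raw_col in frame.columns:
            # Mathematically the same as raw / population * 100000; NaN stays NaN.
            frame[new_col] = frame[raw_col] * 100_000 / frame[pop_col]
    return frame


# Made-up toy data; the notebook applies the formula to the merged super dataset.
toy = pd.DataFrame({
    "Population": [200_000, 50_000],
    "Drug overdose deaths raw value": [30, None],
})
add_per_100k(toy, PER_100K)
print(toy["Drug Overdose value"].tolist())  # [15.0, nan]
```

Keeping the column names in one dictionary means a new variable only needs one extra entry rather than a new cell.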

<b>Filling null values with 0 in the required columns</b>
* Using `isna()`, I found that the drug overdose columns contain null values, so I filled them with zeros so that the missing values do not break the regression later on.

In [12]:
# The drug overdose columns have many NA values; fill them with 0 before performing regression
data["Drug overdose deaths raw value"] = data["Drug overdose deaths raw value"].fillna(0)
data["Drug Overdose value"] = data["Drug Overdose value"].fillna(0)
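Chaining `isna()` with `sum()` gives a per-column count of missing values, which is a quick way to see which columns need filling and to confirm the fill worked. A sketch on a made-up frame (the values are illustrative, not from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the real dataset.
toy = pd.DataFrame({
    "Drug Overdose value": [3.2, np.nan, np.nan],
    "Unemployment value": [0.008, 0.004, 0.006],
})

print(toy.isna().sum())  # missing values per column before filling

toy["Drug Overdose value"] = toy["Drug Overdose value"].fillna(0)
print(toy.isna().sum().sum())  # total missing values left after filling
```

Note that filling with 0 treats a missing count as "no deaths", so the filled rows still pull the fitted trend lines toward zero.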

Creating a list of variables that we want to show in the `variables dropdown`.

In [13]:
par = ["Opiod dispensing value","Unemployment value","Drug Overdose value","Insufficient sleep value","Excessive drinking value"]

Creating a list of state values that we want to show in the `state dropdown`.

In [14]:
s_name = list(data.State.unique())
s_name.sort()
print(s_name)

['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']


Creating a list for dropdown of `select visualization`.

In [15]:
l2 = ["Graph", "Map"]

<b>For Map data</b>
* Not every county has a 5-digit FIPS code: some codes are only 4 digits long, because the leading 0 of states with FIPS codes 01 to 09 is lost when the codes are stored as numbers.
* This breaks the map. The map joins the data to county shapes by FIPS code, so codes that do not match show no data, which leaves entire states blank.
* To fix this, the cells below check whether a FIPS code has only 4 digits and, if so, prepend a 0.

Convert every county code to a string and store the results in a list.

In [16]:
# Some county FIPS codes are only 4 digits, so during map creation they do not match the 5-digit county FIPS codes of the map
p = []
for i in range(len(data)):
    p.append(str(data["County Code"][i]))

Add a new `FIPS` column to the dataset and copy the values of list `p` into it.

In [17]:
data["FIPS"] = p

Check each code's length and prepend a 0 where necessary.

In [18]:
j = []
for i in range(len(data)):
    if(len(data["FIPS"][i]) == 4):
        j.append("0"+ data["FIPS"][i])
    else:
        j.append(data["FIPS"][i])

Replace the `FIPS` column with the padded codes in `j`.

In [19]:
data["FIPS"] = j
data.head()

| | County | County Code | Population | Deaths | Norm_Deaths | State FIPS Code | County FIPS Code | 5-digit FIPS Code | State Abbreviation | Name | ... | Total female population raw value | Population growth raw value | State | FIPS | Opiod_Dispensing_Rate | Unemployment value | Drug Overdose value | Insufficient sleep value | Excessive drinking value | Opiod dispensing value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbeville County, SC | 45001 | 535389 | 38 | 7.0 | 45 | 1 | 45001 | SC | Abbeville County | ... | | | SC | 45001 | 32.4 | 0.00856 | 3.258113 | 0.067397 | 0.029146 | 6.051675 |
| 1 | Acadia Parish, LA | 22001 | 1279727 | 237 | 19.0 | 22 | 1 | 22001 | LA | Acadia Parish | ... | | | LA | 22001 | 32.2 | 0.004617 | 1.081764 | 0.025285 | 0.014885 | 2.516162 |
| 2 | Accomack County, VA | 51001 | 726189 | 73 | 11.0 | 51 | 1 | 51001 | VA | Accomack County | ... | | | VA | 51001 | 19.4 | 0.00664 | 1.538371 | 0.050607 | 0.020851 | 2.671481 |
| 3 | Ada County, ID | 16001 | 8083452 | 988 | 12.0 | 16 | 1 | 16001 | ID | Ada County | ... | | | ID | 16001 | 60.8 | 0.000344 | 0.169568 | 0.003253 | 0.002389 | 0.752154 |
| 4 | Adair County, KY | 21001 | 387950 | 43 | 11.0 | 21 | 1 | 21001 | KY | Adair County | ... | | | KY | 21001 | 71.3 | 0.016322 | 0.0 | 0.09385 | 0.037489 | 18.378657 |
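The three FIPS cells above can be condensed with pandas' vectorized string methods: `astype(str)` converts the codes and `str.zfill(5)` left-pads any shorter code with zeros. A sketch on made-up codes, assuming `County Code` is stored as integers as in the output above (1001 and 6037 stand for counties whose leading 0 was lost):

```python
import pandas as pd

# Made-up county codes; 1001 and 6037 lost their leading 0 when stored as integers.
toy = pd.DataFrame({"County Code": [1001, 45001, 6037]})

# Convert to string and left-pad with zeros to the 5-digit county FIPS width.
toy["FIPS"] = toy["County Code"].astype(str).str.zfill(5)

print(toy["FIPS"].tolist())  # ['01001', '45001', '06037']
```

If the column were read as floats (for example because of missing values), `astype(str)` would produce strings like `"1001.0"`, so the integer dtype matters here.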


Replacing every 0 with 0.1, since log(0) is undefined (NumPy returns `-inf`), which would cause problems later.

In [20]:
# replace zeros with 0.1 so that the log taken below stays finite
data["Opiod_Dispensing_Rate"] = data["Opiod_Dispensing_Rate"].replace(0, 0.1)
data["Unemployment raw value"] = data["Unemployment raw value"].replace(0, 0.1)
data["Drug overdose deaths raw value"] = data["Drug overdose deaths raw value"].replace(0, 0.1)
data["Insufficient sleep raw value"] = data["Insufficient sleep raw value"].replace(0, 0.1)
data["Excessive drinking raw value"] = data["Excessive drinking raw value"].replace(0, 0.1)
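A small check of why the replacement is needed and what it produces: `np.log(0)` evaluates to `-inf` (with a divide-by-zero warning), which cannot be plotted or fitted, while the replacement value 0.1 maps to log(0.1) ≈ -2.302585. That is the value that appears for Adair County's `log_drug_overdose` in the data table output further down. The three-element series is made up for illustration:

```python
import numpy as np
import pandas as pd

raw = pd.Series([0.0, 1.0, 10.0])

# Suppress the divide-by-zero warning to show the raw result.
with np.errstate(divide="ignore"):
    print(np.log(raw).tolist())  # [-inf, 0.0, 2.302585092994046]

# Replace zeros with 0.1 before taking the log, as the cell above does.
safe = raw.replace(0, 0.1)
print(np.log(safe).round(6).tolist())  # [-2.302585, 0.0, 2.302585]
```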

Taking the natural log of the required raw columns (the raw values, not the per-100,000 values) and of the death counts.

In [21]:
data["log_OPR"] = np.log(data["Opiod_Dispensing_Rate"])
data["log_unemp"] = np.log(data["Unemployment raw value"])
data["log_drug_overdose"] = np.log(data["Drug overdose deaths raw value"])
data["log_insufficient_sleep"] = np.log(data["Insufficient sleep raw value"])
data["log_excessive_drinking"] = np.log(data["Excessive drinking raw value"])
data["log_deaths"] = np.log(data["Deaths"])
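The callback further down repeats the same `if/elif` chain three times to swap a dropdown variable for its log column. A dictionary lookup does the same job in one place; a sketch, where `LOG_COLUMN` and `x_column` are hypothetical names that the notebook itself does not define:

```python
# Map each dropdown variable to the log column created in the cell above.
LOG_COLUMN = {
    "Opiod dispensing value": "log_OPR",
    "Unemployment value": "log_unemp",
    "Drug Overdose value": "log_drug_overdose",
    "Insufficient sleep value": "log_insufficient_sleep",
    "Excessive drinking value": "log_excessive_drinking",
}


def x_column(variable: str, scale: str) -> str:
    """Hypothetical helper: return the column to plot for a variable and axis scale."""
    # Fall back to the original name, as the if/elif chain leaves
    # unknown values unchanged.
    return LOG_COLUMN.get(variable, variable) if scale == "log" else variable


print(x_column("Unemployment value", "log"))     # log_unemp
print(x_column("Unemployment value", "linear"))  # Unemployment value
```

Because the log columns are built from the raw values, switching to log also switches from the per-100,000 values to the raw ones.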

Creating the dataframe to display as `data table`.

In [22]:
data_table_data = data[["State","County","Norm_Deaths","Opiod dispensing value","Unemployment value","Drug Overdose value","Insufficient sleep value","Excessive drinking value","log_OPR","log_unemp","log_drug_overdose","log_insufficient_sleep","log_excessive_drinking","log_deaths"]]
data_table_data.head(5)

| | State | County | Norm_Deaths | Opiod dispensing value | Unemployment value | Drug Overdose value | Insufficient sleep value | Excessive drinking value | log_OPR | log_unemp | log_drug_overdose | log_insufficient_sleep | log_excessive_drinking | log_deaths |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SC | Abbeville County, SC | 7.0 | 6.051675 | 0.00856 | 3.258113 | 0.067397 | 0.029146 | 3.478158 | -3.082861 | 2.858971 | -1.019325 | -1.857615 | 3.637586 |
| 1 | LA | Acadia Parish, LA | 19.0 | 2.516162 | 0.004617 | 1.081764 | 0.025285 | 0.014885 | 3.471966 | -2.828751 | 2.627825 | -1.128319 | -1.658194 | 5.46806 |
| 2 | VA | Accomack County, VA | 11.0 | 2.671481 | 0.00664 | 1.538371 | 0.050607 | 0.020851 | 2.965273 | -3.031998 | 2.413364 | -1.001027 | -1.887724 | 4.290459 |
| 3 | ID | Ada County, ID | 12.0 | 0.752154 | 0.000344 | 0.169568 | 0.003253 | 0.002389 | 4.10759 | -3.583819 | 2.617904 | -1.335721 | -1.644515 | 6.895683 |
| 4 | KY | Adair County, KY | 11.0 | 18.378657 | 0.016322 | 0.0 | 0.09385 | 0.037489 | 4.266896 | -2.759543 | -2.302585 | -1.010351 | -1.928001 | 3.7612 |
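The data table in the app is created with `page_action='custom'`, so Dash does not paginate it; the callback has to return only the rows of the current page, using `page_current` (0-based) and `page_size`. A sketch of that slice on a made-up frame, where `page_slice` is a hypothetical helper:

```python
import pandas as pd


def page_slice(frame: pd.DataFrame, page_current: int, page_size: int) -> list:
    """Hypothetical helper: return one page of rows as the list of dicts DataTable expects."""
    start = page_current * page_size
    return frame.iloc[start:start + page_size].to_dict("records")


# Made-up table with 12 rows, i.e. pages of 5, 5 and 2 rows.
toy = pd.DataFrame({"County": [f"County {i}" for i in range(12)]})
print(len(page_slice(toy, 0, 5)), len(page_slice(toy, 2, 5)))  # 5 2
```

With custom paging, DataTable also accepts a `page_count` property so the pager can show the total number of pages (for example `math.ceil(len(dff) / page_size)`); the notebook leaves it unset.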


Building and displaying the Dash app.

In [24]:
# creating a JupyterDash app styled with the Bootstrap theme
app = JupyterDash(__name__,external_stylesheets=[dbc.themes.BOOTSTRAP])

# selecting the required dataframe
df = data

# App layout
app.layout = html.Div([

   # providing a heading to the dashboard
    html.H1("Opioid Mortality Analysis Dashboard with Dash", style={'textAlign': 'center','fontSize':25}),
    
    # Creating a row and column styling using CSS. One row and 2 columns
    dbc.Container([
        
         dbc.Row([
        html.Label("Select state:",style={'fontSize':15, 'textAlign':'center','width': "50%"}),
        html.Label("Select variable:",style={'fontSize':15, 'textAlign':'center','width': "50%"}),
        dbc.Col(dcc.Dropdown(
        id='select_state',
        options=[{'label': s, 'value': s} for s in sorted(data.State.unique())],
        value="NC",
        style={'width': "100%"},
        clearable=False
    )), 
    
        
        dbc.Col(dcc.Dropdown(
        id='select_var',
        options=[{'label': j, 'value': j} for j in par],
        value="Opiod dispensing value",
        style={'width': "100%"}, 
        clearable=False
    ))])
        
    ]),
    
    
    
    dbc.Container([
        
         dbc.Row([
        dbc.Col(dcc.RadioItems(
        id = 'rb1',
        options=[{'label': 'Linear', 'value': 'linear'},{'label': 'Log', 'value': 'log'}],
        value='linear',    
        )), 

        dbc.Col(dcc.RadioItems(
        id = 'rb2',
        options=[{'label': 'Linear', 'value': 'linear'},{'label': 'Log', 'value': 'log'}],
        value='linear',   
        ))])
    ]),
    

    # creating another dropdown for graph/map selection using the same method as above.
    html.Br(),
    dbc.Container([ 
     dbc.Row([
    html.Label("Select visualization",style={'fontSize':15, 'textAlign':'center'}),   
    dbc.Col(dcc.Dropdown(
    id='select_vis',
    options=[{'label': j, 'value': j} for j in l2],
    value="Graph",
    style={'width': "100%"},
    clearable=False
    )), 
    ], style={'width': "103.5%",'align':"center"}),
    ]),
    
    # creating a dcc.graph
    html.Div([
    dcc.Graph(id='scplot')
    ], style={'width': '100%'}),
    
    html.Br(),

    # creating a dash table with the required parameters
    dash_table.DataTable(id='tbl', 
                         page_current=0, 
                         page_size=5, 
                         page_action='custom',
                         style_table={'overflowX': 'auto'},
                         #row_selectable="multi",
                        ),
])

# # ------------------------------------------------------------------------------
# # Connect the Plotly graphs with Dash Components

@app.callback(
    [Output(component_id='scplot', component_property='figure'),
    Output(component_id = 'tbl', component_property = 'data')], 
    [Input(component_id='select_state', component_property='value'),
     Input(component_id='select_var', component_property='value'),
     Input(component_id='select_vis', component_property='value'),
     Input(component_id='rb1', component_property='value'),
     Input(component_id='rb2', component_property='value'),
     Input('tbl', "page_current"),
     Input('tbl', "page_size"),
     Input("tbl", "selected_row_ids")
    ]
     
)

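# code_1 updates the figure and the data table from all dropdowns and radio buttons.
# Arguments, in the order of the Inputs above: value1 = state, value2 = variable,
# value3 = visualization (Graph/Map), value4 = y-axis scale (rb1),
# value5 = x-axis scale (rb2), page_current/page_size = table paging,
# value6 = selected table row ids (currently unused)
def code_1(value1,value2,value3,value4,value5,page_current,page_size,value6):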
    
    # defining the data to select based on state dropdown selection for the data table
    dff = data_table_data[data_table_data['State'] == value1]
    
    # slicing out the rows of the current table page (custom pagination)
    table = dff.iloc[page_current*page_size:(page_current+ 1)*page_size].to_dict('records')

    
    # creating scatter plot
    
    if(value3 == "Graph"):
        
        # providing all 4 cases: linear-linear, linear-log, log-linear, log-log based on radio button possibility
        
        if(value4 =="linear" and value5 == "linear"):
            
            # determining dataset for scatter plot with the required columns
            s_data = data[data["State"] == value1][["State","Norm_Deaths","Opiod dispensing value","Unemployment value","Drug Overdose value","Insufficient sleep value","Excessive drinking value","log_OPR","log_unemp","log_drug_overdose","log_insufficient_sleep","log_excessive_drinking","log_deaths"]]
            
            # determining the y-axis variable, as it is used later for the regression line
            y = s_data["Norm_Deaths"]
            
            # plotting the scatter plot
            fig = px.scatter(s_data, x=s_data[value2], y=s_data["Norm_Deaths"], color = "State", size = s_data["Norm_Deaths"], size_max=20, color_discrete_sequence  = ["orange"])
            
            # updating the layout
            fig.update_layout(
                title_text="Norm_Deaths vs. "+ value2,
                xaxis_title = value2,
                yaxis_title = 'Norm_Deaths',
            )
            
            
        elif(value4 =="linear" and value5 == "log"):
            
            s_data = data[data["State"] == value1][["State","Norm_Deaths","Opiod dispensing value","Unemployment value","Drug Overdose value","Insufficient sleep value","Excessive drinking value","log_OPR","log_unemp","log_drug_overdose","log_insufficient_sleep","log_excessive_drinking","log_deaths"]]
            
            y = s_data["Norm_Deaths"]
            
            # if log is selected for the x variable, switch from the normalized column to its log column
            
            if(value2 == "Opiod dispensing value"):
                value2 = "log_OPR"
            elif(value2 == "Unemployment value"):
                value2 = "log_unemp"
            elif(value2 == "Drug Overdose value"):
                value2 = "log_drug_overdose"
            elif(value2 == "Insufficient sleep value"):
                value2 = "log_insufficient_sleep"
            elif(value2 == "Excessive drinking value"):
                value2 = "log_excessive_drinking"
                
            # plotting the scatter plot
            fig = px.scatter(s_data, x=s_data[value2], y=s_data["Norm_Deaths"],color = "State", size = s_data["Norm_Deaths"], size_max=20, color_discrete_sequence  = ["orange"])
            
            fig.update_layout(
                title_text="Norm_Deaths vs. "+ value2,
                xaxis_title = value2,
                yaxis_title = 'Norm_Deaths',
            )
            
            
        elif(value4 =="log" and value5 == "linear"):
            
            s_data = data[data["State"] == value1][["State","Norm_Deaths","Opiod dispensing value","Unemployment value","Drug Overdose value","Insufficient sleep value","Excessive drinking value","log_OPR","log_unemp","log_drug_overdose","log_insufficient_sleep","log_excessive_drinking","log_deaths"]]
            y = s_data["log_deaths"]
            
            # plotting the scatter plot
            fig = px.scatter(s_data, x=s_data[value2], y=s_data["log_deaths"],color = "State", size = s_data["log_deaths"], size_max=20, color_discrete_sequence  = ["orange"])
            
            fig.update_layout(
                title_text="log_deaths vs. "+ value2,
                xaxis_title = value2,
                yaxis_title = 'log_deaths',
            )
            
            
        elif(value4 =="log" and value5 == "log"):
            
            s_data = data[data["State"] == value1][["State","Norm_Deaths","Opiod dispensing value","Unemployment value","Drug Overdose value","Insufficient sleep value","Excessive drinking value","log_OPR","log_unemp","log_drug_overdose","log_insufficient_sleep","log_excessive_drinking","log_deaths"]]
            y = s_data["log_deaths"]
            
            if(value2 == "Opiod dispensing value"):
                value2 = "log_OPR"
            elif(value2 == "Unemployment value"):
                value2 = "log_unemp"
            elif(value2 == "Drug Overdose value"):
                value2 = "log_drug_overdose"
            elif(value2 == "Insufficient sleep value"):
                value2 = "log_insufficient_sleep"
            elif(value2 == "Excessive drinking value"):
                value2 = "log_excessive_drinking"
           
            # plotting the scatter plot
            fig = px.scatter(s_data, x=s_data[value2], y=s_data["log_deaths"],color = "State", size = s_data["log_deaths"], size_max=20, color_discrete_sequence  = ["orange"])
            
            fig.update_layout(
                title_text="log_deaths vs. "+ value2,
                xaxis_title = value2,
                yaxis_title = 'log_deaths',
            )
            
        
        # Performing Linear regression
        
        # Determine the value for X-axis: what data lies on the x-axis
        X = s_data[value2].values.reshape(-1, 1)
        
        # creating a linear_regression model()
        model = LinearRegression()
        
        # Fit the model
        model.fit(X, y)
        
        # 100 evenly spaced x values spanning the data range
        x_range = np.linspace(X.min(), X.max(), 100)
        # predicting the fitted values on those points
        y_range = model.predict(x_range.reshape(-1,1))
        # adding the fitted line as a trace to the scatter plot built above
        fig.add_traces(go.Scatter(x=x_range, y=y_range, name='Linear Fit'))

        
        # Performing non-linear regression for degree 2,3 and 4
        
        # Running a for loop for degree = 2,3, and 4
        for degree in [2, 3, 4]:
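            # PolynomialFeatures expands x into the columns 1, x, ..., x^degree
            poly = PolynomialFeatures(degree)
            # learning the feature layout from the x-axis data
            poly.fit(X)
            # transforming the data points and the plotting range into polynomial features
            X_poly = poly.transform(X)
            x_range_poly = poly.transform(x_range.reshape(-1,1))

            # no separate intercept: the constant column from PolynomialFeatures plays that role
            model = LinearRegression(fit_intercept=False)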
            model.fit(X_poly, y)
            y_poly = model.predict(x_range_poly)

            # Adding non-linear traces to the above scatter plot
            fig.add_traces(go.Scatter(x=x_range.squeeze(), y=y_poly,name=f'Poly_degree {degree}'))        
        
        # return fig and datatable
        return fig, table
        
    elif(value3 == "Map"):
        
        with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
            counties = json.load(response)
        
        # if selection is log in the radio button then change the normalized values to log values
        if(value5 == "log"):
            if(value2 == "Opiod dispensing value"):
                value2 = "log_OPR"
            elif(value2 == "Unemployment value"):
                value2 = "log_unemp"
            elif(value2 == "Drug Overdose value"):
                value2 = "log_drug_overdose"
            elif(value2 == "Insufficient sleep value"):
                value2 = "log_insufficient_sleep"
            elif(value2 == "Excessive drinking value"):
                value2 = "log_excessive_drinking"
            
        # plotting the choropleth map; the color scale spans the 25th to 75th percentile of the variable
        fig2 = px.choropleth(df, geojson=counties, locations='FIPS', color=value2,
                           color_continuous_scale="viridis",
                           range_color=(df[value2].quantile(q=0.25), df[value2].quantile(q=0.75)),
                           scope="usa",
                           hover_data=["County"]
                           #labels={'unemp':'unemployment rate'}
                          )
        fig2.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
        
        # return map and datatable
        return fig2, table
    
# # ------------------------------------------------------------------------------
if __name__ == '__main__':
    #app.run_server(mode = "inline", port=10)
    app.run_server(debug=True, port=12)

Dash app running on http://127.0.0.1:12/
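The trend lines in the callback are ordinary least-squares fits on polynomial features: `PolynomialFeatures(degree)` builds the columns 1, x, ..., x^degree and `LinearRegression(fit_intercept=False)` fits their coefficients. As a cross-check without scikit-learn, `numpy.polyfit` solves the same least-squares problem for a single x variable. A sketch on synthetic data (a known quadratic plus noise, not the project's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known quadratic plus noise (not the project's data).
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x - 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

# Evenly spaced points for drawing each trend line, as in the callback.
x_range = np.linspace(x.min(), x.max(), 100)

trend_lines = {}
for degree in [1, 2, 3, 4]:
    # polyfit returns coefficients from the highest power down to the constant.
    coeffs = np.polyfit(x, y, degree)
    trend_lines[degree] = np.polyval(coeffs, x_range)

# The degree-2 fit should recover coefficients close to (-0.3, 0.5, 2.0).
print(np.round(np.polyfit(x, y, 2), 2))
```

Comparing these curves with the scikit-learn traces for the same data is a quick way to confirm the feature expansion and `fit_intercept=False` are set up consistently.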


### GIF:
<img src="stageIV_images/screen-capture.gif" width = 1800 height = 1800>

<i>Selected screenshots:</i>

<b>1. Full Opioid Dashboard</b><br><br>
<img src="stageIV_images/sc1.png" width=2000 height=3000 />

<b>2. Opioid Dashboard with the map</b><br><br>
<img src="stageIV_images/map_linear.png" width=2000 height=3000 />

<b>3. Data table output by state</b><br><br>
<img src="stageIV_images/dt_change.png" width=2000 height=3000 />

<b>4. Scatter plot in the dashboard</b><br><br>
<img src="stageIV_images/scatter.png" width=2000 height=3000 />

<b>5. US map in the dashboard</b><br><br>
<img src="stageIV_images/maps.png" width=2000 height=3000 />

### References:
1. [Creating rows and columns for dropdowns and radio buttons](https://dash.plotly.com/interactive-graphing#update-graphs-on-hover)
2. [Dash table creation](https://dash.plotly.com/datatable/callbacks)
3. [Dash table styling](https://dash.plotly.com/datatable/width)
4. [Linear regression](https://plotly.com/python/ml-regression/)
5. [Non-linear/polynomial regression](https://plotly.com/python/ml-regression/)
6. [Choropleth maps](https://plotly.com/python/choropleth-maps/)