# Project Stage - V (Dashboard)
## Goals
The final stage aims at developing a simple interactive dashboard based on the analysis you have done so far. We will use Plotly (https://plotly.com/) with Dash (https://plotly.com/dash/) as the framework.

Refer here for Plotly: https://github.com/q-tong/CS405-605-Data-Science/tree/main/Fall2023/Lecture/5.Visualization/Visualization

Getting started with Dash: https://www.youtube.com/watch?v=hSPmj7mK6ng

PS: This can be invoked from Jupyter, see here: https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

Tasks for stage V (team):
Task 1: (70 pts)
- Main graph
    - Allow for selection of date to show the trend of COVID-19 cases and deaths. (30)
    - Allow for linear or log mode selection on the number of cases and deaths. (10)
    - Incorporate your best model prediction trend line - Linear / Non-Linear. (30)
    - Ex: https://ourworldindata.org/coronavirus

Task 2: (30 pts)
- Trend
    - Plot the trend line using moving average (https://en.wikipedia.org/wiki/Moving_average). Use 7-day moving average. (15)
    - Allow for selection of multiple states on the same graph. (15)
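
As a rough sketch of the 7-day moving average described above, pandas can compute a rolling mean separately for each state. The column names and toy values below are made up for illustration.

```python
import pandas as pd

# Toy data: hypothetical daily counts for two states over eight days
toy = pd.DataFrame({
    "State": ["AK"] * 8 + ["AL"] * 8,
    "Date": list(pd.date_range("2020-03-01", periods=8)) * 2,
    "DailyCases": list(range(1, 9)) + [10] * 8,
})
toy = toy.sort_values(["State", "Date"])

# Rolling mean over a 7-row window, computed within each state so that
# one state's values never leak into another state's average
toy["Avg7"] = (
    toy.groupby("State")["DailyCases"]
       .transform(lambda s: s.rolling(window=7).mean())
)
```

The first six rows of each state are NaN because a full 7-day window is not yet available. Passing `min_periods=1` to `rolling` would fill them with partial averages instead.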



Deliverable

Take screenshots of the report and upload them to Canvas.
Each member creates a separate notebook for their member tasks. Upload all notebooks to the GitHub repository.

In [1]:
import pandas as pd
import numpy as np
import time

import plotly.express as px
import plotly.graph_objects as go
import dash   # make sure it's v2.2.0 or greater
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
#

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression

pd.set_option('display.max_rows', 5000)


In [2]:
# Import Data
df = pd.read_csv('ProjectDataStage1LONGFORMAT.csv')
df.dropna(inplace = True)

df = df.groupby(['State', 'Date'])[["Deaths", "Cases", "population"]].sum().reset_index()  # group bys

# Data manipulations

df['DailyCases'] = df['Cases'].diff().abs()  # turns case / death data from cumulative to delta
df['DailyDeaths'] = df['Deaths'].diff().abs()

df['DeathPerCapita'] = (df['DailyDeaths'] / df['population'])*100000  # calculates per capita data for the daily data
df['CasesPerCapita'] = (df['DailyCases'] / df['population'])*100000

df['Date'] = pd.to_datetime(df['Date'])  # converts date from object to date time

df.dropna(inplace = True)


In [3]:
# After some debugging, I realized that DailyCases and DailyDeaths were calculated incorrectly at state boundaries:
# diff() runs over the whole frame, so the first row of each state (AL onward) is differenced against the last
# row of the previous state.
print(df.loc[df['State'] == 'AL'])

     State       Date  Deaths    Cases  population  DailyCases  DailyDeaths  \
1265    AL 2020-01-22       0        0   4903185.0    287319.0       1457.0   
1266    AL 2020-01-23       0        0   4903185.0         0.0          0.0   
1267    AL 2020-01-24       0        0   4903185.0         0.0          0.0   
1268    AL 2020-01-25       0        0   4903185.0         0.0          0.0   
1269    AL 2020-01-26       0        0   4903185.0         0.0          0.0   
1270    AL 2020-01-27       0        0   4903185.0         0.0          0.0   
1271    AL 2020-01-28       0        0   4903185.0         0.0          0.0   
1272    AL 2020-01-29       0        0   4903185.0         0.0          0.0   
1273    AL 2020-01-30       0        0   4903185.0         0.0          0.0   
1274    AL 2020-01-31       0        0   4903185.0         0.0          0.0   
1275    AL 2020-02-01       0        0   4903185.0         0.0          0.0   
1276    AL 2020-02-02       0        0   4903185.0  

In [4]:
# We'll just reset those values to 0.
mask = df['Date'] == pd.to_datetime('2020-01-22')
df.loc[mask, ['DailyDeaths', 'DailyCases','DeathPerCapita','CasesPerCapita']] = 0
print(df.loc[df['State'] == 'AL'])

     State       Date  Deaths    Cases  population  DailyCases  DailyDeaths  \
1265    AL 2020-01-22       0        0   4903185.0         0.0          0.0   
1266    AL 2020-01-23       0        0   4903185.0         0.0          0.0   
1267    AL 2020-01-24       0        0   4903185.0         0.0          0.0   
1268    AL 2020-01-25       0        0   4903185.0         0.0          0.0   
1269    AL 2020-01-26       0        0   4903185.0         0.0          0.0   
1270    AL 2020-01-27       0        0   4903185.0         0.0          0.0   
1271    AL 2020-01-28       0        0   4903185.0         0.0          0.0   
1272    AL 2020-01-29       0        0   4903185.0         0.0          0.0   
1273    AL 2020-01-30       0        0   4903185.0         0.0          0.0   
1274    AL 2020-01-31       0        0   4903185.0         0.0          0.0   
1275    AL 2020-02-01       0        0   4903185.0         0.0          0.0   
1276    AL 2020-02-02       0        0   4903185.0  
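
An alternative to resetting the first-day values afterwards is to take the difference within each state, so the first row of a state is never compared with the previous state's last row. This is a minimal sketch on made-up numbers, not a replacement for the cells above.

```python
import pandas as pd

# Toy cumulative counts for two states (made-up numbers)
toy = pd.DataFrame({
    "State": ["AK", "AK", "AK", "AL", "AL", "AL"],
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"] * 2),
    "Cases": [100, 110, 125, 0, 2, 5],
})
toy = toy.sort_values(["State", "Date"])

# diff() inside each group: the first row of every state becomes NaN,
# which is then filled with 0 instead of inheriting the previous state's total
toy["DailyCases"] = toy.groupby("State")["Cases"].diff().fillna(0)
```

Here the AL row for 2020-01-22 gets 0 rather than the 125 that an ungrouped `diff().abs()` would produce.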

In [5]:
# Creating lists to use in plotly drop down and functions to use in app

states = ['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
           'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
           'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
           'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
           'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']



    

In [6]:
# Functions to make plotly range slider have dates
# https://stackoverflow.com/questions/51063191/date-slider-with-plotly-dash-does-not-work

daterange = pd.date_range(start='2020',end='2023',freq='W') 
def unixTimeMillis(dt):
    """Convert a datetime to a Unix timestamp in seconds (not milliseconds, despite the name)."""
    # time.mktime interprets the time tuple in the machine's local timezone
    return int(time.mktime(dt.timetuple()))

def unixToDatetime(unix):
    """Convert a Unix timestamp in seconds to a pandas Timestamp."""
    return pd.to_datetime(unix, unit='s')

def getMarks(start, end, Nth=100):
    """Return the slider marks as {unix seconds: 'YYYY-MM-DD'}.

    Every Nth entry of the global `daterange` is labeled; `start` and `end`
    are currently unused.
    """
    result = {}
    for i, date in enumerate(daterange):
        if i % Nth == 1:
            # Add this date to the marks dict
            result[unixTimeMillis(date)] = str(date.strftime('%Y-%m-%d'))

    return result
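
The helpers above convert to seconds with `time.mktime`, which reads the datetime as local time, while `pd.to_datetime(..., unit='s')` reads seconds as UTC. On a machine that is not set to UTC the round trip can therefore shift by the local offset. The sketch below shows one UTC-consistent pairing; the names `to_unix_seconds` and `from_unix_seconds` are hypothetical.

```python
import calendar
import pandas as pd

def to_unix_seconds(dt):
    """Naive datetime -> Unix seconds, reading the datetime as UTC."""
    return calendar.timegm(dt.timetuple())

def from_unix_seconds(seconds):
    """Unix seconds -> naive pandas Timestamp expressed in UTC."""
    return pd.to_datetime(seconds, unit="s")

stamp = pd.Timestamp("2020-01-26")
round_trip = from_unix_seconds(to_unix_seconds(stamp))  # equals stamp on any machine
```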

    

In [7]:
# Functions 
# initialize the app

app = JupyterDash(__name__)


# Build App

app.layout = html.Div([
    html.H1("COVID-19 Dashboard", style = {'text-align':'center'}),  # Title
    
    dcc.RangeSlider(id='yearSlider',
                min = unixTimeMillis(daterange.min()),
                max = unixTimeMillis(daterange.max()),
                value = [unixTimeMillis(daterange.min()),
                         unixTimeMillis(daterange.max())],
                marks=getMarks(daterange.min(),
                            daterange.max())), # range slider
    
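    # With multi=True the value must be a list; a bare 'AK' would be iterated as 'A', 'K' in the callback
    dcc.Dropdown(id = 'stateSelect', options = states, value = ['AK'], multi = True, style={'width': "40%"}),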
    dcc.RadioItems(id = 'dataTransform',options = ['Linear', 'Log Transform'], value = 'Linear'),

    
    dcc.Graph(id='graphCases', figure = {}),  # Graph for Case Data
    dcc.Graph(id='graphCases2', figure = {}),  # Graph for Death Data

    html.Div(id = 'output_container', children = []), # container for text
    
    html.Br(),  # space
    

])



# Define callback to update graph
# Connect the Plotly graphs with Dash Components
@app.callback(
    [Output(component_id='output_container', component_property='children'),
     Output(component_id='graphCases', component_property='figure'),
     Output(component_id='graphCases2', component_property='figure')],
    [Input(component_id='stateSelect', component_property='value'),
     Input(component_id='yearSlider', component_property='value'),
     Input(component_id='dataTransform', component_property='value')]
)
def update_graph(selected_states, yearSlider, dataTransform):
    dateStart = unixToDatetime(yearSlider[0])
    dateEnd = unixToDatetime(yearSlider[1])

    container = "The date range chosen by the user was: {} to {}".format(dateStart, dateEnd)

    # The plotted series are daily per-capita values (per 100,000 people), so the axis titles say so
    figCases = go.Figure()
    figCases.update_layout(title=go.layout.Title(text="Covid-19 Cases across the USA"),
                          xaxis_title="Date", yaxis_title="Daily Cases per 100,000 People"
                          )
    figDeath = go.Figure()
    figDeath.update_layout(title=go.layout.Title(text="Covid-19 Deaths across the USA"),
                          xaxis_title="Date", yaxis_title="Daily Deaths per 100,000 People"
                          )

    
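    for stateSelect in (selected_states or []):  # the selection may be empty or None when the dropdown is cleared
        dfTemp = df[df['State'] == stateSelect]
        # .copy() so the 7-day average columns added below are not written into a slice of df
        dfTemp = dfTemp[(dfTemp['Date'] > dateStart) & (dfTemp['Date'] < dateEnd)].copy()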

        if len(dfTemp) > 0:  # Check if there are data points
            if dataTransform == "Linear":
                x = np.array((dfTemp['Date'] - dfTemp['Date'].min()).dt.days).reshape(-1, 1)
                y = dfTemp['CasesPerCapita']

                # Fit linear regression model
                model = LinearRegression()
                model.fit(x, y)

                # Predict for the entire date range
                x_pred = np.array((pd.date_range(start=dfTemp['Date'].min(), end=dfTemp['Date'].max()) - dfTemp['Date'].min()).days).reshape(-1, 1)
                y_pred = model.predict(x_pred)

                # Add scatter plot with regression line to the figure
                figCases.add_trace(go.Scatter(x=dfTemp['Date'], y=dfTemp['CasesPerCapita'], mode='markers', name=f'{stateSelect} - Cases'))
                figCases.add_trace(go.Scatter(x=pd.date_range(start=dfTemp['Date'].min(), end=dfTemp['Date'].max()), y=y_pred, mode='lines', name=f'{stateSelect} - Trendline'))
                
                # Calculate and plot 7-day running average
                if len(dfTemp['Date']) > 7:
                    dfTemp['7DayAvg'] = dfTemp['CasesPerCapita'].rolling(window=7).mean()
                    figCases.add_trace(go.Scatter(x=dfTemp['Date'], y=dfTemp['7DayAvg'], mode='lines', name=f'{stateSelect} - 7 Day Avg'))
                
                # Making the plot for the death data
                x2 = np.array((dfTemp['Date'] - dfTemp['Date'].min()).dt.days).reshape(-1, 1)
                y2 = dfTemp['DeathPerCapita']

                # Fit linear regression model
                model = LinearRegression()
                model.fit(x2, y2)
                
                x2_pred = np.array((pd.date_range(start=dfTemp['Date'].min(), end=dfTemp['Date'].max()) - dfTemp['Date'].min()).days).reshape(-1, 1)
                y2_pred = model.predict(x2_pred)
                
                figDeath.add_trace(go.Scatter(x=dfTemp['Date'], y=dfTemp['DeathPerCapita'], mode='markers', name=f'{stateSelect} - Deaths'))
                figDeath.add_trace(go.Scatter(x=pd.date_range(start=dfTemp['Date'].min(), end=dfTemp['Date'].max()), y=y2_pred, mode='lines', name=f'{stateSelect} - Trendline'))
                
                if len(dfTemp['Date']) > 7:
                    dfTemp['7DayAvgD'] = dfTemp['DeathPerCapita'].rolling(window=7).mean()
                    figDeath.add_trace(go.Scatter(x=dfTemp['Date'], y=dfTemp['7DayAvgD'], mode='lines', name=f'{stateSelect} - 7 Day Avg'))
                

            else:
                # Creates a new temp df that removes from the original database where the CasesPerCapita is equal to zero
                # This was done to fix issues with np.log function and the trendlines.
                # dfTemp2 = dfTemp.drop(dfTemp[dfTemp.CasesPerCapita == 0].index).copy(deep=True)
                # dfTemp2 = dfTemp.drop(dfTemp[dfTemp.DeathPerCapita == 0].index).copy(deep=True)
                dfTemp2 = dfTemp.copy()
                
                # dfTemp2.loc[(dfTemp2.CasesPerCapita < 1), 'CasesPerCapita'] = 1 # this one removes all negative values
                # all zeros are changed to a value close to zero so the log function works
                dfTemp2.loc[(dfTemp2.CasesPerCapita == 0), 'CasesPerCapita'] = 0.0001 
                dfTemp2.loc[(dfTemp2.DeathPerCapita == 0), 'DeathPerCapita'] = 0.0001 
                
                # Set up Linear Regression
                x = np.array((dfTemp2['Date'] - dfTemp2['Date'].min()).dt.days).reshape(-1, 1)
                y = np.log10(dfTemp2['CasesPerCapita'])
                
                model = LinearRegression()
                model.fit(x, y)
                
                # Predict for the date range.
                x_pred = np.array((pd.date_range(start=dfTemp2['Date'].min(), end=dfTemp2['Date'].max()) - dfTemp2['Date'].min()).days).reshape(-1, 1)
                y_pred = model.predict(x_pred)
                
                figCases.add_trace(go.Scatter(x=dfTemp2['Date'], y=np.log10(dfTemp2['CasesPerCapita']), mode='lines+markers', name=f'{stateSelect} - Log Transform Cases'))
                figCases.add_trace(go.Scatter(x=pd.date_range(start=dfTemp2['Date'].min(), end=dfTemp2['Date'].max()), y=y_pred, mode='lines', name=f'{stateSelect} - Trendline'))

                # Calculate and plot 7-day running average for the log transform.
                if len(dfTemp2['Date']) > 7:
                    dfTemp2['7DayAvg'] = np.log10(dfTemp2['CasesPerCapita']).rolling(window=7).mean()
                    figCases.add_trace(go.Scatter(x=dfTemp2['Date'], y=dfTemp2['7DayAvg'], mode='lines', name=f'{stateSelect} - 7 Day Avg'))
                
                # Making the deaths plot
                
                # Set up Linear Regression
                x2 = np.array((dfTemp2['Date'] - dfTemp2['Date'].min()).dt.days).reshape(-1, 1)
                y2 = np.log10(dfTemp2['DeathPerCapita'])
                
                model = LinearRegression()
                model.fit(x2, y2)
                
                # Predict for the date range.
                x2_pred = np.array((pd.date_range(start=dfTemp2['Date'].min(), end=dfTemp2['Date'].max()) - dfTemp2['Date'].min()).days).reshape(-1, 1)
                y2_pred = model.predict(x2_pred)
                
                figDeath.add_trace(go.Scatter(x=dfTemp2['Date'], y=np.log10(dfTemp2['DeathPerCapita']), mode='lines+markers', name=f'{stateSelect} - Log Transform Deaths'))
                figDeath.add_trace(go.Scatter(x=pd.date_range(start=dfTemp2['Date'].min(), end=dfTemp2['Date'].max()), y=y2_pred, mode='lines', name=f'{stateSelect} - Trendline'))

                # Calculate and plot 7-day running average for the log transform.
                if len(dfTemp2['Date']) > 7:
                    dfTemp2['7DayAvgD'] = np.log10(dfTemp2['DeathPerCapita']).rolling(window=7).mean()
                    figDeath.add_trace(go.Scatter(x=dfTemp2['Date'], y=dfTemp2['7DayAvgD'], mode='lines', name=f'{stateSelect} - 7 Day Avg'))

            
    return container, figCases, figDeath
    
# Run app and display result inline in the notebook
app.run_server(mode='inline', port=8050)
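
The callback above repeats the same fit-and-predict steps four times (cases and deaths, linear and log). The sketch below shows how those steps could be collected into one helper. It is an illustration only: the name `fit_trend` and its parameters are hypothetical, and it uses `np.polyfit` with degree 1 in place of scikit-learn's `LinearRegression`, which gives the same least-squares line for a single feature.

```python
import numpy as np
import pandas as pd

def fit_trend(dates, values, log=False, floor=1e-4):
    """Fit a straight line to values (or log10(values)) against days elapsed.

    Returns (y, y_fit), both on the scale that was fitted.
    `floor` replaces non-positive values before taking logs, similar to
    the 0.0001 substitution for zeros used in the callback above.
    """
    dates = pd.to_datetime(pd.Series(dates)).reset_index(drop=True)
    y = pd.Series(values, dtype=float).reset_index(drop=True)
    if log:
        y = np.log10(y.where(y > 0, floor))
    # Days since the first date serve as the single regression feature
    days = (dates - dates.min()).dt.days.to_numpy()
    slope, intercept = np.polyfit(days, y.to_numpy(), deg=1)
    return y, slope * days + intercept

# Toy example: values growing tenfold per day are a straight line in log10 space
toy_dates = pd.date_range("2020-03-01", periods=5)
y_log, y_log_fit = fit_trend(toy_dates, [1, 10, 100, 1000, 10000], log=True)
y_lin, y_lin_fit = fit_trend(toy_dates, [2, 4, 6, 8, 10])
```

Each branch of the callback could then call the helper once for `CasesPerCapita` and once for `DeathPerCapita` and pass the two returned arrays to `go.Scatter`.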