## Project Stage - V (Dashboard)  ddl: 04/28/2023

## Goals

The final stage aims a developing a simple interactive dashboard based on the analysis you have done so far. In this we will be utilizing Plotly (https://plotly.com/) along with Dash (https://plotly.com/dash/) as our framework. 

Refer here for Plotly: https://github.com/UNCG-CSE/CSC-405-605_Spring_2021/blob/master/Class_Resources/Lecture_10/Visualization/03_Plotly/Plotly.ipynb

Getting started with Dash: https://www.youtube.com/watch?v=hSPmj7mK6ng

*PS: This can be invoked from Jupyter, see here: https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e*

### Tasks for stage V (team):

#### Task 1: (70 pts)

    - Main graph
        - Allow for selection of date to show the trend of COVID-19 cases and deaths. (30)
        - Allow for linear or log mode selection on the number of cases and deaths. (10)
        - Incorporate your best model prediction trend line - Linear / Non-Linear. (30)
        - Ex: https://ourworldindata.org/coronavirus
        
#### Task 2: (30 pts)

    - Trend
        - Plot the trend line using moving average (https://en.wikipedia.org/wiki/Moving_average). Use 7-day moving average. (15)
        - Allow for selection of multiple states on the same graph. (15)
    
    

**Deliverable**
- Take screenshots of Report upload on canvas.
- Each member creates separate notebooks for member tasks. Upload all notebooks to Github Repository. 
- Final Presentation recordings on canvas.

In [62]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from math import sqrt
from sklearn.svm import SVR
import warnings
from scipy.stats import pearsonr
from scipy.stats import ttest_ind
from scipy import stats
import statsmodels.api as sm
warnings.filterwarnings('ignore')

In [63]:
def moving_average(data, window_size):
    moving_average = []
    for i in range(len(data)):
        if i + window_size < len(data):
            moving_average.append(np.mean(data[i:i+window_size]))
        else:
            moving_average.append(np.mean(data[i:len(data)]))
    return moving_average

class PolynomialModel:
    """
    df: should be in format <index = Datetime> <columns = state_name> <values = number of new cases/deaths>
    """
    def __init__(self, df, degree=3):
        self.df = df
        self.degree = degree
        self.trends = pd.DataFrame()
        self.moving_averages = pd.DataFrame()
        self.model()
    
    def model(self):
        for column in self.df.columns:
            pr = LinearRegression()
            poly = PolynomialFeatures(degree=self.degree)
            x = np.arange(len(self.df.index)).reshape(-1,1)
            y = self.df[column].values.astype(int)
            poly_x = poly.fit_transform(x)
            pr.fit(poly_x, y)
            self.trends[column] = pr.predict(poly_x)
            self.moving_averages[column] = moving_average(self.df[column], 7)
            self.trends.index = self.df.index
            self.moving_averages.index = self.df.index
            
    
    def get_trends(self, start_date, end_date, states):
        if len(states) == 0 or 'All States' in states:
            selected = self.trends.sum(axis=1)
            selected.name = 'trend'
        else:
            selected = self.trends[states].sum(axis=1)[start_date:end_date]
        return selected
    
    def get_moving_average(self, start_date, end_date, states, window_size=7):
        if len(states) == 0 or 'All States' in states:
            selected = self.moving_averages.sum(axis=1)
            selected.name = 'mv_avg'
        else:
            selected = self.moving_averages[states][start_date:end_date]
        return selected
    
def get_us_confirmed_cases():
    df = pd.read_csv('../Stage IV/team_work/data/covid_confirmed_usafacts.csv')
    df = df.drop(['countyFIPS', 'County Name', 'StateFIPS'], axis=1)
    df = df.groupby('State').sum(numeric_only=True)
    df.columns = pd.to_datetime(df.columns)
    df = df.transpose()
    return df

In [64]:
covid_cases_states = get_us_confirmed_cases()
model = PolynomialModel(covid_cases_states)
pd.merge(covid_cases_states, model.get_trends(covid_cases_states.index[0], covid_cases_states.index[10], ['AK', 'NC']), left_index=True, right_index=True)

Unnamed: 0,AK,AL,AR,AZ,CA,CO,CT,DC,DE,FL,...,TN,TX,UT,VA,VT,WA,WI,WV,WY,trend
2020-01-22,0,0,0,0,722,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,88930.882465
2020-01-23,0,0,0,0,733,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,87277.084604
2020-01-24,0,0,0,0,739,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,85642.918087
2020-01-25,0,0,0,0,749,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,84028.354388
2020-01-26,0,0,0,1,756,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,82433.364984
2020-01-27,0,0,0,1,766,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,80857.921349
2020-01-28,0,0,0,1,772,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,79301.994959
2020-01-29,0,0,0,1,776,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,77765.557289
2020-01-30,0,0,0,1,783,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,76248.579816
2020-01-31,0,0,0,1,798,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,74751.034014


In [66]:
model.get_moving_average(covid_cases_states.index[0], covid_cases_states.index[50], 'All States')

0       7.502857e+02
1       7.582857e+02
2       7.657143e+02
3       7.744286e+02
4       7.857143e+02
            ...     
1086    9.662386e+07
1087    9.662649e+07
1088    9.662782e+07
1089    9.662872e+07
1090    9.662946e+07
Name: Moving Average, Length: 1091, dtype: float64