# Covid 19 Analytics Dashboard

In this project we go through the COVID-19 data from [Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19) to build a global analytics dashboard. The project is divided into four parts:

* Setting up the data
* Generating the views
* Generating the dashboard
* Conclusion

## Setting up the data

We start our project by loading the needed packages and data.

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import dash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
from datetime import datetime, timedelta
import pycountry

covid_df = pd.read_csv('data/covid_tidy_data.csv')
covid_df.tail()

Unnamed: 0,Country,Date,Cases_Confirmed,New_Cases_Confirmed,Cases_Death,New_Cases_Death,Cases_Recovered,New_Cases_Recovered,Code
225166,Zimbabwe,2023-03-05,264127,0,5668,0,0,0,ZWE
225167,Zimbabwe,2023-03-06,264127,0,5668,0,0,0,ZWE
225168,Zimbabwe,2023-03-07,264127,0,5668,0,0,0,ZWE
225169,Zimbabwe,2023-03-08,264276,149,5671,3,0,0,ZWE
225170,Zimbabwe,2023-03-09,264276,0,5671,0,0,0,ZWE


In [7]:
pop_df = pd.read_csv('data/world_population.csv')
pop_df.tail()

Unnamed: 0,Rank,CCA3,Country/Territory,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
229,226,WLF,Wallis and Futuna,Mata-Utu,Oceania,11572,11655,12182,13142,14723,13454,11315,9377,142,81.493,0.9953,0.0
230,172,ESH,Western Sahara,El Aaiún,Africa,575986,556048,491824,413296,270375,178529,116775,76371,266000,2.1654,1.0184,0.01
231,46,YEM,Yemen,Sanaa,Asia,33696614,32284046,28516545,24743946,18628700,13375121,9204938,6843607,527968,63.8232,1.0217,0.42
232,63,ZMB,Zambia,Lusaka,Africa,20017675,18927715,16248230,13792086,9891136,7686401,5720438,4281671,752612,26.5976,1.028,0.25
233,74,ZWE,Zimbabwe,Harare,Africa,16320537,15669666,14154937,12839771,11834676,10113893,7049926,5202918,390757,41.7665,1.0204,0.2


The covid_df dataframe already contains a tidy dataset; the tidying process was done in another project that can be seen [here](https://github.com/mateusmelo821/covid-19). Now let us merge the dataframes. We will use only the 2020 population, since 2020 is the year the pandemic started.

In [8]:
covid_df = covid_df.merge(pop_df[['CCA3', '2020 Population']], left_on='Code', right_on='CCA3')
covid_df.tail()

Unnamed: 0,Country,Date,Cases_Confirmed,New_Cases_Confirmed,Cases_Death,New_Cases_Death,Cases_Recovered,New_Cases_Recovered,Code,CCA3,2020 Population
222880,Zimbabwe,2023-03-05,264127,0,5668,0,0,0,ZWE,ZWE,15669666
222881,Zimbabwe,2023-03-06,264127,0,5668,0,0,0,ZWE,ZWE,15669666
222882,Zimbabwe,2023-03-07,264127,0,5668,0,0,0,ZWE,ZWE,15669666
222883,Zimbabwe,2023-03-08,264276,149,5671,3,0,0,ZWE,ZWE,15669666
222884,Zimbabwe,2023-03-09,264276,0,5671,0,0,0,ZWE,ZWE,15669666


We ended up with fewer rows than before because some countries have no entry in the population file. We can't precompute the incidence, mortality and lethality rates here, since they have to be calculated according to the filters applied in the dashboard. To finish this first part, let us change the Date column type, drop the CCA3 column and rename the population column.

In [9]:
covid_df['Date'] = pd.to_datetime(covid_df['Date'])
covid_df = covid_df.drop('CCA3', axis=1)
covid_df = covid_df.rename(columns={'2020 Population':'Population'})
covid_df.tail()

Unnamed: 0,Country,Date,Cases_Confirmed,New_Cases_Confirmed,Cases_Death,New_Cases_Death,Cases_Recovered,New_Cases_Recovered,Code,Population
222880,Zimbabwe,2023-03-05,264127,0,5668,0,0,0,ZWE,15669666
222881,Zimbabwe,2023-03-06,264127,0,5668,0,0,0,ZWE,15669666
222882,Zimbabwe,2023-03-07,264127,0,5668,0,0,0,ZWE,15669666
222883,Zimbabwe,2023-03-08,264276,149,5671,3,0,0,ZWE,15669666
222884,Zimbabwe,2023-03-09,264276,0,5671,0,0,0,ZWE,15669666
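
The row loss noted after the merge can be inspected directly. The sketch below is a minimal illustration on toy data, not the project's actual files: the country "Atlantis" and the code "ATL" are made up to stand in for a country missing from the population file. It relies on the `how` and `indicator` parameters of `DataFrame.merge` to list the countries that the default inner join would drop.

```python
import pandas as pd

# Toy stand-ins for covid_df and pop_df (values are illustrative only).
covid = pd.DataFrame({
    "Country": ["Zimbabwe", "Zimbabwe", "Atlantis"],
    "Code": ["ZWE", "ZWE", "ATL"],  # "ATL" is a made-up code with no population entry
    "New_Cases_Confirmed": [149, 0, 5],
})
pop = pd.DataFrame({"CCA3": ["ZWE"], "2020 Population": [15669666]})

# A left join keeps every covid row; indicator=True adds a "_merge" column
# saying whether the key was found in both frames or only in the left one.
checked = covid.merge(pop, left_on="Code", right_on="CCA3",
                      how="left", indicator=True)

# Rows marked "left_only" are the ones an inner join would silently drop.
dropped = checked.loc[checked["_merge"] == "left_only", "Country"].unique().tolist()
print(dropped)
```

Running the same check on the real dataframes before the inner merge would show which countries are absent from the dashboard.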


## Generating the views

In this part of the project, we are going to generate the views that will be later used in the dashboard. We are going to write several functions in this part, so they can be reused later. Let us start by getting some KPIs with general information.

In [32]:
def get_cases(df):
    return df['New_Cases_Confirmed'].sum()
def get_deaths(df):
    return df['New_Cases_Death'].sum()
def get_incidence(df):
    return get_cases(df)/df.groupby('Country')['Population'].max().sum()
def get_mortality(df):
    return get_deaths(df)/df.groupby('Country')['Population'].max().sum()
def get_lethality(df):
    # Lethality (case fatality) is deaths divided by cases
    return get_deaths(df)/get_cases(df)

def get_kpi(indicator, title):
    fig = go.Figure(go.Indicator(
        mode = "number",
        value = indicator,
        number = {'font':{'size':56}, 'font_color':'black'},
        title = {'text': title, 'font_size':30, 'font_color':'black'},
        domain = {'x': [0, 1], 'y': [0, 1]}
    ), layout=go.Layout(
    autosize=True))
    return fig


In [33]:
fig = get_kpi(get_cases(covid_df), 'Cases')
fig.show()

In [35]:
fig = get_kpi(get_deaths(covid_df.query("Country == 'Brazil'")), 'Deaths in Brazil')
fig.show()

In [37]:
fig = get_kpi(get_incidence(covid_df.query("Country == 'US'")), 'Incidence in The US')
fig.show()

In [41]:
fig = get_kpi(get_mortality(covid_df.query("Country == 'San Marino'")), 'Mortality in San Marino')
fig.show()

In [48]:
fig = get_kpi(get_lethality(covid_df.query("Date < '2021-01-01'")), 'Lethality in 2020')
fig.show()
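
Because the rates depend on the filters applied, it can help to check the arithmetic on a small example. The sketch below uses a made-up two-country frame and a hypothetical helper named `rates`. It assumes that lethality means deaths divided by cases (the usual case fatality ratio) and that the population is counted once per country, since it repeats on every daily row.

```python
import pandas as pd

# Made-up data: two countries, two days each (not real figures).
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2020-12-31", "2021-01-01",
                            "2020-12-31", "2021-01-01"]),
    "New_Cases_Confirmed": [100, 50, 200, 150],
    "New_Cases_Death": [2, 1, 4, 3],
    "Population": [1000, 1000, 4000, 4000],
})

def rates(df):
    """Hypothetical helper: compute the three rates for a filtered frame."""
    cases = df["New_Cases_Confirmed"].sum()
    deaths = df["New_Cases_Death"].sum()
    # Population repeats on every row, so take one value per country.
    population = df.groupby("Country")["Population"].max().sum()
    return {
        "incidence": cases / population,   # cases per inhabitant
        "mortality": deaths / population,  # deaths per inhabitant
        "lethality": deaths / cases,       # deaths per confirmed case (assumed definition)
    }

all_rows = rates(toy)                                 # 500 cases, 10 deaths, 5000 people
only_2020 = rates(toy[toy["Date"] < "2021-01-01"])    # 300 cases, 6 deaths, 5000 people
only_a = rates(toy.query("Country == 'A'"))           # 150 cases, 3 deaths, 1000 people
print(all_rows, only_2020, only_a)
```

The date filter changes the numerators but not the population, while the country filter changes both. This is why the rates are computed after filtering rather than stored as columns.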

Now let us build a bubble map with the deaths as the bubble size and the cases as the bubble color.

In [60]:
def cases_deaths_by_country(df):
    return df.groupby(['Code', 'Country'])[['New_Cases_Confirmed', 'New_Cases_Death']].sum().reset_index()

def get_bubble_map(df, title):
    # px.scatter_geo takes `title`; passing `title_text` raises
    # "TypeError: scatter_geo() got an unexpected keyword argument 'title_text'"
    fig = px.scatter_geo(df, locations="Code", color="New_Cases_Confirmed",
                     hover_name="Country", size="New_Cases_Death",
                     color_continuous_scale=px.colors.sequential.YlOrRd,
                     title=title,
                     projection="natural earth")
    return fig

In [61]:
get_bubble_map(cases_deaths_by_country(covid_df), "Test")