<a href="https://colab.research.google.com/github/sidbannet/COVID-19_analysis/blob/master/COVID_19_dashboard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Author Bio

[LinkedIn profile](https://www.linkedin.com/in/sidban)

[Resume](https://docs.google.com/document/d/1uVc9le7LM2WMmGM4ub9w2uI1FY7I63h7SBxNGyIItWc/edit?usp=sharing)

[GitHub](https://github.com/sidbannet?tab=repositories)

---
I develop **high-performance computation models** to understand *turbulent flow*, *multi-phase flow* and *combustion flames*. I apply **data science** to accelerate design innovations in *propulsion* devices.

I received a **PhD** from the **University of Wisconsin - Madison** in 2011 with a major in **Mechanical and Chemical Engineering** and a distributed minor in *Mathematics*, *Statistics* and *Computer Science*.

I received recognition for my work in clean propulsion innovation from the [United States Department of Energy](https://www.energy.gov/eere/vehicles/vehicle-technologies-office) and [Dr. Steven Chu](https://en.wikipedia.org/wiki/Steven_Chu).

# About COVID-19 tracker


---
This is a 2019 Novel Coronavirus visual dashboard operated by [Siddhartha Banerjee](https://www.linkedin.com/in/sidban) using data published by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The JHU data repository is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).

**Data Sources**:

*   World Health Organization (WHO): https://www.who.int/
*   DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia
*   BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/
*   National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
*   China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
*   Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
*   Macau Government: https://www.ssm.gov.mo/portal/
*   Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
*   US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
*   Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html
*   Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance
*   European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
*   Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
*   Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
*   1Point3Acres: https://coronavirus.1point3acres.com/en
*   Worldometers: https://www.worldometers.info/coronavirus/

Additional Information about the Visual Dashboard: https://systems.jhu.edu/research/public-health/ncov/

**Contact Me**: 
sidban@uwalumni.com


# Clone the Git repository

---

*   Clone code and data repository
*   Set up environment variables

In [1]:
# Clone repository
!git clone https://github.com/sidbannet/COVID-19_analysis.git
%cd ./COVID-19_analysis
!git submodule init
!git submodule update --remote

Cloning into 'COVID-19_analysis'...
remote: Enumerating objects: 208, done.
remote: Counting objects: 100% (208/208), done.
remote: Compressing objects: 100% (124/124), done.
remote: Total 569 (delta 119), reused 159 (delta 81), pack-reused 361
Receiving objects: 100% (569/569), 14.45 MiB | 22.45 MiB/s, done.
Resolving deltas: 100% (327/327), done.
/content/COVID-19_analysis
Submodule 'JHU_repo' (https://github.com/sidbannet/COVID-19.git) registered for path 'JHU_repo'
Cloning into '/content/COVID-19_analysis/JHU_repo'...
Submodule path 'JHU_repo': checked out '703bda857cb0adbc804818ec090fed39627fb86e'


# Load packages
---
*   Analysis tools
*   Plotting tools

In [0]:
# Import necessary modules
from tools import collection as cll
import plotly.tools as tls
from plotly.offline import iplot
import plotly.express as px

# Parse data from the database
---

*   Set up classes containing the data and methods to parse data
*   Parse the data
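
The internals of `cll.DataClass` are not shown in this notebook. Purely as an illustration of what the parsing step may involve, the sketch below reshapes a small inline sample, laid out like the JHU CSSE wide time-series CSV files (one row per region, one column per date), into one record per region and date. The sample rows, the region names and the `wide_to_long` helper are hypothetical and are not part of `tools.collection`.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the JHU CSSE wide layout; all values are made up
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Country A,10.0,20.0,0,0,2
State X,Country B,30.0,40.0,5,8,13
"""

# Columns that describe the region rather than a date
META_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")

def wide_to_long(text):
    """Turn one-column-per-date rows into one record per (region, date)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in META_COLUMNS:
                continue
            records.append({
                "country": row["Country/Region"],
                "state": row["Province/State"],
                "date": datetime.strptime(column, "%m/%d/%y").date(),
                "confirmed": int(value),
            })
    return records

records = wide_to_long(SAMPLE)
print(records[0])
```

A long layout like this is what the animated plots later in the notebook need, since each animation frame selects the rows for a single date.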



In [0]:
# Set up the data class and parse the database
d = cll.DataClass()
d.parse()

In [0]:
d._parse_timeseries_()
df_us = d.df_geo_us

Optional: plot the rate of increase in COVID-19 cases against the total number of cases to check the trend.
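
One way to build such a comparison is to pair the smoothed number of new cases per day with the cumulative total on the same day. The sketch below assumes that `d.__window__` is the length, in days, of a trailing moving average; that reading is a guess based on the attribute name. The helpers `daily_increase` and `trailing_mean` and the sample counts are hypothetical.

```python
def daily_increase(cumulative):
    """New cases per day, computed from a cumulative series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

def trailing_mean(values, window):
    """Average each value with up to `window - 1` values before it."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up cumulative case counts for illustration
total = [10, 20, 40, 80, 160, 320, 480, 560, 600]
new_cases = daily_increase(total)
smoothed = trailing_mean(new_cases, window=5)

# Pair each smoothed daily increase with the total on the same day
for cases, rate in zip(total[1:], smoothed):
    print(f"total={cases:4d}  new per day (5-day mean)={rate:7.1f}")
```

While growth is exponential, new cases stay proportional to the total, so the points follow a straight line on log-log axes and drop below it once the outbreak slows.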

In [0]:
#@title
# Set up figure objects
d.__window__ = 5
fig, ax = d.plots()
for axes in ax.flat:
    axes.legend()
fig.set_size_inches(w=24, h=12)

# Plot COVID-19 trends
---
Plot COVID-19 time series data by country and by state, showing:
*   Number of confirmed cases
*   Number of deaths from COVID
*   Number of recovered from COVID

These variables are plotted on a log scale against days since the initial outbreak, so that exponential growth in the pandemic appears as a straight line.
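
The alignment to days since the initial outbreak can be sketched as follows, assuming `n_outbreak` is the confirmed-case count that marks day zero for a region. That is an interpretation of the parameter name, not something the notebook confirms. `days_since_outbreak` and the sample series are hypothetical.

```python
import math

def days_since_outbreak(cumulative, n_outbreak):
    """Drop days before the count first reaches n_outbreak; index the rest from 0."""
    for start, count in enumerate(cumulative):
        if count >= n_outbreak:
            return list(enumerate(cumulative[start:]))
    return []  # the region never reached the threshold

# Made-up series that doubles every day
series = [125, 250, 500, 1000, 2000, 4000]
aligned = days_since_outbreak(series, n_outbreak=500)

# On a log scale, a constant doubling time gives equally spaced points
for day, count in aligned:
    print(day, count, round(math.log10(count), 3))
```

Shifting every region to a common day zero lets outbreaks that started on different calendar dates be compared on the same axis.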

In [0]:
# Make some meaningful timeseries plots
fig, ax = d.plots_timeseries(
  n_outbreak=500, n_filter_country=10000, n_filter_state=5000)
fig.set_size_inches(w=24, h=12)
_ = [axes.set_ylim([10, 50000]) for axes in ax[:, 1].flat]
_ = ax[0, 0].set_xlim([0, 50])
_ = ax[0, 0].get_legend().remove()
_ = ax[0, 1].get_legend().remove()
_ = ax[1, 0].get_legend().remove()
_ = ax[1, 1].get_legend().remove()

In [0]:
# Convert and plot in plotly
plotly_fig = tls.mpl_to_plotly(fig) 
iplot(plotly_fig)

# Global spread of COVID-19

In the `geoscatter` animation below, the bubble size represents the reported number of COVID-19 cases. The color of the bubble represents the daily growth rate in the number of cases. Each animation frame represents one date, starting from Jan 22, 2020.
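
The color range of 1.0 to 2.0 used in the plot cells below suggests that `rate` is a day-over-day ratio of case counts, where 1.0 means no growth and 2.0 means daily doubling. This is an inference, not something the notebook states. Under that assumption, a minimal sketch of the calculation (`daily_growth_ratio` is a hypothetical helper, and the counts are made up):

```python
def daily_growth_ratio(cumulative):
    """Ratio of each day's count to the previous day's; 1.0 where undefined."""
    ratios = [1.0]  # the first day has no previous day to compare against
    for previous, current in zip(cumulative, cumulative[1:]):
        # Fall back to 1.0 when the previous count is zero
        ratios.append(current / previous if previous > 0 else 1.0)
    return ratios

# Made-up cumulative confirmed cases for one country
confirmed = [0, 4, 8, 12, 12]
print(daily_growth_ratio(confirmed))
```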

In [0]:
# Data frame customized for plotly express geo-scatter
df_global = d.df_global.copy()
date_time = [str(date) for date in df_global.date]
date_str = [str.split(date, ' ')[0] for date in date_time]
df_global['Date'] = date_str

In [0]:
# Geo scatter of confirmed cases
fig = px.scatter_geo(df_global, locations="iso_alpha", color="rate",
                     color_continuous_scale='jet', range_color=[1.0, 2.0],
                     hover_name="country", size="confirmed",
                     animation_frame="Date",
                     title='Confirmed case',
                     size_max=int(80),
                     width=2000, height=1000,
                     projection="natural earth")
fig.show()

In [0]:
# Geo scatter of deaths
fig = px.scatter_geo(df_global, locations="iso_alpha", color="rate",
                     color_continuous_scale='jet', range_color=[1.0, 2.0],
                     hover_name="country", size="death",
                     animation_frame="Date",
                     title='Deaths',
                     size_max=int(80),
                     width=2000, height=1000,
                     projection="natural earth")
fig.show()

# Animated bubble map of US

Animated COVID-19 dashboard for the US. Bubbles are sized by the population-normalized number of cases and deaths in each county.


---

A minor correction is made to the population of [Dukes and Nantucket, Massachusetts, US](https://goo.gl/maps/wC7xFAh2zyVoxM2S6) in the database, based on available data. The population of Dukes and Nantucket is corrected to [17352](https://www.sec.state.ma.us/census2020/dukes-nantucket.html).
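
For reference, normalizing to cases per million residents divides the confirmed count by the population and multiplies by one million. A minimal sketch follows; only the population figure comes from the text above, while the case count and the `cases_per_million` helper are hypothetical.

```python
import math

def cases_per_million(confirmed, population):
    """Confirmed cases per one million residents, rounded down."""
    if population <= 0:
        return 0  # avoid dividing by a missing or zero population
    return math.floor(confirmed / population * 1_000_000)

# 17352 is the corrected population; 25 is a made-up case count
print(cases_per_million(25, 17352))
```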

In [0]:
#@title
# Boolean mask selecting the combined Dukes and Nantucket entry
mask = df_us.Key == 'Dukes and Nantucket,Massachusetts,US'
df_us.loc[mask, 'Population'] = 17352

# Recompute cases per million with the corrected population
# (the 1e6 factor matches the column name and the Norm_Death calculation)
df_us.loc[mask, 'Number_Cases_per_1mil'] = \
    df_us.loc[mask, 'Confirmed'] / df_us.loc[mask, 'Population'] * 1e6

# Round every entry down to a whole number of cases per million
df_us['Number_Cases_per_1mil'] = cll.np.floor(df_us['Number_Cases_per_1mil'])

In [0]:
#@title
date_time = [str(date) for date in df_us.Date]
date_str = [str.split(date, ' ')[0] for date in date_time]
df_us.Date = date_str

**Animated geo-scatter map of confirmed cases**

Animation showing the spread of COVID-19 across locations in the USA, based on officially reported data. Bubble size is based on normalized confirmed cases, and color shows the spread rate over a 3-day period.

In [0]:
fig = px.scatter_geo(df_us,
                     lat="Lat", lon="Long",
                     color="Rate",
                     color_continuous_scale='jet', range_color=[1.0, 2.0],
                     hover_name="Key", size="Number_Cases_per_1mil",
                     animation_frame="Date",
                     title='Confirmed Cases per 1 mil population',
                     size_max=int(8000000),
                     width=2000, height=1000,
                     scope = 'usa',
                     projection="albers usa")
fig.show()

**Animated geo-scatter map of deaths**

Animation showing the spread of COVID-19 across locations in the USA, based on reported deaths from COVID-19. Bubble size is based on the normalized number of deaths, and color shows the reported mortality rate. The reported mortality rate is computed from the deaths and confirmed cases reported by each county.
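
The color range of 1.0 to 10.0 in the plot cell below suggests that `Mortality` is expressed as a percentage of confirmed cases. This is an inference; the notebook does not show how the column is computed. Under that assumption, a minimal sketch (`reported_mortality_percent` is a hypothetical helper, and the counts are made up):

```python
def reported_mortality_percent(deaths, confirmed):
    """Deaths as a percentage of confirmed cases; 0.0 when no cases are reported."""
    if confirmed <= 0:
        return 0.0  # no confirmed cases, so the ratio is undefined
    return deaths * 100.0 / confirmed

# Made-up county counts: 5 deaths among 200 confirmed cases
print(reported_mortality_percent(5, 200))
```

Because it depends on confirmed cases, this figure shifts with local testing coverage and should be read as a reported ratio rather than a true fatality rate.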

In [0]:
# Doing some pre-analysis
df_us['Norm_Death'] = (df_us.Death / (df_us.Population + 0.0001)) * 1e6

In [0]:
fig = px.scatter_geo(df_us,
                     lat="Lat", lon="Long",
                     color="Mortality",
                     color_continuous_scale='jet', range_color=[1.0, 10.0],
                     hover_name="Key", size="Norm_Death",
                     animation_frame="Date",
                     title='Deaths per million in county',
                     size_max=int(8000000),
                     width=2000, height=1000,
                     scope = 'usa',
                     projection="albers usa")
fig.show()