# COVID-19 Analysis: Visualizing and Understanding Pandemic Cases with Different Techniques

This notebook works with coronavirus disease 2019 (COVID-19) time series covering confirmed cases, reported deaths and active cases, and compares COVID-19 with other epidemics.
Data are disaggregated by country (and sometimes by subregion).
COVID-19 is caused by
[Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)](https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome_coronavirus_2) and has had a worldwide effect.

This notebook uses data from various sources to understand, analyze and visualize the changes in the number of cases through different visualization techniques and plots.

This dataset includes time-series data tracking the number of people affected by COVID-19 worldwide, including:

- confirmed tested cases of Coronavirus infection
- number of people who have reportedly died due to Coronavirus
- number of people who have reportedly recovered from it


**Note:** Data collection for **Recovered cases** is not very accurate: many countries have stopped reporting recoveries, and figures taken from multiple sources show discrepancies. Many recoveries are also never reported, so they cannot be analyzed accurately. The data reported until July 2020 is still accurate enough to be useful. Select the range from Jan 2020 to Jan 2021 to see the transition clearly in all the graphs.

Therefore, keep the following two points in mind:
- Active cases are calculated from recoveries, so they will not be correct either.
- Anything that requires recoveries in its calculation is inaccurate, such as deaths per 100 recoveries.

**Additional References** for the above point:
- https://www.dallasnews.com/news/public-health/2020/05/19/why-arent-coronavirus-recoveries-always-reported/
- https://abc11.com/nc-coronavirus-recovery-cases-update/6127051/

Most of these reports are based on the US, but the same largely holds for the rest of the world.
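
To make the dependence on recoveries concrete, here is a minimal sketch on made-up numbers (not taken from the dataset). It assumes the usual definitions `Active = Confirmed - Deaths - Recovered` and `Deaths / 100 Recovered = Deaths / Recovered * 100`; the preprocessed files may compute these columns differently.

```python
import pandas as pd

# Toy numbers, not from the dataset: recoveries stop being reported
# after the second day, so the Recovered column freezes at 50.
toy = pd.DataFrame({
    'Date': pd.date_range('2021-01-01', periods=4),
    'Confirmed': [100, 150, 200, 260],
    'Deaths': [2, 3, 4, 5],
    'Recovered': [30, 50, 50, 50],
})

# Assumed definition of active cases
toy['Active'] = toy['Confirmed'] - toy['Deaths'] - toy['Recovered']

# Assumed definition of the recovery-based ratio
toy['Deaths / 100 Recovered'] = (toy['Deaths'] / toy['Recovered'] * 100).round(2)

print(toy)
```

Once `Recovered` stops updating, `Active` keeps growing with every new confirmed case and the ratio drifts upward, even though nothing about the real recovery rate has changed.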

In [None]:
pip install folium

In [None]:
pip install plotly

In [None]:
#required packages and libraries

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

import folium

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

import math
import random
from datetime import timedelta

import warnings
warnings.filterwarnings('ignore')

# color palette
cnf ='#393e46'
dth ='#ff2e63'
rec ='#21bf73'
act ='#fe9801'


## Dataset Preparation

In [None]:
import plotly as py
py.offline.init_notebook_mode(connected = True)

In [None]:
import os

In [None]:
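# remove any previous clone; os.system() does not raise when the path is
# missing (rm -rf is silent in that case), so no try/except is needed
os.system("rm -rf Covid-19-Preprocessed-Dataset")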

In [None]:
# cloning the data 
!git clone https://github.com/laxmimerit/Covid-19-Preprocessed-Dataset.git

In [None]:
# Understanding the Data
df = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/covid_19_data_cleaned.csv', parse_dates=['Date'])
df

In [None]:
# We need to handle the NaN values which are there under the Province/State column
df['Province/State'] = df['Province/State'].fillna("")
df

In [None]:
#importing the country_daywise, countrywise, daywise dataset
country_daywise = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/country_daywise.csv', parse_dates=['Date'])
countrywise = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/countrywise.csv')
daywise = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/daywise.csv', parse_dates=['Date'])

In [None]:
country_daywise

## Worldwide Total Confirmed, Recovered Cases and Deaths

In [None]:
#looking at the number of confirmed cases
# select the column before summing so non-numeric columns are never aggregated
confirmed = df.groupby('Date')['Confirmed'].sum().reset_index()
confirmed

In [None]:
#looking at the recovered cases
# select the column before summing so non-numeric columns are never aggregated
recovered = df.groupby('Date')['Recovered'].sum().reset_index()
recovered

In [None]:
#looking at the number of death cases
# select the column before summing so non-numeric columns are never aggregated
deaths = df.groupby('Date')['Deaths'].sum().reset_index()
deaths

In [None]:
#check for any null values
df.isnull().sum()

In [None]:
df.query('Country == "US"')

## Scatterplot Visualization

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x = confirmed['Date'], y= confirmed['Confirmed'], mode='lines+markers', name='Confirmed Cases', line=dict(color='Orange',width =2)))
fig.add_trace(go.Scatter(x = recovered['Date'], y= recovered['Recovered'], mode='lines+markers', name='Recovered Cases', line=dict(color='Green', width =2)))
fig.add_trace(go.Scatter(x = deaths['Date'], y= deaths['Deaths'], mode='lines+markers', name='Deaths', line=dict(color='Red', width=2)))

fig.update_layout(title = "WorldWide Covid-19 Cases", xaxis_tickfont_size=14, yaxis= dict(title='Number of Cases'))
fig.show()

## Cases Density Animation On World Map

In [None]:
#convert date to string
df['Date'] = df['Date'].astype(str)

In [None]:
df.info()

In [None]:
#using plotly express to display a density map
fig =px.density_mapbox(df, lat='Lat', lon='Long', hover_name='Country', hover_data=['Confirmed', 'Recovered', 'Deaths'], animation_frame='Date',color_continuous_scale='Portland', radius=7, zoom=0,height=700)
fig.update_layout(title='WorldWide Covid-19 Cases with TimeLapse')
fig.update_layout(mapbox_style='open-street-map', mapbox_center_lon=0)
fig.show()

## Cases over a time period with Area Plot

In [None]:
df['Date'] =pd.to_datetime(df['Date'])
df.info()

In [None]:
# multiple columns must be selected with a list; tuple-style selection fails on recent pandas
temp = df.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
temp = temp[temp['Date']==max(temp['Date'])].reset_index(drop =True)
tm =temp.melt(id_vars = 'Date', value_vars = ['Active', 'Deaths', 'Recovered'])
fig = px.treemap(tm , path = ['variable'], values='value', height = 250, width = 800, color_discrete_sequence=[act, dth, rec])
fig.data[0].textinfo ='label+text+value'
fig.show()

In [None]:
# multiple columns must be selected with a list; tuple-style selection fails on recent pandas
temp = df.groupby('Date')[['Recovered', 'Deaths', 'Active']].sum().reset_index()
temp = temp.melt(id_vars = 'Date', value_vars =['Recovered', 'Deaths', 'Active'], var_name = 'Case', value_name = 'Count')

fig = px.area(temp, x='Date', y='Count', color='Case', height=500, title='Cases Over Time', color_discrete_sequence=[rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

## Cases visualization using Folium Maps

In [None]:
#Worldwide cases on Folium Maps
temp =df[df['Date'] == max(df['Date'])]

m =folium.Map(location =[0,0], tiles='cartodbpositron', min_zoom =1, max_zoom= 4, zoom_start=1)

for i in range(0, len(temp)):
    # fill expects a boolean; the fill color goes in fill_color. <b> is the HTML bold tag.
    folium.Circle(location = [temp.iloc[i]['Lat'], temp.iloc[i]['Long']], color='crimson', fill=True, fill_color='crimson',
                tooltip = '<li><b>Country:</b> ' + str(temp.iloc[i]['Country'])+
                          '<li><b>Province:</b> ' + str(temp.iloc[i]['Province/State'])+
                          '<li><b>Confirmed:</b> ' + str(temp.iloc[i]['Confirmed'])+
                          '<li><b>Deaths:</b> ' + str(temp.iloc[i]['Deaths']),
                radius = int(temp.iloc[i]['Confirmed'])**0.5).add_to(m)

m

## Confirmed Cases with Choropleth Map

In [None]:
fig = px.choropleth(country_daywise, locations='Country', locationmode='country names', color=country_daywise['Confirmed'],
                   hover_name = 'Country', animation_frame=country_daywise['Date'].dt.strftime('%Y-%m-%d'),
                   title = 'Cases Over Time in Different Countries', color_continuous_scale=px.colors.sequential.Inferno)

fig.update(layout_coloraxis_showscale=True)
fig.show()

## Confirmed & Death Cases using Bar plots

In [None]:
fig_c =px.bar(daywise, x='Date', y='Confirmed', color_discrete_sequence=[act])
fig_d =px.bar(daywise, x='Date', y='Deaths', color_discrete_sequence=[dth])

fig = make_subplots(rows =1, cols =2, shared_xaxes=False, horizontal_spacing= 0.1,
                   subplot_titles=('Confirmed Cases', 'Death Cases'))

fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)

fig.update_layout(height=400)
fig.show()


## Confirmed & Death Cases with Static Colormap

In [None]:
fig_c = px.choropleth(countrywise, locations='Country', locationmode='country names',
                     color=np.log(countrywise['Confirmed']), hover_name='Country', hover_data=['Confirmed'])

temp = countrywise[countrywise['Deaths']> 0]
fig_d = px.choropleth(temp, locations='Country', locationmode='country names',
                     color=np.log(temp['Deaths']), hover_name='Country', hover_data=['Deaths'])

fig = make_subplots(rows=1, cols=2, subplot_titles=['Confirmed Cases', 'Deaths'], 
                   specs=[[{'type':'choropleth'},{'type':'choropleth'}]])

fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)

fig.update(layout_coloraxis_showscale=False)

fig.show()

## Deaths & Recoveries per 100 cases

**Note:** The data is only accurate until July 2021. After that, recovery cases were no longer reported or tracked by the institutes, so it is difficult to predict numbers from them.

In [None]:
daywise.columns

In [None]:
fig1 = px.line(daywise, x='Date', y='Deaths / 100 Cases', color_discrete_sequence=[dth])
fig2 = px.line(daywise, x='Date', y='Recovered / 100 Cases', color_discrete_sequence=[rec])
fig3 = px.line(daywise, x='Date', y='Deaths / 100 Recovered', color_discrete_sequence=[rec])

fig = make_subplots(rows =1, cols=3, shared_xaxes=False, 
                   subplot_titles=('Deaths / 100 Cases','Recovered / 100 Cases','Deaths / 100 Recovered'))

fig.add_trace(fig1['data'][0], row=1,col=1)
fig.add_trace(fig2['data'][0], row=1,col=2)
fig.add_trace(fig3['data'][0], row=1,col=3)

fig.update_layout(height=400)
fig.show()

## Number of new cases per Day & countries affected

In [None]:
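# the left panel is titled "new cases", so plot 'New Cases' rather than cumulative 'Confirmed'
# (assumes daywise has a 'New Cases' column like country_daywise; check daywise.columns above)
fig_c = px.bar(daywise, x='Date', y='New Cases', color_discrete_sequence=[act])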
fig_d = px.bar(daywise, x='Date', y='No. of Countries', color_discrete_sequence=[dth])

fig =make_subplots(rows =1, cols=2, shared_xaxes=False, horizontal_spacing=0.1, subplot_titles=('No. of New Cases per Day', 'No. of Countries'))

fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)

fig.show()

## Top 15 Countries Case Analysis

In [None]:
countrywise.columns

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
top =15

#fig for confirmed cases
fig_c = px.bar(countrywise.sort_values('Confirmed').tail(top), x='Confirmed', y='Country',
              text = 'Confirmed', orientation='h', color_discrete_sequence=[cnf])

#fig for death cases
fig_d = px.bar(countrywise.sort_values('Deaths').tail(top), x='Deaths', y='Country',
              text = 'Deaths', orientation='h', color_discrete_sequence=[dth])

#fig for active cases
fig_a = px.bar(countrywise.sort_values('Active').tail(top), x='Active', y='Country',
              text = 'Active', orientation='h', color_discrete_sequence=[act])

# fig for recovered cases: commented out because of data discrepancies and missing data
# fig_r = px.bar(countrywise.sort_values('Recovered').tail(top), x='Recovered', y='Country',
#               text = 'Recovered', orientation='h', color_discrete_sequence=[rec])

# note: no figure is plotted for any metric that uses recovered cases, to avoid irregularities

#fig for deaths / 100 cases
fig_dc = px.bar(countrywise.sort_values('Deaths / 100 Cases').tail(top), x='Deaths / 100 Cases', y='Country',
              text = 'Deaths / 100 Cases', orientation='h', color_discrete_sequence=['#ff0000'])


#fig for new cases country wise
fig_nc = px.bar(countrywise.sort_values('New Cases').tail(top), x='New Cases', y='Country',
              text = 'New Cases', orientation='h', color_discrete_sequence=['#944dff'])

temp= countrywise[countrywise['Population']>1000000]
#fig for cases per million people
fig_p = px.bar(temp.sort_values('Cases / Million People').tail(top), x='Cases / Million People', y='Country',
              text = 'Cases / Million People', orientation='h', color_discrete_sequence=['#3366ff'])

#fig for 1 week changes
fig_ow = px.bar(countrywise.sort_values('1 week change').tail(top), x='1 week change', y='Country',
              text = '1 week change', orientation='h', color_discrete_sequence=['#ff6600'])

#fig for 1 week % increase
tem= countrywise[countrywise['Confirmed']>100]
fig_op = px.bar(tem.sort_values('1 week % increase').tail(top), x='1 week % increase', y='Country',
              text = '1 week % increase', orientation='h', color_discrete_sequence=['#990033'])



fig= make_subplots(rows=4 , cols=2, shared_xaxes=False, horizontal_spacing= 0.2, vertical_spacing=.05,
                  subplot_titles=('Confirmed Cases', 'Deaths Reported', 'Active Cases', 'Deaths / 100 Cases'
                                 ,'New Cases', 'Cases / Million People', '1 week change', '1 week % increase'))

fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)

fig.add_trace(fig_a['data'][0], row=2, col=1)
fig.add_trace(fig_dc['data'][0], row=2, col=2)


fig.add_trace(fig_nc['data'][0], row=3, col=1)
fig.add_trace(fig_p['data'][0], row=3, col=2)


fig.add_trace(fig_ow['data'][0], row=4, col=1)
fig.add_trace(fig_op['data'][0], row=4, col=2)


fig.update_layout(height=3000)
fig.show()

## Scatter Plot for Deaths Vs Confirmed Cases

In [None]:
countrywise.sort_values('Deaths', ascending=False).head(15)

In [None]:
top =15
fig = px.scatter(countrywise.sort_values('Deaths', ascending=False).head(top), 
                x = 'Confirmed', y='Deaths', color='Country', height=600, size='Confirmed',
                text='Country', log_x = True, log_y = True, title='Deaths vs Confirmed Cases (both axes on log10 scale)')

fig.update_traces(textposition= 'top center')
fig.update_layout(showlegend = True)
fig.update_layout(xaxis_rangeslider_visible = True)
fig.show()

## Stacked Bar Plot

In [None]:
fig = px.bar(country_daywise, x='Date', y='Confirmed', color='Country', height=600,
            title='Confirmed Cases', color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()

In [None]:
fig = px.bar(country_daywise, x='Date', y='Deaths', color='Country', height=600,
            title='Deaths', color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()

In [None]:
fig = px.bar(country_daywise, x='Date', y='New Cases', color='Country', height=600,
            title='New Cases', color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()

## Line Plot

In [None]:
fig = px.line(country_daywise, x ='Date', y='Confirmed', color='Country', height=600, title='Confirmed Cases',
             color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()

In [None]:
fig = px.line(country_daywise, x ='Date', y='Deaths', color='Country', height=600, title='Deaths',
             color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()

In [None]:
fig = px.line(country_daywise, x ='Date', y='New Cases', color='Country', height=600, title='New Cases',
             color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()

## Growth rate after 100 Cases

In [None]:
df['Date'] =pd.to_datetime(df['Date'])
df.info()

In [None]:
gt_100 = country_daywise[country_daywise['Confirmed']>100]['Country'].unique()

temp = df[df['Country'].isin(gt_100)]

temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>100]

min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']

from_100th_case = pd.merge(temp, min_date, on ='Country')
from_100th_case['N days'] = (from_100th_case['Date'] - from_100th_case['Min Date']).dt.days

fig = px.line(from_100th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 100 Cases', height=600)
fig.show()

### Growth rate after 1000 Cases

In [None]:
gt_1000 = country_daywise[country_daywise['Confirmed']>1000]['Country'].unique()
temp = df[df['Country'].isin(gt_1000)]

temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>1000]

min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']

from_1000th_case = pd.merge(temp, min_date, on ='Country')
from_1000th_case['N days'] = (from_1000th_case['Date'] - from_1000th_case['Min Date']).dt.days

fig = px.line(from_1000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 1000 Cases', height=600)
fig.show()

### Growth rate after 100K Cases

In [None]:
gt_100000 = country_daywise[country_daywise['Confirmed']>100000]['Country'].unique()
temp = df[df['Country'].isin(gt_100000)]

temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>100000]

min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']

from_100000th_case = pd.merge(temp, min_date,on ='Country')
from_100000th_case['N days'] = (from_100000th_case['Date'] - from_100000th_case['Min Date']).dt.days

fig = px.line(from_100000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 100K Cases', height=600)
fig.show()

### Growth rate after 1 Million Cases

In [None]:
gt_1000000 = country_daywise[country_daywise['Confirmed']>1000000]['Country'].unique()
temp = df[df['Country'].isin(gt_1000000)]

temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>1000000]

min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']

from_1000000th_case = pd.merge(temp, min_date,on ='Country')
from_1000000th_case['N days'] = (from_1000000th_case['Date'] - from_1000000th_case['Min Date']).dt.days

fig = px.line(from_1000000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 1Million Cases', height=600)
fig.show()

### Growth rate after 10 Million Cases

In [None]:
gt_10000000 = country_daywise[country_daywise['Confirmed']>10000000]['Country'].unique()
temp = df[df['Country'].isin(gt_10000000)]

temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>10000000]

min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']

from_10000000th_case = pd.merge(temp, min_date,on ='Country')
from_10000000th_case['N days'] = (from_10000000th_case['Date'] - from_10000000th_case['Min Date']).dt.days

fig = px.line(from_10000000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 10Million Cases', height=600)
fig.show()
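
The threshold cells above all follow the same pattern: keep the rows above a case count, find each country's first such date, and count days from it. A minimal sketch of that pattern as one reusable function, run on made-up data. The name `days_since_threshold` is hypothetical and not part of the notebook, and the sketch assumes one row per country and date.

```python
import pandas as pd

def days_since_threshold(frame, threshold):
    """Keep rows with Confirmed > threshold and add an 'N days' column
    counting days since each country first exceeded the threshold."""
    above = frame[frame['Confirmed'] > threshold].copy()
    # first date above the threshold, broadcast back to every row of that country
    first_day = above.groupby('Country')['Date'].transform('min')
    above['N days'] = (above['Date'] - first_day).dt.days
    return above

# Toy data, not from the dataset
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03'] * 2),
    'Confirmed': [50, 120, 300, 90, 95, 150],
})

result = days_since_threshold(toy, 100)
print(result)
```

Using `transform('min')` avoids the separate `min_date` frame and the merge, at the cost of being slightly less explicit about the intermediate result.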

## Tree Map Analysis

In [None]:
# Confirmed Cases
full_latest = df[df['Date'] == max(df['Date'])]

fig = px.treemap(full_latest.sort_values(by='Confirmed', ascending=False).reset_index(drop=True),
                path = ['Country', 'Province/State'], values='Confirmed', height=700,
                title='Number of Confirmed Cases',
                color_discrete_sequence=px.colors.qualitative.Dark2)

fig.data[0].textinfo= 'label+text+value'
fig.show()

In [None]:
#deaths
full_latest = df[df['Date'] == max(df['Date'])]

fig = px.treemap(full_latest.sort_values(by='Deaths', ascending=False).reset_index(drop=True),
                path = ['Country', 'Province/State'], values='Deaths', height=700,
                title='Number of Deaths',
                color_discrete_sequence=px.colors.qualitative.Dark2)

fig.data[0].textinfo= 'label+text+value'
fig.show()

## First & Last Case Report Time

In [None]:
first_date= df[df['Confirmed']>0]
first_date = first_date.groupby('Country')['Date'].agg(['min']).reset_index()


# multiple columns must be selected with a list; tuple-style selection fails on recent pandas
last_date= df.groupby(['Country', 'Date'])[['Confirmed', 'Deaths']]
last_date = last_date.sum().diff().reset_index()


mask = (last_date['Country'] != last_date['Country'].shift(1))

last_date.loc[mask, 'Confirmed'] = np.nan
last_date.loc[mask, 'Deaths'] = np.nan

last_date = last_date[last_date['Confirmed']>0]
last_date = last_date.groupby('Country')['Date'].agg(['max']).reset_index()

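# merge on Country instead of concatenating by position, so rows stay aligned
# even if a country is missing from one of the two frames
first_last = pd.merge(first_date, last_date, on='Country')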
first_last['max'] = first_last['max'] + timedelta(days=1)

first_last['Days'] = first_last['max'] - first_last['min']
first_last['Task'] = first_last['Country'] 

first_last.columns = ['Country', 'Start', 'Finish', 'Days', 'Task']

first_last = first_last.sort_values('Days')

colors = ['#' + ''.join([random.choice('0123456789ABCDEF') for j in range(6)]) for i in range(len(first_last))]


fig = ff.create_gantt(first_last, index_col ='Country', colors=colors, show_colorbar=False,
                     bar_width =0.2, showgrid_x = True, showgrid_y = True, height= 2500)
fig.show()
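
The first and last report dates above rest on a per-country daily difference. Below is a minimal sketch of that idea on made-up numbers. It uses `groupby(...).diff()` so the difference never crosses a country boundary, which is a substitute for the diff-then-mask approach in the cell above rather than the notebook's own code.

```python
import pandas as pd

# Toy cumulative counts, not from the dataset
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03'] * 2),
    'Confirmed': [1, 4, 4, 0, 2, 5],
})

toy = toy.sort_values(['Country', 'Date'])
# daily increase within each country; the first row per country is NaN
toy['New'] = toy.groupby('Country')['Confirmed'].diff()

# first date with any confirmed case, and last date on which the count rose
first_seen = toy[toy['Confirmed'] > 0].groupby('Country')['Date'].min()
last_rise = toy[toy['New'] > 0].groupby('Country')['Date'].max()

# both Series are indexed by Country, so they align by label
span = pd.DataFrame({'Start': first_seen, 'Finish': last_rise}).reset_index()
print(span)
```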

## Confirmed Cases Country & Day Wise

In [None]:
temp = country_daywise.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp = temp[temp['Country'].isin(gt_10000000)]

countries= temp['Country'].unique()


ncols = 3
nrows = math.ceil(len(countries)/ncols)

fig = make_subplots(rows = nrows, cols= ncols, shared_xaxes= False, subplot_titles= countries)

for ind, country in enumerate(countries):
    row = int((ind/ncols)+1)
    col = int((ind%ncols)+1)
    # filter both axes by country so x and y come from the same rows
    country_rows = temp[temp['Country'] == country]
    fig.add_trace(go.Bar(x=country_rows['Date'], y=country_rows['Confirmed'], name=country), row=row, col=col)
    
fig.update_layout(height=4000, title_text='Confirmed Cases in each Country')    
fig.update_layout(showlegend=False)
fig.show()


## Covid-19 Vs Other Similar Epidemics

In [None]:
# source for the numbers: Wikipedia
epidemics = pd.DataFrame({
    'epidemic' :['COVID-19', 'SARS', 'EBOLA', 'MERS', 'H1N1'],
    'start_year' : [2019, 2002, 2013, 2012, 2009],
    'end_year' :[2020, 2004, 2016, 2020, 2010],
    'confirmed' :[full_latest['Confirmed'].sum(), 8422, 28646, 2519, 6724149],
    'deaths': [full_latest['Deaths'].sum(), 813, 11323, 866, 19654]
})


In [None]:
#calculating mortality rate
epidemics['mortality'] =round((epidemics['deaths']/epidemics['confirmed'])*100, 2)

temp = epidemics.melt(id_vars='epidemic', value_vars=['confirmed', 'deaths', 'mortality'], 
                     var_name='Case', value_name='Value')

fig= px.bar(temp, x='epidemic', y='Value', color= 'epidemic', text='Value', facet_col='Case',
           color_discrete_sequence=px.colors.qualitative.Bold)

fig.update_traces(textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode= 'hide')
fig.update_yaxes(showticklabels= False)
fig.layout.yaxis2.update(matches = None)
fig.layout.yaxis3.update(matches = None)
fig.show()