# **A Data Science Primer in Pandemic Modelling for Everyone**
## Visualizing India's COVID19 Response

<img src="https://thelogicalindian.com/h-upload/2020/03/25/170266-locweb.jpg" width="500" />

 #### Image Courtesy: [Pixabay](https://pixabay.com/illustrations/virus-pathogen-infection-biology-4931041/?fbclid=IwAR0NV63gE8MmLUGnSsZ9tWb2TC7_xK_OTqmqxBAXuC_jdYz-T9pvSB0dPqk) 
    


## Parag Mantri, PhD
### Principal Scientist
![](https://upload.wikimedia.org/wikipedia/en/4/4d/International_School_of_Engineering_Logo.png)


* [Data and Libraries](#LibDat)
    
1. Standard Data
    * [Basic Bar Chart](#BasicBarChart)
        - [Include Population](#IncludePop)
    * [Data on Map](#WorldMap)
    
2. Time Series Data
    * [TimeSeries](#TimeSeries)

3. Test Data
    * [Testing vs Positive](#DoublingTime)

<a id="LibDat"></a>
## Libraries and Data

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly.express as px

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

# Input data files are available in the read-only "../input/" directory.
# Running this cell (by clicking Run or pressing Shift+Enter) lists all files under the input directory.

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<a id="BasicBarChart"></a>
# Basic Bar Chart Comparison

In [None]:
df=pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv')
df.head()


In [None]:
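# Parse the dates so the latest observation is found chronologically, not by string comparison
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
df_latest = df[df['ObservationDate'] == df['ObservationDate'].max()]
# Sum the provincial rows into one row per country, keeping only the count columns
df_latest = df_latest.groupby('Country/Region')[['Confirmed','Deaths','Recovered']].sum().reset_index()
df_latest.head()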
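
The snapshot above depends on picking the most recent `ObservationDate`. A minimal sketch, using made-up dates and assuming they are stored as `MM/DD/YYYY` text, of how the text maximum and the chronological maximum can disagree once the data spans a year boundary:

```python
import pandas as pd

# Made-up dates in MM/DD/YYYY form, spanning a year boundary
dates = pd.Series(["12/30/2020", "12/31/2020", "01/01/2021"])

# Comparing the raw strings is lexicographic, character by character
latest_as_text = dates.max()

# Parsing first makes the comparison chronological
latest_as_date = pd.to_datetime(dates, format="%m/%d/%Y").max()

print("Latest by text comparison:", latest_as_text)
print("Latest by date comparison:", latest_as_date.date())
```

For data that stays within a single calendar year the two agree, which is why the problem is easy to miss.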

In [None]:
df_deaths_top10=df_latest.sort_values("Deaths",ascending=False).head(10)
plt.figure(figsize=(12,10))
sns.barplot(x='Country/Region',y='Deaths',data=df_deaths_top10)
plt.xticks(rotation=30)
plt.show()

<a id="IncludePop"></a>
## Include Population

In [None]:
popdf=pd.read_csv('/kaggle/input/population-by-country-2020/population_by_country_2020.csv')
popdf.head()


In [None]:
popdf_select=popdf.loc[popdf['Country (or dependency)'].isin(['United States',
                                                              'United Kingdom',
                                                              'Italy',
                                                              'France',
                                                              'Spain',
                                                              'India',
                                                              'China',
                                                              'Pakistan',
                                                              'South Korea',
                                                              'Iran'])]
popdf_select=popdf_select.replace({'Country (or dependency)': {'United States': 'US', 'United Kingdom': 'UK'}})
popdf_select=popdf_select.rename(columns={"Country (or dependency)": "Country/Region"})
popdf_select


In [None]:
df_deaths_select=df_latest.loc[df_latest['Country/Region'].isin(['US',
                                                                 'UK',
                                                                 'Italy',
                                                                 'France',
                                                                 'Spain',
                                                                 'India',
                                                                 'Mainland China',
                                                                 'Pakistan',
                                                                 'South Korea',
                                                                 'Iran'])]
df_deaths_select=df_deaths_select.replace({'Country/Region': {'Mainland China': 'China'}})
df_deaths_select

In [None]:
merged_df = df_deaths_select.merge(popdf_select, how = 'inner', on = ['Country/Region'])
merged_df['DPM']=merged_df['Deaths']/merged_df['Population (2020)']*1e6
merged_df=merged_df.sort_values('DPM',ascending=False)
merged_df
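
An inner merge silently drops any country whose name is spelled differently in the two tables, which is why the names were harmonised before merging. The sketch below, on toy frames with made-up numbers, shows one possible way to surface such mismatches using the `indicator` option of `DataFrame.merge`, and then computes deaths per million (DPM) for the rows that did match. The frame names `deaths`, `population`, `check` and `matched` are illustrative only.

```python
import pandas as pd

# Toy frames with made-up numbers; the column names mirror the ones used above
deaths = pd.DataFrame({"Country/Region": ["US", "UK", "Mainland China"],
                       "Deaths": [100, 25, 10]})
population = pd.DataFrame({"Country/Region": ["US", "UK", "China"],
                           "Population (2020)": [1_000_000, 500_000, 2_000_000]})

# A left merge with indicator=True keeps every deaths row and records
# in the "_merge" column whether a population row was found for it
check = deaths.merge(population, how="left", on="Country/Region", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "Country/Region"].tolist()
print("Countries without a population match:", unmatched)

# Deaths per million for the rows that did match
matched = check[check["_merge"] == "both"].copy()
matched["DPM"] = matched["Deaths"] / matched["Population (2020)"] * 1e6
print(matched[["Country/Region", "DPM"]])
```

Any name that appears in the unmatched list needs a `replace` entry on one side or the other before the inner merge is trusted.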

In [None]:
plt.figure(figsize=(12,10))
sns.barplot(x='Country/Region',y='DPM',data=merged_df)
plt.ylabel('Deaths per million')
plt.show()

<a id="WorldMap"></a>
# World Map

In [None]:
fig = px.choropleth(df_latest, locations="Country/Region",
                    color=df_latest["Deaths"], 
                    hover_name="Country/Region", 
                    hover_data=["Deaths"],
                    locationmode="country names")

fig.update_layout(title_text="COVID-19 Deaths Heat Map")
fig.update_coloraxes(colorscale="reds")

fig.show()
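
Death counts differ by orders of magnitude between countries, so a linear colour scale tends to leave most of the map pale. One possible remedy, sketched here on made-up counts, is to colour by a log-transformed column. The column name `LogDeaths` is an assumption for illustration and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Made-up death counts spanning several orders of magnitude
toy = pd.DataFrame({"Country/Region": ["A", "B", "C", "D"],
                    "Deaths": [0, 9, 999, 99_999]})

# log10(1 + x) keeps zero counts finite and compresses the large values
toy["LogDeaths"] = np.log10(toy["Deaths"] + 1)
print(toy)
```

Passing `color="LogDeaths"` to `px.choropleth` in place of the raw counts would then shade countries by order of magnitude, while `hover_data=["Deaths"]` can still show the untransformed numbers.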

<a id="TimeSeries"></a>
# Time Series

In [None]:
df1=pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_deaths.csv')
df1.head()

In [None]:
# Sum the provincial rows into one row per country; the summed coordinates are meaningless, so drop them
df1=df1.groupby(['Country/Region']).sum(numeric_only=True).drop(['Lat','Long'],axis=1)
df1.head()

In [None]:
df2=df1.loc[df1.index.isin(['India','US','Pakistan','China','Italy','Sri Lanka','Singapore','Korea, South'])]
plt.figure(figsize=(12,8))
upto_days=90
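for country in df2.index:
    death = df2.loc[country].values
    # Align countries on a common start: keep only days with more than 1 cumulative death
    death = death[death>1][:upto_days]
    day = np.arange(len(death))
    plt.plot(day,death,label = country)

plt.yscale("log")
plt.xlabel("Days since deaths first exceeded 1")
plt.ylabel("Cumulative deaths (log scale)")
plt.legend(loc = "upper left")
plt.tick_params(labelsize = 14)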
plt.show()
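
The loop above compares countries by days since their outbreak began rather than by calendar date. A minimal sketch of that alignment step on made-up cumulative counts for two hypothetical countries:

```python
import numpy as np

# Made-up cumulative death counts by calendar day for two hypothetical countries
cumulative = {
    "Country A": np.array([0, 0, 1, 2, 5, 11, 20]),
    "Country B": np.array([0, 0, 0, 0, 3, 6, 14]),
}

aligned = {}
for name, deaths in cumulative.items():
    # Keep only the days on which the count exceeds 1,
    # so position 0 is the first such day for every country
    aligned[name] = deaths[deaths > 1]

for name, series in aligned.items():
    print(name, series.tolist())
```

After the filter, both series start at position 0 even though Country B's outbreak began two calendar days later, so their growth curves can be overlaid directly.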

<a id="DoublingTime"></a>
# Testing


In [None]:
df_test=pd.read_csv('/kaggle/input/covid19-tests-conducted-by-country/Tests_conducted_11May2020.csv')
df_test_select=df_test[df_test['Country'].isin(['United States',
                                                'United Kingdom',
                                                'Italy',
                                                'India',
                                                'Pakistan',
                                                'South Korea',
                                                'Iran'])]
df_test_select

In [None]:

fig,ax=plt.subplots(figsize=(12,10))
sns.scatterplot(x='Tested /millionpeople',y='Positive /millionpeople',hue='Country',size='Tested',sizes=(100,1500),data=df_test_select)

h,l = ax.get_legend_handles_labels()

# COLOR LEGEND (FIRST 8 ITEMS)
col_lgd = plt.legend(h[:8], l[:8], loc=3, 
                     bbox_to_anchor=(0., 1.12, 1., 1.12), prop={'size': 15},ncol=2)

# SIZE LEGEND (LAST 5 ITEMS)
#size_lgd = plt.legend(h[-5:], l[-5:], loc='lower center', borderpad=1.6, prop={'size': 20},
#                      bbox_to_anchor=(0.5,-0.45), fancybox=True, shadow=True, ncol=5)

# ADD FORMER (OVERWRITTEN BY LATTER)
#plt.gca().add_artist(col_lgd)

plt.title('Tested vs Positive per Million (Size = Total Tests)',size=24)
ax.tick_params(labelsize=18,size=2)
plt.xlabel('Tests per Million',size=24)
plt.ylabel('Positives per Million',size=24)
plt.show()
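
The two axes of the scatter plot are both normalised by population, and their ratio is the share of tests that came back positive. The sketch below works through that arithmetic on made-up figures for two hypothetical countries. The `Positive` and `Population` columns and the `PositivityRate` name are assumptions for illustration; only `Tested` and the two per-million column names come from the cell above.

```python
import pandas as pd

# Made-up figures for two hypothetical countries
tests = pd.DataFrame({"Country": ["A", "B"],
                      "Tested": [200_000, 50_000],
                      "Positive": [10_000, 5_000],
                      "Population": [10_000_000, 50_000_000]})

# Normalise by population so large and small countries are comparable
tests["Tested /millionpeople"] = tests["Tested"] / tests["Population"] * 1e6
tests["Positive /millionpeople"] = tests["Positive"] / tests["Population"] * 1e6

# Share of tests that were positive; population cancels out of this ratio
tests["PositivityRate"] = tests["Positive"] / tests["Tested"]
print(tests[["Country", "Tested /millionpeople", "Positive /millionpeople", "PositivityRate"]])
```

In this toy example Country A tests twenty times more per million people than Country B yet has half the positivity rate, which is the kind of contrast the scatter plot is meant to expose.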


# Instructor Profile

<img src="https://lh3.googleusercontent.com/pw/ACtC-3d1oSRtRbmd3LYkzC7ZsmxDifJCmxItOiuSKNw5whuGgzgVeHdQAp9Pc-NjphoqvpSLT5LCUd8IFPTEfFxHk4DFoswEPXB52ZrcMdWp2Nn4ToXa12ckbasuIW-jnvTfDgbYpG_d5oBqYnXT8Nixg1ZA=w596-h789-no?authuser=0" width="300" height="300" />

## Experience
* Principal Scientist, INSOFE
* Lead Research Engineer, GE Global Research
* Senior Engineer, Belcan, India and Belcan, US
* Assistant Professor, Aerospace Engineering, University of Petroleum & Energy Studies

## Education
* PhD Aerospace Engineering, North Carolina State University
* MS Mechanical Engineering, Tufts University
* BE Mechanical Engineering, Osmania University

## Recognition

* Six Sigma Black Belt
* 6 filed (2 granted) patents
* 7 international conference and journal publications