In [None]:
import pandas as pd
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.optimize import minimize
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
from IPython.display import Image
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import DBSCAN
from scipy import integrate, optimize
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import datetime as dt
import matplotlib.dates as mdates
try:
    from pandas.errors import SettingWithCopyWarning  # newer pandas
except ImportError:
    from pandas.core.common import SettingWithCopyWarning  # older pandas

warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=SettingWithCopyWarning)
import sys

if not sys.warnoptions:
    warnings.simplefilter("ignore")

With about 1100 COVID-19 infections registered in India so far, the count still falls far short of what a pandemic outbreak can do to a densely populated country. In this study, we aim to arrive at a statistically reliable estimate of the number of potential infections in India up to the lockdown, by comparing infection parameters in other countries and their dependence on temperature and relative humidity. The parameters obtained for 4 Indian states are then used as inputs to our agent-based model to simulate the number of cases under lockdown and no-lockdown scenarios.

The first case of what has since become known as COVID-19 dates back to November 17th, 2019, and was confirmed by December 7th. The disease caused by infection with SARS-CoV-2 is termed COVID-19, short for coronavirus disease 2019. Initially it was limited to the city of Wuhan in the Hubei province of China. However, a lack of checks on human-to-human transmission resulted in a worldwide pandemic. As of March 29, 2020, the number of confirmed cases stands at 723328 (Microsoft Bing Covid Tracker, 6.30am GMT March 30, 2020), of which 537332 (74.29%) are active, 151991 (21.01%) have recovered and 34005 (4.70%) have died. In India, often dubbed the world’s largest democracy, the present count of confirmed cases stands at 1161, with 1031 active cases, 102 recoveries and 28 deaths. A previous study attempted to portray the behaviour of the COVID-19 outbreak across different countries and compared India’s position with that of other countries based on some growth parameters.

The first case of COVID-19 in India was registered on 30th January, 2020. Since then, over 1000 active cases have been registered in the country. Making projections for a country with a population of more than 130 crore is no mean task, and we have to proceed with extreme caution while interpreting the low number of cases. One of the most effective ways of trying to stop the spread of COVID-19 is proactive testing. However, India lags far behind other countries in tests conducted per million people, with the present figure standing at 19 (TOI). This might have contributed to the slow growth in recorded COVID-19 cases in India compared to other countries. According to a World Bank report, the number of hospital beds per thousand people in India stands at 0.7. In comparison, China has 4.2, France has 6.5, Germany has 8.3 and the world average is 2.7. If India goes through Stage III of the pandemic and enters Stage IV, the sheer rise in the number of cases would be too difficult to control, and there would no longer be a clear endpoint for the pandemic.


The government, however, has made several interventions since the beginning of March to curb human-to-human transmission of the disease.

In this study, we use the time series data from the CSSE COVID-19 dataset. Using the same dataset up to 22nd March, 2020, we showed in our [previous study](https://www.kaggle.com/subhamrath/covid-parameter-study-india-others) that the COVID-19 growth parameters of India were most similar to those of countries like Italy and Iran.

The present study extends our previous work: we now aim to estimate the number of Indians who could potentially have been infected with COVID-19 up to the lockdown. We also present two scenarios for the number of infections, with and without a lockdown.
From the datasets, we selected places and regions where the number of confirmed cases to date is greater than 100. We excluded China because of large discrepancies in its data, and we excluded India because it is our country of interest and including it would lead to overfitting. Assuming that the susceptible population is at least 100 times the total number infected to date, we fit an SIR model to each time series to estimate the parameters.

Reading the dataset


Datasets: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data

In [None]:
#DATASET CONTAINING INFORMATION OF POPULATION DENSITY , AVG. TEMPERATURE , RELATIVE HUMIDITY
geo_data = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/place_list_2k.csv', encoding = "ISO-8859-1")
geo_data = geo_data.set_index(['Unnamed: 0'])
geo_data_train = geo_data.iloc[:89,:]
geo_data_test = geo_data.iloc[89:]


In [None]:
#READING COVID19 confirmed, death and recovery dataset 
covid_cnf_ts = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/time_series_covid19_confirmed_global.csv')
covid_de_ts = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/time_series_covid19_deaths_global.csv')
covid_re_ts = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/time_series_covid19_recovered_global.csv')

In [None]:
plt.style.use('fivethirtyeight')

In [None]:
date_columns = list(covid_cnf_ts.iloc[:,4:].columns)
covid_cnf_ts = covid_cnf_ts[covid_cnf_ts['Country/Region'] != 'Cruise Ship']
covid_de_ts = covid_de_ts[covid_de_ts['Country/Region'] != 'Cruise Ship']
covid_re_ts = covid_re_ts[covid_re_ts['Country/Region'] != 'Cruise Ship']

# covid_cnf_ts_grouped = covid_cnf_ts.groupby('Country/Region')[date_columns].sum()
# covid_de_ts_grouped = covid_de_ts.groupby('Country/Region')[date_columns].sum()
# covid_re_ts_grouped = covid_re_ts.groupby('Country/Region')[date_columns].sum()

Selecting places/regions where confirmed cases to date exceed 100 (with more than 10 recoveries and at least one death)

In [None]:
covid_details = pd.concat([covid_cnf_ts.iloc[:,-1],covid_re_ts.iloc[:,-1], covid_de_ts.iloc[:,-1]], axis = 1 )
covid_details.columns =['Confirmed', 'Recovery', 'Death']
covid_details = covid_details[(covid_details['Confirmed'] > 100) & (covid_details['Recovery'] > 10) & (covid_details['Death']>0)]


In [None]:
# 'Total' sums each row over all dates and is used only to sort the places
covid_cnf_aggregate = covid_cnf_ts.iloc[:,4:].T.sum()
covid_cnf_ts['Total'] = covid_cnf_aggregate
sort_cnf = covid_cnf_ts.sort_values(by = ['Total'], ascending = False)
affected_countries_confirmed = covid_cnf_ts
# .ix has been removed from pandas; .loc does the same label-based reordering
affected_countries_confirmed = affected_countries_confirmed.loc[sort_cnf.index]
# column -2 is the latest date (column -1 is 'Total')
affected_countries_confirmed = affected_countries_confirmed[affected_countries_confirmed.iloc[:,-2]>100]

covid_de_aggregate = covid_de_ts.iloc[:,4:].T.sum()
covid_de_ts['Total'] = covid_de_aggregate
sort_de = covid_de_ts.sort_values(by = ['Total'], ascending = False)
affected_countries_death = covid_de_ts
affected_countries_death = affected_countries_death.loc[sort_de.index]
# filter the sorted frame so that the ordering is kept
affected_countries_death = affected_countries_death[affected_countries_death.iloc[:,-2]> 1]

covid_re_aggregate = covid_re_ts.iloc[:,4:].T.sum()
covid_re_ts['Total'] = covid_re_aggregate
sort_re = covid_re_ts.sort_values(by = ['Total'], ascending = False)
affected_countries_recovery = covid_re_ts
affected_countries_recovery = affected_countries_recovery.loc[sort_re.index]
affected_countries_recovery = affected_countries_recovery[affected_countries_recovery.iloc[:,-2] > 10]



We do not consider the data for China because of large discrepancies in the dataset. We also exclude India, as it is our country of interest (to avoid overfitting).

In [None]:
confirmed_cases_df = affected_countries_confirmed[(affected_countries_confirmed['Country/Region'] != 'China') &  
                                                  (affected_countries_confirmed['Country/Region']!= 'India')]
country_details = confirmed_cases_df.iloc[:,:4]
# country_details.to_csv('place_list_1.csv')

Assumption 1: The susceptible population is 100 times the total number infected to date.

In [None]:
confirmed_cases_df['Total'] = 100*confirmed_cases_df.iloc[:,-2]


The SIR model consists of three compartments: S for the number of susceptible, I for the number of infectious, and R for the number of recovered or deceased (or immune) individuals. Since the number of susceptible, infected and recovered individuals may vary over time (even if the total population size remains constant), we write them as functions of time t: S(t), I(t) and R(t). The model is expressed by the following set of nonlinear differential equations.
$$\frac{dS}{dt} = -\frac{\beta I S}{N}$$

$$\frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I$$

$$\frac{dR}{dt} = \gamma I$$

where S is the stock of susceptible population, I is the stock of infected, R is the stock of recovered population and N is the sum of these three.
β is the average number of contacts per person per time, multiplied by the probability of disease transmission in a contact between a susceptible and an infectious subject.
γ is simply the rate of recovery or mortality, that is, number of recovered or dead during one day divided by the total number of infected on that same day, supposing “day” is the time unit.
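
As a minimal, self-contained sketch of the equations above (separate from the fitting code used in this notebook), the system can be integrated numerically with `scipy.integrate.odeint`. The values of `beta`, `gamma`, `N` and the initial conditions are illustrative assumptions, not estimates from the data.

```python
import numpy as np
from scipy.integrate import odeint

def sir_rhs(y, t, beta, gamma, N):
    """Right-hand side of the SIR equations: (dS/dt, dI/dt, dR/dt)."""
    S, I, R = y
    dS = -beta * I * S / N
    dI = beta * I * S / N - gamma * I
    dR = gamma * I
    return dS, dI, dR

# Illustrative values only (assumptions, not fitted parameters)
N = 10_000.0
beta, gamma = 0.5, 0.1
y0 = (N - 1.0, 1.0, 0.0)          # one infected person at t = 0
t = np.linspace(0, 100, 101)      # time in days

S, I, R = odeint(sir_rhs, y0, t, args=(beta, gamma, N)).T

print("ratio beta/gamma:", beta / gamma)
print("peak number infected:", round(I.max()), "on day", int(t[I.argmax()]))
```

Because the three derivatives sum to zero, S + I + R stays equal to N throughout the integration, which is a quick sanity check on any implementation.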


In [None]:
#SIR MODEL 
def SIR_model(SIR_info, time, beta, gamma):
    '''Right-hand side of the SIR equations.
    Returns (dS/dt, dI/dt, dR/dt) for effective contact rate beta and
    recovery rate gamma; reads the global `population` as N.'''
    
    S = SIR_info[0]
    I = SIR_info[1]
    R = SIR_info[2]
    
    St = - beta*S*I/population
    Rt = gamma*I
    It = -(St+Rt)
    return St, It, Rt

def ode_solution(time, beta, gamma):
    '''Solves the SIR differential equations and returns I(t).
    Initial conditions are read from global variables.'''
    return integrate.odeint(SIR_model, (initial_susceptible, initial_infected, initial_recovered),
                            time,args=(beta, gamma))[:,1]


#Early epidemic growth model Chowell, 2016

def chowell_model(C, time, r,p):
    dC_Dt = r*(C)**p
    
    return dC_Dt

def integrate_chowell_model(time, r, p):
    return integrate.odeint(chowell_model, initial_infected,
                            time,args=(r, p))

def chowell_model_solution(time, r,p):
#     print('r',r,'p',p)
    m = 1/(1-p)
    A = initial_infected**(1/m)
    C = ((r/m)*time + A)**m
    return C


# -------------------------

def covid_likelihood(params, *data):
    '''Negative log-likelihood of a logistic curve for the normalised
    observations; data is passed as (x_dat, y_dat).'''
    
    k, b, sd = params
    x_dat, y_dat = (np.asarray(d, dtype=float) for d in data)
    f = 1/(1+np.exp(-k*(x_dat-b)))
    likelihood = - np.sum(stats.norm.logpdf(y_dat/y_dat[-1], f, sd))
    return likelihood

def sigmoid(x,a,b,c):
    '''Sigmoid with plateau c; c is currently pinned to 1, which makes
    this equivalent to sigmoid_1'''
    c = 1
    f = c/(1+np.exp(-(x-b)/a))
    return f

def sigmoid_1(x,a,b):
    '''Scaled sigmoid function to model the normalized data'''
    f = 1/(1+np.exp(-(x-b)/a))
    return f

def get_param_estimate(x_dat, y_dat, initparams=(1, 1, 1)):
    '''Estimates (k, b, sd) by minimising covid_likelihood'''
    estimates = minimize(covid_likelihood, initparams, args=(x_dat, y_dat),
                         method = 'Nelder-Mead')
    return estimates.x

def func_exp(x, a,b, c):
    c = 0
    return a * np.exp(b * x) + c

def parameter_estimations(x, y, scale_flag):
    '''Provides functionality for parameter estimations 
    with or without scaling (provided by scale_flag)'''
    y = np.asarray(y, dtype=float)
    if scale_flag:
        # fit the scaled sigmoid to the data normalised by its last value
        y_scale = y/y[-1]
        p0 = [2, np.argmax(y)]
        popt, pcov = curve_fit(sigmoid_1, x, y_scale, p0, method='dogbox',maxfev=100000)
        parameter = [popt[0], popt[1]]
    else:
        # initial guess built from y itself
        p0 = [2, np.argmax(y), np.max(y)]
        popt, pcov = curve_fit(func_exp, x, y, p0 , maxfev = 10000)
        parameter = [popt[0], popt[1], popt[2]]
    return parameter

# Identifying onset date of infection case for each place 

In [None]:
first_non_zero_location = confirmed_cases_df.iloc[:,4:-1].T.ne(0).idxmax()
column_info = pd.Series(confirmed_cases_df.columns)
#storing the index of the dates 
index_dict = {}
for i in column_info:
    loc = column_info[column_info == i].index.tolist()
    index_dict[i] = loc

Fitting the SIR model and estimating the parameters
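
To illustrate the fitting step on its own, here is a minimal sketch that recovers β and γ from a synthetic infection curve with `scipy.optimize.curve_fit`. Everything in it is an assumption made for illustration: the population `N`, the initial count `I0`, the "true" rates, the 2% noise level and the finite-difference step `diff_step`. It is not the exact procedure applied to the CSSE data below.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

N, I0 = 10_000.0, 10.0            # hypothetical population and initial infected

def infected_curve(t, beta, gamma):
    """I(t) from the SIR model for a given beta and gamma."""
    def rhs(y, _t):
        S, I, R = y
        return (-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I)
    return odeint(rhs, (N - I0, I0, 0.0), t)[:, 1]

# Synthetic "observations": a known curve with 2% multiplicative noise
t = np.arange(0, 40, dtype=float)
true_beta, true_gamma = 0.4, 0.1
rng = np.random.default_rng(0)
observed = infected_curve(t, true_beta, true_gamma) * rng.normal(1.0, 0.02, t.size)

# Non-negative bounds as in the notebook; a larger finite-difference step
# (diff_step) keeps the numerical Jacobian stable for an ODE-based model
popt, pcov = curve_fit(infected_curve, t, observed, p0=(0.5, 0.2),
                       bounds=(0, [np.inf, np.inf]), diff_step=1e-4)
beta_hat, gamma_hat = popt
print("estimated beta:", round(beta_hat, 3), "estimated gamma:", round(gamma_hat, 3))
```

Fitting a synthetic curve with known parameters first is a cheap way to check that the optimiser and the model function are wired together correctly before applying them to real data.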

In [None]:

# Flag selecting the early-growth (Chowell) model instead of SIR; named so that
# it does not shadow the chowell_model() function defined above
use_chowell_model = False
parameter_list = []
for place in range(len(confirmed_cases_df)):
    # assumed susceptible pool: the 'Total' column (100 x latest confirmed count)
    population = confirmed_cases_df.iloc[place, -1]
    # first date with a non-zero count marks the onset for this place
    first_non_zero_date = confirmed_cases_df.iloc[place,4:-1].T.ne(0).idxmax()
    non_zero_index = index_dict[first_non_zero_date][0]
    # these globals are read by SIR_model / ode_solution
    initial_infected = confirmed_cases_df.iloc[place, non_zero_index]
    initial_susceptible = population - initial_infected
    initial_recovered = 0 
    xdata = np.arange(0,len(confirmed_cases_df.iloc[place,non_zero_index:-1]))
    ydata = confirmed_cases_df.iloc[place,non_zero_index:-1]
    if use_chowell_model:
        param_bounds=(0,[10,1])
        popt, pcov = optimize.curve_fit(chowell_model_solution, xdata,
                                        ydata.values.tolist(),bounds= param_bounds,method='dogbox',maxfev=100000)
    else:
        param_bounds=(0,[np.inf,np.inf])
        popt, pcov = optimize.curve_fit(ode_solution, xdata,
                                        ydata.values.tolist(),bounds= param_bounds,method='dogbox',maxfev=1000)
    parameter_list.append(popt)
    
    

In [None]:
beta = [i[0] for i in parameter_list]
gamma = [i[1] for i in parameter_list]

In [None]:
confirmed_cases_df['beta'] = beta
confirmed_cases_df['gamma'] = gamma
confirmed_cases_df['pop_density'] = geo_data_train['Pop Density']

In [None]:
geo_data_train.columns

In [None]:
avg_temp = (geo_data_train['Avg Temp Mar (in _)'] + geo_data_train['Avg Temp Feb (in _)'])/2
confirmed_cases_df['average_tem'] = avg_temp
confirmed_cases_df['RH'] = (geo_data_train['Relative Humidity Feb (in %)'] + geo_data_train['Relative Humidity Mar (in %)'])/2

In [None]:
confirmed_cases_df_1 = confirmed_cases_df[confirmed_cases_df.iloc[:,-6]>100]

In [None]:
confirmed_cases_df_1 = confirmed_cases_df_1[confirmed_cases_df_1['average_tem'].notnull()]

In [None]:
fig = plt.figure(figsize = (14, 7))
fig.add_subplot(1,2,1)
plt.scatter(x = confirmed_cases_df_1['average_tem'], 
            y = confirmed_cases_df_1['RH'], 
            s = confirmed_cases_df_1['beta']*25, 
            alpha=0.6, c ='red',
            edgecolors='black')
plt.annotate('effective contact rate', xy=(15, 80), xytext=(25, 92),
                   arrowprops=dict(facecolor='black', shrink=0.05))

plt.xlabel('Average temperature')
plt.ylabel('Relative Humidity')
sns.kdeplot(confirmed_cases_df_1['average_tem'], (confirmed_cases_df_1['beta']) ,cmap="YlOrBr", kind="kde", shade=True, shade_lowest=False, ax =fig.add_subplot(1,2,2)) 
plt.xlabel('Average temperature')
plt.ylabel('Effective contact rate')
plt.suptitle('Temperature-RH-effective contact rate variation \n across different places', y=1.05)
plt.savefig('Figure_1.png')
plt.show()

After estimating the parameters, we compared their variation with relative humidity (RH) and average temperature (T) for the months of February and March across different places.

The plot above shows the variation of the effective contact rate β with RH and T; the blob sizes represent the magnitude of β for different cities at the corresponding (T, RH) coordinates. We see a large concentration of effective contact rates in a zone with T between 0 °C and 15 °C and RH between 65% and 90%. In our dataset we did not find any COVID-19 infections with T between -10 °C and 10 °C and RH between 30% and 50%, which suggests that COVID-19 outbreaks are less likely to occur in cold, dry areas. The plot on the right-hand side shows a joint density plot of the effective contact rate β and T; β values of 1–3 are concentrated between 0 °C and 18 °C and RH of 70–80%. We use this high-dimensional nonlinear relationship between β, γ and T, RH to estimate β and γ for some Indian states, given their T and RH data for February, March, April and May.

Next, we used a non-parametric nearest-neighbour method to estimate these parameters. However, we did not use the most obvious choice, k-nearest neighbours, as it suffers from the curse of dimensionality. Instead we used locality sensitive hashing (LSH Algorithm and Implementation), an approximate nearest neighbour (ANN) method that has proven to be extremely efficient on high-dimensional datasets.

We could not find the LSHForest package on Kaggle, so we attach the code below (commented out) for perusal; it ran without errors on our local system.
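
`LSHForest` is no longer shipped with recent scikit-learn releases, which may be why it is missing on Kaggle. As a runnable stand-in, the sketch below uses exact k-nearest-neighbour averaging via `KNeighborsRegressor`. This is a swapped-in technique, not the LSH method described above, although it predicts in the same way (the mean β of the nearest neighbours). The training pairs and the query point are hypothetical values chosen only to make the example run.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: (avg temperature in C, RH in %) -> beta
X_train = np.array([[5.0, 75.0], [8.0, 80.0], [12.0, 70.0],
                    [20.0, 60.0], [25.0, 65.0], [30.0, 50.0]])
beta_train = np.array([2.1, 1.9, 1.7, 1.4, 1.3, 1.1])

# Fit the scaler once on the training features and reuse it for new points,
# so that train and test data share the same scale
scaler = StandardScaler().fit(X_train)

# Uniform weights: the prediction is the mean beta of the 3 nearest places
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(scaler.transform(X_train), beta_train)

X_new = np.array([[10.0, 72.0]])          # a hypothetical new place
beta_pred = knn.predict(scaler.transform(X_new))
print("predicted beta:", beta_pred[0])
```

With only two features (T and RH), an exact neighbour search is cheap; an approximate method mainly pays off when the feature space or the dataset is much larger.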

In [None]:
# from sklearn.neighbors import LSHForest
# import numpy as np
# from scipy.stats import mode

# class LSH_KNN:

#     def __init__(self, **kwargs):
#         self.n_neighbors = kwargs['n_neighbors']
#         self.lsh = LSHForest(**kwargs)

#     def fit(self, X, y):
#         self.y = y
#         self.lsh.fit(X)

#     def predict(self, X):
#         _, indices = self.lsh.kneighbors(X, n_neighbors = self.n_neighbors)
# #         print(indices, len(indices))
#         beta_list = []
#         for i in range(len(indices)):
#             beta_neighbour = []
#             for j in indices[i]:
#                 beta = self.y[j]
#                 beta_neighbour.append(beta)
#                 beta_m = np.mean(beta_neighbour)
#             beta_list.append(beta_m)
#         return beta_list

# modeling_data = pd.concat([confirmed_cases_df['average_tem'][:70], confirmed_cases_df['RH'], confirmed_cases_df['beta'], confirmed_cases_df['gamma']], axis = 1)
# modeling_data = modeling_data.dropna()
# x = modeling_data.iloc[:,:2]
# beta = modeling_data['beta'].values
# gamma = modeling_data['gamma'].values
# x_train, x_test, y_train, y_test = train_test_split(x,beta, test_size = 0.01, random_state = 42)
# sc = StandardScaler()
# x_train_sc = sc.fit_transform(x_train)
# # x_test_sc = sc.fit_transform(x_test)

# state_list = geo_data_test['Province/State'].values.tolist()

# RH = (geo_data_test['Relative Humidity Feb (in %)'] + geo_data_test['Relative Humidity Mar (in %)'])/2
# Temp = (geo_data_test['Avg Temp Feb (in _)'] + geo_data_test['Avg Temp Mar (in _)'])/2
# test_df = pd.DataFrame(list(zip(RH.values.tolist(), Temp.values.tolist())))
# test_df.index = state_list
# test_df.columns = ['RH', 'Temp']

# x_test_1 = sc.fit_transform(test_df.iloc[:,:2])
# x_test_sc = x_test_1
# x_test_sc_1 = sc.fit_transform(test_df.iloc[:,2:])
# model = LSH_KNN(n_neighbors = 7)
# fit = model.fit(x_train_sc,beta)
# beta_pred_Feb_Mar = model.predict(x_test_sc)
# beta_pred_Apr_May = model.predict(x_test_sc_1)


# fit_1 = model.fit(x_train, gamma )
# gamma_pred_Feb_Mar = model.predict(x_test_sc)
# gamma_pred_Apr_May = model.predict(x_test_sc_1)

# test_df['beta_1']= beta_pred_Feb_Mar
# test_df['gamma_1'] = gamma_pred_Feb_Mar

# test_df['beta_2']= beta_pred_Apr_May
# test_df['gamma_2'] = gamma_pred_Apr_May

The above code gives us the following results. For our chosen states in India, namely Maharashtra (MH), West Bengal (WB), Kerala (KL), Rajasthan (RJ) and Karnataka (KA), the β value varies with temperature and RH. β changes from 1.73 to 1.36 for MH, from 2.13 to 1.36 for WB, from 1.74 to 1.84 for KL, from 1.09 to 1.37 for RJ and from 1.33 to 1.27 for KA. The former value in each case is the prediction for February and March, while the latter is the prediction for the warmer months of April and May. γ remains largely unchanged: MH, KA and WB show almost no change in γ with weather conditions, while RJ shows a slight change, with γ rising from 0.75 during February and March to 1.02 in April and May. These changes in β and γ can be attributed to increases in temperature and relative humidity.

<img src="https://miro.medium.com/max/1400/1*kc6Ysr7V4NmpiMIKaJJ0jg.png" width="1000px">

Next, we rescale β and γ to a 0–100 range (min-max scaling) for use as inputs to our agent-based model.
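
As a small worked sketch of this rescaling, the snippet below applies `MinMaxScaler` to the five February-March β values quoted above and multiplies by 100. Fitting the scaler on these five values alone is a simplifying assumption: the notebook's own (commented) code fits it on all fitted and predicted values together, so the actual model inputs will differ.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

states = ['MH', 'WB', 'KL', 'RJ', 'KA']
# Feb-Mar beta values quoted in the text above
beta_feb_mar = np.array([1.73, 2.13, 1.74, 1.09, 1.33])

# MinMaxScaler maps the smallest value to 0 and the largest to 1;
# multiplying by 100 expresses the result as a percentage
scaler = MinMaxScaler()
beta_pct = scaler.fit_transform(beta_feb_mar.reshape(-1, 1)).ravel() * 100

for state, value in zip(states, beta_pct):
    print(state, round(float(value), 1))
```

Note that the rescaled values are relative to the set the scaler was fitted on: the smallest β always maps to 0 and the largest to 100, regardless of their absolute size.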

In [None]:
# from sklearn.preprocessing import MinMaxScaler
# scaler = MinMaxScaler()
# beta_extend = np.append(beta, np.array(beta_pred_Feb_Mar))
# beta_extend_1 = np.append(beta_extend, beta_pred_Apr_May)
# beta_transform = scaler.fit_transform(beta_extend_1.reshape((-1,1)))*100

# gamma_extend = np.append(gamma, np.array(gamma_pred_Feb_Mar))
# gamma_extend_1 = np.append(gamma_extend, np.array(gamma_pred_Apr_May))
# gamma_transform = scaler.fit_transform(gamma_extend_1.reshape((-1,1)))*100


In [None]:
# beta_india = [i[0] for i in beta_transform[-10:]]
# gamma_india = [i[0] for i in gamma_transform[-10:]]
# abm_input_FM = pd.DataFrame(list(zip(beta_india[:5], gamma_india[:5])))
# abm_input_AM = pd.DataFrame(list(zip(beta_india[5:], gamma_india[5:])))
# abm_input_FM.columns = ['Contact %', 'Recovery %']
# abm_input_FM.index = state_list

# abm_input_AM.columns = ['Contact%', 'Recovery %']
# abm_input_AM.index = state_list


We used the basic skeleton of the NetLogo Virus model as our framework. This model simulates the transmission and perpetuation of a virus in a human population. We initialised our agent-based model with an infection rate (a proxy for the effective contact rate), a recovery rate and an incubation period of 5 days. For the sake of simplicity, we make the strong assumption that a person who recovers from the disease will not fall sick again for at least a year. The gif below gives a glimpse of our primitive agent-based model.
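
For readers without NetLogo, the idea can be sketched in a few lines of Python. This is a toy model under loud assumptions, not a port of the NetLogo Virus model: agents mix at random with no spatial movement, all parameter values (`n_agents`, `p_infect`, `p_recover`, the number of daily contacts) are hypothetical, and the lockdown is represented simply by cutting daily contacts by 80% from day 0.

```python
import numpy as np

S, E, I, R = 0, 1, 2, 3   # susceptible, incubating, infectious, recovered

def run_abm(contacts_per_day, n_agents=500, n_days=90, n_seed=10,
            p_infect=0.1, p_recover=0.1, incubation_days=5, seed=0):
    """Toy agent-based epidemic with random mixing.
    Returns the daily number of infectious agents and the number ever infected."""
    rng = np.random.default_rng(seed)
    state = np.full(n_agents, S)
    state[:n_seed] = I                      # initially infectious agents
    days_exposed = np.zeros(n_agents, dtype=int)
    infectious_per_day = []
    for _ in range(n_days):
        # incubating agents become infectious after the incubation period
        exposed = state == E
        days_exposed[exposed] += 1
        state[exposed & (days_exposed >= incubation_days)] = I
        infectious = np.flatnonzero(state == I)
        # each infectious agent meets a few random agents and may infect
        # those that are still susceptible
        for _agent in infectious:
            met = rng.integers(0, n_agents, size=contacts_per_day)
            lucky = rng.random(contacts_per_day) < p_infect
            newly = met[(state[met] == S) & lucky]
            state[newly] = E
            days_exposed[newly] = 0
        # infectious agents recover with a fixed daily probability and
        # stay immune afterwards
        state[infectious[rng.random(infectious.size) < p_recover]] = R
        infectious_per_day.append(int(np.sum(state == I)))
    return np.array(infectious_per_day), int(np.sum(state != S))

free_curve, free_total = run_abm(contacts_per_day=5)   # no lockdown
lock_curve, lock_total = run_abm(contacts_per_day=1)   # 80% fewer contacts
print("ever infected without lockdown:", free_total)
print("ever infected with lockdown:", lock_total)
```

Running both scenarios with the same seed makes the comparison depend only on the contact rate, which is the quantity the lockdown is assumed to change.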

<img src="https://miro.medium.com/max/1400/1*PJbJDFXtpH3PHZREhxsiqw.gif" width="1000px">


In [None]:
#ABM Input for Feb-March
# abm_input_FM

In [None]:
#ABM Input for April-May
# abm_input_AM

After running the ABM simulation for both the lockdown and no-lockdown scenarios, we import and analyse the results.

In [None]:
date_rng = pd.date_range(start='3/14/2020', end='5/14/2020', freq='D')
df_1 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/wb_lockdown.csv',error_bad_lines=False)
df_2 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/wb_no_lockdown.csv',error_bad_lines=False)
df_1.index = date_rng
df_2.index = date_rng


date_rng_1 = pd.date_range(start='3/01/2020', end='5/01/2020', freq='D')
df_3 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/kerala_lockdown.csv',error_bad_lines=False)
df_4 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/kerala_no_lockdown.csv',error_bad_lines=False)
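# truncate first, then assign: setting .index on a temporary slice has no effect
df_3 = df_3[:len(date_rng_1)]
df_4 = df_4[:len(date_rng_1)]
df_3.index = date_rng_1
df_4.index = date_rng_1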

df_5 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/karnataka_lockdown.csv',error_bad_lines=False)
df_6 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/karnataka_no_lockdown.csv',error_bad_lines=False)
df_6 = df_6[:len(date_rng_1)]
df_5.index = date_rng_1
df_6.index = date_rng_1

df_7 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/rajasthan_lockdown.csv',error_bad_lines=False)
df_8 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/rajasthan_no_lockdown.csv',error_bad_lines=False)
df_7 = df_7[:len(date_rng_1)]
df_8 = df_8[:len(date_rng_1)]
df_7.index = date_rng_1
df_8.index = date_rng_1

date_rng_3 = pd.date_range(start='3/09/2020', end='5/09/2020', freq='D')

df_9 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/maharashtra_lockdown.csv',error_bad_lines=False)
df_10 = pd.read_csv('../input/covid19-dataset-johnhopkins-personal-netlogo-csv/maharashtra_no_lockdown.csv',error_bad_lines=False)
df_9 = df_9[:len(date_rng_3)]
df_10 = df_10[:len(date_rng_3)]
df_9.index = date_rng_3
df_10.index = date_rng_3

In [None]:
plt.figure(figsize = (18,18))
plt.subplot(2,2,1)
plt.plot(df_1['y'][:]*100/1028, label = 'lockdown')
plt.plot(df_2['y'][:]*100/1028, label = 'no lockdown')
plt.xticks(rotation=45, ha="right")
plt.ylabel('%population')
plt.axvline(dt.datetime(2020, 3, 24), c ='black')
x = dt.datetime(2020, 3, 24)
x1 = dt.datetime(2020, 5, 16)
plt.axvspan(x, x1, alpha=0.3, color='grey')
plt.legend()
plt.title( 'West Bengal')

plt.subplot(2,2,2)
plt.plot(df_5['y']*100/319, label = 'lockdown')
plt.plot(df_6['y']*100/319, label = 'no lockdown')
plt.xticks(rotation=45, ha="right")
plt.ylabel('%population')
plt.axvline(dt.datetime(2020, 3, 24), c ='black')
x = dt.datetime(2020, 3, 24)
x1 = dt.datetime(2020, 5, 2)
plt.axvspan(x, x1, alpha=0.3, color='grey')
plt.legend()
plt.title( 'Karnataka')



plt.subplot(2,2,3)
plt.plot(df_7['y']*100/200, label = 'lockdown')
plt.plot(df_8['y']*100/200, label = 'no lockdown')
plt.xticks(rotation=45, ha="right")
plt.ylabel('%population')
plt.axvline(dt.datetime(2020, 3, 24), c ='black')
x = dt.datetime(2020, 3, 24)
x1 = dt.datetime(2020, 5, 2)
plt.axvspan(x, x1, alpha=0.3, color='grey')
plt.legend()
plt.title( 'Rajasthan')



plt.subplot(2,2,4)
plt.plot(df_9['y']*100/365, label = 'lockdown')
plt.plot(df_10['y']*100/365, label = 'no lockdown')
plt.xticks(rotation=45, ha="right")
plt.ylabel('%population')
plt.axvline(dt.datetime(2020, 3, 24), c ='black')
x = dt.datetime(2020, 3, 24)
x1 = dt.datetime(2020, 5,11)
plt.axvspan(x, x1, alpha=0.3, color='grey')
plt.legend()
plt.title( 'Maharashtra')

plt.suptitle('Infected population percentage per sq.Km')
# plt.savefig('Figure_2.png')
plt.show()



# plt.annotate('effective contact rate', xy=(11, 12), xytext=(11, 12),
#                    arrowprops=dict(facecolor='black', shrink=0.05))

In [None]:
#Number of days(d) between first COVID infection and lockdown announcement  
#infected_population_lockdown_state = df['y'].iloc[d]/population density
infected_population_lockdown_wb = df_1['y'].iloc[4]/1028
infected_population_lockdown_karnataka = df_5['y'].iloc[24]/319
# Rajasthan results are in df_7 (df_9 holds Maharashtra)
infected_population_lockdown_rajasthan = df_7['y'].iloc[24]/200
infected_population_lockdown_maharastra = df_9['y'].iloc[15]/365

In [None]:
# convert to a percentage before rounding to avoid floating-point artefacts
print('infected population  percentage per sq.km till lockdown West Bengal :', round(infected_population_lockdown_wb*100, 1))
print('infected population  percentage per sq.km till lockdown Karnataka :', round(infected_population_lockdown_karnataka*100, 0))

print('infected population  percentage per sq.km till lockdown Rajasthan :', round(infected_population_lockdown_rajasthan*100, 0))

print('infected population  percentage per sq.km till lockdown Maharashtra :', round(infected_population_lockdown_maharastra*100, 0))



**Our projections suggest that with no lockdown, MH would have 69 cases out of 359 people per sq. km, RJ 33 out of 200 and KA 73 out of 319 by 29th April, while WB would have 1028 cases per sq. km by 16th May. WB clearly has far more cases per square kilometre, which can be attributed to its population density being almost three times that of MH and KA and five times that of RJ.
With a lockdown in place, however, the numbers change drastically: MH would have 14 cases per sq. km, RJ 8 and KA 15 by 29th April, while WB would have at most 143 cases per sq. km by 16th May.
These estimates are achievable only with a 50 day lockdown.**

**Note that in the lockdown scenario we reduce the effective contact rate by 80 percent after 24th March, 2020, assuming that there would still be some movement outside even after the lockdown announcement.
Up to the lockdown, there may be 4 potentially COVID-19 infected persons per sq. km in WB, 3 in MH, 8 in RJ and 9 in KA. This roughly translates to 3 lakh people in total in WB, 18 lakh in KA, 27 lakh in RJ and 11 lakh in MH.**
However, we stress that these numbers are not exact, and we urge readers to focus on the per sq. km counts, as population is not distributed evenly across a region. We also believe that adding more parameters and more complexity would only bring these numbers down.
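
The effect of cutting the effective contact rate by 80 percent can also be sketched with the SIR equations alone. The snippet below is an illustration under assumed values (`N`, `beta0`, `gamma`, `lockdown_day` and `n_days` are all hypothetical) and is not the agent-based simulation that produced the figures above. It integrates the model up to the lockdown day and then continues once with the original β and once with 0.2·β.

```python
import numpy as np
from scipy.integrate import odeint

def sir_rhs(y, t, beta, gamma, N):
    """Right-hand side of the SIR equations."""
    S, I, R = y
    return (-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I)

# Illustrative assumptions, not fitted values
N = 100_000.0
beta0, gamma = 0.4, 0.1
lockdown_day, n_days = 15, 60
y0 = (N - 1.0, 1.0, 0.0)

# Phase 1: free spread until the lockdown day
t1 = np.arange(0, lockdown_day + 1, dtype=float)
before = odeint(sir_rhs, y0, t1, args=(beta0, gamma, N))

# Phase 2: continue from the same state with and without the 80% reduction
t2 = np.arange(lockdown_day, n_days + 1, dtype=float)
free = odeint(sir_rhs, before[-1], t2, args=(beta0, gamma, N))
locked = odeint(sir_rhs, before[-1], t2, args=(0.2 * beta0, gamma, N))

# Everyone who has left the susceptible compartment has been infected
ever_free = N - free[-1, 0]
ever_locked = N - locked[-1, 0]
print("ever infected without lockdown:", round(ever_free))
print("ever infected with lockdown:", round(ever_locked))
```

Integrating in two phases avoids a discontinuous right-hand side inside a single solver call and makes the moment of the intervention explicit.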

All the evidence from our calculations seems to justify the lockdown currently imposed by the Indian Government. However, the way it was enforced falls short and can even be termed draconian. Although the model we used is primitive, to say the least, our estimates, while falling short of the outrageous numbers reported by [other studies](https://cddep.org/covid-19/) (it must be noted that IndiaSIM is one of the most advanced models available), are much larger than the officially reported case counts so far. This gap can also be attributed to the extremely low testing rates in India (currently a meagre 19 tests per million people). Without a vaccine, nearly all of us could catch COVID-19 sooner or later, just like pox or measles; COVID-19 is not as contagious as measles but potentially more dangerous.

Since there is still a lack of evidence as to how the virus spreads through social contacts and how the latter vary with age [(Singh et al. 2020)](https://arxiv.org/abs/2003.12055), both aspects need to be resolved before proceeding further. We have also not been able to capture the movement of people across the country, with a large percentage of the population caught out by the unexpected lockdown and left stranded. We do not yet know what effect that could have, especially because social distancing is not a priority for migrant labourers as they look for some form of transport to go back home. Moreover, there is huge uncertainty in all the parameters used in our calculations, as this is still an evolving situation, and we hope to collaborate with anyone who has access to better and more complete data sources. Our study can be found here.
