# Introduction to COVID-19
<hr>

![COVID-19](https://techcrunch.com/wp-content/uploads/2020/02/coronavirus.jpg)
*Image Credits : [Scientific Animations](http://www.scientificanimations.com/wiki-images/) under a [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license*

> **Coronaviruses** are a family of viruses that cause illness ranging from the *common cold* and *cough* to more severe disease. **Middle East Respiratory Syndrome (MERS-CoV)** and **Severe Acute Respiratory Syndrome (SARS-CoV)** are two such severe diseases that the world has already faced.<br> **SARS-CoV-2 (n-coronavirus)** is a new member of the coronavirus family, first *discovered* in 2019, that had not previously been identified in humans. It is a *contagious* virus that emerged in **Wuhan** in **December 2019** and was later declared a **pandemic** by the **WHO** because of its rapid spread throughout the world. As of 10 June 2020, it has led to a total of *500K+ deaths* across the globe.<br>
As the pandemic spreads across the world, it becomes more important to understand this spread. This notebook is an effort to analyze the cumulative counts of confirmed, death, and recovered cases over time. The main focus is to analyze the trend of the virus's spread worldwide.


### The following two curves show why we need to flatten the curve and follow social distancing measures<hr>


<div style="">
    <div style="width:70%"><img src="https://healthblog.uofmhealth.org/sites/consumer/files/2020-03/Coronavirus_flattening_curve_1.jpg"/></div>
    <div style="width:70%"><img src="https://labblog.uofmhealth.org/sites/lab/files/2020-04/flattening_curve_social_distancing.jpg"/></div>
</div>
    
<hr>

#### I will **update** this **notebook** *continuously* with new viz and updated data. 


**NOTE :** 
* **Since the dataset structure has changed and the recovered dataset is no longer updated by Johns Hopkins University, a few visualizations related to recovered and active cases have been dropped.**

**<span style = "color:#cc1616">Update log:</span>**
* <font style="color: rgba(107, 61, 35, 0.92) "><b>01 Apr 2020 01:35 AM IST (Version 42) :</b> Indian Testing Data and Comparison with South Korea Added. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>03 Apr 2020 02:50 AM IST (Version 51) :</b> <a href="Calander-Map">Calendar-Map</a> Added and Visual updates. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>03 Apr 2020 03:40 PM IST (Version 52) :</b> Dataset Update and Bug fix. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>05 Apr 2020 01:10 AM IST (Version 54) :</b> 2 New section added <a href='#COVID-19-Daily-Analysis'>COVID-19 Daily Analysis</a> and <a href='#Testing-Analysis'>Testing Data Analysis</a>. Dataset Update and Bug fix. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>15 Apr 2020 02:50 AM IST (Version 61) :</b> Dataset Update and Prediction model update. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>02 May 2020 01:50 PM IST (Version 67) :</b> Prediction Model Updated. Dataset Update. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>03 May 2020 02:05 AM IST (Version 71) :</b> Testing Data Updated. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>16 June 2020 10:45 PM IST (Version 103) :</b> Dataset Updated. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>19 June 2020 11:00 PM IST (Version 105) :</b> Model Updated. Some Improvements. Dataset Updated. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>03 July 2020 02:20 AM IST (Version 110) :</b> Dataset Updated. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>09 July 2020 01:00 AM IST (Version 112) :</b> Dataset Updated. </font>
* <font style="color: rgba(107, 61, 35, 0.92) "><b>12 July 2020 03:00 AM IST (Version 114) :</b> Dataset Updated. </font>
* <font style="color: #cc1616 "><b>24 August 2020 05:40 PM IST (Version 120) :</b> Dataset Updated. </font>

<hr>

### SOURCES:
https://github.com/CSSEGISandData/COVID-19<br>
2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
<br>
This dataset is updated on a daily basis by Johns Hopkins CSSE
<hr>

# Please don't PANIC, stay safe, follow your nation and WHO guidelines. We all can defeat this together. Please don't spread rumors.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

COVID-19 ANALYSIS USING RNN AND PREDICTING ITS EFFECTS

In [None]:
#importing libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns


In [None]:
#LOADING THE DATASET
pd.options.display.max_columns = 500
pd.options.display.max_rows = 500
X = pd.read_csv('../input/novel-corona-virus-2019-dataset/covid_19_data.csv')

In [None]:
print(X.shape)

Printing different attributes, just to visualize them and see the details.


In [None]:
print(X.dtypes)

In [None]:
X.head(10)

In [None]:
#converting the date columns from strings to datetime
X['ObservationDate'] = pd.to_datetime(X['ObservationDate'])
X['Last Update'] = pd.to_datetime(X['Last Update'])
print(X.dtypes)
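
The same conversion can be sketched with an explicit date format. This is only an illustration: it assumes the `ObservationDate` strings are written as month/day/year (as in `01/31/2020`), and the `toy` frame with its values is made up rather than taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real CSV (hypothetical values).
toy = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/31/2020", "02/29/2020"],
    "Confirmed": [1.0, 5.0, 9.0],
})

# An explicit format avoids day/month ambiguity and raises on malformed rows.
toy["ObservationDate"] = pd.to_datetime(toy["ObservationDate"], format="%m/%d/%Y")

# Comparing against a Timestamp keeps the filter unambiguous.
end_of_january = toy[toy["ObservationDate"] == pd.Timestamp(2020, 1, 31)]
print(end_of_january)
```

Passing `format` is a design choice: parsing is stricter, so an unexpected date layout is reported instead of being silently guessed.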

# **TEN MOST AFFECTED COUNTRIES DURING DIFFERENT MONTHS**

In [None]:
#made one Series per month from the snapshot on the last observation date of that month
#the counts are cumulative, so the month-end snapshot holds the running total up to that date
#grouped the rows by country/region
#extracted only the confirmed cases
#summed the provinces/states of each country using .sum()
#sorted the totals in descending order and kept the ten countries with the most confirmed cases

First_Month = X[X['ObservationDate']=='01/31/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

In [None]:
Second_Month = X[X['ObservationDate']=='02/29/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

In [None]:
Third_Month = X[X['ObservationDate']=='03/31/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

In [None]:
Fourth_Month = X[X['ObservationDate']=='04/30/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

In [None]:
#May has 31 days, so the month-end snapshot is 05/31/2020
Fifth_Month = X[X['ObservationDate']=='05/31/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

In [None]:
Sixth_Month = X[X['ObservationDate']=='06/30/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

In [None]:
#July has 31 days, so the month-end snapshot is 07/31/2020
Seventh_Month = X[X['ObservationDate']=='07/31/2020'].groupby('Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10)
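
The per-month cells above repeat one pattern, which could be wrapped in a small helper. The sketch below is an assumption-laden illustration: `top_countries_on` is a hypothetical name, the `toy` frame is made up, and it expects `ObservationDate` to already be a datetime column.

```python
import pandas as pd

def top_countries_on(df, date, n=10):
    """Return the n countries with the most confirmed cases on `date`."""
    on_date = df[df["ObservationDate"] == pd.Timestamp(date)]
    # Sum provinces/states into one total per country.
    totals = on_date.groupby("Country/Region")["Confirmed"].sum()
    # nlargest sorts in descending order and keeps the first n entries.
    return totals.nlargest(n)

# Toy data (hypothetical): country A is split into two provinces.
toy = pd.DataFrame({
    "ObservationDate": pd.to_datetime(["2020-01-31"] * 4 + ["2020-02-29"]),
    "Country/Region": ["A", "A", "B", "C", "A"],
    "Confirmed": [10, 5, 40, 1, 99],
})

top2 = top_countries_on(toy, "2020-01-31", n=2)
print(top2)
```

Sorting before truncating matters: taking the first ten groups and then sorting would return the first ten countries alphabetically, not the ten most affected.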

In [None]:
fig, ax = plt.subplots(nrows = 2, ncols = 4, figsize = (20,10))
#fig.tight_layout adds padding between the subplots so that they do not overlap
fig.tight_layout(pad=5.0)

#one (title, data, bar colour) entry per month; None keeps matplotlib's default colour
monthly = [('January', First_Month, None), ('February', Second_Month, 'g'),
           ('March', Third_Month, 'c'), ('April', Fourth_Month, 'r'),
           ('May', Fifth_Month, None), ('June', Sixth_Month, 'g'),
           ('July', Seventh_Month, 'c')]

for axis, (title, data, colour) in zip(ax.flat, monthly):
    axis.bar(data.index, data, color = colour)
    #rotate the country labels so that they stay readable
    axis.tick_params(axis = 'x', labelrotation = 45)
    axis.set_title(title)

#the eighth panel has no data, so hide it
ax[1,3].axis('off')

plt.show()


# Covid-19 Data Analysis in India 

In [None]:
Confirmed_cases = X[X['Country/Region']=='India'].groupby('ObservationDate')['Confirmed'].sum()
Deaths_cases = X[X['Country/Region']=='India'].groupby('ObservationDate')['Deaths'].sum()
Recovered_cases = X[X['Country/Region']=='India'].groupby('ObservationDate')['Recovered'].sum()

In [None]:
plt.figure(figsize = (20,10))
plt.plot(Confirmed_cases,color = 'c',marker = 'v',label = 'Confirmed')
plt.plot(Deaths_cases,color = 'g',marker = 'x',label = 'Deaths')
plt.plot(Recovered_cases,color = 'b',marker = 'o',label = 'Recovered')
#the x-axis holds the observation dates and the y-axis the cumulative counts
plt.xlabel('Observation date')
plt.ylabel('Cumulative number of cases')
#legend must be called (with parentheses) for the labels to be drawn
plt.legend()
plt.show()
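
The series plotted above are cumulative, so the number of new cases per day is the day-over-day difference. A minimal sketch, assuming a cumulative pandas Series indexed by date; `daily_new_cases` is a hypothetical helper and the toy numbers are made up.

```python
import pandas as pd

def daily_new_cases(cumulative):
    """Convert a cumulative count Series into per-day increases."""
    daily = cumulative.diff()
    # The first day has no predecessor, so its increase is its own value.
    daily.iloc[0] = cumulative.iloc[0]
    # Downward corrections in the source data would give negative values;
    # clip them to zero (an assumption, not something the dataset documents).
    return daily.clip(lower=0)

toy_cumulative = pd.Series(
    [1, 3, 3, 10, 9],
    index=pd.date_range("2020-03-01", periods=5),
)
toy_daily = daily_new_cases(toy_cumulative)
print(toy_daily)
```

With the notebook's data this could be applied to `Confirmed_cases` to inspect daily increases directly.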

# Forecasting Using RNN

An RNN is used here instead of an ARIMA model because ARIMA expresses each value as a linear function of past values and past forecast errors.
An RNN can capture more complex, non-linear patterns in the dataset.


In [None]:
#selecting the rows for India and summing the counts per observation date
X_India = X[X['Country/Region']=='India']
arranged_dataset = X_India.groupby(['ObservationDate']).agg({'Confirmed':'sum','Recovered':'sum','Deaths':'sum'})


In [None]:
#only the last expression of a cell is displayed, so print the shape explicitly
print(arranged_dataset.shape)
arranged_dataset.tail(15)

In [None]:
training_set = arranged_dataset.iloc[:,0:1].values

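#Data preprocessing: scale the confirmed counts to the range [0, 1]
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0,1))
training_set_scaled = sc.fit_transform(training_set)

#Creating a data structure with 45 timesteps: each sample holds the previous 45 days and the target is the following day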
X_train = []
y_train = []
for i in range(45,180):
    X_train.append(training_set_scaled[i-45:i, 0])
    y_train.append(training_set_scaled[i, 0])
    
X_train, y_train = np.array(X_train) , np.array(y_train)   

#Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

#Initialize the RNN
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

regressor = Sequential()

#Add first LSTM layer and Dropout regularisation
regressor.add(LSTM(units =50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

#Adding second layer
regressor.add(LSTM(units =50, return_sequences = True))
regressor.add(Dropout(0.2))

#Adding third layer
regressor.add(LSTM(units =50, return_sequences = True))
regressor.add(Dropout(0.2))

#Adding fourth layer
regressor.add(LSTM(units =50))
regressor.add(Dropout(0.2))

#Output layer
regressor.add(Dense(units = 1))

regressor.compile(optimizer = 'adam', loss = 'mse')

#Training the model
#Taking a small batch size because the number of data points to train on is limited
regressor.fit(X_train, y_train, epochs = 50, batch_size = 5)
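
The windowing loop in the cell above can be sketched as a standalone function. This is an illustration only: `make_windows` is a hypothetical name, and the toy series uses 3 timesteps instead of 45 so that the shapes are easy to check by hand.

```python
import numpy as np

def make_windows(series, timesteps):
    """Build (samples, timesteps, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])  # the previous `timesteps` values
        y.append(series[i])                # the value to predict
    # An LSTM layer expects input shaped (samples, timesteps, features).
    X = np.array(X).reshape(-1, timesteps, 1)
    return X, np.array(y)

X_demo, y_demo = make_windows(np.arange(10), timesteps=3)
print(X_demo.shape, y_demo.shape)
print(X_demo[0, :, 0], y_demo[0])
```

Deriving the loop bounds from `len(series)` rather than from fixed indices keeps the windows valid when the dataset grows.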

In [None]:
#Prediction and visualization
#note: days 170-179 of this window were also used as training targets above
real_confirmed_cases = arranged_dataset.iloc[170:213,0:1].values

X_test = []

for i in range(170,213):
    X_test.append(training_set_scaled[i-45:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_confirmed_cases = regressor.predict(X_test)
predicted_confirmed_cases = sc.inverse_transform(predicted_confirmed_cases)
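
The notebook fits the scaler on the whole series. One alternative is to compute the scaling bounds from the training portion only and reuse them for later values. The sketch below writes out the min-max formula in plain NumPy as an assumption-based illustration; the helper names and toy numbers are hypothetical, and it is not the `MinMaxScaler` API itself.

```python
import numpy as np

def fit_min_max(train):
    """Return the (min, max) of the training values."""
    return float(np.min(train)), float(np.max(train))

def scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert `scale`, returning values in the original units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

series = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
lo, hi = fit_min_max(series[:3])       # bounds from the first three points only
scaled = scale(series, lo, hi)         # later points may exceed 1
restored = unscale(scaled, lo, hi)
print(scaled)
print(restored)
```

For a growing cumulative series, held-out values scaled this way land above 1, which shows how far the test period lies outside the range seen in training.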

In [None]:
plt.figure(figsize = (12,8))
plt.plot(real_confirmed_cases, color='c',marker = 'o', label = 'Real Confirmed Cases')
plt.plot(predicted_confirmed_cases, color='g',marker = 'o', label = 'Predicted Number of Cases')
plt.title('Coronavirus Forecasting Trend in Cases')
plt.xlabel('Days')
plt.ylabel('Number of Cases')
plt.legend()
plt.show()

**The RNN model forecasts an exponential trend in the number of COVID-19 cases that is quite similar to the real number of cases.**
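
How close the forecast is to the real counts can be expressed as a number. A minimal sketch of two common error measures, assuming arrays of actual and predicted counts such as `real_confirmed_cases` and `predicted_confirmed_cases`; the toy values below are made up and are not results from this model.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the counts."""
    a = np.asarray(actual, dtype=float).ravel()
    p = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    a = np.asarray(actual, dtype=float).ravel()
    p = np.asarray(predicted, dtype=float).ravel()
    return float(np.mean(np.abs((a - p) / a)) * 100)

toy_actual = [100, 200, 400]
toy_predicted = [110, 190, 400]
toy_rmse = rmse(toy_actual, toy_predicted)
toy_mape = mape(toy_actual, toy_predicted)
print(toy_rmse, toy_mape)
```

A percentage error is easier to compare across periods than RMSE here, because the absolute counts grow quickly over time.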

# India is now the second most affected country in the world, and the curve has not yet peaked. With cases increasing by ~90,000 per day, we need to take the necessary precautions and follow the basic guidelines suggested by the Government.



> I have just begun exploring the field of Data Science and still have a lot to learn. I will appreciate any kind of feedback or criticism. There are still many things I could have tried with this dataset. I will surely come back and try to improve on this work.
> 
> Thank You