## **COVID-19: Cases, Vaccinations, Deaths, Predictions (India)🦠💉**<hr>

![alt text](https://southkingstownri.com/ImageRepository/Document?documentID=3809)

**Attribute Information**

* State/UTs - Names of Indian States and Union Territories.
* Total Cases - Total number of confirmed cases
* Active - Total number of active cases
* Discharged - Total number of discharged cases
* Deaths - Total number of deaths
* Active Ratio (%) - Ratio of number of active cases to total cases
* Discharge Ratio (%) - Ratio of number of discharged cases to total cases
* Death Ratio (%) - Ratio of number of deaths to total cases
* Dose 1 - Number of first vaccine doses administered
* Dose 2 - Number of second vaccine doses administered
* Total Vaccination Doses - Total number of vaccine doses administered
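
The three ratio columns can be recomputed from the count columns. The sketch below does this for two rows copied from the `dataset.head()` preview, assuming each published ratio is a plain percentage of total cases rounded to two decimals. The `sample` frame is an illustrative stand-in for the full file.

```python
import pandas as pd

# Two rows copied from the dataset preview; 'sample' is an illustrative name.
sample = pd.DataFrame({
    "State/UTs": ["Andaman and Nicobar", "Andhra Pradesh"],
    "Total Cases": [7549, 1995669],
    "Active": [1, 16341],
    "Discharged": [7419, 1965657],
    "Deaths": [129, 13671],
})

# Assumption: ratio = (count / total cases) * 100, rounded to 2 decimals.
for source, target in [("Active", "Active Ratio (%)"),
                       ("Discharged", "Discharge Ratio (%)"),
                       ("Deaths", "Death Ratio (%)")]:
    sample[target] = (sample[source] / sample["Total Cases"] * 100).round(2)

print(sample[["State/UTs", "Active Ratio (%)", "Discharge Ratio (%)", "Death Ratio (%)"]])
```

For these two rows the recomputed values agree with the ratios shown in the preview, which suggests the ratio columns carry no information beyond the counts.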

### Importing all the libraries

In [16]:
# !pip install pandas_profiling
# !pip install fbprophet

In [15]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas_profiling import ProfileReport
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from fbprophet import Prophet

## **Let's take a look at the overall COVID-19 situation prevailing in India**

### Importing the Dataset

In [18]:
dataset = pd.read_csv('./input/Latest Covid-19 India Status.csv')
dataset.head()

Unnamed: 0,State/UTs,Total Cases,Active,Discharged,Deaths,Active Ratio (%),Discharge Ratio (%),Death Ratio (%)
0,Andaman and Nicobar,7549,1,7419,129,0.01,98.28,1.71
1,Andhra Pradesh,1995669,16341,1965657,13671,0.82,98.5,0.69
2,Arunachal Pradesh,51655,1763,49640,252,3.41,96.1,0.49
3,Assam,581398,8772,567113,5513,1.51,97.54,0.95
4,Bihar,725518,204,715665,9649,0.03,98.64,1.33


### Basic Data Wrangling

In [19]:
print("The shape of the first dataset is",dataset.shape)

The shape of the first dataset is (36, 8)


In [20]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   State/UTs            36 non-null     object 
 1   Total Cases          36 non-null     int64  
 2   Active               36 non-null     int64  
 3   Discharged           36 non-null     int64  
 4   Deaths               36 non-null     int64  
 5   Active Ratio (%)     36 non-null     float64
 6   Discharge Ratio (%)  36 non-null     float64
 7   Death Ratio (%)      36 non-null     float64
dtypes: float64(3), int64(4), object(1)
memory usage: 2.4+ KB


In [21]:
dataset.describe(include = 'all')

Unnamed: 0,State/UTs,Total Cases,Active,Discharged,Deaths,Active Ratio (%),Discharge Ratio (%),Death Ratio (%)
count,36,36.0,36.0,36.0,36.0,36.0,36.0,36.0
unique,36,,,,,,,
top,Odisha,,,,,,,
freq,1,,,,,,,
mean,,896829.4,10205.972222,874609.0,12014.416667,1.682222,97.052222,1.265
std,,1303563.0,30660.005408,1262310.0,23205.834381,3.33566,3.239175,0.564464
min,,7549.0,1.0,7419.0,4.0,0.01,81.44,0.04
25%,,69817.75,213.75,65981.25,803.75,0.0675,97.15,0.9475
50%,,464516.5,1062.5,454699.5,5322.5,0.57,98.2,1.32
75%,,998082.2,7369.5,982674.2,13579.5,1.3375,98.5975,1.6075


In [22]:
dataset.isnull().sum()

State/UTs              0
Total Cases            0
Active                 0
Discharged             0
Deaths                 0
Active Ratio (%)       0
Discharge Ratio (%)    0
Death Ratio (%)        0
dtype: int64

### Pandas Profiling

In [23]:
profile = ProfileReport(dataset, title = "Pandas Profiling Report")
profile.to_notebook_iframe()


### Exploratory Data Analysis

In [25]:
cases = dataset.groupby('State/UTs')['Total Cases'].sum()
fig = px.bar(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Total Cases (in millions)'}, title = 'State Wise Total Cases')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize = 2, uniformtext_mode='hide', width=1300, height=600)
fig.show()

**Q) Which state or union territory has the highest number of cases?**

* As the graph above shows (accumulated case figures), Maharashtra is the state with the highest number of COVID-19 cases.
* *Cases: 6,401,213*

**Q) Which state or union territory has the lowest number of cases?**

* As the graph above shows (accumulated case figures), Andaman and Nicobar is the union territory with the lowest number of COVID-19 cases.
* *Cases: 7,549*
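
Reading extremes off a bar chart is easy to get wrong, so the same questions can also be answered directly in pandas. The sketch below uses `idxmax` and `idxmin` on five rows copied from the dataset preview. Because this `sample` is only a subset, its answers differ from the full 36-row dataset (Maharashtra is not among the preview rows).

```python
import pandas as pd

# Five rows copied from the dataset preview; the full file has 36 rows.
sample = pd.DataFrame({
    "State/UTs": ["Andaman and Nicobar", "Andhra Pradesh", "Arunachal Pradesh",
                  "Assam", "Bihar"],
    "Total Cases": [7549, 1995669, 51655, 581398, 725518],
})

# idxmax / idxmin return the index label of the largest / smallest value.
highest = sample.loc[sample["Total Cases"].idxmax()]
lowest = sample.loc[sample["Total Cases"].idxmin()]

print(f"Highest: {highest['State/UTs']} ({int(highest['Total Cases']):,})")
print(f"Lowest:  {lowest['State/UTs']} ({int(lowest['Total Cases']):,})")
```

Replacing `sample` with the full `dataset` and `'Total Cases'` with `'Active'`, `'Discharged'` or `'Deaths'` would answer the remaining questions in this section the same way.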

In [26]:
cases = dataset.groupby('State/UTs')['Active'].sum()
fig = px.bar(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Active Cases'}, title = 'State Wise Active Cases')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside', marker_color='brown')
fig.update_layout(uniformtext_minsize = 2, uniformtext_mode='hide', width=1300, height=600)
fig.show()

**Q) Which state or union territory has the highest number of active cases?**

* As the graph above shows (accumulated active-case figures), Kerala is the state with the highest number of active COVID-19 cases.
* *Cases: 175,695*

**Q) Which state or union territory has the lowest number of active cases?**

* Andaman and Nicobar is the union territory with the lowest number of active COVID-19 cases: the dataset preview lists a single active case there, which matches the minimum of the `Active` column in the `describe()` output.
* *Cases: 1*

In [27]:
cases = dataset.groupby('State/UTs')['Discharged'].sum()
fig = px.bar(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Discharged Patients'}, title = 'State Wise Number of patients Discharged')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside', marker_color='green')
fig.update_layout(uniformtext_minsize = 2, uniformtext_mode='hide', width=1300, height=600)
fig.show()

**Q) Which state or union territory has the highest number of recoveries?**

* As the graph above shows (accumulated discharge figures), Maharashtra is the state with the most recoveries from COVID-19.
* *Recoveries: 6,201,168*

In [28]:
cases = dataset.groupby('State/UTs')['Deaths'].sum()
fig = px.bar(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Deaths'}, title = 'State Wise Deaths')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside', marker_color='red')
fig.update_layout(uniformtext_minsize = 2, uniformtext_mode='hide', width=1300, height=600)
fig.show()

**Q) Which state or union territory has recorded the most deaths?**

* As the graph above shows (accumulated death figures), Maharashtra is the state with the highest number of deaths.
* *Deaths: 135,255*

**Q) Which state or union territory has recorded the fewest deaths?**

* As the graph above shows (accumulated death figures), Dadra and Nagar Haveli and Daman and Diu is the union territory with the lowest number of deaths.
* *Deaths: 4*

In [29]:
cases = dataset.groupby('State/UTs')['Active Ratio (%)'].mean()
fig = px.line(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Active Ratio (in %)'}, title = 'State Wise Active Ratio')
fig.update_traces(textposition="top right")
fig.update_layout(width = 2000, height = 700)
fig.show()

From the plot above, Mizoram has the highest active-case ratio at 18.19%, whereas Andaman and Nicobar and Madhya Pradesh have the lowest at 0.01%.

In [30]:
cases = dataset.groupby('State/UTs')['Discharge Ratio (%)'].mean()
fig = px.line(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Discharge Ratio (in %)'}, title = 'State Wise Discharge Ratio')
fig.update_traces(textposition="top right")
fig.update_layout(width = 2000, height = 700)
fig.show()

From the plot above, Dadra and Nagar Haveli and Daman and Diu has the highest discharge ratio at 99.92%, whereas Mizoram has the lowest at 81.44%.

In [31]:
cases = dataset.groupby('State/UTs')['Death Ratio (%)'].mean()
fig = px.line(x = cases.index, y = cases.values, text = cases.values,
       labels = {'x': 'States', 'y' : 'Death Ratio (in %)'}, title = 'State Wise Death Ratio')
fig.update_traces(textposition="top right")
fig.update_layout(width = 2000, height = 700)
fig.show()

From the plot above, Dadra and Nagar Haveli and Daman and Diu has the lowest death ratio (the column minimum in `describe()` is 0.04%), whereas Punjab has the highest at 2.72%.

In [32]:
fig = px.bar(x = dataset['State/UTs'], y = dataset['Total Cases'], color = dataset['Deaths'],
       labels = {'x': 'States', 'y' : 'Total Cases (in millions)', 'color': 'Deaths'}, 
             title = 'State Wise Total Cases considering the number of Deaths')
fig.update_layout(width=1300, height=600)
fig.show()

Maharashtra leads in both total cases and deaths.

In [33]:
fig = go.Figure()
df = pd.DataFrame({'Active Ratio': dataset['Active Ratio (%)'], 'Discharge Ratio': dataset['Discharge Ratio (%)'],
                  'Death Ratio': dataset['Death Ratio (%)']})

for col in df:
    fig.add_trace(go.Box(y = df[col].values, name = df[col].name))
fig.show()

The box plots show that the discharge ratio is high across India (median 98.2%), while both the active ratio (median 0.57%) and the death ratio (median 1.32%) stay low.
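
A box plot flags points far above the upper quartile as outliers. The arithmetic below applies the conventional 1.5 × IQR rule to the `Active Ratio (%)` quartiles printed by `describe()`. This is a sketch: Plotly computes quartiles from the raw data and its interpolation may differ slightly from pandas, and the `quoted` dictionary holds only three ratios that appear elsewhere in this notebook.

```python
# Quartiles of 'Active Ratio (%)' copied from the describe() output.
q1, q3 = 0.0675, 1.3375

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # assumption: the conventional 1.5 * IQR rule

# Active ratios quoted in this notebook (illustrative subset of the 36 rows).
quoted = {"Mizoram": 18.19, "Arunachal Pradesh": 3.41, "Assam": 1.51}
outliers = [state for state, ratio in quoted.items() if ratio > upper_fence]

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.4f}")
print("Above the fence:", outliers)
```

Under this rule Mizoram sits far above the fence, which is consistent with it also having the lowest discharge ratio.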

### General Trends

In [34]:
px.scatter(x = dataset['Total Cases'], y = dataset['Deaths'], trendline="ols",
          labels = {'x': 'Total Cases (in millions)', 'y': 'Deaths'})

The plot shows a positive relationship: states with more total cases also report more deaths.

Here the trendline is set to "ols" (ordinary least squares), a linear least-squares method for estimating the unknown parameters of a linear regression model.
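
To make the OLS idea concrete, the sketch below fits a straight line to five (total cases, deaths) pairs copied from the dataset preview, using the closed-form formulas for one predictor and cross-checking them against `numpy.polyfit`. The five-row subset is illustrative, so the coefficients will not match the trendline fitted on all 36 rows.

```python
import numpy as np

# Five (total cases, deaths) pairs copied from the dataset preview.
total_cases = np.array([7549, 1995669, 51655, 581398, 725518], dtype=float)
deaths = np.array([129, 13671, 252, 5513, 9649], dtype=float)

# Closed-form OLS for a single predictor:
#   slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = total_cases.mean(), deaths.mean()
slope = (np.sum((total_cases - x_mean) * (deaths - y_mean))
         / np.sum((total_cases - x_mean) ** 2))
intercept = y_mean - slope * x_mean

# Cross-check against NumPy's degree-1 least-squares polynomial fit.
np_slope, np_intercept = np.polyfit(total_cases, deaths, deg=1)

print(f"deaths ~ {slope:.5f} * total_cases + {intercept:.1f}")
```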

In [35]:
px.scatter(x = dataset['Total Cases'], y = dataset['Active'], trendline = "lowess",
          labels = {'x': 'Total Cases (in millions)', 'y': 'Active Cases'})

The plot shows that states with more total cases also tend to have more active cases.

Here the trendline is set to "lowess" (locally weighted scatterplot smoothing), a non-parametric regression method. For each point it fits a small regression on that point's nearest neighbours, weighting closer neighbours more heavily, and the local fits together trace one smooth curve. It is a common form of local polynomial regression.
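
The following is a minimal sketch of the LOWESS idea, not the implementation Plotly relies on: for every point it selects the nearest `frac` share of the data, applies tricube weights, and fits a weighted straight line. The function name `lowess_sketch`, the default `frac` and the demo arrays are all assumptions made for illustration, and the robustness iterations of the full algorithm are omitted.

```python
import numpy as np

def lowess_sketch(x, y, frac=0.5):
    """Locally weighted linear smoothing (simplified, no robustness passes)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # number of neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against zero).
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)
        u = np.clip(dist / h, 0.0, 1.0)
        w = (1.0 - u ** 3) ** 3  # tricube weights, zero outside the bandwidth
        # Weighted least squares on a line centred at x[i], so the
        # intercept is the smoothed value at x[i].
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

# Demo on exactly linear data: local lines reproduce the input.
x_demo = np.arange(10.0)
y_demo = 2.0 * x_demo + 1.0
print(lowess_sketch(x_demo, y_demo))
```

Unlike OLS, which summarises the whole scatter with one slope, this kind of smoother can bend where the relationship between total and active cases changes.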

### Analysis of the Top 5 States

In [36]:
df = dataset.sort_values(by=['Total Cases'], ascending = False).head(5)
fig = go.Figure(data = [go.Pie(labels = df['State/UTs'], values = df['Total Cases'])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="Top 5 States with Highest number of Cases")
fig.show()

In [37]:
df = dataset.sort_values(by=['Active'], ascending = False).head(5)
fig = go.Figure(data = [go.Pie(labels = df['State/UTs'], values = df['Active'])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="Top 5 States with Highest number of Active Cases")
fig.show()

In [38]:
df = dataset.sort_values(by=['Discharged'], ascending = False).head(5)
fig = go.Figure(data = [go.Pie(labels = df['State/UTs'], values = df['Discharged'])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="Top 5 States with Highest number of Discharged Patients")
fig.show()

### Individual Analysis (State-wise)

In [39]:
most_cases =  dataset[dataset['Total Cases'] == max(dataset['Total Cases'])]
df = most_cases.groupby(by = ['State/UTs']).sum().T
fig = go.Figure(data = [go.Pie(labels = df.index, values = df.values.flatten(), pull=[0, 0, 0.2, 0])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="State with Highest number of Cases", 
                  annotations = [dict(text = df.T.index[0], x = 0.48, y = 0.5, font_size = 12, showarrow = False)])
fig.show()

In [40]:
most_cases =  dataset[dataset['Total Cases'] == min(dataset['Total Cases'])]
df = most_cases.groupby(by = ['State/UTs']).sum().T
fig = go.Figure(data = [go.Pie(labels = df.index, values = df.values.flatten(), pull=[0, 0, 0.2, 0])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="State with Least number of Cases", 
                  annotations = [dict(text = df.T.index[0], x = 0.48, y = 0.5, font_size = 10, showarrow = False)])
fig.show()

In [41]:
most_cases =  dataset[dataset['Active'] == max(dataset['Active'])]
df = most_cases.groupby(by = ['State/UTs']).sum().T
fig = go.Figure(data = [go.Pie(labels = df.index, values = df.values.flatten(), pull=[0, 0, 0.2, 0])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="State with Highest number of Active Cases", 
                  annotations = [dict(text = df.T.index[0], x = 0.48, y = 0.5, font_size = 12, showarrow = False)])
fig.show()

In [42]:
most_cases =  dataset[dataset['Active'] == min(dataset['Active'])]
df = most_cases.groupby(by = ['State/UTs']).sum().T
fig = go.Figure(data = [go.Pie(labels = df.index, values = df.values.flatten(), pull=[0, 0, 0.2, 0])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="State with Least number of Active Cases", 
                  annotations = [dict(text = df.T.index[0], x = 0.48, y = 0.5, font_size = 10, showarrow = False)])
fig.show()

In [43]:
most_cases =  dataset[dataset['Deaths'] == max(dataset['Deaths'])]
df = most_cases.groupby(by = ['State/UTs']).sum().T
fig = go.Figure(data = [go.Pie(labels = df.index, values = df.values.flatten(), pull=[0, 0, 0.2, 0])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="State with Highest Death Count", 
                  annotations = [dict(text = df.T.index[0], x = 0.48, y = 0.5, font_size = 12, showarrow = False)])
fig.show()

In [44]:
most_cases =  dataset[dataset['Deaths'] == min(dataset['Deaths'])]
df = most_cases.groupby(by = ['State/UTs']).sum().T
fig = go.Figure(data = [go.Pie(labels = df.index, values = df.values.flatten(), pull=[0, 0, 0.2, 0])])
fig.update_traces(hoverinfo = 'label+percent', textinfo = 'label+value', textfont_size=15, hole = 0.4,
                  marker=dict(line = dict(color = '#000000', width = 2)))
fig.update_layout( title_text="State with Least Death Count", 
                  annotations = [dict(text = 'Dadar Nagar and D&Diu', x = 0.48, y = 0.5, font_size = 8, showarrow = False)])
fig.show()

<hr>

## **Let's take a closer look at the Vaccination Drive in India**

### Basic Data Wrangling

In [47]:
dataset1 = pd.read_csv('./input/COVID-19 India Statewise Vaccine Data.csv')
dataset1.head()

Unnamed: 0,State/UTs,Dose 1,Dose 2,Total Vaccination Doses
0,Andaman and Nicobar,235835,98337,334172
1,Andhra Pradesh,18967840,6653976,25621816
2,Arunachal Pradesh,704206,204773,908979
3,Assam,12241611,2582327,14823938
4,Bihar,25869605,5016297,30885902


In [48]:
print("The shape of the dataset is",dataset1.shape)

The shape of the dataset is (37, 4)


In [49]:
dataset1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 4 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   State/UTs                37 non-null     object
 1   Dose 1                   37 non-null     int64 
 2   Dose 2                   37 non-null     int64 
 3   Total Vaccination Doses  37 non-null     int64 
dtypes: int64(3), object(1)
memory usage: 1.3+ KB


In [50]:
dataset1.describe(include = 'all')

Unnamed: 0,State/UTs,Dose 1,Dose 2,Total Vaccination Doses
count,37,37.0,37.0,37.0
unique,37,,,
top,Odisha,,,
freq,1,,,
mean,,11776220.0,3376540.0,15152760.0
std,,13022530.0,3602243.0,16437770.0
min,,51809.0,18408.0,70217.0
25%,,752894.0,256265.0,1009159.0
50%,,8478252.0,2201788.0,11203640.0
75%,,18967840.0,5037828.0,25621820.0


In [51]:
dataset1.isnull().sum()

State/UTs                  0
Dose 1                     0
Dose 2                     0
Total Vaccination Doses    0
dtype: int64

### Pandas Profiling

In [52]:
profile = ProfileReport(dataset1, title = "Pandas Profiling Report")
profile.to_notebook_iframe()


### Exploratory Data Analysis

In [53]:
df = dataset1.sort_values(by = ['Dose 1'], ascending = False)
fig = px.line(df, x = 'State/UTs', y = 'Dose 1', title = "First Dose",
             labels = {'Dose 1': 'Number of First Dose administered'})
fig.update_layout(width=1300, height=600)
fig.show()

In [54]:
df = dataset1.sort_values(by = ['Dose 2'], ascending = False)
fig = px.line(df, x = 'State/UTs', y = 'Dose 2', title = "Second Dose",
             labels = {'Dose 2': 'Number of Second Dose administered'})
fig.update_layout(width=1300, height=700)
fig.show()

In [55]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=dataset1['State/UTs'], y=dataset1['Dose 1'],
                    mode='lines+markers',
                    name='First Dose'))
fig.add_trace(go.Scatter(x=dataset1['State/UTs'], y=dataset1['Dose 2'],
                    mode='lines+markers',
                    name='Second Dose'))
fig.update_layout(width=1300, height=700)
fig.show()

**Q) Which state or union territory has administered the most vaccine doses?**

* As the graph above shows (accumulated vaccination figures), Uttar Pradesh is the state with the highest number of vaccinations.
* *Vaccinations: 60,606,763*

**Q) Which state or union territory has administered the fewest vaccine doses?**

* As the graph above shows (accumulated vaccination figures), Lakshadweep is the union territory with the lowest number of vaccinations.
* *Vaccinations: 70,217*

### Analysis of the Top 10 States

In [56]:
df = dataset1.sort_values('Total Vaccination Doses', ascending = False)[:10]
fig = go.Figure(data=[go.Bar(x = df['State/UTs'], y = df['Total Vaccination Doses'], text = df['Total Vaccination Doses'])])
fig.update_traces(marker_color='rgb(100,200,200)', marker_line_color='rgb(0,0,0)',
                  marker_line_width=2, opacity=0.5, texttemplate = '%{text:.2s}', textposition = 'outside')
fig.update_layout(title_text='Top 10 States by Total Vaccination Doses', uniformtext_minsize = 12, uniformtext_mode='hide')
fig.show()

In [57]:
df = dataset1.sort_values('Dose 1', ascending = False)[:10]
fig = go.Figure(data=[go.Bar(x = df['State/UTs'], y = df['Dose 1'], text = df['Dose 1'])])
fig.update_traces(marker_color='rgb(100,100,200)', marker_line_color='rgb(0,0,0)',
                  marker_line_width=2, opacity=0.5, texttemplate = '%{text:.2s}', textposition = 'outside')
fig.update_layout(title_text='Top 10 States by First Doses Administered', uniformtext_minsize = 12, uniformtext_mode='hide')
fig.show()

In [58]:
df = dataset1.sort_values('Dose 2', ascending = False)[:10]
fig = go.Figure(data=[go.Bar(x = df['State/UTs'], y = df['Dose 2'], text = df['Dose 2'])])
fig.update_traces(marker_color='rgb(100,255,150)', marker_line_color='rgb(0,0,0)',   # RGB channels range from 0 to 255
                  marker_line_width=2, opacity=0.5, texttemplate = '%{text:.2s}', textposition = 'outside')
fig.update_layout(title_text='Top 10 States by Second Doses Administered', uniformtext_minsize = 12, uniformtext_mode='hide')
fig.show()

In [59]:
df = dataset1.sort_values('Total Vaccination Doses', ascending = True)[:10]
fig = go.Figure(data=[go.Bar(x = df['State/UTs'], y = df['Total Vaccination Doses'], text = df['Total Vaccination Doses'])])
fig.update_traces(marker_color='rgb(100,200,200)', marker_line_color='rgb(0,0,0)',
                  marker_line_width=2, opacity=0.5, texttemplate = '%{text:.2s}', textposition = 'outside')
fig.update_layout(title_text='10 States with the Fewest Total Vaccination Doses', uniformtext_minsize = 12, uniformtext_mode='hide')
fig.show()

In [60]:
df = dataset1.sort_values('Dose 1', ascending = True)[:10]
fig = go.Figure(data=[go.Bar(x = df['State/UTs'], y = df['Dose 1'], text = df['Dose 1'])])
fig.update_traces(marker_color='rgb(100,100,200)', marker_line_color='rgb(0,0,0)',
                  marker_line_width=2, opacity=0.5, texttemplate = '%{text:.2s}', textposition = 'outside')
fig.update_layout(title_text='10 States with the Fewest First Doses Administered', uniformtext_minsize = 12, uniformtext_mode='hide')
fig.show()

In [61]:
df = dataset1.sort_values('Dose 2', ascending = True)[:10]
fig = go.Figure(data=[go.Bar(x = df['State/UTs'], y = df['Dose 2'], text = df['Dose 2'])])
fig.update_traces(marker_color='rgb(100,255,150)', marker_line_color='rgb(0,0,0)',   # RGB channels range from 0 to 255
                  marker_line_width=2, opacity=0.5, texttemplate = '%{text:.2s}', textposition = 'outside')
fig.update_layout(title_text='10 States with the Fewest Second Doses Administered', uniformtext_minsize = 12, uniformtext_mode='hide')
fig.show()

<hr>

## **Forecasting New Cases and Deaths for India**

Without upper and lower limits, the predictions tended to go negative as the curve flattened. The data is therefore given tentative floor and cap values before the model is fitted.

About Prophet:

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
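
The forecasts below use Prophet's logistic growth option together with `floor` and `cap` columns. The sketch that follows shows why a saturating curve stays inside those limits. It is a simplified stand-in, not Prophet's code: `logistic_trend`, the growth rate `k` and the midpoint `m` are hypothetical, there are no changepoints, and in a real Prophet forecast the seasonal terms are added on top of the trend, so `yhat` itself can still stray slightly outside the limits.

```python
import numpy as np

def logistic_trend(t, cap, floor=0.0, k=0.1, m=0.0):
    """Saturating curve bounded by floor and cap (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    return floor + (cap - floor) / (1.0 + np.exp(-k * (t - m)))

# cap=6000 and floor=0 mirror the values used for daily deaths below.
days = np.arange(-200, 201)
trend = logistic_trend(days, cap=6000, floor=0)

print(f"min = {trend.min():.6f}, midpoint = {trend[200]:.1f}, max = {trend.max():.6f}")
```

A linear trend has no such bounds, which is why an unconstrained fit can drift below zero once the daily counts flatten.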

In [64]:
dataset2 = pd.read_csv('./input/covid19_by_country.csv')
dataset2.head()

Unnamed: 0,Country,CountryAlpha3Code,Date,confirmed,deaths,recoveries,confirmed_inc,deaths_inc,recoveries_inc,ECR,GRTStringencyIndex,DaysSince1Cases,DaysSince100Cases,confirmed_PopPct,deaths_PopPct,recoveries_PopPct
0,Afghanistan,AFG,2020-01-22,0,0,0.0,41.0,0.0,6.0,0.0,0.0,-33,-66,0.0,0.0,0.0
1,Afghanistan,AFG,2020-01-23,0,0,0.0,0.0,0.0,0.0,0.0,0.0,-32,-65,0.0,0.0,0.0
2,Afghanistan,AFG,2020-01-24,0,0,0.0,0.0,0.0,0.0,0.0,0.0,-31,-64,0.0,0.0,0.0
3,Afghanistan,AFG,2020-01-25,0,0,0.0,0.0,0.0,0.0,0.0,0.0,-30,-63,0.0,0.0,0.0
4,Afghanistan,AFG,2020-01-26,0,0,0.0,0.0,0.0,0.0,0.0,0.0,-29,-62,0.0,0.0,0.0


In [65]:
dataset2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2819733 entries, 0 to 2819732
Data columns (total 16 columns):
 #   Column              Dtype  
---  ------              -----  
 0   Country             object 
 1   CountryAlpha3Code   object 
 2   Date                object 
 3   confirmed           int64  
 4   deaths              int64  
 5   recoveries          float64
 6   confirmed_inc       float64
 7   deaths_inc          float64
 8   recoveries_inc      float64
 9   ECR                 float64
 10  GRTStringencyIndex  float64
 11  DaysSince1Cases     int64  
 12  DaysSince100Cases   int64  
 13  confirmed_PopPct    float64
 14  deaths_PopPct       float64
 15  recoveries_PopPct   float64
dtypes: float64(9), int64(4), object(3)
memory usage: 344.2+ MB


In [66]:
dataset2.describe(include = 'all')

Unnamed: 0,Country,CountryAlpha3Code,Date,confirmed,deaths,recoveries,confirmed_inc,deaths_inc,recoveries_inc,ECR,GRTStringencyIndex,DaysSince1Cases,DaysSince100Cases,confirmed_PopPct,deaths_PopPct,recoveries_PopPct
count,2819733,2819733,2819733,2819733.0,2819733.0,2819733.0,2819733.0,2819733.0,2819733.0,2819733.0,2710901.0,2819733.0,2819733.0,2818587.0,2818587.0,2818587.0
unique,192,193,573,,,,,,,,,,,,,
top,US,USA,2020-05-13,,,,,,,,,,,,,
freq,1549392,1549392,4921,,,,,,,,,,,,,
mean,,,,9602711.0,198171.7,1448514.0,41287.74,760.9592,0.0003021563,0.02958784,54.71362,278.0461,248.2223,3.214012,0.06783137,0.6303311
std,,,,12279460.0,218702.9,3205724.0,56058.48,922.5106,356242.3,0.1159332,19.20189,166.7367,168.9478,3.819176,0.07183008,1.53398
min,,,,0.0,0.0,0.0,-348667.0,-1918.0,-30974750.0,-0.75,0.0,-573.0,-573.0,0.0,0.0,0.0
25%,,,,95150.0,4748.0,0.0,94.0,2.0,0.0,0.0007060036,46.3,135.0,105.0,0.007257195,0.0003451222,0.0
50%,,,,2802040.0,124540.0,78314.0,24528.0,491.0,29.0,0.004307544,56.48,278.0,249.0,1.365393,0.04525429,0.00604267
75%,,,,17412770.0,360228.0,1171447.0,56636.0,1123.0,14279.0,0.01246603,67.59,422.0,393.0,6.559611,0.1248644,0.4457541


In [67]:
dataset2.isnull().sum()

Country                    0
CountryAlpha3Code          0
Date                       0
confirmed                  0
deaths                     0
recoveries                 0
confirmed_inc              0
deaths_inc                 0
recoveries_inc             0
ECR                        0
GRTStringencyIndex    108832
DaysSince1Cases            0
DaysSince100Cases          0
confirmed_PopPct        1146
deaths_PopPct           1146
recoveries_PopPct       1146
dtype: int64

In [68]:
df = dataset2.copy()
df.set_index("Country", inplace = True)
data_India = df.loc[["India"]]
data_India = data_India.reset_index()
data_India.drop(['GRTStringencyIndex', 'confirmed_PopPct', 'deaths_PopPct', 'recoveries_PopPct'], inplace = True, axis = 1)
data_India.head(2)

Unnamed: 0,Country,CountryAlpha3Code,Date,confirmed,deaths,recoveries,confirmed_inc,deaths_inc,recoveries_inc,ECR,DaysSince1Cases,DaysSince100Cases
0,India,IND,2020-01-22,0,0,0.0,41.0,0.0,6.0,0.0,-8,-52
1,India,IND,2020-01-23,0,0,0.0,0.0,0.0,0.0,0.0,-7,-51


In [69]:
data = data_India.copy()
data = data[['Date', 'deaths_inc']]
data['Date']=pd.to_datetime(data['Date'])
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 573 entries, 0 to 572
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        573 non-null    datetime64[ns]
 1   deaths_inc  573 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 9.1 KB


In [70]:
data=data.rename(columns={data.columns[0]:'ds', data.columns[1]:'y'})
plt.figure(figsize=(20,6))
plt.scatter(data['ds'], data['y'])
plt.title('Covid-19 deaths per day in India', size=20)
plt.xlabel('Dates')
plt.ylabel('No. of Deaths')
data['floor']=0                                     #here floor value is provided so as to avoid negative value predictions
data['cap']=6000                                    #here cap values are provided because it's necessary for logistic growth
data.head(2)

Unnamed: 0,ds,y,floor,cap
0,2020-01-22,0.0,0,6000
1,2020-01-23,0.0,0,6000


In [71]:
model= Prophet(changepoint_prior_scale=0.9, growth='logistic')      #logistic growth keeps the trend between floor and cap; a high changepoint_prior_scale makes the trend more flexible
model.fit(data)
x= model.make_future_dataframe(periods=30,freq='D')
x['floor']=0                                                  #here floor value is provided so as to avoid negative value predictions
x['cap']=6000                                                 #here cap values are provided because it's necessary for logistic growth
forecast=model.predict(x)
forecast.info()

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 603 entries, 0 to 602
Data columns (total 18 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   ds                          603 non-null    datetime64[ns]
 1   trend                       603 non-null    float64       
 2   cap                         603 non-null    int64         
 3   floor                       603 non-null    int64         
 4   yhat_lower                  603 non-null    float64       
 5   yhat_upper                  603 non-null    float64       
 6   trend_lower                 603 non-null    float64       
 7   trend_upper                 603 non-null    float64       
 8   additive_terms              603 non-null    float64       
 9   additive_terms_lower        603 non-null    float64       
 10  additive_terms_upper        603 non-null    float64       
 11  weekly                      603 non-null    float64       

In [73]:
model.plot(forecast,figsize=(20,6));
plt.title("Covid-19 Deaths in India", size=30)
plt.xlabel("Date", size=20)
plt.ylabel("Deaths", size=20)
#Extra lines for number
forecast_plus=forecast.loc[[550,570,590,600]]
for a,b in zip(forecast_plus['ds'], forecast_plus['yhat']):
    plt.text(a, b, str(int(b)),bbox=dict(facecolor='green', alpha=0.4), fontsize=14)

**The forecast suggests that daily deaths will decline steadily and then level off.**

In [74]:
data = data_India.copy()
data = data[['Date','confirmed_inc']]
data['Date']=pd.to_datetime(data['Date'])
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 573 entries, 0 to 572
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Date           573 non-null    datetime64[ns]
 1   confirmed_inc  573 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 9.1 KB


In [75]:
import matplotlib.pyplot as plt
data=data.rename(columns={data.columns[0]:'ds', data.columns[1]:'y'})
plt.figure(figsize=(20,7))
plt.scatter(data['ds'], data['y'])
plt.title('Covid-19 cases per day in India', size=20)
plt.xlabel('Dates')
plt.ylabel('No. of cases')
data['floor']=0                              #here floor value is provided so as to avoid negative value predictions
data['cap']=500000                          #here cap values are provided because it's necessary for logistic growth

In [76]:
from fbprophet import Prophet
model2= Prophet(changepoint_prior_scale=0.6, growth='logistic')     #logistic growth keeps the trend between floor and cap; a high changepoint_prior_scale makes the trend more flexible
model2.fit(data)
x2= model2.make_future_dataframe(periods=30,freq='D')
x2['floor']=0                                                           #here floor value is provided so as to avoid negative value predictions
x2['cap']=500000                                                        #here cap values are provided because it's necessary for logistic growth
forecast2=model2.predict(x2)
forecast2.info()

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 603 entries, 0 to 602
Data columns (total 18 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   ds                          603 non-null    datetime64[ns]
 1   trend                       603 non-null    float64       
 2   cap                         603 non-null    int64         
 3   floor                       603 non-null    int64         
 4   yhat_lower                  603 non-null    float64       
 5   yhat_upper                  603 non-null    float64       
 6   trend_lower                 603 non-null    float64       
 7   trend_upper                 603 non-null    float64       
 8   additive_terms              603 non-null    float64       
 9   additive_terms_lower        603 non-null    float64       
 10  additive_terms_upper        603 non-null    float64       
 11  weekly                      603 non-null    float64       

In [77]:
model2.plot(forecast2,figsize=(20,6), xlabel='Dates', ylabel='Covid-19 Cases');
plt.title("Covid-19 Cases", size=30)
plt.xlabel("Date", size=20)
plt.ylabel("Cases", size=20)
#Extra lines for number
forecast_plus2=forecast2.loc[[550,570,590,600]]
for a,b in zip(forecast_plus2['ds'], forecast_plus2['yhat']):
    plt.text(a, b, str(int(b)),bbox=dict(facecolor='green', alpha=0.4), fontsize=14)

**The forecast suggests that daily new cases will decline steadily, eventually flattening the curve.**

**Final Analysis**

In [78]:
cases = dataset['Total Cases'].sum()
deaths = dataset['Deaths'].sum()
vaccinations = dataset1['Total Vaccination Doses'].sum()
print('Current accumulated Indian figures:')
print('Cases:\t\t',f'{int(cases):,}')
print('Deaths:\t\t',f'{int(deaths):,}')
print('Vaccinations:\t',f'{int(vaccinations):,}')

Current accumulated Indian figures:
Cases:		 32,285,857
Deaths:		 432,519
Vaccinations:	 560,652,030
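
The national totals also give a pooled death ratio, which is not the same as the average of the per-state ratios reported by `describe()`. The arithmetic below uses the totals printed above; the variable names are illustrative, and the explanation in the closing note is an interpretation rather than something the data states directly.

```python
# National totals copied from the printed summary above.
total_cases_india = 32_285_857
total_deaths_india = 432_519

# Pooled ratio: all deaths divided by all cases, as a percentage.
pooled_death_ratio = total_deaths_india / total_cases_india * 100

# Unweighted mean of the 36 state-level 'Death Ratio (%)' values from describe().
mean_of_state_ratios = 1.265

print(f"Pooled death ratio:      {pooled_death_ratio:.2f}%")
print(f"Mean of state ratios:    {mean_of_state_ratios:.2f}%")
```

The pooled figure is higher than the unweighted mean because it weights each state by its case count, so large states such as Maharashtra dominate it, whereas the mean treats a small union territory the same as a large state.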
