# Timeline

<img src="https://i.ibb.co/KKVVDjW/Image-Tittle.jpg" alt="Image-Tittle" border="0">

<img src="https://i.ibb.co/HKtKPy6/announcments.jpg" alt="announcments" border="0">

# Declaration

* According to MoHFW (https://www.mohfw.gov.in/), some new cases have been reassigned to other states as per the latest information.
* Confirmed cases include both Indian and foreign citizens.
* Cured cases include cured, discharged, and migrated patients.

## Data Source
* https://www.mohfw.gov.in/


### GitHub Repository for this Notebook
<a href="https://github.com/sreyaz01/covid-19-india-data-analysis"><img src="https://i.ibb.co/B3vvzTy/0-s-Y-XTIBzlfd2zskq.png" alt="0-s-Y-XTIBzlfd2zskq" border="0" width="300" height="100"></a>

In [49]:
#!conda install -c conda-forge cufflinks-py

# Libraries

In [50]:
## utility libraries
from IPython.core.display import HTML
from datetime import datetime
from datetime import timedelta


# storage and analysis
import pandas as pd
import geopandas as gpd
import numpy as np

#Visualization Libraries
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import style

import plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
from plotly.offline import init_notebook_mode,plot,iplot

import folium
import seaborn as sns

import cufflinks as cf

# Warning
import warnings
warnings.filterwarnings('ignore')


print('Pandas Version' , pd.__version__)
print('Matplotlib Version' , matplotlib.__version__)
print('Plotly Version' , plotly.__version__)
print('Seaborn Version' , sns.__version__)

Pandas Version 1.0.1
Matplotlib Version 3.2.1
Plotly Version 4.5.4
Seaborn Version 0.9.0


In [51]:
# configure library settings
%matplotlib inline
plt.rcParams['figure.figsize'] = 17,8
pyo.init_notebook_mode(connected=True)
cf.go_offline()

#style.use('ggplot')

In [52]:
# color palette
cnf = '#393e46' # confirmed - grey
dth = '#ff2e63' # death - red
rec = '#21bf73' # recovered - green
act = '#fe9801' # active case - orange

# Dataset 

In [53]:
%ls C:\Users\sreya\Desktop\Pythone_File\COVID-19_INDIA\input\covid19-corona-virus-india-dataset

 Volume in drive C has no label.
 Volume Serial Number is 0821-B132

 Directory of C:\Users\sreya\Desktop\Pythone_File\COVID-19_INDIA\input\covid19-corona-virus-india-dataset

25-04-2020  22:38    <DIR>          .
25-04-2020  22:38    <DIR>          ..
26-04-2020  20:04             8,790 cases_over_time_flourish.csv
26-04-2020  20:03            66,363 complete.csv
10-04-2020  16:57         1,219,997 patients_data.csv
24-04-2020  23:26               731 pop2018.csv
               4 File(s)      1,295,881 bytes
               2 Dir(s)  198,980,116,480 bytes free


In [54]:
#importing data
df = pd.read_csv(r'C:\Users\sreya\Desktop\Pythone_File\COVID-19_INDIA\input\covid19-corona-virus-india-dataset\complete.csv',
                parse_dates = ['Date'])

df.tail()

Unnamed: 0,Date,Name of State / UT,Total Confirmed cases (Indian National),Total Confirmed cases ( Foreign National ),Cured/Discharged/Migrated,Latitude,Longitude,Death,Total Confirmed cases
1298,2020-04-26,Telengana,0,0,280,18.1124,79.0193,26,991
1299,2020-04-26,Tripura,0,0,2,23.9408,91.9882,0,2
1300,2020-04-26,Uttar Pradesh,0,0,289,26.8467,80.9462,29,1843
1301,2020-04-26,Uttarakhand,0,0,26,30.0668,79.0193,0,50
1302,2020-04-26,West Bengal,0,0,105,22.9868,87.855,18,611


In [55]:
df.columns

Index(['Date', 'Name of State / UT', 'Total Confirmed cases (Indian National)',
       'Total Confirmed cases ( Foreign National )',
       'Cured/Discharged/Migrated', 'Latitude', 'Longitude', 'Death',
       'Total Confirmed cases'],
      dtype='object')

In [56]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 9 columns):
 #   Column                                      Non-Null Count  Dtype         
---  ------                                      --------------  -----         
 0   Date                                        1303 non-null   datetime64[ns]
 1   Name of State / UT                          1303 non-null   object        
 2   Total Confirmed cases (Indian National)     1303 non-null   int64         
 3   Total Confirmed cases ( Foreign National )  1303 non-null   int64         
 4   Cured/Discharged/Migrated                   1303 non-null   int64         
 5   Latitude                                    1303 non-null   float64       
 6   Longitude                                   1303 non-null   float64       
 7   Death                                       1303 non-null   int64         
 8   Total Confirmed cases                       1303 non-null   int64         
dtypes: dateti

## State Map data

In [57]:
# shape files
# map_data = gpd.read_file(r'C:\Users\sreya\Desktop\Pythone_File\COVID-19_INDIA\input\india-district-wise-shape-files\output.shp')

# map_data.head()

In [58]:
#State Wise Grouping Data
# states = map_data.dissolve(by='statename').reset_index()
# states.head()

In [59]:
# states['statename'] = states['statename'].str.replace('&', 'and')
# states['statename'] = states['statename'].str.replace('NCT of ', '')
# states['statename'] = states['statename'].str.replace('Chhatisgarh', 'Chhattisgarh')
# states['statename'] = states['statename'].str.replace('Orissa', 'Odisha')
# states['statename'] = states['statename'].str.replace('Pondicherry', 'Puducherry')

# states['statename'].unique()

# Preprocessing


## Cleaning

In [60]:
# take an explicit copy so the column assignments below do not write into a view of df
df_clean = df[['Date', 'Name of State / UT', 'Latitude', 'Longitude', 'Total Confirmed cases', 'Death', 'Cured/Discharged/Migrated']].copy()
df_clean.columns = ['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Deaths', 'Cured']

df_clean['Date'] = df_clean['Date'].dt.date

df_clean['Active'] = df_clean['Confirmed']-(df_clean['Deaths']+df_clean['Cured'])
df_clean['Mortality Rate'] = df_clean['Deaths']/df_clean['Confirmed']
df_clean['Recovery Rate'] = df_clean['Cured']/df_clean['Confirmed']
df_clean.tail()

Unnamed: 0,Date,State/UT,Latitude,Longitude,Confirmed,Deaths,Cured,Active,Mortality Rate,Recovery Rate
1298,2020-04-26,Telengana,18.1124,79.0193,991,26,280,685,0.026236,0.282543
1299,2020-04-26,Tripura,23.9408,91.9882,2,0,2,0,0.0,1.0
1300,2020-04-26,Uttar Pradesh,26.8467,80.9462,1843,29,289,1525,0.015735,0.15681
1301,2020-04-26,Uttarakhand,30.0668,79.0193,50,0,26,24,0.0,0.52
1302,2020-04-26,West Bengal,22.9868,87.855,611,18,105,488,0.02946,0.171849
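
The derived columns above follow Active = Confirmed - (Deaths + Cured), with both rates expressed as fractions of Confirmed. The sketch below reproduces that arithmetic on three rows copied from the output. `add_rates` is a hypothetical helper, and replacing a zero Confirmed count with NaN before dividing is an assumption that the original cell does not make.

```python
import numpy as np
import pandas as pd

def add_rates(frame):
    """Return a copy with Active, Mortality Rate and Recovery Rate columns.

    Hypothetical helper; the column names mirror df_clean above.
    """
    out = frame.copy()
    out['Active'] = out['Confirmed'] - (out['Deaths'] + out['Cured'])
    # Assumption: rows with zero confirmed cases get NaN rates
    # instead of a division-by-zero result.
    confirmed = out['Confirmed'].replace(0, np.nan)
    out['Mortality Rate'] = out['Deaths'] / confirmed
    out['Recovery Rate'] = out['Cured'] / confirmed
    return out

# Rows copied from the df_clean.tail() output above.
sample = pd.DataFrame({
    'State/UT': ['Telengana', 'Tripura', 'West Bengal'],
    'Confirmed': [991, 2, 611],
    'Deaths': [26, 0, 18],
    'Cured': [280, 2, 105],
})
rates = add_rates(sample)
print(rates[['State/UT', 'Active', 'Mortality Rate', 'Recovery Rate']])
```

The Active values should match the 685, 0 and 488 shown in the table.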


## Finding Latest Data Only

In [61]:
latest = df_clean[df_clean['Date']==max(df_clean['Date'])]


total_confirm = latest['Confirmed'].sum()
total_active = latest['Active'].sum()
total_cured = latest['Cured'].sum()
total_death = latest['Deaths'].sum()


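# label the figures with the date of the latest data rather than the date the notebook is run
now  = latest['Date'].max().strftime("%B %d, %Y")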

print(u"\u2022",f'Total Number of Confirmed Covid 2019 Cases across India till date ({now}):', total_confirm)
print(u"\u2022",f'Total Number of Active Cases till date ({now}):', total_active)
print(u"\u2022",f'Total Number of Cured Cases across India till date ({now}):', total_cured)
print(u"\u2022",f'Total Number of Deaths across India till date ({now}):', total_death)

• Total Number of Confirmed Covid 2019 Cases across India till date (April 26, 2020): 26605
• Total Number of Active Cases till date (April 26, 2020): 19865
• Total Number of Cured Cases across India till date (April 26, 2020): 5914
• Total Number of Deaths across India till date (April 26, 2020): 826


In [62]:
tm = latest.melt(id_vars="Date", value_vars=['Active', 'Deaths', 'Cured'])
tm.head()

Unnamed: 0,Date,variable,value
0,2020-04-26,Active,22
1,2020-04-26,Active,835
2,2020-04-26,Active,0
3,2020-04-26,Active,16
4,2020-04-26,Active,203
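
`melt` reshapes the per-state columns into one long `variable`/`value` pair per row, which is the shape the treemap aggregates. A small sketch, using three rows copied from the `latest` output in this notebook, suggests that the per-variable totals are unchanged by the reshape. The names `wide`, `long_df` and `totals` are illustrative only.

```python
import pandas as pd

# Three rows taken from the `latest` output in this notebook.
wide = pd.DataFrame({
    'Date': ['2020-04-26'] * 3,
    'Active': [22, 835, 0],
    'Deaths': [0, 31, 0],
    'Cured': [11, 231, 1],
})

# One row per (original row, variable) pair: 3 rows x 3 variables = 9 rows.
long_df = wide.melt(id_vars='Date', value_vars=['Active', 'Deaths', 'Cured'])

# Summing the long table by variable gives the same totals as summing
# the wide columns directly.
totals = long_df.groupby('variable')['value'].sum()
print(totals)
```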


In [63]:
fig = px.treemap(tm, path=["variable"], values="value",height=250, width=800,
                 color_discrete_sequence=[act, rec, dth], title='Latest Stats')

fig.data[0].textinfo = 'label+value+text'
fig.show()

In [64]:
latest.head()

Unnamed: 0,Date,State/UT,Latitude,Longitude,Confirmed,Deaths,Cured,Active,Mortality Rate,Recovery Rate
1271,2020-04-26,Andaman and Nicobar Islands,11.7401,92.6586,33,0,11,22,0.0,0.333333
1272,2020-04-26,Andhra Pradesh,15.9129,79.74,1097,31,231,835,0.028259,0.210574
1273,2020-04-26,Arunachal Pradesh,28.218,94.7278,1,0,1,0,0.0,1.0
1274,2020-04-26,Assam,26.2006,92.9376,36,1,19,16,0.027778,0.527778
1275,2020-04-26,Bihar,25.0961,85.3131,251,2,46,203,0.007968,0.183267


In [65]:
temp = latest.groupby(by = ['State/UT']).sum()

temp.tail()

State/UT,Latitude,Longitude,Confirmed,Deaths,Cured,Active,Mortality Rate,Recovery Rate
Telengana,18.1124,79.0193,991,26,280,685,0.026236,0.282543
Tripura,23.9408,91.9882,2,0,2,0,0.0,1.0
Uttar Pradesh,26.8467,80.9462,1843,29,289,1525,0.015735,0.15681
Uttarakhand,30.0668,79.0193,50,0,26,24,0.0,0.52
West Bengal,22.9868,87.855,611,18,105,488,0.02946,0.171849


In [66]:
# select and sort in one step instead of sorting a slice in place
temp = temp[['Confirmed','Deaths','Cured','Active','Mortality Rate','Recovery Rate']]\
    .sort_values('Confirmed',ascending=False)
#temp.head()

temp.style\
    .background_gradient(cmap="Blues", subset=['Active','Confirmed'])\
    .background_gradient(cmap="Greens", subset=['Cured', 'Recovery Rate'])\
    .background_gradient(cmap="Reds", subset=['Deaths', 'Mortality Rate'])

State/UT,Confirmed,Deaths,Cured,Active,Mortality Rate,Recovery Rate
Maharashtra,7628,323,1076,6229,0.042344,0.141059
Gujarat,3071,133,282,2656,0.043308,0.091827
Delhi,2625,54,869,1702,0.020571,0.331048
Madhya Pradesh,2096,99,210,1787,0.047233,0.100191
Rajasthan,2083,33,493,1557,0.015843,0.236678
Uttar Pradesh,1843,29,289,1525,0.015735,0.15681
Tamil Nadu,1821,23,960,838,0.01263,0.527183
Andhra Pradesh,1097,31,231,835,0.028259,0.210574
Telengana,991,26,280,685,0.026236,0.282543
West Bengal,611,18,105,488,0.02946,0.171849


In [67]:
temp.columns

Index(['Confirmed', 'Deaths', 'Cured', 'Active', 'Mortality Rate',
       'Recovery Rate'],
      dtype='object')

In [68]:
#Visualization
temp_1 = temp[['Confirmed', 'Deaths', 'Cured']]

temp_1.iplot(kind = 'bar',xTitle= 'State/UT' , yTitle='Number of Cases',
            title = f'Cases State Wise on {now}')

In [69]:
temp_2 = temp[['Mortality Rate','Recovery Rate']]
               
temp_2.iplot(kind ='scatter',xTitle='State/UT',yTitle='Rate',title = f'Mortality and Recovery Rate on {now}',
             mode = 'markers', size = 5)

In [70]:
# Date wise data visualization whole country

temp = df_clean.groupby(by = ['Date']).sum()
temp.drop(['Latitude','Longitude','Mortality Rate','Recovery Rate'],axis=1,inplace=True)

temp.tail()

Date,Confirmed,Deaths,Cured,Active
2020-04-22,18985,603,3260,15122
2020-04-23,21393,681,4258,16454
2020-04-24,23077,718,4749,17610
2020-04-25,24893,779,5210,18904
2020-04-26,26605,826,5914,19865


In [71]:
temp.iplot(title = 'Covid-19 Growth in India', yTitle='Cases',size=5,mode='markers+lines')

## No. of New Cases Every Day After 50 Confirmed Cases

In [72]:
#df_clean.columns

In [73]:
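# select several columns with a list; tuple-style selection on a groupby is deprecated
cases_df = df_clean.groupby('Date')[['Confirmed', 'Deaths']].sum()

# keep only the dates on which cumulative confirmed cases had reached 50
filt_cnf = (cases_df['Confirmed'] >= 50)

# day-over-day difference of the cumulative totals gives the new cases per day
temp = cases_df[filt_cnf].diff().dropna()

col_y = ['Confirmed','Deaths']
colr = [cnf,dth]

for col_name, colour in zip(col_y, colr):
    temp.iplot(kind = 'scatter',mode = "markers+lines" ,size = 5,y = col_name,color=colour,yTitle=col_name,title=f'New {col_name} Cases after Crossing 50 Confirmed Cases')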
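
The cell above turns cumulative totals into daily new cases by filtering on the threshold and differencing consecutive rows. The sketch below applies the same steps to the five national totals printed earlier in this notebook (22 to 26 April 2020). The names `cumulative`, `threshold` and `daily_new` are illustrative.

```python
import pandas as pd

# National cumulative totals copied from the date-wise output above.
cumulative = pd.DataFrame(
    {'Confirmed': [18985, 21393, 23077, 24893, 26605],
     'Deaths': [603, 681, 718, 779, 826]},
    index=pd.to_datetime(['2020-04-22', '2020-04-23', '2020-04-24',
                          '2020-04-25', '2020-04-26']),
)

# Keep only dates at or above the threshold, then difference consecutive days.
# The first remaining row has no predecessor, so diff() yields NaN there
# and dropna() removes it.
threshold = 50
daily_new = cumulative[cumulative['Confirmed'] >= threshold].diff().dropna()
print(daily_new.astype(int))
```

Because the first qualifying day is dropped, the result has one row fewer than the filtered input. A negative difference would indicate that a cumulative figure was revised downward in the source data.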

## Top 10 States

In [74]:
top_10 = latest.groupby(by = ['State/UT']).agg({'Confirmed': 'sum', 'Deaths': 'sum', 'Cured' : 'sum', 'Active' : 'sum'})\
                .nlargest(10,['Confirmed','Deaths','Cured','Active'])

top_10

State/UT,Confirmed,Deaths,Cured,Active
Maharashtra,7628,323,1076,6229
Gujarat,3071,133,282,2656
Delhi,2625,54,869,1702
Madhya Pradesh,2096,99,210,1787
Rajasthan,2083,33,493,1557
Uttar Pradesh,1843,29,289,1525
Tamil Nadu,1821,23,960,838
Andhra Pradesh,1097,31,231,835
Telengana,991,26,280,685
West Bengal,611,18,105,488
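
Note that `top_10` is selected by Confirmed first, so the later Deaths, Cured and Active panels rank only the states that are already in the top ten by confirmed cases. If a separate ranking per metric is wanted, one possible approach is to call `nlargest` once per column. The sketch below uses three rows from the table above; `subset` and `top_by` are illustrative names.

```python
import pandas as pd

# Three rows copied from the top_10 table above.
subset = pd.DataFrame(
    {'Confirmed': [1821, 1097, 991],
     'Deaths': [23, 31, 26]},
    index=['Tamil Nadu', 'Andhra Pradesh', 'Telengana'],
)

# Rank each metric on its own instead of reusing one selection.
top_by = {col: subset.nlargest(2, col).index.tolist() for col in subset.columns}
print(top_by)
```

Here the top two by Confirmed and the top two by Deaths are different pairs of states, which is the situation the per-metric ranking is meant to expose.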


In [75]:
#Creating Figures
plot_c = px.bar(top_10.sort_values('Confirmed') ,x="Confirmed",y = top_10.sort_values('Confirmed').index,
               text='Confirmed', orientation='h', color_discrete_sequence = [cnf])

plot_d = px.bar(top_10.sort_values('Deaths'),x="Deaths",y = top_10.sort_values('Deaths').index,
               text='Deaths', orientation='h', color_discrete_sequence = [dth])

plot_r = px.bar(top_10.sort_values('Cured'),x="Cured",y = top_10.sort_values('Cured').index,
               text='Cured', orientation='h', color_discrete_sequence = [rec])

plot_a = px.bar(top_10.sort_values('Active'),x="Active",y = top_10.sort_values('Active').index,
               text='Active', orientation='h', color_discrete_sequence = [act])


# plot
fig = make_subplots(rows=2, cols=2, shared_xaxes=False, horizontal_spacing=0.14, vertical_spacing=0.08,
                    subplot_titles=('Confirmed cases', 'Deaths reported', 'Recovered', 'Active cases'))

fig.add_trace(plot_c['data'][0],row=1, col=1)
fig.add_trace(plot_d['data'][0],row=1, col=2)
fig.add_trace(plot_r['data'][0],row=2, col=1)
fig.add_trace(plot_a['data'][0],row=2, col=2)

fig.update_layout(height=600 ,title_text="Top 10 States ")


# Rise in Cases by State Over Time

https://app.flourish.studio/visualisation/1977187/edit

In [91]:
HTML('<div class="flourish-embed flourish-bar-chart-race" data-src="visualisation/1977187" data-url="https://flo.uri.sh/visualisation/1977187/embed"><script src="https://public.flourish.studio/resources/embed.js"></script></div>')

In [77]:
df_clean.columns

Index(['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Deaths',
       'Cured', 'Active', 'Mortality Rate', 'Recovery Rate'],
      dtype='object')

In [78]:
# Deaths, Cured and Active cases, date wise

col = ['Deaths','Cured','Active']

for val in col:
    # one column per state, one row per date; dates before a state's first report become 0
    p_df = pd.pivot_table(df_clean,index  = 'Date', values = val, columns ='State/UT').fillna(0).astype('int').reset_index()
    p_df.iplot(x = 'Date' ,title = val, xTitle = 'Date',yTitle = 'Cases')
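
The pivot step reshapes the long table into a Date by State/UT grid, and `fillna(0)` covers dates on which a state had not yet reported. A minimal sketch follows; the states 'A', 'B', 'C' and their counts are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Illustrative long-format data; states and counts are placeholders.
long_df = pd.DataFrame({
    'Date': ['2020-04-25', '2020-04-25', '2020-04-26', '2020-04-26', '2020-04-26'],
    'State/UT': ['A', 'B', 'A', 'B', 'C'],
    'Deaths': [1, 0, 2, 0, 1],
})

# State 'C' has no row on the first date, so the pivot leaves NaN there;
# fillna(0) turns it into a zero before casting back to integers.
wide = (
    pd.pivot_table(long_df, index='Date', columns='State/UT', values='Deaths')
    .fillna(0)
    .astype(int)
)
print(wide)
```

`pivot_table` averages duplicate Date/State pairs by default, so this relies on the data holding at most one row per state per date, as the dataset here appears to.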

# Geographical Map

In [79]:
latest.columns

Index(['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Deaths',
       'Cured', 'Active', 'Mortality Rate', 'Recovery Rate'],
      dtype='object')

In [80]:
geo_map = folium.Map([20.5937,78.9629],zoom_start=4,tiles ='cartodbpositron' )

for lat,long,active,deaths,cured,name in zip(latest['Latitude'],latest['Longitude'],
                                             latest['Active'],latest['Deaths'],
                                             latest['Cured'],latest['State/UT']):

    # use the state name as-is: str.capitalize() would lower-case every word after the first
    folium.CircleMarker([lat,long],radius=active*0.01,
                        tooltip = (f'''<strong>Name</strong>: {name} <br>
                               <strong>Active</strong>: {active}<br>
                               <strong>Deaths</strong>: {deaths}<br>
                               <strong>Cured</strong>: {cured}<br>'''),
                        color = 'red',fill_color = 'red',fill_opacity=0.3).add_to(geo_map)

geo_map
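
With `radius=active*0.01` the marker radius grows linearly with the active count, so a state with a few dozen active cases gets a radius well under one pixel. One alternative, offered here only as a sketch, is square-root scaling, which makes the marker area roughly proportional to the count. `marker_radius`, `scale` and `min_radius` are hypothetical and are not part of the original notebook or of folium.

```python
import math

def marker_radius(active, scale=0.5, min_radius=2.0):
    """Hypothetical helper: circle radius in pixels for an active-case count.

    Square-root scaling keeps the circle area proportional to `active`;
    `min_radius` keeps states with very few cases visible on the map.
    """
    return max(min_radius, scale * math.sqrt(max(active, 0)))

# Active counts taken from the tables above.
for state, active in [('Maharashtra', 6229), ('Bihar', 203), ('Andaman and Nicobar Islands', 22)]:
    print(state, round(marker_radius(active), 2))
```

The result could be passed as `radius=marker_radius(active)` in the loop above; the default constants are assumptions to tune by eye.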


## No. of States Affected Over Time

In [81]:
grp_states = df_clean.groupby('Date')['State/UT']

# number of distinct states/UTs reporting on each date
affected = grp_states.nunique()
affected_states = affected.values
#affected_states

dates = affected.index
#dates

In [82]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=dates, y=[36 for i in range(len(affected_states))], 
                         mode='lines', name='Total no. of States+UT', 
                         line = dict(color='#222831', dash='longdashdot')))

fig.add_trace(go.Scatter(x=dates, y=affected_states, hoverinfo='x+y',
                         mode='lines', name='No. of affected States+UT', 
                         line = dict(color='#c70039')))

fig.update_layout(title='No. of affected States/UT over Time', 
                  xaxis_title='Dates', yaxis_title='No. of affected States/UT')
fig.show()
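
The affected-state count is the number of distinct State/UT values per date. The sketch below, on made-up records, shows that `nunique()` gives the same counts as `unique().apply(len)` and also expresses them as a share of the 36 states and UTs used for the reference line. The records and the name `TOTAL_STATES_UT` are illustrative.

```python
import pandas as pd

# Made-up records: state 'A' reports on both dates, 'B' and 'C' only on the second.
records = pd.DataFrame({
    'Date': ['2020-03-01', '2020-03-02', '2020-03-02', '2020-03-02'],
    'State/UT': ['A', 'A', 'B', 'C'],
})
TOTAL_STATES_UT = 36  # same constant as the reference line in the plot above

grouped = records.groupby('Date')['State/UT']
affected = grouped.nunique()               # distinct states per date
via_unique = grouped.unique().apply(len)   # equivalent, but builds the arrays first
share = affected / TOTAL_STATES_UT
print(affected)
print(share.round(3))
```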

## Confirmed Vs Deaths

In [83]:
latest_cnf_dth = latest[latest['Confirmed']>10]
latest_cnf_dth = latest_cnf_dth[['State/UT','Confirmed','Deaths']]

px.scatter(latest_cnf_dth,x = 'Confirmed', y = 'Deaths', color ='State/UT', size = "Confirmed",log_x=True, title = 'Confirmed vs Deaths')



### Estimated Population (2018) vs Confirmed Cases

* Data source: http://statisticstimes.com/demographics/population-of-indian-states.php

In [84]:
# Reading population data

pop2018 = pd.read_csv(r'C:\Users\sreya\Desktop\Pythone_File\COVID-19_INDIA\input\covid19-corona-virus-india-dataset/pop2018.csv')

pop2018.rename(columns = {'State': 'State/UT'},inplace = True)

# align state names with the spellings used in the COVID-19 dataset
pop2018['State/UT'] = pop2018['State/UT'].replace({
    'Telangana': 'Telengana',
    'Jammu & Kashmir': 'Jammu and Kashmir',
    'A.& N.Islands': 'Andaman and Nicobar Islands',
})

In [85]:
# states/UTs present in `latest` but missing from the population data
pop_states = set(pop2018['State/UT'])
states_l = [state for state in latest['State/UT'] if state not in pop_states]

for state in states_l:
    print(state, 'is not in the population DataFrame')

len(states_l)

Ladakh is not in the population DataFrame


1

In [86]:
pop2018.shape

(36, 2)

In [87]:
latest.shape

(32, 10)

In [88]:
merge = pd.merge(latest,pop2018, on = 'State/UT')
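
`pd.merge` performs an inner join by default, so any state in `latest` without a match in `pop2018` (Ladakh, per the check above) is dropped without a message. A possible way to surface such rows is a left join with `indicator=True`. In the sketch below the Delhi and Goa figures come from this notebook, while the Ladakh case count is a placeholder, not real data.

```python
import pandas as pd

cases = pd.DataFrame({
    'State/UT': ['Delhi', 'Goa', 'Ladakh'],
    'Confirmed': [2625, 7, 0],   # the Ladakh value is a placeholder
})
population = pd.DataFrame({
    'State/UT': ['Delhi', 'Goa'],
    '2018': [18345784, 1542750],
})

# The default inner join keeps only states present in both frames.
inner = cases.merge(population, on='State/UT')

# A left join keeps every row of `cases`; the `_merge` column says where each came from.
checked = cases.merge(population, on='State/UT', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'State/UT'].tolist()
print(missing)
```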

In [89]:
merge

Unnamed: 0,Date,State/UT,Latitude,Longitude,Confirmed,Deaths,Cured,Active,Mortality Rate,Recovery Rate,2018
0,2020-04-26,Andaman and Nicobar Islands,11.7401,92.6586,33,0,11,22,0.0,0.333333,419978
1,2020-04-26,Andhra Pradesh,15.9129,79.74,1097,31,231,835,0.028259,0.210574,52883163
2,2020-04-26,Arunachal Pradesh,28.218,94.7278,1,0,1,0,0.0,1.0,1528296
3,2020-04-26,Assam,26.2006,92.9376,36,1,19,16,0.027778,0.527778,34586234
4,2020-04-26,Bihar,25.0961,85.3131,251,2,46,203,0.007968,0.183267,119461013
5,2020-04-26,Chandigarh,30.7333,76.7794,30,0,17,13,0.0,0.566667,1126705
6,2020-04-26,Chhattisgarh,21.2787,81.8661,37,0,32,5,0.0,0.864865,28566990
7,2020-04-26,Delhi,28.7041,77.1025,2625,54,869,1702,0.020571,0.331048,18345784
8,2020-04-26,Goa,15.2993,74.124,7,0,7,0,0.0,1.0,1542750
9,2020-04-26,Gujarat,22.2587,71.1924,3071,133,282,2656,0.043308,0.091827,63907200


In [90]:
# Population figures noted by hand; `manual_pop` is a placeholder name
manual_pop = {'Ladakh': 290492, 'Andaman and Nicobar Islands': 380581}