The first case of the 2019–20 coronavirus pandemic in India was reported on 30 January 2020, originating from China. Experts suggest the number of infections could be much higher, as India's testing rates are among the lowest in the world. The infection rate of COVID-19 in India is reported to be 1.7, significantly lower than in the worst-affected countries.

The outbreak has been declared an epidemic in more than a dozen states and union territories, where provisions of the Epidemic Diseases Act, 1897 have been invoked, and educational institutions and many commercial establishments have been shut down. India has suspended all tourist visas, as a majority of the confirmed cases were linked to other countries.

On 22 March 2020, India observed a 14-hour voluntary public curfew at the instance of Prime Minister Narendra Modi. The government followed it up with lockdowns in 75 districts where COVID-19 cases had occurred, as well as in all major cities. Further, on 24 March, the Prime Minister ordered a nationwide lockdown for 21 days, affecting India's entire population of 1.3 billion.

In [None]:
from IPython.display import HTML
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline

from datetime import datetime, timedelta
from urllib.request import urlopen
import folium
import geopandas as gpd




# hide warnings
import warnings
warnings.filterwarnings('ignore')

df1 = pd.read_csv('../input/covid19-corona-virus-india-dataset/complete.csv',parse_dates=['Date'])
df2 = pd.read_csv('../input/covid19-corona-virus-india-dataset/patients_data.csv',parse_dates=['date_announced','status_change_date','estimated_onset_date'])
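Both files are read with `parse_dates`. That is what later lets the notebook compare dates and subtract a `timedelta` from `max(df1['Date'])`. The sketch below shows this behaviour on a made-up two-row CSV. The name `csv_text` and its values are illustrative only and are not taken from the dataset.

```python
import io
from datetime import timedelta

import pandas as pd

# Made-up two-row CSV in the same shape as complete.csv (values are illustrative only)
csv_text = """Date,Name of State / UT,Total Confirmed cases
2020-04-22,State A,10
2020-04-23,State A,15
"""

# parse_dates turns 'Date' into a datetime64 column instead of plain strings
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(sample['Date'].dtype)

# With real datetimes, max() and timedelta arithmetic select the previous day directly
latest = sample['Date'].max()
previous_total = sample.loc[sample['Date'] == latest - timedelta(days=1), 'Total Confirmed cases'].iloc[0]
print(previous_total)  # 10
```

Without `parse_dates`, 'Date' would hold strings, and `latest - timedelta(days=1)` would fail.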


In [None]:
HTML('''<div class="flourish-embed flourish-cards" data-src="visualisation/1786965" data-url="https://flo.uri.sh/visualisation/1786965/embed"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [None]:
## Both datasets are public and may contain missing, mistyped, or unwanted values, so we clean them before plotting.
## df1 looks clean: every feature is relevant for the visualizations, and it has no missing values.
df1.isnull().sum()

In [None]:
## Some basic clean-up of the state names.
## For instance, df1['Name of State / UT'].unique() shows values such as
## 'Union Territory of Chandigarh' and 'Union Territory of Jammu and Kashmir'.
## We strip the prefix 'Union Territory of ' and keep only the name,
## e.g. 'Union Territory of Jammu and Kashmir' becomes 'Jammu and Kashmir'.
## The cleaned names are written back to 'Name of State / UT', because that is the column kept (and renamed to 'State/UT') below.
df1['Name of State / UT'] = df1['Name of State / UT'].str.replace('Union Territory of ', '', regex=False)

## We also do not need the following features for the current visualizations:
## Total Confirmed cases (Indian National)
## Total Confirmed cases ( Foreign National )
## so we drop those columns.
df1.drop(['Total Confirmed cases (Indian National)', 'Total Confirmed cases ( Foreign National )'], axis=1, inplace=True)

## After checking the existing column names (df1.columns), we reorder the columns
## and then give them shorter, more intuitive names.
## Reordering:
df1 = df1[['Date', 'Name of State / UT', 'Latitude', 'Longitude', 'Total Confirmed cases', 'Death', 'Cured/Discharged/Migrated']]
## Renaming:
df1.columns = ['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Deaths', 'Cured']

## Next, some feature engineering.
## 'Active' is derived here; 'Mortality Rate' and 'Recovery Rate' are added later, on the latest-date snapshot.
df1['Active'] = df1['Confirmed'] - df1['Cured'] - df1['Deaths']

## Reorder once more so that 'Active' sits next to 'Confirmed'
df1 = df1[['Date', 'State/UT', 'Latitude', 'Longitude', 'Confirmed', 'Active', 'Deaths', 'Cured']]

## df1[df1['State/UT'] == 'Kerala']

## Let's view the final dataframe
df1.head(5)
## The dataset holds daily, state-wise cumulative counts of COVID-19 cases, e.g.:
##   Date        State/UT  Latitude  Longitude  Confirmed  Active  Deaths  Cured
##   2020-04-23  Kerala    10.8505   76.2711    438        112     3       323
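As a quick check of the 'Active' formula, here is a minimal sketch that applies it to the sample Kerala row quoted above (438 confirmed, 323 cured, 3 deaths). The one-row frame is built by hand. The same non-negativity assertion could also be run on the full df1 to flag inconsistent records.

```python
import pandas as pd

# The sample Kerala row (2020-04-23) from the comments above, built by hand
row = pd.DataFrame({
    'State/UT': ['Kerala'],
    'Confirmed': [438],
    'Deaths': [3],
    'Cured': [323],
})

# Active = Confirmed - Cured - Deaths, the same formula used for df1
row['Active'] = row['Confirmed'] - row['Cured'] - row['Deaths']
print(row['Active'].iloc[0])  # 112

# Negative Active counts would point to inconsistent records
assert (row['Active'] >= 0).all()
```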

# Summary of COVID-19 outcomes at a glance

In [None]:
## Snapshot of the latest date. States without a record on that date (see the discrepancy noted in the choropleth section) are not counted here.
latest_date = df1['Date'].max()
summary_data = df1[df1['Date'] == latest_date].reset_index(drop=True)
## Select several columns with a list (double brackets); a tuple of names raises an error in recent pandas
summary_data = summary_data.groupby(['Date'], sort=False)[['Confirmed', 'Active', 'Cured', 'Deaths']].sum().reset_index()
summary_data['Date'] = summary_data['Date'].dt.strftime('%d %B %Y')

styles = [
    dict(selector="th", props=[("font-size", "120%"),
                               ("text-align", "center"),
                               ("font-weight", "normal"),
                               ("color", "grey")]),
    dict(selector="td", props=[("font-size", "300%"),
                               ("text-align", "center"),
                               ("background-color", "white"),
                               ("color", "dodgerblue")]),
    #dict(selector=".row_heading, .blank", props=[("display", "none;")])
]

## The format keys must match the actual column names; formatting and styles are applied to the same Styler
html = (summary_data.style
        .format({"Confirmed": "{:,.0f}", "Active": "{:,.0f}", "Cured": "{:,.0f}", "Deaths": "{:,.0f}"})
        .set_table_styles(styles))
html

# HORIZONTAL BAR PLOT

In [None]:

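## Each row holds the counts as of its date, so summing a column over all dates would count earlier days again.
## Plot the totals for the latest date instead.
latest = df1[df1['Date'] == df1['Date'].max()]

plt.figure(figsize=(10, 7))
plt.barh('Confirmed Cases', latest['Confirmed'].sum())
plt.barh('Active Cases', latest['Active'].sum())
plt.barh('Deaths', latest['Deaths'].sum())
plt.barh('Cured', latest['Cured'].sum())
plt.title('Coronavirus Cases in India (latest date)', size=20)
plt.xticks(size=10)
plt.yticks(size=10)
plt.show()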
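This is why only the latest date is plotted. Each row carries the counts as of that date, so adding up a column across dates counts earlier days again. Here is a sketch with made-up numbers for a single hypothetical state:

```python
import pandas as pd

# Made-up cumulative counts for one hypothetical state over three days
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-21', '2020-04-22', '2020-04-23']),
    'State/UT': ['State A'] * 3,
    'Confirmed': [10, 12, 15],
})

# Summing a cumulative column across dates counts earlier days again
print(toy['Confirmed'].sum())  # 37

# The current total is the value on the latest date
latest_total = toy.loc[toy['Date'] == toy['Date'].max(), 'Confirmed'].sum()
print(latest_total)  # 15
```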

# Feature Engineering

In [None]:
## We also want a 'New Cases' feature: the current day's confirmed count minus the previous day's confirmed count.
curr_day = df1[df1['Date'] == df1['Date'].max()].set_index('State/UT')
prev_day = df1[df1['Date'] == df1['Date'].max() - timedelta(days=1)].set_index('State/UT')
new_cases = curr_day['Confirmed'] - prev_day['Confirmed']
## States missing on either day produce NaN, which we treat as 0 new cases
new_cases = new_cases.fillna(0)
new_cases
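The cell above computes new cases for the latest date only. If new cases are wanted for every date, one option is a per-state `diff()` on the sorted data. This is our suggestion and not part of the original analysis. The sketch uses made-up values.

Note that `diff()` compares against the previous *available* row. When a state skips a date, as 'State B' does here, the difference therefore spans more than one day.

```python
import pandas as pd

# Made-up cumulative counts; State B has no row for 2020-04-22
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-21', '2020-04-22', '2020-04-23',
                            '2020-04-21', '2020-04-23']),
    'State/UT': ['State A', 'State A', 'State A', 'State B', 'State B'],
    'Confirmed': [10, 12, 15, 4, 9],
})

# Sort so that diff() compares each row with the state's previous record
toy = toy.sort_values(['State/UT', 'Date'])
# The first record of each state has no predecessor; treat it as 0 new cases
toy['New Cases'] = toy.groupby('State/UT')['Confirmed'].diff().fillna(0)
print(toy)
```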

In [None]:
## Next we build a latest-date dataset with three additional columns: 'New Cases', 'Mortality Rate' and 'Recovery Rate'.
## Selecting the rows for the maximum date keeps the indexes from df1, so we reset them.
df = df1[df1['Date'] == df1['Date'].max()].reset_index(drop=True)

## Add 'New Cases' by looking up each state in the new_cases Series
df['New Cases'] = df['State/UT'].map(new_cases)

## Add the two derived rates, 'Mortality Rate' and 'Recovery Rate'
df['Mortality Rate'] = df['Deaths'] / df['Confirmed']
df['Recovery Rate'] = df['Cured'] / df['Confirmed']

## Drop the Date column, since every row has the same (maximum) date
df.drop(['Date'], axis=1, inplace=True)
# fix datatype
# for i in ['Confirmed', 'Deaths', 'Cured']:
#    df[i] = df[i].astype('int')
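Both rates divide by 'Confirmed'. For a state with zero confirmed cases, pandas returns NaN for 0/0 and inf for x/0. The minimal sketch below masks zero denominators explicitly. It assumes NaN is the desired result for such rows, and the data is made up.

```python
import numpy as np
import pandas as pd

# Made-up latest-day counts; 'State C' has no confirmed cases yet
latest = pd.DataFrame({
    'State/UT': ['State A', 'State B', 'State C'],
    'Confirmed': [200, 50, 0],
    'Deaths': [4, 1, 0],
    'Cured': [120, 10, 0],
})

# Treat a zero denominator as missing, so the rate is NaN and never inf
confirmed = latest['Confirmed'].replace(0, np.nan)
latest['Mortality Rate'] = latest['Deaths'] / confirmed
latest['Recovery Rate'] = latest['Cured'] / confirmed
print(latest)
```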


# 1. View By Background Gradient

In [None]:
temp = df[['State/UT', 'Confirmed', 'Active', 'New Cases', 'Deaths', 'Mortality Rate', 'Cured', 'Recovery Rate']]
temp = temp.sort_values('Confirmed', ascending=False).reset_index(drop=True)

temp.style\
    .background_gradient(cmap="Blues", subset=['Confirmed', 'Active', 'New Cases'])\
    .background_gradient(cmap="Greens", subset=['Cured', 'Recovery Rate'])\
    .background_gradient(cmap="Reds", subset=['Deaths', 'Mortality Rate'])

# 2. View By Choropleth

In [None]:
## Let us visualize state-wise COVID-19 cases on a geographical map,
## using the Python folium library. References:
## https://coderzcolumn.com/tutorials/data-science/interactive-maps-choropleth-scattermap-using-folium
## https://thedatafrog.com/en/articles/choropleth-maps-python/

## A choropleth needs two inputs: a shape file and a data file.
## The shape file holds the coordinates of the outlines drawn on the map.

## Shape files

## .shp files can be read with the geopandas read_file() function.
## They contain a 'geometry' column holding the shape of each individual region of the map;
## a geometry can be a point, line, polygon, or multipolygon.

## First step: read the shape file
dist = gpd.read_file('../input/india-states/Igismap/Indian_States.shp')
dist.plot()
## Plotting the shape file draws a map of India with every state outlined.
dist.head(40)

In [None]:
## The column 'st_nm' is the key: the values in the dataframe's 'State/UT' column
## must match the values in the shape file's 'st_nm' column. There are discrepancies, so we fix them first.

## Copy the columns 'State/UT' and 'Confirmed' of dataframe df to a new dataframe
#df_choropleth = df[['State/UT','Confirmed']].copy()


## Normalise some state names in df1, since the records spell them inconsistently
df1['State/UT'] = df1['State/UT'].str.replace('Union Territory of Jammu and Kashmir', 'Jammu & Kashmir', regex=False)
df1['State/UT'] = df1['State/UT'].str.replace('Jammu and Kashmir', 'Jammu & Kashmir', regex=False)
df1['State/UT'] = df1['State/UT'].str.replace('Union Territory of Ladakh', 'Ladakh', regex=False)
df1['State/UT'] = df1['State/UT'].str.replace('Union Territory of Chandigarh', 'Chandigarh', regex=False)

## The data also has gaps: not every State/UT has a record for every date, e.g. the latest records are
##    Rajasthan        2020-04-26
##    Uttar Pradesh    2020-04-25
## So we group df1 by State/UT and take the latest Date available for each state.
df_choropleth = df1.groupby(['State/UT'], sort=False)['Date'].max().reset_index()

## Next, merge df_choropleth with df1 to recover the remaining columns for each (State/UT, Date) pair.
df_choropleth = pd.merge(df1, df_choropleth, how='inner', on=['State/UT', 'Date'])


## Change the state names to match the names used in the shape file
df_choropleth['State/UT'] = df_choropleth['State/UT'].str.replace('Andaman and Nicobar Islands', 'Andaman and Nicobar Island', regex=False)
df_choropleth['State/UT'] = df_choropleth['State/UT'].str.replace('Delhi', 'NCT of Delhi', regex=False)
df_choropleth['State/UT'] = df_choropleth['State/UT'].str.replace('Jammu and Kashmir', 'Jammu & Kashmir', regex=False)
df_choropleth['State/UT'] = df_choropleth['State/UT'].str.replace('Ladakh', 'Jammu & Kashmir', regex=False)  ## to match our shape file
df_choropleth['State/UT'] = df_choropleth['State/UT'].str.replace('Telengana', 'Telangana', regex=False)
df_choropleth['State/UT'] = df_choropleth['State/UT'].str.replace('Arunachal Pradesh', 'Arunanchal Pradesh', regex=False)

## The Ladakh rename can leave two rows with the same State/UT name. Combine them into one row by summing
## the counts per State/UT, and keep the result: the map and the bar charts below both read df_choropleth.
df_choropleth = df_choropleth.groupby('State/UT', as_index=False)[['Confirmed', 'Active', 'Deaths', 'Cured']].sum()
df_choropleth
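The same three steps can be sketched compactly on made-up records: take the latest row per state, rename to the shape-file spelling, and merge duplicates. Two choices here differ from the cells above and are our own:

- `groupby(...)['Date'].idxmax()` picks each state's latest row without a merge.
- `Series.replace` with a dict matches whole values. `name_map` is a hypothetical subset of the renames. Because the match is on whole values, re-running the cell cannot turn 'NCT of Delhi' into 'NCT of NCT of Delhi', as the substring-based `str.replace` would.

```python
import pandas as pd

# Made-up records: Delhi reports on two dates, and Ladakh and
# Jammu and Kashmir must end up under one shape-file name
records = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-25', '2020-04-26', '2020-04-25', '2020-04-26']),
    'State/UT': ['Delhi', 'Delhi', 'Ladakh', 'Jammu and Kashmir'],
    'Confirmed': [100, 110, 18, 450],
})

# Hypothetical subset of the renames; keys must match whole values
name_map = {
    'Delhi': 'NCT of Delhi',
    'Jammu and Kashmir': 'Jammu & Kashmir',
    'Ladakh': 'Jammu & Kashmir',
}

# 1. Keep each state's most recent row (idxmax returns the row label of the latest date)
latest = records.loc[records.groupby('State/UT')['Date'].idxmax()].copy()

# 2. Rename to the shape-file spelling; whole-value replacement is safe to re-run
latest['State/UT'] = latest['State/UT'].replace(name_map)

# 3. Sum rows that now share a name (Ladakh folds into Jammu & Kashmir)
combined = latest.groupby('State/UT', as_index=False)['Confirmed'].sum()
print(combined)
```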



In [None]:
## https://python-visualization.github.io/folium/quickstart.html

## The first dimension to show is geolocation, so the map is initialised
## at these coordinates in central India.
latitude = 23.00
longitude = 78.98
covid_map = folium.Map(location=[latitude, longitude], min_zoom=4, max_zoom=6, zoom_start=4)

## User-defined bin edges. They are not equidistant: the narrow low bins keep states
## with few cases distinguishable from states with thousands of cases.
bin_range = [0, 10000]
bin_input = '10, 50, 100, 300, 600, 1000 , 3000, 5000'
bin_input_parsed = [int(x.strip()) for x in bin_input.split(',')]

bins = [bin_range[0]] + bin_input_parsed + [bin_range[1]]
## Every value should fall within the outer bin edges, otherwise folium rejects the data
assert df_choropleth['Confirmed'].between(bins[0], bins[-1]).all(), 'Confirmed counts fall outside the bin range'


folium.Choropleth(dist,                              ## the shape file (GeoDataFrame)
                  data=df_choropleth,                ## the data file
                  key_on='feature.properties.st_nm', ## key feature in the shape file
                  columns=['State/UT', 'Confirmed'], ## key column matching the shape file, then the data column
                  fill_color='YlGnBu',               ## colour scale
                  bins=bins,                         ## user-defined bin edges
                  ## We could instead pass an integer, e.g. bins=9, to create 9 equal-width bins (9 is the maximum number)
                  line_weight=0.9,                   ## border line width
                  line_opacity=0.5,                  ## border line opacity
                  legend_name='No. of reported cases').add_to(covid_map)

folium.LayerControl().add_to(covid_map)
covid_map
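To see which colour class each state falls into, the same bin edges can be applied with `pd.cut`. The sketch below reuses the bin specification from the map cell, with made-up counts.

- Intervals are right-closed, e.g. (10, 50].
- `include_lowest=True` keeps a count of 0 in the first class.
- The final check counts values outside the outer edges. As far as we know, recent folium versions reject such values.

```python
import pandas as pd

# Same bin specification as in the map cell
bin_range = [0, 10000]
bin_input = '10, 50, 100, 300, 600, 1000 , 3000, 5000'
bins = [bin_range[0]] + [int(x.strip()) for x in bin_input.split(',')] + [bin_range[1]]

# Made-up confirmed counts
confirmed = pd.Series([0, 7, 10, 11, 450, 8000], name='Confirmed')

# Right-closed intervals; include_lowest=True puts 0 into the first class
categories = pd.cut(confirmed, bins, include_lowest=True)
print(categories.value_counts(sort=False))

# Values outside the outer edges would have no colour class
out_of_range = confirmed[(confirmed < bins[0]) | (confirmed > bins[-1])]
print(f'{len(bins) - 1} classes, {len(out_of_range)} values out of range')
```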



# 3. View By Bar Charts

In [None]:
# BARPLOTS FOR STATE WISE REPRESENTATION
fig = px.bar(df_choropleth.sort_values('Confirmed', ascending=False),
             color='State/UT',
             x="Confirmed", y="State/UT",
             title='Total Confirmed Cases Statewise',
             text='Confirmed',
             orientation='h',
             width=800, height=1000, color_discrete_sequence=px.colors.cyclical.Phase)
fig.update_layout(plot_bgcolor='rgb(255, 255, 255)')

fig.show()

In [None]:
# BARPLOTS FOR STATE WISE REPRESENTATION
fig = px.bar(df_choropleth.sort_values('Active', ascending=False),
             color='State/UT',
             x="Active", y="State/UT",
             title='Total Active Cases Statewise',
             text='Active',
             orientation='h',
             width=800, height=1000, color_discrete_sequence=px.colors.cyclical.Phase)
fig.update_layout(plot_bgcolor='rgb(255, 255, 255)')

fig.show()

In [None]:
# BARPLOTS FOR STATE WISE REPRESENTATION
fig = px.bar(df_choropleth.sort_values('Cured', ascending=False),
             color='State/UT',
             x="Cured", y="State/UT",
             title='Total Cured Cases Statewise',
             text='Cured',
             orientation='h',
             width=800, height=1000, color_discrete_sequence=px.colors.cyclical.Phase)
fig.update_layout(plot_bgcolor='rgb(255, 255, 255)')

fig.show()

In [None]:
# BARPLOTS FOR STATE WISE REPRESENTATION
fig = px.bar(df_choropleth.sort_values('Deaths', ascending=False),
             color='State/UT',
             x="Deaths", y="State/UT",
             title='Total Deaths Statewise',
             text='Deaths',
             orientation='h',
             width=800, height=1000, color_discrete_sequence=px.colors.cyclical.Phase)
fig.update_layout(plot_bgcolor='rgb(255, 255, 255)')

fig.show()