<a href="https://www.kaggle.com/code/kirtimathur/india-covid19-eda?scriptVersionId=116108145" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Basic Understanding of data

#### Loading the data set and checking its size

In [None]:
df=pd.read_csv("/kaggle/input/covid19-india-status/latest Covid-19 India Status1.csv")
df.shape

#### Checking top 5 rows of the data

In [None]:
df.head()

#### Basic information about data

In [None]:
df.info()

#### Checking the number of null values in each column

In [None]:
df.isnull().sum()

- There are no missing values.
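The same check can be made explicit in code instead of being read off the printed table. This is a minimal sketch on a small made-up frame (the `sample` data and its values are hypothetical, not taken from the dataset), with one missing value added on purpose to show what a positive result looks like:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; values are illustrative only.
sample = pd.DataFrame({
    "State/UTs": ["A", "B", "C"],
    "Total Cases": [100, 250, np.nan],
})

# Count of missing values per column, then a single overall flag.
missing_per_column = sample.isnull().sum()
has_missing = bool(sample.isnull().values.any())

print(missing_per_column)
print("Any missing values:", has_missing)
```

On the real `df`, `has_missing` should come out as `False` if the observation above holds.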

#### Unique values and total number of unique values for all columns

In [None]:
for i in df.columns:
    print(i,"---------",df[i].unique(),"----------",df[i].nunique())

# Feature Engineering

#### Recovered cases, that is total cases minus the sum of active cases and deaths

In [None]:
# The subtraction already returns a Series, so it can be assigned directly
df["recovered_cases"] = df["Total Cases"] - (df["Active"] + df["Deaths"])

#### Recovered ratio, that is recovered cases with respect to total cases

In [None]:
# Recovered cases as a percentage of total cases
df["recovered_ratio(%)"] = df["recovered_cases"] / df["Total Cases"] * 100

In [None]:
df.head()
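The two derived columns can be sketched end to end on toy data. The frame below is hypothetical (made-up states and counts), and the zero-total row is an assumption added only to show how a division by zero could be guarded; the real dataset may never contain such a row:

```python
import pandas as pd

# Hypothetical figures, not taken from the dataset.
sample = pd.DataFrame({
    "State/UTs": ["A", "B"],
    "Total Cases": [1000, 0],
    "Active": [50, 0],
    "Deaths": [10, 0],
})

# Recovered = total - (active + deaths), assigned directly as a new column.
sample["recovered_cases"] = sample["Total Cases"] - (sample["Active"] + sample["Deaths"])

# Mask zero totals so the ratio becomes missing instead of dividing by zero.
total = sample["Total Cases"].where(sample["Total Cases"] != 0)
sample["recovered_ratio(%)"] = sample["recovered_cases"] / total * 100

print(sample)
```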

# Exploratory Data Analysis (EDA)

#### Total cases of each state and union territory

In [None]:
k=df.groupby(["State/UTs"])["Total Cases"].agg(["max","min","sum"]).sort_values(by="max",ascending=False)
k.reset_index(inplace=True)
k

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x="State/UTs",y="sum",data=k)  # keyword x/y; plot from the aggregated frame k
plt.xticks(rotation=90)
plt.show()

- The `1e6` on the y-axis means the tick values are in units of 10^6 (millions).
- Maharashtra had the highest number of cases.
- Andaman and Nicobar Islands had the lowest number of cases.
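The highest and lowest states can also be looked up directly instead of being read off the chart. A minimal sketch using `idxmax` and `idxmin` on a hypothetical three-row frame (the state labels and counts are made up):

```python
import pandas as pd

# Hypothetical data; the real frame has one row per state or union territory.
sample = pd.DataFrame({
    "State/UTs": ["A", "B", "C"],
    "Total Cases": [500, 12000, 75],
})

# Index by state so idxmax/idxmin return the state name.
totals = sample.set_index("State/UTs")["Total Cases"]
highest = totals.idxmax()
lowest = totals.idxmin()

print(f"Highest: {highest} ({totals[highest]})")
print(f"Lowest: {lowest} ({totals[lowest]})")
```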

#### Top 10 affected states

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x="State/UTs",y="sum",data=k.head(10))  # first 10 rows of k (sorted descending)
plt.xticks(rotation=90)
plt.show()

#### Bottom 10 affected states

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x="State/UTs",y="sum",data=k.tail(10))  # last 10 rows of k (sorted descending)
plt.xticks(rotation=90)
plt.show()

#### Percent distribution of total cases with respect to top 10 states

In [None]:
plt.figure(figsize=(24,12))
plt.pie(x=k["sum"].head(10),labels=k["State/UTs"].head(10),autopct="%.1f%%");  # one decimal place
plt.title("Top 10 States",fontsize=20)
plt.show()
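Note that the percentages in the pie chart are shares of the top-10 total, not of the national total. The arithmetic behind the labels can be sketched as follows, using a hypothetical series and a top 2 instead of a top 10 to keep the numbers easy to follow:

```python
import pandas as pd

# Hypothetical totals per state.
totals = pd.Series({"A": 600, "B": 300, "C": 100}, name="Total Cases")

# Keep the largest entries, then express each as a share of that subset.
top = totals.nlargest(2)
share = (top / top.sum() * 100).round(1)

print(share)
```

Dividing by `totals.sum()` instead of `top.sum()` would give each state's share of all cases.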

#### Most affected and least affected states in terms of active ratio, that is active cases with respect to total cases

In [None]:
plt.figure(figsize=(12,6))
plt.plot(df["State/UTs"],df["Active Ratio (%)"],color="red",marker="o")
plt.title("States and active cases",fontsize=20)
plt.xticks(rotation=90);

- The highest active ratios (active cases relative to total cases) at that time were in Mizoram, Arunachal Pradesh, Meghalaya, Manipur, Tripura, Assam and Sikkim, all of which are in northeastern India.
- So the northeast of India was the region most affected by the coronavirus at this time.
- States in other parts of the country, such as Gujarat, Delhi, Haryana, Madhya Pradesh, and Bihar, had the lowest active ratios.
- States in the south also had some active cases.

#### Most affected and least affected states in terms of death ratio, that is deaths with respect to total cases

In [None]:
plt.figure(figsize=(12,6))
plt.plot(df["State/UTs"],df["Death Ratio (%)"],color="red",marker="o")
plt.title("States and death ratio",fontsize=20)
plt.xticks(rotation=90);

- Punjab had the highest death ratio, that is the most deaths relative to its total number of cases.
- Dadra and Nagar Haveli and Daman and Diu had the lowest.

#### State-wise recovered cases

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x="State/UTs",y="recovered_cases",data=df)
plt.xticks(rotation=90);

- Maharashtra had the highest number of recovered cases.
- Andaman and Nicobar had the lowest.

In [None]:
df.groupby(["State/UTs"])["recovered_cases"].mean().sort_values(ascending=False)

- Maharashtra had the highest number of recovered cases, at 5881167.
- Andaman and Nicobar had the lowest, at 7349.

#### Recovered ratio of each state

In [None]:
df.groupby(["State/UTs"])["recovered_ratio(%)"].mean().sort_values(ascending=False)

- Overall, the recovered ratio, that is the recovery rate against the virus, is fairly good in India.
- Most states had a recovery rate of more than 96%.
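The "most states" claim can be quantified by counting how many states clear the threshold. A minimal sketch on hypothetical ratios (the four values below are made up; on the real data the series would be `df["recovered_ratio(%)"]` indexed by state):

```python
import pandas as pd

# Hypothetical recovered ratios in percent.
ratios = pd.Series(
    {"A": 98.2, "B": 97.5, "C": 95.1, "D": 96.4},
    name="recovered_ratio(%)",
)

threshold = 96.0
above = ratios[ratios > threshold]

# Percentage of states whose recovered ratio exceeds the threshold.
share_above = len(above) / len(ratios) * 100

print(f"{len(above)} of {len(ratios)} states ({share_above:.0f}%) exceed {threshold}%")
```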

#### State-wise trend of death, active and recovered cases

In [None]:
# One row per state with the three counts side by side
df.set_index("State/UTs")[["Active","Deaths","recovered_cases"]]

In [None]:
plt.figure(figsize=(12,6))
plt.plot(df["State/UTs"],df["Active Ratio (%)"],color="indigo",marker="o",label="Active Ratio")
plt.plot(df["State/UTs"],df["Death Ratio (%)"],color="indianred",marker="o",label="Death Ratio")
plt.plot(df["State/UTs"],df["recovered_ratio(%)"],color="green",marker="o",label="Recovered Ratio")
plt.title("Trend of deaths, actives and recovered ratio state wise",fontsize=20)
plt.legend()
plt.xticks(rotation=90);

- The recovered ratio was far higher than both the active ratio and the death ratio.
- So it can be inferred that, relative to total cases, recovery in most states was very high.
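Because recovered cases are defined here as total cases minus active cases and deaths, the three ratios should add up to 100% for every state, which is why a high recovered ratio forces the other two to be small. A minimal sketch of that identity on hypothetical counts, assuming all three ratios are computed against total cases as the headings above describe:

```python
import numpy as np
import pandas as pd

# Hypothetical counts, not taken from the dataset.
sample = pd.DataFrame({
    "State/UTs": ["A", "B"],
    "Total Cases": [1000, 200],
    "Active": [50, 4],
    "Deaths": [10, 2],
})
sample["recovered_cases"] = sample["Total Cases"] - (sample["Active"] + sample["Deaths"])

# Each ratio is the count as a percentage of total cases.
active_ratio = sample["Active"] / sample["Total Cases"] * 100
death_ratio = sample["Deaths"] / sample["Total Cases"] * 100
recovered_ratio = sample["recovered_cases"] / sample["Total Cases"] * 100

ratio_sum = active_ratio + death_ratio + recovered_ratio
print(ratio_sum)
print("All rows sum to 100:", bool(np.allclose(ratio_sum, 100)))
```

If the same check fails on the real data, the dataset's ratio columns are likely rounded or computed on a different basis.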