About

This dataset contains the latest state-wise Covid-19 data for India as of August 18, 2021. It can be used to analyze the Covid-19 situation in India.
This dataset is well suited to exploratory data analysis.

Attribute Information

State/UTs - Names of Indian States and Union Territories.
Total Cases - Total number of confirmed cases
Active - Total number of active cases
Discharged - Total number of discharged cases
Deaths - Total number of deaths
Active Ratio (%) - Ratio of number of active cases to total cases
Discharge Ratio (%) - Ratio of number of discharged cases to total cases
Death Ratio (%) - Ratio of number of deaths to total cases
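
The three ratio columns can be reproduced from the count columns. The sketch below assumes each ratio is the count divided by total cases, expressed as a percentage and rounded to two decimals (the rounding is an assumption). The rows are hypothetical values made up for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical counts for illustration only; not values from the dataset.
sample = pd.DataFrame({
    "State/UTs": ["State A", "State B"],
    "Total Cases": [1000, 200],
    "Active": [50, 20],
    "Discharged": [930, 178],
    "Deaths": [20, 2],
})

# Each ratio is the count divided by total cases, expressed as a percentage.
for count_col, ratio_col in [
    ("Active", "Active Ratio (%)"),
    ("Discharged", "Discharge Ratio (%)"),
    ("Deaths", "Death Ratio (%)"),
]:
    sample[ratio_col] = (sample[count_col] / sample["Total Cases"] * 100).round(2)

print(sample)
```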

Source

Link : https://www.mygov.in/covid-19

EDA by AlexanderPerez 

In [None]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline

**Importing the data from our CSV file**

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
data = pd.read_csv('/kaggle/input/latest-covid19-india-statewise-data/Latest Covid-19 India Status.csv')
data.info()
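
After loading, a quick sanity check can flag rows whose counts do not add up. This is a minimal sketch that assumes Total Cases should equal Active + Discharged + Deaths; the source may track other categories, in which case mismatches are expected. `inconsistent_rows` is a hypothetical helper, and the rows below are made up for illustration.

```python
import pandas as pd

def inconsistent_rows(df):
    """Return rows where Active + Discharged + Deaths differs from Total Cases.

    Assumes those three categories are the only components of the total.
    """
    parts = df["Active"] + df["Discharged"] + df["Deaths"]
    return df[parts != df["Total Cases"]]

# Hypothetical rows for illustration only; not values from the dataset.
toy = pd.DataFrame({
    "State/UTs": ["State A", "State B"],
    "Total Cases": [1000, 200],
    "Active": [50, 20],
    "Discharged": [930, 170],   # State B is off by 8 on purpose
    "Deaths": [20, 2],
})

print(inconsistent_rows(toy))
```

On the real data, the same call would be `inconsistent_rows(data)`.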

In [None]:
data['Total Cases']

In [None]:
# Statistical analysis
data.describe()

**Setting up our plots to find good insights**

In [None]:
sns.set_style('dark')
plt.figure(figsize = (10, 12))
plt.title('Total cases for each state in India')
sns.barplot(data = data, y = 'State/UTs', x = 'Total Cases')
plt.xlabel('Total Cases (millions)')

In [None]:
plt.figure(figsize = (13, 8))
plt.title("Active case vs Total Case trend", fontsize = 16)
sns.regplot(data = data, x = 'Total Cases', y = 'Active', color = 'red')
plt.xlabel('Total Cases (millions)')

The data show that Maharashtra has the highest total number of cases in India, while Andaman and Nicobar has the lowest.

In [None]:
sns.lmplot(data = data, x = 'Total Cases', y = 'Active', hue = 'State/UTs')
plt.title('Active cases vs total cases for each state', fontsize = 12)

Number of recoveries vs total number of cases

In [None]:
plt.figure(figsize = (13, 8))
plt.title("Discharged vs Total Case trend", fontsize = 16)
sns.regplot(data = data, x = 'Total Cases', y = 'Discharged', color = 'green')
plt.xlabel('Total Cases (millions)')

In [None]:
high_total_case = data[data['Total Cases'] == max(data['Total Cases'])]
high_total_case

In [None]:
low_total_case = data[data['Total Cases'] == min(data['Total Cases'])]
low_total_case
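
The two lookups above can also be written with `idxmax` and `idxmin`, which return the index label of the extreme value. This is a sketch on made-up rows; `extreme_rows` is a hypothetical helper name.

```python
import pandas as pd

def extreme_rows(df, column):
    """Return (row with the largest value, row with the smallest value) of `column`."""
    return df.loc[df[column].idxmax()], df.loc[df[column].idxmin()]

# Hypothetical rows for illustration only; not values from the dataset.
toy = pd.DataFrame({
    "State/UTs": ["State A", "State B", "State C"],
    "Total Cases": [500, 9000, 40],
})

highest, lowest = extreme_rows(toy, "Total Cases")
print(highest["State/UTs"], lowest["State/UTs"])
```

One design difference: `idxmax` and `idxmin` return only the first match when several rows tie, whereas the boolean-mask form above returns every tied row.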

10 states with the most active cases in India

In [None]:
df1 = data.sort_values(by='Active', ascending=False).head(10)
states = df1['State/UTs']
cases = df1['Active']
plt.barh(states,cases, color = 'red')
plt.xlabel('Active Cases')
plt.ylabel('State')
plt.title('States with the most active cases in India')
plt.show()

10 states with the fewest active cases in India

In [None]:
df2 = data.sort_values(by='Active').head(10)
states = df2['State/UTs']
cases = df2['Active']
plt.barh(states,cases, color = 'green')
plt.xlabel('Active Cases')
plt.ylabel('State')
plt.title('States with the fewest active cases in India')
plt.show()
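
The sort-then-head pattern used for both rankings can be expressed with `nlargest` and `nsmallest`. This is a small sketch on made-up rows, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical rows for illustration only; not values from the dataset.
toy = pd.DataFrame({
    "State/UTs": ["State A", "State B", "State C", "State D"],
    "Active": [30, 5, 120, 60],
})

# Equivalent to sort_values(by='Active', ascending=False).head(2)
most_active = toy.nlargest(2, "Active")
# Equivalent to sort_values(by='Active').head(2)
least_active = toy.nsmallest(2, "Active")

print(most_active)
print(least_active)
```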

In [None]:
df2 = data.sort_values(by='Death Ratio (%)',ascending=False)
df2
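
Ranking by death ratio and ranking by absolute deaths can put different states on top, because a small state with few cases can still have a high ratio. The sketch below illustrates this with made-up rows; the numbers are hypothetical and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical figures for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "State/UTs": ["State A", "State B", "State C"],
    "Total Cases": [100000, 10000, 50000],
    "Deaths": [1000, 250, 400],
})
toy["Death Ratio (%)"] = (toy["Deaths"] / toy["Total Cases"] * 100).round(2)

# The state with the most deaths and the state with the highest ratio differ here.
most_deaths = toy.loc[toy["Deaths"].idxmax(), "State/UTs"]
highest_ratio = toy.loc[toy["Death Ratio (%)"].idxmax(), "State/UTs"]
print(most_deaths, highest_ratio)
```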

In [None]:
px.line(data, x='State/UTs', y='Death Ratio (%)')

In [None]:
px.scatter_3d(data, x='State/UTs', z='Total Cases', y='Active', color = 'Active', width=700, height=600)

In the above 3D plot, Maharashtra has the maximum number of total cases, but Kerala has the maximum number of active cases.

Punjab is the state with the highest death ratio.

In [None]:
px.bar_polar(data, r="Death Ratio (%)", theta="State/UTs", color="Deaths",
              title="Deaths vs Death Ratio Visualisation"
            )

Discharge ratio vs total discharged

In [None]:
px.bar_polar(data, r="Discharge Ratio (%)", theta="State/UTs", color="Discharged",
                   title="Discharged vs Discharge Ratio Visualization"
                  )

In [None]:
barFig = px.scatter(data, x="State/UTs", y="Total Cases", color="Discharge Ratio (%)")
# Rotate x-axis labels 90 degrees
barFig.update_layout(xaxis_tickangle=-90)

Visualizing each state on the basis of active cases, discharge ratio, and total cases

In [None]:
px.scatter_3d(data, x='State/UTs', z='Active', y='Discharge Ratio (%)', color ='Total Cases', width=700, height=600)

In [None]:
px.scatter(data, x="State/UTs", y="Death Ratio (%)", color="Discharge Ratio (%)")

Active vs Deaths

In [None]:
barFig = px.scatter(data, x="State/UTs", y="Active", color="Deaths")
# Rotate x-axis labels 90 degrees
barFig.update_layout(xaxis_tickangle=-90)

In [None]:
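# High correlation between total cases and recoveries (Discharged); lowest correlation between active cases and deaths
# Restrict to numeric columns so the text column 'State/UTs' does not break corr() on recent pandas versions
sns.heatmap(data.select_dtypes('number').corr())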
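
To read the strongest and weakest relationships off the correlation matrix without eyeballing the heatmap, the unique column pairs can be ranked. This is a minimal sketch on made-up rows; `ranked_correlations` is a hypothetical helper, not part of pandas or seaborn.

```python
import numpy as np
import pandas as pd

def ranked_correlations(df):
    """Return the correlation of each unique pair of numeric columns, sorted descending."""
    corr = df.select_dtypes("number").corr()
    # Keep only the upper triangle (above the diagonal) so each pair appears once.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False)

# Hypothetical rows for illustration only; not values from the dataset.
toy = pd.DataFrame({
    "State/UTs": ["State A", "State B", "State C", "State D"],
    "Total Cases": [100, 200, 300, 400],
    "Discharged": [90, 190, 280, 390],
    "Active": [8, 3, 15, 2],
})

ranked = ranked_correlations(toy)
print(ranked)
```

On the real data, `ranked_correlations(data)` would list the same values shown in the heatmap, with the first entry being the most strongly positively correlated pair.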

Conclusions:

- Active Ratio and Discharge Ratio are interdependent.
- Punjab has the highest death ratio.
- Dadra and Nagar Haveli and Daman and Diu has the fewest deaths.
- In the above 3D plot, Maharashtra has the maximum number of total cases, but Kerala has the maximum number of active cases.
- Total cases are related to the number of deaths.
- Jharkhand is among the states with the fewest active cases.