# COVID-19 BAR CHART RACE OF INDIAN STATES
This notebook builds a bar chart race of Indian states that shows how confirmed COVID-19 cases grew day by day. By the end, the chart shows which states have the largest number of confirmed cases, in other words, where the coronavirus has spread the most.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Installing & importing required libraries

In [None]:
!pip install bar_chart_race

import bar_chart_race as bcr

# Loading required data into dataframe

In [None]:
India_df = pd.read_csv("/kaggle/input/covid19-in-india/covid_19_india.csv")
India_df.head()

# Checking each column for null values

In [None]:
India_df.info()

# Data cleaning: removing duplicate and unwanted rows, standardizing state names, etc.
This dataset contains duplicate rows, so we delete them.
Also, when converting the dates to datetime format, pandas by default treats the first two digits as the month. We therefore pass 'dayfirst=True', which makes it read the first two digits as the day instead.

In [None]:
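# Parse the dates as day-first (DD/MM/...) rather than the default month-first
India_df['Date'] = pd.to_datetime(India_df['Date'], dayfirst=True)
# Drop the duplicate rows by position; this range is hard-coded for this version of the dataset
India_df.drop(India_df.index[5091:5126], axis=0, inplace=True)
India_df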
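
The effect of 'dayfirst=True' can be checked on a small example. The sketch below uses a made-up three-row frame (the values are illustrative, not taken from the dataset) and also shows `drop_duplicates` as one possible alternative to dropping rows by position, assuming the duplicates are exact copies of other rows.

```python
import pandas as pd

# Hypothetical toy data: the first two rows are exact duplicates.
toy = pd.DataFrame({
    "Date": ["01/02/2020", "01/02/2020", "02/02/2020"],
    "State/UnionTerritory": ["Kerala", "Kerala", "Kerala"],
    "Confirmed": [2, 2, 3],
})

# Default parsing reads "01/02/2020" as month/day: 2 January 2020.
month_first = pd.to_datetime(toy["Date"])
# dayfirst=True reads it as day/month: 1 February 2020.
day_first = pd.to_datetime(toy["Date"], dayfirst=True)

# Remove rows that are identical in every column.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(month_first.iloc[0].date(), day_first.iloc[0].date(), len(deduped))
```

Whether `drop_duplicates` removes exactly the same rows as the positional drop above depends on the dataset, so it is worth comparing row counts before switching.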

Looking at the State/Union Territory names, the same state appears under multiple names. Some are spelled differently, and others are two territories that we combine into one. For example, 'Telangana' also appears as 'Telengana', "Telengana***" and "Telangana***", and 'Dadra and Nagar Haveli and Daman and Diu' appears separately as 'Dadar Nagar Haveli' and 'Daman & Diu'. We therefore map each set of variants to a single name.

In [None]:
India_df['State/UnionTerritory'].unique()

In [None]:
# Assign the result back instead of using inplace=True on a selected column,
# which may not update the original dataframe in newer pandas versions.
India_df['State/UnionTerritory'] = India_df['State/UnionTerritory'].replace({
    "Telengana" : "Telangana",
    "Telengana***" : "Telangana",
    "Telangana***" : "Telangana",
    "Daman & Diu" : "Dadra and Nagar Haveli and Daman and Diu",
    "Dadar Nagar Haveli" : "Dadra and Nagar Haveli and Daman and Diu"})
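
As a rough illustration of the name clean-up, the sketch below normalizes a made-up Series of state names. Stripping trailing asterisks with a regular expression is an assumption on our part (a more general alternative to listing each starred variant); the spelling fix uses the same mapping idea as the cell above.

```python
import pandas as pd

# Hypothetical sample of the name variants mentioned above.
names = pd.Series(["Telengana", "Telengana***", "Telangana***", "Telangana"])

# Step 1 (assumed alternative): strip any trailing asterisks.
no_stars = names.str.replace(r"\*+$", "", regex=True)

# Step 2: map the remaining misspelling to the standard name.
cleaned = no_stars.replace({"Telengana": "Telangana"})

print(cleaned.unique())
```

Calling `unique()` again after the replacement is a quick way to confirm that no variants remain.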

Some rows do not have a proper 'State/Union Territory' value. We leave these rows out of the bar chart race by removing them.

In [None]:
India_df = India_df[(India_df['State/UnionTerritory'] != 'Unassigned') &
                    (India_df['State/UnionTerritory'] != 'Cases being reassigned to states')]

We keep only the columns required for the bar chart race and then use pivot_table to turn the State column values into columns.

In [None]:
India_df = India_df[['Date', 'State/UnionTerritory', 'Confirmed']]
India = pd.pivot_table(India_df, values = 'Confirmed', index = 'Date', columns = 'State/UnionTerritory')
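
To make the reshaping step concrete, here is a minimal sketch on a made-up long-format frame (the numbers are illustrative only). It shows how `pivot_table` turns one row per date and state into one row per date with one column per state.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, state).
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02"]),
    "State/UnionTerritory": ["Kerala", "Delhi", "Kerala"],
    "Confirmed": [3, 1, 4],
})

# Wide format: dates as the index, states as columns.
# Combinations with no row (Delhi on 2020-03-02) become NaN.
wide_df = pd.pivot_table(long_df, values="Confirmed", index="Date",
                         columns="State/UnionTerritory")
print(wide_df)
```

Note that `pivot_table` aggregates with the mean by default, so if two rows share the same date and state name (which could happen after merging several names into one), their values are averaged rather than summed. Passing `aggfunc='sum'` may be more appropriate in that case.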

In this dataset, the data runs from 30 Jan 2020 to 12 Aug 2020. However, through the end of February there are only 3 cases, all in Kerala, so we take the data from the beginning of March. A null value means that no cases were recorded for that state on that date, so we fill the nulls with 0.

In [None]:
India.fillna(0, inplace = True)
India = India[India.index >= '2020-03-01']
India
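
The same two steps can be tried on a small made-up wide-format frame. The values below are illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data with a missing value before a state's first case.
wide_df = pd.DataFrame(
    {"Delhi": [np.nan, 1.0, 2.0], "Kerala": [3.0, 3.0, 4.0]},
    index=pd.to_datetime(["2020-02-28", "2020-03-01", "2020-03-02"]),
)

# Treat missing values as zero confirmed cases.
filled = wide_df.fillna(0)

# Keep only rows dated 1 March 2020 or later; the string is compared as a date.
from_march = filled[filled.index >= "2020-03-01"]
print(from_march)
```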

# Plotting the Bar Chart Race of Indian States

In [None]:
# Pass filename='covid-19.mp4' to save the race as a video file that you can download.

bcr.bar_chart_race(df = India, title = 'COVID-19 CASES ACROSS INDIA', figsize=(6,4), steps_per_period=10,
                  period_summary_func = lambda v, r: {'x': .98, 'y': .18, 
                                      's': f'Total Cases: {v.sum():,.0f}',
                                      'ha': 'right', 'size': 12, 'family': 'Courier New'})
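
The `period_summary_func` argument takes a function of the current period's values and ranks and returns a dictionary describing the text to draw on the chart. The sketch below pulls the lambda out into a named function (the name `total_cases_summary` and the sample numbers are ours, for illustration) so it can be checked without rendering the animation.

```python
import pandas as pd

def total_cases_summary(values, ranks):
    """Return text settings for the per-period 'Total Cases' label."""
    return {
        "x": 0.98, "y": 0.18,                      # position in axes coordinates
        "s": f"Total Cases: {values.sum():,.0f}",  # thousands separator, no decimals
        "ha": "right", "size": 12,
    }

# Hypothetical values for a single period.
values = pd.Series({"Kerala": 1000.0, "Delhi": 2500.0})
ranks = values.rank(ascending=False)

label = total_cases_summary(values, ranks)
print(label["s"])  # Total Cases: 3,500
```

A named function like this can be passed as `period_summary_func=total_cases_summary` in place of the lambda.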