In [1]:
'''
Ignore Warnings to prevent cluttering the output
'''
import warnings 
warnings.filterwarnings('ignore')

import pandas as pd 
import matplotlib.pyplot as plt  
import seaborn as sns  
import statsmodels.api as sm

'''
Importing all functions from scipy.stats for statistical analysis
'''
from scipy.stats import *  

'''
Importing timedelta and datetime for handling time-related operations
'''
from datetime import timedelta
import datetime  

'''
This is to enable inline plotting in matplotlib
'''
%matplotlib inline

'''
This line is to set the seaborn plot style to dark
which is a built-in style in seaborn
'''
sns.set_style('dark')  


This cell sets up the environment for data analysis and visualization:

1. **Ignoring Warnings**: The `warnings.filterwarnings('ignore')` line ensures that warning messages are suppressed, which helps in keeping the output clean and focused.

2. **Importing Libraries**:
   - `pandas`: Used for data manipulation and analysis.
   - `matplotlib.pyplot`: Enables plotting functionalities.
   - `seaborn`: Provides a high-level interface for drawing statistical graphics on top of `matplotlib`.
   - `statsmodels.api`: Used for statistical analysis and modeling.

3. **Importing Functions**: 
   - The line `from scipy.stats import *` imports every public name from the `scipy.stats` module into the notebook's namespace, so functions such as `norm` or `ttest_ind` can be called without a prefix.

4. **Handling Time-related Operations**:
   - `timedelta` from the `datetime` module: Represents a duration and is used for arithmetic on dates and times.
   - `datetime`: Provides classes for manipulating dates and times.

5. **Enabling Inline Plotting**: The `%matplotlib inline` magic command ensures that plots generated with `matplotlib` are displayed inline within Jupyter notebooks or IPython environments.

6. **Setting Plot Style**:
   - `sns.set_style('dark')`: Applies seaborn's built-in `dark` style (a shaded axes background without grid lines) to all subsequent plots.
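
Suppressing every warning for the whole notebook can also hide messages that matter, such as deprecation notices. As a minimal sketch of a narrower alternative, the block below silences a single warning category inside a single `with` block using only the standard library. The helper `noisy_step` is hypothetical and stands in for any call that emits a warning.

```python
import warnings


def noisy_step():
    # Hypothetical helper standing in for any call that emits a warning.
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


# Suppress only DeprecationWarning, and only inside this block.
# The previous warning filters are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    result = noisy_step()

print(result)
```

Outside the `with` block, warnings behave as before, so unrelated problems still surface in the output.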

In [2]:
'''
Reading the dataset from the provided URL using pandas
This dataset contains daily Covid-19 data for different states in India
'''
train = pd.read_csv('https://api.covid19india.org/csv/latest/state_wise_daily.csv')

'''
Converting the 'Date' column to datetime format for better handling of dates
'''
train['Date'] = pd.to_datetime(train['Date'], format="%d-%b-%y")

'''
Displaying the last few rows of the dataset to verify the changes
'''
train.tail()


Unnamed: 0,Date,Date_YMD,Status,TT,AN,AP,AR,AS,BR,CH,...,PB,RJ,SK,TN,TG,TR,UP,UT,WB,UN
1786,2021-10-30,2021-10-30,Recovered,14672,1,535,10,371,2,2,...,27,1,3,1172,191,8,9,6,880,0
1787,2021-10-30,2021-10-30,Deceased,445,0,2,0,4,0,0,...,1,0,0,14,1,0,0,0,13,0
1788,2021-10-31,2021-10-31,Confirmed,12907,0,385,1,212,8,5,...,26,2,21,1009,121,12,6,5,914,0
1789,2021-10-31,2021-10-31,Recovered,13152,0,675,9,236,9,3,...,25,2,8,1183,183,2,6,9,913,0
1790,2021-10-31,2021-10-31,Deceased,251,0,4,0,1,0,0,...,1,0,1,19,1,0,0,0,15,0


1. **We read the dataset** from the provided URL, which contains daily Covid-19 data for different states in India, and store it in the variable `train`.

2. We **convert** the `Date` column to *datetime* format using `pd.to_datetime()` with the format string `%d-%b-%y`, so that dates can be sorted, compared, and used in date arithmetic.

3. We display the **last few rows** of the dataset using `train.tail()` to verify the changes made.
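
To make the format string concrete, here is a small standard-library sketch of what `%d-%b-%y` matches. It assumes the raw `Date` values look like `31-Oct-21`, which is what the format string implies; the sample string itself is illustrative and not taken from the file.

```python
from datetime import datetime

# Sample string in the day-month-year style implied by the format string.
raw = "31-Oct-21"

# %d = day of month, %b = abbreviated month name, %y = two-digit year.
parsed = datetime.strptime(raw, "%d-%b-%y")

print(parsed.date())  # 2021-10-31
```

`pd.to_datetime()` accepts the same directives and applies them to the whole column at once.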


In [6]:
'''
List of state-wise columns to be dropped as we are predicting total cases ('TT')
'''
cols = ['AN','AP',	'AR',	'AS',	'BR',	'CH',	'CT',	'DD',	'DL',	'DN',	'GA',	'GJ',	'HP',	'HR',	'JH', 'JK',	'KA',	'KL',	'LA',	'LD',	'MH',	'ML',	'MN',	'MP',	'MZ',	'NL',	'OR',	'PB',	'PY',	'RJ',	'SK',	'TG',	'TN',	'TR',	'UP',	'UT',	'WB']

'''
Dropping state-wise columns from the dataset
'''
train.drop(cols, axis=1, inplace=True)

'''
Setting the index of the dataframe to 'Status' column
'''
train = train.set_index('Status')

'''
Dropping 'Recovered' and 'Deceased' rows as we are focusing on total cases
'''
train.drop(['Recovered', 'Deceased'], inplace=True)

'''
Resetting the index after dropping unnecessary rows
'''
train = train.reset_index()

'''
Dropping the 'Status' column as it is no longer needed for analysis
'''
train.drop(["Status"], axis=1, inplace=True)
train.tail()

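'''
Copying data from the 'train' DataFrame to the 'train_df' DataFrame to preserve the original dataset
.copy() creates an independent DataFrame; plain assignment would only create a second name for 'train'
'''
train_df = train.copy()
train_df.head()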

KeyError: "['AN', 'AP', 'AR', 'AS', 'BR', 'CH', 'CT', 'DN', 'DD', 'DL', 'GA', 'GJ', 'HR', 'HP', 'JK', 'JH', 'KA', 'KL', 'LA', 'LD', 'MP', 'MH', 'MN', 'ML', 'MZ', 'NL', 'OR', 'PY', 'PB', 'RJ', 'SK', 'TN', 'TG', 'TR', 'UP', 'UT', 'WB'] not found in axis"

1. **We define a list** `cols` containing the names of state-wise columns to be dropped.
2. We use the `drop()` method with `axis=1` to **drop the specified columns** from the `train` dataframe in place.
3. We **set the index of the dataframe** to the `Status` column, which holds the categories 'Confirmed', 'Recovered', and 'Deceased'.
4. We **drop the rows** labelled 'Recovered' and 'Deceased', leaving only the 'Confirmed' rows, whose `TT` column holds the daily total across all states.
5. We **reset the index to default** after dropping rows to avoid any index inconsistencies.
6. We **drop the** `Status` column as it is no longer needed for further analysis.
7. Finally, we **copy** `train` into `train_df` so that later steps work on `train_df` and leave `train` unchanged.
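
The `KeyError` shown above is consistent with running the cell a second time: after the first run, the state columns have already been removed in place, so `drop()` can no longer find them. A minimal sketch of a re-runnable variant follows. It selects the rows and columns to keep rather than dropping the rest in place. The small frame and the helper name `confirmed_totals` are illustrative assumptions, not part of the original notebook, and the values are made up.

```python
import pandas as pd

# Small synthetic frame mimicking the dataset's layout (values are made up).
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2021-10-30", "2021-10-30", "2021-10-31"]),
    "Status": ["Confirmed", "Recovered", "Confirmed"],
    "TT": [100, 80, 120],
    "AN": [1, 0, 2],
})


def confirmed_totals(df):
    # A boolean mask keeps only the 'Confirmed' rows.
    confirmed = df[df["Status"] == "Confirmed"]
    # Select the needed columns instead of dropping the others in place,
    # then copy so the result is independent of the input frame.
    return confirmed[["Date", "TT"]].reset_index(drop=True).copy()


train_df = confirmed_totals(raw)
train_df = confirmed_totals(raw)  # a second run raises no KeyError
print(train_df)
```

Because the input frame is never modified, the cell can be executed any number of times with the same result. If the in-place approach is preferred, `DataFrame.drop()` also accepts `errors='ignore'`, which skips labels that are already missing.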