### **COVID-19 Data Analysis**
#### A Country-wise Data Analysis of the COVID-19 Pandemic

##### **Step 1:** Importing the necessary libraries

In [4]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

##### **Step 2:** Loading the Data

In [24]:
df_raw = pd.read_csv(filepath_or_buffer="./data/country_wise_latest.csv")

##### **Step 3:** Exploring the Data

##### **3.1:** Overview of Data

In [38]:
# Overview of data
print("Dataset Shape:", df_raw.shape)
print("\nColumn Names:\n", df_raw.columns.to_list())
print("\nData Types:\n", df_raw.dtypes)

Dataset Shape: (187, 15)

Column Names:
 ['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases', 'New deaths', 'New recovered', 'Deaths / 100 Cases', 'Recovered / 100 Cases', 'Deaths / 100 Recovered', 'Confirmed last week', '1 week change', '1 week % increase', 'WHO Region']

Data Types:
 Country/Region             object
Confirmed                   int64
Deaths                      int64
Recovered                   int64
Active                      int64
New cases                   int64
New deaths                  int64
New recovered               int64
Deaths / 100 Cases        float64
Recovered / 100 Cases     float64
Deaths / 100 Recovered    float64
Confirmed last week         int64
1 week change               int64
1 week % increase         float64
WHO Region                 object
dtype: object
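
The dtype listing shows two text columns and thirteen numeric ones. A minimal sketch of how that split could be done programmatically with `select_dtypes`, so later steps can loop over numeric columns only. The `sample` frame is a hypothetical stand-in for `df_raw`: it holds three rows and four columns copied from the start of the dataset, and the names `numeric_cols` and `text_cols` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for df_raw: a few values copied from the first rows
# of the dataset, restricted to four columns.
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
    "Deaths": [1269, 144, 1163],
    "Deaths / 100 Cases": [3.50, 2.95, 4.16],
})

# Integer and float columns are both matched by "number".
numeric_cols = sample.select_dtypes(include="number").columns.to_list()

# Everything that is not numeric (here: the country name).
text_cols = sample.select_dtypes(exclude="number").columns.to_list()

print("Numeric columns:", numeric_cols)
print("Text columns:", text_cols)
```

Applied to `df_raw`, the same two calls should separate `Country/Region` and `WHO Region` from the remaining columns, given the dtypes printed above.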


##### **3.2:** Quick Peek of Data

In [46]:
# First and last few rows
print("First 5 rows:")
print(df_raw.head())

print("\nLast 5 rows:")
print(df_raw.tail())

print("\nRandom 5 rows:")
print(df_raw.sample(5, random_state=42))  # fixed seed so the same rows appear on every run

First 5 rows:
  Country/Region  Confirmed  Deaths  Recovered  Active  New cases  New deaths  \
0    Afghanistan      36263    1269      25198    9796        106          10   
1        Albania       4880     144       2745    1991        117           6   
2        Algeria      27973    1163      18837    7973        616           8   
3        Andorra        907      52        803      52         10           0   
4         Angola        950      41        242     667         18           1   

   New recovered  Deaths / 100 Cases  Recovered / 100 Cases  \
0             18                3.50                  69.49   
1             63                2.95                  56.25   
2            749                4.16                  67.34   
3              0                5.73                  88.53   
4              0                4.32                  25.47   

   Deaths / 100 Recovered  Confirmed last week  1 week change  \
0                    5.04                35526         
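
The printed frame wraps into several blocks because it is wider than the default display width. One possible way to keep each row on a single line is to widen the display temporarily with `pd.option_context`. This is a sketch on a hypothetical `sample` frame built from the five rows and first ten columns shown above; the width of 250 characters is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for df_raw.head(): values copied from the output above.
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
    "Active": [9796, 1991, 7973, 52, 667],
    "New cases": [106, 117, 616, 10, 18],
    "New deaths": [10, 6, 8, 0, 1],
    "New recovered": [18, 63, 749, 0, 0],
    "Deaths / 100 Cases": [3.50, 2.95, 4.16, 5.73, 4.32],
    "Recovered / 100 Cases": [69.49, 56.25, 67.34, 88.53, 25.47],
})

# The options apply only inside the with-block; global settings are untouched.
with pd.option_context("display.max_columns", None, "display.width", 250):
    wide_text = repr(sample)

print(wide_text)
```

In a notebook, ending the cell with `df_raw.head()` instead of `print(df_raw.head())` renders an HTML table, which avoids the wrapping as well.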

##### **3.3:** Quality Check of Data

In [48]:
# Checking for missing values

print("Missing values per column:")
missing = df_raw.isna().sum()
print(missing[missing > 0])

print("\nMissing value percentages:")
missing_percentage = (df_raw.isna().sum() / len(df_raw)) * 100
print(missing_percentage[missing_percentage > 0])

Missing values per column:
Series([], dtype: int64)

Missing value percentages:
Series([], dtype: float64)
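
Missing values are only one kind of quality problem. A rough sketch of two further checks: duplicate country names and internal consistency of the derived columns. It assumes that `Active` equals `Confirmed - Deaths - Recovered` and that `Deaths / 100 Cases` is `Deaths / Confirmed * 100` rounded to two decimals. Both relations hold for the five rows shown earlier but are not documented in this notebook. The `sample` frame is a hypothetical stand-in for `df_raw`.

```python
import pandas as pd

# Hypothetical stand-in for df_raw: the five rows shown in the head() output.
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
    "Active": [9796, 1991, 7973, 52, 667],
    "Deaths / 100 Cases": [3.50, 2.95, 4.16, 5.73, 4.32],
})

# Each country should appear exactly once.
duplicate_countries = int(sample["Country/Region"].duplicated().sum())

# Assumed relation: Active = Confirmed - Deaths - Recovered.
expected_active = sample["Confirmed"] - sample["Deaths"] - sample["Recovered"]
active_mismatches = sample[sample["Active"] != expected_active]

# Assumed relation: the published rate is Deaths / Confirmed * 100, rounded to
# two decimals, so allow half a rounding step (0.005) plus a little slack.
recomputed_rate = sample["Deaths"] / sample["Confirmed"] * 100
rate_gap = (recomputed_rate - sample["Deaths / 100 Cases"]).abs()
rate_mismatches = sample[rate_gap > 0.006]

print("Duplicate countries:", duplicate_countries)
print("Rows with inconsistent Active:", len(active_mismatches))
print("Rows with inconsistent death rate:", len(rate_mismatches))
```

Running the same checks on `df_raw` would need a guard for rows where `Confirmed` is zero, since the division would then produce `NaN` or infinity.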
