# 🦠 COVID-19 Global Data Tracker

This project analyzes global COVID-19 data from [Our World in Data](https://ourworldindata.org/coronavirus), focusing on confirmed cases, deaths, and vaccination trends over time across countries.

---


## 📥 Step 1 & 2: Data Collection and Loading

We load the OWID dataset from a local CSV file and inspect its shape, column names, and missing values to understand its contents.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Configure plot styles
sns.set_theme(style='whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Load the dataset
data_path = "data/owid-covid-data.csv"
df = pd.read_csv(data_path)

# Preview the data
df.head()


In [None]:
# Check shape and column names
print("Shape:", df.shape)
df.columns


In [None]:
# Check for missing values
df.isnull().sum().sort_values(ascending=False).head(20)
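
Raw missing-value counts are easier to judge next to the share of rows they represent. The sketch below is one way to report both; `missing_report` is a hypothetical helper (not part of the original notebook), and the toy frame stands in for the OWID data with made-up values.

```python
import pandas as pd


def missing_report(frame: pd.DataFrame, top: int = 20) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(frame) * 100).round(1),
    })
    return report.sort_values('missing', ascending=False).head(top)


# Toy frame standing in for the OWID data (values are invented for illustration)
toy = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'total_cases': [1.0, None, 3.0, 4.0],
    'total_vaccinations': [None, None, None, 5.0],
})

report = missing_report(toy)
print(report)
```

On the real data, `missing_report(df)` would be called in place of the `isnull().sum()` line above.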


## 🧹 Step 3: Data Cleaning

We restrict the data to five countries, parse the date column, drop rows that lack case or death totals, and fill the remaining gaps to prepare for analysis.


In [None]:
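# Focus on a few countries
countries = ['Kenya', 'United States', 'India', 'Brazil', 'South Africa']
# .copy() avoids SettingWithCopyWarning on the assignments below
df = df[df['location'].isin(countries)].copy()

# Convert date column and sort so that filling follows time order
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['location', 'date'])

# Drop rows with missing critical values
df = df.dropna(subset=['total_cases', 'total_deaths'])

# Forward-fill remaining gaps within each country, so values never carry
# over from one country to the next (fillna(method='ffill') is deprecated)
value_cols = [col for col in df.columns if col != 'location']
df[value_cols] = df.groupby('location')[value_cols].ffill()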
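
Forward-filling a multi-country frame as a whole can carry one country's last value into the first rows of the next country. The toy example below (invented values, two placeholder locations) contrasts a plain forward fill with one grouped by `location`; it is a sketch of the idea, not a required change to the pipeline.

```python
import pandas as pd

# Two locations stacked in one frame; B has no vaccination figure on its first day
toy = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02',
                            '2021-01-01', '2021-01-02']),
    'total_vaccinations': [10.0, None, None, 7.0],
})

# Plain forward fill: A's value leaks into B's first row
naive = toy['total_vaccinations'].ffill()

# Grouped forward fill: each location is filled only from its own history
per_country = toy.groupby('location')['total_vaccinations'].ffill()

print(pd.DataFrame({'naive': naive, 'per_country': per_country}))
```

With the grouped version, B's first row stays missing because B has no earlier observation to fill from.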


## 📊 Step 4: Exploratory Data Analysis (EDA)

Let's explore COVID-19 cases and deaths over time.


In [None]:
# Plot total cases over time
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['total_cases'], label=country)

plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.show()


In [None]:
# Plot total deaths over time
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['total_deaths'], label=country)

plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.legend()
plt.show()


In [None]:
# Daily new cases comparison
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['new_cases'], label=country)

plt.title('Daily New COVID-19 Cases')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.show()
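
Daily counts tend to be noisy, partly because of reporting gaps, so a rolling average is often easier to compare across countries. The sketch below computes a per-country rolling mean; `add_rolling_mean` and the 7-day default are illustrative assumptions rather than part of the original analysis, and the toy data is invented.

```python
import pandas as pd


def add_rolling_mean(frame: pd.DataFrame, column: str = 'new_cases',
                     window: int = 7) -> pd.DataFrame:
    """Add a per-location rolling mean of `column` as a new column."""
    out = frame.sort_values(['location', 'date']).copy()
    out[f'{column}_{window}d_avg'] = (
        out.groupby('location')[column]
           # min_periods=1 keeps the first days instead of returning NaN
           .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return out


# Toy data: two locations, four days each (values are made up)
toy = pd.DataFrame({
    'location': ['A'] * 4 + ['B'] * 4,
    'date': list(pd.date_range('2021-01-01', periods=4)) * 2,
    'new_cases': [0, 10, 20, 30, 100, 100, 100, 100],
})

smoothed = add_rolling_mean(toy, window=2)
print(smoothed)
```

On the real frame, plotting `new_cases_7d_avg` from `add_rolling_mean(df)` in place of `new_cases` should give smoother curves.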


In [None]:
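# Calculate death rate (total deaths per confirmed case)
df['death_rate'] = df['total_deaths'] / df['total_cases']
# Latest value per country; rows are assumed to be in date order within each location
df[['location', 'date', 'death_rate']].groupby('location').last()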
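
The ratio above is undefined when `total_cases` is zero, and "latest value" depends on row order. The following sketch, on invented toy values, guards against both by masking zero denominators and sorting by date before taking each country's last row.

```python
import numpy as np
import pandas as pd

# Toy data: location A starts with zero cases (values are made up)
toy = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02',
                            '2021-01-01', '2021-01-02']),
    'total_cases': [0, 200, 50, 1000],
    'total_deaths': [0, 4, 1, 30],
})

# Treat zero cases as missing so the ratio becomes NaN rather than 0/0
toy['death_rate'] = toy['total_deaths'] / toy['total_cases'].replace(0, np.nan)

# Sort by date first so tail(1) really is the most recent row per location
latest = (
    toy.sort_values('date')
       .groupby('location')
       .tail(1)
       .set_index('location')['death_rate']
)
print(latest)
```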


## 💉 Step 5: Visualizing Vaccination Progress

We now explore vaccination trends by country.


In [None]:
# Plot total vaccinations over time
for country in countries:
    subset = df[df['location'] == country]
    plt.plot(subset['date'], subset['total_vaccinations'], label=country)

plt.title('Cumulative COVID-19 Vaccinations')
plt.xlabel('Date')
plt.ylabel('Total Vaccinations')
plt.legend()
plt.show()
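
Cumulative doses favor large countries, so a per-capita view can be a fairer comparison. The sketch below assumes the frame carries a `population` column alongside `total_vaccinations` (check `df.columns` before relying on it); `vaccinations_per_hundred` is a name introduced here for illustration, and the toy values are invented.

```python
import pandas as pd

# Toy data: B has more doses in total but fewer per person (values are made up)
toy = pd.DataFrame({
    'location': ['A', 'B'],
    'total_vaccinations': [500_000.0, 2_000_000.0],
    'population': [1_000_000.0, 10_000_000.0],
})

# Doses administered per 100 people
toy['vaccinations_per_hundred'] = (
    toy['total_vaccinations'] / toy['population'] * 100
)

# Rank locations by per-capita doses rather than by raw totals
ranking = toy.sort_values('vaccinations_per_hundred', ascending=False)
print(ranking[['location', 'vaccinations_per_hundred']])
```

Plotting this column over time for each country would follow the same loop as the cumulative plot above.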


## 🧠 Step 6: Key Insights

Here are some key observations:

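1. The United States had the highest total number of confirmed cases among the five selected countries.
2. India and the US showed large spikes in daily new cases, especially during major waves.
3. Measured by cumulative doses, vaccination rollouts were fastest in the US and Brazil among the selected countries.
4. Death rates (total deaths divided by total cases) tended to be highest in countries with slower early vaccination rollouts; this is an association in the plotted data, not evidence of cause.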
5. South Africa's trends showed multiple distinct waves of infections.

---
