# COVID-19 Global Data Tracker

This Jupyter Notebook analyzes global COVID-19 trends, including cases, deaths, and vaccinations, using the **Our World in Data COVID-19 dataset**. The project involves data cleaning, exploratory data analysis (EDA), visualizations, and a narrative report summarizing key insights.

## Objectives
- Import and clean COVID-19 global data.
- Analyze time trends for cases, deaths, and vaccinations.
- Compare metrics across countries.
- Visualize trends using charts and maps.
- Summarize findings in a clear, reproducible report.

## Prerequisites
- **Dataset**: Download `owid-covid-data.csv` from [Our World in Data](https://ourworldindata.org/covid-deaths) or directly from [this GitHub link](https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv).
- Save the CSV file in the same directory as this notebook.
- **Libraries**: Ensure `pandas`, `matplotlib`, `seaborn`, and `plotly` are installed. Install them using:
  ```bash
  pip install pandas matplotlib seaborn plotly
  ```
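
If the CSV has not been downloaded yet, the notebook can fall back to the GitHub URL listed above, since `pd.read_csv` accepts URLs as well as file paths. The sketch below is one possible way to pick the source; `resolve_source` is a hypothetical helper introduced here for illustration, not part of pandas.

```python
from pathlib import Path

# URL taken from the Prerequisites list above
OWID_URL = (
    "https://raw.githubusercontent.com/owid/covid-19-data/"
    "master/public/data/owid-covid-data.csv"
)


def resolve_source(local_path="owid-covid-data.csv", url=OWID_URL):
    """Return the local CSV path if the file exists, otherwise the remote URL.

    Hypothetical helper for this notebook; not a pandas function.
    """
    path = Path(local_path)
    return str(path) if path.is_file() else url


source = resolve_source()
print(f"Reading data from: {source}")
# df = pd.read_csv(source)  # pandas accepts both local paths and URLs
```

Reading from the URL downloads the full file on every run, so keeping a local copy is usually the faster option.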

Let's begin!

## Step 1: Data Loading & Exploration

Load the dataset using pandas and explore its structure.

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Load the dataset
df = pd.read_csv('owid-covid-data.csv')

# Preview the first few rows
print("Dataset Preview:")
print(df.head())

# Display column names
print("\nColumns:")
print(df.columns.tolist())

# Check for missing values
print("\nMissing Values:")
print(df.isnull().sum())

# Display basic info
print("\nDataset Info:")
df.info()  # info() prints its report directly and returns None
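
Raw missing-value counts are hard to compare across columns, so a percentage view can be a useful complement. The following is a minimal sketch on a tiny made-up sample that reuses OWID column names; `missing_share` is a hypothetical helper, and on the real data it would be called as `missing_share(df)`.

```python
import io

import pandas as pd

# Tiny made-up sample with the same column names as the OWID file
sample_csv = io.StringIO(
    "location,date,total_cases,total_vaccinations\n"
    "Kenya,2021-01-01,100,\n"
    "Kenya,2021-01-02,,\n"
    "India,2021-01-01,200,10\n"
    "India,2021-01-02,250,\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["date"])


def missing_share(frame):
    """Percentage of missing values per column, highest first."""
    return (frame.isnull().mean() * 100).sort_values(ascending=False)


shares = missing_share(sample)
print(shares.round(1))
```

Columns with a high share of missing values are candidates for interpolation or exclusion rather than a blanket fill with zero.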

## Step 2: Data Cleaning

Clean the dataset by filtering relevant countries, handling missing values, and converting data types.

In [None]:
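# Select countries of interest
countries = ['Kenya', 'United States', 'India']
# Take an explicit copy so later column assignments modify this frame
# instead of a slice of df (avoids SettingWithCopyWarning)
df_filtered = df[df['location'].isin(countries)].copy()

# Convert date column to datetime and keep each country's rows in date order
df_filtered['date'] = pd.to_datetime(df_filtered['date'])
df_filtered = df_filtered.sort_values(['location', 'date'])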

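# Handle missing values for critical columns
# Cumulative totals: carry the last known value forward within each country,
# then use 0 for dates before the first report (a plain fillna(0) would
# create artificial drops to zero in the middle of a series)
cumulative_cols = ['total_cases', 'total_deaths']
df_filtered[cumulative_cols] = df_filtered.groupby('location')[cumulative_cols].ffill().fillna(0)
df_filtered['new_cases'] = df_filtered['new_cases'].fillna(0)
# Interpolate vaccinations per country so values never carry across countries
df_filtered['total_vaccinations'] = (
    df_filtered.groupby('location')['total_vaccinations']
    .transform(lambda s: s.interpolate(method='linear', limit_direction='forward'))
    .fillna(0)
)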

# Verify cleaning
print("Cleaned Dataset Preview:")
print(df_filtered[['date', 'location', 'total_cases', 'total_deaths', 'total_vaccinations']].head())
print("\nMissing Values After Cleaning:")
print(df_filtered.isnull().sum())
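
One thing to watch when interpolating a column that holds several countries stacked together: a whole-column `interpolate` can fill a gap at the start of one country using the last value of the previous country. The sketch below uses made-up numbers for two placeholder countries, `A` and `B`, to contrast whole-column interpolation with a per-country `groupby(...).transform(...)`.

```python
import numpy as np
import pandas as pd

# Made-up values: country A ends at 100 doses, country B starts with a gap
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "total_vaccinations": [0.0, np.nan, 100.0, np.nan, 10.0, 20.0],
})

# Whole-column interpolation: B's leading gap is filled using A's last value
naive = toy["total_vaccinations"].interpolate(
    method="linear", limit_direction="forward"
)

# Per-country interpolation: each group is filled independently
grouped = toy.groupby("location")["total_vaccinations"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="forward")
)

print(pd.DataFrame({
    "location": toy["location"],
    "naive": naive,
    "grouped": grouped,
}))
```

In the per-country version, B's leading gap stays missing and can then be filled deliberately, for example with 0 for dates before the rollout began.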

## Step 3: Exploratory Data Analysis (EDA)

Analyze trends in cases and deaths, and calculate death rates.

In [None]:
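# Calculate death rate (total_deaths / total_cases)
# Zero case counts are replaced with NaN first so the division never yields inf
df_filtered['death_rate'] = df_filtered['total_deaths'] / df_filtered['total_cases'].replace(0, float('nan'))
df_filtered['death_rate'] = df_filtered['death_rate'].fillna(0)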

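# Summary statistics for selected countries
# Cumulative totals peak at the latest date; the death rate is read from each
# country's last row (rows assumed to be in date order) because early rates,
# based on very few cases, can be misleadingly high
print("Summary Statistics:")
print(df_filtered.groupby('location').agg(
    total_cases=('total_cases', 'max'),
    total_deaths=('total_deaths', 'max'),
    death_rate=('death_rate', 'last'),
))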

# Plot total cases over time
plt.figure(figsize=(12, 6))
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    plt.plot(country_data['date'], country_data['total_cases'], label=country)
plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.grid(True)
plt.show()

# Plot total deaths over time
plt.figure(figsize=(12, 6))
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    plt.plot(country_data['date'], country_data['total_deaths'], label=country)
plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.legend()
plt.grid(True)
plt.show()
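
Cumulative curves hide short-term swings, so a smoothed view of daily `new_cases` can help when looking for spikes. The following is a sketch on made-up data for two placeholder countries; `add_rolling_mean` is a hypothetical helper, and the window of 3 rows is chosen only to keep the example small (7 is the default and a common choice for daily data).

```python
import pandas as pd

# Made-up daily counts for two placeholder countries
toy = pd.DataFrame({
    "location": ["A"] * 4 + ["B"] * 4,
    "date": list(pd.date_range("2021-01-01", periods=4)) * 2,
    "new_cases": [10, 20, 30, 40, 1, 1, 1, 5],
})


def add_rolling_mean(frame, column="new_cases", window=7):
    """Add a per-country rolling mean column; `window` is counted in rows."""
    out = frame.sort_values(["location", "date"]).copy()
    out[f"{column}_smoothed"] = out.groupby("location")[column].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out


smoothed = add_rolling_mean(toy, window=3)
print(smoothed)
```

Grouping by `location` before rolling keeps one country's counts from leaking into the first rows of the next.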

## Step 4: Visualizing Vaccination Progress

Analyze and visualize vaccination rollouts.

In [None]:
# Plot total vaccinations over time
plt.figure(figsize=(12, 6))
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    plt.plot(country_data['date'], country_data['total_vaccinations'], label=country)
plt.title('Total COVID-19 Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Total Vaccinations')
plt.legend()
plt.grid(True)
plt.show()

# Calculate vaccination rate (total_vaccinations / population)
# total_vaccinations counts doses administered, so the result is doses per
# 100 people and can exceed 100 once people receive more than one dose
df_filtered['vaccination_rate'] = df_filtered['total_vaccinations'] / df_filtered['population'] * 100
df_filtered['vaccination_rate'] = df_filtered['vaccination_rate'].fillna(0)

# Bar chart for vaccination rates (latest available row per country)
latest_data = df_filtered.sort_values('date').groupby('location').tail(1)
plt.figure(figsize=(8, 5))
sns.barplot(x='location', y='vaccination_rate', data=latest_data)
plt.title('Vaccine Doses Administered per 100 People')
plt.xlabel('Country')
plt.ylabel('Doses per 100 People')
plt.show()
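
Because `total_vaccinations` counts doses, dividing it by population measures doses per 100 people, not the share of people vaccinated. The OWID file also carries a `people_vaccinated` column (people with at least one dose), which is closer to a population share. The sketch below uses made-up numbers for two placeholder countries to show both metrics side by side, taking each country's latest available row.

```python
import pandas as pd

# Made-up numbers; column names follow the OWID file
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-06-01", "2021-06-02", "2021-06-01", "2021-06-03"]
    ),
    "population": [1000, 1000, 500, 500],
    "total_vaccinations": [900.0, 1500.0, 100.0, 200.0],  # doses given
    "people_vaccinated": [600.0, 800.0, 80.0, 150.0],     # at least one dose
})

# Latest available row per country (the latest date can differ by country)
latest = (
    toy.sort_values("date").groupby("location").tail(1).set_index("location")
)

latest["doses_per_100"] = latest["total_vaccinations"] / latest["population"] * 100
latest["pct_with_one_dose"] = latest["people_vaccinated"] / latest["population"] * 100
print(latest[["date", "doses_per_100", "pct_with_one_dose"]])
```

Here country A shows 150 doses per 100 people while only 80% of its population has received a dose, which is why the two metrics should not share a label.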

## Step 5: Choropleth Map

Visualize total cases by country on a world map using Plotly.

In [None]:
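# Prepare data for choropleth (latest available row per country)
# Rows whose iso_code starts with "OWID_" are aggregates (e.g. World, continents);
# dropping them keeps them from stretching the colour scale
map_df = df.dropna(subset=['iso_code', 'total_cases'])
map_df = map_df[~map_df['iso_code'].str.startswith('OWID_')]
latest_df = map_df.sort_values('date').groupby('iso_code').tail(1)[['iso_code', 'location', 'total_cases']]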

# Create choropleth map
fig = px.choropleth(
    latest_df,
    locations='iso_code',
    color='total_cases',
    hover_name='location',
    color_continuous_scale=px.colors.sequential.Plasma,
    title='Global COVID-19 Total Cases (Latest Date)'
)
fig.show()
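
Total case counts span several orders of magnitude, so a linear colour scale tends to leave most countries in a single shade. One option is to colour by the base-10 logarithm instead. The sketch below prepares such a frame from made-up rows; `prepare_map_frame` is a hypothetical helper, and the `OWID_` prefix filter assumes aggregate rows are labelled as they are in the OWID file.

```python
import numpy as np
import pandas as pd

# Made-up rows; "OWID_WRL" mimics an aggregate row in the OWID file
toy = pd.DataFrame({
    "iso_code": ["KEN", "KEN", "USA", "OWID_WRL"],
    "location": ["Kenya", "Kenya", "United States", "World"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "total_cases": [300_000.0, 340_000.0, 100_000_000.0, 600_000_000.0],
})


def prepare_map_frame(frame):
    """Latest non-missing total per country, plus a log10 column for colouring."""
    out = frame.dropna(subset=["iso_code", "total_cases"])
    out = out[~out["iso_code"].str.startswith("OWID_")]
    out = out.sort_values("date").groupby("iso_code").tail(1).copy()
    # clip(lower=1) keeps log10 defined for countries reporting zero cases
    out["log_total_cases"] = np.log10(out["total_cases"].clip(lower=1))
    return out


map_frame = prepare_map_frame(toy)
print(map_frame[["iso_code", "total_cases", "log_total_cases"]])
# map_frame could then be passed to px.choropleth with color="log_total_cases"
```

If the log column is used for colouring, the raw `total_cases` value can still be shown in the hover text so readers see actual counts.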

## Step 6: Insights & Narrative

### Key Insights
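1. **Case Trends**: The United States reported the highest total cases among the selected countries, followed by India, with Kenya far lower. Population size alone does not explain this ordering (India's population is larger than that of the United States), so differences in testing and reporting likely also contribute.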
2. **Death Rates**: The death rate (total deaths / total cases) varied significantly, with [Country X] showing the highest rate, indicating potential differences in healthcare systems or reporting.
3. **Vaccination Rollout**: [Country Y] achieved the fastest vaccination rollout, with [Z%] of its population vaccinated by the latest date, compared to slower progress in [Country Z].
4. **Anomalies**: Spikes in new cases were observed in [Country X] around [specific date], possibly due to policy changes or new variants.
5. **Global Perspective**: The choropleth map shows high total case counts in North America and parts of Asia and lower counts in much of Africa, which may partly reflect testing disparities.
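
To locate the kind of spike described in insight 4, one approach is to look up the date of each country's largest daily `new_cases` value. The following is a minimal sketch on made-up data for two placeholder countries; on the notebook's data the same pattern would apply to `df_filtered`.

```python
import pandas as pd

# Made-up daily counts for two placeholder countries
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "new_cases": [5, 50, 7, 3, 4, 40],
})

# idxmax returns the row label of each country's largest daily count
peak_rows = toy.loc[toy.groupby("location")["new_cases"].idxmax()]
peaks = peak_rows.set_index("location")[["date", "new_cases"]]
print(peaks)
```

A single-day maximum can reflect a reporting backlog and not a true surge, so it is worth checking the surrounding days before drawing conclusions.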

### Conclusion
This analysis reveals significant variations in COVID-19 impacts across countries, driven by factors like population, healthcare infrastructure, and vaccination efforts. The visualizations and metrics provide a clear picture of global trends, suitable for policymakers or public health researchers.

### Future Work
- Incorporate hospitalization or ICU data for deeper insights.
- Build an interactive dashboard using Streamlit for user-driven exploration.
- Analyze the impact of specific policies (e.g., lockdowns) on case trends.

## Submission Instructions

1. Save this notebook as `covid19_global_tracker.ipynb`.
2. Ensure `owid-covid-data.csv` is in the same directory or update the file path in the code.
3. Upload the notebook to your GitHub repository via the GitHub website:
   - Go to your repository.
   - Click **Add file** > **Create new file** or **Upload files**.
   - If creating a new file, paste the entire notebook content (JSON format) and name it `covid19_global_tracker.ipynb`.
   - If uploading, upload the `.ipynb` file and the CSV dataset.
4. Submit the repository link or notebook file to the assignment platform as required.