# COVID-19 Global Data Tracker

This Jupyter Notebook analyzes global COVID-19 trends using the Our World in Data dataset. It includes data loading, cleaning, exploratory data analysis (EDA), visualizations, and key insights.

## Objectives
- Load and clean the COVID-19 dataset.
- Analyze trends in cases, deaths, and vaccinations.
- Compare metrics across countries (Kenya, USA, India).
- Visualize trends with charts and a choropleth map.
- Summarize findings in a clear narrative.

In [2]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import numpy as np

# Set seaborn style for better visuals
sns.set_style('whitegrid')

## 1. Data Loading & Exploration

Load the dataset and inspect its structure.
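
The cleaning step that follows expects a DataFrame named `df`. Below is a minimal loading sketch. The file name `owid-covid-data.csv` and the helper `load_covid_data` are assumptions for illustration, and the two sample rows are made up so the block runs without the real file.

```python
import io
import pandas as pd

def load_covid_data(source):
    """Read an OWID-style CSV (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)

# Made-up rows standing in for the real file, for illustration only.
sample_csv = io.StringIO(
    "iso_code,location,date,total_cases,new_cases\n"
    "KEN,Kenya,2021-01-01,100,5\n"
    "IND,India,2021-01-01,200,\n"
)
sample = load_covid_data(sample_csv)

# Inspect structure: shape, column names, and missing values per column.
print(sample.shape)
print(sample.columns.tolist())
print(sample.isnull().sum())

# With the real file (assumed name), the notebook's df would be loaded as:
# df = load_covid_data('owid-covid-data.csv')
```

Checking `isnull().sum()` at this stage shows which columns will need attention during cleaning.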

## 2. Data Cleaning

Filter for Kenya, USA, and India, convert dates, and handle missing values.

In [None]:
try:
    # Filter for specific countries
    countries = ['Kenya', 'United States', 'India']
    df = df[df['location'].isin(countries)].copy()  # copy avoids SettingWithCopyWarning on later assignments

    # Convert date to datetime
    df['date'] = pd.to_datetime(df['date'])

    # Select relevant columns
    columns = ['date', 'location', 'total_cases', 'new_cases', 'total_deaths', 'new_deaths', 
               'total_vaccinations', 'people_vaccinated', 'population']
    df = df[columns].copy()

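    # Handle missing values
    # Cumulative totals: carry the last reported value forward within each country,
    # then use 0 for dates before the first report
    cumulative_cols = ['total_cases', 'total_deaths', 'total_vaccinations', 'people_vaccinated']
    df = df.sort_values(['location', 'date'])
    df[cumulative_cols] = df.groupby('location')[cumulative_cols].ffill().fillna(0)

    # Daily counts: treat unreported days as 0
    daily_cols = ['new_cases', 'new_deaths']
    df[daily_cols] = df[daily_cols].fillna(0)

    # Fill missing population values from neighbouring rows of the same country
    df['population'] = df.groupby('location')['population'].transform(lambda s: s.ffill().bfill())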

    print('\nMissing Values After Cleaning:')
    print(df.isnull().sum())

    # Display cleaned data preview
    print('\nCleaned Data Preview:')
    display(df.head())

except Exception as e:
    print(f'Error during cleaning: {str(e)}')
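
Columns such as `total_cases` are cumulative, so the way gaps are filled affects every later chart. The sketch below compares zero-filling with a per-country forward-fill on made-up numbers; the `toy` frame is purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up cumulative counts with one unreported day per country.
toy = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'Kenya', 'India', 'India', 'India'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'total_cases': [10.0, np.nan, 14.0, np.nan, 50.0, 55.0],
})

# Zero-fill: the cumulative series drops to 0 on the missing day.
zero_filled = toy['total_cases'].fillna(0)

# Per-country forward-fill, then 0 only for dates before the first report.
forward_filled = toy.groupby('location')['total_cases'].ffill().fillna(0)

print(zero_filled.tolist())
print(forward_filled.tolist())
```

With zero-fill, Kenya's total appears to fall from 10 to 0 and back up to 14. With forward-fill it stays at 10 until the next report.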

## 3. Exploratory Data Analysis (EDA)

Analyze trends and compute key metrics.

In [None]:
try:
    # Calculate death rate
    df['death_rate'] = df['total_deaths'] / df['total_cases'].replace(0, np.nan) * 100

    # Calculate vaccination rate (% population vaccinated)
    df['vaccination_rate'] = df['people_vaccinated'] / df['population'] * 100

    # Basic statistics
    print('\nBasic Statistics:')
    display(df[['total_cases', 'total_deaths', 'total_vaccinations', 'death_rate', 'vaccination_rate']].describe())

    # Latest row per country (sort by date first so "latest" is well defined)
    latest_data = df.sort_values('date').groupby('location').tail(1).reset_index(drop=True)
    print('\nLatest Metrics by Country:')
    display(latest_data[['location', 'total_cases', 'total_deaths', 'death_rate', 'vaccination_rate']])

except Exception as e:
    print(f'Error during EDA: {str(e)}')
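
Raw totals are hard to compare across countries of very different size. A per-capita metric can be derived from the same columns, as in this sketch; the helper `add_per_million` and the two-row `toy` frame are hypothetical and not part of the notebook.

```python
import pandas as pd

def add_per_million(frame, column):
    """Return a copy of frame with '<column>_per_million' added."""
    out = frame.copy()
    out[f'{column}_per_million'] = out[column] * 1_000_000 / out['population']
    return out

# Made-up example: same case count, tenfold difference in population.
toy = pd.DataFrame({
    'location': ['A', 'B'],
    'total_cases': [500.0, 500.0],
    'population': [1_000_000, 10_000_000],
})
result = add_per_million(toy, 'total_cases')
print(result[['location', 'total_cases_per_million']])
```

On the cleaned data, the same call would be `add_per_million(df, 'total_cases')`, assuming `population` has no missing values after cleaning.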

## 4. Visualizations

Create line charts, bar charts, a heatmap, and a choropleth map.

In [None]:
# Visualization 1: Line Chart - Total Cases Over Time
try:
    plt.figure(figsize=(12, 6))
    for country in countries:
        country_data = df[df['location'] == country]
        plt.plot(country_data['date'], country_data['total_cases'], label=country)
    plt.title('Total COVID-19 Cases Over Time')
    plt.xlabel('Date')
    plt.ylabel('Total Cases')
    plt.legend()
    plt.savefig('total_cases_line.png')
    plt.close()

except Exception as e:
    print(f'Error in line chart: {str(e)}')

# Visualization 2: Bar Chart - Total Cases by Country (Latest)
try:
    plt.figure(figsize=(8, 6))
    sns.barplot(x='location', y='total_cases', data=latest_data)
    plt.title('Total COVID-19 Cases by Country (Latest)')
    plt.xlabel('Country')
    plt.ylabel('Total Cases')
    plt.savefig('total_cases_bar.png')
    plt.close()

except Exception as e:
    print(f'Error in bar chart: {str(e)}')

# Visualization 3: Heatmap - Correlation Matrix
try:
    plt.figure(figsize=(10, 8))
    corr = df[['total_cases', 'total_deaths', 'total_vaccinations', 'death_rate', 'vaccination_rate']].corr()
    sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
    plt.title('Correlation Matrix of COVID-19 Metrics')
    plt.savefig('correlation_heatmap.png')
    plt.close()

except Exception as e:
    print(f'Error in heatmap: {str(e)}')

# Visualization 4: Choropleth Map - Total Cases by Country
try:
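    # Prepare the latest row per country. 'iso_code' is dropped during cleaning,
    # so map the three countries to their ISO-3 codes explicitly
    iso_codes = {'Kenya': 'KEN', 'United States': 'USA', 'India': 'IND'}
    latest_df = df.sort_values('date').groupby('location').tail(1)[['location', 'total_cases']].copy()
    latest_df['iso_code'] = latest_df['location'].map(iso_codes)
    fig = px.choropleth(latest_df,
                        locations='iso_code',
                        color='total_cases',
                        hover_name='location',
                        color_continuous_scale=px.colors.sequential.Plasma,
                        title='Total COVID-19 Cases by Country (Latest)')
    # Plotly figures are saved to HTML with write_html
    fig.write_html('total_cases_choropleth.html')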
    print('Choropleth map saved as total_cases_choropleth.html')

except Exception as e:
    print(f'Error in choropleth map: {str(e)}')

## 5. Vaccination Progress

Visualize vaccination trends.

In [None]:
try:
    # Line Chart - Vaccination Rate Over Time
    plt.figure(figsize=(12, 6))
    for country in countries:
        country_data = df[df['location'] == country]
        plt.plot(country_data['date'], country_data['vaccination_rate'], label=country)
    plt.title('Vaccination Rate (% Population) Over Time')
    plt.xlabel('Date')
    plt.ylabel('Vaccination Rate (%)')
    plt.legend()
    plt.savefig('vaccination_rate_line.png')
    plt.close()

except Exception as e:
    print(f'Error in vaccination chart: {str(e)}')

## 6. Insights & Findings

### Key Insights
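1. **Case Trends**: The USA has significantly higher total cases than Kenya and India. Its larger population explains part of the gap with Kenya but not with India, whose population is larger than that of the USA. Earlier pandemic impact may also contribute, and per-capita figures are needed for a fair comparison.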
2. **Death Rates**: India shows a higher death rate in some periods, possibly due to healthcare system strain.
3. **Vaccination Progress**: The USA leads in vaccination rates, followed by India, while Kenya lags, reflecting differences in vaccine access.
4. **Correlations**: Total cases and deaths are strongly correlated. Vaccination rate correlates more weakly with total cases; because these are cumulative totals, the matrix does not directly measure case reduction.
5. **Anomaly**: Sudden spikes in new cases in India suggest data reporting issues or major outbreaks.
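
One possible way to tell a one-day reporting artifact from a sustained outbreak is to smooth `new_cases` with a rolling mean per country. This is a minimal sketch: the helper `add_rolling_mean` is hypothetical, the numbers are made up, and the window is shortened to 3 days to keep the example small (7 days is a common choice for daily data).

```python
import pandas as pd

def add_rolling_mean(frame, column='new_cases', window=7):
    """Return a copy with a per-country rolling mean of `column`."""
    out = frame.sort_values(['location', 'date']).copy()
    out[f'{column}_avg'] = (
        out.groupby('location')[column]
           .transform(lambda s: s.rolling(window=window, min_periods=1).mean())
    )
    return out

# Made-up daily counts with a single-day spike.
toy = pd.DataFrame({
    'location': ['India'] * 5,
    'date': pd.date_range('2021-01-01', periods=5, freq='D'),
    'new_cases': [10.0, 10.0, 100.0, 10.0, 10.0],
})
smoothed = add_rolling_mean(toy, window=3)
print(smoothed[['date', 'new_cases', 'new_cases_avg']])
```

A single-day spike produces a short, flat bump in the smoothed series, whereas a sustained outbreak would keep it rising over many days.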

This notebook provides a comprehensive analysis of COVID-19 trends, with reproducible code and clear visualizations.

## 7. Exporting to PDF

To export this notebook to PDF:
- Use Jupyter: File > Download as > PDF via LaTeX.
- Or install `nbconvert` and run: `jupyter nbconvert --to pdf covid_data_tracker.ipynb`. PDF export requires a LaTeX installation that provides `xelatex`.