# COVID-19 Global Data Tracker
## Introduction
This Jupyter Notebook analyzes global COVID-19 trends using data from **Our World in Data**. The project objectives are to:
- Fetch and clean the COVID-19 dataset.
- Analyze trends in cases, deaths, and vaccinations.
- Compare metrics across countries (Kenya, USA, India).
- Visualize trends with charts and a choropleth map.
- Summarize findings in a clear narrative.

The dataset is fetched directly from the URL: `https://covid.ourworldindata.org/data/owid-covid-data.csv`.

## Step 1: Import Libraries
We use `pandas` for data manipulation, `matplotlib` and `seaborn` for visualizations, and `plotly` for the choropleth map.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')  # Suppress warnings for cleaner output

# Set plot style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

## Step 2: Data Collection and Loading
**Goal**: Fetch the dataset directly from the URL and explore its structure.

In [None]:
# Fetch data from Our World in Data
url = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
df = pd.read_csv(url)

# Explore the dataset
print("Columns:", df.columns.tolist())
print("\nFirst 5 rows:\n", df.head())
print("\nMissing values:\n", df.isnull().sum())
print("\nDataset info:\n")
df.info()
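
The full file is wide, while the later cells touch only a handful of columns. A minimal sketch of a narrower load follows, assuming the column names used elsewhere in this notebook. The helper name `load_owid` and the inline sample CSV are illustrative and not part of the original notebook.

```python
import io
import pandas as pd

# Columns this notebook uses in later cells.
USED_COLUMNS = [
    "iso_code", "location", "date", "total_cases", "total_deaths",
    "total_vaccinations", "people_fully_vaccinated", "population",
]

def load_owid(source, columns=USED_COLUMNS):
    """Hypothetical helper: read only the needed columns and parse dates."""
    return pd.read_csv(source, usecols=list(columns), parse_dates=["date"])

# Tiny inline sample standing in for the real CSV (values are made up).
sample_csv = io.StringIO(
    "iso_code,continent,location,date,total_cases,total_deaths,"
    "total_vaccinations,people_fully_vaccinated,population\n"
    "KEN,Africa,Kenya,2021-01-01,100,2,,,50000000\n"
    "KEN,Africa,Kenya,2021-01-02,110,3,,,50000000\n"
)
sample_df = load_owid(sample_csv)
print(sample_df.dtypes)
```

In the notebook itself the same helper could be called as `load_owid(url)`, which would also make the later `pd.to_datetime` conversion a no-op.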

## Step 3: Data Cleaning
**Goal**: Prepare the dataset for analysis by filtering countries, converting data types, and handling missing values.

In [None]:
# Filter for specific countries
countries = ['Kenya', 'United States', 'India']
df_filtered = df[df['location'].isin(countries)].copy()

# Convert date to datetime
df_filtered['date'] = pd.to_datetime(df_filtered['date'])

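# Sort so that filling follows each country's own timeline
df_filtered = df_filtered.sort_values(['location', 'date'])

# Handle missing values per country so values never leak across countries.
# Cumulative counts: carry the last known value forward; leading gaps become 0.
for col in ['total_cases', 'total_deaths']:
    df_filtered[col] = df_filtered.groupby('location')[col].ffill().fillna(0)
# Vaccination series: interpolate linearly between reported values.
for col in ['total_vaccinations', 'people_fully_vaccinated']:
    df_filtered[col] = df_filtered.groupby('location')[col].transform(
        lambda s: s.interpolate(method='linear', limit_direction='forward')
    )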

# Drop rows with missing critical fields
df_filtered = df_filtered.dropna(subset=['date', 'location'])

# Verify cleaning
print("Missing values after cleaning:\n", df_filtered.isnull().sum())
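
Because `df_filtered` stacks several countries in the same columns, gap filling is safest when done within each country. The sketch below uses made-up numbers and two placeholder countries (`A` and `B`) to show how interpolating the whole column differs from interpolating per `location` group.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up numbers): country A ends at 300, country B starts with a gap.
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "total_vaccinations": [100.0, np.nan, 300.0, np.nan, 10.0, 20.0],
})

# Interpolating the whole column blends A's last value into B's leading gap.
naive = toy["total_vaccinations"].interpolate(
    method="linear", limit_direction="forward"
)

# Interpolating within each country keeps the two series separate,
# so B's leading gap stays missing.
per_country = toy.groupby("location")["total_vaccinations"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="forward")
)

print(pd.DataFrame({
    "location": toy["location"],
    "naive": naive,
    "per_country": per_country,
}))
```

The per-country version leaves a missing value where a country has not reported anything yet, which is usually the more honest result for a cumulative series.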

## Step 4: Exploratory Data Analysis (EDA)
**Goal**: Analyze trends in cases and deaths, and calculate death rates.

### Total Cases Over Time

In [None]:
# Plot total cases over time
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    plt.plot(country_data['date'], country_data['total_cases'], label=country)
plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('total_cases.png')
plt.show()

### Total Deaths Over Time

In [None]:
# Plot total deaths over time
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    plt.plot(country_data['date'], country_data['total_deaths'], label=country)
plt.title('Total COVID-19 Deaths Over Time')
plt.xlabel('Date')
plt.ylabel('Total Deaths')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('total_deaths.png')
plt.show()
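
The case and death plots repeat the same loop with a different column and labels. One way to avoid the repetition is a small helper, sketched here. The function name `plot_metric` and the demo frame are hypothetical, and the data values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_metric(data, metric, countries, title, ylabel):
    """Hypothetical helper: draw one line per country for a single metric."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for country in countries:
        country_data = data[data["location"] == country]
        ax.plot(country_data["date"], country_data[metric], label=country)
    ax.set_title(title)
    ax.set_xlabel("Date")
    ax.set_ylabel(ylabel)
    ax.legend()
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig, ax

# Made-up data for illustration only.
demo = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "total_cases": [100, 120, 1000, 1500],
})
fig, ax = plot_metric(
    demo, "total_cases", ["Kenya", "India"],
    "Total COVID-19 Cases Over Time", "Total Cases",
)
```

With the real data, the same call would take `df_filtered` and `countries`, followed by `fig.savefig(...)` and `plt.show()` as in the cells above.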

### Death Rate

In [None]:
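# Calculate death rate (total deaths per confirmed case).
# Rows with zero cases become NaN instead of triggering a division by zero.
cases = df_filtered['total_cases'].where(df_filtered['total_cases'] > 0)
df_filtered['death_rate'] = df_filtered['total_deaths'] / cases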
death_rate_summary = df_filtered.groupby('location')['death_rate'].mean()
print("Death Rate Summary (Mean):\n", death_rate_summary)

# Bar chart for death rate
plt.bar(death_rate_summary.index, death_rate_summary.values)
plt.title('Mean Death Rate by Country')
plt.xlabel('Country')
plt.ylabel('Death Rate')
plt.tight_layout()
plt.savefig('death_rate.png')
plt.show()
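
The mean of the daily `death_rate` values weights every date equally, so early days with few cases can pull the average up. As a point of comparison, the sketch below computes both that mean and the ratio of the latest cumulative totals on a made-up series for a placeholder country `X`. The variable names are illustrative.

```python
import pandas as pd

# Made-up cumulative series for one hypothetical country.
toy = pd.DataFrame({
    "location": ["X"] * 4,
    "date": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04"]
    ),
    "total_cases": [0, 10, 100, 1000],
    "total_deaths": [0, 1, 2, 10],
})

# Daily ratio, leaving days with zero cases as NaN rather than dividing by zero.
cases = toy["total_cases"].where(toy["total_cases"] > 0)
toy["death_rate"] = toy["total_deaths"] / cases

# Mean of daily ratios: every day counts equally, even early low-count days.
mean_of_ratios = toy.groupby("location")["death_rate"].mean()

# Ratio of the latest cumulative totals: one overall figure per country.
latest = toy.sort_values("date").groupby("location").last()
ratio_of_totals = latest["total_deaths"] / latest["total_cases"]

print(mean_of_ratios["X"], ratio_of_totals["X"])
```

Here the two figures differ by a factor of about four, which is why it helps to state which definition a reported death rate uses.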

## Step 5: Vaccination Progress
**Goal**: Visualize vaccination trends and calculate % vaccinated.

### Cumulative Vaccinations

In [None]:
# Plot cumulative vaccinations
for country in countries:
    country_data = df_filtered[df_filtered['location'] == country]
    plt.plot(country_data['date'], country_data['total_vaccinations'], label=country)
plt.title('Cumulative COVID-19 Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Total Vaccinations')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('total_vaccinations.png')
plt.show()

### Vaccination Coverage

In [None]:
# Calculate % vaccinated
df_filtered['percent_vaccinated'] = (df_filtered['people_fully_vaccinated'] / df_filtered['population']) * 100
vaccination_coverage = df_filtered.groupby('location')['percent_vaccinated'].last()
print("Vaccination Coverage (Latest):\n", vaccination_coverage)

# Bar chart for vaccination coverage
plt.bar(vaccination_coverage.index, vaccination_coverage.values)
plt.title('Percentage of Population Fully Vaccinated (Latest)')
plt.xlabel('Country')
plt.ylabel('% Vaccinated')
plt.tight_layout()
plt.savefig('vaccination_coverage.png')
plt.show()
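
Countries do not all report vaccinations on the same dates, so the "latest" coverage can refer to a different day for each country. A minimal sketch, on made-up numbers and placeholder countries, that keeps the reporting date next to each latest value:

```python
import numpy as np
import pandas as pd

# Made-up data: A last reports on Jan 2, B last reports on Jan 3.
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"] * 2),
    "people_fully_vaccinated": [40.0, 50.0, np.nan, 10.0, np.nan, 30.0],
    "population": [100.0] * 6,
})
toy["percent_vaccinated"] = (
    toy["people_fully_vaccinated"] / toy["population"] * 100
)

# Keep only reported rows, order by date, then take each country's last row.
reported = toy.dropna(subset=["percent_vaccinated"]).sort_values("date")
latest_rows = reported.groupby("location").tail(1).set_index("location")
coverage = latest_rows[["date", "percent_vaccinated"]]

print(coverage)
```

Printing the date alongside the percentage makes it clear when two bars in the chart are not from the same day.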

## Step 6: Choropleth Map
**Goal**: Visualize global cases on a world map for the latest date.

In [None]:
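# Get the latest non-null values for each country, excluding OWID aggregates
# (e.g. "World" and continents, whose iso_code starts with "OWID_")
countries_only = df[~df['iso_code'].astype(str).str.startswith('OWID_')]
latest_data = countries_only.sort_values('date').groupby('location').last().reset_index()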

# Plot choropleth map
fig = px.choropleth(
    latest_data,
    locations='iso_code',
    color='total_cases',
    hover_name='location',
    color_continuous_scale=px.colors.sequential.Plasma,
    title='Global COVID-19 Cases by Country (Latest Date)'
)
fig.write_image('choropleth_map.png')  # Static image export requires the kaleido package
fig.show()

## Step 7: Insights and Conclusion
### Key Insights
1. **USA's High Case Load**: The United States recorded by far the highest cumulative case count of the three countries, reaching into the millions and far exceeding Kenya and India.
2. **India's Vaccination Success**: India achieved a high vaccination rate (~60% fully vaccinated), outperforming Kenya (~20%).
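3. **Death Rate Variations**: Kenya's mean death rate (~0.03, i.e. total deaths divided by total cases, averaged over all dates) is higher than India's (~0.01), possibly due to healthcare disparities or differences in testing coverage.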
4. **Global Distribution**: The choropleth map highlights high total case counts in North America and Europe.
5. **Anomaly**: A notable acceleration in USA cases in late 2021 coincides with the Delta variant wave and, from December, the emergence of Omicron.

### Conclusion
This analysis provides an overview of COVID-19 trends across Kenya, the USA, and India, with visualizations and metrics such as death rates and vaccination coverage. The dataset is fetched directly from Our World in Data, so the notebook runs without local files, although results may change if the source file is updated. Future work could include:
- Adding user input for dynamic country/date selection.
- Building an interactive Streamlit dashboard.
- Incorporating hospitalization data if available.
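
As a starting point for the dynamic country/date selection mentioned above, the filtering step could be wrapped in a function. This is only a sketch: the name `select_window` and its parameters are hypothetical, and the demo data is made up.

```python
import pandas as pd

def select_window(data, countries, start=None, end=None):
    """Hypothetical helper: keep rows for the given countries and date range."""
    mask = data["location"].isin(countries)
    if start is not None:
        mask = mask & (data["date"] >= pd.Timestamp(start))
    if end is not None:
        mask = mask & (data["date"] <= pd.Timestamp(end))
    return data.loc[mask].copy()

# Made-up data for illustration only.
demo = pd.DataFrame({
    "location": ["Kenya", "Kenya", "India", "India", "Brazil"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-06-01", "2021-01-01", "2021-06-01", "2021-06-01"]
    ),
    "total_cases": [100, 200, 1000, 3000, 500],
})
subset = select_window(demo, ["Kenya", "India"], start="2021-03-01")
print(subset)
```

The country list and dates could later come from `input()`, notebook widgets, or Streamlit controls without changing the function itself.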

## Notes
- **Dataset**: Fetched from `https://covid.ourworldindata.org/data/owid-covid-data.csv`.
- **Reproducibility**: No local files required; all data is fetched via URL.
- **Tools**: `pandas`, `matplotlib`, `seaborn`, `plotly.express`.
- **Export**: Save this notebook as `.ipynb` or convert to PDF via `File > Download as > PDF via LaTeX`.