# COVID-19 Data Analysis & Insights 📊🦠

**Objective:**  
This report analyzes the spread and impact of COVID-19, and the progress of vaccination efforts, across selected countries using the [Our World in Data COVID-19 Dataset](https://www.kaggle.com/datasets/gpreda/covid-world-vaccination-progress).

**Focus Countries:** India 🇮🇳, United States 🇺🇸, Kenya 🇰🇪  
**Tools Used:** Pandas, Matplotlib, Seaborn  
**Deliverables:** Visualizations, Insights, and a Summary Report

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv("/kaggle/input/our-world-in-data-covid19-dataset/owid-covid-data.csv")

# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])

# Focus countries
countries = ['India', 'United States', 'Kenya']
# .copy() avoids SettingWithCopyWarning when columns are assigned later
df = df[df['location'].isin(countries)].copy()
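
A filter like the one above returns no rows for a misspelled or differently named country, and it does so without raising an error. The following is a minimal sketch, on synthetic data, of checking which requested names are actually present. The frame `toy` and the extra name `'USA'` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Synthetic stand-in for the 'location' column (not the real dataset)
toy = pd.DataFrame({"location": ["India", "Kenya", "United States", "Brazil"]})

# 'USA' is a deliberately wrong name to show what the check catches
requested = ["India", "United States", "Kenya", "USA"]

# Names that were requested but never appear in the data
missing = sorted(set(requested) - set(toy["location"].unique()))
print(missing)
```

Running a check of this kind right after loading makes a silent empty subset visible before any plots are drawn.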

In [None]:
# Drop rows with missing values in key columns
df = df.dropna(subset=['date', 'new_cases', 'new_deaths'])

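# Forward-fill selected numeric columns within each country, in date order,
# so that one country's last value never leaks into the next country's rows
fill_cols = ['total_cases', 'total_deaths', 'new_deaths_smoothed']
df = df.sort_values(['location', 'date'])
df[fill_cols] = df.groupby('location')[fill_cols].ffill()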

# Cumulative death rate (deaths per confirmed case); NaN where no cases are recorded
df['death_rate'] = df['total_deaths'] / df['total_cases'].where(df['total_cases'] > 0)
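
When a frame holds several countries, a plain forward fill can carry the last value of one country into the first rows of the next. The following is a minimal sketch, on synthetic data, of the difference between a plain fill and a per-country fill. The values in `toy` are made up for illustration.

```python
import pandas as pd

# Synthetic toy data, not taken from the real dataset
toy = pd.DataFrame({
    "location": ["India", "India", "Kenya", "Kenya"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]
    ),
    "total_cases": [100.0, None, None, 5.0],
})

# Plain fill: Kenya's first row inherits India's last value
naive = toy["total_cases"].ffill()

# Grouped fill: each country is filled independently
grouped = toy.groupby("location")["total_cases"].ffill()

print(naive.tolist())    # [100.0, 100.0, 100.0, 5.0]
print(grouped.tolist())  # [100.0, 100.0, nan, 5.0]
```

In the grouped result Kenya's first row stays missing, which is the honest outcome when no earlier Kenyan value exists.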

## 📈 Visualizations

This section explores trends over time and compares countries based on total cases, deaths, and vaccination rates.

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    data = df[df['location'] == country]
    plt.plot(data['date'], data['total_cases'], label=country)

plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Latest row per country that has both vaccination and population data
df_sorted = df.sort_values('date')
latest_vax = df_sorted.dropna(subset=['total_vaccinations', 'population'])
latest_vax = latest_vax.groupby('location').tail(1).copy()

# total_vaccinations counts doses, not people, so this ratio can exceed 100
latest_vax['doses_per_100'] = (latest_vax['total_vaccinations'] / latest_vax['population']) * 100

plt.figure(figsize=(8, 6))
sns.barplot(x='doses_per_100', y='location', data=latest_vax, palette='Greens')
plt.title('Vaccine Doses Administered per 100 People')
plt.xlabel('Doses per 100 People')
plt.ylabel('Country')
plt.tight_layout()
plt.show()
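
Because `total_vaccinations` counts administered doses rather than people, dividing it by population gives doses per 100 people, and that figure can exceed 100 once second doses and boosters are included. The following is a minimal sketch on synthetic data. The countries `A` and `B` and all numbers are hypothetical.

```python
import pandas as pd

# Synthetic toy data for two hypothetical countries (not real figures)
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-06-01", "2021-06-02", "2021-06-01", "2021-06-02"]
    ),
    "total_vaccinations": [50.0, 150.0, 50.0, None],
    "population": [100, 100, 200, 200],
})

# Keep the most recent row per country that has a vaccination count
latest = (
    toy.dropna(subset=["total_vaccinations", "population"])
       .sort_values("date")
       .groupby("location")
       .tail(1)
       .copy()
)

# Doses per 100 people; country A ends above 100
latest["doses_per_100"] = latest["total_vaccinations"] / latest["population"] * 100
result = latest.set_index("location")["doses_per_100"].to_dict()
print(result)  # {'A': 150.0, 'B': 25.0}
```

Country B's latest row is missing a count, so the sketch falls back to its previous day, mirroring the `dropna` followed by `tail(1)` pattern used in the notebook.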

In [None]:
plt.figure(figsize=(12, 6))
for country in countries:
    data = df[df['location'] == country]
    plt.plot(data['date'], data['new_cases'], label=country)

plt.title('Daily New COVID-19 Cases')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
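
Raw daily counts tend to be jagged, partly because reporting often follows a weekly rhythm, so a 7-day rolling mean per country can make the trend easier to read. The following is a minimal sketch on synthetic data. The column name `new_cases_7d` is a hypothetical helper, and the dataset may already ship its own smoothed columns, as it does for `new_deaths_smoothed`.

```python
import pandas as pd

# Synthetic daily counts for one country (not real data)
toy = pd.DataFrame({
    "location": ["Kenya"] * 8,
    "date": pd.date_range("2021-01-01", periods=8, freq="D"),
    "new_cases": [7, 0, 14, 7, 7, 0, 14, 7],
})

# Rolling windows must run in date order within each country
toy = toy.sort_values(["location", "date"])

# 7-day mean per country; the first six days have no full window and stay NaN
toy["new_cases_7d"] = (
    toy.groupby("location")["new_cases"]
       .transform(lambda s: s.rolling(window=7, min_periods=7).mean())
)

print(toy[["date", "new_cases", "new_cases_7d"]])
```

Plotting `new_cases_7d` in place of `new_cases` would keep the same loop over `countries` and only change the column name.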

## 💡 Key Insights

1. **The United States recorded the highest total COVID-19 cases** of the three countries throughout the pandemic, with clear surges in winter periods.
2. **India had a rapid vaccination drive**, but still trails the U.S. in vaccinations relative to population.
3. **Kenya reported far fewer cases and vaccinations**, possibly because of limited testing and reporting.
4. **Death rates were highest during the early waves**, before vaccines became widely available.
5. **Vaccination progress coincides with lower death rates**, which suggests an effect of immunization, although other factors are not ruled out.

> 🔍 Note: Missing entries in `total_cases`, `total_deaths`, and `new_deaths_smoothed` were filled forward from the previous available value.

In [None]:
# Numerical correlation analysis
corr = df[['total_cases', 'total_deaths', 'new_cases', 'new_deaths', 'death_rate']].corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.show()
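
A correlation computed over all rows pools countries of very different sizes, so it can reflect differences between countries rather than a relationship within any one of them. The following is a minimal sketch, on synthetic data, of computing the same correlation per country. The countries `A` and `B` and their values are hypothetical.

```python
import pandas as pd

# Synthetic values for two hypothetical countries (not real data)
toy = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "new_cases": [1.0, 2.0, 3.0, 1000.0, 2000.0, 3000.0],
    "new_deaths": [2.0, 4.0, 6.0, 30.0, 20.0, 10.0],
})
cols = ["new_cases", "new_deaths"]

# Pooled correlation across both countries
pooled = toy[cols].corr().loc["new_cases", "new_deaths"]

# One correlation matrix per country, indexed by (location, column)
per_country = toy.groupby("location")[cols].corr()
corr_a = per_country.loc[("A", "new_cases"), "new_deaths"]
corr_b = per_country.loc[("B", "new_cases"), "new_deaths"]

# In this toy data the pooled value is positive even though
# country B on its own is perfectly negatively correlated
print(pooled, corr_a, corr_b)
```

Each per-country matrix can be passed to `sns.heatmap` on its own to get one panel per country.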

## 🧾 Conclusion

- The COVID-19 impact varied significantly across countries.
- Timely vaccination efforts appear to have played a key role in reducing new deaths.
- Gaps in data reporting remain a challenge in analyzing true trends.

Future work may involve:
- Live dashboards using Python APIs
- Regional and demographic breakdowns
- Predictive modeling using ML