# COVID-19 Global Data Analysis

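This notebook analyzes COVID-19 cases, deaths, and vaccinations in five countries (Kenya, the United States, India, the United Kingdom, and South Africa) using the global COVID-19 dataset from Our World in Data.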

## 1. Setup and Data Loading

First, let's import the required libraries and load our dataset.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

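# Set style for better visualizations.
# 'seaborn-v0_8' exists from Matplotlib 3.6 on; the old 'seaborn' name was removed in 3.8.
if 'seaborn-v0_8' in plt.style.available:
    plt.style.use('seaborn-v0_8')
else:
    sns.set_theme()
sns.set_palette('husl')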

In [None]:
# Load the dataset
df = pd.read_csv('../data/owid-covid-data.csv')

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

# Display basic information about the dataset
print('Dataset Info:')
df.info()
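The OWID file mixes individual countries with aggregate rows such as continents, income groups, and "World". In the versions of the dataset I am aware of, these aggregates carry an `iso_code` starting with `OWID_` (for example `OWID_WRL` for World). A minimal sketch, assuming that convention holds for your copy of the CSV and using a small hypothetical DataFrame in place of the real file:

```python
import pandas as pd

# Hypothetical toy rows standing in for owid-covid-data.csv
toy = pd.DataFrame({
    'iso_code': ['KEN', 'OWID_WRL', 'IND', 'OWID_AFR'],
    'location': ['Kenya', 'World', 'India', 'Africa'],
    'date': pd.to_datetime(['2021-01-01'] * 4),
    'total_cases': [100.0, 1000.0, 500.0, 300.0],
})

# Assumption: aggregate rows are identified by an iso_code starting with 'OWID_'
is_aggregate = toy['iso_code'].str.startswith('OWID_', na=False)
countries_only = toy[~is_aggregate]

# Quick sanity checks: which locations remain, and what date range is covered
print(countries_only['location'].tolist())
print(toy['date'].min(), toy['date'].max())
```

Removing aggregates matters mainly if you later compute statistics across all locations, where "World" would otherwise be counted alongside the countries it contains.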

## 2. Data Cleaning

Let's check for missing values, restrict the data to the countries of interest, and fill gaps in the key case and death columns.

In [None]:
# Check for missing values
print('Missing values per column:')
df.isnull().sum()
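Raw missing counts are hard to compare across columns. Expressing them as a percentage of rows makes sparse columns stand out; the vaccination columns, for instance, are often empty for the months before vaccination began. A minimal sketch on a hypothetical toy frame standing in for `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps, standing in for df
toy = pd.DataFrame({
    'new_cases': [1.0, np.nan, 3.0, 4.0],
    'people_fully_vaccinated_per_hundred': [np.nan, np.nan, np.nan, 10.0],
    'location': ['Kenya'] * 4,
})

# Share of missing values per column, in percent, sparsest columns first
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

On the real data, the same expression is simply `df.isnull().mean().mul(100).sort_values(ascending=False)`.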

In [None]:
# Select countries of interest
countries = ['Kenya', 'United States', 'India', 'United Kingdom', 'South Africa']
df_selected = df[df['location'].isin(countries)].copy()

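# Handle missing values for key metrics
df_selected = df_selected.sort_values(['location', 'date'])
cumulative_columns = ['total_cases', 'total_deaths']
daily_columns = ['new_cases', 'new_deaths']
# Forward-fill running totals within each country (a 0 would create a false drop)
df_selected[cumulative_columns] = df_selected.groupby('location')[cumulative_columns].ffill()
# Remaining gaps (before the first report, or missing daily counts) become 0
numeric_columns = cumulative_columns + daily_columns
df_selected[numeric_columns] = df_selected[numeric_columns].fillna(0)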
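Filling a cumulative column such as `total_cases` with 0 creates an artificial drop whenever a day is missing, whereas forward-filling within each country carries the last reported total. A small hypothetical example with a one-day reporting gap shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Kenya has no report on 2021-01-02
toy = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'Kenya', 'India', 'India'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-01', '2021-01-02']),
    'total_cases': [10.0, np.nan, 15.0, np.nan, 7.0],
}).sort_values(['location', 'date'])

# Zero-filling: the running total appears to fall from 10 to 0 and back to 15
zero_filled = toy['total_cases'].fillna(0)

# Forward-filling within each country, then 0 for gaps before the first report
forward_filled = toy.groupby('location')['total_cases'].ffill().fillna(0)

print(pd.DataFrame({'location': toy['location'],
                    'zero_filled': zero_filled,
                    'forward_filled': forward_filled}))
```

Grouping by `location` before forward-filling prevents one country's last value from leaking into the next country's first rows.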

## 3. Exploratory Data Analysis

Let's analyze trends in cases and deaths.

In [None]:
# Plot total cases over time
plt.figure(figsize=(12, 6))
for country in countries:
    country_data = df_selected[df_selected['location'] == country]
    plt.plot(country_data['date'], country_data['total_cases'], label=country)

plt.title('Total COVID-19 Cases Over Time')
plt.xlabel('Date')
plt.ylabel('Total Cases')
plt.legend()
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
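Cumulative curves make it hard to see individual waves. A common complement, sketched here on hypothetical data rather than taken from the original analysis, is a 7-day trailing mean of `new_cases` per country, which smooths out day-of-week reporting effects:

```python
import pandas as pd

# Hypothetical toy data: 8 days of new cases for one country
toy = pd.DataFrame({
    'location': ['Kenya'] * 8,
    'date': pd.date_range('2021-01-01', periods=8, freq='D'),
    'new_cases': [0.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 14.0],
})

toy = toy.sort_values(['location', 'date'])
# 7-day trailing mean computed separately for each country;
# min_periods=1 gives partial-window averages for the first six days
toy['new_cases_7day'] = (
    toy.groupby('location')['new_cases']
       .transform(lambda s: s.rolling(window=7, min_periods=1).mean())
)
print(toy[['date', 'new_cases', 'new_cases_7day']])
```

To plot the smoothed series, replace `total_cases` with `new_cases_7day` in the plotting loop after computing the column on `df_selected`.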

In [None]:
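# Calculate and plot the case fatality rate (%): cumulative deaths / cumulative confirmed cases.
# Rows with zero cases are left as NaN instead of dividing by zero.
df_selected['death_rate'] = (df_selected['total_deaths'] / df_selected['total_cases'].where(df_selected['total_cases'] > 0) * 100).round(2)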

plt.figure(figsize=(10, 6))
sns.boxplot(data=df_selected, x='location', y='death_rate')
plt.title('Case Fatality Rate Distribution by Country')
plt.xlabel('Country')
plt.ylabel('Case Fatality Rate (%)')
plt.xticks(rotation=45)
plt.show()
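The boxplot pools every day of the pandemic, so early days with very few confirmed cases can produce extreme ratios. A complementary summary is the most recent case fatality rate per country. A minimal sketch on hypothetical data, treating a zero case count as undefined rather than dividing by it:

```python
import pandas as pd

# Hypothetical toy data standing in for df_selected
toy = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'India', 'India', 'Tonga', 'Tonga'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02'] * 3),
    'total_cases': [0.0, 200.0, 1000.0, 4000.0, 0.0, 0.0],
    'total_deaths': [0.0, 4.0, 10.0, 60.0, 0.0, 0.0],
})

# Keep the last row per country after sorting by date
latest = toy.sort_values('date').groupby('location').tail(1).set_index('location')

# CFR (%) = deaths / confirmed cases * 100; zero cases -> NaN instead of inf
cases = latest['total_cases'].where(latest['total_cases'] > 0)
latest_cfr = (latest['total_deaths'] / cases * 100).round(2)
print(latest_cfr)
```

For Kenya this is 4 / 200 × 100 = 2.0 %, and for India 60 / 4000 × 100 = 1.5 %; a country with no confirmed cases gets NaN.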

## 4. Vaccination Analysis

Let's analyze vaccination progress across countries.

In [None]:
# Plot vaccination progress
plt.figure(figsize=(12, 6))
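for country in countries:
    # Vaccination figures are reported irregularly; drop empty days so each line is continuous
    country_data = df_selected[df_selected['location'] == country].dropna(subset=['people_fully_vaccinated_per_hundred'])
    plt.plot(country_data['date'], country_data['people_fully_vaccinated_per_hundred'], label=country)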

plt.title('Vaccination Progress (% of Population Fully Vaccinated)')
plt.xlabel('Date')
plt.ylabel('% Fully Vaccinated')
plt.legend()
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
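Because vaccination figures are reported less regularly than case counts, many rows in this column are empty, and a country's latest row is not necessarily its latest reported value. A minimal sketch on hypothetical data that ranks countries by their most recent reported coverage:

```python
import numpy as np
import pandas as pd

col = 'people_fully_vaccinated_per_hundred'

# Hypothetical toy data: vaccination figures are reported irregularly
toy = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'Kenya', 'India', 'India', 'India'],
    'date': pd.to_datetime(['2021-06-01', '2021-07-01', '2021-08-01'] * 2),
    col: [1.0, np.nan, np.nan, 5.0, 9.0, np.nan],
})

# Drop unreported days, then take each country's most recent reported value
latest_vax = (
    toy.dropna(subset=[col])
       .sort_values('date')
       .groupby('location')[col]
       .last()
       .sort_values(ascending=False)
)
print(latest_vax)
```

Keep in mind that the "latest" dates can differ between countries, so the ranking compares each country's last report, not a common date.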

## 5. Key Insights

Based on our analysis:

1. [This section will be filled with actual insights after running the analysis]
2. [Compare trends between countries]
3. [Discuss vaccination progress]
4. [Note any interesting patterns or anomalies]
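To back the comparisons listed above with concrete numbers, one option is a small summary table, for example the date and size of each country's peak in daily new cases. A sketch on hypothetical data (the real values come only from running the notebook on `df_selected`):

```python
import pandas as pd

# Hypothetical toy data standing in for df_selected
toy = pd.DataFrame({
    'location': ['Kenya', 'Kenya', 'Kenya', 'India', 'India', 'India'],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'] * 2),
    'new_cases': [5.0, 20.0, 10.0, 100.0, 50.0, 300.0],
})

# Row label of the largest new_cases value within each country
peak_idx = toy.groupby('location')['new_cases'].idxmax()
peaks = toy.loc[peak_idx, ['location', 'date', 'new_cases']].set_index('location')
print(peaks)
```

Because raw daily counts scale with population, comparing countries of very different sizes may be fairer with a per-capita column if your copy of the dataset includes one.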