# 📊 Descriptive Analysis of COVID-19 Survey Data

This notebook presents a descriptive analysis of the cleaned Ontario COVID-19 survey dataset.

## Import Required Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Silence library warnings (e.g. deprecation notices) to keep the output tidy
warnings.filterwarnings("ignore")
```


## Load Cleaned Dataset

```python
df = pd.read_csv("covid_python_Dec25.csv")
df.head()
```
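
Every section below groups or counts by a categorical column, so it is worth auditing missing values first: `groupby` drops rows whose key is `NaN` by default (`dropna=True`), and `value_counts` excludes `NaN` unless `dropna=False` is passed. The sketch below is a minimal example that uses a small hypothetical DataFrame in place of the real file; `summarize_columns` is an illustrative helper, not part of pandas.

```python
import numpy as np
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count, and number of distinct values per column."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "n_missing": frame.isna().sum(),
            "n_unique": frame.nunique(),  # nunique() ignores NaN by default
        }
    )


# Hypothetical toy data standing in for covid_python_Dec25.csv
toy = pd.DataFrame(
    {
        "age_category": ["18-29", "30-39", np.nan, "18-29"],
        "covid_positive": ["Yes", "No", "No", np.nan],
    }
)
summary = summarize_columns(toy)
print(summary)
```

On the real data, `summarize_columns(df)` shows whether any of the columns used below contain missing values that the grouped counts would otherwise drop silently.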


## 1. COVID-19 Test Results Across Age Categories

**Question:** What is the distribution of COVID-19 test results across different age categories?

**Interpretation:** This analysis shows how positive and negative COVID-19 test results are distributed across age groups. It shows which age categories are most represented in the dataset and how test outcomes vary among them.

```python
# Count respondents for each (age category, test result) pair
age_covid_dist = df.groupby(['age_category', 'covid_positive']).size().reset_index(name='count')
sns.barplot(data=age_covid_dist, x='age_category', y='count', hue='covid_positive')
plt.xticks(rotation=45)
plt.title("COVID-19 Results by Age Category")
plt.tight_layout()  # keep the rotated tick labels inside the figure
plt.show()
```
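
Raw counts mix two effects: how many respondents fall in each age category, and how often those respondents tested positive. Normalizing each row of a cross-tabulation isolates the second effect. A minimal sketch with `pd.crosstab(..., normalize="index")` on hypothetical toy data (the age labels and the `"Yes"`/`"No"` coding are assumptions):

```python
import pandas as pd

# Hypothetical toy data; the real column coding may differ
toy = pd.DataFrame(
    {
        "age_category": ["18-29", "18-29", "18-29", "18-29", "60+", "60+"],
        "covid_positive": ["Yes", "No", "No", "No", "Yes", "No"],
    }
)

# Each row sums to 1: the share of each test result within an age category
age_shares = pd.crosstab(toy["age_category"], toy["covid_positive"], normalize="index")
print(age_shares.round(2))
```

Replacing `toy` with `df` gives the share of each result within every age category, which can be compared across groups of different sizes.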


## 2. COVID-19 Positivity by Province

**Question:** How does COVID-19 positivity vary across provinces?

**Interpretation:** This visualization compares the number of positive and negative COVID-19 results reported in each province, giving an overview of geographic variation in case distribution.

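```python
# Count respondents for each (province, test result) pair
province_dist = df.groupby(['province', 'covid_positive']).size().reset_index(name='count')
sns.barplot(data=province_dist, x='province', y='count', hue='covid_positive')
plt.xticks(rotation=45)
plt.title("COVID-19 Positivity by Province")
plt.tight_layout()  # keep the rotated tick labels inside the figure
plt.show()
```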
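
Because the question asks about positivity, a rate answers it more directly than counts. The sketch below reports, for each group, the number of respondents with a result, the number of positives, and the positivity rate, sorted from highest to lowest. It assumes `covid_positive` is coded `"Yes"`/`"No"`; `POSITIVE_LABEL`, `positivity_by_group`, and the province names in the toy data are hypothetical and should be adjusted to the dataset's actual coding.

```python
import pandas as pd

POSITIVE_LABEL = "Yes"  # assumption: how a positive result is coded


def positivity_by_group(frame: pd.DataFrame, group_col: str,
                        result_col: str = "covid_positive") -> pd.DataFrame:
    """Respondents with a result, positives, and positivity rate per group."""
    with_result = frame.dropna(subset=[result_col])  # ignore rows without a test result
    is_positive = with_result[result_col].eq(POSITIVE_LABEL)
    summary = is_positive.groupby(with_result[group_col]).agg(
        n="count", positives="sum", rate="mean"
    )
    return summary.sort_values("rate", ascending=False)


# Hypothetical toy data
toy = pd.DataFrame(
    {
        "province": ["Province A"] * 3 + ["Province B"] * 2,
        "covid_positive": ["Yes", "No", "No", "Yes", None],
    }
)
result = positivity_by_group(toy, "province")
print(result)
```

Reporting `n` next to `rate` makes it visible when a rate rests on very few respondents. The same helper could be applied unchanged to `region`, `sex`, or `age_category`.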


## 3. Frequency of Reported Symptoms

**Question:** What proportion of respondents reported each symptom?

**Interpretation:** This analysis identifies the ten most commonly reported symptoms among respondents, giving insight into prevalent COVID-19-related experiences.

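```python
# Share of respondents (among non-missing answers) for the ten most frequent values
symptom_share = df['symptoms'].value_counts(normalize=True).head(10)
sns.barplot(x=symptom_share.values, y=symptom_share.index)
plt.xlabel("Proportion of respondents")
plt.title("Most Common Symptoms")
plt.show()
```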
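
If the `symptoms` column stores several symptoms per respondent in one string (an assumption; the cleaned file's format is not shown here), `value_counts` counts each distinct combination rather than each symptom. Splitting the string and exploding it to one row per symptom gives the share of respondents reporting each symptom. The comma delimiter and the symptom names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one comma-separated string of symptoms per respondent
toy = pd.DataFrame(
    {"symptoms": ["cough, fever", "fever", "cough, fatigue, fever", None]}
)

# Respondents who answered the symptom question form the denominator
answered = toy["symptoms"].dropna()
n_respondents = len(answered)

# One row per (respondent, symptom); strip whitespace left over from splitting
exploded = answered.str.split(",").explode().str.strip()

# Count each symptom at most once per respondent (the index identifies the respondent)
pairs = exploded.reset_index().drop_duplicates()
symptom_share = pairs["symptoms"].value_counts() / n_respondents
print(symptom_share.round(2))
```

Because a respondent can report several symptoms, these shares can sum to more than 1.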


## 4. Urban vs Rural Distribution

**Question:** What is the distribution of COVID-19 cases between urban and rural regions?

**Interpretation:** The chart compares the number of positive and negative COVID-19 results in urban and rural areas, which helps reveal differences between the two settings.

```python
region_dist = df.groupby(['region', 'covid_positive']).size().reset_index(name='count')
sns.barplot(data=region_dist, x='region', y='count', hue='covid_positive')
plt.title("COVID-19 Cases by Region")
plt.show()
```


## 5. Distribution by Sex

**Question:** How are COVID-19 cases distributed by sex?

**Interpretation:** This breakdown shows how COVID-19 outcomes differ by sex, providing a demographic overview of the dataset.

```python
sex_dist = df.groupby(['sex', 'covid_positive']).size().reset_index(name='count')
sns.barplot(data=sex_dist, x='sex', y='count', hue='covid_positive')
plt.title("COVID-19 Cases by Sex")
plt.show()
```


## 6. Testing Status

**Question:** What percentage of respondents were tested and what were the results?

**Interpretation:** This analysis summarizes how many respondents were tested and what results they received.

```python
testing_dist = df.groupby(['tested', 'covid_positive']).size().reset_index(name='count')
sns.barplot(data=testing_dist, x='tested', y='count', hue='covid_positive')
plt.title("Testing Status and Results")
plt.show()
```
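
The chart shows counts, while the question asks for percentages. The sketch below computes both parts: the share of all respondents in each `tested` category, and, among those tested, the share with each result. It assumes `tested` is coded `"Yes"`/`"No"` and that untested respondents have no `covid_positive` value. Note that a `groupby` on `covid_positive`, as in the cell above, would drop such respondents by default; here `dropna=False` keeps missing answers visible instead.

```python
import pandas as pd

# Hypothetical toy data; untested respondents have no result recorded
toy = pd.DataFrame(
    {
        "tested": ["Yes", "Yes", "Yes", "No", "No"],
        "covid_positive": ["Yes", "No", "No", None, None],
    }
)

# Percentage of all respondents in each testing category (missing answers kept)
pct_tested = toy["tested"].value_counts(normalize=True, dropna=False) * 100

# Among tested respondents only: percentage with each result
tested_only = toy[toy["tested"] == "Yes"]
pct_result = tested_only["covid_positive"].value_counts(normalize=True, dropna=False) * 100

print(pct_tested.round(1))
print(pct_result.round(1))
```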


## 7. Mental Health Impact

**Question:** How often did respondents report each level of mental health impact?

**Interpretation:** This visualization summarizes the mental health impacts respondents reported during the pandemic.

```python
mh_dist = df['mental_health_impact'].value_counts()
sns.barplot(x=mh_dist.index, y=mh_dist.values)
plt.xticks(rotation=45)
plt.title("Mental Health Impact Distribution")
plt.show()
```
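
`value_counts` orders bars by frequency, which scrambles an ordinal scale. If `mental_health_impact` holds ordered levels (the labels below are hypothetical; substitute the survey's actual response options), reindexing the counts by the scale's order keeps the bars in a meaningful sequence and shows levels nobody chose as zero. A label such as `"None"` is best avoided, since recent versions of `pd.read_csv` treat the string `None` as missing by default.

```python
import pandas as pd

# Hypothetical response scale, from least to most impact
LEVELS = ["No impact", "Mild", "Moderate", "Severe"]

toy = pd.Series(["Mild", "Severe", "Mild", "Moderate", "Mild"], name="mental_health_impact")

# Reindex frequency counts by the scale order; unseen levels get 0
mh_ordered = toy.value_counts().reindex(LEVELS, fill_value=0)
print(mh_ordered)
```

The reindexed Series can take the place of `mh_dist` in the plotting code above.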


## 8. Household Contact

**Question:** How many respondents reported contact with a COVID-19 case in their household?

**Interpretation:** This distribution shows the prevalence of household exposure to COVID-19 among respondents.

```python
contact_dist = df['contact_in_household'].value_counts()
contact_dist
```
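
Since the interpretation speaks of prevalence, a table with percentages alongside counts answers the question more directly. `counts_with_percent` is a hypothetical helper; passing `dropna=False` keeps non-responses as their own row, so the percentages are out of all respondents.

```python
import pandas as pd


def counts_with_percent(values: pd.Series) -> pd.DataFrame:
    """Frequency table with counts and percentages of all rows, including missing."""
    counts = values.value_counts(dropna=False)
    return pd.DataFrame({"count": counts, "percent": (counts / len(values) * 100).round(1)})


# Hypothetical toy data
toy = pd.Series(["No", "No", "Yes", None], name="contact_in_household")
table = counts_with_percent(toy)
print(table)
```

The same helper could be applied to `travel_work_school` and `financial_obligations_impact`.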


## 9. Travel for Work or School

**Question:** How are respondents distributed by whether they travel for work or school?

**Interpretation:** This chart describes survey participants' mobility during the pandemic.

```python
travel_dist = df['travel_work_school'].value_counts()
sns.barplot(x=travel_dist.index, y=travel_dist.values)
plt.xticks(rotation=45)
plt.title("Travel for Work or School")
plt.show()
```


## 10. Financial Impact

**Question:** What proportion of respondents reported an impact on their financial obligations?

**Interpretation:** This analysis summarizes the self-reported economic effects of COVID-19 on respondents.

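```python
# Proportion of respondents (among non-missing answers) in each category
fin_dist = df['financial_obligations_impact'].value_counts(normalize=True)
sns.barplot(x=fin_dist.index, y=fin_dist.values)
plt.xticks(rotation=45)
plt.ylabel("Proportion of respondents")
plt.title("Financial Impact Distribution")
plt.show()
```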
