<a href="https://colab.research.google.com/github/gusair/covid-19-analyses/blob/main/COVID19_USA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **COVID-19 in the United States: A Comprehensive Overview** #

## **Introduction**

In 2020, the coronavirus devastated the entire world. The virus caused a pandemic with few precedents in modern history, disrupting how people relate to each other, how society is organized, and how we conduct business.

COVID-19 marked the beginning of a new era for humanity, bringing social distancing, remote work, and sanitary regulations into everyday life. Although it peaked in 2021, its traces are still visible today.

In this notebook, we will explore the consequences of the pandemic in the United States of America, the country that, according to official reports, recorded the highest number of cases and deaths.

### **Data Acquisition**

All the data used in this analysis come from [Our World in Data](https://ourworldindata.org/coronavirus), a scientific online publication that provides open-access data. The data are freely available through its portal, which I highly recommend exploring for its wealth of information.

Let's proceed to our analysis.

### **Dictionary of Variables**



* `country` - The name of the country or region for which the data is recorded.
* `date` - The date when the data was reported.
* `total_cases` - The cumulative number of confirmed COVID-19 cases.
* `new_cases` - The number of new confirmed COVID-19 cases reported on that specific day.
* `new_cases_smoothed` - The smoothed (7-day rolling average) number of new confirmed cases.
* `total_cases_per_million` - The total number of confirmed COVID-19 cases per million people.
* `new_cases_per_million` - The number of new COVID-19 cases per million people.
* `new_cases_smoothed_per_million` - The smoothed (7-day rolling average) number of new cases per million people.
* `total_deaths` - The cumulative number of confirmed COVID-19 deaths.
* `new_deaths` - The number of new COVID-19 deaths reported on that specific day.
* `new_deaths_smoothed` - The smoothed (7-day rolling average) number of new deaths.
* `total_deaths_per_million` -  The total number of confirmed COVID-19 deaths per million people.
* `new_deaths_per_million` - The number of new COVID-19 deaths per million people.
* `new_deaths_smoothed_per_million` - The smoothed (7-day rolling average) number of new deaths per million people.
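* `excess_mortality` - The percentage difference between reported deaths and the number of deaths projected from previous years, covering both COVID-19 and its indirect impacts.
* `excess_mortality_cumulative` - The cumulative percentage difference between reported and projected deaths since the start of 2020.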
* `excess_mortality_cumulative_absolute` - The total excess mortality, expressed as an absolute number.
* `excess_mortality_cumulative_per_million` - The cumulative excess mortality per million people.
* `hosp_patients` - The number of hospitalized COVID-19 patients.
* `hosp_patients_per_million` - The number of hospitalized COVID-19 patients per million people.
* `weekly_hosp_admissions` - The number of new COVID-19 hospital admissions in a week.
* `weekly_hosp_admissions_per_million` - The number of new hospital admissions per million people in a week.
* `icu_patients` - The number of COVID-19 patients in intensive care units (ICU).
* `icu_patients_per_million` - The number of ICU patients per million people.
* `weekly_icu_admissions` - The number of new COVID-19 ICU admissions in a week.
* `weekly_icu_admissions_per_million` - The number of new ICU admissions per million people in a week.
* `stringency_index` - A measure of the strictness of government responses to the pandemic (e.g., lockdowns, travel restrictions).
* `reproduction_rate` - The average number of people who will be infected by a single person with COVID-19.
* `total_tests` - The total number of COVID-19 tests conducted.
* `new_tests` - The number of new COVID-19 tests conducted on that specific day.
* `total_tests_per_thousand` - The total number of COVID-19 tests per thousand people.
* `new_tests_per_thousand` - The number of new COVID-19 tests per thousand people.
* `new_tests_smoothed` - The smoothed (7-day rolling average) number of new COVID-19 tests.
* `new_tests_smoothed_per_thousand` - The smoothed (7-day rolling average) number of new tests per thousand people.
* `positive_rate` - The percentage of COVID-19 tests that returned positive results.
* `tests_per_case` - The number of COVID-19 tests conducted per confirmed case.
* `total_vaccinations` - The total number of COVID-19 vaccinations administered.
* `people_vaccinated` - The number of people who have received at least one dose of a COVID-19 vaccine.
* `people_fully_vaccinated` - The number of people who have received all required doses of a COVID-19 vaccine (e.g., both doses for a two-dose vaccine).
* `total_boosters` - The total number of COVID-19 booster shots administered.
* `new_vaccinations` - The number of new COVID-19 vaccinations administered on that specific day.
* `new_vaccinations_smoothed` - The smoothed (7-day rolling average) number of new COVID-19 vaccinations.
* `total_vaccinations_per_hundred` - The total number of COVID-19 vaccinations administered per 100 people.
* `people_vaccinated_per_hundred` - The number of people vaccinated per 100 people.
* `people_fully_vaccinated_per_hundred` - The number of people fully vaccinated per 100 people.
* `total_boosters_per_hundred` - The total number of COVID-19 boosters administered per 100 people.
* `new_vaccinations_smoothed_per_million` - The smoothed (7-day rolling average) number of new vaccinations per million people.
* `new_people_vaccinated_smoothed` - The smoothed (7-day rolling average) number of new people vaccinated.
* `new_people_vaccinated_smoothed_per_hundred` - The smoothed (7-day rolling average) number of new people vaccinated per 100 people.
* `code` - A code associated with the country or region (e.g., ISO code).
* `continent` - The continent to which the country or region belongs.
* `population` - The population of the country or region.
* `population_density` - The number of people per unit area (usually per square kilometer).
* `median_age` - The median age of the population.
* `life_expectancy` - The average life expectancy of the population.
* `gdp_per_capita` - The gross domestic product (GDP) per capita, often used as an indicator of economic health.
* `extreme_poverty` - The percentage of the population living below the extreme poverty line (often $1.90/day).
* `diabetes_prevalence` - The percentage of the population with diabetes.
* `handwashing_facilities` - The percentage of the population with access to handwashing facilities.
* `hospital_beds_per_thousand` - The number of hospital beds available per thousand people.
* `human_development_index` - A composite measure of human development, combining indicators like life expectancy, education level, and income.
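
Several of the variables above are derived from the raw daily counts: the cumulative totals, the smoothed (7-day rolling average) series, and the per-million values. The sketch below illustrates how these relate, using a small made-up series and an assumed population. The `toy` frame and the `population` value are hypothetical, and the exact windowing rules Our World in Data applies may differ from this simple version.

```python
import pandas as pd

# Hypothetical toy data: 10 days of daily counts for one made-up region.
toy = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "new_cases": [0, 7, 14, 7, 0, 21, 7, 14, 7, 0],
})
population = 2_000_000  # assumed population, for illustration only

# Cumulative total, analogous to `total_cases`.
toy["total_cases"] = toy["new_cases"].cumsum()

# 7-day rolling average, analogous to `new_cases_smoothed`.
# The first six rows are NaN because a full 7-day window is not yet available.
toy["new_cases_smoothed"] = toy["new_cases"].rolling(window=7).mean()

# Per-million rescaling, analogous to the `*_per_million` columns.
toy["new_cases_per_million"] = toy["new_cases"] / population * 1_000_000

print(toy)
```

The same pattern would apply to deaths, tests, and vaccinations, with per-thousand or per-hundred columns using a different scaling factor.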


In [None]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style()
%matplotlib inline

# import dataset COVID-19
df = pd.read_csv('https://catalog.ourworldindata.org/garden/covid/latest/compact/compact.csv')

## **Exploratory Analysis: United States** ##

After gaining a broad understanding of the impact of COVID-19 from a global perspective, we now turn our attention to the most affected country.

As observed in the initial part of our analysis, the United States stands out as the country with the highest number of cases and deaths, a sobering reality that reflects the magnitude of the pandemic's impact. The USA has reported an irreparable loss of 1.2 million lives from a staggering 103 million total cases.

In this section, we will dive deeper into the numbers, exploring and analyzing the data specific to the United States to uncover any insights about the unique patterns and factors that shaped the pandemic's trajectory in the country.

In [None]:
# dataframe for the USA
df_usa = df.loc[df.country == "United States"].copy()

In [None]:
# head
df_usa.head()

Unnamed: 0,country,date,total_cases,new_cases,new_cases_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths,new_deaths,...,population,population_density,median_age,life_expectancy,gdp_per_capita,extreme_poverty,diabetes_prevalence,handwashing_facilities,hospital_beds_per_thousand,human_development_index
429681,United States,2020-01-01,,,,,,,,,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927
429682,United States,2020-01-02,,,,,,,,,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927
429683,United States,2020-01-03,,,,,,,,,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927
429684,United States,2020-01-04,0.0,0.0,,0.0,0.0,,0.0,0.0,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927
429685,United States,2020-01-05,0.0,0.0,,0.0,0.0,,0.0,0.0,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927


In [None]:
# identify first reported case in the USA
df_usa.loc[df_usa['new_cases'] > 0.0].head(1)

Unnamed: 0,country,date,total_cases,new_cases,new_cases_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths,new_deaths,...,population,population_density,median_age,life_expectancy,gdp_per_capita,extreme_poverty,diabetes_prevalence,handwashing_facilities,hospital_beds_per_thousand,human_development_index
429700,United States,2020-01-20,1.0,1.0,0.142857,0.002928,0.002928,0.000418,0.0,0.0,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927


In [None]:
# identify first reported death in the USA
df_usa.loc[df_usa['new_deaths'] > 0.0].head(1)

Unnamed: 0,country,date,total_cases,new_cases,new_cases_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths,new_deaths,...,population,population_density,median_age,life_expectancy,gdp_per_capita,extreme_poverty,diabetes_prevalence,handwashing_facilities,hospital_beds_per_thousand,human_development_index
429740,United States,2020-02-29,69.0,5.0,4.857143,0.20203,0.01464,0.014222,1.0,1.0,...,341534041.0,37.33665,37.72,,64623.125,1.249186,10.7,,2.87,0.927


In [None]:
# compute the time between the first case and the first death in the USA
df_usa['date'] = pd.to_datetime(df_usa['date'])

# first death (index 429740) minus first case (index 429700)
df_usa.loc[429740, 'date'] - df_usa.loc[429700, 'date']

Timedelta('40 days 00:00:00')

The first officially reported case of COVID-19 in the United States occurred on January 20, 2020. Exactly 40 days later, the first official death was reported. This first death took place after 69 confirmed cases had been documented.
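
The interval can also be computed without relying on hard-coded row labels, which may shift whenever the source file is updated. The following is a minimal sketch of that idea. The helper `days_between_first_case_and_death` and the miniature `sample` frame are hypothetical and only mimic the layout of `df_usa`.

```python
import pandas as pd


def days_between_first_case_and_death(frame: pd.DataFrame) -> int:
    """Return the number of days from the first reported case to the first reported death."""
    dates = pd.to_datetime(frame["date"])
    # Rows with missing values compare as False, so they are ignored here.
    first_case = dates[frame["new_cases"] > 0].min()
    first_death = dates[frame["new_deaths"] > 0].min()
    return (first_death - first_case).days


# Hypothetical miniature frame with the same column names as `df_usa`.
sample = pd.DataFrame({
    "date": ["2020-01-19", "2020-01-20", "2020-02-28", "2020-02-29"],
    "new_cases": [0.0, 1.0, 3.0, 5.0],
    "new_deaths": [0.0, 0.0, 0.0, 1.0],
})

print(days_between_first_case_and_death(sample))  # 40
```

Calling the helper on `df_usa` should give the same 40-day interval, assuming the `date`, `new_cases`, and `new_deaths` columns are present as listed in the dictionary of variables.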

In [None]:
# mortality rate: cumulative deaths as a percentage of cumulative confirmed cases
df_usa['mortality_rate'] = (df_usa['total_deaths'] / df_usa['total_cases']) * 100

df_usa.loc[df_usa['date'] == '2024-11-24', ['mortality_rate']]

Unnamed: 0,mortality_rate
431470,1.168619
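
The value above is the ratio of cumulative deaths to cumulative confirmed cases, expressed as a percentage. As a rough cross-check, the sketch below applies the same formula to the rounded totals quoted earlier in this notebook (about 1.2 million deaths and 103 million cases) and picks the most recent non-missing value instead of a fixed date. The `sample` frame is hypothetical: its first row reuses the first-death figures shown above, and its last row stands in for a day with no reported data.

```python
import pandas as pd

# Hypothetical miniature frame; the totals are rounded figures, not the exact dataset values.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-29", "2024-11-24", "2024-11-25"]),
    "total_cases": [69.0, 103_000_000.0, float("nan")],
    "total_deaths": [1.0, 1_200_000.0, float("nan")],
})

# Same formula as in the cell above: deaths as a percentage of confirmed cases.
sample["mortality_rate"] = sample["total_deaths"] / sample["total_cases"] * 100

# Take the latest row that actually has a value, rather than a hard-coded date.
latest_rate = sample["mortality_rate"].dropna().iloc[-1]

print(sample[["date", "mortality_rate"]])
print(f"Latest mortality rate: {latest_rate:.2f}%")
```

With these rounded inputs, the result lands close to the figure computed from the full dataset. Because it depends on confirmed cases only, this ratio is sensitive to how much testing was done.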


In [None]:
# plot graphs
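
One possible layout for these graphs is sketched below: two stacked line charts that share a time axis, one for smoothed new cases and one for smoothed new deaths. The `demo` frame holds synthetic, made-up curves rather than real data. It is a stand-in for `df_usa`, assuming that frame has the `date`, `new_cases_smoothed`, and `new_deaths_smoothed` columns listed in the dictionary of variables.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for `df_usa`: bell-shaped curves, not real observations.
days = np.arange(120)
demo = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=120, freq="D"),
    "new_cases_smoothed": 1000 * np.exp(-((days - 60) / 20) ** 2),
    "new_deaths_smoothed": 15 * np.exp(-((days - 75) / 20) ** 2),
})

# Two stacked panels sharing the same date axis.
fig, (ax_cases, ax_deaths) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)

ax_cases.plot(demo["date"], demo["new_cases_smoothed"], color="tab:blue")
ax_cases.set_title("New cases (7-day average)")
ax_cases.set_ylabel("Cases")

ax_deaths.plot(demo["date"], demo["new_deaths_smoothed"], color="tab:red")
ax_deaths.set_title("New deaths (7-day average)")
ax_deaths.set_ylabel("Deaths")
ax_deaths.set_xlabel("Date")

fig.tight_layout()
```

In a notebook with `%matplotlib inline`, the figure renders at the end of the cell. Replacing `demo` with `df_usa` should produce the corresponding charts for the United States.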
