# Exploring COVID-19 Data: An In-depth Analysis

<div>
<img src=https://thanhtra.com.vn/data/images/0/2021/12/05/congdinh/blue-covid-banner.jpg width="1500">
</div>

# Introduction

In the face of the global COVID-19 pandemic, understanding the data behind the numbers is crucial. This project dives deep into COVID-19 datasets, leveraging data analysis and SQL querying to extract meaningful insights. We explore key metrics such as total cases, deaths, vaccinations, and their percentages, shedding light on the pandemic's impact worldwide.

## Project Highlights

1. **Connect to the Database:**
   - Use Python's built-in `sqlite3` module to connect to a SQLite database for efficient data manipulation.

2. **Load the Dataset:**
   - Import COVID-19 datasets from CSV files into the SQLite database for easy access.

3. **Data Preprocessing:**
   - Preprocess the `COVID_DEATHS` and `COVID_VACCINATIONS` datasets: convert dates to a sortable format, fill gaps in the vaccination columns, and create processed tables.

4. **Global COVID-19 Overview:**
   - Analyze global COVID-19 statistics, including total cases, deaths, vaccination efforts, and related percentages.
   - Visualize trends in new COVID-19 cases, deaths, and vaccinations worldwide over time.

5. **COVID-19 Impact Across Continents:**
   - Explore COVID-19 metrics for continents, examining infection rates, mortality, and vaccination progress.

6. **COVID-19 Impact Across Income Levels:**
   - Analyze COVID-19 impact and vaccination responses based on income levels, highlighting disparities.

7. **COVID-19 Impact Across Countries:**
   - Delve into COVID-19 metrics for individual countries, focusing on population, total cases, deaths, vaccinations, and trends.
   - Visualize regional patterns, vaccination progress, and challenges faced by countries.

8. **COVID-19 Trends in Vietnam:**
   - Focus specifically on Vietnam's COVID-19 situation, examining total cases, deaths, vaccinations, and trends over time.
   - Visualize the progression of the pandemic in Vietnam, highlighting key insights.

## Skills Utilized

- **Joins:** Connecting data from multiple sources for comprehensive analysis.

- **Common Table Expressions (CTEs):** Simplifying complex queries and calculations.

- **Temporary Tables:** Storing interim results for efficient data manipulation.

- **Window Functions:** Performing computations over specified subsets of data.

- **Aggregate Functions:** Summarizing data to reveal trends and patterns.

- **Creating Views:** Organizing data for future reference and visualization.

- **Converting Data Types:** Ensuring data compatibility and accuracy.


This project not only showcases SQL proficiency but also aims to uncover trends, disparities, and progress in the fight against COVID-19. Join us on this data-driven journey as we unravel the story behind the numbers.

## Overview of the Dataset

In this project, we focus on analyzing specific aspects of the extensive [COVID-19 dataset](https://ourworldindata.org/covid-deaths). This dataset is maintained by [Our World in Data](https://ourworldindata.org) and was updated regularly during the pandemic. It compiles a wide range of metrics from official sources.

Our analysis focuses on the following key variables:

- **Confirmed Cases:** Total and new confirmed cases, smoothed averages, and cases per million people.
- **Confirmed Deaths:** Total and new deaths attributed to COVID-19, smoothed averages, and deaths per million people.
- **Vaccinations:** Total doses administered, people vaccinated, fully vaccinated, and booster doses.

The dataset we're using contains 390,786 rows spanning from January 1, 2020, to April 18, 2024. It is divided into two CSV files:

- **CovidDeaths.csv**: Contains the data on confirmed cases and deaths.
- **CovidVaccinations.csv**: Contains the data on vaccinations.

For further insights, you can explore the complete dataset [here](https://github.com/owid/covid-19-data/tree/master/public/data).

# Connect to Database

We use SQLite in this project because it lets us run SQL directly in Google Colab without setting up a separate database server.

In [None]:
import sqlite3

# Connect to an SQLite database; use ':memory:' for an in-memory database
conn = sqlite3.connect('covid_data.db')

In [None]:
%%capture
# Install ipython-sql
!pip install ipython-sql

In [None]:
# Load the SQL extension
%load_ext sql

# Connect ipython-sql to the same SQLite database file (created above by sqlite3.connect)
%sql sqlite:///covid_data.db

# Load the Dataset

We will import the dataset from CSV files into the SQLite database we've created.
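The notebook loads the files with pandas' `DataFrame.to_sql`. To show what that step does underneath, here is a minimal standard-library sketch. It assumes a tiny inline CSV (`SAMPLE_CSV`, built from two rows of the sample output shown later in this notebook) and a hypothetical helper `load_csv`; unlike `to_sql`, it stores every column as `TEXT` instead of inferring types.

```python
import csv
import io
import sqlite3

# Two rows copied from the sample output later in this notebook (raw M/D/YYYY dates)
SAMPLE_CSV = """continent,location,date,population,total_cases
Asia,Afghanistan,1/1/2021,41128772,51848.0
Asia,Afghanistan,1/10/2021,41128772,53489.0
"""


def load_csv(conn: sqlite3.Connection, table: str, text: str) -> int:
    """Recreate `table` from the CSV header, insert every row, and return the row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Every column is declared TEXT here; pandas.to_sql infers column types instead
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Drop any existing table first, similar to to_sql(if_exists="replace")
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
print(load_csv(conn, "COVID_DEATHS", SAMPLE_CSV))  # 2
```

Dropping the table before loading means re-running the cell does not duplicate rows, which matters when the row counts are compared later.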

In [None]:
from google.colab import drive

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Datasets/CovidDeaths.csv")
# if_exists="replace" keeps the table from accumulating duplicate rows if this cell is re-run
df.to_sql("COVID_DEATHS", conn, if_exists="replace", chunksize=100, index=False, method="multi")

390786

In [None]:
df = pd.read_csv("/content/drive/MyDrive/Datasets/CovidVaccinations.csv")
# if_exists="replace" keeps the table from accumulating duplicate rows if this cell is re-run
df.to_sql("COVID_VACCINATIONS", conn, if_exists="replace", chunksize=100, index=False, method="multi")

390786

# Data Preprocessing

Let's start with the `COVID_DEATHS` dataset, which contains the data on confirmed cases and deaths.

In [None]:
%%sql
SELECT continent, location, date, population, total_cases, new_cases, total_deaths, new_deaths
FROM COVID_DEATHS
ORDER BY location, date
LIMIT 100 -- due to limited resources

 * sqlite:///covid_data.db
Done.


continent,location,date,population,total_cases,new_cases,total_deaths,new_deaths
Asia,Afghanistan,1/1/2021,41128772,51848.0,0.0,2158.0,0.0
Asia,Afghanistan,1/1/2022,41128772,157902.0,0.0,7352.0,0.0
Asia,Afghanistan,1/1/2023,41128772,207579.0,257.0,7849.0,4.0
Asia,Afghanistan,1/1/2024,41128772,230375.0,0.0,7973.0,0.0
Asia,Afghanistan,1/10/2020,41128772,,0.0,,0.0
Asia,Afghanistan,1/10/2021,41128772,53489.0,780.0,2277.0,56.0
Asia,Afghanistan,1/10/2022,41128772,158345.0,0.0,7369.0,0.0
Asia,Afghanistan,1/10/2023,41128772,207780.0,0.0,7851.0,0.0
Asia,Afghanistan,1/10/2024,41128772,230642.0,0.0,7973.0,0.0
Asia,Afghanistan,1/11/2020,41128772,,0.0,,0.0


We can observe that the dataset is updated weekly: new cases and new deaths are reported only on Sundays. Therefore, any event dates mentioned in this analysis are accurate only to the week.
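The Sunday pattern can be spot-checked with Python's `datetime`. This is a minimal sketch that only looks at two raw dates from the sample output above whose `new_cases` are non-zero; `weekday_name` is a hypothetical helper, and a full check would run over every reporting date.

```python
from datetime import datetime

# Two raw dates from the sample output above whose new_cases are non-zero
reported_dates = ["1/10/2021", "1/1/2023"]


def weekday_name(raw: str) -> str:
    """Parse an M/D/YYYY string and return its weekday name (English under the default C locale)."""
    # %m and %d also accept single-digit values such as "1"
    return datetime.strptime(raw, "%m/%d/%Y").strftime("%A")


for raw in reported_dates:
    print(raw, weekday_name(raw))
# 1/10/2021 Sunday
# 1/1/2023 Sunday
```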

Additionally, the date column has a problem: `ORDER BY location, date` in the previous query did not put the rows in chronological order (for example, `1/10/2020` comes after `1/1/2024`). Let's examine the table schema to identify the cause.

In [None]:
%%sql
PRAGMA table_info(COVID_DEATHS)

 * sqlite:///covid_data.db
Done.


cid,name,type,notnull,dflt_value,pk
0,iso_code,TEXT,0,,0
1,continent,TEXT,0,,0
2,location,TEXT,0,,0
3,date,TEXT,0,,0
4,population,INTEGER,0,,0
5,total_cases,REAL,0,,0
6,new_cases,REAL,0,,0
7,new_cases_smoothed,REAL,0,,0
8,total_deaths,REAL,0,,0
9,new_deaths,REAL,0,,0


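The `date` column is stored as `TEXT` in `M/D/YYYY` format, so it sorts character by character rather than chronologically. SQLite has no dedicated date storage class, so to fix this we rewrite each value as an ISO 8601 string (`YYYY-MM-DD`), which SQLite's date functions understand and which sorts in chronological order. We declare the new column as `DATE` for readability.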
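A small experiment makes the sorting problem concrete. This sketch uses a throwaway in-memory table (`sample` is a hypothetical name) holding the same dates in both formats. It also shows that a column declared `DATE` still stores ISO strings as text, because SQLite gives that declared type `NUMERIC` affinity and `2021-01-01` is not a valid number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# raw: M/D/YYYY text as in the CSV; iso: the same dates as YYYY-MM-DD
conn.execute("CREATE TABLE sample (raw TEXT, iso DATE)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("1/1/2021", "2021-01-01"), ("1/10/2020", "2020-01-10"), ("1/11/2020", "2020-01-11")],
)

# TEXT comparison is character by character: '/' sorts before '0', so 1/1/2021 comes first
by_raw = [row[0] for row in conn.execute("SELECT raw FROM sample ORDER BY raw")]
# Zero-padded ISO 8601 strings compare in chronological order
by_iso = [row[0] for row in conn.execute("SELECT iso FROM sample ORDER BY iso")]
# A column declared DATE still holds these values as text
stored_type = conn.execute("SELECT DISTINCT typeof(iso) FROM sample").fetchone()[0]

print(by_raw)       # ['1/1/2021', '1/10/2020', '1/11/2020']
print(by_iso)       # ['2020-01-10', '2020-01-11', '2021-01-01']
print(stored_type)  # text
```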

Next, we'll save the processed `date` column, along with the other columns we're interested in, into a new staging table called `COVID_DEATHS_PROCESSED` for later analysis.
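The query below splits each date on `/`, zero-pads the month and day with `CASE WHEN LENGTH(...)`, and reassembles them. A more compact variation, sketched here on two hypothetical date strings, lets SQLite's built-in `printf('%02d', ...)` do the zero-padding instead. This is an alternative under those assumptions, not the query the notebook runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_dates (date TEXT)")
conn.executemany("INSERT INTO raw_dates VALUES (?)", [("1/5/2020",), ("12/31/2023",)])

query = """
WITH parts AS (
    SELECT
        SUBSTR(date, 1, INSTR(date, '/') - 1) AS month,  -- text before the first '/'
        SUBSTR(date, INSTR(date, '/') + 1) AS rest       -- remaining 'D/YYYY'
    FROM raw_dates
)
SELECT DATE(
    SUBSTR(rest, INSTR(rest, '/') + 1)                               -- year
    || '-' || printf('%02d', month)                                  -- zero-padded month
    || '-' || printf('%02d', SUBSTR(rest, 1, INSTR(rest, '/') - 1))  -- zero-padded day
) AS iso_date
FROM parts
ORDER BY iso_date
"""
iso_dates = [row[0] for row in conn.execute(query)]
print(iso_dates)  # ['2020-01-05', '2023-12-31']
```

`DATE()` returns `NULL` for a string it cannot parse, so counting `NULL` dates after the conversion is a quick way to catch malformed inputs.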

In [None]:
%%sql
DROP TABLE IF EXISTS COVID_DEATHS_PROCESSED;

CREATE TABLE COVID_DEATHS_PROCESSED
(
    continent TEXT,
    location TEXT,
    date DATE,
    population REAL,
    total_cases REAL,
    new_cases REAL,
    total_deaths REAL,
    new_deaths REAL
);

INSERT INTO COVID_DEATHS_PROCESSED
-- Split each M/D/YYYY string into its month, day, and year parts
WITH date_split AS (
    SELECT
      continent,
      location,
      SUBSTR(date, 1, INSTR(date, '/') - 1) AS month,
      SUBSTR(SUBSTR(date, INSTR(date, '/') + 1), 1, INSTR(SUBSTR(date, INSTR(date, '/') + 1), '/') - 1) AS day,
      SUBSTR(SUBSTR(date, INSTR(date, '/') + 1), INSTR(SUBSTR(date, INSTR(date, '/') + 1), '/') + 1) AS year,
      population,
      total_cases,
      new_cases,
      total_deaths,
      new_deaths
    FROM COVID_DEATHS
),
-- Zero-pad single-digit months and days so every part has a fixed width
date_normalize AS (
    SELECT
      continent,
      location,
      year,
      CASE WHEN LENGTH(month) = 1 THEN '0' || month ELSE month END AS month,
      CASE WHEN LENGTH(day) = 1 THEN '0' || day ELSE day END AS day,
      population,
      total_cases,
      new_cases,
      total_deaths,
      new_deaths
    FROM date_split
)

-- Reassemble the parts as an ISO 8601 date string (YYYY-MM-DD)
SELECT
    continent,
    location,
    DATE(year || '-' || month || '-' || day) AS date,
    population,
    total_cases,
    new_cases,
    total_deaths,
    new_deaths
FROM date_normalize
ORDER BY location, date

 * sqlite:///covid_data.db
Done.
Done.
390786 rows affected.


[]

Let's review our new staging table.

In [None]:
%%sql
SELECT *
FROM COVID_DEATHS_PROCESSED
ORDER BY location, date
LIMIT 100 -- due to limited resources

 * sqlite:///covid_data.db
Done.


continent,location,date,population,total_cases,new_cases,total_deaths,new_deaths
Asia,Afghanistan,2020-01-05,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-06,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-07,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-08,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-09,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-10,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-11,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-12,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-13,41128772.0,,0.0,,0.0
Asia,Afghanistan,2020-01-14,41128772.0,,0.0,,0.0


In [None]:
%%sql
PRAGMA table_info(COVID_DEATHS_PROCESSED)

 * sqlite:///covid_data.db
Done.


cid,name,type,notnull,dflt_value,pk
0,continent,TEXT,0,,0
1,location,TEXT,0,,0
2,date,DATE,0,,0
3,population,REAL,0,,0
4,total_cases,REAL,0,,0
5,new_cases,REAL,0,,0
6,total_deaths,REAL,0,,0
7,new_deaths,REAL,0,,0


Everything looks good. Now, let's list the unique locations within each continent using the processed table.

In [None]:
%%sql
SELECT continent, GROUP_CONCAT(location, ',') AS location
FROM (SELECT DISTINCT continent, location FROM COVID_DEATHS_PROCESSED)
GROUP BY continent
ORDER BY 1

 * sqlite:///covid_data.db
Done.


continent,location
,"Africa,Asia,Europe,European Union,High income,Low income,Lower middle income,North America,Oceania,South America,Upper middle income,World"
Africa,"Algeria,Angola,Benin,Botswana,Burkina Faso,Burundi,Cameroon,Cape Verde,Central African Republic,Chad,Comoros,Congo,Cote d'Ivoire,Democratic Republic of Congo,Djibouti,Egypt,Equatorial Guinea,Eritrea,Eswatini,Ethiopia,Gabon,Gambia,Ghana,Guinea,Guinea-Bissau,Kenya,Lesotho,Liberia,Libya,Madagascar,Malawi,Mali,Mauritania,Mauritius,Mayotte,Morocco,Mozambique,Namibia,Niger,Nigeria,Reunion,Rwanda,Saint Helena,Sao Tome and Principe,Senegal,Seychelles,Sierra Leone,Somalia,South Africa,South Sudan,Sudan,Tanzania,Togo,Tunisia,Uganda,Western Sahara,Zambia,Zimbabwe"
Asia,"Afghanistan,Armenia,Azerbaijan,Bahrain,Bangladesh,Bhutan,Brunei,Cambodia,China,Georgia,Hong Kong,India,Indonesia,Iran,Iraq,Israel,Japan,Jordan,Kazakhstan,Kuwait,Kyrgyzstan,Laos,Lebanon,Macao,Malaysia,Maldives,Mongolia,Myanmar,Nepal,North Korea,Northern Cyprus,Oman,Pakistan,Palestine,Philippines,Qatar,Saudi Arabia,Singapore,South Korea,Sri Lanka,Syria,Taiwan,Tajikistan,Thailand,Timor,Turkey,Turkmenistan,United Arab Emirates,Uzbekistan,Vietnam,Yemen"
Europe,"Albania,Andorra,Austria,Belarus,Belgium,Bosnia and Herzegovina,Bulgaria,Croatia,Cyprus,Czechia,Denmark,England,Estonia,Faeroe Islands,Finland,France,Germany,Gibraltar,Greece,Guernsey,Hungary,Iceland,Ireland,Isle of Man,Italy,Jersey,Kosovo,Latvia,Liechtenstein,Lithuania,Luxembourg,Malta,Moldova,Monaco,Montenegro,Netherlands,North Macedonia,Northern Ireland,Norway,Poland,Portugal,Romania,Russia,San Marino,Scotland,Serbia,Slovakia,Slovenia,Spain,Sweden,Switzerland,Ukraine,United Kingdom,Vatican,Wales"
North America,"Anguilla,Antigua and Barbuda,Aruba,Bahamas,Barbados,Belize,Bermuda,Bonaire Sint Eustatius and Saba,British Virgin Islands,Canada,Cayman Islands,Costa Rica,Cuba,Curacao,Dominica,Dominican Republic,El Salvador,Greenland,Grenada,Guadeloupe,Guatemala,Haiti,Honduras,Jamaica,Martinique,Mexico,Montserrat,Nicaragua,Panama,Puerto Rico,Saint Barthelemy,Saint Kitts and Nevis,Saint Lucia,Saint Martin (French part),Saint Pierre and Miquelon,Saint Vincent and the Grenadines,Sint Maarten (Dutch part),Trinidad and Tobago,Turks and Caicos Islands,United States,United States Virgin Islands"
Oceania,"American Samoa,Australia,Cook Islands,Fiji,French Polynesia,Guam,Kiribati,Marshall Islands,Micronesia (country),Nauru,New Caledonia,New Zealand,Niue,Northern Mariana Islands,Palau,Papua New Guinea,Pitcairn,Samoa,Solomon Islands,Tokelau,Tonga,Tuvalu,Vanuatu,Wallis and Futuna"
South America,"Argentina,Bolivia,Brazil,Chile,Colombia,Ecuador,Falkland Islands,French Guiana,Guyana,Paraguay,Peru,Suriname,Uruguay,Venezuela"


Some rows have a null `continent`. In these rows, the `location` is an aggregate rather than a country: a continent (Africa, Asia, Europe, etc.), the European Union, an income group (High income, Upper middle income, Lower middle income, Low income), or the World.

For country-level analysis, we therefore filter out rows where `continent` is null, so that these aggregates are not counted alongside individual countries. When we need continent-level numbers, we use the rows where `continent` is null and `location` is the continent itself.
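The reason for the filter can be shown with toy data. In this sketch the numbers are hypothetical, and the `Asia` aggregate row is chosen to equal the sum of its two countries purely for illustration; the real aggregates come from Our World in Data. Summing every row double counts, while `WHERE continent IS NOT NULL` keeps only the countries. Note the `IS NOT NULL` test: `continent <> NULL` would match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (continent TEXT, location TEXT, total_cases REAL)")
# Hypothetical toy rows: the NULL-continent row aggregates the two countries
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [("Asia", "Vietnam", 6.0), ("Asia", "Japan", 4.0), (None, "Asia", 10.0)],
)

all_rows = conn.execute("SELECT SUM(total_cases) FROM cases").fetchone()[0]
countries_only = conn.execute(
    "SELECT SUM(total_cases) FROM cases WHERE continent IS NOT NULL"
).fetchone()[0]
aggregate_row = conn.execute(
    "SELECT total_cases FROM cases WHERE continent IS NULL AND location = 'Asia'"
).fetchone()[0]

print(all_rows, countries_only, aggregate_row)  # 20.0 10.0 10.0
```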

Let's redirect our focus to the second table, `COVID_VACCINATIONS`.

In [None]:
%%sql
SELECT
    continent,
    location,
    date,
    total_vaccinations,
    new_vaccinations,
    people_vaccinated,
    people_fully_vaccinated
FROM COVID_VACCINATIONS
ORDER BY location, date
LIMIT 100 -- due to limited resources

 * sqlite:///covid_data.db
Done.


continent,location,date,total_vaccinations,new_vaccinations,people_vaccinated,people_fully_vaccinated
Asia,Afghanistan,1/1/2021,,,,
Asia,Afghanistan,1/1/2022,,,,
Asia,Afghanistan,1/1/2023,,,,
Asia,Afghanistan,1/1/2024,,,,
Asia,Afghanistan,1/10/2020,,,,
Asia,Afghanistan,1/10/2021,,,,
Asia,Afghanistan,1/10/2022,,,,
Asia,Afghanistan,1/10/2023,,,,
Asia,Afghanistan,1/10/2024,,,,
Asia,Afghanistan,1/11/2020,,,,


The `date` column has the same format issue as in the `COVID_DEATHS` dataset. To address it, we'll preprocess the table the same way and save the result in a new staging table named `COVID_VACCINATIONS_PROCESSED`.

In [None]:
%%sql
DROP TABLE IF EXISTS COVID_VACCINATIONS_PROCESSED;

CREATE TABLE COVID_VACCINATIONS_PROCESSED
(
    continent TEXT,
    location TEXT,
    date DATE,
    total_vaccinations REAL,
    new_vaccinations REAL,
    people_vaccinated REAL,
    people_fully_vaccinated REAL
);

INSERT INTO COVID_VACCINATIONS_PROCESSED
-- Split each M/D/YYYY string into its month, day, and year parts
WITH date_split AS (
    SELECT
      continent,
      location,
      SUBSTR(date, 1, INSTR(date, '/') - 1) AS month,
      SUBSTR(SUBSTR(date, INSTR(date, '/') + 1), 1, INSTR(SUBSTR(date, INSTR(date, '/') + 1), '/') - 1) AS day,
      SUBSTR(SUBSTR(date, INSTR(date, '/') + 1), INSTR(SUBSTR(date, INSTR(date, '/') + 1), '/') + 1) AS year,
      total_vaccinations,
      new_vaccinations,
      people_vaccinated,
      people_fully_vaccinated
    FROM COVID_VACCINATIONS
),
-- Zero-pad single-digit months and days so every part has a fixed width
date_normalize AS (
    SELECT
      continent,
      location,
      year,
      CASE WHEN LENGTH(month) = 1 THEN '0' || month ELSE month END AS month,
      CASE WHEN LENGTH(day) = 1 THEN '0' || day ELSE day END AS day,
      total_vaccinations,
      new_vaccinations,
      people_vaccinated,
      people_fully_vaccinated
    FROM date_split
)

-- Reassemble the parts as an ISO 8601 date string (YYYY-MM-DD)
SELECT
    continent,
    location,
    DATE(year || '-' || month || '-' || day) AS date,
    total_vaccinations,
    new_vaccinations,
    people_vaccinated,
    people_fully_vaccinated
FROM date_normalize
ORDER BY location, date

 * sqlite:///covid_data.db
Done.
Done.
390786 rows affected.


[]

Let's take a look at our new table.

In [None]:
%%sql
SELECT *
FROM COVID_VACCINATIONS_PROCESSED
WHERE date > '2021-01-01'
ORDER BY location, date
LIMIT 100 -- due to limited resources

 * sqlite:///covid_data.db
Done.


continent,location,date,total_vaccinations,new_vaccinations,people_vaccinated,people_fully_vaccinated
Asia,Afghanistan,2021-01-02,,,,
Asia,Afghanistan,2021-01-03,,,,
Asia,Afghanistan,2021-01-04,,,,
Asia,Afghanistan,2021-01-05,,,,
Asia,Afghanistan,2021-01-06,,,,
Asia,Afghanistan,2021-01-07,,,,
Asia,Afghanistan,2021-01-08,,,,
Asia,Afghanistan,2021-01-09,,,,
Asia,Afghanistan,2021-01-10,,,,
Asia,Afghanistan,2021-01-11,,,,


We've identified another challenge: the `total_vaccinations` column has values in only a few rows, leaving most entries empty. These gaps would break up a line chart of total vaccinations on our dashboard. The `people_vaccinated` and `people_fully_vaccinated` columns have the same problem.

Additionally, the `new_vaccinations` column contains many null values, even on dates when `total_vaccinations` increases.

To tackle these issues, we'll follow these steps:

1. For every null value in the `total_vaccinations`, `people_vaccinated`, and `people_fully_vaccinated` columns, we carry forward the most recent earlier non-null value of that column for the same country (a forward fill). Rows before a country's first reported value stay null.
2. Next, we'll recalculate `new_vaccinations` as the difference between consecutive `total_vaccinations` values.
3. Finally, we'll save the processed data in a new staging table named `COVID_VACCINATIONS_PROCESSED_V2`.
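The forward fill can be tried on a toy series before running it on the full table. This sketch assumes a hypothetical table `vac` with one location and made-up totals, and a SQLite build with window functions (3.25 or newer, which recent Python releases bundle). It uses `COUNT(column)`, which equals the notebook's `SUM(CASE WHEN column IS NULL THEN 0 ELSE 1 END)`: each reported value starts a new group, and `FIRST_VALUE(... ORDER BY date)` spreads that value over the rest of its group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vac (location TEXT, date TEXT, total_vaccinations REAL)")
# Hypothetical sparse series: totals are reported only on some days
conn.executemany(
    "INSERT INTO vac VALUES (?, ?, ?)",
    [
        ("X", "2021-01-01", None),
        ("X", "2021-01-02", 100.0),
        ("X", "2021-01-03", None),
        ("X", "2021-01-04", None),
        ("X", "2021-01-05", 160.0),
    ],
)

query = """
WITH groups AS (
    SELECT
        location,
        date,
        total_vaccinations,
        -- running count of non-null values: each reported value opens a new group
        COUNT(total_vaccinations) OVER (PARTITION BY location ORDER BY date) AS grp
    FROM vac
),
filled AS (
    SELECT
        location,
        date,
        -- the earliest row of each group holds the reported value
        FIRST_VALUE(total_vaccinations) OVER (PARTITION BY location, grp ORDER BY date) AS total_vaccinations
    FROM groups
)
SELECT
    date,
    total_vaccinations,
    total_vaccinations - LAG(total_vaccinations) OVER (PARTITION BY location ORDER BY date) AS new_vaccinations
FROM filled
ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2021-01-01', None, None)
# ('2021-01-02', 100.0, None)
# ('2021-01-03', 100.0, 0.0)
# ('2021-01-04', 100.0, 0.0)
# ('2021-01-05', 160.0, 60.0)
```

The `ORDER BY date` inside `FIRST_VALUE` matters: without it, the row order within each group is unspecified and the function could return one of the null rows.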

In [None]:
%%sql
DROP TABLE IF EXISTS COVID_VACCINATIONS_PROCESSED_V2;

CREATE TABLE COVID_VACCINATIONS_PROCESSED_V2
(
    continent TEXT,
    location TEXT,
    date DATE,
    total_vaccinations REAL,
    new_vaccinations REAL,
    people_vaccinated REAL,
    people_fully_vaccinated REAL
);

INSERT INTO COVID_VACCINATIONS_PROCESSED_V2
WITH total_vaccinations_portion AS (
    SELECT
        continent,
        location,
        date,
        total_vaccinations,
        SUM(CASE WHEN total_vaccinations IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY continent, location ORDER BY date) AS total_vaccinations_partition,
        new_vaccinations,
        people_vaccinated,
        SUM(CASE WHEN people_vaccinated IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY continent, location ORDER BY date) AS people_vaccinated_partition,
        people_fully_vaccinated,
        SUM(CASE WHEN people_fully_vaccinated IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY continent, location ORDER BY date) AS people_fully_vaccinated_partition
    FROM COVID_VACCINATIONS_PROCESSED
),
new_total_vaccinations AS (
    SELECT
        continent,
        location,
        date,
        -- ORDER BY date makes FIRST_VALUE return each group's first (non-null) value rather than an arbitrary row
        FIRST_VALUE(total_vaccinations) OVER(PARTITION BY total_vaccinations_partition, continent, location ORDER BY date) AS total_vaccinations,
        new_vaccinations,
        FIRST_VALUE(people_vaccinated) OVER(PARTITION BY people_vaccinated_partition, continent, location ORDER BY date) AS people_vaccinated,
        FIRST_VALUE(people_fully_vaccinated) OVER(PARTITION BY people_fully_vaccinated_partition, continent, location ORDER BY date) AS people_fully_vaccinated
    FROM total_vaccinations_portion
)

SELECT
    continent,
    location,
    date,
    total_vaccinations,
    total_vaccinations - LAG(total_vaccinations) OVER(PARTITION BY continent, location ORDER BY date) AS new_vaccinations,
    people_vaccinated,
    people_fully_vaccinated
FROM new_total_vaccinations
ORDER BY location, date

 * sqlite:///covid_data.db
Done.
Done.
390786 rows affected.


[]

Let's once more review our processed table.

In [None]:
%%sql
SELECT *
FROM COVID_VACCINATIONS_PROCESSED_V2
WHERE date > '2021-01-01'
ORDER BY location, date
LIMIT 100 -- due to limited resources

 * sqlite:///covid_data.db
Done.


continent,location,date,total_vaccinations,new_vaccinations,people_vaccinated,people_fully_vaccinated
Asia,Afghanistan,2021-01-02,,,,
Asia,Afghanistan,2021-01-03,,,,
Asia,Afghanistan,2021-01-04,,,,
Asia,Afghanistan,2021-01-05,,,,
Asia,Afghanistan,2021-01-06,,,,
Asia,Afghanistan,2021-01-07,,,,
Asia,Afghanistan,2021-01-08,,,,
Asia,Afghanistan,2021-01-09,,,,
Asia,Afghanistan,2021-01-10,,,,
Asia,Afghanistan,2021-01-11,,,,


Finally, let's create a view that combines the processed `COVID_DEATHS` and processed `COVID_VACCINATIONS` tables to facilitate easier analysis of the data in both tables later. Let's name it `COVID_COMBINE_VIEW`.
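The join has to match rows whose `continent` is null (the aggregate locations), and plain `=` cannot do that because `NULL = NULL` is not true in SQL. This sketch uses two hypothetical toy tables, `d` and `v`, to compare plain `=`, the `IFNULL(..., '')` approach used in the view, and SQLite's `IS` operator, which treats two nulls as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (continent TEXT, location TEXT, new_cases REAL)")
conn.execute("CREATE TABLE v (continent TEXT, location TEXT, new_vaccinations REAL)")
# Toy rows: 'World' has a NULL continent in both tables
conn.executemany("INSERT INTO d VALUES (?, ?, ?)", [("Asia", "Vietnam", 1.0), (None, "World", 2.0)])
conn.executemany("INSERT INTO v VALUES (?, ?, ?)", [("Asia", "Vietnam", 3.0), (None, "World", 4.0)])


def count_matches(condition: str) -> int:
    """Count joined rows when continents are compared with `condition`."""
    sql = f"SELECT COUNT(*) FROM d JOIN v ON {condition} AND d.location = v.location"
    return conn.execute(sql).fetchone()[0]


plain = count_matches("d.continent = v.continent")                         # World is dropped
with_ifnull = count_matches("IFNULL(d.continent, '') = IFNULL(v.continent, '')")
with_is = count_matches("d.continent IS v.continent")                      # NULL IS NULL is true
print(plain, with_ifnull, with_is)  # 1 2 2
```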

In [None]:
%%sql
DROP VIEW IF EXISTS COVID_COMBINE_VIEW;

CREATE VIEW COVID_COMBINE_VIEW
AS
SELECT
    dae.*,
    vac.total_vaccinations,
    vac.new_vaccinations,
    vac.people_vaccinated,
    vac.people_fully_vaccinated
FROM COVID_DEATHS_PROCESSED dae
JOIN COVID_VACCINATIONS_PROCESSED_V2 vac
    ON IFNULL(dae.continent, '') = IFNULL(vac.continent, '')
    AND dae.location = vac.location
    AND dae.date = vac.date
ORDER BY 1, 2, 3

 * sqlite:///covid_data.db
Done.
Done.


[]

We can now query the view directly.

In [None]:
%%sql
SELECT *
FROM COVID_COMBINE_VIEW
ORDER BY location, date
LIMIT 100 -- due to limited resources

 * sqlite:///covid_data.db
Done.


continent,location,date,population,total_cases,new_cases,total_deaths,new_deaths,total_vaccinations,new_vaccinations,people_vaccinated,people_fully_vaccinated
Asia,Afghanistan,2020-01-05,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-06,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-07,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-08,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-09,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-10,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-11,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-12,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-13,41128772.0,,0.0,,0.0,,,,
Asia,Afghanistan,2020-01-14,41128772.0,,0.0,,0.0,,,,


Let's verify that the view has the same number of rows as the original tables.
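Matching row counts are a useful first check, but an inner join could, in principle, lose some rows and gain duplicates elsewhere while the totals still agree. A stricter check, sketched here on two hypothetical toy tables (`deaths`, `vaccinations`), lists rows that found no partner with a `LEFT JOIN ... WHERE ... IS NULL` anti-join and looks for duplicate join keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deaths (location TEXT, date TEXT)")
conn.execute("CREATE TABLE vaccinations (location TEXT, date TEXT)")
# Hypothetical toy rows: the second dates do not line up between the two tables
conn.executemany("INSERT INTO deaths VALUES (?, ?)", [("Vietnam", "2021-01-03"), ("Vietnam", "2021-01-10")])
conn.executemany("INSERT INTO vaccinations VALUES (?, ?)", [("Vietnam", "2021-01-03"), ("Vietnam", "2021-01-11")])

# Anti-join: deaths rows with no vaccinations row for the same location and date
unmatched = conn.execute(
    """
    SELECT d.location, d.date
    FROM deaths d
    LEFT JOIN vaccinations v
        ON d.location = v.location AND d.date = v.date
    WHERE v.location IS NULL
    """
).fetchall()

# Duplicate keys would let the inner join produce extra rows
duplicates = conn.execute(
    """
    SELECT location, date, COUNT(*)
    FROM vaccinations
    GROUP BY location, date
    HAVING COUNT(*) > 1
    """
).fetchall()

print(unmatched)   # [('Vietnam', '2021-01-10')]
print(duplicates)  # []
```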

In [None]:
%%sql
SELECT 'COVID_COMBINE_VIEW' AS TABLE_NAME, COUNT(*) AS ROW_COUNT
FROM COVID_COMBINE_VIEW

UNION ALL
SELECT 'COVID_DEATHS' AS TABLE_NAME, COUNT(*) AS ROW_COUNT
FROM COVID_DEATHS

UNION ALL
SELECT 'COVID_VACCINATIONS' AS TABLE_NAME, COUNT(*) AS ROW_COUNT
FROM COVID_VACCINATIONS

 * sqlite:///covid_data.db
Done.


TABLE_NAME,ROW_COUNT
COVID_COMBINE_VIEW,390786
COVID_DEATHS,390786
COVID_VACCINATIONS,390786


Everything looks good. Let's delve into our analysis.

# Data Exploration

## Global COVID-19 Overview

Let's explore global COVID-19 statistics.

In [None]:
%%sql
-- Let's look at the global numbers
SELECT
  MAX(population) AS global_population,
  MAX(total_cases) AS global_total_cases,
  MAX(total_deaths) AS global_total_deaths,
  MAX(total_cases) / MAX(population) AS global_covid_percentage,
  MAX(total_deaths) / MAX(total_cases) AS global_death_percentage,
  MAX(total_vaccinations) AS global_total_vaccinations,
  MAX(people_vaccinated) AS global_people_vaccinated,
  MAX(people_fully_vaccinated) AS global_people_fully_vaccinated,
  MAX(people_vaccinated) / MAX(population) AS global_vaccinated_percentage,
  MAX(people_fully_vaccinated) / MAX(population) AS global_fully_vaccinated_percentage,
  MAX(total_vaccinations) / MAX(people_vaccinated) AS global_avg_vaccination
FROM COVID_COMBINE_VIEW
WHERE location == 'World'

 * sqlite:///covid_data.db
Done.


global_population,global_total_cases,global_total_deaths,global_covid_percentage,global_death_percentage,global_total_vaccinations,global_people_vaccinated,global_people_fully_vaccinated,global_vaccinated_percentage,global_fully_vaccinated_percentage,global_avg_vaccination
7975105024.0,775251765.0,7043660.0,0.0972089725046861,0.0090856420043106,13570830469.0,5630428544.0,5176873972.0,0.7060005513477235,0.6491292536488106,2.41026599715249


**Global COVID-19 Overview**

**Overview:**
- **Global Population**: 7.98 billion
- **Global Total Cases**: 775.25 million
- **Global Total Deaths**: 7.04 million
- **Global COVID Percentage** (confirmed cases as a share of the population): 9.72%
- **Global Death Percentage** (deaths as a share of confirmed cases): 0.91%

**Vaccination Data:**
- **Global Total Vaccinations**: 13.57 billion doses
- **Global People Vaccinated**: 5.63 billion (70.6% of the global population)
- **Global People Fully Vaccinated**: 5.18 billion (64.9% of the global population)
- **Global Vaccinated Percentage**: 70.6%
- **Global Fully Vaccinated Percentage**: 64.9%
- **Global Average Vaccination**: 2.41 doses per person who received at least one dose
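The percentages above are simple ratios of the query's output columns, and the arithmetic can be checked in a few lines. The values are copied from the output above. `doses_per_capita` is an extra ratio added here only for contrast: `global_avg_vaccination` divides total doses by people with at least one dose, not by the whole population.

```python
# Values copied from the global query output above
population = 7_975_105_024
total_cases = 775_251_765
total_deaths = 7_043_660
total_vaccinations = 13_570_830_469
people_vaccinated = 5_630_428_544
people_fully_vaccinated = 5_176_873_972

metrics = {
    # confirmed cases as a share of the population, in percent
    "cases_per_population_pct": round(100 * total_cases / population, 2),
    # deaths per confirmed case (case fatality rate), in percent
    "case_fatality_rate_pct": round(100 * total_deaths / total_cases, 2),
    "vaccinated_pct": round(100 * people_vaccinated / population, 1),
    "fully_vaccinated_pct": round(100 * people_fully_vaccinated / population, 1),
    # doses per person with at least one dose (what global_avg_vaccination measures)
    "doses_per_vaccinated_person": round(total_vaccinations / people_vaccinated, 2),
    # doses per member of the whole population, for contrast
    "doses_per_capita": round(total_vaccinations / population, 2),
}
for name, value in metrics.items():
    print(f"{name}: {value}")
```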

**Insights:**
- The global COVID-19 pandemic has impacted a significant portion of the population, with 775.25 million reported cases and 7.04 million deaths.
- Confirmed cases equal about 9.72% of the global population, and deaths equal 0.91% of confirmed cases. Because these are reported cases rather than unique infections, the first figure is not a true infection rate.
- The vaccination efforts have been substantial, with 13.57 billion doses administered globally.
- 5.63 billion individuals have received at least one dose of the COVID-19 vaccine, representing 70.6% of the global population.
- 5.18 billion people are fully vaccinated, accounting for 64.9% of the global population.
- On average, each person who received at least one dose has received 2.41 doses of a COVID-19 vaccine.

These insights provide a comprehensive view of the global COVID-19 situation, showcasing the scale of infections, deaths, and vaccination efforts worldwide.

Let's explore the trend of new COVID-19 cases, deaths, and vaccinations worldwide over time.

In [None]:
%%sql
SELECT
  location,
  date,
  new_cases AS global_new_cases,
  new_deaths AS global_new_deaths,
  new_vaccinations AS global_new_vaccinations
FROM COVID_COMBINE_VIEW
WHERE location == 'World'
  AND NOT (new_cases == 0 AND new_deaths == 0)
ORDER BY date

 * sqlite:///covid_data.db
Done.


location,date,global_new_cases,global_new_deaths,global_new_vaccinations
World,2020-01-05,2.0,3.0,
World,2020-01-12,45.0,1.0,
World,2020-01-19,90.0,2.0,
World,2020-01-26,1896.0,56.0,
World,2020-02-02,12538.0,310.0,
World,2020-02-09,23059.0,545.0,
World,2020-02-16,31734.0,864.0,
World,2020-02-23,9578.0,692.0,
World,2020-03-01,8272.0,519.0,
World,2020-03-08,20207.0,650.0,
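In this query, `new_vaccinations` is the difference between consecutive dated rows of the processed vaccinations table, which are daily, while cases and deaths are weekly. One way to put vaccinations on a weekly footing is a 7-row rolling sum kept only on Sundays. This is a minimal sketch on a hypothetical table `daily` (10 doses per day for two weeks starting Monday, March 1, 2021); it assumes exactly one row per day with no gaps, since `ROWS` counts rows, not calendar days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (date TEXT, new_vaccinations REAL)")
# Hypothetical daily changes: 10 doses per day from 2021-03-01 to 2021-03-14
rows = [(f"2021-03-{day:02d}", 10.0) for day in range(1, 15)]
conn.executemany("INSERT INTO daily VALUES (?, ?)", rows)

weekly = conn.execute(
    """
    SELECT date, weekly_new_vaccinations
    FROM (
        SELECT
            date,
            -- the current row plus the six rows before it (seven daily rows)
            SUM(new_vaccinations) OVER (
                ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
            ) AS weekly_new_vaccinations
        FROM daily
    )
    -- keep only Sundays: strftime('%w') returns '0' for Sunday
    WHERE strftime('%w', date) = '0'
    ORDER BY date
    """
).fetchall()
print(weekly)  # [('2021-03-07', 70.0), ('2021-03-14', 70.0)]
```

The Sunday filter sits in the outer query on purpose: a `WHERE` in the inner query would remove the weekday rows before the window sum could see them.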


**Global COVID-19 Trends**

**Overview:**
- **New Cases and Deaths:**
  - The data covers the period from January 5, 2020, to March 31, 2024.
  - The highest recorded weekly new cases occurred on January 24, 2021, with 52.9 million cases.
  - The peak in weekly new deaths was on January 31, 2021, with 1.1 million deaths.
  - Notable peaks in new cases and deaths align with the Northern Hemisphere winter months of each year, reflecting seasonal patterns.
  
**Vaccination Progress:**
- **Global New Vaccinations:**
  - The dataset shows a substantial increase in new vaccinations starting from December 2020.
  - January 24, 2021, recorded the highest weekly new vaccinations with over 216.8 million doses administered.
  - Since then, there has been a consistent effort to vaccinate, with fluctuations in weekly new vaccinations.
  - Vaccination efforts seem to have peaked in mid-2021 and have remained relatively steady since, with occasional spikes.

**Key Insights:**
- **Trends in Cases and Deaths:**
  - The data illustrates the global impact of COVID-19, with peaks in new cases and deaths during the winter months of each year.
  - Notable spikes in new cases and deaths coincide with waves of the pandemic, highlighting its cyclical nature.
  - While cases and deaths have decreased significantly from their peaks, the virus continues to present challenges globally.

- **Vaccination Efforts:**
  - The data showcases a significant escalation in vaccination efforts, especially from late 2020 onwards.
  - Peaks in new vaccinations align with key milestones in vaccine availability and distribution.
  - Despite progress, there are fluctuations in new vaccination rates, possibly due to supply chain challenges and varying global vaccination strategies.

- **Seasonal Patterns:**
  - Seasonal trends are evident, with higher case and death numbers during colder months, likely due to increased indoor gatherings and reduced ventilation.
  - This pattern underscores the importance of continued vigilance and adaptive public health measures to mitigate the impact of COVID-19.

- **Future Considerations:**
  - As vaccination rates stabilize and new variants emerge, monitoring trends in cases, deaths, and vaccination rates remains crucial.
  - Ongoing data analysis will be essential to inform public health strategies, resource allocation, and global response efforts.
  
This data analysis provides valuable insights into the global trajectory of COVID-19, highlighting trends in new cases, deaths, and vaccination efforts over the past few years. Understanding these patterns can aid in devising effective strategies to combat the ongoing pandemic and prepare for future challenges.

## COVID-19 Impact Across Continents

Let's analyze COVID-19 statistics for individual continents.

In [None]:
%%sql
SELECT
  location,
  MAX(population) AS population,
  MAX(total_cases) AS total_cases,
  MAX(total_deaths) AS total_deaths,
  MAX(total_cases) / MAX(population) AS covid_percentage,
  MAX(total_deaths) / MAX(total_cases) AS death_percentage,
  MAX(total_vaccinations) AS total_vaccinations,
  MAX(people_vaccinated) AS people_vaccinated,
  MAX(people_fully_vaccinated) AS people_fully_vaccinated,
  MAX(people_vaccinated) / MAX(population) AS vaccinated_percentage,
  MAX(people_fully_vaccinated) / MAX(population) AS fully_vaccinated_percentage,
  MAX(total_vaccinations) / MAX(people_vaccinated) AS avg_vaccination
FROM COVID_COMBINE_VIEW
WHERE continent IS NULL
  AND location NOT IN ('World', 'High income', 'Upper middle income', 'Lower middle income', 'Low income')
GROUP BY location
ORDER BY location

 * sqlite:///covid_data.db
Done.


location,population,total_cases,total_deaths,covid_percentage,death_percentage,total_vaccinations,people_vaccinated,people_fully_vaccinated,vaccinated_percentage,fully_vaccinated_percentage,avg_vaccination
Africa,1426736614.0,13140491.0,259095.0,0.0092101729717016,0.0197172997569116,863138096.0,554998291.0,462392217.0,0.3889984216806537,0.324090804471357,1.5552085654980872
Asia,4721383370.0,301392304.0,1636933.0,0.0638355923213242,0.0054312368905079,9102086165.0,3689174081.0,3461828590.0,0.7813756672337329,0.733223362457008,2.46724225128806
Europe,744807803.0,252450811.0,2099805.0,0.3389475915573887,0.0083176797558396,1395261693.0,523273038.0,493163695.0,0.7025611652997142,0.6621355106828815,2.666412353926785
European Union,450146793.0,185619587.0,1260979.0,0.4123534586638719,0.0067933509624714,951109336.0,338071424.0,327967424.0,0.7510248418008834,0.7285788305060745,2.81333845004303
North America,600323657.0,124541232.0,1661466.0,0.2074568119177086,0.0133406902542926,1158127259.0,458563506.0,394482952.0,0.7638604620240711,0.6571171190743196,2.5255547897873933
Oceania,45038860.0,14894138.0,32334.0,0.3306952707062301,0.0021709212040334,87655293.0,28960501.0,28072902.0,0.6430114128110702,0.6233040090268714,3.026718805727843
South America,436816679.0,68826532.0,1354009.0,0.1575638827655663,0.0196727767716089,964561963.0,375459127.0,336933616.0,0.8595347775170462,0.771338715296629,2.5690198842868988


**COVID-19 Impact Across Continents**

**Overview:**
- This analysis focuses on COVID-19 trends across continents, covering the period of available data.
- The dataset includes information on population, total cases, total deaths, vaccination statistics, and percentages.

**Continental Comparison:**
- **Europe and European Union:**
  - **Total Cases and Deaths:**
    - Europe has reported 252.5 million cases with 2.1 million deaths.
    - The European Union (EU) has recorded 185.6 million cases with 1.26 million deaths.
    - Europe and the EU have high total cases and deaths, reflecting the severity of the pandemic in these regions.
  - **Vaccination Rates:**
    - Both Europe and the EU have significant vaccination coverage.
    - Europe has vaccinated 70.3% of its population, while the EU has vaccinated 75.1%.
    - The EU also has a higher percentage of fully vaccinated individuals, at 72.9%, compared to Europe's 66.2%.
    - People with at least one dose received 2.67 doses on average in Europe and 2.81 in the EU.

- **North America:**
  - **Total Cases and Deaths:**
    - North America reports 124.5 million cases with 1.66 million deaths.
    - About 20.7% of the population has been recorded as infected, and 1.33% of reported cases resulted in death.
  - **Vaccination Rates:**
    - North America boasts high vaccination rates with 76.4% of the population vaccinated.
    - The percentage of fully vaccinated individuals stands at 65.7%.
    - Each vaccinated person received about 2.5 doses on average.

- **Asia:**
  - **Total Cases and Deaths:**
    - Asia has the highest total cases globally, at 301.4 million, with 1.64 million deaths.
    - Its death percentage (deaths divided by cases) is relatively low at 0.54%, and only about 6.4% of its large population has been recorded as infected.
  - **Vaccination Rates:**
    - Asia has vaccinated 78.1% of its population.
    - The region also has a high percentage of fully vaccinated individuals at 73.3%.
    - Each vaccinated person received about 2.5 doses on average.

- **South America:**
  - **Total Cases and Deaths:**
    - South America reports 68.8 million cases with 1.35 million deaths.
    - About 15.8% of the population has been recorded as infected, and its death percentage (1.97%) is among the highest of the continents.
  - **Vaccination Rates:**
    - South America has the highest vaccination rate of any continent, at 86.0%.
    - The percentage of fully vaccinated individuals stands at 77.1%.
    - Each vaccinated person received about 2.6 doses on average.

- **Africa and Oceania:**
  - **Total Cases and Deaths:**
    - Africa reports 13.1 million cases with 259,095 deaths, while Oceania reports 14.9 million cases with 32,334 deaths.
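    - Africa's reported cases are low relative to its population. Oceania's counts are small in absolute terms, but about 33.1% of its population has been recorded as infected; its death percentage (0.22%) is the lowest of any continent.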
  - **Vaccination Rates:**
    - Africa has vaccinated 38.9% of its population, with 32.4% fully vaccinated.
    - Oceania boasts a higher percentage of vaccinated individuals at 64.3%, with 62.3% fully vaccinated.
    - On average, each vaccinated person in Oceania received about 3.0 doses, compared to 1.6 in Africa.

**Key Insights:**
- **Vaccination Progress:**
  - South America leads in first-dose coverage (86.0%), followed by Asia (78.1%), North America (76.4%), the EU (75.1%), and Europe (70.3%).
  - Oceania is somewhat lower at 64.3%, although its fully vaccinated share (62.3%) is close to its first-dose share.
  - Africa has lower vaccination rates, indicating a need for increased vaccine access and distribution.

- **Regional Impact:**
  - Europe and the EU have recorded high cumulative case and death counts, with about 33.9% and 41.2% of their populations recorded as infected.
  - North America, Asia, and South America have faced substantial impacts but have managed to vaccinate a significant portion of their populations.
  - Oceania has the lowest case and death counts, largely reflecting its small population, and the lowest death percentage (0.22%), possibly aided by strict containment measures and effective vaccination campaigns.

- **Continued Challenges:**
  - While vaccination efforts are commendable, disparities in access and distribution persist, especially in Africa.
  - Monitoring vaccination rates and ensuring equitable access to vaccines remain critical for global recovery.
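The vaccination ordering in the insights above can be checked against the query output. This sketch loads the six rows shown above (five continents plus the EU; Africa's row is outside this excerpt, so it is omitted) into an in-memory SQLite table and ranks them by the share of the population with at least one dose. The `continents` table and the `rank_by_vaccination` helper are hypothetical; in the notebook the data lives in `COVID_COMBINE_VIEW`.

```python
import sqlite3

# Rows copied from the continent query output above (Africa omitted).
# Columns: location, population, total_cases, total_deaths, people_vaccinated.
ROWS = [
    ("Asia", 4721383370.0, 301392304.0, 1636933.0, 3689174081.0),
    ("Europe", 744807803.0, 252450811.0, 2099805.0, 523273038.0),
    ("European Union", 450146793.0, 185619587.0, 1260979.0, 338071424.0),
    ("North America", 600323657.0, 124541232.0, 1661466.0, 458563506.0),
    ("Oceania", 45038860.0, 14894138.0, 32334.0, 28960501.0),
    ("South America", 436816679.0, 68826532.0, 1354009.0, 375459127.0),
]


def rank_by_vaccination(rows):
    """Return (location, vaccinated share, death share) sorted by vaccinated share."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE continents (location TEXT, population REAL, total_cases REAL, "
        "total_deaths REAL, people_vaccinated REAL)"
    )
    con.executemany("INSERT INTO continents VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT location,
               people_vaccinated / population AS vaccinated_percentage,
               total_deaths / total_cases AS death_percentage
        FROM continents
        ORDER BY vaccinated_percentage DESC
        """
    ).fetchall()
    con.close()
    return result


for location, vaccinated, death in rank_by_vaccination(ROWS):
    # The *_percentage values are fractions; the % format spec multiplies by 100.
    print(f"{location:<15} vaccinated {vaccinated:6.1%}  deaths/cases {death:6.2%}")
```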

**Conclusion:**

This analysis provides a comprehensive overview of COVID-19 trends across continents, highlighting vaccination progress, case numbers, and death rates. While some regions have made significant strides in vaccination, others face challenges in achieving widespread coverage. Continued monitoring and international collaboration are essential to overcome the pandemic's global impact and ensure equitable access to vaccines.

How about examining the COVID-19 impact across different income groups?

In [None]:
%%sql
SELECT
  location,
  MAX(population) AS population,
  MAX(total_cases) AS total_cases,
  MAX(total_deaths) AS total_deaths,
  MAX(total_cases) / MAX(population) AS covid_percentage,
  MAX(total_deaths) / MAX(total_cases) AS death_percentage,
  MAX(total_vaccinations) AS total_vaccinations,
  MAX(people_vaccinated) AS people_vaccinated,
  MAX(people_fully_vaccinated) AS people_fully_vaccinated,
  MAX(people_vaccinated) / MAX(population) AS vaccinated_percentage,
  MAX(people_fully_vaccinated) / MAX(population) AS fully_vaccinated_percentage,
  MAX(total_vaccinations) / MAX(people_vaccinated) AS avg_vaccination
FROM COVID_COMBINE_VIEW
WHERE continent IS NULL
  AND location IN ('High income', 'Upper middle income', 'Lower middle income', 'Low income')
GROUP BY location
ORDER BY location

 * sqlite:///covid_data.db
Done.


location,population,total_cases,total_deaths,covid_percentage,death_percentage,total_vaccinations,people_vaccinated,people_fully_vaccinated,vaccinated_percentage,fully_vaccinated_percentage,avg_vaccination
High income,1250514600.0,428541299.0,2983098.0,0.3426919597740002,0.0069610513781543,2839741192.0,998721519.0,929255508.0,0.7986484276153193,0.7430984876146188,2.843376394696468
Low income,737604900.0,2328082.0,48045.0,0.003156272416303,0.0206371596876742,333087284.0,241363753.0,204688408.0,0.3272263416362879,0.2775041326325245,1.3800219786937105
Lower middle income,3432097300.0,97535147.0,1341389.0,0.0284185261880541,0.0137528782316799,4948362891.0,2281337753.0,2052426043.0,0.6647066075312026,0.598009282254323,2.1690619394225226
Upper middle income,2525921300.0,245634014.0,2667162.0,0.097245315600292,0.0108582763297594,5449551989.0,2108967410.0,1990468664.0,0.8349299758468326,0.7880168966467799,2.583990612258916


**COVID-19 Impact Across Income Levels**

**Overview:**
- This analysis explores COVID-19 trends based on the income level of countries, categorized into high income, low income, lower middle income, and upper middle income.
- The dataset includes information on population, total cases, total deaths, vaccination statistics, and percentages.
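The income groups are not countries: in this dataset they appear as aggregate rows whose `continent` field is empty, which is why the query above filters on `continent IS NULL` and then restricts `location` to the four income labels. A minimal sketch with a toy in-memory table (the rows and figures are hypothetical, not taken from the dataset) shows how the two filters interact:

```python
import sqlite3

INCOME_GROUPS = ("High income", "Upper middle income", "Lower middle income", "Low income")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE covid (continent TEXT, location TEXT, total_cases REAL)")
# Hypothetical rows: one country row and three aggregate rows (continent IS NULL).
con.executemany(
    "INSERT INTO covid VALUES (?, ?, ?)",
    [
        ("Asia", "Vietnam", 100.0),
        (None, "Asia", 500.0),
        (None, "High income", 300.0),
        (None, "Low income", 50.0),
    ],
)

# One "?" placeholder per income label, so the labels are bound as parameters.
placeholders = ", ".join("?" for _ in INCOME_GROUPS)
rows = con.execute(
    f"""
    SELECT location, MAX(total_cases) AS total_cases
    FROM covid
    WHERE continent IS NULL            -- keep aggregate rows only
      AND location IN ({placeholders}) -- and of those, only the income groups
    GROUP BY location
    ORDER BY location
    """,
    INCOME_GROUPS,
).fetchall()
con.close()
print(rows)  # → [('High income', 300.0), ('Low income', 50.0)]
```

Writing `continent = NULL` instead of `continent IS NULL` would match no rows, because any comparison with NULL evaluates to NULL rather than true.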

**Income Level Comparison:**
- **High Income Countries:**
  - **Total Cases and Deaths:**
    - High-income countries report 428.5 million cases with 2.98 million deaths.
    - About 34.3% of their population has been recorded as infected, the highest share of any income group, while 0.70% of reported cases resulted in death.
  - **Vaccination Rates:**
    - High-income countries have vaccinated 79.9% of their population.
    - The percentage of fully vaccinated individuals stands at 74.3%.
    - Each vaccinated person received about 2.8 doses on average.

- **Low Income Countries:**
  - **Total Cases and Deaths:**
    - Low-income countries report 2.33 million cases with 48,045 deaths.
    - Only about 0.32% of their population has been recorded as infected, but 2.06% of reported cases resulted in death, the highest share of any income group.
  - **Vaccination Rates:**
    - Low-income countries have vaccinated 32.7% of their population.
    - The percentage of fully vaccinated individuals stands at 27.8%.
    - Each vaccinated person received about 1.4 doses on average.

- **Lower Middle Income Countries:**
  - **Total Cases and Deaths:**
    - Lower middle-income countries report 97.5 million cases with 1.34 million deaths.
    - About 2.84% of their population has been recorded as infected, and 1.38% of reported cases resulted in death.
  - **Vaccination Rates:**
    - Lower middle-income countries have vaccinated 66.5% of their population.
    - The percentage of fully vaccinated individuals stands at 59.8%.
    - Each vaccinated person received about 2.2 doses on average.

- **Upper Middle Income Countries:**
  - **Total Cases and Deaths:**
    - Upper middle-income countries report 245.6 million cases with 2.67 million deaths.
    - About 9.72% of their population has been recorded as infected, and 1.09% of reported cases resulted in death.
  - **Vaccination Rates:**
    - Upper middle-income countries have vaccinated 83.5% of their population.
    - The percentage of fully vaccinated individuals stands at 78.8%.
    - Each vaccinated person received about 2.6 doses on average.
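The case and death shares quoted above can be recomputed from the four output rows. This sketch, using values copied from the income query output, ranks the groups by death percentage (deaths divided by cases) and prints their case rates alongside, making the inverse pattern visible: low-income countries report the fewest cases per capita but the highest share of deaths among reported cases. The `INCOME` dictionary and `rates` helper are illustrative names, not part of the notebook.

```python
# Income-group totals copied from the query output above.
INCOME = {
    "High income": {"population": 1250514600.0, "cases": 428541299.0, "deaths": 2983098.0},
    "Low income": {"population": 737604900.0, "cases": 2328082.0, "deaths": 48045.0},
    "Lower middle income": {"population": 3432097300.0, "cases": 97535147.0, "deaths": 1341389.0},
    "Upper middle income": {"population": 2525921300.0, "cases": 245634014.0, "deaths": 2667162.0},
}


def rates(group):
    """Return (cases per population, deaths per case) as fractions."""
    g = INCOME[group]
    return g["cases"] / g["population"], g["deaths"] / g["cases"]


# Highest death percentage first.
by_death_rate = sorted(INCOME, key=lambda name: rates(name)[1], reverse=True)
for name in by_death_rate:
    case_rate, death_rate = rates(name)
    print(f"{name:<20} cases/pop {case_rate:7.2%}  deaths/cases {death_rate:6.2%}")
```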

**Key Insights:**
- **Vaccination Progress:**
  - High and upper middle-income countries demonstrate high vaccination rates, with significant portions of their populations vaccinated.
  - Lower middle-income countries show moderate progress in vaccination coverage.
  - Low-income countries lag behind, indicating challenges in vaccine access and distribution.

- **Impact of Income Level:**
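  - High and upper middle-income countries have reported the most cases and deaths. Population size alone does not explain this, since lower middle-income countries have the largest population (3.43 billion); differences in testing and reporting capacity may be one factor.
  - Lower middle-income countries show moderate impact, reflecting their intermediate position in terms of resources and healthcare infrastructure.
  - Low-income countries report the fewest cases and deaths, possibly because of limited testing and underreporting, as well as factors such as lower population density and less international travel. Their high death percentage (2.06%) may also indicate that many milder cases went undetected.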

- **Continued Challenges:**
  - Low-income countries face significant challenges in accessing vaccines, as seen in their lower vaccination rates.
  - Ensuring equitable access to vaccines remains crucial to global recovery efforts.

**Conclusion:**

This analysis provides insights into COVID-19 trends based on the income level of countries. High and upper middle-income countries have made significant progress in vaccination, with relatively high coverage rates. Lower middle-income countries show moderate progress, while low-income countries face challenges in vaccine access. Continued efforts to improve vaccine distribution and equitable access are essential for all income levels to mitigate the impact of the pandemic and achieve global recovery.

## COVID-19 Impact Across Countries

Let's delve into the latest COVID-19 metrics for each country, focusing on population, total cases, total deaths, total vaccinations, and derived metrics.

### COVID-19 Insights

In [None]:
%%sql
SELECT
  location,
  MAX(population) AS population,
  MAX(total_cases) AS total_cases,
  MAX(total_deaths) AS total_deaths,
  MAX(total_cases) / MAX(population) AS covid_percentage,
  MAX(total_deaths) / MAX(total_cases) AS death_percentage
FROM COVID_COMBINE_VIEW
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY location

 * sqlite:///covid_data.db
Done.


location,population,total_cases,total_deaths,covid_percentage,death_percentage
Afghanistan,41128772.0,232948.0,7985.0,0.0056638695655683,0.0342780362999467
Albania,2842318.0,334863.0,3605.0,0.117813348119387,0.0107655966768499
Algeria,44903228.0,272017.0,6881.0,0.0060578495603924,0.025296213104328
American Samoa,44295.0,8359.0,34.0,0.1887120442487865,0.0040674721856681
Andorra,79843.0,48015.0,159.0,0.601367684080007,0.0033114651671352
Angola,35588996.0,107357.0,1937.0,0.003016578495218,0.0180426055124491
Anguilla,15877.0,3904.0,12.0,0.2458902815393336,0.0030737704918032
Antigua and Barbuda,93772.0,9106.0,146.0,0.0971078786844687,0.0160333845815945
Argentina,45510324.0,10130118.0,130845.0,0.2225894502530898,0.012916433944797
Armenia,2780472.0,451831.0,8777.0,0.1625015465000187,0.0194254046313776


**COVID-19 Impact Across Countries**

**Countries with Highest COVID-19 Cases**
1. **United States**: Leading with roughly 103 million cases.
2. **India**: Second with around 45 million cases.
3. **Brazil**: Third highest with about 38 million cases.

**Countries with Highest Death Counts**
1. **United States**: Highest death toll with roughly 1.1 million deaths.
2. **Brazil**: Second highest with approximately 702,000 deaths.
3. **India**: Third highest with about 533,000 deaths.

**Countries with Highest COVID-19 Percentages (Cases/Population)**
1. **Gibraltar**: Highest COVID-19 percentage at 77.07%, indicating a significant impact on its population.
2. **Andorra**: Second highest at 60.14%.
3. **Montenegro**: Third highest with 40.07%.

**Countries with Highest Death Percentages (Deaths/Cases)**
1. **Yemen**: Highest death percentage at 37.35%, indicating a high mortality rate among reported cases.
2. **Mexico**: Second highest at 34.45%.
3. **Syria**: Third highest with 29.16%.
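Rankings like these can be produced directly in SQL by ordering on the derived ratio and limiting the result. The sketch below runs such a ranking against only the ten country rows displayed in the output above, so its results illustrate the technique and are not the dataset-wide top three. The `countries` table and `top_n` helper are hypothetical; against `COVID_COMBINE_VIEW`, the same `ORDER BY ... DESC LIMIT 3` could be appended to the grouped query.

```python
import sqlite3

# The ten country rows displayed in the output above (not the full dataset).
# Columns: location, population, total_cases, total_deaths.
ROWS = [
    ("Afghanistan", 41128772.0, 232948.0, 7985.0),
    ("Albania", 2842318.0, 334863.0, 3605.0),
    ("Algeria", 44903228.0, 272017.0, 6881.0),
    ("American Samoa", 44295.0, 8359.0, 34.0),
    ("Andorra", 79843.0, 48015.0, 159.0),
    ("Angola", 35588996.0, 107357.0, 1937.0),
    ("Anguilla", 15877.0, 3904.0, 12.0),
    ("Antigua and Barbuda", 93772.0, 9106.0, 146.0),
    ("Argentina", 45510324.0, 10130118.0, 130845.0),
    ("Armenia", 2780472.0, 451831.0, 8777.0),
]

# Only these fixed expressions are ever interpolated into the SQL text.
EXPRESSIONS = {
    "covid_percentage": "total_cases / population",
    "death_percentage": "total_deaths / total_cases",
}


def top_n(rows, metric, n=3):
    """Return the n (location, value) pairs with the highest value of `metric`."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE countries (location TEXT, population REAL, "
        "total_cases REAL, total_deaths REAL)"
    )
    con.executemany("INSERT INTO countries VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        f"SELECT location, {EXPRESSIONS[metric]} AS metric_value "
        "FROM countries ORDER BY metric_value DESC LIMIT ?",
        (n,),
    ).fetchall()
    con.close()
    return result


print(top_n(ROWS, "covid_percentage"))
print(top_n(ROWS, "death_percentage"))
```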

**Geographic Patterns**
- **Europe**: Many European countries like France, Belgium, Italy, and Spain have relatively high COVID-19 percentages, indicating widespread infections.
- **Latin America**: Countries like Brazil, Mexico, and Peru have high death percentages, suggesting challenges in managing severe cases.
- **Asia**: Countries like India and Indonesia, with large populations, show significant case numbers but relatively lower death percentages compared to the US and Brazil.

**Conclusion**
- The dataset highlights the global impact of COVID-19, with some regions experiencing higher infection rates and mortality.
- Vaccination rates and healthcare infrastructure likely contribute to the variation in death percentages.
- Ongoing monitoring and effective public health measures remain crucial to managing the pandemic.

This analysis provides a snapshot of the COVID-19 situation globally, highlighting the varied impacts on different countries and regions.

### Vaccination Insights

In [None]:
%%sql
SELECT
  location,
  MAX(population) AS population,
  MAX(total_vaccinations) AS total_vaccinations,
  MAX(people_vaccinated) AS people_vaccinated,
  MAX(people_fully_vaccinated) AS people_fully_vaccinated,
  MAX(people_vaccinated) / MAX(population) AS vaccinated_percentage,
  MAX(people_fully_vaccinated) / MAX(population) AS fully_vaccinated_percentage,
  MAX(total_vaccinations) / MAX(people_vaccinated) AS avg_vaccination
FROM COVID_COMBINE_VIEW
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY location

 * sqlite:///covid_data.db
Done.


location,population,total_vaccinations,people_vaccinated,people_fully_vaccinated,vaccinated_percentage,fully_vaccinated_percentage,avg_vaccination
Afghanistan,41128772.0,22606931.0,18896999.0,18115861.0,0.4594593536612277,0.4404668585777372,1.196323871319462
Albania,2842318.0,3088966.0,1349255.0,1279333.0,0.4747023380212911,0.4501019942173958,2.2893863650681303
Algeria,44903228.0,15267442.0,7840131.0,6481186.0,0.1746006100051426,0.1443367501329748,1.9473452675726972
American Samoa,44295.0,,,,,,
Andorra,79843.0,157072.0,57913.0,53501.0,0.7253359718447453,0.6700775271470261,2.7122062403950755
Angola,35588996.0,27722924.0,16522932.0,9591203.0,0.4642708099998101,0.269499117086641,1.677845312200038
Anguilla,15877.0,24604.0,10854.0,10380.0,0.6836304087673993,0.6537759022485357,2.2668140777593515
Antigua and Barbuda,93772.0,136512.0,64290.0,62384.0,0.6855991127415433,0.6652732158853389,2.1233784414372376
Argentina,45510324.0,116978521.0,41529058.0,34900613.0,0.912519497773736,0.7668724353621389,2.816787248099873
Armenia,2780472.0,2256919.0,1150915.0,1030758.0,0.4139279230288958,0.3707133177388587,1.960978004457323


**COVID-19 Vaccinations Across Countries**

1. **Vaccination Status by Country:**
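   - By definition, the share with at least one dose is at least as large as the fully vaccinated share; the gap reflects people who started but did not complete the primary series.
   - Countries with high vaccination percentages: Gibraltar (129.07%), Macao (97.77%), Hong Kong (92.40%), Iceland (83.07%). Values above 100%, as in Gibraltar, can occur when doses given to non-residents (such as cross-border workers) are counted against the resident population.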
   - Countries with low vaccination percentages: Burundi (0.28%), Haiti (0.05%), Guinea (0.63%), Democratic Republic of Congo (0.16%).

2. **Doses per Vaccinated Person:**
   - Averaged across all locations, `avg_vaccination` (total doses divided by people with at least one dose) is approximately 2.25.
   - Countries with high values: Gibraltar (3.15), England (3.29), Belgium (3.40), Chile (3.70).
   - Countries with low values: Burundi (1.13), Mali (1.36), Central African Republic (1.29), Guinea-Bissau (1.23).

3. **Population vs. Vaccination:**
   - Larger populations generally have higher total vaccination numbers.
   - Some larger countries have lower vaccination percentages, possibly reflecting the logistical challenge of vaccinating a large population.
   - Smaller countries like Gibraltar and Macao have achieved very high vaccination rates, likely due to their smaller populations and efficient vaccination programs.

4. **Fully Vaccinated Percentage:**
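   - Averaged across countries, the fully vaccinated percentage is 39.69%, meaning the typical country has about 40% of its population fully vaccinated; this per-country average is not the same as the fully vaccinated share of the combined population.
   - Fully vaccinated percentages range from under 1% in some countries to over 90% in Gibraltar and Macao. Countries with missing data have NULL values rather than 0%.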

5. **Interpreting `avg_vaccination`:**
   - Because the metric divides total doses by people with at least one dose, it measures doses per vaccinated person, not doses per capita or the pace of vaccination.
   - Values above 2 suggest booster doses on top of a typical two-dose primary series.

6. **Countries with Missing Data:**
   - Some countries have missing data for vaccination metrics, such as American Samoa, Eritrea, French Guiana, Guam, Marshall Islands, Mayotte, Micronesia, Montserrat, Nauru, Northern Cyprus, Palau, Papua New Guinea, Puerto Rico, Solomon Islands, South Sudan, Syria, Timor, Tonga, Tuvalu, Vanuatu, Vatican City, Venezuela, Wallis and Futuna, Western Sahara, and Yemen.

7. **Geographical Disparities:**
   - Vaccination rates vary widely by region, likely influenced by factors such as access to vaccines, distribution infrastructure, and public health campaigns.
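Two details matter when summarizing these columns across countries: locations with missing data are stored as NULL rather than 0, and a plain average of per-country percentages weights Anguilla the same as Argentina. The sketch below uses only the ten rows displayed in the vaccination output above; it lists the NULL locations and compares the unweighted mean of the fully vaccinated share with a population-weighted one. The notebook does not show how its 39.69% average was computed, so this comparison is illustrative only, and the `vacc` table name is hypothetical.

```python
import sqlite3

# (location, population, people_fully_vaccinated) from the rows displayed above;
# American Samoa's vaccination fields are empty (NULL) in the output.
ROWS = [
    ("Afghanistan", 41128772.0, 18115861.0),
    ("Albania", 2842318.0, 1279333.0),
    ("Algeria", 44903228.0, 6481186.0),
    ("American Samoa", 44295.0, None),
    ("Andorra", 79843.0, 53501.0),
    ("Angola", 35588996.0, 9591203.0),
    ("Anguilla", 15877.0, 10380.0),
    ("Antigua and Barbuda", 93772.0, 62384.0),
    ("Argentina", 45510324.0, 34900613.0),
    ("Armenia", 2780472.0, 1030758.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vacc (location TEXT, population REAL, people_fully_vaccinated REAL)")
con.executemany("INSERT INTO vacc VALUES (?, ?, ?)", ROWS)

missing = [row[0] for row in con.execute(
    "SELECT location FROM vacc WHERE people_fully_vaccinated IS NULL"
)]

# AVG and SUM skip NULLs, so locations without data are excluded, not counted as 0%.
# The CASE keeps their population out of the weighted denominator as well.
unweighted, weighted = con.execute(
    """
    SELECT AVG(people_fully_vaccinated / population),
           SUM(people_fully_vaccinated)
             / SUM(CASE WHEN people_fully_vaccinated IS NOT NULL THEN population END)
    FROM vacc
    """
).fetchone()
con.close()

print(missing, f"unweighted {unweighted:.1%}", f"weighted {weighted:.1%}")
```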

**Conclusion:**
- The data provides a snapshot of global COVID-19 vaccination efforts.
- While some countries have achieved high vaccination rates, others lag due to various challenges.
- Ongoing efforts to improve vaccination access and education are crucial for achieving global immunity.

This analysis provides a broad overview, and deeper insights can be gained by analyzing specific regions or comparing countries with similar population sizes or geographical locations.

## COVID-19 Trends in Vietnam

After analyzing the global, continental, and individual country numbers, let's shift our focus to the cases and deaths specifically in Vietnam. Here, we delve into key metrics including total cases, deaths, vaccinations, and percentages related to the COVID-19 situation in Vietnam.

### COVID-19 Insights

In [None]:
%%sql
SELECT
  location,
  date,
  population,
  total_cases,
  total_deaths,
  total_cases / population AS covid_percentage,
  total_deaths / total_cases AS death_percentage
FROM COVID_COMBINE_VIEW
WHERE continent IS NOT NULL
  AND location == 'Vietnam'
  AND NOT (new_cases == 0 AND new_deaths == 0)
ORDER BY location, date

 * sqlite:///covid_data.db
Done.


location,date,population,total_cases,total_deaths,covid_percentage,death_percentage
Vietnam,2020-01-26,98186856.0,2.0,,2.036932519766189e-08,
Vietnam,2020-02-02,98186856.0,6.0,,6.11079755929857e-08,
Vietnam,2020-02-09,98186856.0,13.0,,1.324006137848023e-07,
Vietnam,2020-02-16,98186856.0,16.0,,1.6295460158129518e-07,
Vietnam,2020-03-08,98186856.0,20.0,,2.0369325197661894e-07,
Vietnam,2020-03-15,98186856.0,53.0,,5.397871177380402e-07,
Vietnam,2020-03-22,98186856.0,94.0,,9.57358284290109e-07,
Vietnam,2020-03-29,98186856.0,174.0,,1.7721312921965849e-06,
Vietnam,2020-04-05,98186856.0,240.0,,2.444319023719428e-06,
Vietnam,2020-04-12,98186856.0,258.0,,2.627642950498384e-06,
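The Vietnam queries keep a row only when `NOT (new_cases == 0 AND new_deaths == 0)` holds (`==` and `=` are equivalent in SQLite). Under SQL's three-valued logic, that condition also drops weeks where one column is NULL and the other is 0, as well as weeks where both are NULL. A small sketch with hypothetical rows (the `weekly` table, dates, and values are made up) shows which combinations survive:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weekly (date TEXT, new_cases REAL, new_deaths REAL)")
# Hypothetical weekly rows covering each NULL/zero combination.
con.executemany(
    "INSERT INTO weekly VALUES (?, ?, ?)",
    [
        ("w1", 2, 0),        # NOT (FALSE AND TRUE) = TRUE  -> kept
        ("w2", 0, 0),        # NOT (TRUE AND TRUE)  = FALSE -> dropped
        ("w3", None, 0),     # NOT (NULL AND TRUE)  = NULL  -> dropped
        ("w4", 3, None),     # NOT (FALSE AND NULL) = TRUE  -> kept
        ("w5", None, None),  # NOT (NULL AND NULL)  = NULL  -> dropped
    ],
)
kept = [row[0] for row in con.execute(
    "SELECT date FROM weekly WHERE NOT (new_cases = 0 AND new_deaths = 0) ORDER BY date"
)]
con.close()
print(kept)  # → ['w1', 'w4']
```

A WHERE clause keeps a row only when its condition is true, so a NULL result is treated like false.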


**COVID-19 Trends in Vietnam**

1. **COVID-19 Cases and Deaths**
   - Vietnam's COVID-19 cases rose over time in several distinct waves. The first cases were recorded in January 2020 (only 2 cases), and the cumulative total reached 11,527,253 by March 2023.
   - Deaths also increased. Vietnam reported its first COVID-19 death in August 2020, and cumulative deaths reached 43,186 by March 2023.

2. **COVID-19 Percentage Trends**
   - The COVID-19 percentage (cumulative cases divided by population) indicates how far the virus spread. It remained below 0.01% throughout 2020 and reached about 11.74% by March 2023.
   - Because it is based on cumulative cases, this percentage can only rise or stay flat; it grew fastest during the surges described below.

3. **Death Percentage**
   - The death percentage (deaths divided by cases) remained relatively low throughout the analyzed period. By March 2023 it stood at about 0.37% (0.0037 as a fraction), consistent with Vietnam's efforts in managing COVID-19 fatalities.
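The SQL columns named `covid_percentage` and `death_percentage` hold fractions, not percentages, so a value of 0.0037 means 0.37%. Recomputing the March 2023 end points quoted above, with the population taken from the query output, makes the conversion explicit:

```python
# Cumulative figures for Vietnam quoted in the text (March 2023);
# population taken from the query output.
population = 98186856.0
total_cases = 11527253.0
total_deaths = 43186.0

covid_percentage = total_cases / population    # a fraction, as in the SQL column
death_percentage = total_deaths / total_cases  # also a fraction

# The % format spec multiplies by 100 for display.
print(f"covid_percentage = {covid_percentage:.4f} ({covid_percentage:.2%})")
print(f"death_percentage = {death_percentage:.4f} ({death_percentage:.2%})")
```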

**Detailed Analysis**

**COVID-19 Cases Over Time**
- Vietnam saw an initial, slow rise in cases from January 2020 to August 2020.
- A more significant surge occurred from August 2021 to November 2021, where cases rose from 157,507 to 1,094,514.
- The most substantial increase was observed from November 2021 to March 2023, when the cumulative total reached 11,527,253 cases.

**COVID-19 Deaths Over Time**
- Deaths followed a similar pattern, with the first reported death in August 2020.
- From August 2020 to August 2021, deaths remained relatively low, ranging from 3 to 370.
- A notable increase in deaths occurred from August 2021 to March 2023, reaching a total of 43,186 deaths.

**COVID-19 Percentage and Death Percentage Trends**
- The COVID-19 percentage, which represents cumulative cases relative to the population, increased over time.
- The death percentage remained relatively low and fell as case counts surged in 2022, ending at about 0.37% by March 2023.

**Conclusion**

Vietnam has faced significant challenges due to the COVID-19 pandemic, with cases and deaths rising in several waves. The data highlights periods of surges and the country's efforts to manage the impact of the virus. Despite the challenges, Vietnam's death percentage has remained comparatively low, consistent with effective public health measures.

### Vaccination Insights

In [None]:
%%sql
SELECT
  location,
  date,
  population,
  total_vaccinations,
  people_vaccinated,
  people_fully_vaccinated,
  people_vaccinated / population AS vaccinated_percentage,
  people_fully_vaccinated / population AS fully_vaccinated_percentage,
  total_vaccinations / people_vaccinated AS avg_vaccination
FROM COVID_COMBINE_VIEW
WHERE continent IS NOT NULL
  AND location == 'Vietnam'
  AND NOT (new_cases == 0 AND new_deaths == 0)
ORDER BY location, date

 * sqlite:///covid_data.db
Done.


location,date,population,total_vaccinations,people_vaccinated,people_fully_vaccinated,vaccinated_percentage,fully_vaccinated_percentage,avg_vaccination
Vietnam,2020-01-26,98186856.0,,,,,,
Vietnam,2020-02-02,98186856.0,,,,,,
Vietnam,2020-02-09,98186856.0,,,,,,
Vietnam,2020-02-16,98186856.0,,,,,,
Vietnam,2020-03-08,98186856.0,,,,,,
Vietnam,2020-03-15,98186856.0,,,,,,
Vietnam,2020-03-22,98186856.0,,,,,,
Vietnam,2020-03-29,98186856.0,,,,,,
Vietnam,2020-04-05,98186856.0,,,,,,
Vietnam,2020-04-12,98186856.0,,,,,,


**COVID-19 Vaccinations in Vietnam**

**Vaccination Progress:**
1. **Total Vaccinations:**
   - The total number of COVID-19 vaccine doses administered in Vietnam steadily increased over time.
   - It started from zero and reached 265,668,329 by January 22, 2023.

2. **People Vaccinated:**
   - The count of individuals who received at least one vaccine dose also rose steadily.
   - It started from zero and reached 90,156,999 by January 22, 2023.

3. **People Fully Vaccinated:**
   - The number of individuals fully vaccinated (received all required doses) increased over time.
   - It started from zero and reached 85,848,363 by January 22, 2023.

**Vaccination Rates:**

4. **Vaccinated Percentage:**
   - The percentage of the population that received at least one dose increased over time.
   - It started from 0% and reached approximately 91.82% by January 22, 2023.

5. **Fully Vaccinated Percentage:**
   - The percentage of the population fully vaccinated also increased steadily.
   - It started from 0% and reached around 87.43% by January 22, 2023.

**Doses per Vaccinated Person:**

6. **Average Doses per Vaccinated Person:**
   - The `avg_vaccination` column is total vaccinations divided by the number of people with at least one dose; it is not a per-day rate.
   - By January 22, 2023, it reached about 2.95 doses per vaccinated person (265,668,329 / 90,156,999), likely reflecting booster doses on top of the primary series.
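The percentages and the average in items 4 to 6 follow directly from the January 22, 2023 totals quoted above, with the population taken from the query output. Recomputing them shows that the at-least-one-dose share is above the fully vaccinated share, and that `avg_vaccination` counts doses per vaccinated person:

```python
# Vietnam totals as of January 22, 2023, as quoted above; population from the query output.
population = 98186856.0
total_vaccinations = 265668329.0
people_vaccinated = 90156999.0        # received at least one dose
people_fully_vaccinated = 85848363.0  # completed the primary series

vaccinated_share = people_vaccinated / population
fully_vaccinated_share = people_fully_vaccinated / population
# Same formula as avg_vaccination in the query: doses per vaccinated person, not per day.
doses_per_vaccinated_person = total_vaccinations / people_vaccinated

print(f"at least one dose: {vaccinated_share:.2%}")
print(f"fully vaccinated:  {fully_vaccinated_share:.2%}")
print(f"doses per vaccinated person: {doses_per_vaccinated_person:.2f}")
```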

**Trends:**

- **Early Stages (2020-2021):**
  - No vaccinations are recorded in 2020, and very few doses were administered in early 2021.
  - Significant vaccination efforts began around March 2021, leading to a noticeable increase in total vaccinations.

- **Rapid Expansion (Mid-2021):**
  - The vaccination campaign rapidly expanded from May to September 2021, with a sharp rise in total and fully vaccinated individuals.
  - Vaccination rates steadily increased during this period.

- **Steady Progress (Late 2021 - Early 2023):**
  - From late 2021 to early 2023, Vietnam maintained a steady increase in vaccination rates and percentages.
  - The campaign focused on achieving high rates of full vaccination.

**Observations:**

- Vietnam experienced a substantial increase in vaccination rates and percentages starting from mid-2021.
- The data suggests a successful vaccination campaign, with a large proportion of the population receiving both doses.
- By January 22, 2023, approximately 87.43% of the population had been fully vaccinated, indicating significant progress in combating COVID-19.

**Limitations:**

- The dataset does not provide information on vaccine types, specific demographics, or vaccine distribution regions.
- The `avg_vaccination` metric (total doses per person with at least one dose) does not distinguish primary-series doses from boosters and says nothing about the pace of vaccination.

This analysis provides an overview of Vietnam's COVID-19 vaccination progress based on the provided dataset. For a more detailed analysis or specific insights, additional data and parameters would be necessary.

# Conclusion

In conclusion, this project has provided a comprehensive analysis of COVID-19 data using SQL querying techniques. Here are the key takeaways:

- **Global Overview:** We explored the global impact of COVID-19, highlighting total cases, deaths, vaccination efforts, and related percentages. The data revealed significant progress in vaccination campaigns alongside evolving case and death trends over time.

- **Continental and Income-Level Analysis:** Our analysis across continents and income levels showcased disparities in COVID-19 impact and vaccination rates. Higher income countries generally exhibited higher vaccination rates, while low-income countries had the lowest vaccination coverage and reported the fewest cases per capita but the highest death percentage.

- **Country-Specific Insights:** Delving into individual countries' data provided a nuanced view of the pandemic. We observed varied trends in total cases, deaths, and vaccination rates, reflecting regional differences in healthcare infrastructure and response strategies.

- **Vietnam's COVID-19 Trends:** Focusing on Vietnam, we analyzed its total cases, deaths, vaccinations, and trends over time. The data revealed a notable increase in cases in 2022, following the rapid expansion of vaccination from mid-2021. The decreasing death percentage is consistent with positive impacts from these efforts.

## Key Learnings

- **Data Analysis Skills:** Utilizing SQL and data visualization tools, we gained insights into complex datasets.
- **Regional Disparities:** Highlighted the importance of equitable access to vaccines and healthcare resources across different regions and income levels.
- **Vaccination and Outcomes:** Observed that groups with higher vaccination coverage tended to have lower death percentages, although this descriptive analysis cannot by itself establish the effect of vaccination.



## Future Considerations


- **Continued Monitoring:** Ongoing monitoring of COVID-19 data is crucial for adapting strategies and interventions.
- **Equitable Distribution:** Ensuring equitable distribution of vaccines to all regions and income levels remains a critical challenge.
- **Healthcare Infrastructure:** Investing in healthcare infrastructure and response capabilities is essential for future pandemic preparedness.

This project underscores the power of data analysis in understanding and combating global health crises. By examining COVID-19 data through multiple lenses, we gain valuable insights that can inform policy decisions, healthcare strategies, and international collaborations in the ongoing fight against the pandemic.