# Data Visualisation of COVID-19 in the UK using SQLite, Pandas and Seaborn Libraries
## Part 1: Creating the Database

This is the first notebook in a three-part series exploring the UK Government's COVID-19 dashboard data, which is publicly available for download via a REST API from gov.uk.  We use a SQLite database to query this data with SQL and import aggregations into Pandas data frames.  We then use the Seaborn library to visualise the results.

In this notebook (Part 1), we create the set of empty tables required for storing the data.  Part 2 looks at loading the data using the GOV.UK REST API.  Finally, Part 3 queries and visualises the data.

The data used in this notebook is publicly available; more information can be found here:
https://coronavirus.data.gov.uk/details/about-data

### Configuration and Setup

There is not much to do here apart from importing the sqlite3 library and creating a connection to a database file.  It doesn't matter if this database file doesn't yet exist, as connecting will create it.  The gc, os, gzip and shutil modules used for cleanup are imported later, in the cells that need them.

In [5]:
import sqlite3

sqlite_db_path = "c19.db"
conn = sqlite3.connect(sqlite_db_path)
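
To illustrate the point about the database file not needing to exist beforehand, here is a minimal sketch run against a throwaway temporary directory rather than `c19.db`. The names `demo_dir`, `demo_path`, `demo_conn` and the table `t` are hypothetical and used only for this illustration.

```python
import os
import shutil
import sqlite3
import tempfile

# Hypothetical scratch location, used only for this illustration
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.db")

# Nothing has been created at this path yet
existed_before = os.path.exists(demo_path)

# Connecting to a missing file is fine; writing to it creates the database
demo_conn = sqlite3.connect(demo_path)
demo_conn.execute("CREATE TABLE t (x INTEGER)")
demo_conn.commit()
demo_conn.close()

existed_after = os.path.exists(demo_path)
print(existed_before, existed_after)

# Remove the scratch directory again
shutil.rmtree(demo_dir)
```

Passing `":memory:"` instead of a file path gives a temporary in-memory database, which can be handy when experimenting with the table definitions below.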

### c19dashboard_uk__national_daily_metrics

This table stores data downloaded with areaType = nation.  For more details see: https://coronavirus.data.gov.uk/details/developers-guide

In [6]:
# replace any existing table using DROP TABLE
sql = """
DROP TABLE IF EXISTS c19dashboard_uk__national_daily_metrics;

CREATE TABLE c19dashboard_uk__national_daily_metrics (
    area_type                               TEXT        NOT NULL,
    area_name                               TEXT        NOT NULL,
    area_code                               TEXT        NOT NULL,
    date                                    DATE        NOT NULL,
    new_cases_by_publish_date               NUMERIC     NULL,
    cum_cases_by_publish_date               NUMERIC     NULL,
    cum_cases_by_publish_date_rate          NUMERIC     NULL,
    new_cases_by_specimen_date              NUMERIC     NULL,
    cum_cases_by_specimen_date              NUMERIC     NULL,
    cum_cases_by_specimen_date_rate         NUMERIC     NULL,
    male_cases                              TEXT        NULL,
    female_cases                            TEXT        NULL,    
    new_pillar_one_tests_by_publish_date    NUMERIC     NULL,
    cum_pillar_one_tests_by_publish_date    NUMERIC     NULL,
    new_pillar_two_tests_by_publish_date    NUMERIC     NULL,
    cum_pillar_two_tests_by_publish_date    NUMERIC     NULL,
    new_pillar_three_tests_by_publish_date  NUMERIC     NULL,
    cum_pillar_three_tests_by_publish_date  NUMERIC     NULL,
    new_admissions                          NUMERIC     NULL,
    cum_admissions                          NUMERIC     NULL,
    cum_admissions_by_age                   TEXT        NULL,
    cum_tests_by_publish_date               NUMERIC     NULL,
    new_tests_by_publish_date               NUMERIC     NULL,
    covid_occupied_mv_beds                  NUMERIC     NULL,
    hospital_cases                          NUMERIC     NULL,
    new_deaths_28_days_by_publish_date      NUMERIC     NULL,
    cum_deaths_28_days_by_publish_date      NUMERIC     NULL,
    cum_deaths_28_days_by_publish_date_rate NUMERIC     NULL,
    new_deaths_28_days_by_death_date        NUMERIC     NULL,
    cum_deaths_28_days_by_death_date        NUMERIC     NULL,
    cum_deaths_28_days_by_death_date_rate   NUMERIC     NULL
);
"""
_ = conn.executescript(sql)
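
After running a DROP/CREATE script like the one above, it can be useful to confirm that the table has the expected columns. The sketch below does this with `PRAGMA table_info` on an in-memory database; the table `demo_metrics` is a hypothetical, cut-down stand-in for the real table, not part of the notebook's schema.

```python
import sqlite3

# In-memory database so the real c19.db is not touched
demo_conn = sqlite3.connect(":memory:")

# Hypothetical three-column table following the same DROP/CREATE pattern
demo_conn.executescript("""
DROP TABLE IF EXISTS demo_metrics;

CREATE TABLE demo_metrics (
    area_name                  TEXT     NOT NULL,
    date                       DATE     NOT NULL,
    new_cases_by_publish_date  NUMERIC  NULL
);
""")

# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2], row[3])
    for row in demo_conn.execute("PRAGMA table_info(demo_metrics)")
]
demo_conn.close()

for name, declared_type, not_null in columns:
    print(name, declared_type, "NOT NULL" if not_null else "NULL")
```

The same query can be pointed at `c19dashboard_uk__national_daily_metrics` on the notebook's own connection while it is still open.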

### c19dashboard_uk__summary_daily_metrics

This table stores the UK-wide data downloaded with areaType = overview.  For more details see: https://coronavirus.data.gov.uk/details/developers-guide

In [7]:
# replace any existing table using DROP TABLE
sql = """
DROP TABLE IF EXISTS c19dashboard_uk__summary_daily_metrics;

CREATE TABLE c19dashboard_uk__summary_daily_metrics (
    area_type                               TEXT              NOT NULL, 
    area_name                               TEXT              NOT NULL, 
    area_code                               TEXT              NOT NULL, 
    date                                    DATE              NOT NULL,
    new_cases_by_publish_date               NUMERIC           NULL, 
    cum_cases_by_publish_date               NUMERIC           NULL, 
    cum_cases_by_publish_date_rate          NUMERIC           NULL,
    new_cases_by_specimen_date              NUMERIC           NULL, 
    cum_cases_by_specimen_date              NUMERIC           NULL, 
    cum_cases_by_specimen_date_rate         NUMERIC           NULL,
    new_pillar_one_tests_by_publish_date    NUMERIC           NULL, 
    cum_pillar_one_tests_by_publish_date    NUMERIC           NULL, 
    new_pillar_two_tests_by_publish_date    NUMERIC           NULL, 
    cum_pillar_two_tests_by_publish_date    NUMERIC           NULL, 
    new_pillar_three_tests_by_publish_date  NUMERIC           NULL, 
    cum_pillar_three_tests_by_publish_date  NUMERIC           NULL, 
    new_pillar_four_tests_by_publish_date   NUMERIC           NULL, 
    cum_pillar_four_tests_by_publish_date   NUMERIC           NULL,     
    new_admissions                          NUMERIC           NULL, 
    cum_admissions                          NUMERIC           NULL, 
    cum_tests_by_publish_date               NUMERIC           NULL, 
    new_tests_by_publish_date               NUMERIC           NULL, 
    covid_occupied_mv_beds                  NUMERIC           NULL, 
    hospital_cases                          NUMERIC           NULL, 
    planned_capacity_by_publish_date        NUMERIC           NULL,    
    new_deaths_28_days_by_publish_date      NUMERIC           NULL, 
    cum_deaths_28_days_by_publish_date      NUMERIC           NULL, 
    cum_deaths_28_days_by_publish_date_rate NUMERIC           NULL, 
    new_deaths_28_days_by_death_date        NUMERIC           NULL, 
    cum_deaths_28_days_by_death_date        NUMERIC           NULL, 
    cum_deaths_28_days_by_death_date_rate   NUMERIC           NULL
);
"""
_ = conn.executescript(sql)

### reference_geography__age_gender_populations

This table provides population figures by gender and age group for various area types, such as UTLA and LTLA, as well as national and regional populations.

This data is available to download from the GOV.UK website here: https://coronavirus.data.gov.uk/details/download

More information can be found on the ONS website: https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland

In [8]:
# replace any existing table using DROP TABLE
sql = """
DROP TABLE IF EXISTS reference_geography__age_gender_populations;

CREATE TABLE reference_geography__age_gender_populations (
    category   TEXT     NOT NULL, 
    area_code  TEXT     NOT NULL, 
    gender     TEXT     NOT NULL, 
    age        TEXT     NOT NULL, 
    population NUMERIC  NOT NULL
);
"""
_ = conn.executescript(sql)
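
Once all the tables have been created, one way to check the result is to list the tables recorded in SQLite's built-in `sqlite_master` catalogue. The following is a sketch only, run against an in-memory database with two hypothetical tables (`demo_a` and `demo_b`) standing in for the real ones.

```python
import sqlite3

# In-memory database with hypothetical stand-in tables
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE demo_a (x TEXT);
CREATE TABLE demo_b (y NUMERIC);
""")

# sqlite_master holds one row per schema object; keep only the tables
table_names = [
    row[0]
    for row in demo_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
demo_conn.close()

print(table_names)
```

Run against the notebook's connection before it is closed, the same query should list the three tables defined above.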

### Cleanup

Ensure all changes are committed and then close the SQLite connection.  We also force garbage collection; at this point there should be no locks on the database file, so it can be compressed and deleted.

In [9]:
import gc

# Commit and close Sqlite connection
conn.commit()
conn.close()

# Force garbage collection
_ = gc.collect(2)

### Optional: Compress SQLite database file

To help keep file sizes small and to allow the database to be easily stored in a Git repo, you may want to compress the database file and then remove the original.

In [None]:
import os
import gzip
import shutil

with open(sqlite_db_path, 'rb') as f_in:
    with gzip.open(sqlite_db_path + ".gz", 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
        
os.remove(sqlite_db_path)
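
A later notebook will need the database back in uncompressed form before it can connect. The sketch below shows one possible round trip (compress, remove, restore) using a throwaway database in a temporary directory. The names `demo_dir`, `demo_db_path`, `demo_gz_path` and the table `demo` are hypothetical, and this is not necessarily how the later parts of the series restore the file.

```python
import gzip
import os
import shutil
import sqlite3
import tempfile

# Hypothetical scratch paths standing in for c19.db and c19.db.gz
demo_dir = tempfile.mkdtemp()
demo_db_path = os.path.join(demo_dir, "demo.db")
demo_gz_path = demo_db_path + ".gz"

# Build a tiny database to compress
demo_conn = sqlite3.connect(demo_db_path)
demo_conn.execute("CREATE TABLE demo (x INTEGER)")
demo_conn.commit()
demo_conn.close()

# Compress and remove the original, as in the cell above
with open(demo_db_path, "rb") as f_in, gzip.open(demo_gz_path, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)
os.remove(demo_db_path)

# Restore: decompress the .gz back to a plain SQLite file
with gzip.open(demo_gz_path, "rb") as f_in, open(demo_db_path, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

# The restored file opens as a normal database with its tables intact
demo_conn = sqlite3.connect(demo_db_path)
restored_tables = [
    row[0]
    for row in demo_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]
demo_conn.close()
gz_still_present = os.path.exists(demo_gz_path)

print(restored_tables)

# Remove the scratch directory again
shutil.rmtree(demo_dir)
```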