Skip to content

Building a dataset for analysing measures taken during COVID-19 pandemic.

Notifications You must be signed in to change notification settings

marzekan/covid-19-measures-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analysis of preventive measures (COVID-19)

Disclaimer: For informative purposes only. I am not a professional nor do I claim to be one. This is a school project that I extended upon in my spare time. I hold no responsibility for data accuracy or data gathering. Full credit goes to good people that made these datasets, links bellow.

That said, dataset created is free for you to use.


Dataset can be found in data folder

Project goal

Goal of this project was to create a dataset containing facts about countries and preventive measures taken by those countries during COVID-19 pandemic, so far. For these purposes, I combined data from multiple sources to create a simple data warehouse that stores data on new measures implemented for each day in the period from 01-01-2020 to 02-06-2020, for every country. With this I hope to provide a single dataset for analysing the effectiveness of preventive measures worldwide.

Data sources

Data for this project was combined from 4 sources:

  • ACAPS COVID-19: Government Measures Dataset by The Humanitarian Data Exchange - puts together all the measures implemented by governments worldwide in response to the Coronavirus pandemic.

  • COVID-19 Coronavirsu data by EU Open Data Portal - data on the geographic distribution of COVID-19 cases worldwide.

  • Oxford COVID-19 Government Response Tracker (OxCGRT) by Hale et. al. - systematically collects information on several different common policy responses that governments have taken to respond to the pandemic on 17 indicators such as school closures and travel restrictions.

  • The World by Income and Region by The World Bank - The World Bank classifies economies for analytical purposes into four income groups: low, lower-middle, upper-middle, and high income.


ETL

Image below shows ETL process. The transforming part was done with Python Pandas, code can be found in ETL.ipynb document in this repo. Main part of ETL was to clean the data and to split it into meaningful tables.

etlmodel

Data model

Data is modeled in the form of star schema consisting of 4 dimension tables and 1 fact table. Dimension tables contain: country, date and measures data. There is also one junk dimension (additional_dim) that contains additional data for each day. (Can be used in the future to get more insights)

datamodel

Data descriptions

Country dimension (country_dim)

Column name Description
country_name Name of the country.
country_code 3 letter code unique to every country.
geo_id Similar to country code.
continent Continent that country resides on. Similar (or same as) to region.
region Region of the world in which the country resides.
population_data_2018 Population number (data from 2018.).
income_group_2020 Income group of the country, as categorised by The World Bank (for the year 2020.).

Measure dimension (measures_dim)

Column name Description
measure Name of the measure. Categorical value, there is 34 of them.
measure_category Each measure is categorised in one of six categories.

Fact table (measures_by_day_fact)

Column name Description
additional_id Reference to additional_dim table.
date_id Reference to date_dim table.
measure_id Reference to measure_dim table.
country_id Reference to country_dim table.
cases New cases for each day for each county.
deaths Number of death for each day for each country.
stringency_index Index developed by OxCGRT, shows how strict government response measures are. For each day for each country.
government_response_index Index developed by OxCGRT, shows government responsivnes for each day for each country.
containement_health_index Another index by OxCGRT, more info on link
economic_support_index Another index by OxCGRT, more info on link

About

Building a dataset for analysing measures taken during COVID-19 pandemic.

Resources

Stars

Watchers

Forks