This project explores and analyses several Eurostat health datasets. It is the coding part of a final thesis on European healthcare for the Master's Degree in Data Science at Universitat Oberta de Catalunya.
Data is downloaded from Eurostat, then labeled and grouped by merging multiple sources, including NUTS geographic information, several versions of the ICD-10 diagnosis classification and the ISCO-08 classification of professional groups.
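The labeling step can be pictured as a join between raw Eurostat codes and lookup tables. The following is a minimal sketch in plain Python, not the project's actual code: the function `label_rows`, the keys `geo` and `icd10`, the lookup tables and the sample rows are all hypothetical and chosen only for illustration.

```python
# Hypothetical lookup tables; the real project builds its labels from metadata files.
NUTS_LABELS = {"ES51": "Cataluña", "FR10": "Île-de-France"}
ICD10_LABELS = {"A00-B99": "Certain infectious and parasitic diseases"}


def label_rows(rows, geo_labels, icd_labels):
    """Attach human-readable labels to rows keyed by 'geo' and 'icd10' codes."""
    labeled = []
    for row in rows:
        labeled.append({
            **row,
            # Fall back to the raw code when no label is known.
            "geo_label": geo_labels.get(row["geo"], row["geo"]),
            "icd10_label": icd_labels.get(row["icd10"], row["icd10"]),
        })
    return labeled


# Made-up rows, for illustration only.
rows = [
    {"geo": "ES51", "icd10": "A00-B99", "year": 2015, "value": 7.1},
    {"geo": "XX99", "icd10": "A00-B99", "year": 2015, "value": 5.0},
]
labeled = label_rows(rows, NUTS_LABELS, ICD10_LABELS)
```

Unknown codes are kept as they are instead of being dropped, so a gap in a lookup table stays visible in the labeled output.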
Eurostat datasets are aggregated with a minimum granularity of region for location and year for time. Therefore, each data point represents the Average Length Of Stay (ALOS) for a specific sex, year and region in a certain country.
Data granularity is the same as in the previous plot.
A more detailed overview can be found in this notebook, which is an extended version of per_country.Rmd.
prepare_metadata.py
Downloads, processes and stores metadata.
download_data.py
Downloads datasets from Eurostat.
transform_eurostat_data.py
Defines a function to load and preprocess Eurostat health datasets.
preprocess.R
Preprocesses each individual dataset, assigning specific factor levels.
exploration.R
Explores the datasets.
exploration.Rmd
Same exploration as exploration.R, but as an R Markdown notebook.
explore_icd10.py
Explores differences between various sources of ICD-10 codes.
export_subdata.R
Exports different tables to disk as Rdata files, one file per table per country.
per_country.Rmd
Explores data per country and fits a linear regression model for each one.
latex_tables.R
Outputs coefficient tables in LaTeX.
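To illustrate the idea of fitting one linear regression per country, here is a minimal Python sketch. It is not a port of per_country.Rmd, which does this in R: the choice of year as the only predictor, the function names `fit_line` and `fit_per_country`, and the sample observations are assumptions made for this example.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_country(observations):
    """Group (country, year, value) tuples by country and fit one line each."""
    grouped = defaultdict(list)
    for country, year, value in observations:
        grouped[country].append((year, value))
    return {
        country: fit_line(*zip(*points))
        for country, points in grouped.items()
    }


# Made-up observations with placeholder country codes, for illustration only.
observations = [
    ("AA", 2010, 8.0), ("AA", 2011, 7.5), ("AA", 2012, 7.0),
    ("BB", 2010, 5.0), ("BB", 2011, 5.0), ("BB", 2012, 5.0),
]
models = fit_per_country(observations)  # {country: (intercept, slope)}
```

Each country gets its own intercept and slope, so trends can be compared across countries without assuming a shared model.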
The following data files have been manually collected from several sources, so they are included in the repository.
data/datasets_metadata.json
Includes short name, description and file url for each dataset. Source
data/health_professionals_metadata.json
Includes ID, name and description for each professional category. Sources: Explanatory texts, Data browser 1, Data browser 2
data/tags.json
Eurostat standard flags. These appear sometimes next to numerical values to indicate additional metadata about the observation. Source
data/COD_2012_edited.csv
Manually edited version of the 2012 Eurostat shortlist for ICD-10 (International Statistical Classification of Diseases and Related Health Problems). The datasets use the 2007 version, but only the 2012 file includes the levels needed to aggregate codes.
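As an illustration of how the flags described for data/tags.json can sit next to numerical values, here is a minimal sketch that separates a value from a trailing flag. It assumes cells of the form `7.1 p`, with `:` marking a value that is not available, as in Eurostat's tab-separated downloads. The function name `split_value_and_flags` is hypothetical, and the project's own parsing may differ.

```python
def split_value_and_flags(cell):
    """Split a raw cell such as '7.1 p' into (value, flags)."""
    parts = cell.strip().split()
    if not parts:
        # Empty cell: no value and no flags.
        return None, ""
    raw_value, flags = parts[0], "".join(parts[1:])
    # ':' marks a value that is not available.
    value = None if raw_value == ":" else float(raw_value)
    return value, flags
```

For example, `split_value_and_flags("7.1 p")` gives the number `7.1` together with the flag `p`, which can then be looked up in the flags file.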