Python and R for official statistics: self-contained services to access and handle Eurostat data
PRost is part of the Methodological Network initiative on user interfaces to the Eurostat online database.
| status | since 2018 – on-going |
Run your own script in a notebook, as in the examples below:
- a quick and dirty notebook reproducing the tutorial for
- an empty R test notebook with the TSsdmx packages to retrieve data from the Eurostat database
- a notebook to test the
This contribution advocates for widening the use of Open Source Software (OSS), "beyond just
- support new modes for production of official statistics,
- create new ways to share official statistics
in a constantly evolving data ecosystem.
Although R is currently the leading OSS within the statistical community and the most widespread in statistical organisations, one should not focus on a single OSS in isolation. Instead, it should be possible to implement statistical methods in whichever OSS fits best and to integrate them seamlessly into the statistical production system.
Today's technological solutions, such as flexible APIs (e.g., the Eurostat REST API), interactive notebooks (e.g., Jupyter notebooks) and virtualised containers (e.g., Docker), can support an approach where algorithms are delivered as portable, scalable, harmonised and encapsulated services, regardless of the software used.
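As a rough illustration of the API-based access described above, the sketch below builds a query URL for the Eurostat REST API in Python. The base URL, the helper name `build_query_url`, and the example dataset code and filters are assumptions made for illustration and are not taken from this repository; check the Eurostat web-services documentation for the endpoint currently in use.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Eurostat dissemination API. It is kept as a
# parameter so that it can be swapped if the endpoint differs.
BASE_URL = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"


def build_query_url(dataset, filters=None, base_url=BASE_URL, lang="EN"):
    """Build the URL of a JSON query for one dataset (hypothetical helper)."""
    params = {"format": "JSON", "lang": lang}
    # Each filter maps a dimension name to a single code or a list of codes.
    for dimension, codes in (filters or {}).items():
        params[dimension] = [codes] if isinstance(codes, str) else list(codes)
    # doseq=True repeats the key for every code, e.g. geo=DE&geo=FR.
    return f"{base_url}/{dataset}?{urlencode(params, doseq=True)}"


# Example: an annual GDP dataset restricted to two countries and one year.
url = build_query_url("nama_10_gdp", {"geo": ["DE", "FR"], "time": "2020"})
print(url)
```

The helper only assembles the URL, so it can be tried in any notebook without network access; the resulting address can then be passed to whichever HTTP client the notebook already uses.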
The notebooks run on the Binder platform, which automatically turns the Dockerfile in this repository into an interactive notebook environment. The current Dockerfile is an extension of the Jupyter Data Science Stack.
- EU open data initiatives: pan-European public data infrastructure.
- Eurostat database: online catalog and bulk download facility.
- Eurostat web-services: access to JSON and unicode data, the REST API with its query builder.
- R package eurostat to access open data from Eurostat.
- Jupyter notebook Docker stacks, in particular the R stack and the Data Science stack. Note also the list of existing images, the get started guide and the how-to.
- Binder environment to run Jupyter notebooks. See the how-to.
- A cool notebook showing how to represent Eurostat NUTS data over a map using Python package eurostat-api-client.
- Boettiger C. and Eddelbuettel D. (2018): An introduction to Rocker: Docker containers for R, The R Journal, 9(2):527-536.
- Grazzini J., Museux J.-M. and Hahn M. (2018): Empowering and interacting with statistical produsers: A practical example with Eurostat data as a service, in Proc. Conference of European Statistics Stakeholders, doi:10.5281/zenodo.3240557.
- Beaulieu-Jones B.K. and Greene C.S. (2017): Reproducibility of computational workflows is automated using continuous analysis, Nature Biotechnology, 35:342–346, doi:10.1038/nbt.3780.
- Lahti L., Huovari J., Kainu M. and Biecek P. (2017): Retrieval and analysis of Eurostat open data with the eurostat package, The R Journal, 9(1):385-392.
- Marwick B., Boettiger C., and Mullen L. (2017): Packaging data analytical work reproducibly using R (and friends), The American Statistician, doi:10.1080/00031305.2017.1375986.
- Piccolo S.R. and Frampton M.B. (2016): Tools and techniques for computational reproducibility, Gigascience, 5(1):30, doi:10.1186/s13742-016-0135-4.
- Boettiger C. (2015): An introduction to Docker for reproducible research, ACM SIGOPS Operating Systems Review, Special Issue on Repeatability and Sharing of Experimental Artifacts, 49(1):71-79, doi:10.1145/2723872.2723882.
- How to Dockerize an
- Generating Dockerfiles for reproducible research with
- Dockerfile basics and best practices.