Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign up| <!-- | |
| %\VignetteEngine{knitr::knitr} | |
| %\VignetteIndexEntry{rsdmx quickstart guide} | |
| --> | |
| # rsdmx quickstart guide | |
| The goal of this document is to get you up and running with rsdmx as quickly as possible. | |
| ``rsdmx`` provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework. | |
| ## SDMX - a short introduction | |
| The SDMX framework provides two sets of standard specifications to facilitate the exchange of statistical data: | |
| * standard formats | |
| * web-service specifications | |
| SDMX allows to disseminate both **data** (a dataset) and **metadata** (the description of the dataset). | |
| For this, the SDMX standard provides various types of _documents_, also known as _messages_. Hence there will be: | |
| * **data** SDMX-ML _documents_. The two main _document_ types are the ``Generic`` and ``Compact`` ones. The latter aims to provide a more compact XML document. They are other data _document_ types derivating from the ones previously mentioned. | |
| * **metadata** SDMX-ML _documents_. The main metadata _document_ is known a ``Data Structure Definition`` (DSD). As its name indicates, it _describes_ the structure and organization of a dataset, and will generally include all the master/reference data used to characterize a dataset. The 2 main types of metadata are (1) the ``concepts``, which correspond to the _dimensions_ and/or _attributes_ of the dataset, and (2) the ``codelists`` which inventory the possible values to be used in the representation of _dimensions_ and _attributes_. | |
| For more information about the SDMX standards, you can visit the [SDMX website](http://sdmx.org/), or this [introduction by EUROSTAT](https://ec.europa.eu/eurostat/web/sdmx-infospace/trainings-tutorials/tutorials). | |
| ## How to deal with SDMX in R | |
| [rsdmx](https://cran.r-project.org/package=rsdmx) offers a low-level set of tools to read **data** and **metadata** in the SDMX-ML format. Its strategy is to make it very easy for the user. For this, a unique function named ``readSDMX`` has to be used, whatever it is a ``data`` or ``metadata`` document, or if it is ``local`` or ``remote`` datasource. | |
| What ``rsdmx`` does support: | |
| * a SDMX format abstraction library, with focus on the the main SDMX standard XML format (SDMX-ML), and the support of the three format standard versions (``1.0``, ``2.0``, ``2.1``) | |
| * an interface to SDMX web-services for a list of well-known data providers, such as OECD, EUROSTAT, ECB, UN FAO, UN ILO, etc (a list that should grow in a near future!). See it [in action](https://github.com/opensdmx/rsdmx/blob/master/vignettes/quickstart.Rmd#using-the-helper-approach)! | |
| Let's see then how to use ``rsdmx``! | |
| ## Install rsdmx | |
| ``rsdmx`` can be installed from CRAN or from its development repository hosted in Github. For the latter, you will need the ``devtools`` package and run: | |
| ```{r, eval=FALSE, results="hide"} | |
| devtools::install_github("opensdmx/rsdmx") | |
| ``` | |
| ## Load rsdmx | |
| To load rsdmx in R, do the following: | |
| ```{r,eval = FALSE,results="hide"} | |
| library(rsdmx) | |
| ``` | |
| ## Read dataset documents | |
| This section will introduce you on how to read SDMX *dataset* documents, either from _remote_ datasources, or from _local_ SDMX files. | |
| ### Read _remote_ datasets | |
| #### using the _raw_ approach (specifying the complete request URL) | |
| The following code snipet shows you how to read a dataset from a remote data source, taking as example the [OECD StatExtracts portal](http://stats.oecd.org): [http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011](http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011) | |
| ```{r,eval = FALSE,results="hide"} | |
| myUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011" | |
| dataset <- readSDMX(myUrl) | |
| stats <- as.data.frame(dataset) | |
| ``` | |
| You can try it out with other datasources, such as from the [**EUROSTAT portal**](http://ec.europa.eu/eurostat/web/sdmx-web-services/rest-sdmx-2.1): [http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011](http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011) | |
| The online rsdmx documentation also provides a list of data providers, either from international or national institutions, and [more request examples](https://github.com/opensdmx/rsdmx/wiki#read-remote-datasets). | |
| #### using the _helper_ approach | |
| Now, the service providers above mentioned are known by ``rsdmx`` which let users using ``readSDMX`` with the helper parameters. The list of service providers can be retrieved doing: | |
| ```{r,eval = FALSE,results="hide"} | |
| providers <- getSDMXServiceProviders(); | |
| as.data.frame(providers) | |
| ``` | |
| Note it is also possible to add an SDMX service provider at runtime. For registering a new SDMX service provider by default, please contact me! | |
| Let's see how it would look like for querying an ``OECD`` datasource: | |
| ```{r, eval = FALSE,message = FALSE,results="hide"} | |
| sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG", | |
| key = list("TOT", NULL, NULL), start = 2010, end = 2011) | |
| df <- as.data.frame(sdmx) | |
| head(df) | |
| ``` | |
| It is also possible to query a dataset together with its "definition", handled | |
| in a separate SDMX-ML document named ``DataStructureDefinition`` (DSD). It is | |
| particularly useful when you want to enrich your dataset with all labels. For this, | |
| you need the DSD which contains all reference data. | |
| To do so, you only need to append ``dsd = TRUE`` (default value is ``FALSE``), | |
| to the previous request, and specify ``labels = TRUE`` when calling ``as.data.frame``, | |
| as follows: | |
| ```{r, eval = FALSE,message = FALSE,results="hide"} | |
| sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG", | |
| key = list("TOT", NULL, NULL), start = 2010, end = 2011, | |
| dsd = TRUE) | |
| df <- as.data.frame(sdmx, labels = TRUE) | |
| head(df) | |
| ``` | |
| For embedded service providers that require a user authentication/subscription key or token, | |
| it is possible to specify it in ``readSDMX`` with the ``providerKey`` argument. If provided, | |
| and that the embedded provider requires a specific key parameter, the latter will be appended | |
| to the SDMX web-request. For example, it's the case for the new [UNESCO SDMX API](https://apiportal.uis.unesco.org/getting-started). | |
| Note that in case you are reading SDMX-ML documents with the native approach (with | |
| URLs), instead of the embedded providers, it is also possible to associate a DSD | |
| to a dataset by using the function ``setDSD``. Let's try how it works: | |
| ```{r, eval = FALSE,message = FALSE,results="hide"} | |
| #data without DSD | |
| sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG", | |
| key = list("TOT", NULL, NULL), start = 2010, end = 2011) | |
| #DSD | |
| sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "MIG") | |
| #associate data and dsd | |
| sdmx.data <- setDSD(sdmx.data, sdmx.dsd) | |
| ``` | |
| ### Read _local_ datasets | |
| This example shows you how to use ``rsdmx`` with _local_ SDMX files, previously downloaded from [EUROSTAT](http://ec.europa.eu/eurostat). | |
| ```{r, eval = FALSE} | |
| #bulk download from Eurostat | |
| tf <- tempfile(tmpdir = tdir <- tempdir()) #temp file and folder | |
| download.file("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf) | |
| sdmx_files <- unzip(tf, exdir = tdir) | |
| #read local SDMX (set isURL = FALSE) | |
| sdmx <- readSDMX(sdmx_files[2], isURL = FALSE) | |
| stats <- as.data.frame(sdmx) | |
| ``` | |
| By default, ``readSDMX`` considers the data source is remote. To read a local file, add ``isURL = FALSE``. | |
| ## Read metadata documents | |
| This section will introduce you on how to read SDMX **metadata** documents, including ``concepts``, ``codelists`` and a complete ``data structure definition`` (DSD) | |
| ### Codelists | |
| Read codelists from [FAO data portal](http://data.fao.org/sdmx/index.html) | |
| ```{r,eval = FALSE,results="hide"} | |
| clUrl <- "http://data.fao.org/sdmx/registry/codelist/FAO/CL_FAO_MAJOR_AREA/0.1" | |
| clobj <- readSDMX(clUrl) | |
| cldf <- as.data.frame(clobj) | |
| ``` | |
| ### Data Structure Definition (DSD) | |
| This example illustrates how to read a complete DSD using a [OECD StatExtracts portal](http://stats.oecd.org) data source. | |
| ```{r,eval = FALSE,results="hide"} | |
| dsdUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1" | |
| dsd <- readSDMX(dsdUrl) | |
| ``` | |
| ``rsdmx`` is implemented in object-oriented way with ``S4`` classes and methods. The properties of ``S4`` objects are named ``slots`` and can be accessed with the ``slot`` method. The following code snippet allows to extract the list of ``codelists`` contained in the DSD document, and read one codelist as ``data.frame``. | |
| ```{r,eval = FALSE,results="hide"} | |
| #get codelists from DSD | |
| cls <- slot(dsd, "codelists") | |
| #get list of codelists | |
| codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id")) | |
| #get a codelist | |
| codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS") | |
| ``` | |
| In a similar way, the ``concepts`` of the dataset can be extracted from the DSD and read as ``data.frame``. | |
| ```{r,eval = FALSE,results="hide"} | |
| #get concepts from DSD | |
| concepts <- as.data.frame(slot(dsd, "concepts")) | |
| ``` | |
| ## Save & Reload SDMX R objects | |
| It is possible to save SDMX R objects as RData file (.RData, .rda, .rds), to then | |
| be able to reload them into the R session. It could be of added value for users that | |
| want to keep their SDMX objects in R data files, but also for fast loading of large SDMX | |
| objects (e.g. DSD objects) for use in statistical analyses and R-based web-applications. | |
| To save a SDMX R object to RData file: | |
| ```{r, eval = FALSE, results="hide"} | |
| saveSDMX(sdmx, "tmp.RData") | |
| ``` | |
| To reload a SDMX R object from RData file: | |
| ```{r, eval = FALSE, results="hide"} | |
| sdmx <- readSDMX("tmp.RData", isRData = TRUE) | |
| ``` |