FDPtoRDF

URL: https://github.com/openbudgets/pipeline-fragments/raw/master/FDPtoRDF/FDPtoRDF.jsonld

The FDPtoRDF.jsonld file is a LinkedPipes pipeline fragment. It takes a Fiscal Data Package (FDP) .jsonld descriptor as input, transforms the FDP into its OBEU RDF representation and stores the result in a preconfigured triple-store.

Prerequisites

The pipeline uses the LinkedPipes component FDPtoRDF, which is not part of LinkedPipes by default. You can deploy it into an existing LinkedPipes installation by downloading it and copying it into <Linked-pipes-directory>/deploy/jars/. Restart LinkedPipes to load the component. The component requires LP-ETL at commit 81f082d or later.

Installation

  1. Import the pipeline into LinkedPipes using the button in the bottom-right corner of the LinkedPipes UI. You can reference the pipeline's GitHub URL directly.
  2. Open the pipeline in LinkedPipes and configure the Graph Store Protocol component (in the bottom-right corner of the pipeline; the node is highlighted in red). Follow the component's configuration documentation if needed and set the following to the values of the triple-store where you want to store the results:
    • Repository type (Fuseki/Virtuoso/etc.)
    • Graph Store protocol endpoint
    • User name and password
  3. Configure the Files to SCP component (also highlighted in red, just above the Graph Store Protocol component). Follow the component's configuration documentation if needed. If the component is not needed, it can be disabled by selecting its node and clicking the upper-right button above it. The same applies to the Graph Store Protocol component.
  4. Configure the Add result download url component. The URL defined inside the BIND expression of the SPARQL query needs to correspond to the hostname and path where the above-mentioned Files to SCP component stores data dumps.
  5. Don't forget to click "apply changes" and then "save" the whole pipeline (bottom-left button in the pipeline editor).

Usage

The pipeline is executed through an HTTP POST request to its URL. The POST has to include either the FDP .jsonld descriptor itself or a .jsonld file containing the URL of the descriptor. The resulting RDF is stored in a triple-store as configured in the pipeline. The URL for pipeline execution looks like this:

http://[your LinkedPipes server]/resources/executions?pipeline=http://[your LinkedPipes server]/resources/pipelines/created-[pipeline timestamp]
  • [your LinkedPipes server] is the hostname and port where LinkedPipes is installed.
  • [pipeline timestamp] can be found at the end of the pipeline URL when opened in the LinkedPipes editor.

For example, a pipeline which can be edited at the URL

http://obeu.vse.cz:9080/#/pipelines/edit/canvas?pipeline=http:%2F%2Fobeu.vse.cz:9080%2Fresources%2Fpipelines%2Fcreated-1466690747879

can be executed through the following command using curl (the URL is quoted so that the shell does not interpret the '?' character):

    curl -i -X POST -H "Content-Type: multipart/form-data" -F "input=@datapackage.jsonld" "http://obeu.vse.cz:9080/resources/executions?pipeline=http://obeu.vse.cz:9080/resources/pipelines/created-1466690747879"

The details about the execution can then be seen in the Executions view of LinkedPipes UI.
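
The relationship between the editor URL and the execution URL can be sketched in Python. This is only an illustration of the URL pattern described above: the function name execution_url is hypothetical, and it assumes that executions are requested from the same host and port that appear in the pipeline IRI.

```python
from urllib.parse import parse_qs, urlsplit


def execution_url(editor_url: str) -> str:
    """Derive the execution URL from a LinkedPipes editor URL (illustrative helper)."""
    # The editor keeps its parameters inside the URL fragment (the part after '#').
    fragment = urlsplit(editor_url).fragment
    # Everything after the first '?' in the fragment is the editor's query string.
    query = fragment.partition("?")[2]
    # parse_qs percent-decodes the value, e.g. 'http:%2F%2F...' becomes 'http://...'.
    pipeline_iri = parse_qs(query)["pipeline"][0]
    # Assumption: executions are served by the host and port of the pipeline IRI.
    server = urlsplit(pipeline_iri)
    return f"{server.scheme}://{server.netloc}/resources/executions?pipeline={pipeline_iri}"


if __name__ == "__main__":
    print(execution_url(
        "http://obeu.vse.cz:9080/#/pipelines/edit/canvas"
        "?pipeline=http:%2F%2Fobeu.vse.cz:9080%2Fresources%2Fpipelines%2Fcreated-1466690747879"
    ))
```

For the editor URL from the example, the function returns the same address that is used in the curl command.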

Input

The pipeline uses a .jsonld FDP descriptor as its main input. The descriptor can either be POSTed to the pipeline directly, or be referenced by a POSTed .jsonld file that contains its URL.

POSTing FDP descriptor directly: The first input option is a .jsonld FDP descriptor file sent in the POST request (datapackage.jsonld in the curl example above). The file must

  • be named 'datapackage.jsonld'
  • correspond to the FDP descriptor specification and, in addition to it:
    • have the .jsonld extension
    • contain only dereferenceable URLs as references to CSV data files (resources' paths)
    • contain the following property in the root object:
      "@context": "http://schemas.frictionlessdata.io/fiscal-data-package.jsonld",

See the examples folder for example .jsonld descriptors.
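
Some of the requirements above can be checked before a descriptor is POSTed. The following Python sketch is not part of the pipeline; the function name check_descriptor is hypothetical. It assumes that the CSV references live in the "path" property of the entries of the "resources" array, that each path is a single string, and that a usable URL has an http or https scheme. It only checks the form of the URLs and does not try to dereference them.

```python
from urllib.parse import urlsplit

FDP_CONTEXT = "http://schemas.frictionlessdata.io/fiscal-data-package.jsonld"


def check_descriptor(descriptor: dict) -> list:
    """Return a list of human-readable problems found in a parsed FDP descriptor."""
    problems = []
    # The root object has to carry the FDP JSON-LD context.
    if descriptor.get("@context") != FDP_CONTEXT:
        problems.append("root object lacks the expected @context")
    # Assumption: CSV files are referenced as resources[i].path (a single string).
    for i, resource in enumerate(descriptor.get("resources", [])):
        path = resource.get("path", "")
        parts = urlsplit(path) if isinstance(path, str) else None
        if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"resources[{i}].path is not an absolute URL: {path!r}")
    return problems


if __name__ == "__main__":
    sample = {"@context": FDP_CONTEXT, "resources": [{"path": "data/budget.csv"}]}
    for problem in check_descriptor(sample):
        print(problem)
```

An empty list means that none of the checked problems was found; it does not guarantee that the descriptor conforms to the FDP specification.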

POSTing package-url.jsonld with a URL of the FDP descriptor: The second option is to POST a file named 'package-url.jsonld' that has the URL of the FDP descriptor as the value of the schema:url property. See the example below:

{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "url": "http://protegeserver.cz/files/fdp-example-normalized/datapackage.jsonld"
}

The FDP descriptor file must fulfill the abovementioned requirements, except that

See a sample package-url.jsonld for further details.
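
A package-url.jsonld document with the structure shown above can also be generated, for instance when descriptors are published by a script. A minimal Python sketch follows; the function name package_url_document is hypothetical.

```python
import json


def package_url_document(descriptor_url: str) -> str:
    """Build the content of a package-url.jsonld file for the given descriptor URL."""
    document = {
        # With schema.org as @vocab, the plain "url" key expands to schema:url.
        "@context": {"@vocab": "http://schema.org/"},
        "url": descriptor_url,
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    print(package_url_document(
        "http://protegeserver.cz/files/fdp-example-normalized/datapackage.jsonld"
    ))
```

The returned text needs to be saved in a file named 'package-url.jsonld' before it is POSTed to the pipeline.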

Output

To a triplestore: The pipeline stores the resulting RDF in a triple-store as configured in the Graph Store Protocol component (see Installation). By default, a new graph is created for the output and named according to the Data Package name. Its IRI looks as follows:

http://data.openbudgets.eu/resource/graph/[DataPackage name]

If the graph already exists, it is overwritten.
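
Because an existing graph is overwritten, it can be useful to know the target graph IRI before running the pipeline. The Python sketch below only mirrors the IRI pattern shown above; the function name graph_iri is hypothetical. It assumes that the "Data Package name" is the value of the descriptor's "name" property and that the name is appended without any escaping, neither of which is confirmed by this document.

```python
GRAPH_BASE = "http://data.openbudgets.eu/resource/graph/"


def graph_iri(descriptor: dict) -> str:
    """Return the default target graph IRI for a parsed FDP descriptor."""
    # Assumption: the Data Package name is the descriptor's "name" property
    # and is appended to the base IRI as-is.
    return GRAPH_BASE + descriptor["name"]


if __name__ == "__main__":
    print(graph_iri({"name": "example-budget"}))
```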

To a filestore: The pipeline can also store the resulting RDF in a file named according to the FDP package name. The file is sent via SCP as configured in the Files to SCP component (see Installation).

For debugging purposes, the output can also be downloaded through the Detail view of the Pipeline Execution UI in LinkedPipes, e.g. by displaying the "FDP to RDF" node output file.

Further documentation

See the developer's documentation for details about the FDPtoRDF transformation process. Note that the latest changes in the technical details of pipeline input and output are not reflected there; they are described in this file.