ObdalibRdb2rdf

Piotrek Mierzejewski edited this page Sep 24, 2017 · 12 revisions

valid for Ontop Version 1

R2RML - The RDB2RDF Mapping Language


Description

R2RML is a W3C-recommended RDB-to-RDF mapping language that generates RDF triples from a relational database according to user-defined mappings. The mappings are written in Turtle syntax, as illustrated below. An R2RML mapping is an RDF graph consisting of several rr:TriplesMaps, each of which specifies how to map a logical table in the input relational database into RDF. A logical table can be a base table, a view existing in the database, or the result of an SQL query executed over the input relational database. Each rr:TriplesMap consists of one rr:SubjectMap and possibly multiple rr:PredicateObjectMaps. Each row in the logical table produces a single subject in the target RDF, which is specified by the rr:SubjectMap.

Each rr:PredicateObjectMap specifies how to generate the predicates and objects (by means of rr:PredicateMaps and rr:ObjectMaps, respectively) that are related to the generated subject. Furthermore, each rr:SubjectMap, rr:PredicateMap and rr:ObjectMap specifies how its RDF term is generated by means of different predicates. For example, using the R2RML rr:column predicate in an object map indicates that the RDF objects are generated from the values of the named column in the logical table. Similarly, the R2RML rr:template predicate specifies that terms are generated from a template whose placeholders are filled with values from the logical table row.

Example

The R2RML mapping

<TriplesMap1>
    a rr:TriplesMap;

    rr:logicalTable [rr:sqlQuery """Select "id", "name" from PLAYER"""];

    rr:subjectMap [ rr:template "http://example.com/soccer/player/{id}"];

    rr:predicateObjectMap
    [ 
      rr:predicate     ex:name; 
      rr:objectMap    [ rr:column "name"]
    ].

Content of the logical table used in this mapping:

   id   |     name
--------------------------------------
   001  |    Lionel Messi
   002  |    Christiano Ronaldo

By applying the mapping we produce the following triples:

<http://example.com/soccer/player/001>    ex:name     "Lionel Messi"
<http://example.com/soccer/player/002>    ex:name     "Christiano Ronaldo"   
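
The way a template and a column reference turn table rows into triples can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how -ontop- or any R2RML processor is implemented: the helper names `expand_template` and `apply_mapping` are made up for this sketch, and the percent-encoding that R2RML applies to values inserted into IRI templates is skipped.

```python
import re

# Template and rows mirror the example mapping and logical table above.
TEMPLATE = "http://example.com/soccer/player/{id}"
ROWS = [
    {"id": "001", "name": "Lionel Messi"},
    {"id": "002", "name": "Christiano Ronaldo"},
]

def expand_template(template, row):
    """Replace each {column} placeholder with that column's value in the row."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(row[m.group(1)]), template)

def apply_mapping(rows):
    """Yield one (subject, predicate, object) triple per row."""
    for row in rows:
        subject = "<" + expand_template(TEMPLATE, row) + ">"   # rr:template
        obj = '"' + row["name"] + '"'                          # rr:column "name"
        yield (subject, "ex:name", obj)

triples = list(apply_mapping(ROWS))
for s, p, o in triples:
    print(s, p, o, ".")
```

Each row yields exactly one subject, which matches the rule that a triples map has a single subject map.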

R2RML features

R2RML has a well-defined RDF vocabulary for specifying mappings that create an RDF dataset. The whole vocabulary is defined in the W3C R2RML specification. Some features encoded in the R2RML vocabulary are not yet available in the Quest Mapping Language, but will be added soon. The list of unsupported vocabulary can be found at the bottom of the page.

A term can have the following properties set:

  • rr:termType: indicates whether the subject or object generated from the column named by rr:column should be an IRI, a blank node, or (for objects only) a literal.
  • rr:datatype: specifies the datatype of the object in the triple generated from a logical table row.
  • rr:language: specifies the language tag of the object in the triple generated from a logical table row.
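
To make the effect of these three properties concrete, the following Python sketch renders one value as an RDF term in N-Triples-like notation. The function `render_term` is a hypothetical helper written for this page. It assumes the term type is passed as one of the strings "rr:IRI", "rr:BlankNode" or "rr:Literal", and it rejects a term that sets both a language tag and a datatype.

```python
def render_term(value, term_type="rr:Literal", datatype=None, language=None):
    """Render a value as an IRI, blank node, or literal (illustrative only)."""
    if term_type == "rr:IRI":
        return "<" + str(value) + ">"
    if term_type == "rr:BlankNode":
        return "_:" + str(value)
    if term_type != "rr:Literal":
        raise ValueError("unknown term type: " + term_type)
    if datatype is not None and language is not None:
        raise ValueError("a literal cannot have both a datatype and a language")
    # Escape backslashes and double quotes inside the lexical form.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    literal = '"' + escaped + '"'
    if language is not None:
        return literal + "@" + language          # rr:language
    if datatype is not None:
        return literal + "^^<" + datatype + ">"  # rr:datatype
    return literal

print(render_term("Lionel Messi", language="en"))
print(render_term("101", datatype="http://www.w3.org/2001/XMLSchema#integer"))
print(render_term("http://example.com/soccer/player/101", term_type="rr:IRI"))
```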

Furthermore, some features of R2RML can either be integrated directly into the SQL query or be specified as part of the mapping. Examples of such features are:

  • rr:template: a template (format string) that specifies how to generate a value for a subject, predicate, or object from one or more columns of a logical table row.
  • rr:inverseExpression: an expression that allows index-based access to the relational tables at query processing time, instead of retrieving the table rows first and then applying a filter. For clarification see the example below.
  • rr:joinCondition: specifies the condition for joining the child logical table with the parent logical table of a foreign key constraint.

These features are easy to implement by modifying the SQL query during the translation from an R2RML mapping to a Quest mapping. Nevertheless, especially in the case of templates, one could consider introducing them directly into the Quest Mapping Language.

Example of an rr:inverseExpression

Consider the following fragment of an R2RML mapping:

     rr:logicalTable [rr:sqlQuery """
       Select ('Department' || "deptno") AS "deptId"
            , "deptno"
            , "dname"
            , "loc"
         from SCOTT.DEPT
       """];

    rr:subjectMap [ rr:column "deptId"; rr:inverseExpression "{deptno} = substr({deptId},length('Department')+1)"];

In this mapping, the rr:inverseExpression states that the value of the column deptno is equal to the fragment of the deptId value obtained by applying the substr function. In other words, it shows how to undo the concatenation done in the SQL query. This makes it possible to compare an RDF node with a given value directly, by applying the inverse expression to the value of the RDF node, instead of going through every value of the column involved, applying the template or string modification to it, and then comparing the result with the value of the RDF node. NOTE: the current version of -ontop- does not support inverse expressions yet!
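
The difference between the two lookup strategies can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table is called `dept` rather than SCOTT.DEPT, the two rows are invented sample data, and the function names are hypothetical. `lookup_naive` rebuilds deptId for every row before filtering, while `lookup_inverse` applies the inverse expression to the constant once and then filters on the primary key.

```python
import sqlite3

PREFIX = "Department"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [(10, "ACCOUNTING", "NEW YORK"), (20, "RESEARCH", "DALLAS")],  # invented rows
)

def lookup_naive(dept_id):
    """Concatenate for every row, then compare: the key index cannot be used."""
    sql = "SELECT dname FROM dept WHERE (? || deptno) = ?"
    return conn.execute(sql, (PREFIX, dept_id)).fetchall()

def lookup_inverse(dept_id):
    """Undo the concatenation on the constant, then filter on deptno directly."""
    # Same effect as substr(deptId, length('Department') + 1) in the mapping.
    deptno = int(dept_id[len(PREFIX):])
    return conn.execute("SELECT dname FROM dept WHERE deptno = ?", (deptno,)).fetchall()

print(lookup_naive("Department20"))
print(lookup_inverse("Department20"))
```

Both functions return the same rows; only the second one lets the database use the key on deptno.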

Notes

  • A triples map contains exactly one subject map.
  • A subject map does not necessarily contain an rr:class element.
  • Object classes can also be defined outside the triples map and used as a variable inside it.
  • An R2RML mapping does not necessarily contain the rr:sqlQuery element; it is enough to set rr:tableName. In this case the SQL query is implicitly given as "SELECT * FROM TABLENAME".
  • The W3C WG has not yet made a final commitment on the primary language for writing R2RML mappings; however, their favorite seems to be Turtle.
  • By default, R2RML suppresses a triple when the subject, predicate, or object column is NULL. If an application needs different handling of NULL values, an SQL query can be defined in the mapping to convert NULL values to some application-specific value.
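
The last note can be illustrated with a small Python sketch using the built-in sqlite3 module. The table contents, the `make_triples` helper and the replacement value 'unknown' are assumptions made for this illustration. The first query shows the default behaviour (the row with a NULL name yields no triple), the second one converts NULL in the SQL query with COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO player VALUES (?, ?)",
    [("001", "Lionel Messi"), ("003", None)],  # second row has a NULL name
)

def make_triples(sql):
    """Generate ex:name triples, skipping rows with a NULL subject or object."""
    result = []
    for player_id, name in conn.execute(sql):
        if player_id is None or name is None:
            continue  # default R2RML behaviour: no triple for NULL values
        subject = "<http://example.com/soccer/player/" + player_id + ">"
        result.append((subject, "ex:name", '"' + name + '"'))
    return result

default_triples = make_triples("SELECT id, name FROM player ORDER BY id")
patched_triples = make_triples(
    "SELECT id, COALESCE(name, 'unknown') AS name FROM player ORDER BY id"
)
print(len(default_triples), len(patched_triples))
```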

Translating between R2RML and Quest Mappings

We illustrate the translation between an R2RML and a Quest mapping with an example that consists of the following two relational tables, along with sample data.

Team Table

   Column Name |  Column Type  |  Key Constraint
----------------------------------------------------------
   TEAM_ID     |  INTEGER      |  PRIMARY KEY
   NAME        |  VARCHAR(30)  |
   CITY        |  VARCHAR(30)  |

   team_id   |     name              |    city
----------------------------------------------------------
   001       |    FC Barcelona       |   Barcelona       
   002       |    CF Real Madrid     |   Madrid          

Player Table

   Column Name |  Column Type  |  Key Constraint
----------------------------------------------------------
   PLAYER_ID   |  INTEGER      |  PRIMARY KEY
   NAME        |  VARCHAR(30)  |
   TEAM_ID     |  INTEGER      |  REFERENCES TEAM(TEAM_ID)

   player_id   |     name               |    team_id   
--------------------------------------------------------
   101         |    Lionel Messi        |   001         
   102         |    Christiano Ronaldo  |   002        

R2RML Mappings

The tables described in the section above are mapped to RDF using the mappings given below. We use the following prefixes in the mappings for this example:

   Prefix  |  IRI
----------------------------------------------------------
   rr:     |  http://www.w3.org/ns/r2rml#
   rdf:    |  http://www.w3.org/1999/02/22-rdf-syntax-ns#
   ex:     |  http://example.com/ns#

R2RML Mapping for table team:

<TriplesMapTeam>
    a rr:TriplesMap;
    
    rr:logicalTable [rr:tableName  "TEAM"] ;    

    rr:subjectMap [ rr:template "http://example.com/soccer/team/{team_id}" ; rr:class ex:Team ] ;

    rr:predicateObjectMap 
    [ 
      rr:predicate      ex:name ;
      rr:objectMap    [ rr:column "name" ]
    ] ;
    rr:predicateObjectMap 
    [ 
      rr:predicate      ex:locatedIn ;
      rr:objectMap    [ rr:column "city" ]
    ]
    .

R2RML Mapping for table player:

<TriplesMapPlayer>
    a rr:TriplesMap;
    
    rr:logicalTable [rr:tableName  "PLAYER"] ;    

    rr:subjectMap [ rr:template "http://example.com/soccer/player/{player_id}" ; rr:class ex:Player ] ;

    rr:predicateObjectMap 
    [ 
      rr:predicate      ex:name ;
      rr:objectMap    [ rr:column "name" ]
    ];
    rr:predicateObjectMap
    [ 
      rr:predicate      ex:playsFor ;
      rr:objectMap    [ rr:parentTriplesMap <TriplesMapTeam> ; 
                        rr:joinCondition [
                          rr:child "team_id" ;
                          rr:parent "team_id" ;
                        ]
      ]
    ] .

Quest Mappings

Translating the R2RML mapping for the team table is fairly straightforward. The class of the subject map is translated into a unary atom associated with the class specified in the mapping, whereas the two predicate-object maps are translated into binary atoms associated with the properties specified in the mapping. The variable used in the unary atom is the one representing the template, which is introduced in the SQL query. This variable is also the first argument of the property atoms; the second argument is simply the table column specified by the rr:column element. The major difficulty in this translation is to maintain the URI template specified in the subject map. For this we need to slightly modify the SQL query so that it creates the URI on the fly. This may require parsing the SQL, and the implementation also depends on the RDBMS used, since string concatenation is implemented differently in different RDBMSs. The resulting mapping is:

m1: SELECT team_id, name, city FROM team → 
    q(team_id, name, city) :- ex:Team(uri("http://example.com/soccer/team/", team_id)), 
                              ex:name(uri("http://example.com/soccer/team/", team_id), name), 
                              ex:locatedIn(uri("http://example.com/soccer/team/", team_id), city)
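
The dialect problem mentioned above can be sketched as follows. The helper `concat_sql` and its dialect names are hypothetical and cover only three common cases: the standard `||` operator (used here with SQLite), the CONCAT() function for MySQL, and the `+` operator with an explicit cast for SQL Server. The query is executed with Python's built-in sqlite3 module. Because team_id is stored as an INTEGER in this sketch, the generated URIs end in 1 and 2 rather than 001 and 002.

```python
import sqlite3

PREFIX = "http://example.com/soccer/team/"

def concat_sql(dialect, prefix, column):
    """Return an SQL expression that prepends a constant prefix to a column."""
    literal = "'" + prefix.replace("'", "''") + "'"  # quote as an SQL string literal
    if dialect == "mysql":
        return "CONCAT(" + literal + ", " + column + ")"
    if dialect == "sqlserver":
        return literal + " + CAST(" + column + " AS VARCHAR(30))"
    return literal + " || " + column  # standard SQL operator, also used by SQLite

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO team VALUES (?, ?, ?)",
    [(1, "FC Barcelona", "Barcelona"), (2, "CF Real Madrid", "Madrid")],
)

sql = (
    "SELECT " + concat_sql("sqlite", PREFIX, "team_id")
    + " AS team_uri, name, city FROM team ORDER BY team_id"
)
rows = conn.execute(sql).fetchall()
for team_uri, name, city in rows:
    print(team_uri, name, city)
```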

Translating the R2RML mapping for the player table is a little more complicated because of the join condition. In this case we need to rewrite the SQL query completely so that it performs the join expressed in the join condition. Furthermore, we need to introduce the template expressed in this mapping as well as the template used in the team mapping. The other components can be translated as in the previous case. The resulting Quest mapping is:

m2: SELECT player.team_id AS team_id, player.player_id AS player_id, player.name AS name FROM team, player 
    WHERE team.team_id = player.team_id → 
    q(team_id, player_id, name) :- ex:Player(uri("http://example.com/soccer/player/", player_id)), 
                                   ex:name(uri("http://example.com/soccer/player/", player_id), name), 
                                   ex:playsFor(uri("http://example.com/soccer/player/", player_id), uri("http://example.com/soccer/team/", team_id))
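
The join-based translation can be checked against the sample data with a short Python sketch that uses the built-in sqlite3 module. It is not the -ontop- implementation: the variable names are made up, the columns are qualified with their table name to avoid ambiguity, and team_id and player_id are stored as plain integers, so the URIs end in 1, 2, 101 and 102.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT,
                     team_id INTEGER REFERENCES team(team_id));
INSERT INTO team VALUES (1, 'FC Barcelona', 'Barcelona');
INSERT INTO team VALUES (2, 'CF Real Madrid', 'Madrid');
INSERT INTO player VALUES (101, 'Lionel Messi', 1);
INSERT INTO player VALUES (102, 'Christiano Ronaldo', 2);
""")

PLAYER = "http://example.com/soccer/player/"
TEAM = "http://example.com/soccer/team/"

# The join condition of the R2RML mapping becomes the WHERE clause.
sql = """
SELECT player.player_id, player.name, player.team_id
FROM team, player
WHERE team.team_id = player.team_id
ORDER BY player.player_id
"""

player_triples = []
for player_id, name, team_id in conn.execute(sql):
    subject = "<" + PLAYER + str(player_id) + ">"
    player_triples.append((subject, "rdf:type", "ex:Player"))
    player_triples.append((subject, "ex:name", '"' + name + '"'))
    player_triples.append((subject, "ex:playsFor", "<" + TEAM + str(team_id) + ">"))

for triple in player_triples:
    print(*triple, ".")
```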

R2RML tools in -ontop-

This feature of the OBDA plugin allows us to import RDB2RDF mappings and translate them into equivalent Quest mappings, as well as to export a set of Quest mappings and translate them into equivalent RDB2RDF mappings.

Import-export mappings in -ontoPro-

The -ontoPro- Protege plugin allows users to import an R2RML mapping file in Turtle syntax and adds it to the specified data source. It is up to the user to make sure that the mapping specification refers to the correct tables and columns in the source database. Nevertheless, all mappings remain modifiable in the mapping manager after importing.

Exporting mappings from -ontoPro- takes all mappings of a specific data source and writes them to a file in R2RML Turtle syntax. Note that, when JOIN conditions are involved, the exported file will not be identical to a semantically equivalent input R2RML file. The reason is that -ontop- creates a separate mapping for the join condition, which appears in the output as a new TriplesMap. This is necessary because the join has a WHERE clause (or joinCondition), so it cannot be included in the mapping for the whole table, which is unconditioned. As mentioned, this does not alter the semantics or the correctness of the mappings in any way.

Open issues

  • no support for inverseExpression
  • predicate cannot be uritemplate (column reference or template declaration) {it will be supported soon}
  • object cannot be uritemplate (column reference or template declaration) when predicate is rdf:type {it will be supported soon}
  • no support for bnode in Quest
  • no support for sqlversion
  • no support for graphMaps (hence context graphs)
  • duplicate rows from database get processed as duplicate statements
  • bnode naming is arbitrary