Internal Structures

Christopher Johnson edited this page Jan 12, 2014 · 3 revisions

ARC uses object-oriented code for its components and methods, but the processed data structures consist of simple associative arrays, which leads to faster operations and less memory consumption. Apart from a few special formats returned by the SPARQL engine (e.g. from SELECT or INSERT queries), ARC is built around two core structures: triple sets and resource indexes.

Triple sets

A triple set is a flat array that contains (associative) triple arrays. Triple sets can be processed with a simple loop:

// ... (parser setup omitted)
$triples = $parser->getTriples();
for ($i = 0, $i_max = count($triples); $i < $i_max; $i++) {
  $triple = $triples[$i];
  // ... process $triple
}

A single triple array can contain the following keys:

  • s: the subject value (a URI, Bnode ID, or Variable)
  • p: the property URI (or a Variable)
  • o: the object value (a URI, Bnode ID, Literal, or Variable)
  • s_type: "uri", "bnode", or "var"
  • o_type: "uri", "bnode", "literal", or "var"
  • o_datatype: a datatype URI
  • o_lang: a language identifier, e.g. "en-us"
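As an illustration, a triple set holding a single triple with a language-tagged literal might look like the sketch below. The subject URI and the literal value are made up for the example; only the key names are taken from the list above, and optional keys such as o_datatype are simply left out when they do not apply.

```php
<?php
// Hypothetical example data; only the key names come from the list above.
$triples = array(
  array(
    's'      => 'http://example.org/john',
    's_type' => 'uri',
    'p'      => 'http://xmlns.com/foaf/0.1/name',
    'o'      => 'John',
    'o_type' => 'literal',
    'o_lang' => 'en-us',
  ),
);

foreach ($triples as $triple) {
  $line = $triple['s'] . ' ' . $triple['p'] . ' ' . $triple['o'];
  // Only literals can carry a language identifier.
  if ($triple['o_type'] === 'literal' && isset($triple['o_lang'])) {
    $line .= ' @' . $triple['o_lang'];
  }
  echo $line . "\n";
}
```

Because each triple is a plain associative array, checks such as isset($triple['o_lang']) are enough to tell whether an optional key is present.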

Variables are generated by ARC's Turtle parser, which was implemented for the SPARQL processor and therefore extends the Turtle spec with features such as single-quoted literals and variables. Variable support has a handy by-product: it can be used for dynamic graph creation.

Resource Indexes

A resource index is an associative array of triples indexed by subject -> predicates -> objects. ARC2's resource indexes are compatible with Talis' RDF/PHP specification.

$index = array(
  '_:john' => array(
    'http://xmlns.com/foaf/0.1/knows' => array(
      '_:bill',
      '_:bob',
      '_:mary',
    ),
  ),
  '_:mary' => ...
);

echo $index['_:john']['http://xmlns.com/foaf/0.1/knows'][0];

ARC supports two index forms. The one above uses flat objects, which can be handy for simplified access operations, but can lead to information loss (e.g. when the object type is not clear, or when a datatype was present in the original triples). The second, slightly extended index structure keeps the object details:

$index = array(
  '_:john' => array(
    'http://xmlns.com/foaf/0.1/knows' => array(
      array('value' => '_:bill', 'type' => 'bnode'),
      array('value' => '_:bob', 'type' => 'bnode'),
      ...
    ),
  ),
);

echo $index['_:john']['http://xmlns.com/foaf/0.1/knows'][0]['value'];

This form is also returned by the DESCRIBE and CONSTRUCT query handlers.
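To make the difference between the two forms concrete, here is a minimal plain-PHP sketch that builds either form from a triple set. The function build_index_sketch() is a hypothetical helper written for this page, not ARC's own implementation, and the 'datatype' and 'lang' object keys are an assumption that follows the RDF/PHP naming.

```php
<?php
// Hypothetical helper, NOT ARC's implementation: builds a resource index
// from a triple set, either flat or with object details.
function build_index_sketch(array $triples, $flat = true) {
  $index = array();
  foreach ($triples as $t) {
    if ($flat) {
      // Flat form: only the object value survives.
      $object = $t['o'];
    } else {
      // Extended form: keep the type and any optional literal details.
      // The 'datatype' and 'lang' key names are assumed (RDF/PHP naming).
      $object = array('value' => $t['o'], 'type' => $t['o_type']);
      if (!empty($t['o_datatype'])) $object['datatype'] = $t['o_datatype'];
      if (!empty($t['o_lang'])) $object['lang'] = $t['o_lang'];
    }
    $index[$t['s']][$t['p']][] = $object;
  }
  return $index;
}

// Made-up sample triples for the demonstration.
$triples = array(
  array('s' => '_:john', 's_type' => 'bnode',
        'p' => 'http://xmlns.com/foaf/0.1/knows',
        'o' => '_:bill', 'o_type' => 'bnode'),
  array('s' => '_:john', 's_type' => 'bnode',
        'p' => 'http://xmlns.com/foaf/0.1/age',
        'o' => '42', 'o_type' => 'literal',
        'o_datatype' => 'http://www.w3.org/2001/XMLSchema#integer'),
);

$flat = build_index_sketch($triples, true);
$full = build_index_sketch($triples, false);
```

In $flat, the age value is just the string '42', so neither its literal type nor its datatype can be recovered; in $full, both are still available next to the value.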

Index and Triple Operations

ARC provides a number of operations for PHP arrays that are compatible with the structures described above, for example:

$triples = $parser->getTriples();
$turtle_doc = $parser->toTurtle($triples);
$json_doc = $parser->toRDFJSON($triples);

$index = ARC2::getSimpleIndex($triples, false); /* false -> non-flat version */
$rdfxml_doc = $parser->toRDFXML($index);

$triples = ARC2::getTriplesFromIndex($index);

$merged_index = ARC2::getMergedIndex($index1, $index2, $index3);
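Since indexes are plain nested arrays, a merge can also be pictured in a few lines of ordinary PHP. The sketch below is only an illustration of the idea: merge_indexes_sketch() is a hypothetical function, and skipping duplicate objects is an assumption of this sketch, not a statement about what ARC2::getMergedIndex() does internally.

```php
<?php
// Hypothetical sketch, NOT ARC's code: merges any number of indexes and
// skips objects already present for a given subject/predicate pair.
function merge_indexes_sketch(array ...$indexes) {
  $merged = array();
  foreach ($indexes as $index) {
    foreach ($index as $s => $predicates) {
      foreach ($predicates as $p => $objects) {
        foreach ($objects as $o) {
          $existing = isset($merged[$s][$p]) ? $merged[$s][$p] : array();
          // Loose comparison treats two object arrays with the same
          // key/value pairs as equal.
          if (!in_array($o, $existing)) {
            $merged[$s][$p][] = $o;
          }
        }
      }
    }
  }
  return $merged;
}

// Made-up sample indexes in the extended (non-flat) form.
$knows = 'http://xmlns.com/foaf/0.1/knows';
$index1 = array('_:john' => array($knows => array(
  array('value' => '_:bill', 'type' => 'bnode'),
)));
$index2 = array('_:john' => array($knows => array(
  array('value' => '_:bill', 'type' => 'bnode'),
  array('value' => '_:bob', 'type' => 'bnode'),
)));

$merged = merge_indexes_sketch($index1, $index2);
echo count($merged['_:john'][$knows]) . "\n";
```

The same function also works on flat indexes, because in_array() compares plain string objects just as well as object arrays.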