Using ARC's RDF Store

barobba edited this page Jan 25, 2012 · 2 revisions


An ARC Store is instantiated like any other component:

/* ARC2 static class inclusion */
include_once('path/to/arc/ARC2.php'); /* adjust to your ARC2 location */

/* configuration */
$config = array(
  /* db */
  'db_host' => 'localhost', /* optional, default is localhost */
  'db_name' => 'my_db',
  'db_user' => 'user',
  'db_pwd' => 'secret',

  /* store name (= table prefix) */
  'store_name' => 'my_store',
);

/* instantiation */
$store = ARC2::getStore($config);

Creating the MySQL tables

if (!$store->isSetUp()) {
  $store->setUp(); /* creates the store's MySQL tables */
}

Running Queries

$q = 'SELECT ...';
$rs = $store->query($q);
if (!$store->getErrors()) {
  $rows = $rs['result']['rows'];
  /* ... process $rows ... */
}

ARC supports standard SPARQL 1.0 queries as well as SPARQL+ for write operations.

Result formats

By default, the query() method returns an associative array with two keys: "query_time" and "result". The former holds the time the SPARQL engine needed to process the query (excluding parse time). The latter contains a query-dependent sub-structure. query() also accepts a second parameter that selects a different result format. Examples are listed below:

query('SELECT ?fname ...')

// Results:
// $rs['query_time']                  Duration
// $rs['result']['rows']              Rows
// $rs['result']['rows'][0]           First row
// $rs['result']['rows'][1]           Second row
// $rs['result']['rows'][1]['fname']  Value of ?fname in the second row (keyed by SPARQL variable name)

query('SELECT ?fname ...', 'rows')

// Results:
// $rs   Rows
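The two SELECT formats differ only in how deeply the rows are nested. The following sketch shows a helper that extracts one variable's values from either shape.

It uses hand-built arrays in place of real store output, so it runs without ARC2 or MySQL. The function name `selectColumn` is hypothetical and not part of ARC's API. The sketch also assumes that a variable left unbound in a row is simply absent from that row.

```php
<?php
/* Hypothetical helper (not part of ARC2): collect the values of one SPARQL
   variable from a SELECT result in either the default or the 'rows' format. */
function selectColumn($rs, $var) {
  /* the default format nests the rows under $rs['result']['rows'];
     the 'rows' format returns the list of rows directly */
  $rows = isset($rs['result']['rows']) ? $rs['result']['rows'] : $rs;
  $values = array();
  foreach ($rows as $row) {
    /* assumption: an unbound variable is simply missing from the row */
    if (isset($row[$var])) {
      $values[] = $row[$var];
    }
  }
  return $values;
}

/* mock results shaped like the listings above (no database involved) */
$default = array(
  'query_time' => 0.002,
  'result' => array(
    'rows' => array(array('fname' => 'Alice'), array('fname' => 'Bob')),
  ),
);
$rowsOnly = array(array('fname' => 'Alice'), array('fname' => 'Bob'));

echo implode(', ', selectColumn($default, 'fname')), "\n";  // Alice, Bob
echo implode(', ', selectColumn($rowsOnly, 'fname')), "\n"; // Alice, Bob
```

With a real store, `$rs` would come from `$store->query($q)` or `$store->query($q, 'rows')`.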

query('ASK ...')

// Results:
// $rs['query_time']  Duration
// $rs['result']      TRUE or FALSE

query('ASK ...', 'raw')

// Results:
// $rs   TRUE or FALSE
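In the ASK listings above, and likewise for LOAD, DELETE and DESCRIBE below, the default format wraps exactly what 'raw' returns in an array that also carries `query_time`. Code that may receive either shape can therefore unwrap it first.

This is a sketch, assuming that a default-format result always has both a `query_time` and a `result` key. The name `unwrapResult` is hypothetical.

```php
<?php
/* Hypothetical helper (not part of ARC2): return the payload of a query()
   result regardless of whether the default or the 'raw' format was used. */
function unwrapResult($rs) {
  if (is_array($rs) && array_key_exists('query_time', $rs) && array_key_exists('result', $rs)) {
    return $rs['result']; /* default format: drop the timing wrapper */
  }
  return $rs; /* 'raw' format: already the payload */
}

/* mock ASK results in both formats */
$askDefault = array('query_time' => 0.001, 'result' => true);
$askRaw = true;

var_dump(unwrapResult($askDefault)); // bool(true)
var_dump(unwrapResult($askRaw));     // bool(true)
```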


query('DESCRIBE ...')

// Results:
// $rs['query_time']    Duration
// $rs['result']        Index
// $rs['result'][$uri]  Index entry for the described resource $uri

The index format is described in Internal Structures.
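The sketch below walks a DESCRIBE or CONSTRUCT index and flattens it into a list of triples. It assumes the resource index layout is subject URI => predicate URI => list of object arrays, each with at least a 'value' and a 'type' key. Check the Internal Structures page before relying on this.

The function name `indexToTriples` and the example URIs are hypothetical.

```php
<?php
/* Hypothetical helper (not part of ARC2). Assumes the index layout
   subject => predicate => list of objects, where each object is an array
   with at least 'value' and 'type' keys (see Internal Structures). */
function indexToTriples(array $index) {
  $triples = array();
  foreach ($index as $s => $predicates) {
    foreach ($predicates as $p => $objects) {
      foreach ($objects as $o) {
        $triples[] = array('s' => $s, 'p' => $p, 'o' => $o['value'], 'o_type' => $o['type']);
      }
    }
  }
  return $triples;
}

/* mock DESCRIBE payload, e.g. $rs['result'] in the default format */
$index = array(
  'http://example.com/alice' => array(
    'http://xmlns.com/foaf/0.1/name' => array(
      array('value' => 'Alice', 'type' => 'literal'),
    ),
    'http://xmlns.com/foaf/0.1/knows' => array(
      array('value' => 'http://example.com/bob', 'type' => 'uri'),
    ),
  ),
);

/* print one triple per line */
foreach (indexToTriples($index) as $t) {
  echo $t['s'], ' ', $t['p'], ' ', $t['o'], "\n";
}
```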

query('DESCRIBE ...', 'raw')

// Results:
// $rs   Index

query('CONSTRUCT ...') works analogously to DESCRIBE.

query('LOAD ...')

// Results:
// $rs['query_time']                   Duration
// $rs['result']['t_count']            Added triples
// $rs['result']['load_time']          Load time
// $rs['result']['index_update_time']  Index update time

query('LOAD ...', 'raw')

// Results:
// $rs['t_count']            Added triples
// $rs['load_time']          Load time
// $rs['index_update_time']  Index update time

query('INSERT ...') works analogously to LOAD.

query('DELETE ...')

// Results:
// $rs['query_time']                   Duration
// $rs['result']['t_count']            Removed triples
// $rs['result']['delete_time']        Delete time
// $rs['result']['index_update_time']  Index update time

query('DELETE ...', 'raw')

// Results:
// $rs['t_count']            Removed triples
// $rs['delete_time']        Delete time
// $rs['index_update_time']  Index update time
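LOAD, INSERT and DELETE report their statistics under slightly different keys. The following sketch turns either format into a one-line summary, using only the keys listed above (`t_count`, `load_time`, `delete_time`).

The name `writeSummary` and the mock numbers are hypothetical.

```php
<?php
/* Hypothetical helper (not part of ARC2): summarize a LOAD, INSERT or DELETE
   result, accepting both the default and the 'raw' format shown above. */
function writeSummary($rs) {
  /* the default format nests the statistics under 'result' */
  $r = (is_array($rs) && isset($rs['result']) && is_array($rs['result'])) ? $rs['result'] : $rs;
  $t_count = isset($r['t_count']) ? (int) $r['t_count'] : 0;
  /* DELETE reports 'delete_time'; LOAD and INSERT report 'load_time' */
  if (isset($r['delete_time'])) {
    return sprintf('%d triples removed in %.3fs', $t_count, $r['delete_time']);
  }
  $load = isset($r['load_time']) ? $r['load_time'] : 0;
  return sprintf('%d triples added in %.3fs', $t_count, $load);
}

/* mock results: a default-format LOAD and a raw DELETE */
$loadDefault = array(
  'query_time' => 0.6,
  'result' => array('t_count' => 120, 'load_time' => 0.5, 'index_update_time' => 0.05),
);
$deleteRaw = array('t_count' => 7, 'delete_time' => 0.25, 'index_update_time' => 0.01);

echo writeSummary($loadDefault), "\n"; // 120 triples added in 0.500s
echo writeSummary($deleteRaw), "\n";   // 7 triples removed in 0.250s
```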

query('DUMP') creates (and outputs) a store backup (see the dump() method below); the result format parameter has no effect.

Advanced query parameters

Besides the query and the result format, the query() method accepts two more parameters: query_base and keep_bnode_ids. "query_base" (parameter #3, default: empty) lets you specify a base URI for the query (e.g. if the query contains relative URIs but no BASE declaration).

"keep_bnode_ids" (parameter #4, default: false) is an advanced flag that enables deletes and updates of blank nodes. ARC supports bnode identification for read operations: bnode IDs returned by a SELECT can be used in subsequent queries if they are masked as URIs (e.g. <_:bn27>). With this flag set, ARC likewise writes bnodes to the store without changing their IDs:

      $q1 = 'DELETE FROM <...> { <_:methuselah> ex:age ?age . }';
      $q2 = 'INSERT INTO <...> { <_:methuselah> ex:age 969 . }';
      $store->query($q1, 'raw', '', true);
      $store->query($q2, 'raw', '', true);
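Reusing a bnode from a SELECT result means masking its ID as a URI before splicing it into the next query. The following sketch assumes that each SELECT row carries a "<var> type" entry ('uri', 'bnode' or 'literal') next to the value.

The function `rowTerm` and the graph URI are hypothetical, and literals with language tags or datatypes are not handled.

```php
<?php
/* Hypothetical helper (not part of ARC2): turn a SELECT row value into a term
   usable in a follow-up query. Assumes rows carry a "<var> type" entry
   ('uri', 'bnode' or 'literal') next to each value. */
function rowTerm(array $row, $var) {
  $value = $row[$var];
  $type = isset($row[$var . ' type']) ? $row[$var . ' type'] : 'literal';
  if ($type === 'uri' || $type === 'bnode') {
    /* URIs and bnode IDs are wrapped in angle brackets, e.g. <_:bn27> */
    return '<' . $value . '>';
  }
  /* plain literal only (no language tag or datatype); escape \ and " */
  return '"' . addcslashes($value, "\\\"") . '"';
}

/* a mock row as a SELECT ?s ... might return it */
$row = array('s' => '_:bn27', 's type' => 'bnode');
$q = 'DELETE FROM <http://example.com/graph> { ' . rowTerm($row, 's') . ' ?p ?o . }';
echo $q, "\n";
// DELETE FROM <http://example.com/graph> { <_:bn27> ?p ?o . }
```

The resulting query would then be run with keep_bnode_ids set to true, as in `$store->query($q, 'raw', '', true)`, so that ARC does not rename the bnode.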

Other methods

reset() All tables are emptied.

drop() All tables are deleted.

insert($doc, $g, $keep_bnode_ids = 0) A convenience method. $doc can be an ARC structure or a document in any RDF format ARC can parse (including HTML), $g is the target graph URI, and $keep_bnode_ids is explained in the paragraph above.

dump() Creates a SPOG document from all quads in the store. This method can be used for streamed store backups.

createBackup($path, $q = '') Saves a SPOG file that either contains a complete store dump, or triples/quads from a custom, SPO(G)-compliant SELECT query (via the $q parameter).

replicateTo($name) Creates a new store and replicates all tables and quads to it.

renameTo($name) Renames the store's underlying database tables.

optimizeTables($level = 2) /* 1: triple + g2t, 2: triple + *2val, 3: all tables */ Defragments the MySQL tables. This method is called automatically after roughly every 50th LOAD or DELETE query. You can also call it explicitly when store updates have made queries slower than they should be.

extendColumns() Changes the table column types from MEDIUMINT to INT for scaling beyond 16M triples. Called automatically by RDF loaders.

Advanced configuration options

store_indexes (default: array('sp (s,p)', 'os (o,s)', 'po (p,o)')) Custom MySQL triple table indexes.

store_write_buffer (default: 2500) This option lets you set the batch size, in triples, used when writing to the MySQL tables via SQL.

store_engine_type (default: MyISAM) This option lets you set the MySQL engine type used by ARC, in case your application environment works better with InnoDB, or maybe even MEMORY.

store_strip_mb_comp_str (default: false) If you encounter UTF-8/multibyte-related MySQL errors on your system during INSERTs or LOADs, you can try setting this flag to "1". Multibyte comparisons may then return inaccurate results, but the errors should go away.

max_errors (default: 25) This option lets you set the maximum number of errors after which ARC stops processing (e.g. during LOADs or streaming parsing).
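The advanced options go into the same configuration array used for instantiation. The following sketch shows one possible combination.

The values are illustrative placeholders, not tuning recommendations. Only the option names and documented defaults come from the list above.

```php
<?php
/* Sketch of a store configuration overriding the advanced options above.
   Credentials and chosen values are placeholders, not recommendations. */
$config = array(
  /* db */
  'db_name' => 'my_db',
  'db_user' => 'user',
  'db_pwd' => 'secret',

  /* store name (= table prefix) */
  'store_name' => 'my_store',

  /* advanced options */
  'store_indexes' => array('sp (s,p)', 'os (o,s)', 'po (p,o)'), /* the documented default */
  'store_write_buffer' => 5000,    /* write triples in batches of 5000 instead of 2500 */
  'store_engine_type' => 'InnoDB', /* instead of the default MyISAM */
  'store_strip_mb_comp_str' => 0,  /* set to 1 only if multibyte errors occur */
  'max_errors' => 50,              /* tolerate more errors before stopping */
);

/* with ARC2 included, the store would be created as before:
   $store = ARC2::getStore($config); */
```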

Querying remote SPARQL endpoints

ARC provides a dedicated "RemoteStore" component for running queries against Web-accessible SPARQL endpoints.