ObdalibQuestPerformanceTuning

juliencorman edited this page Aug 8, 2017 · 8 revisions
Clone this wiki locally

valid for Ontop Version 1

Performance Tuning

THIS PAGE HAS BEEN SUPERSEDED BY MappingDesignTips. Please refer to the latter for updated tips.

In order to get the best performance, we recommend the following general tips by categories:

Queries:

  • Avoid complex queries, SELECT-PROJECT-JOIN queries work best.
Mappings:
  • Avoid unnecessary joins in the source query. They are the main performance issue, since they can slow down queries up to 100 times.
  • Use primary keys as part of the URI templates if possible.
  • Do not use UNION in SQL, use multiple mappings instead.
  • Try to use a minimal number of mappings.
Database:
  • Define primary and foreign keys in your database.
  • Memory: set your database to use around 2/3 of available memory.
  • Disks: in most cases they are the bottlenecks of database performance. If possible put your data on a faster disk.
  • Indexes and Keys: create indexes in your SQL tables for columns that are used in JOINs and or WHERE clauses. Quest uses optimizations based mainly on primary keys. Indexes speed queries a lot. Use indexes where possible (when your data is more or less static, there are few inserts, or space is not an issue).
Benchmarking:
  • System warmup: for the best performance allow the system to "warm up". First queries are always slower, because of connections to database, initializations, etc...
In particular, when you have a set of queries on which you would like to obtain maximal performance, here are the things that you can do:
  • analyze your queries, check which tables they use, see if the database uses indexes on the selected columns. This you can do by analyzing the query plan. Copy the generated SQL query, prefix it with the keyword "EXPLAIN", and execute it over an SQL querying interface over the database. It will give you information about which indexes and keys it uses, and from here you can get a hint if you need to create a new, possibly combined index on the table columns.
Example: having a database table about train logistics event (iLog scenario)
ituevent ID ITUEventNumber eventTime terminalID trainID ITUID ITUEventTypeID

and the query:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://www.semanticweb.org/ontologies/2011/10/iLogOntology.owl#>
SELECT DISTINCT ?terminal  ?itu
WHERE {
?event a :ITUEvent;
 :takesPlaceAt ?terminal;
 :happensToITU ?itu;
 :hasEventTime ?time.
FILTER(?time < "2011-11-12T00:00:00.000Z"^^xsd:dateTime)}
ORDER BY ?terminal

Although terminalID, trainID, ITUID and ITUEventTypeID are foreign keys, and therefore are indexed automatically by the database, the use of these indexes does not result in the best performance. A new, combined index is added on the columns: eventTime, terminalID, ITUID.

In order to analyze the query plan, we execute the explain command on the resulting SQL string:

EXPLAIN
SELECT *
FROM (
SELECT DISTINCT 
   1 AS "terminalQuestType", NULL AS "terminalLang", CAST(CONCAT('http://www.semanticweb.org/ontologies/2011/10/iLogOntology.owl#terminal/',(CAST(QVIEW1.<tt>terminalID</tt> AS CHAR))) AS CHAR) AS <tt>terminal</tt>, 
   1 AS "ituQuestType", NULL AS "ituLang", CAST(CONCAT('http://www.semanticweb.org/ontologies/2011/10/iLogOntology.owl#itu/',(CAST(QVIEW1.<tt>ITUID</tt> AS CHAR))) AS CHAR) AS <tt>itu</tt>
 FROM 
<tt>ituevent</tt> QVIEW1
WHERE 
QVIEW1.<tt>ID</tt> IS NOT NULL AND
QVIEW1.<tt>terminalID</tt> IS NOT NULL AND
QVIEW1.<tt>ITUID</tt> IS NOT NULL AND
QVIEW1.<tt>eventTime</tt> IS NOT NULL AND
(QVIEW1.<tt>eventTime</tt> < '2011-11-12 00:00:00+00:00')
) SUB_QVIEW
ORDER BY SUB_QVIEW.<tt>terminal</tt>
  • analyze the generated UCQ and SQL for unneccessary joins. First, the length of the query can make you suspicious that there are unnecessary fragments. Analyze your mappings, to see that for each class, object and data property you have only one mapping as a source of materialization, if possible. Be sure not to use joins inside the mappings.
  • keep a minimal number of mappings
Example: having three tables
Book Author Editor
bid aid eid
title name name
authorID birth_date
edition death_date
editorID nationality

The table Book contains the foreign keys authorID and editorID linking to other tables Author and Editor. When constructing the mappings, the URI-s contain the primary keys, i.e. the id-s of each table. One can make for the table Books two mappings as follows:

mappingId       Books
target		:book/{bid} a :Book; :title {title} . 
source		select bid, title from Book

mappingId       Book has Author
target		:book/{bid} :author :author/{authorid}/
source		select bid, authorID from Book

OR one could compact the two mappings into one as follows:

mappingId       Books
target		:book/{bid} a :Book; :title {title}; :author :author/{authorid} . 
source		select bid, title, authorID from Book
  • when your database is properly set up with primary keys, foreign keys and indexes, and you have the clearest set of mappings, you can do the following: increase database memory buffer to almost maximum, warm up (run the query several times) before you record the performance values.
  • use "DISTINCT" and "ORDER BY" in the SPARQL query only when necessary. It slows down performance significantly with large result sets.