Slow query #142

v4ss4llo · 2017-12-14T06:59:03Z

I loaded the latest Wikidata dump with Fuseki, using -Xmx6g for the Java VM. Once loaded, the resident memory is less than 3G.
Then I ran this query: select ?s where { ?s a <http://wikiba.se/ontology-beta#Item> } limit 10 offset 20000000 but I killed it after several minutes because it would not return. While the query was running, all 4 of my CPU cores were at 100%, Java was using all the allocated RAM (6G), and there were no disk reads or writes at all.
I'd like to understand why this query was slow to return... so slow, in fact, that it didn't return anything. If HDT has an OPS index, shouldn't this query resolve almost immediately? What's going on under the hood of HDT? Or is it maybe because I need enough RAM to hold the entire 28GB index in memory (but even in that case, searching an OPS index from disk shouldn't take forever...)?

Thank you, waiting for any insights.

wouterbeek · 2017-12-14T09:28:34Z

For me the query terminates in 35 seconds using one core (which is not too bad given the 20M offset). Could this be an issue with Fuseki rather than HDT?

v4ss4llo · 2017-12-14T09:40:11Z

@wouterbeek did you run the query with Fuseki or with something else?

v4ss4llo · 2017-12-14T09:50:12Z

Could this be an issue with Fuseki rather than HDT?

If there is a problem, it is more likely to be with hdt-java, since Fuseki delegates execution AFAIK. Fuseki doesn't deal with HDT files at all.

v4ss4llo · 2017-12-14T09:59:53Z

Just ran another test and killed it after 15 minutes. It was running at 100% CPU, with the Java VM using all 6G of available RAM.

wouterbeek · 2017-12-14T10:11:41Z

@v4ss4llo I ran the query using hdt-cpp. Unfortunately I have no experience with Fuseki, but this does not seem to be an hdt-cpp issue.

v4ss4llo · 2017-12-14T11:39:04Z

What query/tool did you run exactly? As far as I know, the only search tool in hdt-cpp is hdtSearch, which only accepts triple patterns like "? ? ?"

wouterbeek · 2017-12-14T15:14:47Z

I use the hdt-cpp API, not the command-line scripts, although it should work with the command-line scripts as well if you implement limit and offset in Bash.
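As a rough illustration of the limit/offset handling mentioned above, here is a minimal Python sketch. It assumes the matches for the triple pattern arrive as a plain iterator; `fake_matches` and `limit_offset` are hypothetical names made up for this example and are not part of the hdt-cpp or hdt-java API. It only shows that a naive offset skips results one by one, so the cost grows with the offset.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WIKIBASE_ITEM = "http://wikiba.se/ontology-beta#Item"


def fake_matches(n: int) -> Iterator[Triple]:
    """Hypothetical stand-in for an iterator over '?s a wikibase:Item' matches."""
    for i in range(n):
        yield (f"http://www.wikidata.org/entity/Q{i + 1}", RDF_TYPE, WIKIBASE_ITEM)


def limit_offset(results: Iterable[Triple], limit: int, offset: int) -> List[Triple]:
    """Discard the first `offset` results, then keep at most `limit` of them."""
    # islice consumes the skipped items one at a time; it does not jump ahead.
    return list(islice(results, offset, offset + limit))


if __name__ == "__main__":
    # Small numbers for the demo; the query in this thread used offset 20000000.
    for subject, _, _ in limit_offset(fake_matches(1_000), limit=10, offset=200):
        print(subject)
```

If the underlying iterator supported jumping directly to a position, the skip could be much cheaper; whether a given HDT binding offers that is not something this sketch assumes.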

afs · 2017-12-17T15:49:35Z

Further information from the OP: that isn't the query they were using. When the correct query is used, there is no issue.

v4ss4llo mentioned this issue Dec 14, 2017

Slow query rdfhdt/hdt-java#65

Closed

wouterbeek closed this as completed Dec 14, 2017
