Slow query #142
Comments
For me the query terminates in 35 seconds using one core (which is not too bad given the 20M offset). Could this be an issue with Fuseki rather than HDT?
@wouterbeek did you run the query with Fuseki or with something else?
If there is a problem, it is more likely with hdt-java, since Fuseki delegates execution AFAIK. Fuseki doesn't deal with HDT files at all.
Just ran another test; killed it after 15 minutes of running at 100% CPU, with the Java VM using all 6G of available RAM.
@v4ss4llo I ran the query using hdt-cpp. Unfortunately I have no experience with Fuseki, but this does not seem to be an hdt-cpp issue.
What query/tool did you run exactly? As far as I know, the only search tool in hdt-cpp is hdtSearch, which only accepts triple patterns like "? ? ?"
I use the hdt-cpp API, not the command-line scripts, although it should work with the command-line scripts as well if you implement limit and offset in Bash.
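To illustrate what "implementing limit and offset" on top of a plain triple-pattern search amounts to, here is a minimal Python sketch. It does not call hdt-cpp or hdtSearch; `fake_matches` is a hypothetical stand-in for whatever stream of matching triples the search produces, and `page` is an illustrative helper name. Note that skipping the offset still walks past every skipped match, so the cost grows with the offset.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def page(matches: Iterable[Triple], offset: int, limit: int) -> Iterator[Triple]:
    """Skip the first `offset` matches, then yield at most `limit` of them."""
    return islice(matches, offset, offset + limit)


def fake_matches(n: int) -> Iterator[Triple]:
    """Hypothetical stand-in for a stream of triples matching a pattern."""
    for i in range(n):
        yield (
            f"http://example.org/item/{i}",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://wikiba.se/ontology-beta#Item",
        )


# Equivalent in spirit to "limit 10 offset 20" over 100 matches.
rows = list(page(fake_matches(100), offset=20, limit=10))
for s, _, _ in rows:
    print(s)
```

With a real search tool, the same idea would apply to its output stream: discard the first `offset` results and stop after `limit` more.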
Further information from the OP: that isn't the query they were using. When the correct query is used, there is no issue.
I tried loading the latest Wikidata dump with Fuseki, using -Xmx6g for the Java VM. Once loaded, the resident memory is less than 3G.
Then I ran this query:
select ?s where { ?s a <http://wikiba.se/ontology-beta#Item> } limit 10 offset 20000000
but eventually I killed it after several minutes because it wouldn't return. While the query was executing, all 4 of my CPU cores were at 100%, Java was using all the allocated RAM (6G), and there were no disk reads or writes at all.
I'd like to understand why this query was so slow that it never returned anything. If HDT has an OPS index, shouldn't this query resolve almost immediately? What's going on under the hood of HDT? Or is it because I need enough RAM to hold the entire 28GB index in memory (though even then, searching an OPS index from disk shouldn't take forever)?
Thank you, waiting for any insights.