Slow query #142

Closed
v4ss4llo opened this issue Dec 14, 2017 · 8 comments

v4ss4llo commented Dec 14, 2017

I've tried to load the latest Wikidata dump with Fuseki using -Xmx6g for the Java VM. Once loaded, the resident memory is less than 3G.
Then I ran this query: select ?s where { ?s a <http://wikiba.se/ontology-beta#Item> } limit 10 offset 20000000 but eventually I killed it after several minutes because it wouldn't return. While the query was running, all 4 of my CPU cores were at 100%, Java was using all the allocated RAM (6G), and there were no disk reads or writes at all.
I'd like to understand why this query was so slow, so slow in fact that it never returned anything. If HDT has an OPS index, shouldn't this query resolve almost immediately? What's going on under the hood of HDT? Or is it maybe because I need enough RAM to hold the entire 28GB index in memory (but even in that case, searching an OPS index from disk shouldn't take forever...)?

Thank you, waiting for any insights.

@wouterbeek
Contributor

For me the query terminates in 35 seconds using one core (which is not too bad given the 20M offset). Could this be an issue with Fuseki rather than HDT?

@v4ss4llo
Author

@wouterbeek did you run the query with Fuseki or with something else?

@v4ss4llo
Author

> Could this be an issue with Fuseki rather than HDT?

It's more likely that, if there is a problem, it is with hdt-java, since Fuseki delegates execution AFAIK. Fuseki doesn't deal with HDT files at all.

@v4ss4llo
Author

Just ran another test, killed it after 15 minutes running at 100% CPU and JavaVM using all 6G RAM available.

@wouterbeek
Contributor

@v4ss4llo I ran the query using hdt-cpp. Unfortunately I have no experience with Fuseki, but this does not seem to be an hdt-cpp issue.

@v4ss4llo
Author

What query/tool did you run exactly? As far as I know, the only search tool in hdt-cpp is hdtSearch, which only accepts triple patterns like "? ? ?"

@wouterbeek
Contributor

I use the hdt-cpp API, not the command-line scripts, although it should work with the command-line scripts as well if you implement limit and offset in Bash.
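
A minimal sketch of what limit and offset in Bash could look like, assuming the search tool prints one matching triple per line on standard output. The function name limit_offset is hypothetical and not part of hdt-cpp; it only post-processes a line stream, so any header or prompt lines the tool prints would need to be filtered out first.

```shell
#!/usr/bin/env bash
# Emulate SPARQL "LIMIT <limit> OFFSET <offset>" on a stream of result lines.
# limit_offset is a hypothetical helper, not something shipped with hdt-cpp.
limit_offset() {
  local limit="$1" offset="$2"
  # tail -n +K starts printing at line K (1-based), which skips `offset` lines;
  # head -n then stops after `limit` lines.
  tail -n "+$((offset + 1))" | head -n "$limit"
}
```

For the query in this issue, the matches for the pattern would be piped through limit_offset 10 20000000. Note that this still enumerates the first 20 million matches before printing anything, so a large offset stays expensive no matter which tool produces the stream.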

afs commented Dec 17, 2017

Further information from the OP: that isn't the query they were using. When the correct query is used, there is no issue.
