Performance Bug: Scan rel is slow when bound node offsets are random #3787

Open
Tracked by #3666
ray6080 opened this issue Jul 9, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@ray6080
Contributor

ray6080 commented Jul 9, 2024

Kùzu version

master

What operating system are you using?

No response

What happened?

Currently, we assume that scans of rel tables are sequential (i.e., bound node offsets are sequential), and we cache the whole node group header (CSR lengths and offsets) ahead of the actual scans. However, in cases such as recursive joins and graph algorithms, the access pattern can be random, and we lose the guarantee of sequential scans.

In these cases, we should take in a batch of bound node offsets (assuming they are all sorted? This is for the caller to decide, but it should be a reasonable guarantee to provide), and have a smarter way of caching only the necessary offsets and lengths from the node group header.
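
To make the idea concrete, here is a rough sketch of caching only the header entries needed for a sorted batch of bound node offsets. All names here (`CSRHeader`, `CSREntry`, `SparseHeaderCache`, `buildCache`) are hypothetical and are not existing Kùzu types. The header is modeled as two in-memory vectors purely for illustration, whereas the real header lives in storage and reading it is the actual cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a node group's CSR header: for node i in the group,
// offsets[i] is where its rel list starts and lengths[i] is how many rels it has.
struct CSRHeader {
    std::vector<uint64_t> offsets;
    std::vector<uint64_t> lengths;
};

// One cached header entry for a single bound node.
struct CSREntry {
    uint64_t nodeOffset; // offset of the bound node within the node group
    uint64_t csrOffset;  // start of the node's rel list
    uint64_t length;     // number of rels in the list
};

// Hypothetical cache that holds entries only for the requested bound nodes.
struct SparseHeaderCache {
    std::vector<CSREntry> entries; // kept sorted by nodeOffset

    // Binary search; returns nullptr if the node was not part of the batch.
    const CSREntry* find(uint64_t nodeOffset) const {
        auto it = std::lower_bound(entries.begin(), entries.end(), nodeOffset,
            [](const CSREntry& e, uint64_t v) { return e.nodeOffset < v; });
        return (it != entries.end() && it->nodeOffset == nodeOffset) ? &*it : nullptr;
    }
};

// Assumes the caller passes sorted offsets (the guarantee discussed above).
// Duplicates and offsets outside the node group are skipped.
SparseHeaderCache buildCache(const CSRHeader& header,
    const std::vector<uint64_t>& sortedBoundOffsets) {
    assert(std::is_sorted(sortedBoundOffsets.begin(), sortedBoundOffsets.end()));
    SparseHeaderCache cache;
    cache.entries.reserve(sortedBoundOffsets.size());
    for (uint64_t off : sortedBoundOffsets) {
        if (off >= header.offsets.size()) {
            continue; // not in this node group
        }
        if (!cache.entries.empty() && cache.entries.back().nodeOffset == off) {
            continue; // duplicate of the previous offset
        }
        cache.entries.push_back({off, header.offsets[off], header.lengths[off]});
    }
    return cache;
}
```

Because the input is sorted, the cache comes out sorted for free, and neighbouring offsets could be coalesced into ranged reads of the header rather than one read per node. Whether that is worthwhile depends on how dense the batch is.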

The optimization should be done entirely inside initializeScanState, without introducing another set of interfaces.

Are there known steps to reproduce?

No response

@ray6080 ray6080 added the bug Something isn't working label Jul 9, 2024
@ray6080 ray6080 mentioned this issue Jul 11, 2024
81 tasks

3 participants