Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iter and range methods don't work #7

Open
troad456 opened this issue Aug 7, 2023 · 5 comments
Open

Iter and range methods don't work #7

troad456 opened this issue Aug 7, 2023 · 5 comments

Comments

@troad456
Copy link

troad456 commented Aug 7, 2023

Thanks for this excellent librarry!

I noticed that the iter and range methods don't work currently because only the leaf nodes are being iterated over, and not the inner nodes. Just curious if you plan to resolve this

@rdaum
Copy link
Owner

rdaum commented Aug 10, 2023

Thanks for poking at this and finding that. It doesn't surprise me they are broken, unfortunately; they've been on my list of things to poke at for a while because I'm not happy with the way they're implemented. Right now they'll do a bunch of excessive copies during iteration, and defeats half the purpose of the ART structure, which is fast sequential operations.

I would definitely like to fix this, when I get the time, not likely to be this week.

In the meantime, if you have a reproduction case, even a small fragment of code, feel free to provide. Regression tests more than welcome.

@rdaum
Copy link
Owner

rdaum commented Aug 18, 2023

So I've fixed .iter(), but .range() remains broken for now, pending more serious surgery.

The TLDR is that there's issues with the comparison of key values as passed in to .insert, .range, etc. and with the Partial representations that end up stored in the tree. When iterating we have to reconstruct the key bytes based on the various partials in the tree. With range the reconstructed keys need to be compared to the begin/end keys passed in for the Range, so that we can determine if we've hit the end (or start) of the range. Right now this is broken.

At some point hopefully soon I am going to rework how keys are produced, and change it so that users have to pass in a couple functors for key->bytes and partials->key conversion when making the tree. I think this will allow me to fix the range stuff properly.

There was also an issue with iteration and sort order. At one point I changed one of the child node mappings so that it wasn't internally sorted -- but instead by insertion order -- which had a fairly minor but real performance improvement. But it means that the tests that compared iteration order in the ART vs iteration order in a BTree were unstable. I've switched back to the SortedKeyedMapping for now, but will likely make it an option for the user as soon as I can get around to that refactoring.

@grovesNL
Copy link

grovesNL commented Jun 26, 2024

@rdaum Do you have a sense for how complicated the key rework might be? I have a use case that might be a good fit for rart but I can't benchmark it without range, so I was wondering if I might be able to help somehow.

@rdaum
Copy link
Owner

rdaum commented Jun 26, 2024 via email

@grovesNL
Copy link

That sounds interesting, thank you! I'll take a look.

Maybe I could explain my use case in case you have any other ideas - maybe I could skip range somehow. I currently have a BTreeMap<u32, u32> but it's causing performance problems on range queries/insertion/removal (especially range queries, so some trade-off would probably be ok).

I'm interested in some kind of radix tree to represent the same thing but with fast mutable range searches, e.g., "give me all keys bounded by a range, maybe remove some nodes, then insert some more nodes" My thought was to write my own tree and organize the data around SIMD lanes somehow to accelerate the search, but it looks like rart already does most of that so maybe I could help get it to the point I'd like.

For what it's worth, I also know roughly how the keys will be distributed (over 0 to u32::MAX) and how far queries would span over those shards, so it's possible to bucket/shard them across multiple trees if that could help in some way (e.g., rart per shard and linear scans, although maybe that's a bad idea).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants