Larger Hash index Splitting #2939

Open
benjaminwinger opened this issue Feb 23, 2024 · 0 comments

It might make sense to split a full page of slots (4096/256 = 16) each time we split. That would usually not increase the number of pages touched and should not add significant overhead, but it could help somewhat in reducing overall I/O. It would not reduce the number of WAL page pins/unpins, since those are done individually, though maybe we could do a single read and a single write for the split instead.

E.g. for individual CREATE statements with 8-byte keys (14 entries per slot): once we reach 66% of capacity we split about once every 9 insertions (14 × 0.66 ≈ 9.2), and each split writes to two pages (in addition to the page containing the slot which holds the new key). Since slots are split incrementally, 16 splits in a row occur within the same two pages (ignoring overflow slots, which reduce the benefit since they aren't grouped together on disk).
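To make the "same two pages" observation concrete, here is a rough sketch of the arithmetic. It assumes textbook linear hashing, where splitting slot `i` in a round that started with `N` slots creates slot `i + N`, and that `N` is a multiple of the slots per page; the function names and the 2/3 load factor constant are illustrative, not taken from the actual index code.

```python
# Illustrative constants taken from the figures quoted in this issue.
PAGE_SIZE = 4096
SLOT_SIZE = 256
ENTRIES_PER_SLOT = 14      # 8-byte keys
LOAD_FACTOR = 2 / 3        # assumption: the "66% of capacity" threshold

SLOTS_PER_PAGE = PAGE_SIZE // SLOT_SIZE  # 16


def inserts_per_split() -> float:
    """Average number of insertions between two single-slot splits."""
    return ENTRIES_PER_SLOT * LOAD_FACTOR


def pages_touched_by_split(split_slot: int, slots_at_round_start: int) -> tuple[int, int]:
    """Pages holding the source slot and the newly created slot.

    Assumes standard linear hashing: slot i splits into i and i + N,
    where N is the number of slots at the start of the current round.
    """
    src_page = split_slot // SLOTS_PER_PAGE
    dst_page = (split_slot + slots_at_round_start) // SLOTS_PER_PAGE
    return src_page, dst_page


if __name__ == "__main__":
    print(f"slots per page: {SLOTS_PER_PAGE}")
    print(f"inserts per split: {inserts_per_split():.2f}")
    # With 64 slots at the start of the round, the first 16 splits
    # all read from page 0 and write the new slots into page 4.
    pairs = {pages_touched_by_split(i, 64) for i in range(SLOTS_PER_PAGE)}
    print(f"page pairs touched by the first 16 splits: {pairs}")
```

Under those assumptions, every run of 16 consecutive splits reads from one source page and fills one destination page, which is what makes a page-sized split attractive.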

It's probably not huge in terms of overall writes, though. Two extra pages written every 9 CREATEs means 20 pages written in total instead of 18 (without rehashing, each insert should modify just two pages: the header and the slot being inserted into; more for long strings). Splitting a full page at a time would make that two extra pages every ~149 inserts instead. I suspect that, in terms of real-world performance, bypassing the disk array when doing the split and updating the slots just once might have more of an impact (though we can do that regardless).
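The write estimate can be reproduced with a back-of-the-envelope model. This is only a sketch under the assumptions stated in this issue (two pages per insert, two pages per split, no overflow slots or long strings); the constant and function names are made up for illustration.

```python
# Back-of-the-envelope model of page writes caused by splitting.
ENTRIES_PER_SLOT = 14   # 8-byte keys
LOAD_FACTOR = 2 / 3     # assumption: the "66% of capacity" threshold
SLOTS_PER_PAGE = 16     # 4096 / 256
PAGES_PER_INSERT = 2    # header page + page of the target slot
PAGES_PER_SPLIT = 2     # source page + destination page


def inserts_between_splits(slots_per_split: int) -> float:
    """Insertions needed before the next split of `slots_per_split` slots."""
    return slots_per_split * ENTRIES_PER_SLOT * LOAD_FACTOR


def split_write_overhead(slots_per_split: int) -> float:
    """Extra page writes from splitting, relative to the insert writes."""
    insert_writes = inserts_between_splits(slots_per_split) * PAGES_PER_INSERT
    return PAGES_PER_SPLIT / insert_writes


if __name__ == "__main__":
    for n in (1, SLOTS_PER_PAGE):
        print(
            f"{n:2d} slot(s) per split: one split every "
            f"{inserts_between_splits(n):.0f} inserts, "
            f"overhead {split_write_overhead(n):.1%}"
        )
```

In this model the split overhead drops from roughly a tenth of the insert writes to under one percent, which matches the intuition that the saving is real but small compared with the per-insert writes.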
