[REA-2863] File is too large for PlainTableReader! #88
Comments
The performance results were favorable for switching to the block-based format. https://gerrit.readyset.name/c/readyset/+/5206 https://readyset-workspace.slack.com/archives/C05471W2M44/p1687374146825119
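For context, the switch being benchmarked amounts to swapping RocksDB's table factory from PlainTable to the (default) block-based format. A minimal sketch using the rust-rocksdb crate; the block size and target file size here are illustrative assumptions, not ReadySet's actual configuration:

```rust
use rocksdb::{BlockBasedOptions, Options};

fn main() {
    // Sketch: explicitly select the block-based SST format (RocksDB's
    // default) instead of PlainTable. Values are illustrative only.
    let mut opts = Options::default();
    let mut block_opts = BlockBasedOptions::default();
    block_opts.set_block_size(16 * 1024); // 16 KiB data blocks
    opts.set_block_based_table_factory(&block_opts);
    // Block-based SSTs have no 2 GiB ceiling, but capping file size
    // keeps compactions and backups manageable anyway.
    opts.set_target_file_size_base(256 * 1024 * 1024); // 256 MiB
}
```

Unlike PlainTable, the block-based format keeps its index in blocks inside each file rather than entirely in memory, so it trades a little point-lookup latency for ordinary file-size scaling.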
It's REA-2714 (mentioned above). I'm going to run at least the PlainTable vs. BlockBased performance comparison to validate that this switch is OK.
We had a task slated to benchmark other table formats against PlainTable, for both write and read latency, specifically for this reason. I think @Ethan might have been working on it, but I'm not sure where it ended up.
Thanks. Not much to go by out there, I'm afraid.
Looks like this change was made back in the Noria days, and there isn't much context as to why it was chosen. If I had to take a wild guess, the wording in https://github.com/facebook/rocksdb/wiki/PlainTable-Format ("PlainTable is a RocksDB's SST file format optimized for low query latency on pure-memory or really low-latency media") sounds appealing for a research project focused on low query latency. They also had the advantage of running on smaller datasets, avoiding the "file is too large" problem that real-world scenarios can run into.
@KwilLuke Can you git-blame to see who might have made the choice to use PlainTable instead of the default?
PlainTable is the only RocksDB SST format with this size restriction, and it is a hard limit because the data structure uses 32-bit integers for indexing. It seems like we should be able to configure RocksDB to avoid producing a file greater than the 2 GiB limit during compaction, and from what I can tell we already set it to make files of at most 256 MiB, but something must be at least temporarily creating one that is too large. I'm not exactly sure where yet, and it is slow to reproduce repeatedly. Since we already have a hunch that the BlockBased table format may be a better choice (it is RocksDB's default), I will try using it and confirm that it can handle the failure scenario in this ticket, as well as sanity-check the performance a bit, which will cover some of the matrix of REA-2714.
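The hard ceiling follows directly from the offset width: with 32-bit (signed) offsets, which the 2 GiB figure implies, the largest file PlainTable can index is 2^31 - 1 bytes. A quick sanity check of the arithmetic against the 256 MiB target file size:

```rust
fn main() {
    // PlainTable indexes data with 32-bit offsets, so the largest file
    // it can address is 2^31 - 1 bytes, i.e. just under 2 GiB.
    let max_plain_table_bytes: u64 = (1u64 << 31) - 1;
    let target_file_size: u64 = 256 * 1024 * 1024; // configured 256 MiB cap
    assert!(target_file_size < max_plain_table_bytes);
    // A single compaction output 8x the target already exceeds the limit,
    // consistent with the "temporarily too large" theory above.
    assert!(8 * target_file_size > max_plain_table_bytes);
    println!("{}", max_plain_table_bytes); // 2147483647
}
```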
cc @jasobrown, who I know is interested in RocksDB settings as well.
No, I was going to work on this one first.
We don't directly call this currently, though we do similar options-tweaking to what PrepareForBulkLoad does to improve performance. I tried calling it and didn't see any significant performance improvement.
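For reference, the kind of options-tweaking discussed here (an analogue of the C++ API's Options::PrepareForBulkLoad()) can be sketched with the rust-rocksdb crate roughly as follows; the specific values are illustrative assumptions, not ReadySet's actual settings:

```rust
use rocksdb::Options;

// Rough analogue of C++ Options::PrepareForBulkLoad(): defer compaction
// work and enlarge write buffers while loading. Illustrative values only.
fn bulk_load_options() -> Options {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Skip compactions during the load; compact manually afterwards.
    opts.set_disable_auto_compactions(true);
    // Larger and more numerous memtables absorb the write burst.
    opts.set_write_buffer_size(256 * 1024 * 1024); // 256 MiB
    opts.set_max_write_buffer_number(8);
    opts
}

fn main() {
    let _opts = bulk_load_options();
}
```

Deferring compaction this way is exactly the situation where an uncapped compaction output could later exceed PlainTable's file-size limit, so any such tuning would need to be re-validated against that ceiling.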
So, are you looking at implementing REA-2900 first? I thought we might already be invoking PrepareForBulkLoad in our code. Seems worthwhile for us to try that out first, I agree.
If we implement REA-2900 before this, it may either resolve the problem or change its signature, with compaction failing with the same error after opening PersistentState rather than during it.
Fixed in 8a32b30
Summary
If the snapshotting of a large (~100 GiB) table gets interrupted before compaction finishes (as can easily happen if one forgets to run ulimit -n to allow ReadySet to use more file descriptors than the default 1024), ReadySet panics when re-opening persistent state.
Description
See summary
Expected behavior
ReadySet can open persistent state and complete the compaction of any snapshot table
Actual behavior
ReadySet permanently fails if RocksDB gets into a state where a file is too big to open with PlainTableReader
Steps to reproduce
ReadySet version
Upstream DB type and version
Postgres 14.5
Instance Details
Logs
From SyncLinear.com | REA-2863