Avoid exception if fail to insert block to block-cache #8670
Comments
cc @zhichao-cao, @ltamasi, @akankshamahajan15, who might be interested in this topic.
Thanks for the suggestion. So there would be a new error code in `Status`, such as `Status::FailInsertCache()`?
@zhichao-cao Yes, this is just a proposal; I hope to get your feedback on whether it is practicable.
(Correction: `LRUCacheOptions`.) My concern is that if you are setting `strict_capacity_limit=true` on `Cache`, you are probably interested in limiting the amount of memory used to hold SST blocks. But under your proposal, iterators and active `Get()` operations are allowed to allocate as much as they want beyond this limit, because they ignore failures to insert into the `Cache`. This is worse than setting `strict_capacity_limit=false`, because at least with `strict_capacity_limit=false` you can get cache hits on everything that is already in memory.
It is a good question, though: since the allocation and read already happened, does it make sense to return the value? I think it's not that straightforward. When using the `PinnableSlice` APIs, we don't just fill a value buffer for the caller: the whole underlying block comes with it. And this is amplified in `MultiGet()` or iterators. One alternative we've been discussing is to pre-reserve cache memory before doing the allocation and read. That would prevent the case where we do an allocation and read only to throw the result away and fail the operation. But we would still fail the operation; it would just be an earlier, more efficient way to fail.
Thanks for your feedback! As I understand it, RocksDB's current memory-control approach cannot control memory very well, just as you pointed out: it first lets the allocation happen and only afterwards decides whether to free the block when insertion into the cache fails. This is not perfect, but it is better than no memory control at all. I agree that RocksDB should be refactored to account for memory before allocating; however, that might take a while, and I don't know whether it is already on the current roadmap. On the other hand, the Apache Flink community depends heavily on this feature in container environments, and making the cache limit strict would help improve memory control. Given the current state, I think making `strict_capacity_limit=true` work without failing the get operation could be the low-hanging fruit, and it does not conflict with the long-term design. What do you think? @ajkr @pdillinger @zhichao-cao BTW, I think waiting for the LRU cache to evict entries before failing the operation might be something better for long-running applications.
I think the big complication is that we can't always predict the size needed for the decompressed block. Note: there are two other "lookup with intent to insert" cases we could consider solving along with this.
I thought the size of compressed blocks is typically known -- that is "compression format 2", which writes the size at the beginning. The size to which an uncompressed block will compress is, however, unknown. I wonder if having every `Cache` carry a `MemoryAllocator`, and having a smarter allocator (one that allocated a `Handle` and could support "reallocate"), would be helpful...
That doesn't help you predict the uncompressed size when all you have is a BlockHandle to the compressed block. |
…void insertion exception on cache full

### What changes were proposed in this pull request?
Disable strict limit for RocksDB write manager to avoid insertion exception on cache full

### Why are the changes needed?
In some cases, if the memory limit is reached, on insert/get we see the following exception:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 42 in stage 9.0 failed 4 times, most recent failure: Lost task 42.3 in stage 9.0 (TID 2950) (96.104.176.55 executor 0): org.rocksdb.RocksDBException: Insert failed due to LRU cache being full.
	at org.rocksdb.RocksDB.get(Native Method)
	at org.rocksdb.RocksDB.get(RocksDB.java:2053)
	at org.apache.spark.sql.execution.streaming.state.RocksDB.get(RocksDB.scala:299)
	at org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider$RocksDBStateStore.get(RocksDBStateStoreProvider.scala:55)
```

It seems this is thrown with the strict memory limit within RocksDB here: https://github.com/facebook/rocksdb/blob/0fa0c97d3e9ac5dfc2e7ae94834b0850cdef5df7/cache/lru_cache.cc#L394

It seems this issue can only happen in strict mode, as described here: facebook/rocksdb#5048 (comment)

There is a pending RocksDB issue around this as well: facebook/rocksdb#8670

There is probably a relevant fix, but it is not clear whether it addresses the issue completely: facebook/rocksdb#6619 (cc - siying)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #42567 from anishshri-db/task/SPARK-44878.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Expected behavior
If inserting a block into the block cache fails, just abort the insert but return the result as normal.
Actual behavior
If we use default `ReadOptions` with a `strict_capacity_limit` block cache, RocksDB throws an exception: `Insert failed due to LRU cache being full`.

Steps to reproduce the behavior
This is because the current `RetrieveBlock` checks the status of `MaybeReadBlockAndLoadToCache`:

And if the insert fails, the block cache reports an `Incomplete` status:
As more and more applications move to the cloud, fine-grained memory control is required in many use cases. I think enabling the strict capacity limit of the block cache in production environments would be valuable.
How about this solution:

1. Add a new status code, `FailInsertCache`.
2. Add a new option, `fill_cache_if_possible`, to `ReadOptions`: if a block cannot be inserted into the cache, the `FailInsertCache` status is not treated as a problem and the read process continues as normal.

What do you think of this problem and the proposed solution?