
LRUCache strict_capacity=true #5048

Open
toktarev opened this issue Mar 7, 2019 · 8 comments


toktarev commented Mar 7, 2019

We are not having any problems since I started using RocksDB the way I described many times in this thread. Our read/write load is very high and the DB size is in the terabytes. Still, we are able to handle that with a limited Docker instance with a 5 GB RAM limit, including our Java app, which is itself quite hungry. My guess is that RocksDB is taking at most 2 GB. Everything has been running perfectly stable for months.

What I found works best for us:

  1. Set max_open_files! Yes, it is a must.
  2. Two-level index.
  3. Shared and strictly limited caches.
  4. jemalloc. It helps to keep memory stable. We have a quite aggressive setup, but still no noticeable performance impact.
  5. Direct I/O for sequential scans.
  6. Custom patches (Java specific).

All of these basically help both the system and the library, and it works perfectly. I use a modified 5.12.5. Unfortunately, patches that also help a lot under high load (Java direct buffers) were not merged; without them the JVM suffers from memory hot spots.

Anyway, rocksdb team, this is great work. I love this library.

Originally posted by @koldat in #4112 (comment)
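
For illustration, here is a rough sketch of how the first three settings listed above (max open files, two-level index, a shared strictly limited cache) might be wired up with the RocksDB Java API. The sizes, shard bits, and DB path are assumptions for the example, and `setBlockCache(Cache)` may require a newer Java binding than the modified 5.12.5 mentioned in the quote; jemalloc and direct I/O are environment/read-path settings and are not shown.

```java
import org.rocksdb.*;

public class LimitedMemorySetup {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();

    // Shared, strictly limited block cache: 512 MB, 2^6 shards, strict_capacity_limit = true.
    // Passing the same Cache instance to every DB / column family keeps the limit global.
    try (Cache sharedCache = new LRUCache(512L * 1024 * 1024, 6, true)) {

      BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
          .setIndexType(IndexType.kTwoLevelIndexSearch)  // two-level (partitioned) index
          .setPartitionFilters(true)
          .setBlockCache(sharedCache);                   // share the limited cache

      try (Options options = new Options()
               .setCreateIfMissing(true)
               .setMaxOpenFiles(512)                     // bound open file handles / table readers
               .setTableFormatConfig(tableConfig);
           RocksDB db = RocksDB.open(options, "/tmp/example-db")) {
        db.put("key".getBytes(), "value".getBytes());
      }
    }
  }
}
```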

toktarev changed the title to "We are not having any problems after I use rocksdb" Mar 7, 2019
toktarev (Author) commented Mar 7, 2019

@koldat, thanks a lot for such great help. Could you please explain how you live with "strictly limited caches"?

Throwing
`s = Status::Incomplete("Insert failed due to LRU cache being full.");`
in background processes like compaction and flush, plus failed writes and iterations/reads, can lead to unpredictable system behaviour.

How do you handle this problem?

Thanks in advance.

toktarev changed the title from "We are not having any problems after I use rocksdb" to "LRUCache strict_capacity=true" Mar 7, 2019
koldat (Contributor) commented Mar 7, 2019

I am not using pinning of index blocks, so the cache should always have space, as the cache I use is also quite big (512 MB). In the worst case it would throw an exception and the application needs to handle that on its own.

I do not think that a failing background process would cause unpredictable behavior. Why do you think so? I guess the library should handle that as a failed job and the job will be scheduled again. This is my assumption and maybe more a question for the core developers.

Anyway, I have not met this corner case.

toktarev (Author) commented Mar 7, 2019

> Why do you think so?

Because when we turned on strict_capacity=true, we saw tons of messages in our log saying "Insert failed due to LRU cache being full". I am not sure whether this is correct behaviour and what kind of problems it can cause.

toktarev (Author) commented Mar 7, 2019

> I am not using pinning of index blocks

Which parameters do you use? I see this error with the default BlockBasedTableOptions, where pin_l0_filter_and_index_blocks_in_cache=false.

koldat (Contributor) commented Mar 7, 2019

I just scanned all the RocksDB LOGs we have and there is not a single message like "Insert failed due to LRU cache being full".

We are using the defaults, so pin_l0_filter_and_index_blocks_in_cache=false.

What is the size of your cache?

And I forgot to mention that for sequential scans we use:
readOpts = new ReadOptions();
readOpts.setFillCache(false);

I also guess that functionality like open iterators, etc. pins current pages into the cache. Maybe you can check that as well.
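
For context, here is a minimal sketch of how such read options are typically used for a full scan (the `db` handle is assumed to be opened elsewhere, and the key handling is illustrative):

```java
// Sequential scan that does not populate the block cache, so large scans
// neither evict the hot working set nor trigger cache-full insert failures.
try (ReadOptions readOpts = new ReadOptions().setFillCache(false);
     RocksIterator it = db.newIterator(readOpts)) {
  for (it.seekToFirst(); it.isValid(); it.next()) {
    byte[] key = it.key();
    byte[] value = it.value();
    // process key/value here
  }
}
```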

toktarev (Author) commented Mar 7, 2019

We see the message "Insert failed due to LRU cache being full" in the application log, not in the RocksDB LOG.

siying (Contributor) commented Mar 7, 2019

@toktarev if you see this message, it pretty much means that if you turn strict mode off, RocksDB will use more memory than the block cache capacity to cache blocks.

toktarev (Author) commented Mar 8, 2019

    if (usage_ - lru_usage_ + charge > capacity_ &&
        (strict_capacity_limit_ || handle == nullptr)) {
      if (handle == nullptr) {
        // Don't insert the entry but still return ok, as if the entry inserted
        // into cache and get evicted immediately.
        last_reference_list.push_back(e);
      } else {
        delete[] reinterpret_cast<char*>(e);
        *handle = nullptr;
        s = Status::Incomplete("Insert failed due to LRU cache being full.");
      }
    } else {

As we can see from the code, this message can be returned only if strict_capacity_limit_ == true (when handle == nullptr the entry is not inserted, but the call still returns OK, as if the entry had been inserted into the cache and immediately evicted).
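
For illustration, a minimal sketch of what this means for application code when strict_capacity_limit is enabled; the `db`/`readOpts` handles and the fallback strategy are assumptions, not something prescribed in this thread:

```java
byte[] value;
try {
  value = db.get(readOpts, "some-key".getBytes());
} catch (RocksDBException e) {
  // With strict_capacity_limit = true, a read that has to load a block into a
  // full cache can surface Status::Incomplete as an exception, e.g.
  // "Insert failed due to LRU cache being full."
  // The application decides whether to retry, degrade, or fail the request.
  value = null;
}
```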

HeartSaVioR pushed a commit to apache/spark that referenced this issue Aug 23, 2023
…void insertion exception on cache full

### What changes were proposed in this pull request?
Disable strict limit for RocksDB write manager to avoid insertion exception on cache full

### Why are the changes needed?
In some cases, if the memory limit is reached on insert/get, we see the following exception:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 42 in stage 9.0 failed 4 times, most recent failure: Lost task 42.3 in stage 9.0 (TID 2950) (96.104.176.55 executor 0): org.rocksdb.RocksDBException: Insert failed due to LRU cache being full.
	at org.rocksdb.RocksDB.get(Native Method)
	at org.rocksdb.RocksDB.get(RocksDB.java:2053)
	at org.apache.spark.sql.execution.streaming.state.RocksDB.get(RocksDB.scala:299)
	at org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider$RocksDBStateStore.get(RocksDBStateStoreProvider.scala:55)
```

It seems this is being thrown with strict memory limit within RocksDB here - https://github.com/facebook/rocksdb/blob/0fa0c97d3e9ac5dfc2e7ae94834b0850cdef5df7/cache/lru_cache.cc#L394

It seems this issue can only happen with the strict mode as described here - facebook/rocksdb#5048 (comment)

Seems like there is a pending issue for RocksDB around this as well - facebook/rocksdb#8670

There is probably a relevant fix, but not sure whether this addresses the issue completely - facebook/rocksdb#6619
(cc - siying )

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #42567 from anishshri-db/task/SPARK-44878.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>