Segfault in BlockBasedTableIterator::CheckDataBlockWithinUpperBound() using JNI #9378
Adding to the above context on how Alluxio uses RocksDB: from the crash stack trace, the code throwing the segfault is `BlockBasedTableIterator::CheckDataBlockWithinUpperBound()`. Pasting the exact calling code here, hoping that helps further discussion and reference a bit:
Alluxio uses RocksDB to store the block metadata and the associated locations. "Metadata" means the block length etc., and "locations" means which workers the block replicas reside on. For locations, we use RocksDB as a KV store. For completeness of logic, the code for adding and removing block locations is as below, from the same file:
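The exact key layout was elided from the comment above; as an illustration only (the names and the encoding are assumptions, not Alluxio's actual code), a location key can be a fixed-width composite of a block ID and a worker ID, so that all locations of one block sort contiguously and can be scanned with an iterator bounded by the next block ID:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a composite RocksDB key for block locations.
// The real Alluxio encoding may differ; this only illustrates the idea of
// "key identifies (block, worker), value holds the location record".
public class BlockLocationKey {
  // Pack the block ID and worker ID into a 16-byte key. Keys sharing a
  // block-ID prefix sort together under a bytewise comparator, so a scan
  // with an upper bound derived from blockId + 1 visits all replicas.
  public static byte[] encode(long blockId, long workerId) {
    return ByteBuffer.allocate(Long.BYTES * 2)
        .putLong(blockId)
        .putLong(workerId)
        .array();
  }

  public static long decodeBlockId(byte[] key) {
    return ByteBuffer.wrap(key).getLong(0);
  }

  public static long decodeWorkerId(byte[] key) {
    return ByteBuffer.wrap(key).getLong(Long.BYTES);
  }
}
```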
@apc999 please correct me if I'm wrong on how Alluxio metadata is structured in RocksDB. FYI, we had a previous discussion on a similar segfault issue here. I can't tell whether our previous fix attempt didn't work or this time it's a different issue. Much appreciated if anybody could shed some light on that too!
It'd be great if you could let us know what we can better collect from the somewhat-frequent reproductions (anything like RocksDB logs?), or how we can further debug this (e.g. how to find the on-disk file causing the segfault).
This line might be problematic. I don't know whether it is the root of the problem, but the Slice probably needs to be held during the whole process. This is very similar to your previous report: https://groups.google.com/g/rocksdb/c/PwapmWwyBbc/m/Wh4N89tjBgAJ Can you give a similar fix a try?
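The hazard being suggested can be modeled without the native library (this is my own pure-Java analogue under an assumed model of RocksJava's behavior, not RocksDB code): a Slice-like object owns a "native" buffer that is freed on `close()`, and the iterator re-reads that buffer on every step to check the upper bound. If the Slice is released mid-scan, the native side reads freed memory; here that is simulated with an `IllegalStateException` instead of a SIGSEGV:

```java
// Pure-Java analogue of a Slice whose backing "native" buffer can be
// freed out from under an in-progress scan. Hypothetical stand-in
// classes; org.rocksdb is intentionally not used here.
public class SliceLifetime {
  static class FakeSlice implements AutoCloseable {
    private byte[] data;                 // stands in for the native buffer
    FakeSlice(byte[] d) { data = d; }
    byte[] data() {
      if (data == null) throw new IllegalStateException("use after free");
      return data;
    }
    @Override public void close() { data = null; }  // "frees" the buffer
  }

  // Counts keys strictly below the upper bound, reading the bound on every
  // step, just as the native iterator re-checks iterate_upper_bound.
  static int scan(byte[][] keys, FakeSlice upperBound) {
    int n = 0;
    for (byte[] k : keys) {
      if (compare(k, upperBound.data()) >= 0) break;
      n++;
    }
    return n;
  }

  static int compare(byte[] a, byte[] b) {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int c = Byte.compare(a[i], b[i]);
      if (c != 0) return c;
    }
    return Integer.compare(a.length, b.length);
  }
}
```

The takeaway matches the suggestion above: whatever object backs the upper bound must stay alive (and unclosed) until the scan finishes.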
Thanks for the quick response! In this case, since we do hold a reference to the ReadOptions, wouldn't the Slice be kept alive through it?
ReadOptions actually doesn't hold a reference to the Slice. I think this is an implementation detail that we probably should improve, but that's how it is now.
Thanks, we will give this a try.
Thanks a lot for the replies @siying! Sure, we will try the suggested fix and update ASAP. But sorry, I still fail to understand why the Slice can be collected: the method argument is a reference to the anonymous Slice object, so it should stay reachable for the duration of the call. Am I reading it wrong? Or how do we tell whether something held in a member variable does not stop the underlying object from being GC-ed?
@jiacheliu3
You are correct. Sorry for my previous wrong answer. I think it might have something to do with how RocksDB Java deals with native handles. As @Cheng-Chang said, it is possible that Java might think some objects are of no use anymore, while RocksDB still keeps its native handle. I suspect that it might have happened for endKey, and perhaps readOptions too. It's just a suspicion, and I don't know how to test the theory or even how to fix it. I'm not sure whether Java has some way to guarantee that endKey and readOptions are not garbage collected until the end of the function.
@adamretter do you have any thoughts on how we can test and/or fix the problem?
I guess one way to work around it is to create a class that contains endKey, readOptions and iter, so that as long as the class is kept around, they are not GC-ed.
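One possible shape for this workaround (a sketch of my own, not Alluxio's or RocksDB's code): bundle the resources for one scan into a single AutoCloseable holder. As long as the holder is reachable, so is everything it registered, and `close()` releases them in reverse order of registration, so an iterator opened last is closed before the options it points at:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical holder keeping strong references to the AutoCloseables
// backing one scan (e.g. a Slice, a ReadOptions and an iterator in
// RocksJava), preventing any of them from becoming GC-eligible early.
public class ResourceHolder implements AutoCloseable {
  private final Deque<AutoCloseable> resources = new ArrayDeque<>();

  // Register a resource; returns it for convenient chaining.
  public <T extends AutoCloseable> T add(T resource) {
    resources.push(resource);
    return resource;
  }

  @Override
  public void close() {
    while (!resources.isEmpty()) {
      try {
        resources.pop().close();   // LIFO: last-added is closed first
      } catch (Exception e) {
        // keep closing the remaining resources even if one close fails
      }
    }
  }
}
```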
I see a lot of patterns like this in the RocksJava code, where nativeHandle_ is passed while the original holder object is untouched (and may subsequently get GC-ed). I guess if this is a problem then it is quite general and applies to many places in the RocksDB code?
Not sure if I understand correctly, but it seems there are two potential problems when the Java object's and the native object's lifecycles are not matched perfectly:
I believe these problems must have been thought through in the RocksDB JNI design, and I must have missed something like written-out anti-pattern docs. I was looking at the memory management wiki, but it only mentions how the Java objects should be closed properly. Much appreciated if someone could point us in the right direction! @Cheng-Chang @siying @yuzhu
Hi, just dropping in to introduce myself. I work for @adamretter and I have been looking at some changes and improvements in RocksJava.

First, to confirm: the memory management wiki is the guidance for how to stay safe while using the JNI interface to RocksDB. Also @jiacheliu3, you are right that we can potentially hold multiple Java references to a nativeHandle_, only one of which is the "owning" reference, the one that will cause the underlying C++ rocksdb object to close() when it in turn is closed (often as an AutoCloseable in a try block). So it is the responsibility of the Java code to ensure that other references do not outlive that owning reference, in order to avoid a potential SEGFAULT. But looking at the code, it is not obvious to me that this is what is causing the problem here.

It's relevant to note that I'm doing some experiments with a new mechanism for native handles from Java which uses std::shared_ptrs to ensure that all Java references count in the ownership of C++ objects. That's highly experimental and may not amount to anything, but it addresses a common problem: Java developers have expectations of a Java API which aren't currently met by RocksJava.

I suspect, without any proof, that the problem may instead be a concurrency issue, although given it is all just readers, it would need to be some kind of erroneous state sharing between the read threads. In Alluxio or in RocksDB? I don't know. You could try @siying's idea re the Slice object, as it should be easy to test? But I don't think it's the reason, as nothing is going out of scope and becoming eligible for GC until the try has finished.
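The shared_ptr idea mentioned above can be sketched in Java as plain reference counting (my own analogue, not the experimental RocksJava code): every Java wrapper over a native handle takes a count, and the "native" object is disposed only when the last reference is released:

```java
// Hypothetical reference-counted native handle: retain() when another
// Java wrapper starts sharing the handle, release() when a wrapper is
// closed; the dispose action runs exactly once, on the final release.
public class RefCountedHandle {
  private final java.util.concurrent.atomic.AtomicInteger refs =
      new java.util.concurrent.atomic.AtomicInteger(1);
  private final Runnable disposeNative;   // stands in for the JNI delete
  private volatile boolean disposed = false;

  public RefCountedHandle(Runnable disposeNative) {
    this.disposeNative = disposeNative;
  }

  // Take an additional reference, e.g. for a second wrapper object.
  public RefCountedHandle retain() {
    if (refs.getAndIncrement() <= 0) {
      throw new IllegalStateException("resurrecting a disposed handle");
    }
    return this;
  }

  // Drop one reference; dispose the native object when none remain.
  public void release() {
    if (refs.decrementAndGet() == 0) {
      disposed = true;
      disposeNative.run();
    }
  }

  public boolean isDisposed() { return disposed; }
}
```

With this scheme no single Java reference "owns" the C++ object, which is exactly the property that removes the outlive-the-owner hazard described above.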
Thanks for the reply @alanpaxton! We are currently trying @siying's idea and scrutinizing all our past usages for similar patterns.

On your extra info about a better Java-C++ mechanism, that sounds great! If there's some ongoing work we can subscribe to, that'd be very intriguing :) I share the same suspicion that the segfault on the read path could actually be the writer's fault, but as previously mentioned, we don't have any evidence at the moment. Would it be possible to tell by examining the embedded RocksDB log, or the on-disk storage, to find more information for you?

Back to the current state: the existing docs might not be intuitive or comprehensive enough, so we are more than happy to contribute something like code examples that people can copy-paste and follow without needing to worry about segfaults or leaks. Alluxio relies on RocksDB heavily to store metadata off-heap when dealing with hundreds of millions of files, so we need to consolidate this and make sure everything is correct anyway. It'd be awesome if we could contribute the doc somewhere in the RocksDB repo and get discussion in your reviews. Where is the best place to contribute that to?
Hi @jiacheliu3 I think for documentation you just want to add to the wiki where the info is missing or outdated. I'll gladly share what I'm doing in a draft PR as soon as I make any progress. I generally work in public in my fork (github.com/alanpaxton/rocksdb) and I'm alan at evolvedbinary dot com. I'm not aware how to get anything more out of the stack dump. That shows it's falling over when comparing a key against the upper bound, but you pretty much inferred that already. I'll have more of a dig around (it's useful to me to see what kind of real client errors happen) and let you know if I find anything. |
@jiacheliu3 @dbw9580 I have seen some examples where Java objects are GC'd but are still needed in subsequent API calls; this can cause such a segfault, although I can't be certain that this is your issue.

In the early design of the RocksJava API (before I was involved), it was decided that the GC should be responsible for cleaning up the native objects. So, some time ago I introduced a mechanism to manually control the memory release in RocksJava by adding an explicit close() mechanism.

At the moment @alanpaxton is investigating and prototyping a possible alternative strategy which may provide an automated but efficient approach that doesn't involve the GC and instead uses reference counting, but that is still a work in progress.

If we do remove the auto-deallocation from RocksJava in 7.0.0, then of course the user must free their RocksObjects explicitly by calling close, otherwise they will leak memory. However, the problem of the GC disposing objects that you still need will be removed, if you believe the GC is causing your problems.

For the time being, your best approach is to keep a reference to the Java object that you need, and then free that reference once you are certain you are finished with it.
@adamretter Thanks for your explanation!
The segfault no longer occurs after the fix Alluxio/alluxio#14856. The issue can be closed; many thanks for your help!
Close RocksDB objects that are AutoCloseable

### What changes are proposed in this pull request?

Fixes facebook/rocksdb#9378. `org.rocksdb.RocksObject` extends `AutoCloseable`, so these objects should be closed properly to avoid leaks.

### Why are the changes needed?

According to the [RocksDB documentation](https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management) and a sample [PR discussion](facebook/rocksdb#7608 (comment)), the objects need to be closed or wrapped in a try-with-resources block.

### Does this PR introduce any user facing changes?

NA

pr-link: Alluxio#14856
change-id: cid-69a164c547a14e0bb7710fc7a48731aecba4db88
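The try-with-resources pattern the PR applies can be demonstrated with stand-in classes (hypothetical, not org.rocksdb; with RocksJava the resources would be e.g. ReadOptions, Slice and RocksIterator): resources declared in one try header stay strongly referenced for the whole block and are closed automatically in reverse declaration order, so the iterator is closed before the options it depends on:

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates try-with-resources close ordering with tracked stand-ins.
public class TryWithResourcesDemo {
  static class TrackedResource implements AutoCloseable {
    final String name;
    final List<String> closeLog;
    TrackedResource(String name, List<String> closeLog) {
      this.name = name;
      this.closeLog = closeLog;
    }
    @Override public void close() { closeLog.add(name); }
  }

  // Opens three resources the way a fixed scan would open its
  // ReadOptions, Slice and iterator, and returns the close order.
  public static List<String> scanAndClose() {
    List<String> closeLog = new ArrayList<>();
    try (TrackedResource readOptions = new TrackedResource("readOptions", closeLog);
         TrackedResource slice = new TrackedResource("slice", closeLog);
         TrackedResource iter = new TrackedResource("iter", closeLog)) {
      // ... iterate here; all three stay open and strongly referenced ...
    }
    return closeLog;
  }
}
```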
Hi, we are using RocksDB to store file system metadata in Alluxio. When retrieving these metadata from RocksDB, it occasionally segfaults with the following stack trace:
We are using RocksDB 6.25.3.
Steps to reproduce the behavior
Currently we don't have a reproduction that doesn't involve Alluxio. The steps to reproduce this are:
Start an Alluxio cluster with rocksdb as the metastore;
Use an internal benchmark tool to create a heavy load on the Master, creating 1 million blocks under one path. Then read the metadata of the blocks concurrently from 1,500 client threads, each of which lists all 1 million blocks (more details in this comment).
Observe that the JVM gets SIGSEGV and killed with a core dump.
We have observed that under light load the segfault is unlikely to occur. Only under heavy load (say, more than 1 million blocks in one path during one benchmark run) does this segfault occur, and then quite frequently.
Full core dumps: https://drive.google.com/drive/folders/1bcP4bMyktLURFLPGoonIrbUL-hrVxLKQ?usp=sharing