crash caused by concurrent CF iterations and drops #5982
Comments
Perhaps if you could run your program under valgrind or compile it with ASAN, it would give more accurate hints about where the bug is. |
hi @maysamyabandeh, we found a case to reproduce it:
Perhaps the cause is that the iterator has not been released when the CF is dropped, but I don't know why this causes the crash in the new 6.x versions.
public static void main(String args[]) throws Exception {
String path = "rocksdb-data";
DBOptions options = new DBOptions();
options.setWalDir(path);
options.setCreateIfMissing(true);
List<ColumnFamilyDescriptor> cfds = new ArrayList<>();
cfds.add(new ColumnFamilyDescriptor(encode("default")));
List<ColumnFamilyHandle> cfhs = new ArrayList<>();
RocksDB rocksdb = RocksDB.open(options, path, cfds, cfhs);
ColumnFamilyDescriptor cfd = new ColumnFamilyDescriptor(encode("cf"));
System.out.println("create CF");
ColumnFamilyHandle cfHandle = rocksdb.createColumnFamily(cfd);
RocksIterator iter = rocksdb.newIterator(cfHandle);
iter.seekToFirst();
//iter.close(); // It's no coredump if manually closed here
System.out.println("drop CF");
rocksdb.dropColumnFamily(cfHandle);
System.out.println("close CF");
cfHandle.close(); // <== Assertion failed: (is_last_reference) here with v6.3.6, it's ok with v5.14.2 and earlier
System.out.println("done");
} |
Does it seem that the CF drop has lost a ColumnFamilyData reference? |
I think you got it right. The iterator must be closed before the column family is dropped. |
@maysamyabandeh Thank you very much. |
CC @adamretter who knows the Java world better. |
hi @maysamyabandeh @adamretter I analyzed the code, maybe there are 2 ways to solve this issue:
We will test both approaches; if the issue is resolved, we are of course willing to submit a patch. |
Both of the above approaches solve this issue, and I have submitted a patch (using the first one). P.S. We also tried the second approach; the attachment shows the code diff: fix-coredump-by-iter-ref-cfd.diff.txt |
Closing ColumnFamilyHandle with unreleased iterators is easy to cause coredump, because the iterator release is controlled by java GC when using JNI. This patch fixed it, we let an iterator hold a ColumnFamilyData reference to prevent the CF from being released too early. fixed facebook#5982
I would like to hear @adamretter's take on this. But iterators keeping references to no-longer-existing CFs sounds a bit scary to me. I am afraid it might cause more problems than it fixes. |
Here is the suggested discipline (courtesy of @sagar0): https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management |
@maysamyabandeh I agree with this point. Still we need to consider how to deal with the following scenarios: If someone A is querying with iterators, and someone else B wants to drop the CF, what should we do:
|
You are right that concurrent CF drop is not a well-explored topic. It seems that a simple RW lock on the user side could take care of most of the cases. To address further liveness issues, the user can implement fancier synchronization methods. Since it does not seem to be a popular use case, it seems better to keep the complexity on the user side rather than pushing it inside rocksdb. |
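As a rough illustration of the user-side RW lock mentioned above, here is a minimal C++17 sketch. The `GuardedColumnFamily` class is hypothetical and a `std::map` stands in for the column family; it is not RocksDB API. It only shows the locking shape: scans take the shared lock, the drop takes the exclusive lock, so a drop cannot run while a scan is in flight.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical user-side wrapper (not RocksDB API).
// A std::map stands in for the column family contents.
class GuardedColumnFamily {
 public:
  // Writers take the exclusive lock; writes after a drop are ignored.
  void Put(const std::string& key, const std::string& value) {
    std::unique_lock<std::shared_mutex> lock(mu_);
    if (!dropped_) data_[key] = value;
  }

  // Readers hold the shared lock for the whole scan, so many scans can
  // run concurrently. Returns false if the column family was dropped.
  bool Scan(std::vector<std::string>* keys) const {
    std::shared_lock<std::shared_mutex> lock(mu_);
    if (dropped_) return false;
    for (const auto& kv : data_) keys->push_back(kv.first);
    return true;
  }

  // Drop takes the exclusive lock, so it waits for in-flight scans
  // and blocks new ones until it has finished.
  void Drop() {
    std::unique_lock<std::shared_mutex> lock(mu_);
    data_.clear();
    dropped_ = true;
  }

 private:
  mutable std::shared_mutex mu_;
  std::map<std::string, std::string> data_;
  bool dropped_ = false;
};
```

In a real application the scan body would create and release the iterator inside the shared-lock scope, and the drop body would drop the column family and close its handle inside the exclusive-lock scope. As noted above, this does not address liveness concerns such as a long scan delaying the drop.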
For concurrent CF iteration and drop, the user-side implementation cost is higher, and every user has to build their own implementation. It is friendlier and safer to push this inside rocksdb, and the implementation on the rocksdb side is not complex. A discipline can work, but it increases the risk of errors and the difficulty for users, especially if the user does not even know about the discipline. In terms of the implementation, I don't think we need to worry about iterators keeping the ColumnFamilyData reference. AFAIK the Iterator holds a SuperVersion reference, which is essentially the same as holding a ColumnFamilyData reference. And the code of CleanupIteratorState() is similar to ~ColumnFamilyHandleImpl(): release the reference and then PurgeObsoleteFiles(). Therefore it is reasonable for iterators to keep a ColumnFamilyData reference (or to keep it instead of the SuperVersion reference). |
@javeme the whole point of having SuperVersion is to avoid the cost of acquiring the DB mutex for each get or iterator, which used to be a major bottleneck in the system. Bringing the DB mutex back into the normal read paths will almost definitely introduce a performance bottleneck. Even if the DB mutex is not held, frequently referencing ColumnFamilyData is not what we want to do in the long term. It is true that we reference SuperVersion in iterators, but we hope to get rid of that using a sharded pool of SuperVersions. Referencing ColumnFamilyData would defeat that plan. If you want to solve the problem, the path is to have SuperVersion reference ColumnFamilyData; this way, normal reads will not need to hold the DB mutex. There is some complexity in this solution, because ColumnFamilyData references the current SuperVersion too, so the reference is circular. But it should be possible to make it work with some effort. You are welcome to give it a try. |
@siying thanks, I get the context. I will try to follow the path you said, and have found two implementations:
ColumnFamilyHandleImpl --> SuperVersion --> ColumnFamilyData
Flush & Compaction --^
ColumnFamilyHandleImpl --> SuperVersion <--> ColumnFamilyData <-- Flush & Compaction
What do you think about the second way? |
I agree that 1 is very complicated to implement. 2 is very tricky too. ColumnFamilyHandle right now is a simple wrapper of ColumnFamilyData, so other than as a pointer to ColumnFamilyData it is immutable and can be freely used in all threads. Introducing a super version, which can change, would make it really complicated. I think we might be able to just resolve the circular reference when cleaning up. When we clean up the column family data, can we simply dereference the current super version, and let the dereference and cleanup of the super version clean up ColumnFamilyData if it in turn becomes the last reference to the ColumnFamilyData? |
It seems that circular reference can't be broken, and there is no chance to delete ColumnFamilyData since each other's refs are greater than 0. Therefore, we are unable to clean up ColumnFamilyData forever. (◞‸◟) |
If we can't take advantage of circular reference, can we do ColumnFamilyData.Ref() at the time of SuperVersion.Ref() like ColumnFamilyData::GetReferencedSuperVersion(), and do ColumnFamilyData.Unref() when dereferencing SuperVersion? Only hold DB mutex when deleting ColumnFamilyData with ColumnFamilyData.refs=0. |
I'm not sure I understand. If SuperVersion has reference 1 from the CFD and the CFD has reference 1 from SuperVersion, can a function call to TryDeleteCfd() simply dereference the super version, which would trigger a SuperVersion::Cleanup(), which dereferences the CFD and destroys it? |
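The cascade described above can be sketched with toy types. Everything here is an assumption for illustration: `ToyCfd`, `ToySuperVersion`, `NewCfd`, `NewIterator` and the live-object counter are invented stand-ins, not RocksDB's classes, and plain ints replace the atomics and mutex a real implementation would need. The sketch only shows that dropping the CFD's reference on its current super version breaks the cycle, and that an outstanding iterator keeps both objects alive until it is released.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; not RocksDB's real classes.
// Single-threaded on purpose: plain ints instead of atomics.
struct ToyCfd;

struct ToySuperVersion {
  int refs = 0;
  ToyCfd* cfd = nullptr;  // counted reference back to the CFD
};

struct ToyCfd {
  int refs = 0;
  ToySuperVersion* current = nullptr;  // counted reference to the SV
};

static int g_live_objects = 0;  // objects allocated and not yet deleted

void UnrefCfd(ToyCfd* cfd) {
  if (--cfd->refs == 0) {
    delete cfd;
    --g_live_objects;
  }
}

// Releasing the last SV reference also releases the SV's CFD reference.
void UnrefSuperVersion(ToySuperVersion* sv) {
  if (--sv->refs == 0) {
    ToyCfd* cfd = sv->cfd;
    delete sv;
    --g_live_objects;
    UnrefCfd(cfd);
  }
}

// Creates a CFD with one caller reference plus the circular pair:
// the CFD references its current SV and the SV references the CFD.
ToyCfd* NewCfd() {
  ToyCfd* cfd = new ToyCfd;
  ToySuperVersion* sv = new ToySuperVersion;
  g_live_objects += 2;
  sv->cfd = cfd;
  cfd->refs = 2;  // caller + super version
  cfd->current = sv;
  sv->refs = 1;   // held by the CFD
  return cfd;
}

// An iterator pins the current super version (and through it, the CFD).
ToySuperVersion* NewIterator(ToyCfd* cfd) {
  ++cfd->current->refs;
  return cfd->current;
}

// Called once by the owner when the column family goes away.
void TryDeleteCfd(ToyCfd* cfd) {
  ToySuperVersion* sv = cfd->current;
  cfd->current = nullptr;
  UnrefSuperVersion(sv);  // break the cycle: drop the CFD's SV reference
  UnrefCfd(cfd);          // drop the caller's own reference
}
```

With no iterator outstanding, `TryDeleteCfd` frees both objects immediately. With an iterator outstanding, both survive until the iterator's `UnrefSuperVersion` call, which then frees the super version and, through it, the CFD.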
Is TryDeleteCfd() probably like this?
ColumnFamilyData::TryDeleteCfd() {
this->Unref();
if (this->refs_ == 1)
if (super_version_->Unref())
delete super_version_;
}
Still 2 key questions to confirm with you:
if (cfd_->Unref()) {
delete cfd_;
}
SuperVersion::TryDeleteSuperVersion() {
this->Unref();
assert(this->refs_ > 0);
if (this->refs_ == 1) {
this->Cleanup(); // will call cfd
delete this; // will delete cfd if cfd->Unref() returns true
}
} |
@siying Any suggestions, so that we can continue? |
@javeme I actually mean:
and
May have some bugs there though. |
Hmm,
may need to be moved after Cleanup() in some way. |
@javeme To answer the original concern about the Java API: I do not want to introduce book-keeping (reference counting) or synchronisation into that API unless absolutely necessary. Whilst doing so might make it easier for some users who are sharing objects in a concurrent manner, it will have a negative impact on other users who want the fastest possible single-thread performance, or who have different concurrency concerns. In addition, I think in most places it makes sense for the Java API and expected behaviour to mirror the C++ API, albeit with some of the unsafe/rough edges rounded off. |
@maysamyabandeh Ok, we'll try the way siying said. |
@siying Do you mean that we let SuperVersion hold a pointer to ColumnFamilyData, but don't increase the reference count of ColumnFamilyData? And change all the
Compaction::~Compaction() {
if (input_version_ != nullptr) {
input_version_->Unref();
}
if (cfd_ != nullptr) {
if (cfd_->Unref()) {
delete cfd_;
}
}
}
=>
Compaction::~Compaction() {
if (input_version_ != nullptr) {
input_version_->Unref();
}
if (cfd_ != nullptr) {
if (cfd_->Unref()) {
cfd_->TryDeleteCfd(); // call cfd_->TryDeleteCfd() instead of delete cfd_
}
}
} |
If the CFD has reference 1 from SuperVersion, then cfd->Unref() will always return false (due to the circular reference), so there is a problem: when do we call TryDeleteCfd()? I think TryDeleteCfd() can only be called when the CFD reference count is 0 (meaning cfd->Unref() returns true). To create an opportunity to call TryDeleteCfd(), I have found two ways:
bool ColumnFamilyData::Unref() {
int old_refs = refs_.fetch_sub(1);
assert(old_refs > 0);
return old_refs == 1;
}
bool ColumnFamilyData::Unref() {
int old_refs = refs_.fetch_sub(1);
assert(old_refs > 0);
return old_refs == 2;
}
|
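The difference between the two variants above comes down to what `fetch_sub` returns: `std::atomic<int>::fetch_sub` returns the value held before the subtraction. So `old_refs == 1` means this call released the last reference, while `old_refs == 2` means exactly one reference remains afterwards (in this thread, presumably the one held by the super version). A minimal sketch of the first variant, using a hypothetical `RefCounted` class rather than the real ColumnFamilyData:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal counter for illustration; it mirrors the shape of
// the first Unref() variant above but is not the RocksDB class.
class RefCounted {
 public:
  void Ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

  // fetch_sub returns the value *before* the subtraction, so comparing
  // with 1 is true only for the caller that released the last reference.
  bool Unref() {
    int old_refs = refs_.fetch_sub(1, std::memory_order_acq_rel);
    assert(old_refs > 0);
    return old_refs == 1;
  }

  int refs() const { return refs_.load(std::memory_order_acquire); }

 private:
  std::atomic<int> refs_{0};
};
```

With two references taken, the first `Unref()` returns false and the second returns true, so exactly one caller is told to perform the deletion.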
I mean SuperVersion holds a reference to CFD. It's the point of fixing the crash. |
Can TryDeleteCfd() always dereference the CFD by 1, and also dereference the super version by 1? |
Yes, no problem. In fact I proposed this approach in #5982 (comment); I thought you might disagree 🥇
ColumnFamilyData::TryDeleteCfd() {
this->Unref();
if (this->refs_ == 1)
if (super_version_->Unref())
delete super_version_;
}
I will submit a patch after a while, thanks. |
It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI. This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early. fixed facebook#5982
Summary: It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI. This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early. fixed facebook#5982 Pull Request resolved: facebook#6147 Differential Revision: D18926378 fbshipit-source-id: 1dff6d068c603d012b81446812368bfee95a5e15
Actual behavior
The library used is from https://mvnrepository.com/artifact/org.rocksdb/rocksdbjni/6.3.6
another time:
Steps to reproduce the behavior
Unknown currently
Reproducible version: 6.3.6, 6.2.x, 6.1.x, 6.0.x (all from maven rocksdbjni)
Env
System Details
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial