Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash caused by concurrent CF iterations and drops #5982

Closed
javeme opened this issue Oct 29, 2019 · 30 comments
Closed

crash caused by concurrent CF iterations and drops #5982

javeme opened this issue Oct 29, 2019 · 30 comments
Assignees

Comments

@javeme
Copy link
Contributor

javeme commented Oct 29, 2019

Actual behavior

The library used is from https://mvnrepository.com/artifact/org.rocksdb/rocksdbjni/6.3.6

*** Error in `/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java': corrupted double-linked list: 0x00007fe34ae8dca0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fe3c42857e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x80baf)[0x7fe3c428ebaf]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fe3c429253c]
/tmp/librocksdbjni5970098043799648318.so(_ZN7rocksdb14LRUHandleTableD1Ev+0xa0)[0x7fe3988cf450]
/tmp/librocksdbjni5970098043799648318.so(_ZN7rocksdb8LRUCacheD1Ev+0x4b)[0x7fe3988cf52b]
/tmp/librocksdbjni5970098043799648318.so(_ZNSt19_Sp_counted_deleterIPN7rocksdb8LRUCacheENSt12__shared_ptrIS1_LN9__gnu_cxx12_Lock_policyE2EE8_DeleterISaIS1_EEES8_LS5_2EE10_M_disposeEv+0x15)[0x7fe3988d11c5]
/tmp/librocksdbjni5970098043799648318.so(_ZN7rocksdb22BlockBasedTableFactoryD0Ev+0x2ba)[0x7fe398a9cbba]
/tmp/librocksdbjni5970098043799648318.so(Java_org_rocksdb_ColumnFamilyOptions_disposeInternal+0x4c9)[0x7fe39887d8f9]
[0x7fe3adcf4528]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 543321                             /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
00600000-00601000 r--p 00000000 08:01 543321                             /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
00601000-00602000 rw-p 00001000 08:01 543321                             /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
01ee0000-03f1e000 rw-p 00000000 00:00 0                                  [heap]
8b200000-8fb80000 rw-p 00000000 00:00 0 
8fb80000-d9100000 ---p 00000000 00:00 0 
d91000000 00:00 0 
7fe3697f3000-7fe3697f4000 ---p 00000000 00:00 0 
7fe3697f4000-7fe369ff4000 rw-p 00000000 00:00 0 
7fe369ff4000-7fe369ff5000 ---p 00000000 00:00 0 
...
7fe3767fd000-7fe3767fe000 ---p 00000000 00:00 0 
7fe3767fe000-7fe376ffe000 rw-p 00000000 00:00 0 
7fe376ffe000-7fe376fff000 ---p 00000000 00:00 0 
7fe376fff000-7fe3777ff000 rw-p 00000000 00:00 0 
7fe3777ff000-7fe377800000 ---p 00000000 00:00 0 
7fe377800000-7fe378000000 rw-p 00000              /tmp/librocksdbjni5970098043799648318.so

another time:

*** Error in `/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java': corrupted size vs. prev_size: 0x00007fa73a7da6d0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fa7b31a87e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7e9dc)[0x7fa7b31af9dc]
/lib/x86_64-linux-gnu/libc.so.6(+0x81cde)[0x7fa7b31b2cde]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7fa7b31b5184]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x18)[0x7fa7b18b1e78]
/tmp/librocksdbjni5858602310051612369.so(_ZN7rocksdb25NewArenaWrappedDbIteratorEPNS_3EnvERKNS_11ReadOptionsERKNS_18ImmutableCFOptionsERKNS_16MutableCFOptionsERKmmmPNS_12ReadCallbackEPNS_6DBImplEPNS_16ColumnFamilyDataEbb+0x39)[0x7fa7675ed2f9]
/tmp/librocksdbjni5858602310051612369.so(_ZN7rocksdb6DBImpl15NewIteratorImplERKNS_11ReadOptionsEPNS_16ColumnFamilyDataEmPNS_12ReadCallbackEbb+0x84)[0x7fa767578db4]
/tmp/librocksdbjni5858602310051612369.so(_ZN7rocksdb6DBImpl11NewIteratorERKNS_11ReadOptionsEPNS_18ColumnFamilyHandleE+0x94)[0x7fa767578ed4]
/tmp/librocksdbjni5858602310051612369.so(Java_org_rocksdb_RocksDB_iteratorCF__JJJ+0xc7)[0x7fa7674e6127]
[0x7fa79d77aa2e]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 543321                             /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
00600000-00601000 r--p 00000000 08:01 543321                             /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
00601000-00602000 rw-p 00001000 08:01 543321                             /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
01e78000-03eb6000 rw-p 00000000 00:00 0                                  [heap]
8b200000-90f80000 rw-p 00000000 00:00 0 
90f80000-d9100000 ---p 00000000 00:00 0 
d9100000-f7900000 rw-p 00000000 00:00 0 
f7900000-100000000 ---p 00000000 00:00 0 
100000000-1003e0000 rw-p 00000000 00:00 0 
1003e0000-140000000 ---p 00000000 00:00 0 
7fa72c000000-7fa72f001000 rw-p 00000000 00:00 0 
7fa72f001000-7fa730000000 ---p 00000000 00:00 0 
7fa730000000-7fa733031000 rw-p 00000000 00:00 0 
...
7fa76732e000-7fa7679de000 r-xp 00000000 08:01 1145                       /tmp/librocksdbjni5858602310051612369.so
7fa7679de000-7fa767bde000 ---p 006b0000 08:01 1145                       /tmp/librocksdbjni5858602310051612369.so
7fa767bde000-7fa767bf0000 r--p 006b0000 08:01 1145                       /tmp/librocksdbjni5858602310051612369.so
7fa767bf0000-7fa767bf9000 rw-p 006c2000 08:01 1145                       /tmp/librocksdbjni5858602310051612369.so

Steps to reproduce the behavior

Unknown currently

Reproducible version: 6.3.6, 6.2.x, 6.1.x, 6.0.x (all from maven rocksdbjni)

Env

System Details

Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial

@maysamyabandeh
Copy link
Contributor

Perhaps if you could run your program under valgrind or compiled with ASAN, it would give more accurate hints of where the bug is.

@javeme
Copy link
Contributor Author

javeme commented Oct 30, 2019

hi @maysamyabandeh, we found a case to reproduce it:

  1. create a CF
  2. query the CF with non-close iterator
  3. drop the CF
  4. close the CF

Perhaps the cause is the iterator has not been released when the CF is dropped, but I don't know how it cause the behavior in the new version 6.x.

     public static void main(String args[]) throws Exception {
        String path = "rocksdb-data";

        DBOptions options = new DBOptions();
        options.setWalDir(path);
        options.setCreateIfMissing(true);

        List<ColumnFamilyDescriptor> cfds = new ArrayList<>();
        cfds.add(new ColumnFamilyDescriptor(encode("default")));
        List<ColumnFamilyHandle> cfhs = new ArrayList<>();
        RocksDB rocksdb = RocksDB.open(options, path, cfds, cfhs);

        ColumnFamilyDescriptor cfd = new ColumnFamilyDescriptor(encode("cf"));
        System.out.println("create CF");
        ColumnFamilyHandle cfHandle = rocksdb.createColumnFamily(cfd);

        RocksIterator iter = rocksdb.newIterator(cfHandle);
        iter.seekToFirst();
        //iter.close(); // It's no coredump if manually closed here

        System.out.println("drop CF");
        rocksdb.dropColumnFamily(cfHandle);

        System.out.println("close CF");
        cfHandle.close(); // <== Assertion failed: (is_last_reference) here with v6.3.6, it's ok with v5.14.2 and earlier

        System.out.println("done");
    }

@javeme
Copy link
Contributor Author

javeme commented Oct 30, 2019

Does it seem the CF-drop has lost ColumnFamilyData reference?

@maysamyabandeh
Copy link
Contributor

I think you got it right. The iterator must be closed before the column family is dropped.

@javeme
Copy link
Contributor Author

javeme commented Nov 1, 2019

@maysamyabandeh Thank you very much.
Can we let iterator-constructor increase the refs of ColumnFamilyData, because the iterator release is controlled by java GC when using JNI, so it is is easy to cause this issue.

@maysamyabandeh
Copy link
Contributor

CC @adamretter who knows the Java world better.

@javeme
Copy link
Contributor Author

javeme commented Nov 2, 2019

hi @maysamyabandeh @adamretter

I analyzed the code, maybe there are 2 ways to solve this issue:

  1. The first is in the Java layer, let org.rocksdb.RocksIterator hold a ColumnFamilyHandle reference, to prevent the ColumnFamilyHandle from being released before org.rocksdb.RocksIterator.
  2. The second is in the C++ layer, let the InternalIterator pin ColumnFamilyData(increase CFD refs), witch can guarantee the underlying ColumnFamilyData is released after the ArenaWrappedDBIter.

We will test for both of the two ways, if it is resolved, of course we are willing to submit a patch.

@javeme
Copy link
Contributor Author

javeme commented Nov 2, 2019

Both the above two ways can solve this issue, and I have submitted a patch(using the first way).

ps. we also tried the second way, the attachment shows the code diff: fix-coredump-by-iter-ref-cfd.diff.txt

javeme added a commit to javeme/rocksdb that referenced this issue Nov 3, 2019
Closing ColumnFamilyHandle with unreleased iterators is easy to cause coredump,
because the iterator release is controlled by java GC when using JNI.

This patch fixed it, we let an iterator hold a ColumnFamilyData reference to
prevent the CF from being released too early.

fixed facebook#5982
@maysamyabandeh
Copy link
Contributor

I would like to hear @adamretter's take on this. But iterator's keeping references to no-longer-existing CFs sounds a bit scary to me. I am afraid it might cause more problems than it fixes.
At the moment, there is no bug in the code and it would be fine as long as the developer follows the suggested discipline of closing iterators explicitly before dropping the cf.

@maysamyabandeh
Copy link
Contributor

Here is the suggested discipline (curtesy of @sagar0): https://github.com/facebook/rocksdb/wiki/RocksJava-Basics#memory-management

@javeme
Copy link
Contributor Author

javeme commented Nov 5, 2019

But iterator's keeping references to no-longer-existing CFs sounds a bit scary to me

@maysamyabandeh I agree with this point.

Still we need to consider how to deal with the following scenarios:

If someone A is querying with iterators, and someone else B wants to drop the CF, what should we do:

  1. Don't allow to drop CF, return error or the B must wait for A until the end of query. (B wait A)
  2. Allow to drop CF, and
    • Interrupt the query in progress. (B interrupt A)
    • Allow the query to continue, but not allow new queries after the CF drop. (B and A are isolated)

@maysamyabandeh
Copy link
Contributor

You are right that concurrent cf drop is not a well-explored topic. It seems that a simple RW lock on the user side could care of most of the cases. To address further liveness issues, the user can implement fancier synchronization methods. Since it does not seem to be a popular use case, it seems better to keep the complexity on the user side rather than pushing it to inside rocksdb.

@javeme javeme changed the title v6.3.6 'corrupted double-linked list' or 'corrupted size vs. prev_size' crash causedConcurrent CF iterations and drops Nov 11, 2019
@javeme javeme changed the title crash causedConcurrent CF iterations and drops crash caused by concurrent CF iterations and drops Nov 11, 2019
@javeme
Copy link
Contributor Author

javeme commented Nov 11, 2019

For concurrent CF iteration and drop, the user-side implementation cost is higher, and each user has to achieve their different implementations by themselves. Whereas it's more friendly and secure to push it to inside rocksdb, and the implementation on the rocksdb side is not complex. A discipline can work, but it will increase the error risk and difficulty for users, especially if the user does not even know the discipline.

In terms of the implementation, I think we don't have to worry about iterator's keeping the ColumnFamilyData reference. AWAK the Iterator holds SuperVersion reference, which is essentially the same as holding ColumnFamilyData reference. And the code of CleanupIteratorState() is similar to ~ColumnFamilyHandleImpl(): release the reference and then PurgeObsoleteFiles(). Therefore it's reasonable for iterator's keeping ColumnFamilyData reference (or instead of the SuperVersion reference).

@maysamyabandeh maysamyabandeh assigned siying and unassigned adamretter Nov 11, 2019
@siying
Copy link
Contributor

siying commented Nov 12, 2019

@javeme the whole point of having SuperVersion is to avoid the cost of acquiring DB mutex for each get or iterator, which used to be a major bottleneck to the system. Bringing back DB mutex in the normal read paths will almost definitely introduce performance bottleneck. Even if DB mutex is not held, frequent referencing ColumnFamilyData is not what we want to do in the long term. It is true that we reference SuperVersion in iterators, but we do hope to get rid of it using a sharded pool of SuperVersion. Referencing ColumnFamilyData defeated the plan.

If you try to solve the problem, the path is to have SuperVersion to reference ColumnFamilyData, in this way, normal reads will not need to hold DB mutex. There is some complexity about the solution that ColumnFamilyData references current SuperVersion too, so it is a circular. But it should be able to made work with some efforts. You are welcome to give it a try.

@javeme
Copy link
Contributor Author

javeme commented Nov 13, 2019

@siying thanks, I get the context.

I will try to follow the path you said, and have found two implementations:

  1. The first way is to let SuperVersion reference ColumnFamilyData, but ColumnFamilyData does not reference SuperVersion, SuperVersion is moved to the upper layer of ColumnFamilyData, so that the code using ColumnFamilyData are changed to use SuperVersion, this implementation seems reasonable, but involves too many code changes, it is really a lot of places to use ColumnFamilyData.
ColumnFamilyHandleImpl --> SuperVersion --> ColumnFamilyData
           Flush & Compaction --^
  1. The second way is to let SuperVersion reference ColumnFamilyData, and let ColumnFamilyHandleImpl reference SuperVersion instead of ColumnFamilyData, but keep ColumnFamilyData referencing SuperVersion. This way may change less code, only need to change the code related to ColumnFamilyHandleImpl.
ColumnFamilyHandleImpl --> SuperVersion <--> ColumnFamilyData <-- Flush & Compaction

What do you think about the second way?

@siying
Copy link
Contributor

siying commented Nov 14, 2019

I agree that 1 is very complicated to implement. 2 is very tricky too. ColumnFamilyHandle right now is a simple wrapper of ColumnFamilyData, so other than as a pointer to ColumnFamilyData it is immutable and can be freely used in all the threads. Introducing a super version, which can be changed would make it really complicated.

I think we might be able to just resolve the circular reference when doing cleaning up. When we clean up column family data, can we just simply dereferencce current super version and let the the dereference and clean up super version to clear up ColumnFamilyData if it in turn becomes the last reference to the ColumnFamilyData?

@javeme
Copy link
Contributor Author

javeme commented Nov 15, 2019

It seems that circular reference can't be broken, and there is no chance to delete ColumnFamilyData since each other's refs are greater than 0. Therefore, we are unable to clean up ColumnFamilyData forever. (◞‸◟)

@javeme
Copy link
Contributor Author

javeme commented Nov 15, 2019

If we can't take advantage of circular reference, can we do ColumnFamilyData.Ref() at the time of SuperVersion.Ref() like ColumnFamilyData::GetReferencedSuperVersion(), and do ColumnFamilyData.Unref() when dereferencing SuperVersion? Only hold DB mutex when deleting ColumnFamilyData with ColumnFamilyData.refs=0.

@siying
Copy link
Contributor

siying commented Nov 15, 2019

It seems that circular reference can't be broken, and there is no chance to delete ColumnFamilyData since each other's refs are greater than 0. Therefore, we are unable to clean up ColumnFamilyData forever. (◞‸◟)

I'm not sure I understand it. If SuperVersion has reference 1 by CFD and CFD has reference 1 by SuperVersion, can a function call TryDeleteCft() simply dereferences super version, which would triggers a SuperVersion::Cleanup(), which dereference CFD and destroy it?

@javeme
Copy link
Contributor Author

javeme commented Nov 18, 2019

Is the TryDeleteCfd() probably like this?

ColumnFamilyData::TryDeleteCfd() {
  this->Unref();
  if (this->refs_ == 1)
    if (super_version_->Unref())
      delete super_version_;
}

Still 2 key questions to confirm with you:

  1. There are about 20 places to try to delete cfd, do we need to replace them with TryDeleteCfd()? Or just call TryDeleteCfd() in ~ColumnFamilyHandleImpl() method?
    if (cfd_->Unref()) {
      delete cfd_;
    }
  1. Similarly, we should also add TryDeleteSuperVersion() method, and have to replace all super_version->Unref() with TryDeleteSuperVersion() to delete SuperVersion when SuperVersion.refs_==1, are we expecting this?
SuperVersion::TryDeleteSuperVersion() {
  this->Unref();
  assert this->refs_ > 0;
  if (this->refs_ == 1) {
    this->Cleanup(); // will call cfd
    delete this; // will delete cfd if cfd->Unref() returns true
  }

@javeme
Copy link
Contributor Author

javeme commented Dec 3, 2019

@siying Any suggestions? so that we can continue.

@siying
Copy link
Contributor

siying commented Dec 4, 2019

@javeme I actually means:

ColumnFamilyData::TryDeleteCfd() {
    super_version_->Unref();
}

and

SuperVersion::Unref() {
    assert(cfd != nullptr);
    int old_refs = refs_.fetch_sub(1);
    assert(old_refs > 0);
    if(old_refs == 1) {
      if (cfd->Unref()) {
        delete cfd;
        cfd = nullptr;
      }
      return true;
    }
    return false;
}

May have some bugs there though.

@siying
Copy link
Contributor

siying commented Dec 4, 2019

Hmm,

      if (cfd->Unref()) {
        delete cfd;
        cfd = nullptr;
      }

may need to be moved after Cleanup() in some way.

@adamretter
Copy link
Collaborator

@javeme To answer the original concern about the Java API. I do not want to introduce book-keeping (reference counting) or synchronisation into that API unless absolutely necessary. Whilst doing so might make it easier for some users who are sharing objects in a concurrent manner, it will have a negative impact for other users who want the fast possible single-thread performance, or who have different concurrency concerns.

In addition, I think in most places it makes sense for the Java API and expected behaviour to mirror the C++ API, albeit with some of the unsafe/rough edges rounded off.

@javeme
Copy link
Contributor Author

javeme commented Dec 5, 2019

@maysamyabandeh Ok, we'll try the way siying said.

@javeme
Copy link
Contributor Author

javeme commented Dec 9, 2019

@siying Do you mean that let SuperVersion hold a pointer to ColumnFamilyData, but don't increase the reference count of ColumnFamilyData?

And change all the delete cfd statements to cfd->TryDeleteCfd()? like:

Compaction::~Compaction() {
  if (input_version_ != nullptr) {
    input_version_->Unref();
  }
  if (cfd_ != nullptr) {
    if (cfd_->Unref()) {
      delete cfd_;
    }
  }
}

=>

Compaction::~Compaction() {
  if (input_version_ != nullptr) {
    input_version_->Unref();
  }
  if (cfd_ != nullptr) {
    if (cfd_->Unref()) {
      cfd_->TryDeleteCfd(); // call cfd_->TryDeleteCfd() instead of delete cfd_
    }
  }
}

@javeme
Copy link
Contributor Author

javeme commented Dec 9, 2019

If SuperVersion has reference 1 by CFD and CFD has reference 1 by SuperVersion, can a function call TryDeleteCft() simply dereferences super version, which would triggers a SuperVersion::Cleanup(), which dereference CFD and destroy it?

If CFD has reference 1 by SuperVersion, then cfd->Unref() will always return false (due to circular reference), therefore there is a problem: when do we call TryDeleteCfd()?

I think TryDeleteCfd() can only be called when the CFD reference is 0 (means cfd->Unref() returns true). To make an opportunity to call TryDeleteCfd(), I have found two ways:

  1. SuperVersion has reference 1 by CFD and CFD has reference 1 by SuperVersion, but let cfd->Unref() return true when CFD reference is 1, which is kept by SuperVersion. Previously it only returns true when CFD reference is 0.
  bool ColumnFamilyData::Unref() {
    int old_refs = refs_.fetch_sub(1);
    assert(old_refs > 0);
    return old_refs == 1;
  }
 =>
  bool ColumnFamilyData::Unref() {
    int old_refs = refs_.fetch_sub(1);
    assert(old_refs > 0);
    return old_refs == 2;
  }
  1. Let SuperVersion hold a pointer to CFD, but don't increase the reference count of CFD (we just assume that SuperVersion holds a reference to CFD, so that we can keep cfd->Unref() return true when CFD reference is 0), CFD can only be deleted by SuperVersion when SuperVersion Cleanup.

@siying
Copy link
Contributor

siying commented Dec 9, 2019

I mean SuperVersion holds a reference to CFD. It's the point of fixing the crash.

@siying
Copy link
Contributor

siying commented Dec 9, 2019

Can TryDeleteCfd() always deference CFD by 1, and also dereference super version by 1?

@javeme
Copy link
Contributor Author

javeme commented Dec 10, 2019

Can TryDeleteCfd() always deference CFD by 1, and also dereference super version by 1?

Yes no problem. In fact I have proposed this way in #5982 (comment), I thought you may disagree 🥇

ColumnFamilyData::TryDeleteCfd() {
  this->Unref();
  if (this->refs_ == 1)
    if (super_version_->Unref())
      delete super_version_;
}

I will submit a patch after a while, thanks.

javeme added a commit to javeme/rocksdb that referenced this issue Dec 10, 2019
It's easy to cause coredump when closing ColumnFamilyHandle with unreleased
iterators, especially iterators release is controlled by java GC when using JNI.

This patch fixed concurrent CF iteration and drop, we let iterators(actually
SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being
released too early.

fixed facebook#5982
wolfkdy pushed a commit to wolfkdy/rocksdb that referenced this issue Dec 22, 2019
Summary:
It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI.

This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early.

fixed facebook#5982
Pull Request resolved: facebook#6147

Differential Revision: D18926378

fbshipit-source-id: 1dff6d068c603d012b81446812368bfee95a5e15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants