
Drastically decreasing read performance when a range of keys is deleted #610

Open
pcmind opened this issue Aug 3, 2018 · 5 comments

pcmind commented Aug 3, 2018

Performance of iteration over a range of keys is drastically affected when multiple keys that share a common prefix were previously deleted.

The use case to reproduce this issue is as follows:

  1. Put a set of 1000 keys that share a common prefix.
  2. Delete them all.
  3. For each key in the set, do a search using a new DB::NewIterator (it would make more sense to use DB::Get, but assume you are searching for more than one entry).
  4. Repeat steps 1, 2, and 3. The performance hit grows with each loop.

I know that, as mentioned in issue #83, commit 748539c should mitigate this issue. But as shown by the following example, the implemented solution does not completely mitigate it. This is especially relevant when using LevelDB with prefix searches (or mutable indexes).

I made a simple unit test to show the issue:

TEST(DBTest, RangeDeleteAndRead) {
  do {
    for (int k = 0; k < 27; ++k) {
      Env* env = Env::Default();
      uint64_t start_micros = env->NowMicros();
      for (int i = 0; i < 2000; ++i) {
        // Search for the prefix.
        std::string prefix = Key(i);
        std::string key1 = prefix + "-1";
        std::string key2 = prefix + "-2";
        // Range search with the key prefix. No entries exist, but each
        // run k+1 loses more time here.
        Iterator* iter = db_->NewIterator(ReadOptions());
        iter->Seek(prefix);
        ASSERT_TRUE(!iter->Valid());
        delete iter;

        // Insert values.
        Put(key1, "value1");
        Put(key2, "value2");
      }
      uint64_t stop_micros = env->NowMicros();

      // Delete all entries.
      Iterator* iter = db_->NewIterator(ReadOptions());
      WriteBatch wb;
      iter->Seek("key");
      while (iter->Valid()) {
        if (iter->key().ToString().find("key") != 0) {
          break;
        }
        wb.Delete(iter->key().ToString());
        iter->Next();
      }
      WriteOptions wo;
      db_->Write(wo, &wb);
      delete iter;
      unsigned int ms = (stop_micros - start_micros) / 1000;
      fprintf(stderr, "Run loop %d took %u ms\n", k, ms);
    }
  } while (ChangeOptions());
}

Running this test I get the following result:

==== Test DBTest.RangeDeleteAndRead
Run loop 0 took 78 ms
Run loop 1 took 1671 ms
Run loop 2 took 3219 ms
Run loop 3 took 4782 ms
Run loop 4 took 6312 ms
Run loop 5 took 7843 ms
...
Run loop 26 took 98922 ms
...

The performance issue is due to the fact that db_iter knows nothing about the prefix being searched for by the end user.

Adding something like:

if (!SharePrefix(&ikey)) {
  break;
}

to db_iter.cc#L179 drastically improves performance:

==== Test DBTest.RangeDeleteAndRead
Run loop 0 took 93 ms
Run loop 1 took 94 ms
Run loop 2 took 109 ms
Run loop 3 took 125 ms
Run loop 4 took 109 ms
Run loop 5 took 94 ms
...
Run loop 26 took 250 ms
...
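
For illustration only, here is a minimal sketch of what such a SharePrefix check could look like, assuming the default bytewise comparator and assuming the user's prefix has been handed to the iterator somehow (SharePrefix is not an existing LevelDB function):

#include <cstring>  // memcmp
#include "leveldb/slice.h"

// Hypothetical helper: true while the current user key still starts
// with the prefix the caller is scanning for.
static bool SharePrefix(const leveldb::Slice& user_key,
                        const leveldb::Slice& prefix) {
  return user_key.size() >= prefix.size() &&
         memcmp(user_key.data(), prefix.data(), prefix.size()) == 0;
}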

Would it be nice to add an API to pass the prefix being searched for to the iterator, so it can stop looking for more data when no more matching keys are available?

pcmind added a commit to pcmind/leveldb that referenced this issue Oct 23, 2018
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Nov 5, 2018
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Nov 26, 2018
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Nov 26, 2018
Implement a solution to the issue described by google/leveldb#610
qduyang commented Jan 24, 2019

Hi, I may be hitting the same issue with frequent queries. Any update?

pcmind added a commit to pcmind/leveldb that referenced this issue Jan 27, 2019
Implement a solution to the issue described by google/leveldb#610
qduyang commented Feb 15, 2019

Why has Google failed to handle these known issues for such a long time?

felipecrv (Contributor) commented

@pcmind @qduyang the fix assumes the default key comparator (memcmp) is being used. Users can define their own comparator with an ordering that invalidates the assumption that neighboring keys share a prefix.
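
To make that concrete, here is an illustrative comparator (hypothetical, not part of LevelDB) under which neighboring keys share a suffix rather than a prefix, so a bytewise prefix check cannot safely stop iteration early:

#include <algorithm>
#include <string>
#include "leveldb/comparator.h"
#include "leveldb/slice.h"

// Orders keys by their reversed bytes: "a-1" and "b-1" become neighbors
// even though they share no prefix.
class ReversedBytesComparator : public leveldb::Comparator {
 public:
  int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const override {
    std::string ra(a.data(), a.size());
    std::string rb(b.data(), b.size());
    std::reverse(ra.begin(), ra.end());
    std::reverse(rb.begin(), rb.end());
    return ra.compare(rb);
  }
  const char* Name() const override { return "ReversedBytesComparator"; }
  // Leaving keys unshortened is always correct, just not optimal.
  void FindShortestSeparator(std::string*, const leveldb::Slice&) const override {}
  void FindShortSuccessor(std::string*) const override {}
};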

pcmind (Author) commented Mar 18, 2019

Yes, the fix only shows that stopping early can greatly improve iteration time.
I think the final solution should be implemented with a more generic interface, something like iterator(fromKey, fromInclusive, toKey, toInclusive) (because we can iterate in both directions). This would serve more use cases and require no changes to Comparator.
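
As a rough sketch of that interface (names and signatures are illustrative, not LevelDB API; a real implementation would live inside DBIter so tombstones past the bound are never scanned, which an outside wrapper cannot achieve):

#include "leveldb/comparator.h"
#include "leveldb/iterator.h"
#include "leveldb/slice.h"

// Wraps a DB iterator and reports !Valid() once the key leaves the
// [from, to] range. Forward direction only, for brevity; the proposed
// API would support both directions. The caller must keep the slices'
// backing data alive for the wrapper's lifetime.
class BoundedIterator {
 public:
  BoundedIterator(leveldb::Iterator* it, const leveldb::Comparator* cmp,
                  leveldb::Slice from, bool from_inclusive,
                  leveldb::Slice to, bool to_inclusive)
      : it_(it), cmp_(cmp), from_(from), from_inclusive_(from_inclusive),
        to_(to), to_inclusive_(to_inclusive) {}
  ~BoundedIterator() { delete it_; }

  void SeekToFirst() {
    it_->Seek(from_);
    // Step past the lower bound itself when it is exclusive.
    if (it_->Valid() && !from_inclusive_ &&
        cmp_->Compare(it_->key(), from_) == 0) {
      it_->Next();
    }
  }
  void Next() { it_->Next(); }
  bool Valid() const {
    if (!it_->Valid()) return false;
    int c = cmp_->Compare(it_->key(), to_);
    return to_inclusive_ ? c <= 0 : c < 0;  // stop once past the upper bound
  }
  leveldb::Slice key() const { return it_->key(); }
  leveldb::Slice value() const { return it_->value(); }

 private:
  leveldb::Iterator* const it_;
  const leveldb::Comparator* const cmp_;
  leveldb::Slice from_;
  bool from_inclusive_;
  leveldb::Slice to_;
  bool to_inclusive_;
};

The wrapper only demonstrates the interface; the real win would come when DBIter itself uses the bound to stop stepping over tombstones early.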

felipecrv (Contributor) commented

@pcmind totally! An internal iterator that is bounded in an inclusive key range would enable these early returns. 👏

pcmind added a commit to pcmind/leveldb that referenced this issue Jul 29, 2019
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Jul 29, 2019
Implement a possible solution for the issue google/leveldb#610
maochongxin pushed a commit to maochongxin/leveldb that referenced this issue Jul 21, 2022
* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions

* format all tests as well, and mark blocks which change too much