
Drastically decreasing read performance when a range of keys is deleted #610

Open
pcmind opened this issue Aug 3, 2018 · 5 comments

pcmind commented Aug 3, 2018

Performance of iteration over a range of keys is drastically affected when multiple keys that share a common prefix were previously deleted.

The use case to reproduce this issue is as follows:

  1. Put a set of 1000 keys that share a common prefix.
  2. Delete them all.
  3. For each key in the set, do a search using a new DB::NewIterator (it would make more sense to use DB::Get, but assume you are searching for more than one entry).
  4. Repeat steps 1, 2, and 3. The performance hit grows with each loop.

I know that, as mentioned in issue #83, commit 748539c should mitigate this issue. But as shown by the following example, the implemented solution does not completely mitigate it. This is especially relevant when using LevelDB with prefix searches (or mutable indexes).

I made a simple unit test to show the issue:

TEST(DBTest, RangeDeleteAndRead) {
  do {
    for (int k = 0; k < 27; ++k) {
      Env* env = Env::Default();
      uint64_t start_micros = env->NowMicros();
      for (int i = 0; i < 2000; ++i) {
        // Search for the prefix.
        std::string prefix = Key(i);
        std::string key1 = prefix + "-1";
        std::string key2 = prefix + "-2";
        // Range search with the key prefix. No entries exist, but each
        // run k+1 loses more time here.
        Iterator* iter = db_->NewIterator(ReadOptions());
        iter->Seek(prefix);
        ASSERT_TRUE(!iter->Valid());
        delete iter;

        // Insert values.
        Put(key1, "value1");
        Put(key2, "value2");
      }
      uint64_t stop_micros = env->NowMicros();

      // Delete all entries.
      Iterator* iter = db_->NewIterator(ReadOptions());
      WriteBatch wb;
      iter->Seek("key");
      while (iter->Valid()) {
        if (iter->key().ToString().find("key") != 0) {
          break;
        }
        wb.Delete(iter->key().ToString());
        iter->Next();
      }
      WriteOptions wo;
      db_->Write(wo, &wb);
      delete iter;
      unsigned int ms = (stop_micros - start_micros) / 1000;
      fprintf(stderr, "Run loop %d took %u ms\n", k, ms);
    }
  } while (ChangeOptions());
}

Running this test I get the following result:

==== Test DBTest.RangeDeleteAndRead
Run loop 0 took 78 ms
Run loop 1 took 1671 ms
Run loop 2 took 3219 ms
Run loop 3 took 4782 ms
Run loop 4 took 6312 ms
Run loop 5 took 7843 ms
...
Run loop 26 took 98922 ms
...

The performance issue is due to the fact that db_iter knows nothing about the prefix being searched for by the end user.

Adding something like:

if (!SharePrefix(&ikey)) {
  break;
}

to db_iter.cc#L179 drastically improves performance:

==== Test DBTest.RangeDeleteAndRead
Run loop 0 took 93 ms
Run loop 1 took 94 ms
Run loop 2 took 109 ms
Run loop 3 took 125 ms
Run loop 4 took 109 ms
Run loop 5 took 94 ms
...
Run loop 26 took 250 ms
...
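
For illustration only, here is a minimal sketch of what such a SharePrefix check could look like, assuming the default bytewise comparator and assuming the user's prefix has been handed to the iterator somehow (SharePrefix is not an existing LevelDB function):

#include <cstring>  // memcmp
#include "leveldb/slice.h"

// Hypothetical helper: true while the current user key still starts
// with the prefix the caller is scanning for.
static bool SharePrefix(const leveldb::Slice& user_key,
                        const leveldb::Slice& prefix) {
  return user_key.size() >= prefix.size() &&
         memcmp(user_key.data(), prefix.data(), prefix.size()) == 0;
}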

Would it be nice to add an API to pass the prefix being searched for to the iterator, so it can stop looking for more data when no more matching keys are available?

pcmind added a commit to pcmind/leveldb that referenced this issue Oct 23, 2018
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Nov 5, 2018
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Nov 26, 2018
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Nov 26, 2018
Implement a solution to the issue described by google/leveldb#610
qduyang commented Jan 24, 2019

Hi, I may be hitting the same issue with frequent queries. Any update?

pcmind added a commit to pcmind/leveldb that referenced this issue Jan 27, 2019
Implement a solution to the issue described by google/leveldb#610
qduyang commented Feb 15, 2019

Why has Google failed to handle these known issues for such a long time?

felipecrv (Contributor) commented

@pcmind @qduyang the fix assumes the default key comparator (memcmp) is being used. Users can define their own comparator with an ordering that invalidates the assumption that neighboring keys share a prefix.
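
To make that concrete, here is an illustrative comparator (hypothetical, not part of LevelDB) under which neighboring keys share a suffix rather than a prefix, so a bytewise prefix check cannot safely stop iteration early:

#include <algorithm>
#include <string>
#include "leveldb/comparator.h"
#include "leveldb/slice.h"

// Orders keys by their reversed bytes: "a-1" and "b-1" become neighbors
// even though they share no prefix.
class ReversedBytesComparator : public leveldb::Comparator {
 public:
  int Compare(const leveldb::Slice& a, const leveldb::Slice& b) const override {
    std::string ra(a.data(), a.size());
    std::string rb(b.data(), b.size());
    std::reverse(ra.begin(), ra.end());
    std::reverse(rb.begin(), rb.end());
    return ra.compare(rb);
  }
  const char* Name() const override { return "ReversedBytesComparator"; }
  // Leaving keys unshortened is always correct, just not optimal.
  void FindShortestSeparator(std::string*, const leveldb::Slice&) const override {}
  void FindShortSuccessor(std::string*) const override {}
};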

pcmind (Author) commented Mar 18, 2019

Yes, the fix only shows that stopping early can greatly improve iteration time.
I think the final solution should be implemented with a more generic interface, something like iterator(fromKey, fromInclusive, toKey, toInclusive) (because we can iterate in both directions). This would serve more use cases and require no changes to Comparator.
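
As a rough sketch of that interface (names and signatures are illustrative, not LevelDB API; a real implementation would live inside DBIter so tombstones past the bound are never scanned, which an outside wrapper cannot achieve):

#include "leveldb/comparator.h"
#include "leveldb/iterator.h"
#include "leveldb/slice.h"

// Wraps a DB iterator and reports !Valid() once the key leaves the
// [from, to] range. Forward direction only, for brevity; the proposed
// API would support both directions. The caller must keep the slices'
// backing data alive for the wrapper's lifetime.
class BoundedIterator {
 public:
  BoundedIterator(leveldb::Iterator* it, const leveldb::Comparator* cmp,
                  leveldb::Slice from, bool from_inclusive,
                  leveldb::Slice to, bool to_inclusive)
      : it_(it), cmp_(cmp), from_(from), from_inclusive_(from_inclusive),
        to_(to), to_inclusive_(to_inclusive) {}
  ~BoundedIterator() { delete it_; }

  void SeekToFirst() {
    it_->Seek(from_);
    // Step past the lower bound itself when it is exclusive.
    if (it_->Valid() && !from_inclusive_ &&
        cmp_->Compare(it_->key(), from_) == 0) {
      it_->Next();
    }
  }
  void Next() { it_->Next(); }
  bool Valid() const {
    if (!it_->Valid()) return false;
    int c = cmp_->Compare(it_->key(), to_);
    return to_inclusive_ ? c <= 0 : c < 0;  // stop once past the upper bound
  }
  leveldb::Slice key() const { return it_->key(); }
  leveldb::Slice value() const { return it_->value(); }

 private:
  leveldb::Iterator* const it_;
  const leveldb::Comparator* const cmp_;
  leveldb::Slice from_;
  bool from_inclusive_;
  leveldb::Slice to_;
  bool to_inclusive_;
};

The wrapper only demonstrates the interface; the real win would come when DBIter itself uses the bound to stop stepping over tombstones early.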

felipecrv (Contributor) commented

@pcmind totally! An internal iterator that is bounded in an inclusive key range would enable these early returns. 👏

pcmind added a commit to pcmind/leveldb that referenced this issue Jul 29, 2019
Implement a solution to the issue described by google/leveldb#610
pcmind added a commit to pcmind/leveldb that referenced this issue Jul 29, 2019
Implement a possible solution for the issue google/leveldb#610
maochongxin pushed a commit to maochongxin/leveldb that referenced this issue Jul 21, 2022
* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions

* format all tests as well, and mark blocks which change too much