MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator #12480

Closed
wants to merge 13 commits into from

@jaykorean jaykorean commented Mar 25, 2024

Summary

There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic Iterator interface.

  • The default behavior of value()/columns() returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns.
  • The attribute_groups() API, which is not yet implemented, will not be useful for a single-CF iterator.

In this PR, we are implementing the following changes:

  • IteratorBase introduced, which includes all basic iterator functions except value() and columns().
  • Iterator, which now inherits from IteratorBase, includes value() and columns().
  • New public interface AttributeGroupIterator inherits from IteratorBase and additionally includes attribute_groups() (to be implemented).
  • Renamed the former MultiCfIterator to CoalescingIterator, which inherits from Iterator.
  • Existing MultiCfIteratorTest has been split into two - CoalescingIteratorTest and AttributeGroupIteratorTest.
  • Moved AttributeGroup related code from wide_columns.h to a new file, attribute_groups.h.
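
The split described above can be pictured with a minimal sketch. The types below are simplified stand-ins written for illustration (plain `std::string` keys, a hypothetical `cf_name` field, only a few positioning methods); they are not the declarations from the RocksDB headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only; not the RocksDB definitions.
using Column = std::pair<std::string, std::string>;  // {name, value}
using Columns = std::vector<Column>;

struct AttributeGroup {
  std::string cf_name;  // hypothetical: the column family the columns came from
  Columns columns;
};

// Positioning and key access only: no value() or columns().
class IteratorBase {
 public:
  virtual ~IteratorBase() = default;
  virtual bool Valid() const = 0;
  virtual void SeekToFirst() = 0;
  virtual void Next() = 0;
  virtual std::string key() const = 0;
};

// Single-result view: adds value() and columns().
class Iterator : public IteratorBase {
 public:
  virtual std::string value() const = 0;
  virtual const Columns& columns() const = 0;
};

// Per-CF view: adds attribute_groups() instead of value()/columns().
class AttributeGroupIterator : public IteratorBase {
 public:
  virtual const std::vector<AttributeGroup>& attribute_groups() const = 0;
};

// Code that only needs keys can accept either kind through IteratorBase.
inline std::size_t CountKeys(IteratorBase& it) {
  std::size_t n = 0;
  for (it.SeekToFirst(); it.Valid(); it.Next()) {
    ++n;
  }
  return n;
}

// The two public interfaces are siblings, not parent and child.
static_assert(std::is_base_of_v<IteratorBase, Iterator>);
static_assert(std::is_base_of_v<IteratorBase, AttributeGroupIterator>);
static_assert(!std::is_base_of_v<Iterator, AttributeGroupIterator>);
```

Because `AttributeGroupIterator` does not derive from `Iterator`, a caller cannot accidentally ask it for a single `value()`; shared traversal logic such as `CountKeys` still works for both through `IteratorBase`.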

Some Implementation Details

  • MultiCfIteratorImpl takes two functions, populate_func and reset_func, and uses them to populate value_ and columns_ in CoalescingIterator and attribute_groups_ in AttributeGroupIterator. In CoalescingIterator, populate_func is Coalesce(); in AttributeGroupIterator, populate_func is AddToAttributeGroups(). reset_func clears the populated value_, columns_, and attribute_groups_ accordingly.
  • Coalesce() merge-sorts columns from multiple CFs when a key exists in more than one CF. A column that appears in a later CF overwrites the earlier ones.

For example, if CF1 has "key_1" ==> {"col_1": "foo", "col_2": "baz"} and CF2 has "key_1" ==> {"col_2": "quux", "col_3": "bla"}, then when the iterator is at key_1, columns() will return {"col_1": "foo", "col_2": "quux", "col_3": "bla"}.

In this example, value() will be empty, because neither CF has a value for kDefaultColumnName.
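
The example can be reproduced with a small two-way merge. This is a sketch of the behavior described here, not the PR's `Coalesce()` code: `MergeTwo` and `ValueOf` are hypothetical names, each CF's columns are assumed to be sorted by name, and the default column is modeled as the one with an empty name (an assumption of this sketch).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Column = std::pair<std::string, std::string>;  // {name, value}
using Columns = std::vector<Column>;

// Merges two column lists that are each sorted by name.
// When both lists contain the same name, the column from `later` wins.
Columns MergeTwo(const Columns& earlier, const Columns& later) {
  Columns out;
  out.reserve(earlier.size() + later.size());
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < earlier.size() && j < later.size()) {
    if (earlier[i].first < later[j].first) {
      out.push_back(earlier[i++]);
    } else if (later[j].first < earlier[i].first) {
      out.push_back(later[j++]);
    } else {
      out.push_back(later[j++]);  // same name: later CF overwrites
      ++i;
    }
  }
  while (i < earlier.size()) out.push_back(earlier[i++]);
  while (j < later.size()) out.push_back(later[j++]);
  return out;
}

// Assumption: the default column has an empty name, so in a sorted list it
// can only be the first entry. Returns "" when there is no default column.
std::string ValueOf(const Columns& columns) {
  if (!columns.empty() && columns.front().first.empty()) {
    return columns.front().second;
  }
  return "";
}
```

With CF1 = {col_1: foo, col_2: baz} and CF2 = {col_2: quux, col_3: bla}, `MergeTwo(cf1, cf2)` yields the three columns listed above, and `ValueOf` on the result is empty. With more than two CFs, the same function can be applied repeatedly in CF order.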

Test Plan

Unit Test

./multi_cf_iterator_test

Performance Test

To make sure this change does not impact existing Iterator performance

Build

$> make -j64 release

Setup

$> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none

Run

$> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000

Before the change

DB path: [/dev/shm/db_bench/dbbench]
newiterator  :       0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom   :       5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found)

After the change

DB path: [/dev/shm/db_bench/dbbench]
newiterator  :       0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom   :       5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found)

@jaykorean jaykorean changed the title [MultiCFIterator Refactor] Introduce Coalescing Iterator MultiCFIterator Refactor - Introduce IteratorBase & Coalescing Iterator Mar 26, 2024
@jaykorean jaykorean changed the title MultiCFIterator Refactor - Introduce IteratorBase & Coalescing Iterator MultiCFIterator Refactor - Introduce IteratorBase & CoalescingIterator Mar 26, 2024
@jaykorean jaykorean changed the title MultiCFIterator Refactor - Introduce IteratorBase & CoalescingIterator MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator Mar 26, 2024
@jaykorean jaykorean marked this pull request as ready for review March 26, 2024 03:43
@facebook-github-bot

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@jaykorean has updated the pull request. You must reimport the pull request before landing.

@ltamasi ltamasi left a comment

The design looks great overall, thanks a lot @jaykorean ! Left some comments/questions below


namespace ROCKSDB_NAMESPACE {

// UNDER CONSTRUCTION - DO NOT USE
Contributor Author

I plan to lift this after adding this to stress_test


@ltamasi ltamasi left a comment

Thanks for the updates @jaykorean !! This looks awesome!

std::function<void()> reset_func,
std::function<void(ColumnFamilyHandle*, Iterator*)> populate_func)

Minor and we can do this as a follow-up but we can avoid the overhead of std::function by turning MultiCfIteratorImpl into a class template that takes the types of these two functors as template parameters.
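
A minimal sketch of the two options, assuming C++17. `ErasedImpl`, `TemplatedImpl`, and `Item` are hypothetical names for illustration; `Item` stands in for the `(ColumnFamilyHandle*, Iterator*)` arguments, and neither class is the actual `MultiCfIteratorImpl`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the (ColumnFamilyHandle*, Iterator*) pair; hypothetical.
using Item = int;

// Type-erased variant: the class type is easy to name, but every call goes
// through std::function.
class ErasedImpl {
 public:
  ErasedImpl(std::function<void()> reset_func,
             std::function<void(Item)> populate_func)
      : reset_func_(std::move(reset_func)),
        populate_func_(std::move(populate_func)) {}

  void Visit(const std::vector<Item>& items) {
    reset_func_();
    for (Item item : items) populate_func_(item);
  }

 private:
  std::function<void()> reset_func_;
  std::function<void(Item)> populate_func_;
};

// Templated variant: the functor types are part of the class type, so the
// compiler sees the concrete callables and can inline the calls.
template <typename ResetFunc, typename PopulateFunc>
class TemplatedImpl {
 public:
  TemplatedImpl(ResetFunc reset_func, PopulateFunc populate_func)
      : reset_func_(std::move(reset_func)),
        populate_func_(std::move(populate_func)) {}

  void Visit(const std::vector<Item>& items) {
    reset_func_();
    for (Item item : items) populate_func_(item);
  }

 private:
  ResetFunc reset_func_;
  PopulateFunc populate_func_;
};
```

For a local variable, class template argument deduction lets the caller write `TemplatedImpl impl(reset, populate);` without spelling the functor types. For a data member, the types must be spelled out, which is easier when the functors are named structs rather than lambdas.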

@jaykorean jaykorean Apr 11, 2024

I tried using a template first, but ran into a couple of issues, so I ended up using std::function here.

  1. The compiler was complaining with the following message and I still haven't figured out the reason.
./db/multi_cf_iterator_impl.h: In member function ‘void rocksdb::MultiCfIteratorImpl<ResetFuncType, PopulateFuncType>::InitMinHeap()’:
./db/multi_cf_iterator_impl.h:175:33: error: expected primary-expression before ‘>’ token
  175 |     heap_.emplace<MultiCfMinHeap>(
  2. Minor, but in CoalescingIterator and AttributeGroupIterator, the impl member had to spell out the types of the two functions, e.g. MultiCfIteratorImpl<std::function<void()>, std::function<void(ColumnFamilyHandle*, Iterator*)>> impl_;, which I found not as neat/readable as I wanted it to be.
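
The compiler error quoted above is the typical symptom of a missing `template` disambiguator: inside a class template, when the object's type depends on a template parameter, `obj.emplace<T>(...)` is parsed as `obj.emplace < T > (...)`. Whether that is what happened in this file is an assumption; the sketch below only reproduces the pattern with a hypothetical `HeapHolder` whose `std::variant` alternatives depend on the template parameter.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <variant>
#include <vector>

// Hypothetical class for illustration; not the PR's MultiCfIteratorImpl.
template <typename Compare>
class HeapHolder {
 public:
  using MaxHeap = std::priority_queue<int>;
  // Depends on the template parameter, so heap_ has a dependent type.
  using MinHeap = std::priority_queue<int, std::vector<int>, Compare>;

  void InitMinHeap() {
    // Without the keyword, GCC rejects the call with
    // "expected primary-expression before '>' token":
    //   heap_.emplace<MinHeap>();
    heap_.template emplace<MinHeap>();
  }

  bool HoldsMinHeap() const { return std::holds_alternative<MinHeap>(heap_); }

 private:
  std::variant<MaxHeap, MinHeap> heap_;  // starts out holding MaxHeap
};
```

`HeapHolder<std::greater<int>>` compiles with the `template` keyword in place; the keyword tells the parser that `emplace` names a member template, so the following `<` starts a template argument list.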

@facebook-github-bot

@jaykorean merged this pull request in 58a98bd.

@jaykorean jaykorean deleted the iterator_base branch April 11, 2024 20:26
facebook-github-bot pushed a commit that referenced this pull request Apr 16, 2024
…on (#12534)

Summary:
Continuing from the previous MultiCfIterator implementations (#12422, #12480, #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`.

Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL`

Here are some implementation details:
- `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration.
- `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap.
- `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap.
- Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once.
- This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead.
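
A minimal sketch of the one-pass K-way merge idea, not the code from this commit. `CoalesceKWay` and `Cursor` are hypothetical names; the sketch assumes each CF's columns are sorted by name with no duplicate names within a CF, and that a higher CF index means a later CF.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Column = std::pair<std::string, std::string>;  // {name, value}
using Columns = std::vector<Column>;

// Merges the sorted column lists of K column families in a single pass.
// For equal names, the column from the CF with the highest index wins.
Columns CoalesceKWay(const std::vector<Columns>& per_cf) {
  struct Cursor {
    std::size_t cf;   // index of the column family
    std::size_t pos;  // position within that CF's column list
  };
  // Min-heap order: smallest name first; for equal names, lowest CF first,
  // so that later CFs are popped afterwards and overwrite earlier ones.
  auto after = [&per_cf](const Cursor& a, const Cursor& b) {
    const std::string& an = per_cf[a.cf][a.pos].first;
    const std::string& bn = per_cf[b.cf][b.pos].first;
    if (an != bn) return an > bn;
    return a.cf > b.cf;
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(after)> heap(after);
  for (std::size_t cf = 0; cf < per_cf.size(); ++cf) {
    if (!per_cf[cf].empty()) heap.push(Cursor{cf, 0});
  }

  Columns out;
  while (!heap.empty()) {
    const Cursor cur = heap.top();
    heap.pop();
    const Column& col = per_cf[cur.cf][cur.pos];
    if (!out.empty() && out.back().first == col.first) {
      out.back().second = col.second;  // later CF overwrites earlier one
    } else {
      out.push_back(col);
    }
    if (cur.pos + 1 < per_cf[cur.cf].size()) {
      heap.push(Cursor{cur.cf, cur.pos + 1});
    }
  }
  return out;
}
```

Each column passes through the heap once, so the work is proportional to the total number of columns times log K, whereas repeated 2-way merging re-copies the accumulated result in every round.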

Pull Request resolved: #12534

Test Plan:
Uncommented the assertions in `verifyAttributeGroupIterator()`

```
./multi_cf_iterator_test
```

Reviewed By: ltamasi

Differential Revision: D56089019

Pulled By: jaykorean

fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054