table: Fix quadratic behavior when inserting sstables into tracker on schema change #12593


raphaelsc
Member

Each time the backlog tracker is informed about a new or old sstable, it recomputes the static part of the backlog, at a cost proportional to the total number of sstables.
On schema change, we call backlog_tracker::replace_sstables() once for each existing sstable, so the whole operation is O(N^2).
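
To illustrate the cost difference, here is a minimal sketch. `toy_tracker` is a hypothetical stand-in, not the real backlog tracker; the only property it borrows from the description above is that every replace call walks all tracked sstables. Calling it once per sstable gives 1 + 2 + ... + N element visits, while one batched call gives N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backlog tracker: every replace call recomputes
// a "static" term by walking all tracked sstables.
struct toy_tracker {
    std::vector<std::size_t> sizes;   // sizes of the tracked sstables
    std::size_t visited = 0;          // elements touched by recomputation so far
    std::size_t static_backlog = 0;

    void replace_sstables(const std::vector<std::size_t>& added) {
        sizes.insert(sizes.end(), added.begin(), added.end());
        static_backlog = 0;
        for (std::size_t s : sizes) { // cost proportional to everything tracked
            static_backlog += s;
            ++visited;
        }
    }
};

// One call per sstable: 1 + 2 + ... + n element visits in total.
inline std::size_t visits_one_by_one(std::size_t n) {
    toy_tracker t;
    for (std::size_t i = 0; i < n; ++i) {
        t.replace_sstables({i + 1});
    }
    return t.visited;
}

// One call for the whole batch: n element visits in total.
inline std::size_t visits_batched(std::size_t n) {
    toy_tracker t;
    std::vector<std::size_t> all;
    all.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        all.push_back(i + 1);
    }
    assert(all.size() == n);
    t.replace_sstables(all);
    return t.visited;
}
```

Comparing `visits_one_by_one(n)` with `visits_batched(n)` for growing `n` shows the quadratic versus linear growth.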

Fixes #12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

replica/table.cc Outdated
});
}

void execute() noexcept {
new_bt.replace_sstables({}, std::move(new_sstables_for_backlog_tracker));
Member

new_sstables_for_backlog_tracker can also be computed on-the-fly here using boost::copy_range<sstables::shared_sstable>(*new_sstables->all), no?
It might be slightly more efficient since it will reserve the required capacity once.

Member Author

This is noexcept path though. also not a hot path. I can reserve new_sstables_for_backlog_tracker upfront. what do you think?

Member

> This is noexcept path though.

ok, so let's do that in prepare.

> also not a hot path. I can reserve new_sstables_for_backlog_tracker upfront. what do you think?

Do we know the size upfront?
I don't think sstable_set gives you that, does it?

Member Author

> > This is noexcept path though.

> ok, so let's do that in prepare.

> > also not a hot path. I can reserve new_sstables_for_backlog_tracker upfront. what do you think?

> Do we know the size upfront? I don't think sstable_set gives you that, does it?

I think we can easily implement a size() method for sstable_set. Today we have to call all(), which can copy elements, but that's possibly better than reallocating about log2(N) times, where N is the number of elements inserted into the container.
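
As a rough illustration of the reallocation argument, the sketch below counts how often a `std::vector` changes capacity while N elements are appended, with and without an upfront `reserve()`. The helper name `count_reallocations` is made up for this example, and the exact count without `reserve()` depends on the standard library's growth factor, so it is only assumed to grow roughly logarithmically in N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while n elements are
// appended. With reserve_upfront the single allocation happens before the
// counting starts, so the appends themselves never reallocate.
inline std::size_t count_reallocations(std::size_t n, bool reserve_upfront) {
    std::vector<int> v;
    if (reserve_upfront) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With `reserve_upfront` set, the function returns 0. Without it, the result stays small compared to N because the vector grows geometrically.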

Member

compaction_backlog_tracker::replace_sstables() can fail, no?

Member

> > > This is noexcept path though.

> > ok, so let's do that in prepare.

> > > also not a hot path. I can reserve new_sstables_for_backlog_tracker upfront. what do you think?

> > Do we know the size upfront? I don't think sstable_set gives you that, does it?

> I think we can easily implement a size() method for sstable_set. Today we have to call all(), which can copy elements, but that's possibly better than reallocating about log2(N) times, where N is the number of elements inserted into the container.

Let's consider this for follow up then.

Member

> compaction_backlog_tracker::replace_sstables() can fail, no?

Yup,

ret.push_back(sst);

Why does execute need to be noexcept?

Member

Sorry, technically errors are handled there, in

try {
    _impl->replace_sstables(filter_and_revert_charges(old_ssts), filter_and_revert_charges(new_ssts));
} catch (...) {
    cmlog.error("Disabling backlog tracker due to exception {}", std::current_exception());
    // FIXME: tracker should be able to recover from a failure, e.g. OOM, by having its state reset. More details on https://github.com/scylladb/scylla/issues/10297.
    disable();
}

But I'm not sure if that's the best course of action.
Seems harsh to disable the backlog tracker on e.g. transient memory pressure.

Member Author

> > > > This is noexcept path though.

> > > ok, so let's do that in prepare.

> > > > also not a hot path. I can reserve new_sstables_for_backlog_tracker upfront. what do you think?

> > > Do we know the size upfront? I don't think sstable_set gives you that, does it?

> > I think we can easily implement a size() method for sstable_set. Today we have to call all(), which can copy elements, but that's possibly better than reallocating about log2(N) times, where N is the number of elements inserted into the container.

> Let's consider this for follow up then.

sure.

raphaelsc force-pushed the fix-quadratic-behavior-backlog-tracker-on-schema-change branch from f57c507 to 7149404 on January 24, 2023 14:46
@raphaelsc
Member Author

v2: replace sstables in new bt in prepare() phase, then execute() remains noexcept.
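
The prepare/execute split described in v2 can be sketched roughly as follows. `toy_updater` and `run_update` are hypothetical names and this is not the actual code in replica/table.cc; it only shows the general idea that anything which may allocate or throw runs in `prepare()`, so that `execute()` is left with non-throwing operations.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical two-phase updater, for illustration only.
class toy_updater {
    std::vector<int>& _live;   // state visible to everyone else
    std::vector<int> _staged;  // new state, built during prepare()
public:
    explicit toy_updater(std::vector<int>& live) : _live(live) {}

    // May throw (e.g. std::bad_alloc). If it does, _live is left untouched.
    void prepare(const std::vector<int>& input) {
        std::vector<int> tmp;
        tmp.reserve(input.size());
        for (int x : input) {
            tmp.push_back(x);
        }
        _staged = std::move(tmp);
    }

    // Only swaps buffers: no allocation, so nothing here can throw.
    void execute() noexcept {
        _live.swap(_staged);
    }
};

// Runs both phases on a copy of the live state and returns the result.
inline std::vector<int> run_update(std::vector<int> live, const std::vector<int>& input) {
    toy_updater u(live);
    u.prepare(input);
    u.execute();
    return live;
}
```

If `prepare()` throws, the caller can abandon the update without having changed any visible state, which is what makes a `noexcept` commit step practical.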

… schema change

Each time backlog tracker is informed about a new or old sstable, it
will recompute the static part of backlog which complexity is
proportional to the total number of sstables.
On schema change, we're calling backlog_tracker::replace_sstables()
for each existing sstable, therefore it produces O(N ^ 2) complexity.

Fixes scylladb#12499.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc force-pushed the fix-quadratic-behavior-backlog-tracker-on-schema-change branch from 7149404 to ed7dbfe on January 24, 2023 14:53
cg.main_sstables()->for_each_sstable([this] (const sstables::shared_sstable& s) {
add_sstable_to_backlog_tracker(new_bt, s);
std::vector<sstables::shared_sstable> new_sstables_for_backlog_tracker;
new_sstables_for_backlog_tracker.reserve(cg.main_sstables()->all()->size());
Member

Doesn't all() build an sstable_list?
If so, can we just keep it and use it here?

Member Author

raphaelsc, Jan 24, 2023

Not always. It builds a new list if the set is compound (maintenance + main, or the sets of all groups). Here we're working on the main set only, of a single group.

For partitioned_sstable_set, for example, we just return its local copy of all.

I think the interface is fragile, as the user can modify the local copy if it doesn't copy on write.

lw_shared_ptr<sstable_list> all() const;

should be changed to

lw_shared_ptr<const sstable_list> all() const;
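
A minimal sketch of why the const-qualified return type helps, under loud assumptions: `std::shared_ptr` stands in for `lw_shared_ptr`, a set of ints stands in for `sstable_list`, and `toy_set` is a made-up class. The storage is still shared rather than copied, but callers can no longer mutate it through the returned pointer.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <unordered_set>

// Stand-in for sstable_list (assumption: any container works for the example).
using toy_list = std::unordered_set<int>;

class toy_set {
    std::shared_ptr<toy_list> _all = std::make_shared<toy_list>();
public:
    void insert(int sst) { _all->insert(sst); }

    // Pointer-to-const: shares the set's own storage without copying, while
    // rejecting calls such as all()->insert(x) or all()->clear() at compile time.
    std::shared_ptr<const toy_list> all() const { return _all; }
};
```

With the non-const variant, a call such as `s.all()->clear()` would compile and silently modify the set's internal list. With the const variant it is a compile error.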

@raphaelsc
Member Author

@scylladb/scylla-maint ping.

syuu1228 pushed a commit to syuu1228/scylla that referenced this pull request Jan 30, 2023
Closes scylladb#12593
raphaelsc added a commit to raphaelsc/scylla that referenced this pull request Feb 7, 2023
Closes scylladb#12593

(cherry picked from commit 87ee547)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this pull request Feb 14, 2023
Closes scylladb#12593

(cherry picked from commit 87ee547)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Development

Successfully merging this pull request may close these issues.

reactor stalls when inserting sstables into backlog tracker on schema change during disrupt_add_drop_column
5 participants