Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: prevent non-owner cpu access to shared_ptr #14532

Closed
wants to merge 1 commit into from

Conversation

Deexie
Copy link
Contributor

@Deexie Deexie commented Jul 5, 2023

In get_sstables_for_key in api/column_family.cc a set of lw_shared_ptrs to sstables is passes to reducer of map_reduce0. Reducer then accesses these shared pointers. As reducer is invoked on the same shard map_reduce0 is called, we have an illegal access to shared pointer on non-owner cpu.

A set of shared pointers to sstables is trasnsformed in map function, which is guaranteed to be invoked on a shard associated with the service.

Fixes: #14515.

@scylladb-promoter
Copy link
Contributor

@scylladb/scylla-maint please review and trigger CI for this pull request

@nyh
Copy link
Contributor

nyh commented Jul 5, 2023

@scylladb/scylla-maint please review and trigger CI for this pull request

@benipeled @avikivity do we have a bug in the list of ScyllaDB developers?

@benipeled
Copy link
Contributor

@scylladb/scylla-maint please review and trigger CI for this pull request

Yes, I assumed that the dev github-team is maintained and new developers added to this team as part of the onboarding process but it's not true - I'm working on updating the team

@benipeled
Copy link
Contributor

@scylladb-promoter
Copy link
Contributor

@Deexie Deexie requested a review from bhalevy July 7, 2023 16:01
return ctx.db.map_reduce0([key, uuid] (replica::database& db) -> future<std::unordered_set<sstring>> {
auto sstables = co_await db.find_column_family(uuid).get_sstables_by_partition_key(key);
auto names = sstables | boost::adaptors::transformed([] (auto s) { return s->get_filename(); });
co_return std::unordered_set<sstring>(names.begin(), names.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be achieved more simply in a single pass with boost::copy_range<std::unordered_set>(sstables | boost::adaptors::transformed ...)

auto names = b | boost::adaptors::transformed([] (auto s) { return s->get_filename(); });
a.insert(names.begin(), names.end());
[](std::unordered_set<sstring> a, std::unordered_set<sstring>&& b) mutable {
a.insert(b.begin(), b.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.merge(b) may be slightly more effcient (and shorter)

@bhalevy
Copy link
Member

bhalevy commented Jul 7, 2023

Since when the regression?
Can you please regerence the patch that introduced it?

In get_sstables_for_key in api/column_family.cc a set of lw_shared_ptrs
to sstables is passes to reducer of map_reduce0. Reducer then accesses
these shared pointers. As reducer is invoked on the same shard
map_reduce0 is called, we have an illegal access to shared pointer
on non-owner cpu.

A set of shared pointers to sstables is trasnsformed in map function,
which is guaranteed to be invoked on a shard associated with the service.

Fixes: scylladb#14515.
@Deexie
Copy link
Contributor Author

Deexie commented Jul 9, 2023

Regression since 198bca9.

@Deexie
Copy link
Contributor Author

Deexie commented Jul 9, 2023

  • use shorter forms

@scylladb-promoter
Copy link
Contributor

@bhalevy
Copy link
Member

bhalevy commented Jul 9, 2023

@scylladb/scylla-maint please review/merge.
Although we found this issue in dtest-debug it might silently impact non-debug build modes as well.

@avikivity
Copy link
Member

Absolutely, this bug has a very low probability of triggering a very nasty bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_scrub_with_one_node_expect_data_loss: shared_ptr accessed on non-owner cpu
6 participants