Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coroutinize distributed loader #10609

Merged

Conversation

bhalevy
Copy link
Member

@bhalevy bhalevy commented May 19, 2022

Before touching any of this code for #9559,
that requires a change when loading sstables from the staging subdirectory,
simplify it using coroutines.

@bhalevy bhalevy requested review from xemul and cmm May 19, 2022 12:04
@scylladb-promoter
Copy link
Contributor

@bhalevy bhalevy force-pushed the coroutinize-distributed_loader branch from 0bfd569 to b36d004 Compare May 20, 2022 10:44
@bhalevy bhalevy requested review from tgrabiec and nyh as code owners May 20, 2022 10:44
@bhalevy
Copy link
Member Author

bhalevy commented May 20, 2022

In v2 (b36d004):

  • kept seastar threads where deferred_stop(directory) was required.
    • to keep 'em simple
  • fixed premature freeing of func in sstable_directory: parallel_for_each_restricted that causes use-after-free exposed by coroutinizing make_sstables_available
  • rebased + retested

@scylladb-promoter
Copy link
Contributor

bhalevy added 19 commits May 20, 2022 17:07
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
…ss calls

Without that there's use-after-free when called from
distributed_loader::make_sstables_available where
func is turned into a coroutine and the shared_sstable parameter
is not explicitly copied and captured for the continuation
of sst->move_to_new_dir.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
…st_dirs

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
…dirs

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
@bhalevy bhalevy force-pushed the coroutinize-distributed_loader branch from b36d004 to b3e2204 Compare May 20, 2022 14:43
@bhalevy
Copy link
Member Author

bhalevy commented May 20, 2022

In v3 (b3e2204):

@bhalevy bhalevy requested a review from xemul May 20, 2022 14:46
@scylladb-promoter
Copy link
Contributor

@bhalevy
Copy link
Member Author

bhalevy commented May 24, 2022

@xemul ping. please re-review.

}

return table.add_sstables_and_update_cache(new_sstables).handle_exception([&table] (std::exception_ptr ep) {
co_await table.add_sstables_and_update_cache(new_sstables).handle_exception([&table] (std::exception_ptr ep) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm... I don't insist, but shouldn't it rather look like

try {
    co_await table.add_sstables_and_update_cache();
} catch (...) {
    dblog.error(...);
    abort();
}

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but why throw exception when they can be handled elegantly as above?
I'd use try/catch if the exception handling would have covered a number of statements where we don't care which of them failed and they have common exception handling.

});
}).then([start, total_size, ks_name, table_name] {
co_await d.reshard(std::move(info_vec), cm, table, max_threshold, creator, iop);
co_await d.move_foreign_sstables(dir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't dir go out of scope because of this? There had been several bugs like that already, e.g. see my patch Save coroutine's captured variable on stack (to be fair -- I don't understand this problem completely)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dir is passed by reference and is instantiated in a seastar thread in distributed_loader::process_upload_dir
so it doesn't go out of scope.

Copy link
Member Author

@bhalevy bhalevy May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(to be fair -- I don't understand this problem completely)

I think the following explains it well.

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#cp51-do-not-use-capturing-lambdas-that-are-coroutines

A lambda results in a closure object with storage, often on the stack,
that will go out of scope at some point. When the closure object goes
out of scope the captures will also go out of scope.
Normal lambdas will have finished executing by this time so it is not a problem.
Coroutine lambdas may resume from suspension after the closure object has
destructed and at that point all captures will be use-after-free memory access.

The point is that the lambda object is temporary and goes out of scope when to lambda coroutine suspends.
One way to extend its lifetime is, if the calling function is a coroutine as well, keep the coroutine lambda in an automatic variable and that variable is retained on the calling coroutine stack frame.

I.e. instead of e.g.

    co_await parallel_for_each([captures] () -> future<> { co_await async_call; });

do:

    auto f = [captures] () -> future<> { co_await async_call; };
    co_await parallel_for_each(f);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does coroutine::parallel_for_each help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does coroutine::parallel_for_each help?

Yes, since it moves the Func&& to the coroutine::parallel_for_each object. and so it extends its lifetime until it's fully resolved. but I'm not sure all continuations do that.

Anyhow, it's not required for this particular use case.

@scylladb-promoter scylladb-promoter merged commit ed23e83 into scylladb:master May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants