-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coroutinize distributed loader #10609
Coroutinize distributed loader #10609
Conversation
CI state |
0bfd569
to
b36d004
Compare
In v2 (b36d004):
|
CI state |
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
…ss calls Without that there's use-after-free when called from distributed_loader::make_sstables_available where func is turned into a coroutine and the shared_sstable parameter is not explicitly copied and captured for the continuation of sst->move_to_new_dir. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
…st_dirs Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
…dirs Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
b36d004
to
b3e2204
Compare
In v3 (b3e2204):
|
CI state |
@xemul ping. please re-review. |
} | ||
|
||
return table.add_sstables_and_update_cache(new_sstables).handle_exception([&table] (std::exception_ptr ep) { | ||
co_await table.add_sstables_and_update_cache(new_sstables).handle_exception([&table] (std::exception_ptr ep) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erm... I don't insist, but shouldn't it rather look like
try {
co_await table.add_sstables_and_update_cache();
} catch (...) {
dblog.error(...);
abort();
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, but why throw exception when they can be handled elegantly as above?
I'd use try/catch if the exception handling would have covered a number of statements where we don't care which of them failed and they have common exception handling.
}); | ||
}).then([start, total_size, ks_name, table_name] { | ||
co_await d.reshard(std::move(info_vec), cm, table, max_threshold, creator, iop); | ||
co_await d.move_foreign_sstables(dir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't dir go out of scope because of this? There had been several bugs like that already, e.g. see my patch Save coroutine's captured variable on stack
(to be fair -- I don't understand this problem completely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dir
is passed by reference and is instantiated in a seastar thread in distributed_loader::process_upload_dir
so it doesn't go out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(to be fair -- I don't understand this problem completely)
I think the following explains it well.
A lambda results in a closure object with storage, often on the stack,
that will go out of scope at some point. When the closure object goes
out of scope the captures will also go out of scope.
Normal lambdas will have finished executing by this time so it is not a problem.
Coroutine lambdas may resume from suspension after the closure object has
destructed and at that point all captures will be use-after-free memory access.
The point is that the lambda object is temporary and goes out of scope when to lambda coroutine suspends.
One way to extend its lifetime is, if the calling function is a coroutine as well, keep the coroutine lambda in an automatic variable and that variable is retained on the calling coroutine stack frame.
I.e. instead of e.g.
co_await parallel_for_each([captures] () -> future<> { co_await async_call; });
do:
auto f = [captures] () -> future<> { co_await async_call; };
co_await parallel_for_each(f);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does coroutine::parallel_for_each help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does coroutine::parallel_for_each help?
Yes, since it moves the Func&& to the coroutine::parallel_for_each object. and so it extends its lifetime until it's fully resolved. but I'm not sure all continuations do that.
Anyhow, it's not required for this particular use case.
Before touching any of this code for #9559,
that requires a change when loading sstables from the staging subdirectory,
simplify it using coroutines.