-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compaction: avoid excessive reallocation during input list formatting #13907
compaction: avoid excessive reallocation during input list formatting #13907
Conversation
ba72e94
to
462a9e0
Compare
v2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
CI state |
…ting with off-strategy, input list size can be close to 1k, which will lead to unneeded reallocations when formatting the list for logging. in the past, we faced stalls in this area, and excessive reallocation (log2 ~1k = ~10) may have contributed to that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
462a9e0
to
f27268f
Compare
v3: |
CI state |
@raphaelsc / @denesb which issue is it fixing? |
@DoronArazii I don't think there is an issue. |
Is it considered a performance item, or should we consider backport? |
@avikivity , @bhalevy ? |
It is performance related, as it can cause reactor stalls like in https://github.com/scylladb/scylla-enterprise/issues/2980#issuecomment-1590363663 (first stall) Since it's low risk I'd vote for backporting it. |
yes , it merits a backport. #14247 |
os << "["; | ||
os << boost::algorithm::join(lst._ssts, ","); | ||
os << "]"; | ||
fmt::print(os, "[{}]", fmt::join(lst._ssts, ",")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guys, why don't you simply print in the loop instead of joining this into a single huge string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a lazy joiner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I see. Ok.
@@ -418,9 +418,12 @@ struct compaction_read_monitor_generator final : public read_monitor_generator { | |||
|
|||
class formatted_sstables_list { | |||
bool _include_origin = true; | |||
std::vector<sstring> _ssts; | |||
std::vector<std::string> _ssts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use std::list
or std::deque
or alike?
std::vector
is a wrong container if you expect a frequent (more than 1) size change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even our own chunked_vector
is much better!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linked list is overkill, we know the size upfront, we don't have to reallocate. the container size rarely go beyond 1k, so the memory usage is bounded to ~100k bytes. we could have picked chunked_vector, but don't think it's a must.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1k -> 8kB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1k -> 8kB
I was assuming ~100 bytes for each entry, as it duplicates the directory full path for each one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we haven't seen any need to optimize this particular variable.
chunked_vector
could indeed do a good job.
Feel free to send a patch for master (I doubt it'd be backported though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and regarding vector vs. list; we do prefer a single allocation than yet another allocation per sstable.
Hence chunked_vector being a good compromise between a vector and a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we haven't seen any need to optimize this particular variable.
I see compactions of crazy amounts of sstables, especially with SAG and off-strategy.
So, in the light of #14241 1K items definitely doesn't sound like something impossible or even high enough to represent a ceiling. IMO we should have our code ready for hundreds of thousands if not millions. Will send a patch. NP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it indeed doesn't hurt to switch to chunked_vector and it can handle extreme unexpected cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc #14264
@avikivity any objection on backporting this one? |
…ting with off-strategy, input list size can be close to 1k, which will lead to unneeded reallocations when formatting the list for logging. in the past, we faced stalls in this area, and excessive reallocation (log2 ~1k = ~10) may have contributed to that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13907 (cherry picked from commit 5544d12) Fixes #14071
…ting with off-strategy, input list size can be close to 1k, which will lead to unneeded reallocations when formatting the list for logging. in the past, we faced stalls in this area, and excessive reallocation (log2 ~1k = ~10) may have contributed to that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13907 (cherry picked from commit 5544d12) Fixes #14071
with off-strategy, input list size can be close to 1k, which will lead to unneeded reallocations when formatting the list for logging.
in the past, we faced stalls in this area, and excessive reallocation (log2 ~1k = ~10) may have contributed to that.