-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add command line flag to skip hot restart stats transfer #34069
Conversation
Signed-off-by: Raven Black <ravenblack@dropbox.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
/wait |
Signed-off-by: Raven Black <ravenblack@dropbox.com>
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm api
/assign @yanavlasov |
nevermind, unassigned Yan as I realized Joshua is already on the reviewer list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm modulo help-text suggestion.
One possible cleanup would be to add HotRestartOptions as a class and pass that around to all the hot-restart classes rather than all the settings as separate params.
That would not make this PR smaller but would make the next ones smaller.
@@ -90,6 +90,12 @@ class Options { | |||
*/ | |||
virtual bool skipHotRestartOnNoParent() const PURE; | |||
|
|||
/** | |||
* @return bool don't get stats from the parent. If there are a lot of stats, getting them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you don't need to specify the type in a doxygen return declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not changed, for consistency with the doxygen comments nearby which say @return bool
.
source/server/options_impl.cc
Outdated
"", "skip-hot-restart-parent-stats", | ||
"When hot restarting, by default the child instance copies stats from the parent" | ||
" instance periodically during the draining period. This can potentially be an" | ||
" expensive operation; set this to true to just use new stats.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/just use new stats/reset all stats in child process/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Yeah, I think 3 options is the threshold at which I would do that, so, next time if there is one! |
Commit Message: Add command line flag to skip hot restart stats transfer
Additional Description: We had a number of issues with production servers having their main thread stuck trying to get stats from the parent instance. We're not entirely sure what the cause was, but we know stats can get big, we know that's where it was stuck, and we know there was 3 minutes of spare time during which the drain should have completed before the parent instance was hard-killed. It's possible that an out of memory occurred, it's possible the stream had a problem with the volume of data, it's possible there's a bug that causes the parent instance to lock up or crash during the stats retrieval, but one thing we can be confident of is that if we don't go down that branch at all then none of those things will happen.
Risk Level: Extremely small; no change to the usual behavior, only a change if a new command-line flag is set.
Testing: Minimal. The coverage is the same as it was before, or very slightly better.
Docs Changes: Added documentation for the command line option.
Release Notes: Yes.
Platform Specific Features: Not relevant to Windows because hot restart isn't supported there.
Relevant to: #33446