New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Increase package install max timeout + add concurrency control to rollovers #166775
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
if (isStatusInstalling && hasExceededTimeout) { | ||
// If this is a forced installation, ignore the timeout and restart the installation anyway | ||
if (force) { | ||
await restartInstallation({ | ||
savedObjectsClient, | ||
pkgName, | ||
pkgVersion, | ||
installSource, | ||
verificationResult, | ||
}); | ||
} else { | ||
throw new ConcurrentInstallOperationError( | ||
`Concurrent installation or upgrade of ${pkgName || 'unknown'}-${ | ||
pkgVersion || 'unknown' | ||
} detected, aborting.` | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the edge case mentioned in #166761 (comment), I'm allowing forced installation to immediately trigger a restart of a stuck installation without honoring the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the changes.
I am wondering if we can reproduce the apm rollover issue to test that the changes here help.
Should we backport to 8.10.2? |
We missed 8.10.2 feature freeze it looks like, and I'd like to figure out some kind of testing solution here before we ship so this won't make 8.10.2. |
I see, then maybe 8.10.3, the FF is Oct 3. |
Hmm I am on the fence about 8.10.3 here. This could probably be considered a bugfix, the more I think about it - so maybe safe to backport to that patch release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added tests for the "stuck in installing" timeout cases and the force
flag logic changes, but I wasn't able to really land on a testing implementation to ensure we don't overwhelm the ES cluster with concurrent requests.
@elasticmachine merge upstream |
Pinging @elastic/fleet (Team:Fleet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
@elasticmachine merge upstream |
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @kpollich |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…l to rollovers (elastic#166775) Fixes elastic#166761 Ref elastic#162772 ## Summary - Increase overall timeout for waiting to retry "stuck" installations from 1 minute to 30 minutes - Add `pMap` concurrency control limiting concurrent `putMapping` + `rollover` requests to mitigate ES load --------- Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit c20d177)
… control to rollovers (#166775) (#167184) # Backport This will backport the following commits from `main` to `8.10`: - [[Fleet] Increase package install max timeout + add concurrency control to rollovers (#166775)](#166775) <!--- Backport version: 8.9.7 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Kyle Pollich","email":"kyle.pollich@elastic.co"},"sourceCommit":{"committedDate":"2023-09-25T18:05:03Z","message":"[Fleet] Increase package install max timeout + add concurrency control to rollovers (#166775)\n\nFixes #166761 #162772 Summary\r\n\r\n- Increase overall timeout for waiting to retry \"stuck\" installations\r\nfrom 1 minute to 30 minutes\r\n- Add `pMap` concurrency control limiting concurrent `putMapping` +\r\n`rollover` requests to mitigate ES load\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>","sha":"c20d177a036be73d7b1180dc17e644afa260994f","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","Team:Fleet","backport:prev-minor","v8.11.0"],"number":166775,"url":"#166775 Increase package install max timeout + add concurrency control to rollovers (#166775)\n\nFixes #166761 #162772 Summary\r\n\r\n- Increase overall timeout for waiting to retry \"stuck\" installations\r\nfrom 1 minute to 30 minutes\r\n- Add `pMap` concurrency control limiting concurrent `putMapping` +\r\n`rollover` requests to mitigate ES load\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>","sha":"c20d177a036be73d7b1180dc17e644afa260994f"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"#166775 Increase package install max timeout + add concurrency control to rollovers (#166775)\n\nFixes #166761 #162772 Summary\r\n\r\n- Increase overall timeout for waiting to retry \"stuck\" installations\r\nfrom 1 minute to 30 minutes\r\n- Add `pMap` concurrency control limiting concurrent `putMapping` +\r\n`rollover` requests to mitigate ES load\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>","sha":"c20d177a036be73d7b1180dc17e644afa260994f"}}]}] BACKPORT--> --------- Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co> Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Fixes #166761
Ref #162772
Summary
pMap
concurrency control limiting concurrentputMapping
+rollover
requests to mitigate ES load