Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Increase package install max timeout + add concurrency control to rollovers #166775

Merged
merged 5 commits into from Sep 25, 2023

Conversation

kpollich
Copy link
Member

@kpollich kpollich commented Sep 19, 2023

Fixes #166761
Ref #162772

Summary

  • Increase overall timeout for waiting to retry "stuck" installations from 1 minute to 30 minutes
  • Add pMap concurrency control limiting concurrent putMapping + rollover requests to mitigate ES load

@kpollich kpollich added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 19, 2023
@kpollich kpollich self-assigned this Sep 19, 2023
@apmmachine
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Comment on lines +109 to +125
if (isStatusInstalling && hasExceededTimeout) {
// If this is a forced installation, ignore the timeout and restart the installation anyway
if (force) {
await restartInstallation({
savedObjectsClient,
pkgName,
pkgVersion,
installSource,
verificationResult,
});
} else {
throw new ConcurrentInstallOperationError(
`Concurrent installation or upgrade of ${pkgName || 'unknown'}-${
pkgVersion || 'unknown'
} detected, aborting.`
);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the edge case mentioned in #166761 (comment), I'm allowing forced installation to immediately trigger a restart of a stuck installation without honoring the timeout.

Copy link
Contributor

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the changes.
I am wondering if we can reproduce the apm rollover issue to test that the changes here help.

@juliaElastic
Copy link
Contributor

Should we backport to 8.10.2?

@kpollich
Copy link
Member Author

Should we backport to 8.10.2?

We missed 8.10.2 feature freeze it looks like, and I'd like to figure out some kind of testing solution here before we ship so this won't make 8.10.2.

@juliaElastic
Copy link
Contributor

We missed 8.10.2 feature freeze it looks like, and I'd like to figure out some kind of testing solution here before we ship so this won't make 8.10.2.

I see, then maybe 8.10.3, the FF is Oct 3.

@kpollich
Copy link
Member Author

Hmm I am on the fence about 8.10.3 here. This could probably be considered a bugfix, the more I think about it - so maybe safe to backport to that patch release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests for the "stuck in installing" timeout cases and the force flag logic changes, but I wasn't able to really land on a testing implementation to ensure we don't overwhelm the ES cluster with concurrent requests.

@kpollich
Copy link
Member Author

@elasticmachine merge upstream

@kpollich kpollich marked this pull request as ready for review September 25, 2023 14:49
@kpollich kpollich requested a review from a team as a code owner September 25, 2023 14:49
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich kpollich added release_note:fix backport:prev-minor Backport to the previous minor version (i.e. one version back from main) labels Sep 25, 2023
Copy link
Member

@nchaulet nchaulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@kpollich
Copy link
Member Author

@elasticmachine merge upstream

@kpollich kpollich enabled auto-merge (squash) September 25, 2023 16:32
@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @kpollich

@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.10

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 25, 2023
…l to rollovers (elastic#166775)

Fixes elastic#166761
Ref elastic#162772

## Summary

- Increase overall timeout for waiting to retry "stuck" installations
from 1 minute to 30 minutes
- Add `pMap` concurrency control limiting concurrent `putMapping` +
`rollover` requests to mitigate ES load

---------

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit c20d177)
kibanamachine added a commit that referenced this pull request Sep 26, 2023
… control to rollovers (#166775) (#167184)

# Backport

This will backport the following commits from `main` to `8.10`:
- [[Fleet] Increase package install max timeout + add concurrency
control to rollovers
(#166775)](#166775)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Kyle
Pollich","email":"kyle.pollich@elastic.co"},"sourceCommit":{"committedDate":"2023-09-25T18:05:03Z","message":"[Fleet]
Increase package install max timeout + add concurrency control to
rollovers (#166775)\n\nFixes
#166761
#162772
Summary\r\n\r\n- Increase overall timeout for waiting to retry \"stuck\"
installations\r\nfrom 1 minute to 30 minutes\r\n- Add `pMap` concurrency
control limiting concurrent `putMapping` +\r\n`rollover` requests to
mitigate ES load\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"c20d177a036be73d7b1180dc17e644afa260994f","branchLabelMapping":{"^v8.11.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","Team:Fleet","backport:prev-minor","v8.11.0"],"number":166775,"url":"#166775
Increase package install max timeout + add concurrency control to
rollovers (#166775)\n\nFixes
#166761
#162772
Summary\r\n\r\n- Increase overall timeout for waiting to retry \"stuck\"
installations\r\nfrom 1 minute to 30 minutes\r\n- Add `pMap` concurrency
control limiting concurrent `putMapping` +\r\n`rollover` requests to
mitigate ES load\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"c20d177a036be73d7b1180dc17e644afa260994f"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.11.0","labelRegex":"^v8.11.0$","isSourceBranch":true,"state":"MERGED","url":"#166775
Increase package install max timeout + add concurrency control to
rollovers (#166775)\n\nFixes
#166761
#162772
Summary\r\n\r\n- Increase overall timeout for waiting to retry \"stuck\"
installations\r\nfrom 1 minute to 30 minutes\r\n- Add `pMap` concurrency
control limiting concurrent `putMapping` +\r\n`rollover` requests to
mitigate ES load\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"c20d177a036be73d7b1180dc17e644afa260994f"}}]}]
BACKPORT-->

---------

Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co>
Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to the previous minor version (i.e. one version back from main) release_note:fix Team:Fleet Team label for Observability Data Collection Fleet team v8.10.3 v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Increase EPM integration installation timeout
7 participants