[Fleet] Implement rolling upgrades for Agent upgrades #130259

Closed
22 of 30 tasks
joshdover opened this issue Apr 14, 2022 · 3 comments
Labels: Team:Fleet (Team label for Observability Data Collection Fleet team), v8.3.0

@joshdover
Contributor

joshdover commented Apr 14, 2022

Changes not related to rolling upgrade

UI Remove licence restrictions for bulk selection

@criamico #130981
Currently we restrict bulk actions to users with a Gold or higher licence. (Note that we only restrict this in the UI; the API allows bulk actions without validating the licence.)

UI Move the bulk actions from the table to actions button

@criamico #131133
When agents are selected, we want to move the bulk actions from the table to an actions button near the "Add agent" button.

  • Move the actions button in the UI
  • Move the "Clear selection" action beside the selection
    Screenshot 2022-04-19 at 15 28 04

UI/API Add additional version option for "Upgrade" action on agent list

We want to allow the user to specify the version they want to upgrade to; we should allow upgrading to any version <= the Kibana version.
Also, we should not allow agents to be upgraded before the Fleet Servers are upgraded.
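
The two restrictions above could be checked roughly as follows. This is a minimal sketch, not the Fleet implementation: the helper names (`compareVersions`, `checkUpgradeVersion`) are hypothetical, versions are assumed to be plain "major.minor.patch" strings, and "before Fleet Servers are upgraded" is interpreted as "any known Fleet Server is older than the target version".

```typescript
// Hypothetical helpers for illustration; none of these names come from the issue.

// Parse "8.3.0" into numeric parts (any suffix after the patch number is ignored).
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Invalid version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const [aMajor, aMinor, aPatch] = parseVersion(a);
  const [bMajor, bMinor, bPatch] = parseVersion(b);
  return aMajor - bMajor || aMinor - bMinor || aPatch - bPatch;
}

// Returns an error message, or undefined when the upgrade is allowed.
function checkUpgradeVersion(
  targetVersion: string,
  kibanaVersion: string,
  fleetServerVersions: string[],
  isFleetServerAgent: boolean
): string | undefined {
  // Rule 1: the target must be <= the Kibana version.
  if (compareVersions(targetVersion, kibanaVersion) > 0) {
    return `Cannot upgrade to ${targetVersion}: it is newer than Kibana ${kibanaVersion}`;
  }
  // Rule 2 (assumed interpretation): a non-Fleet Server agent must not move
  // ahead of any Fleet Server.
  if (!isFleetServerAgent) {
    const older = fleetServerVersions.filter((v) => compareVersions(targetVersion, v) > 0);
    if (older.length > 0) {
      return `Cannot upgrade to ${targetVersion}: upgrade Fleet Server first (found ${older.join(', ')})`;
    }
  }
  return undefined;
}
```

For example, `checkUpgradeVersion('8.3.0', '8.3.0', ['8.2.0'], false)` returns an error message, while the same call with `isFleetServerAgent` set to `true` returns `undefined`, so the Fleet Server itself can be upgraded first.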

API

  • In the upgrade API
    @criamico [Fleet] Changes to bulk upgrade api for allowing rolling upgrades #131947
    • Change the restriction in the bulk upgrade API from only allowing upgrades to the same version as Kibana to allowing any version <= the Kibana version.
    • Add a restriction that does not allow upgrading a non-Fleet Server agent to a version > the Fleet Server versions.
    • If the Fleet Server is not upgraded yet, throw an explicit error message in the API; the user will be able to resolve that error by upgrading each Fleet Server.
  • Add a new endpoint that has the list of elastic agent available versions @criamico Enhancement: [Fleet] Maintain versions list for Agent upgrade modal #133309
    Depends on https://github.com/elastic/website-development/issues/9331
    • It provides the list of available versions: GET /api/fleet/agents/available_upgrade_versions. This API should fetch data from an internal Kibana endpoint and fall back to a hardcoded or configured list of versions.
    • Filters out any prerelease versions, like 8.0.0-alpha1
    • Filters out any version > current Kibana version
    • Doesn't include any version < 7.17.0, since only 7.17.0+ is supported against Elastic 8.x. Put in some logic or a test that breaks the feature if Kibana is bumped to 9.0, so the oldest allowed version can be updated.
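
The filtering rules for the available versions list could be sketched like this. The function and constant names are hypothetical, the Kibana version is assumed to be a plain "major.minor.patch" string, and the list is assumed to be returned newest first (the issue does not specify an order).

```typescript
// Oldest agent version supported against 8.x, as stated in the list above.
const MIN_SUPPORTED_VERSION = '7.17.0';

type Parts = [number, number, number];

// Strict "major.minor.patch"; anything with a suffix (e.g. "8.0.0-alpha1")
// is treated as a prerelease and yields undefined.
function toParts(version: string): Parts | undefined {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : undefined;
}

function compareParts(a: Parts, b: Parts): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// `fetched` stands for the list obtained from the remote source (undefined or
// empty on failure); `fallback` is the hardcoded or configured list.
function getAvailableUpgradeVersions(
  fetched: string[] | undefined,
  fallback: string[],
  kibanaVersion: string
): string[] {
  const kibana = toParts(kibanaVersion);
  const min = toParts(MIN_SUPPORTED_VERSION);
  if (!kibana || !min) {
    throw new Error(`Invalid Kibana version: ${kibanaVersion}`);
  }
  // Deliberately break on a major bump so the minimum version gets reviewed.
  if (kibana[0] >= 9) {
    throw new Error('Kibana major version changed: review MIN_SUPPORTED_VERSION');
  }
  const source = fetched && fetched.length > 0 ? fetched : fallback;
  const releases: Array<{ version: string; parts: Parts }> = [];
  for (const version of source) {
    const parts = toParts(version);
    if (parts && compareParts(parts, kibana) <= 0 && compareParts(parts, min) >= 0) {
      releases.push({ version, parts });
    }
  }
  releases.sort((a, b) => compareParts(b.parts, a.parts)); // newest first
  return releases.map((r) => r.version);
}
```

With Kibana at 8.3.0 and a fetched list of `['8.0.0-alpha1', '7.16.3', '7.17.0', '8.2.1', '8.4.0', '8.3.0']`, this returns `['8.3.0', '8.2.1', '7.17.0']`.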

Note: the EOL JSON doesn't have information about the Elastic Agent package, so we'll likely need to find another source for this info. We are currently investigating how to get it from https://www.elastic.co/downloads/past-releases#elastic-agent

UI Misc

  • Remove the beta label for the feature. Currently there's a badge on the modal marking the feature as "experimental"; this can be removed when updating the modal - @criamico:

Screenshot 2022-04-27 at 15 39 03

Rolling upgrade changes

This depends on the .fleet-agent-actions schema being updated:

UI

  • Update the agent upgrade modal to add version selection: the user can select a version provided by the Fleet API or type one, and the version is passed when calling the Fleet upgrade API (for both bulk and single-agent upgrades). - @criamico [Fleet] Changes to agent upgrade modal to allow for rolling upgrades #132421
    • Use an EuiComboBox to search among the versions instead of the simple dropdown
    • If the Fleet Server has not been upgraded yet, the upgrade should be aborted with a warning message informing the user that the Fleet Server needs to be upgraded to the same version (in the cloud we don’t have this issue).
    • Allow the user to specify the upgrade window (rollout_duration_seconds) for bulk upgrades only, and pass it to the upgrade API
    • The Maintenance Window explanation should read: "Defines the duration of time available to perform the upgrade. The agent upgrades are spread uniformly across this duration in order to avoid exhausting network resources."
    • When only one agent is selected, no maintenance window should be shown
    • When the number of selected agents is <= 10, show the option "Immediately" (no wait between upgrades) in the maintenance window dropdown
    • When the number of selected agents is > 10, don't show the option "Immediately" (as it might impact upgrades of big batches)
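
The maintenance window rules above can be summarized in a small sketch. Everything here is illustrative: the option shape, the listed durations, and both function names are assumptions, and the uniform spreading is shown only to make the "spread uniformly across this duration" wording concrete (the actual scheduling is not described in this issue).

```typescript
// Hypothetical option shape for the maintenance window dropdown.
interface WindowOption {
  label: string;
  rolloutDurationSeconds: number;
}

const IMMEDIATELY: WindowOption = { label: 'Immediately', rolloutDurationSeconds: 0 };

// Illustrative durations only; the issue does not list the real choices.
const TIMED_OPTIONS: WindowOption[] = [
  { label: '1 hour', rolloutDurationSeconds: 3600 },
  { label: '2 hours', rolloutDurationSeconds: 7200 },
];

// Which options to show for a given number of selected agents.
function getMaintenanceWindowOptions(selectedAgentCount: number): WindowOption[] {
  if (selectedAgentCount <= 1) {
    return []; // single-agent upgrade: no maintenance window is shown
  }
  return selectedAgentCount <= 10 ? [IMMEDIATELY, ...TIMED_OPTIONS] : TIMED_OPTIONS;
}

// Start offsets (in seconds) that spread the agents uniformly over the window.
function spreadUpgradeOffsets(agentCount: number, rolloutDurationSeconds: number): number[] {
  const step = agentCount > 0 ? rolloutDurationSeconds / agentCount : 0;
  const offsets: number[] = [];
  for (let i = 0; i < agentCount; i++) {
    offsets.push(Math.floor(i * step));
  }
  return offsets;
}
```

For instance, `spreadUpgradeOffsets(4, 3600)` yields `[0, 900, 1800, 2700]`, one upgrade start every 15 minutes.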

Screenshot 2022-05-18 at 17 27 52

Upgrade_modal_1
Upgrade_,modal_2

Screenshot 2022-04-19 at 15 18 35

API

@nchaulet

.fleet-agent-actions document for an upgrade (@michel-laterman to confirm)

{
  "action_id": "action2",
  "@timestamp": "..",
  "expiration": "END_DATE",
  "start_time": "START_DATE",
  "minimum_execution_period": 123123,
  "type": "UPGRADE",
  "agents": [ "agent1" ],
  "data": { "version": "8.3.0", "source_uri": "nonmandatory" }
}
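
As a reading aid, the document above could be typed roughly as follows. This is a sketch based only on the sample: the field types are inferred from the example values (the schema was still to be confirmed), `source_uri` is treated as optional because the sample marks it "nonmandatory", and the names `UpgradeActionDoc` and `isUpgradeActionDoc` are hypothetical.

```typescript
// Shape inferred from the sample document; not a confirmed schema.
interface UpgradeActionDoc {
  action_id: string;
  '@timestamp': string;
  expiration: string;
  start_time: string;
  minimum_execution_period: number;
  type: 'UPGRADE';
  agents: string[];
  data: { version: string; source_uri?: string };
}

// Runtime check that an unknown value matches the inferred shape.
function isUpgradeActionDoc(value: unknown): value is UpgradeActionDoc {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const doc = value as Record<string, unknown>;
  const data = doc['data'] as Record<string, unknown> | null | undefined;
  const agents = doc['agents'];
  return (
    typeof doc['action_id'] === 'string' &&
    typeof doc['@timestamp'] === 'string' &&
    typeof doc['expiration'] === 'string' &&
    typeof doc['start_time'] === 'string' &&
    typeof doc['minimum_execution_period'] === 'number' &&
    doc['type'] === 'UPGRADE' &&
    Array.isArray(agents) &&
    agents.every((a) => typeof a === 'string') &&
    typeof data === 'object' &&
    data !== null &&
    typeof data['version'] === 'string' &&
    (data['source_uri'] === undefined || typeof data['source_uri'] === 'string')
  );
}
```
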
  • Add a current upgrades API: GET /api/fleet/current-upgrades. [Fleet] Add new API to get current upgrades #132276

    • That API could query .fleet-actions for UPGRADE actions that have not expired to get each action ID and the expected number of agents to upgrade, and then query .fleet-actions-results to find out how many of those agents have completed the upgrade.
  • Add an API to abort current upgrades: POST /api/fleet/actions/{upgradeActionId}/cancel. This will create a new .fleet-actions document of type CANCEL with the target action ID to cancel and the IDs of the agents that should be cancelled (we could reuse the agent IDs from the action being cancelled to populate that).
    .fleet-agent-actions document for a cancellation (@michel-laterman to confirm). [Fleet] Allow to cancel agent actions #132168

{
  "action_id": "action2",
  "@timestamp": "..",
  "expiration": "..",
  "type": "CANCEL",
  "agents": [ "agent1" ],
  "data": { "target_id": "action1" }
}
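
The two API ideas above (summarizing current upgrades and creating a CANCEL action) could be sketched with in-memory data as follows. This is not the Fleet implementation: the real APIs would query Elasticsearch, and the types, function names, the `agent_id` field on results, and the assumption that `expiration` is an ISO timestamp are all hypothetical.

```typescript
// Hypothetical in-memory shapes standing in for the action and result documents.
interface AgentAction {
  action_id: string;
  '@timestamp'?: string;
  type: string;
  expiration: string; // assumed to be an ISO timestamp
  agents: string[];
  data?: { version?: string; target_id?: string };
}

interface ActionResult {
  action_id: string;
  agent_id: string; // assumed field name
}

interface CurrentUpgrade {
  actionId: string;
  nbAgents: number;
  nbAgentsCompleted: number;
}

// Keep non-expired UPGRADE actions and count how many targeted agents reported a result.
function getCurrentUpgrades(actions: AgentAction[], results: ActionResult[], now: Date): CurrentUpgrade[] {
  return actions
    .filter((a) => a.type === 'UPGRADE' && new Date(a.expiration).getTime() > now.getTime())
    .map((a) => ({
      actionId: a.action_id,
      nbAgents: a.agents.length,
      nbAgentsCompleted: a.agents.filter((id) =>
        results.some((r) => r.action_id === a.action_id && r.agent_id === id)
      ).length,
    }));
}

// Build a CANCEL action that reuses the agent IDs of the action being cancelled.
function buildCancelAction(target: AgentAction, newActionId: string, now: Date, expiration: string): AgentAction {
  return {
    action_id: newActionId,
    '@timestamp': now.toISOString(),
    expiration,
    type: 'CANCEL',
    agents: [...target.agents],
    data: { target_id: target.action_id },
  };
}
```

Copying the agent IDs from the target action keeps the cancellation scoped to exactly the agents that received the original upgrade, as suggested in the description above.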

Testing

  • Manual test of upgrades at scale; we can probably use Horde to simulate a large number of agents
@joshdover joshdover added WIP Work in progress Team:Fleet Team label for Observability Data Collection Fleet team v8.3.0 labels Apr 14, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jen-huang jen-huang removed the WIP Work in progress label Apr 21, 2022
@joshdover
Contributor Author

@nchaulet @criamico can we open follow up issues for any remaining work and close this one?

@nchaulet
Member

nchaulet commented Jun 2, 2022

We have these two issues as follow-ups for the remaining work:
#133388
#133309

5 participants