Clean up UPG_SCHEDULED state when action expires #3902

Merged
merged 9 commits into from
Dec 28, 2023

AndersonQ
Member

@AndersonQ AndersonQ commented Dec 12, 2023

What does this PR do?

Fixes how the upgrade handles the UPG_SCHEDULED status and its eventual clean up.
Now the agent:

  • sets the upgrade status to UPG_SCHEDULED if a scheduled upgrade action is received
  • sets the upgrade status to UPG_FAILED if the upgrade action expires
  • sets the upgrade status to UPG_SCHEDULED if there is an expired upgrade and a new scheduled upgrade

Why is it important?

The upgrade state kept reporting UPG_SCHEDULED even after the upgrade action had expired.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

How to test this PR locally

  • build this branch:
DEV=true SNAPSHOT=true EXTERNAL=true PLATFORMS="linux/amd64" PACKAGES="tar.gz" mage -v package
  • unpack the agent
  • edit ./elastic-agent-8.12.0-SNAPSHOT-linux-x86_64/data/elastic-agent-[hash]/package.version to set the version to any previous one.
  • install/enroll the agent
  • create a new user (the elastic user won't do) with all the possible roles/permissions
  • in the Fleet UI, schedule an upgrade:
    • select the agent using the check box
    • click the "Actions" button
    • choose "Schedule upgrade for 1 agent"
    • set it for 5 min from now
  • watch the agent status to see the scheduled upgrade and get the action ID
  • get the action document ID
    • search .fleet-actions:
      • GET .fleet-actions/_search. The upgrade action should be the first one
      • get the action's document ID
    • or query it with ES|QL:
POST /_query?format=yaml
{
  "query": """
from .fleet-actions [METADATA _id]
| where action_id LIKE "ACTION_ID"
| keep _id, start_time, expiration, agents
  """
}
  • set the action's expiration to 5 min past the start time, using the document ID from the previous step (ACTION_DOC_ID below). You will need to use the new user and curl (logging in with the new user might work as well, I haven't tested it)
curl -XPOST --user 'ES_USER:ES_PASSWORD' -H'x-elastic-product-origin:kibana' --header 'kbn-xsrf: as' -H'content-type:application/json' -d '{"doc":{"expiration":"2023-12-13T08:55:00.000Z"}}' "ES_HOST/.fleet-actions/_update/ACTION_DOC_ID"
  • stop the agent
  • wait until at least 2-3 min past the action's expiration
  • restart the agent; the upgrade status should be 'failed'

the catch here is that the agent doesn't really set the action as

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Contributor

mergify bot commented Dec 12, 2023

This pull request does not have a backport label. Could you fix it @AndersonQ? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@AndersonQ AndersonQ marked this pull request as ready for review December 13, 2023 10:40
@AndersonQ AndersonQ requested a review from a team as a code owner December 13, 2023 10:40
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Contributor

@blakerouse blakerouse left a comment

Looks good and is well tested!

@AndersonQ
Member Author

buildkite test this

@AndersonQ AndersonQ merged commit 4856600 into elastic:main Dec 28, 2023
9 checks passed
@AndersonQ AndersonQ deleted the 3817-fix-UPG_SCHEDULED-report branch December 28, 2023 18:50
mergify bot pushed a commit that referenced this pull request Dec 28, 2023
AndersonQ added a commit that referenced this pull request Jan 2, 2024
…es (#3967)

* set upgrade state to failed when scheduled upgrade action expires (#3902)

(cherry picked from commit 4856600)

---------

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Co-authored-by: Pierre HILBERT <pierre.hilbert@elastic.co>
cmacknz pushed a commit to cmacknz/elastic-agent that referenced this pull request Jan 8, 2024
cmacknz pushed a commit that referenced this pull request Jan 17, 2024
Successfully merging this pull request may close these issues.

Agent remains in the UPG_SCHEDULED state past the scheduled action start time
4 participants