The UPG_DOWNLOADING state should include a retryable error and the time the agent should spend retrying #3818

Closed
cmacknz opened this issue Nov 24, 2023 · 1 comment · Fixed by #3845
Labels
Team:Elastic-Agent Label for the Agent team

cmacknz commented Nov 24, 2023

This is a follow-up from a conversation in #3760.

The details of the UPG_DOWNLOADING state should include a retryable_error field indicating the most recent retryable error that was encountered. This is so that users do not need to wait for the full download timeout, which defaults to 2 hours, before they can see the error.

We should additionally include a retry_until field containing the deadline in UTC for the upgrade to complete. The elastic-agent status command should calculate the time remaining until the deadline so that it is obvious how much longer the agent will spend retrying the download. Fleet should be updated to do the same thing, but this will be a separate issue.

To test this behavior I set the agent to use a source URI that does not exist:

sudo elastic-agent inspect
agent:
  download:
    sourceURI: https://artifacts.elastic.co/broken/

I then observed that the agent reported itself in the upgrade downloading state with the percent complete stuck at 0% until the eventual transition to the upgrade failed state. The logs did contain the actual error, but looking at the details alone does not tell you the download is failing.

┌─ fleet
│  └─ status: (HEALTHY) Connected
├─ elastic-agent
│  └─ status: (UPGRADING) Upgrading to version 8.11.1
└─ upgrade_details
   ├─ target_version: 8.11.1
   ├─ state: UPG_DOWNLOADING
   ├─ action_id: 006aee27-2936-4b2e-9928-4ff53d2890e0
   └─ metadata
      └─ download_percent: 0.00%
{"log.level":"warn","@timestamp":"2023-11-24T19:30:58.344Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":260},"message":"unable to download package: 3 errors occurred:\n\t* package '/Library/Elastic/Agent/data/elastic-agent-97e821/downloads/elastic-agent-8.11.1-darwin-aarch64.tar.gz' not found: open /Library/Elastic/Agent/data/elastic-agent-97e821/downloads/elastic-agent-8.11.1-darwin-aarch64.tar.gz: no such file or directory\n\t* call to 'https://artifacts.elastic.co/broken/beats/elastic-agent/elastic-agent-8.11.1-darwin-aarch64.tar.gz' returned unsuccessful status code: 404\n\t* call to 'https://artifacts.elastic.co/broken/beats/elastic-agent/elastic-agent-8.11.1-darwin-aarch64.tar.gz' returned unsuccessful status code: 404\n\n; retrying (will be retry 2) in 30.028119235s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Nov 24, 2023
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz cmacknz changed the title The UPG_DOWNLOADING state should include a retryable error and the time the agent will retry The UPG_DOWNLOADING state should include a retryable error and the time the agent should spend retrying Nov 24, 2023
@blakerouse blakerouse assigned AndersonQ and unassigned ycombinator Nov 27, 2023