Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix unenroll offline agents #2092

Merged
merged 7 commits into from Nov 23, 2022

Conversation

juliaElastic
Copy link
Contributor

@juliaElastic juliaElastic commented Nov 17, 2022

What is the problem this PR solves?

Fix bug with unenrolling offline agents introduced in 8.5

"fleet.apikey.id":["49TbhYQBqTAlThoVXNcQ","",""],
"Validation Failed: 1: Field [ids] must not contain blank id, but got blank ids at index positions: [1, 2];\

How does this PR solve the problem?

Added missing null check when creating list of API keys to invalidate.

How to test this PR locally

  • create a Fleet Server policy with enrollment timeout: 60 seconds
  • enroll a Fleet Server with docker
  • stop the Fleet Server
  • start the local Fleet Server and enroll it to kibana with a service token
  • wait a few minutes until the unenroll logic kicks in
  • expect that the unenroll happens successfully
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"fleet-server-policy","unenroller_uuid":"0114a509-e906-416e-afa9-0bf8f6adaff3","timeout":60000,"fleet.agent.id":"74e84102-000d-4ba7-8e1e-34e9bf70536b","fleet.apikey.id":["49TbhYQBqTAlThoVXNcQ"],"@timestamp":"2022-11-17T14:26:35.334Z","message":"unenrollAgent due to unenroll timeout"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"fleet-server-policy","unenroller_uuid":"0114a509-e906-416e-afa9-0bf8f6adaff3","timeout":60000,"fleet.apikey.id":["74e84102-000d-4ba7-8e1e-34e9bf70536b"],"@timestamp":"2022-11-17T14:26:36.509Z","message":"marked agents unenrolled due to unenroll timeout"}

Is this error expected?

this change coming from the same pr that caused the original issue: ca9041f#diff-81f56287429686d60d33c6df7c9774da997979bc050126d4c807aa602683cc71

{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"fleet-server-policy","unenroller_uuid":"0114a509-e906-416e-afa9-0bf8f6adaff3","error.message":"not found","unenroll_timeout":60000,"@timestamp":"2022-11-17T14:27:36.763Z","message":"failed to unenroll offline agents"}
{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"fleet-server-policy","unenroller_uuid":"0114a509-e906-416e-afa9-0bf8f6adaff3","error.message":"not found","unenroll_timeout":60000,"@timestamp":"2022-11-17T14:28:37.016Z","message":"failed to unenroll offline agents"}

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Closes #2091

@juliaElastic juliaElastic added the bug Something isn't working label Nov 17, 2022
@juliaElastic juliaElastic requested a review from a team as a code owner November 17, 2022 14:40
@juliaElastic juliaElastic self-assigned this Nov 17, 2022
@juliaElastic juliaElastic added backport-v8.5.0 Automated backport with mergify backport-v8.6.0 Automated backport with mergify v8.7.0 labels Nov 17, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 17, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-11-23T07:48:59.365+0000

  • Duration: 12 min 30 sec

Test stats 🧪

Test Results
Failed 0
Passed 391
Skipped 1
Total 392

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@joshdover
Copy link
Member

Do we know why this field is empty in the first place? Or is it that one of the API keys is an empty string, but the other isn't?

@joshdover
Copy link
Member

We should also add a known issue in the 8.5.x release notes docs for Fleet & Agent.

Copy link
Contributor

@michel-laterman michel-laterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case to the unit tests for this?

@juliaElastic juliaElastic requested a review from a team as a code owner November 18, 2022 08:00
@juliaElastic
Copy link
Contributor Author

Do we know why this field is empty in the first place? Or is it that one of the API keys is an empty string, but the other isn't?

Yes, it looks like the output API key is empty.

@juliaElastic
Copy link
Contributor Author

We should also add a known issue in the 8.5.x release notes docs for Fleet & Agent.

Raised elastic/observability-docs#2383

internal/pkg/dl/agent.go Outdated Show resolved Hide resolved
internal/pkg/coordinator/monitor.go Outdated Show resolved Hide resolved
@juliaElastic
Copy link
Contributor Author

@michel-laterman could you approve? I think the codeowners changed and I need someone from @elastic/fleet to approve.

Copy link
Contributor

@michel-laterman michel-laterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@juliaElastic juliaElastic merged commit 3b22855 into elastic:main Nov 23, 2022
mergify bot pushed a commit that referenced this pull request Nov 23, 2022
* fix unenroll offline agents

* added changelog

* added unit test

* changed ErrNotFound error to an info log

* fixed error logic

(cherry picked from commit 3b22855)
mergify bot pushed a commit that referenced this pull request Nov 23, 2022
* fix unenroll offline agents

* added changelog

* added unit test

* changed ErrNotFound error to an info log

* fixed error logic

(cherry picked from commit 3b22855)
@lucabelluccini
Copy link
Contributor

lucabelluccini commented Nov 24, 2022

From the state of the 2 backports, this didn't make it for 8.5.2 and it's not yet merged on branches except for main. Is it correct? If it is the case, I would add the Known issue section Offline Elastic Agents fail to unenroll after timeout has expired at https://www.elastic.co/guide/en/fleet/master/release-notes-8.5.1.html also for 8.5.2

@juliaElastic
Copy link
Contributor Author

From the state of the 2 backports, this didn't make it for 8.5.2 and it's not yet merged on branches except for main. Is it correct? If it is the case, I would add the Known issue section Offline Elastic Agents fail to unenroll after timeout has expired at https://www.elastic.co/guide/en/fleet/master/release-notes-8.5.1.html also for 8.5.2

Yes, it didn't make it to 8.5.2.

juliaElastic added a commit that referenced this pull request Nov 29, 2022
* fix unenroll offline agents

* added changelog

* added unit test

* changed ErrNotFound error to an info log

* fixed error logic

(cherry picked from commit 3b22855)

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
juliaElastic added a commit that referenced this pull request Nov 29, 2022
* fix unenroll offline agents

* added changelog

* added unit test

* changed ErrNotFound error to an info log

* fixed error logic

(cherry picked from commit 3b22855)

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.5.0 Automated backport with mergify backport-v8.6.0 Automated backport with mergify bug Something isn't working v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Offline agents not unenrolled after unenrollment timeout has expired
6 participants