On Fleet Server reboot gives a grace period #1605
@joshdover I think this will help with the API key issues we saw.
@Mergifyio update
✅ Branch has been successfully updated
@ph I have relabelled the issue appropriately so that it lands in 8.4.
Test is fixed and it's ready for review.
i like the change
internal/pkg/coordinator/monitor.go
@@ -453,6 +453,23 @@ func runCoordinatorOutput(ctx context.Context, cord Coordinator, bulker bulk.Bul
}

func runUnenroller(ctx context.Context, bulker bulk.Bulk, policyID string, unenrollTimeout time.Duration, l zerolog.Logger, checkInterval time.Duration, agentsIndex string) {
// When fleet-server is offline for a long period and finally recover, it means that the connected
nit: recover -> recovers
When Fleet Server is offline for a period greater than or equal to the unenrollTimeout, on reboot the server starts to automatically unenroll Elastic Agents without giving them a chance to communicate with the system. Instead, on reboot we now give a grace period equal to the unenrollTimeout, which gives the Elastic Agents enough time to communicate back to Fleet Server and update their last check-in time. Fixes: elastic#1500
Consider the grace period when executing the test.
Msg("giving a grace period to Elastic Agent before enforcing unenrollTimeout monitor")

if err := waitWithContext(ctx, unenrollTimeout); err != nil {
l.Err(err).Dur("unenroll_timeout", unenrollTimeout).
nit: maybe log "context canceled" at debug level?
[8.3](backport #1605) On Fleet Server reboot gives a grace period
When Fleet Server is offline for a period greater than or equal to the
unenrollTimeout, on reboot the server starts to automatically unenroll
Elastic Agents without giving them a chance to communicate with the
system. Instead, on reboot we now give a grace period equal to the
unenrollTimeout, which gives the Elastic Agents enough time to
communicate back to Fleet Server and update their last check-in time.
Fixes: #1500
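
The grace period described above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: only the name `waitWithContext` and its call shape appear in the diff, so its body here is an assumption, and `runAfterGrace` and `demo` are hypothetical helpers added for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitWithContext blocks for d, or returns ctx.Err() early if ctx is done first.
// The name matches the call in the diff; this body is an assumed implementation.
func waitWithContext(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// runAfterGrace is a hypothetical helper: it waits out the grace period and
// only then calls enforce (standing in for the unenroll check loop).
func runAfterGrace(ctx context.Context, grace time.Duration, enforce func()) error {
	if err := waitWithContext(ctx, grace); err != nil {
		return err
	}
	enforce()
	return nil
}

// demo exercises both paths: a grace period that elapses normally, and a
// context that is canceled before the grace period ends.
func demo() (ranAfterGrace bool, skippedOnCancel bool) {
	ran := false
	err := runAfterGrace(context.Background(), 20*time.Millisecond, func() { ran = true })
	ranAfterGrace = err == nil && ran

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate shutdown during the grace period
	called := false
	err = runAfterGrace(ctx, time.Hour, func() { called = true })
	skippedOnCancel = errors.Is(err, context.Canceled) && !called
	return ranAfterGrace, skippedOnCancel
}

func main() {
	ran, skipped := demo()
	fmt.Println(ran, skipped)
}
```

In this sketch a caller can tell a shutdown apart from a real failure with `errors.Is(err, context.Canceled)`, which is one way to act on the review suggestion to log the canceled case at debug level.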