Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shipper output retries forever on all errors regardless of actual retry settings #34700

Closed
Tracked by #16
faec opened this issue Feb 28, 2023 · 0 comments · Fixed by #34762
Closed
Tracked by #16

Shipper output retries forever on all errors regardless of actual retry settings #34700

faec opened this issue Feb 28, 2023 · 0 comments · Fixed by #34762
Assignees
Labels
bug Team:Elastic-Agent Label for the Agent team

Comments

@faec
Copy link
Contributor

faec commented Feb 28, 2023

Currently when the shipper output encounters an error publishing to the shipper, it unconditionally calls batch.Cancelled(), which (despite the name) retries everything in the batch without decreasing the batch's time to live counter, which has the effect of retrying forever even when finite retry is configured. If the error itself is deterministically fatal (as for example in #34695) that means the failure will persist indefinitely, blocking the queue so that no future ingestion can take place.

To work correctly with the pipeline, the shipper output should call batch.Retry() on retryable errors and batch.Drop() on fatal errors.

Edit: actually, Cancelled is still potentially correct to call if we believe that any errors that can arise are ephemeral and cannot be related to the contents of the Publish request. I'm not sure this is always true, but the shipper is a special case and publishing to the shipper "can't fail" if the request itself is sent successfully, so for now the important part is just to make sure we drop batches on non-retryable errors.

@faec faec added the bug label Feb 28, 2023
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 28, 2023
@faec faec added Team:Elastic-Agent Label for the Agent team and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 28, 2023
@faec faec self-assigned this Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant