Currently, when the shipper output encounters an error publishing to the shipper, it unconditionally calls batch.Cancelled(). Despite the name, this retries everything in the batch without decreasing the batch's time-to-live counter, so the batch is retried forever even when finite retry is configured. If the error is deterministically fatal (as, for example, in #34695), the failure persists indefinitely and blocks the queue so that no further ingestion can take place.
To work correctly with the pipeline, the shipper output should call batch.Retry() on retryable errors and batch.Drop() on fatal errors.
Edit: actually, Cancelled is still potentially correct to call if we believe that any errors that can arise are ephemeral and cannot be related to the contents of the Publish request. I'm not sure this is always true, but the shipper is a special case: publishing to the shipper "can't fail" if the request itself is sent successfully. So for now, the important part is just to make sure we drop batches on non-retryable errors.