Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: Added EHOSTUNREACH to retry list #16902

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

michael-redpanda
Copy link
Contributor

Fixes: #16901

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Added EHOSTUNREACH to retry-able error code list

graphcareful
graphcareful previously approved these changes Mar 5, 2024
@@ -53,6 +54,7 @@ bool is_reconnect_error(const std::system_error& e) {
case ECONNABORTED:
case EAGAIN:
case EPIPE:
case EHOSTUNREACH:
Copy link
Member

@BenPope BenPope Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are any other error codes we could opportunistically add here? E.g.:

  1. ESHUTDOWN
  2. EHOSTDOWN
  3. ENETRESET
  4. ENETDOWN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ESHUTDOWN is where we shutdown the socket (https://linux.die.net/man/3/shutdown). So probably an artifact of shutting down and not a retry. Thoughts?

I think the other three can be added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shut down the socket on all manner of alternative errors. At initial glance, I didn't see an abort source which would be more explicit about shutdown.

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Mar 5, 2024

@michael-redpanda
Copy link
Contributor Author

Force push a6d1670:

  • Added more error codes to retry list

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Mar 6, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/45755#018e1580-415e-4e34-bab0-4adc2c4a1b87:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/45975#018e2df2-7481-48f7-9604-2165d4c2644e:

"rptest.tests.data_transforms_test.DataTransformsLoggingTest.test_logs_volume"

Copy link
Member

@oleiman oleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

nit: commit message

Added:
- EHOSTUNREACH
- EHOSTDOWN
- ENETRESET
- ENETDOWN

Fixes: redpanda-data#16901

Signed-off-by: Michael Boquard <michael@redpanda.com>
@michael-redpanda
Copy link
Contributor Author

Force push 0f35361:

  • No functional change - just updated commit message

@michael-redpanda michael-redpanda merged commit deed6e7 into redpanda-data:dev Mar 11, 2024
13 of 16 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.2.x

@@ -17,6 +17,7 @@
#include <seastar/core/future.hh>
#include <seastar/net/tls.hh>

#include <asm-generic/errno.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm pretty sure you should only need <errno.h> but the LSP usually doesn't know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Treat EHOSTUNREACH as a reconnect error in kafka client
6 participants