[hailtop] yet another transient error #13817

Merged: danking merged 2 commits into hail-is:main on Oct 16, 2023

Conversation

@danking danking commented Oct 16, 2023

CHANGELOG: Mitigate a new transient error from Google Cloud Storage which manifests as `aiohttp.client_exceptions.ClientOSError: [Errno 1] [SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2548)`.

As of around 1500 ET 2023-10-16, this exception happens whenever we issue a lot of requests to GCS.

See [Zulip thread](https://hail.zulipchat.com/#narrow/stream/300487-Hail-Batch-Dev/topic/cluster.20size/near/396777320).

jigold previously requested changes on Oct 16, 2023

@@ -687,6 +687,14 @@ def is_transient_error(e):
     if (isinstance(e, aiohttp.ClientPayloadError)
             and e.args[0] == "Response payload is not completed"):
         return True
     if (isinstance(e, aiohttp.ClientOSError)
             and len(e.args) >= 2
             and e.args[0] == 1

This is hard to parse. Should it be `e.args[0] == '1'`? Is `e.args[0]` a string or an int? Can we just check for the message in `e.args[1]`?

@danking danking Oct 16, 2023

The first arg is the error number, so it's definitely an integer. I suppose we can be more lax though. I'll change.

@danking danking dismissed jigold’s stale review October 16, 2023 14:51

I am now ignoring the error number.
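
For readers following along, here is a minimal sketch of what a message-only check might look like once the error number is ignored. It is not the merged code. The helper name `is_bad_record_mac` is hypothetical, and the sketch tests against the built-in `OSError` (which `aiohttp.ClientOSError` subclasses) so that it runs without aiohttp installed.

```python
def is_bad_record_mac(e: BaseException) -> bool:
    """Hypothetical helper: detect the GCS 'bad record mac' SSL error.

    Assumption: the exception carries (errno, message) in e.args, as
    OSError and its subclasses (including aiohttp.ClientOSError) do.
    The errno in e.args[0] is deliberately not inspected.
    """
    return (isinstance(e, OSError)
            and len(e.args) >= 2
            and isinstance(e.args[1], str)
            and 'sslv3 alert bad record mac' in e.args[1])


# The message from the CHANGELOG entry, wrapped in a plain OSError.
example = OSError(
    1, '[SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2548)')
print(type(example.args[0]).__name__)  # the first arg is the integer errno
print(is_bad_record_mac(example))
```

Matching on the message alone avoids the string-or-int question raised in the review, at the cost of depending on the exact wording OpenSSL emits.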

@danking danking merged commit fcaafc5 into hail-is:main Oct 16, 2023

2 participants