
[qob] In GCS, recreate the ReadChannel if a transient error occurs #13730

Merged
danking merged 1 commit into hail-is:main from danking:qob-fully-recreate-read-channel on Sep 28, 2023

danking commented Sep 27, 2023

CHANGELOG: Fix #13356 and fix #13409. In QoB pipelines with 10K or more partitions, transient "Corrupted block detected" errors were common. This was caused by incorrect retry logic. That logic has been fixed.

I now assume we cannot reuse a ReadChannel after any exception occurs during a read. We also do not assume that the ReadChannel modifies the ByteBuffer "atomically" in any sense. In particular, if we encounter any error, we blow away the ByteBuffer and restart our read entirely.
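
The retry policy described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code changed in this PR: `read_block`, `open_channel`, and `TransientError` are hypothetical names, and a plain `bytearray` stands in for the ByteBuffer. The point it shows is that each attempt gets a brand-new channel and a brand-new buffer, so nothing written by a failed attempt can leak into the result.

```python
from typing import BinaryIO, Callable


class TransientError(Exception):
    """Hypothetical stand-in for a retryable storage error."""


def read_block(
    open_channel: Callable[[int], BinaryIO],
    offset: int,
    size: int,
    max_attempts: int = 5,
) -> bytes:
    """Read up to `size` bytes starting at `offset`, restarting from scratch on error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        # Assumption: a channel that has thrown cannot be reused, so open a new one.
        channel = open_channel(offset)
        # Assumption: a failed read may leave partial data behind, so start with
        # an empty buffer instead of resuming from the old one.
        buf = bytearray()
        try:
            while len(buf) < size:
                chunk = channel.read(size - len(buf))
                if not chunk:  # end of stream
                    break
                buf.extend(chunk)
            return bytes(buf)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
        finally:
            channel.close()
    raise AssertionError("unreachable")
```

A production version would presumably also back off between attempts and decide which exceptions count as transient; both are left out here to keep the sketch focused on discarding the channel and the buffer.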

As I described in a comment on #13409, I have a 10K-partition pipeline which was reliably producing this error but now reliably does not (it produces another one, #13721; a fix for that is forthcoming too).

chrisvittal left a comment

nice! Thanks for tracking this down.

danking merged commit ed58853 into hail-is:main on Sep 28, 2023