[qob] remove resource leak in readNoCompression #13065

Merged
merged 4 commits into from May 20, 2023

@danking danking commented May 16, 2023

CHANGELOG: In Azure Query-on-Batch, fix a resource leak that prevented running pipelines with >500 partitions and created flakiness with >250 partitions.

fixes: #13061

danking commented May 16, 2023

I verified this scales all the way to 50,000 partitions, but the batch-driver can't schedule these jobs fast enough to make the test fast. Each job takes less than a second but more than 300ms. We'd need something like 64 8-core nodes (512 cores) to bring test time down to a reasonable amount. We should strive to get there, but batch would need to schedule 512 jobs per second for that to make sense.

OK, we'll need to revisit 50k partition tables. But let's get this in, the current code is obviously wrong.
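
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. It is not taken from the Hail codebase: the function name and the simple model (jobs run in full waves across all cores, and the scheduler dispatches at a constant rate) are assumptions for illustration only.

```python
import math


def estimate_wall_seconds(n_jobs, job_seconds, cores, schedule_rate):
    """Rough lower bound on wall-clock time (hypothetical model).

    Assumes one job per core at a time, jobs of equal duration, and a
    scheduler that dispatches `schedule_rate` jobs per second.
    """
    # Time if the cluster were never starved: jobs run in waves of `cores`.
    compute_bound = math.ceil(n_jobs / cores) * job_seconds
    # Time needed just to hand every job to a worker.
    schedule_bound = n_jobs / schedule_rate
    # Whichever is slower dominates.
    return max(compute_bound, schedule_bound)


PARTITIONS = 50_000
CORES = 64 * 8  # 64 nodes with 8 cores each

for job_seconds in (0.3, 1.0):
    seconds = estimate_wall_seconds(PARTITIONS, job_seconds, CORES, schedule_rate=512)
    print(f"{job_seconds:.1f}s jobs on {CORES} cores: about {seconds:.0f}s")
```

Under this model, 512 cores running jobs of roughly one second each consume about 512 jobs per second, which is why the scheduler would need to match that rate for the extra cores to pay off.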

daniel-goldstein commented

@danking Gotta regenerate the dev pinned requirements.

@jigold jigold added the WIP label May 16, 2023

danking commented May 17, 2023

The new ruff version found new issues, which I fixed. Should pass now.

@danking danking mentioned this pull request May 19, 2023
@danking danking merged commit c626b49 into hail-is:main May 20, 2023
8 checks passed
Development

Successfully merging this pull request may close these issues.

In Query-on-Batch in Azure, a 1500 partition VCF fails to import
3 participants