[query] Automatically break up big spark jobs #14584

Open
patrick-schultz opened this issue Jun 18, 2024 · 0 comments

Spark breaks down when a single job has too many partitions. We should modify the implementation of CollectDistributedArray on the Spark backend to automatically break up any job whose partition count exceeds some threshold into a few smaller sequential jobs. This would have a large impact on groups like AoU, who run Hail on the biggest datasets and currently have to work around this issue by trial and error.
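A minimal sketch of the idea in Scala against the plain Spark RDD API: split the contexts into chunks of at most `maxPartitionsPerJob` and submit one Spark job per chunk, concatenating results in order. The object, helper name, and signature here are illustrative assumptions, not Hail's actual CollectDistributedArray lowering.

```scala
import org.apache.spark.SparkContext

import scala.reflect.ClassTag

object ChunkedCollect {
  // Hypothetical helper, not Hail's actual API: evaluate `f` over every
  // context, but submit at most `maxPartitionsPerJob` partitions per Spark
  // job, running the chunks one after another and concatenating the
  // results in their original order.
  def collectDArrayChunked[C: ClassTag, T: ClassTag](
      sc: SparkContext,
      contexts: IndexedSeq[C],
      maxPartitionsPerJob: Int
  )(f: C => T): IndexedSeq[T] = {
    require(maxPartitionsPerJob > 0)
    contexts
      .grouped(maxPartitionsPerJob) // split the work into sequential chunks
      .flatMap { chunk =>
        // one Spark partition per context, one job per chunk
        sc.parallelize(chunk, numSlices = chunk.length).map(f).collect().toIndexedSeq
      }
      .toIndexedSeq
  }
}
```

Running the chunks sequentially bounds the per-job load on the driver (task serialization, scheduler state) at the cost of some cross-chunk parallelism. The threshold could plausibly be exposed as a backend flag, so users don't have to discover a working partition count by trial and error.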
