
[query] Extremely large jobs often run out of memory on the driver #14584

Open · patrick-schultz opened this issue Jun 18, 2024 · 2 comments

@patrick-schultz (Collaborator) commented:

Spark breaks down when a job has too many partitions. We should modify the implementation of CollectDistributedArray on the Spark backend to automatically break up jobs that exceed some threshold number of partitions into a few sequential smaller jobs. This would have a large impact on groups like AoU, who run Hail on the biggest datasets and currently have to work around this issue by trial and error.
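
A minimal sketch of this chunking idea, assuming CollectDistributedArray on the Spark backend ultimately does something like `sc.parallelize(contexts).map(f).collect()`; the names `collectInChunks` and `maxPartitionsPerJob` are hypothetical, not existing Hail API:

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkContext

// Minimal sketch, not Hail's actual implementation: split the per-partition
// contexts into chunks of at most `maxPartitionsPerJob` and run one Spark job
// per chunk, sequentially, so no single job exceeds the partition threshold.
def collectInChunks[C: ClassTag, R: ClassTag](
    sc: SparkContext,
    contexts: IndexedSeq[C],
    maxPartitionsPerJob: Int
)(f: C => R): IndexedSeq[R] = {
  contexts
    .grouped(maxPartitionsPerJob)  // sequential chunks of contexts
    .flatMap { chunk =>
      // still one partition per context within the chunk, but the job
      // submitted to Spark never has more than maxPartitionsPerJob partitions
      sc.parallelize(chunk, numSlices = chunk.length).map(f).collect().toIndexedSeq
    }
    .toIndexedSeq
}
```

This keeps the semantics of a single collect while bounding the per-job partition count; the chunks run one after another, so the driver only tracks one bounded job at a time.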

@chrisvittal (Collaborator) commented:

Part 1 is #14590; making this some sort of default will be part 2.

@chrisvittal changed the title from "[query] Automatically break up big spark jobs" to "[query] Extremely large jobs often run out of memory on the driver" on Oct 7, 2024
@chrisvittal (Collaborator) commented:

Some discussion from 10/7:

- Maybe use fast external storage to hold job results and query them afterwards, so that we never materialize all the results on the driver while the job is running (a rough sketch follows below).
- The call caching framework may help here.
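
A rough sketch of that external-storage direction, under the assumption that each task's result can be serialized to bytes; `runAndSpill`, `resultDir`, and the per-partition file layout are placeholders for illustration, not an existing Hail or call-caching API:

```scala
import scala.reflect.ClassTag

import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

// Sketch only: each task writes its serialized result to a per-partition file
// and returns just the file path, so the driver never materializes all results
// in memory while the job runs; it can read or query them back later on demand.
def runAndSpill[C: ClassTag](
    sc: SparkContext,
    contexts: IndexedSeq[C],
    resultDir: String
)(f: C => Array[Byte]): IndexedSeq[String] = {
  sc.parallelize(contexts, numSlices = contexts.length)
    .mapPartitionsWithIndex { (i, it) =>
      // fresh Hadoop Configuration per task; in practice the job's actual
      // filesystem configuration would need to be shipped to the executors
      val conf = new org.apache.hadoop.conf.Configuration()
      val path = new Path(s"$resultDir/part-$i")
      val out = path.getFileSystem(conf).create(path)
      try it.foreach(ctx => out.write(f(ctx)))
      finally out.close()
      Iterator.single(path.toString)  // driver collects only small path strings
    }
    .collect()
    .toIndexedSeq
}
```

The driver (or a later stage) could then read or query the per-partition files lazily, which is where something like the call caching framework might plug in.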
