Excessive metadata array reads in `Workflow.write_commands` #667

aplowman · 2024-05-01T09:31:10Z

When checking whether we need to add a loop termination command to the commands of an action, we call `Workflow.get_iteration_final_run_IDs`, which in turn calls `Workflow.get_loop_map`. This then calls `Workflow.get_EARs_from_IDs` (which reads the runs metadata array) on all run IDs from that submission, which could be many thousands of runs for large workflows.
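To make the cost concrete, here is a minimal sketch using hypothetical stand-ins (`RunStore`, `final_run_id_via_full_map`), not hpcflow's actual classes. It mirrors the pattern described above: to answer a question about one loop iteration, it loads the metadata of every run in the submission and then filters, so the number of reads scales with the total run count.

```python
from dataclasses import dataclass


@dataclass
class RunStore:
    """Hypothetical stand-in for the runs metadata array; counts per-run reads."""
    runs: dict
    reads: int = 0

    def get_runs(self, run_ids):
        # Each requested run costs one metadata read.
        self.reads += len(run_ids)
        return {i: self.runs[i] for i in run_ids}


def final_run_id_via_full_map(store, all_run_ids, iteration):
    # Load *all* runs, then keep only those belonging to the requested iteration.
    runs = store.get_runs(all_run_ids)
    ids = [i for i, run in runs.items() if run["iteration"] == iteration]
    return max(ids)


store = RunStore({i: {"iteration": i % 3} for i in range(10_000)})
result = final_run_id_via_full_map(store, list(store.runs), iteration=2)
print(result, store.reads)  # 9998 10000
```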

In principle, this shouldn't be a problem because Zarr supports multi-process reading. In practice, something seems to go wrong under high-concurrency scenarios (i.e. using a large job array when the cluster has very good availability). We get sporadic `RuntimeError`s from numcodecs when decompressing chunks of this metadata array. These errors are currently handled by retrying with the reretry package. However, for tasks that should be quick, the retries can add a lengthy delay to execution, especially for large workflows.
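The latency cost of retrying can be seen with a minimal exponential-backoff decorator. This is a stdlib-only sketch for illustration, not the reretry package's implementation; the parameter names and values (`tries`, `delay`, `backoff`) are assumptions chosen to show how waits accumulate. The `sleep` argument is injected so the waits can be recorded instead of actually slept.

```python
import functools
import time


def retry(exceptions, tries=5, delay=0.5, backoff=2.0, sleep=time.sleep):
    """Illustrative retry decorator with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: propagate the last error
                    sleep(wait)  # every wait is pure added latency for the task
                    wait *= backoff
        return wrapper
    return decorator


waits = []
outcomes = iter([True, True, True, False])  # fail three times, then succeed


@retry(RuntimeError, tries=5, delay=0.5, backoff=2.0, sleep=waits.append)
def read_chunk():
    if next(outcomes):
        raise RuntimeError("simulated decompression failure")
    return "ok"


result = read_chunk()
print(result, sum(waits))  # ok 3.5
```

Three transient failures already cost 3.5 s of waiting here, which can dwarf the runtime of a task that should be quick.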

Additionally, reading the whole array is slow on Lustre file systems in general, because this array must use a chunk size of one run (one chunk file per run) to allow multi-process writing during execution. So ideally we want to avoid reading most or all of the array anyway.
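A back-of-the-envelope sketch of why the chunking matters: with one run per chunk, the number of chunk files opened equals the number of runs read. The helper below is hypothetical and assumes a store where each chunk is a separate file (as with a directory-style Zarr store); it ignores caching and the array's own metadata files.

```python
def chunks_touched(indices, chunk_size):
    """Count distinct chunks a read touches (one file each in a directory-style store)."""
    return len({i // chunk_size for i in indices})


n_runs = 10_000
full_read = chunks_touched(range(n_runs), chunk_size=1)          # whole array
targeted = chunks_touched([9_997, 9_998, 9_999], chunk_size=1)  # only the runs needed
coarse = chunks_touched(range(n_runs), chunk_size=1_000)        # larger chunks
print(full_read, targeted, coarse)  # 10000 3 10
```

Larger chunks would cut the file count, but concurrent processes writing different runs into a shared chunk would then need synchronization, which is why the array keeps one run per chunk; the remaining lever is to read fewer runs.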

Two steps to solve this:

1. Fix for the case where the workflow has no loops. This is easy, and should just require wrapping some existing code in an `if` statement.
2. Fix for the case where the workflow has loops.
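The first step amounts to an early-return guard. A minimal sketch, assuming hypothetical stand-ins (`Workflow.loops`, `final_run_ids`, `needs_loop_termination`) rather than hpcflow's real code or the actual change made for this issue:

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical stand-in exposing only what this sketch needs."""
    loops: list = field(default_factory=list)
    final_ids: set = field(default_factory=set)
    metadata_reads: int = 0

    def final_run_ids(self):
        # Stands in for the expensive path that reads the runs metadata array.
        self.metadata_reads += 1
        return self.final_ids


def needs_loop_termination(workflow, run_id):
    # Step 1: a workflow without loops never needs a termination command,
    # so return early and skip the metadata read entirely.
    if not workflow.loops:
        return False
    # Step 2 (loops present): still the expensive path in this sketch.
    return run_id in workflow.final_run_ids()


no_loops = Workflow()
skipped = needs_loop_termination(no_loops, run_id=7)
print(skipped, no_loops.metadata_reads)  # False 0

looped = Workflow(loops=["loop_0"], final_ids={7})
checked = needs_loop_termination(looped, run_id=7)
print(checked, looped.metadata_reads)  # True 1
```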

aplowman · 2024-05-01T09:48:31Z

First step fixed in #668.

aplowman added the `bug` (Something isn't working) and `zarr persistence` (Related to persistent workflow data storage/manipulation) labels on May 1, 2024.

aplowman mentioned this issue on May 1, 2024 in: Reduce overhead introduced by hpcflow in jobscripts #670 (open, 2 tasks)