Refactor zarr encode #81

jeromekelleher · 2024-03-20T20:25:43Z

No description provided.

jeromekelleher · 2024-03-21T09:30:36Z

This refactors the encode code path quite a bit, and opens the way to distributing encoding across a cluster. Main changes:

Merged the 1D and 2D encode steps into one, and changed rate reporting to bytes
Added a --max-memory option to encode (an approximate memory budget based on the size of the variant-chunk NumPy buffers; this seems to work quite well)
Changed how we keep track of in-progress encoding: every array being written to now gets a "wip_" prefix. Currently we wait until the end and rename them all, but the idea is to rename each array as its encoding jobs complete.
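A minimal sketch of how a memory budget like --max-memory might gate concurrent encoding jobs. It assumes each job's cost is the byte size of one variant-chunk NumPy buffer, and that jobs go to a `concurrent.futures` pool that waits for the first completion whenever the next job would push the total over budget. The names `chunk_buffer_bytes` and `run_with_memory_budget` are hypothetical, not the project's actual API.

```python
import concurrent.futures as cf

import numpy as np


def chunk_buffer_bytes(shape, dtype, variants_chunk_size):
    """Approximate size in bytes of one variant-chunk buffer (hypothetical helper).

    `shape` is the full array shape with variants as the first dimension;
    the buffer spans up to `variants_chunk_size` variants and the full
    extent of every other dimension.
    """
    rows = min(variants_chunk_size, shape[0])
    other = int(np.prod(shape[1:], dtype=np.int64))
    return rows * other * np.dtype(dtype).itemsize


def run_with_memory_budget(jobs, max_memory, max_workers=4):
    """Run (callable, nbytes) jobs, keeping in-flight nbytes <= max_memory.

    Hypothetical scheduler: a job larger than the whole budget still runs,
    but only once nothing else is in flight.
    Returns (results in completion order, peak reserved bytes).
    """
    results = []
    in_flight = {}  # future -> bytes reserved for it
    used = peak = 0
    with cf.ThreadPoolExecutor(max_workers=max_workers) as executor:
        for func, nbytes in jobs:
            # Wait for running jobs to finish until the next one fits.
            while in_flight and used + nbytes > max_memory:
                done, _ = cf.wait(in_flight, return_when=cf.FIRST_COMPLETED)
                for future in done:
                    used -= in_flight.pop(future)
                    results.append(future.result())
            in_flight[executor.submit(func)] = nbytes
            used += nbytes
            peak = max(peak, used)
        # Collect whatever is still running.
        for future in cf.as_completed(in_flight):
            results.append(future.result())
    return results, peak
```

For example, with 1000-variant chunks an int8 array of shape (10000, 100, 2) needs 200,000 bytes per buffer, so a 1,000,000-byte budget admits five such jobs at once. Because an oversized job is still allowed to run on its own, the budget stays approximate rather than a hard limit.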
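The "wip_" scheme can be sketched with plain directories, assuming a directory-based store in which each array is a subdirectory of the root. Jobs write into `wip_<name>`, and a rename promotes the array once all its chunks are written; any leftover `wip_` directories then identify unfinished arrays. The helpers `start_array`, `finalise_array` and `unfinished_arrays` are hypothetical. Calling `finalise_array` per array, as each one completes, corresponds to the incremental finalise the PR anticipates rather than the current rename-everything-at-the-end behaviour.

```python
import os
import tempfile
from pathlib import Path

WIP_PREFIX = "wip_"  # the prefix named in the PR; the helpers below are hypothetical


def start_array(root, name):
    """Create the in-progress directory that encoding jobs write into."""
    path = Path(root) / (WIP_PREFIX + name)
    path.mkdir(parents=True)
    return path


def finalise_array(root, name):
    """Promote wip_<name> to <name> once all of its chunks are written.

    Within a single POSIX filesystem the rename is atomic, so a reader
    sees either the in-progress name or the final one, never a mix.
    """
    root = Path(root)
    os.rename(root / (WIP_PREFIX + name), root / name)


def unfinished_arrays(root):
    """Names of arrays that still carry the wip_ prefix."""
    return sorted(
        p.name[len(WIP_PREFIX):]
        for p in Path(root).iterdir()
        if p.is_dir() and p.name.startswith(WIP_PREFIX)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        chunk_dir = start_array(root, "variant_position")
        (chunk_dir / "0").write_bytes(b"chunk data")  # stand-in for an encoded chunk
        start_array(root, "call_genotype")
        finalise_array(root, "variant_position")
        print(unfinished_arrays(root))  # ['call_genotype']
```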

jeromekelleher added 8 commits March 20, 2024 13:53

Initial refactor of the encode path

6c96407

Add per-array WIP and atomic swap

099101e

Change encode to a single progress monitor

2439c70

Tweak tqdm displays

bdabfdb

Add memory budget

526cce1

Minor refactor to prepare for incremental finalise

430ac5e

Remove timeout on wait for completed

eaa9714

Add CHANGELOG

986f999

jeromekelleher merged commit 8c3ec06 into sgkit-dev:main Mar 21, 2024

jeromekelleher deleted the refactor-zarr-encode branch March 21, 2024 09:30
