@jeromekelleher

This refactors the encode code-path quite a bit, and opens the way for making this distributable on a cluster. Main changes:

  • Merged the 1D and 2D encode steps into one, and changed rate reporting to bytes.
  • Added --max-memory for encode (an approximate memory budget based on the size of the variant-chunk numpy buffer; it seems to work quite well).
  • Changed the way we keep track of in-progress encoding: every array that is being written to now gets a "wip_" prefix. Currently this just waits until the end and renames them all, but the idea is to do the rename as each array encoding job completes.
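
To make the last two points concrete, here is a rough sketch of the two ideas, not the code in this PR. It assumes each array lives in its own directory inside the store. The helper names `max_concurrent_chunks` and `finalise_wip_arrays` and the array names are hypothetical, chosen only for illustration; only the "wip_" prefix comes from the description above.

```python
import os
import tempfile
from pathlib import Path

WIP_PREFIX = "wip_"  # prefix mentioned in the PR description


def max_concurrent_chunks(max_memory_bytes, chunk_buffer_bytes):
    """Hypothetical helper: how many variant-chunk buffers fit in the budget.

    Always allows at least one buffer so that encoding can make progress,
    which is one reason the budget is only approximate.
    """
    if chunk_buffer_bytes <= 0:
        raise ValueError("chunk_buffer_bytes must be positive")
    return max(1, max_memory_bytes // chunk_buffer_bytes)


def finalise_wip_arrays(store_dir):
    """Rename every 'wip_<name>' directory in store_dir to '<name>'.

    Returns the final names, sorted. This mirrors the 'rename everything
    at the end' behaviour; renaming per completed job would call the same
    logic for a single array instead.
    """
    store = Path(store_dir)
    finalised = []
    # sorted() materialises the listing before any rename happens.
    for path in sorted(store.iterdir()):
        if path.is_dir() and path.name.startswith(WIP_PREFIX):
            final = path.with_name(path.name[len(WIP_PREFIX):])
            path.rename(final)
            finalised.append(final.name)
    return finalised


if __name__ == "__main__":
    # Illustrative array names only; assumed, not taken from the PR.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "wip_call_genotype"))
        os.mkdir(os.path.join(tmp, "wip_variant_position"))
        print(finalise_wip_arrays(tmp))
        print(max_concurrent_chunks(4 * 2**30, 2**30))
```

The point of the prefix is that a reader of the store can tell at a glance which arrays are complete: anything still named "wip_*" was interrupted or is in progress.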

cc @shz9 @benjeffery

@jeromekelleher jeromekelleher merged commit 8c3ec06 into sgkit-dev:main Mar 21, 2024
@jeromekelleher jeromekelleher deleted the refactor-zarr-encode branch March 21, 2024 09:30