Feat: Supply size string for `n_obs_per_dataset` by felix0097 · Pull Request #159 · scverse/annbatch

felix0097 · 2026-03-10T13:47:56Z

No description provided.

codecov · 2026-03-10T13:52:09Z

Codecov Report

❌ Patch coverage is 81.81818% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.82%. Comparing base (0c6f025) to head (a365acf).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/annbatch/io.py	81.81%	12 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #159      +/-   ##
==========================================
- Coverage   93.56%   90.82%   -2.75%     
==========================================
  Files          11       11              
  Lines         886      937      +51     
==========================================
+ Hits          829      851      +22     
- Misses         57       86      +29

Files with missing lines	Coverage Δ
src/annbatch/io.py	`90.06% <81.81%> (-2.12%)`	⬇️

... and 3 files with indirect coverage changes

ilan-gold · 2026-03-12T13:45:48Z

@felix0097 please fix the diff - I think you may need to merge main or rebase.

ilan-gold

Nice!

ilan-gold · 2026-03-17T13:22:08Z

+- The ``sparse_chunk_size``, ``sparse_shard_size``, ``dense_chunk_size``, and ``dense_shard_size`` parameters of {func}`annbatch.write_sharded` have been replaced by ``n_obs_per_chunk`` (number of observations per chunk, automatically converted to element counts for sparse arrays) and ``shard_size`` (number of observations per shard or a size string). The corresponding parameters in {meth}`annbatch.DatasetCollection.add_adatas` are ``n_obs_per_chunk`` and ``shard_size``.
+- `shard_size` in {meth}`annbatch.DatasetCollection.add_adatas` and `shard_size` in {func}`annbatch.write_sharded` now accept a human-readable size string (e.g. ``'1GB'``, ``'512MB'``) in addition to an integer number of observations. When a string is provided, the observation count is derived independently for each array element from its uncompressed bytes-per-row so that every shard stays close to the target size.
+- ``dataset_size`` in {meth}`annbatch.DatasetCollection.add_adatas` now accepts a human-readable size string (e.g. ``'20GB'``, ``'512MB'``) in addition to an integer number of observations. When a string is provided, the per-row byte size is estimated from the on-disk metadata of the input datasets during validation and used to derive the observation count. The default has changed from ``2_097_152`` to ``'20GB'``.


Maybe split these between "Breaking" header and "Features" header
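
The size-string behavior described in the changelog entries above can be illustrated with a minimal sketch. This is not the code from `src/annbatch/io.py`: the helper names `parse_size` and `n_obs_for_target` are hypothetical, and the use of decimal units (1 GB = 10^9 bytes) is an assumption, since the changelog does not say how units are interpreted. The clamping step mirrors the intent of the "Clamp chunk and shard size to dataset size" commit.

```python
from __future__ import annotations

import re

# Assumption: decimal (SI) units. The library may interpret units differently.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(size: str) -> int:
    """Convert a string such as '512MB' into a byte count (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", size.upper())
    if match is None:
        raise ValueError(f"Cannot parse size string: {size!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def n_obs_for_target(target: int | str, bytes_per_row: int, n_obs_total: int) -> int:
    """Resolve an integer or size-string target to an observation count."""
    if isinstance(target, str):
        # Size string: divide the byte budget by the uncompressed bytes per row.
        n_obs = parse_size(target) // bytes_per_row
    else:
        # Integer: already a number of observations.
        n_obs = target
    # Keep at least one row and never exceed the dataset itself.
    return max(1, min(n_obs, n_obs_total))


# A dense float32 matrix with 1000 columns uses 4000 bytes per row.
shard_rows = n_obs_for_target("1GB", bytes_per_row=4000, n_obs_total=10_000_000)
```

Because the observation count depends on `bytes_per_row`, the same size string yields a different number of rows for each array element, which is what keeps every shard close to the target size.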

felix0097 and others added 13 commits March 6, 2026 15:18

Clean up params for zarr writing

fe5ad84

Update src/annbatch/io.py

99bc83f

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update size calulation + size parsing

d979d5d

Add zarr param changes

62869f0

Merge branch 'ff/refactor-zarr-params' of github.com:laminlabs/arrayl…

9ee7620

…oaders into ff/refactor-zarr-params

fix readthedocs errors

cfc99d1

Fix errors

7b1893c

Update src/annbatch/io.py

fab4eea

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update src/annbatch/io.py

c8088c6

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update src/annbatch/io.py

5dcfaf3

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

chore: update variable names + changelog

8075339

support for size strings for n_obs_per_dataset

48cc713

add changelog entry

c5c7876

felix0097 requested a review from ilan-gold March 10, 2026 13:47

felix0097 self-assigned this Mar 10, 2026

felix0097 added the enhancement and skip-gpu-ci labels Mar 10, 2026

Merge branch 'main' into ff/automatic-size-estimation

414fd37

felix0097 and others added 10 commits March 10, 2026 17:53

Fix bytes_per_row calculation

de96b94

Update src/annbatch/io.py

be69b39

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c794067

for more information, see https://pre-commit.ci

Update method params

124f2b8

Merge branch 'main' into ff/refactor-zarr-params

62afc65

Merge branch 'ff/refactor-zarr-params' of github.com:laminlabs/arrayl…

e582390

…oaders into ff/refactor-zarr-params

Fix tests

147e4a3

Rename method params

275aff7

Update CHANGELOG.md

33c797e

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Merge branch 'ff/refactor-zarr-params' into ff/automatic-size-estimation

ceae676

Merge branch 'main' into ff/automatic-size-estimation

1e930e3

ilan-gold reviewed Mar 13, 2026

Comment thread src/annbatch/io.py Outdated

Comment thread src/annbatch/io.py

Comment thread src/annbatch/io.py Outdated

Comment thread src/annbatch/io.py

Small fixes

b285299

felix0097 requested a review from ilan-gold March 13, 2026 13:35

Rename zarr_shard_size to n_obs_pe_shard

de256cb

ilan-gold reviewed Mar 16, 2026

Comment thread src/annbatch/io.py Outdated

Comment thread src/annbatch/io.py

Comment thread src/annbatch/io.py

Comment thread src/annbatch/io.py Outdated

Comment thread src/annbatch/io.py Outdated

felix0097 and others added 3 commits March 16, 2026 16:40

Update src/annbatch/io.py

5fdc082

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update src/annbatch/io.py

26a4b89

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Rename args + add support for awkward arrays

153c647

felix0097 requested a review from ilan-gold March 16, 2026 16:05

ilan-gold reviewed Mar 16, 2026

Comment thread src/annbatch/io.py Outdated

Comment thread src/annbatch/io.py Outdated

Comment thread tests/test_preshuffle.py

felix0097 added 2 commits March 16, 2026 17:29

chore: rename variables

c8b77f8

Add lower bound to size_param test

046f0da

ilan-gold reviewed Mar 17, 2026

Comment thread tests/test_preshuffle.py Outdated

felix0097 added 4 commits March 17, 2026 11:37

Clamp chunk and shard size to dataset size

02ffbd5

Merge branch 'main' into ff/automatic-size-estimation

01574ca

remove print statements

b72989d

Remove unnessary test case

3d55e19

felix0097 requested a review from ilan-gold March 17, 2026 10:49

ilan-gold approved these changes Mar 17, 2026

Update change log

a365acf

felix0097 merged commit fac9795 into main Mar 18, 2026
12 checks passed

felix0097 deleted the ff/automatic-size-estimation branch March 18, 2026 10:16

felix0097 merged 37 commits into main from ff/automatic-size-estimation
