
X shape check slices to first 2 shape values #2432

Open

eroell wants to merge 4 commits into scverse:main from eroell:x-shape-slice-ndim


@eroell
Contributor

@eroell eroell commented May 10, 2026

  • Release note not necessary because:

Before

Higher-than-2D X was rejected at construction, so you could not hold a higher-dimensional X in memory:

```python
import numpy as np
import anndata as ad
ad.AnnData(X=np.zeros((3, 4, 5)))
# ValueError: too many values to unpack (expected 2)
#   (from `n_obs, n_vars = X.shape`)

adata = ad.AnnData(X=np.zeros((3, 4)))
adata.X = np.zeros((3, 4, 5))
# FutureWarning: Automatic reshaping when setting X will be removed ...
# -> then ValueError, since (3, 4, 5) can't be reshaped to (3, 4)
```
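A NumPy-only sketch of the difference: the old constructor unpacked `X.shape` into exactly two names, while slicing to `X.shape[:2]` tolerates extra axes. This reproduces only the quoted `n_obs, n_vars = X.shape` line, not anndata's actual constructor.

```python
import numpy as np

X = np.zeros((3, 4, 5))

# Old behaviour: strict two-name unpacking of the full shape tuple fails for 3D.
try:
    n_obs, n_vars = X.shape
except ValueError as err:
    old_error = str(err)  # e.g. "too many values to unpack (expected 2)"

# New behaviour: only the first two axes define (n_obs, n_vars).
n_obs, n_vars = X.shape[:2]
print(n_obs, n_vars)  # 3 4
```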

Higher-than-2D layers slipped through the in-memory shape check (only axes 0 and 1 are validated), and the writer wrote them to disk without complaint, silently violating the spec.
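Why this slipped through: a check that compares only axes 0 and 1 against `(n_obs, n_vars)` never looks at `ndim`. The function below is a hypothetical stand-in written for illustration, not anndata's real validator.

```python
import numpy as np


def old_style_layer_check(value, n_obs, n_vars):
    """Hypothetical check that validates only axes 0 and 1 of a layer."""
    if value.shape[0] != n_obs:
        raise ValueError(f"expected {n_obs} rows, got {value.shape[0]}")
    if value.shape[1] != n_vars:
        raise ValueError(f"expected {n_vars} columns, got {value.shape[1]}")
    # value.ndim is never inspected, so extra trailing axes are accepted.
    return True


layer_3d = np.zeros((3, 4, 5))
accepted = old_style_layer_check(layer_3d, n_obs=3, n_vars=4)
print(accepted, layer_3d.ndim)  # True 3
```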

After

  • In memory: higher-than-2D X and layers are explicitly allowed (the shape check uses X.shape[:2]).
  • Writing a non-2D X or layer now hard-fails.
  • Reading a non-conforming file warns but still succeeds:

```python
adata = ad.AnnData(X=np.zeros((3, 4, 5)))   # now OK in memory
adata.X = np.zeros((3, 4, 5))               # also OK
adata.layers["L"] = np.zeros((3, 4, 5))     # also OK

adata.write_h5ad("out.h5ad")
# ValueError: X must be 2-dimensional, but got an array with shape
# (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or
# `layers` violates the AnnData specification.

ad.read_h5ad("legacy_non_conforming.h5ad")
# UserWarning: X must be 2-dimensional, but got an array with shape ...
# -> still returns the AnnData
```

The same applies to layers["L"] (the error/warning message then says Layer 'L' instead of X) and to the zarr I/O path.
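A minimal sketch of how one helper could serve both paths: raise when writing (`strict=True`), warn when reading (`strict=False`), and label the message `X` or `Layer 'L'`. The message text mirrors the errors quoted above; the names `check_2d` and `layer_key` are hypothetical and not taken from the PR.

```python
import warnings

import numpy as np


def check_2d(value, layer_key=None, *, strict=True):
    """Hypothetical helper: raise on write (strict=True), warn on read (strict=False)."""
    shape = getattr(value, "shape", None)
    if shape is None or len(shape) == 2:
        return  # not array-like, or already 2D: nothing to report
    label = "X" if layer_key is None else f"Layer {layer_key!r}"
    msg = (
        f"{label} must be 2-dimensional, but got an array with shape "
        f"{tuple(shape)} (ndim={len(shape)}). Storing higher-dimensional "
        "arrays in `X` or `layers` violates the AnnData specification."
    )
    if strict:
        raise ValueError(msg)
    warnings.warn(msg, UserWarning, stacklevel=2)


arr = np.zeros((3, 4, 5))

# Write path: hard failure.
try:
    check_2d(arr, strict=True)
except ValueError as err:
    write_msg = str(err)

# Read path: warn, but let the caller carry on.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_2d(arr, layer_key="L", strict=False)
read_msg = str(caught[0].message)
```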

@codecov

codecov Bot commented May 10, 2026

Codecov Report

❌ Patch coverage is 90.19608% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.73%. Comparing base (3dfada6) to head (098e6cf).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/anndata/_io/zarr.py | 57.14% | 3 Missing ⚠️ |
| src/anndata/_core/storage.py | 93.33% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2432      +/-   ##
==========================================
+ Coverage   85.70%   85.73%   +0.03%     
==========================================
  Files          49       49              
  Lines        7806     7851      +45     
==========================================
+ Hits         6690     6731      +41     
- Misses       1116     1120       +4     
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/anndata/_core/anndata.py | 86.81% <100.00%> (ø) |
| src/anndata/_io/h5ad.py | 93.45% <100.00%> (+0.22%) ⬆️ |
| src/anndata/_io/specs/methods.py | 91.41% <100.00%> (+0.04%) ⬆️ |
| src/anndata/_core/storage.py | 94.52% <93.33%> (-0.83%) ⬇️ |
| src/anndata/_io/zarr.py | 80.39% <57.14%> (-1.06%) ⬇️ |

@eroell eroell marked this pull request as ready for review May 11, 2026 12:57
@flying-sheep flying-sheep added this to the 0.12.15 milestone May 18, 2026
@ilan-gold ilan-gold modified the milestones: 0.12.15, 0.12.17 May 18, 2026
```python
shape = getattr(value, "shape", None)
if shape is None:
    return None
ndim = len(shape)
```
Contributor Author


This also means that an array of shape (m, n, 1) can no longer be written to disk, whereas before this was fine.

Do you want to allow this or not?
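If the answer were "allow it", one option would be to collapse extra length-1 axes back to 2D before writing. This is a hypothetical sketch of that choice for NumPy arrays only (the helper name is invented, and it ignores sparse and other array types); whether to allow it at all is the open question here.

```python
import numpy as np


def squeeze_extra_singleton_axes(value):
    """Hypothetical: reshape (m, n, 1, ..., 1) to (m, n); leave anything else unchanged."""
    extra = value.shape[2:]
    if extra and all(size == 1 for size in extra):
        # Only length-1 axes beyond the first two, so no data is lost.
        return value.reshape(value.shape[:2])
    return value


print(squeeze_extra_singleton_axes(np.zeros((3, 4, 1))).shape)  # (3, 4)
print(squeeze_extra_singleton_axes(np.zeros((3, 4, 5))).shape)  # (3, 4, 5)
```

Anything with a real third axis still reaches the 2D check and fails, so only the degenerate case is relaxed.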

@eroell
Contributor Author

eroell commented May 19, 2026

It seems I can't request reviews, but I'd be interested in your comments at this stage :)

```python
# Older / non-conforming files may contain higher-dimensional `X` or
# `layers`. The on-disk spec forbids that; surface it as a warning so
# the user knows, but still construct the AnnData with what's there.
_warn_if_x_or_layers_3d_kwargs(d)
```
Contributor

@ilan-gold ilan-gold May 19, 2026


I would move this warning into the construction of the AnnData object instead of the reading path (also in the other spots), so that people who do this in memory will know their data is technically unwritable.
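A rough sketch of the suggestion: warn whenever non-2D data is set, whether through the constructor or later assignment, so in-memory users see it before any write. `ToyContainer` is a hypothetical stand-in, not AnnData, and the message wording is invented.

```python
import warnings

import numpy as np


class ToyContainer:
    """Hypothetical stand-in for AnnData that warns as soon as non-2D X is set."""

    def __init__(self, X):
        self.X = X  # route construction through the property setter below

    @property
    def X(self):
        return self._X

    @X.setter
    def X(self, value):
        # Objects without `ndim` are treated as 2D and not checked.
        if getattr(value, "ndim", 2) != 2:
            warnings.warn(
                f"X has shape {value.shape}; only 2D X can be written to disk.",
                UserWarning,
                stacklevel=2,
            )
        self._X = value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    c = ToyContainer(np.zeros((3, 4, 5)))  # warns at construction
    c.X = np.zeros((3, 4))                 # 2D: no warning
```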

Member

@flying-sheep flying-sheep left a comment


Not much to add to what Ilan said. Thank you, this looks clean!

Comment on lines +606 to 609
```python
if hasattr(value, "shape") and value.shape[:2] != self.shape:
    msg = "Automatic reshaping when setting X will be removed in the future."
    warn(msg, FutureWarning)
    value = value.reshape(self.shape)
```
Member

@flying-sheep flying-sheep May 19, 2026


I feel like the error thrown by value.reshape(self.shape) is probably ugly/misleading if value.ndim > self.ndim, no? Please check, and if the error is confusing, manually throw a clearer one in that case.
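For context, NumPy's own failure here reads like "cannot reshape array of size 60 into shape (3,4)", which says nothing about the leading axes not matching. Below is a hypothetical sketch (the function name and message are invented) of raising earlier, while keeping the existing `FutureWarning` plus reshape for the legitimate low-dimensional case.

```python
import warnings

import numpy as np


def coerce_x(value, expected_shape):
    """Hypothetical sketch of the X setter's shape handling."""
    if value.shape[:2] == expected_shape:
        return value  # axes 0 and 1 match; extra axes are allowed in memory
    if value.ndim > len(expected_shape):
        # Fail before reshape(), whose error only mentions sizes, not axes.
        raise ValueError(
            f"Cannot set X: leading shape {value.shape[:2]} does not match "
            f"{expected_shape}, and a {value.ndim}D array is not reshaped "
            "automatically."
        )
    warnings.warn(
        "Automatic reshaping when setting X will be removed in the future.",
        FutureWarning,
        stacklevel=2,
    )
    return value.reshape(expected_shape)
```

With `expected_shape=(3, 4)`, a `(3, 4, 5)` value passes through, a `(4, 3, 5)` value gets the clearer error, and a flat `(12,)` value is still reshaped with the deprecation warning.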

Comment thread tests/test_x_layers_2d.py
Comment on lines +27 to +34
```python
DISK_FORMATS = [
    pytest.param("h5ad", id="h5ad"),
    pytest.param("zarr", id="zarr"),
]
WHICH_ATTRS = [
    pytest.param("X", id="X"),
    pytest.param("layers", id="layers"),
]
```
Member


These could also just be (parametrized) fixtures.
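A sketch of what that could look like with `pytest.fixture(params=...)`: any test that requests both fixtures runs once per combination (2 x 2 = 4 here). The fixture and test names are hypothetical.

```python
import pytest

DISK_FORMATS = ["h5ad", "zarr"]
WHICH_ATTRS = ["X", "layers"]


@pytest.fixture(params=DISK_FORMATS, ids=DISK_FORMATS)
def disk_format(request):
    """Each requesting test runs once per on-disk format."""
    return request.param


@pytest.fixture(params=WHICH_ATTRS, ids=WHICH_ATTRS)
def which_attr(request):
    """Each requesting test runs once for X and once for layers."""
    return request.param


def test_example(disk_format, which_attr):
    # Hypothetical test body; pytest injects every combination of params.
    assert disk_format in DISK_FORMATS
    assert which_attr in WHICH_ATTRS
```

Compared with module-level `pytest.param` lists, tests opt in simply by naming the fixture, with no `@pytest.mark.parametrize` decorator on each test.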



Development

Successfully merging this pull request may close these issues.

layers can be 3D on-disk

3 participants