AnnData raw is a reference when a copy is expected #1139

gtca · 2023-09-12T23:00:15Z

Please make sure these conditions are met

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of anndata.
(optional) I have confirmed this bug exists on the master branch of anndata.

Report

Code:

import numpy as np
from anndata import AnnData

adata = AnnData(np.arange(1000).reshape(-1, 100))
adata.raw = adata

id(adata.raw.X) == id(adata.X)  # => True

With anndata v0.8, for instance, the last line returns False.

This causes issues like this one: scverse/muon#126.

Versions

Version info

-----
anndata             0.10.0rc1
muon                0.1.5
numpy               1.24.4
session_info        1.0.0
-----
PIL                 10.0.0
colorama            0.4.6
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
exceptiongroup      1.1.3
h5py                3.9.0
joblib              1.3.2
kiwisolver          1.4.5
llvmlite            0.40.1
matplotlib          3.7.2
mpl_toolkits        NA
mudata              0.2.3
natsort             8.4.0
numba               0.57.1
packaging           23.1
pandas              2.1.0
patsy               0.5.3
pkg_resources       NA
pynndescent         0.5.10
pyparsing           3.0.9
pytz                2023.3.post1
scanpy              1.9.3
scipy               1.11.2
seaborn             0.12.2
six                 1.16.0
sklearn             1.3.0
statsmodels         0.14.0
threadpoolctl       3.2.0
tqdm                4.66.1
typing_extensions   NA
umap                0.5.3
yaml                6.0
zoneinfo            NA
-----
Python 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:25:38) [Clang 11.1.0 ]
macOS-11.7.4-x86_64-i386-64bit
-----

The text was updated successfully, but these errors were encountered:

ivirshup · 2023-10-04T14:07:04Z

I'm not sure I would consider this a bug. Assigning to other attributes of an AnnData doesn't cause a copy, and this would be a pretty large copy to make.

You can instead do adata.raw = adata.copy() if you need data to be copied.

gtca · 2023-10-04T14:14:39Z

I guess it's mainly the (unintended?) behaviour change that is a bit of an issue for the users, e.g. as seen in the linked issue.

Scanpy tutorials like the PBMC 3k one feature the code adata.raw = adata, which creates a problem when it is not a copy.

flying-sheep · 2023-10-06T09:39:26Z

Would be easy to fix:

@raw.setter
def _set_raw(self, raw: AnnData | Raw) -> None:
    if self is raw:
        raw = raw.copy()
    ...

I don’t things adata.raw is adata ever makes sense

ivirshup · 2023-10-06T11:10:09Z

I think this behavior is expected, and the old behavior was wrong. We generally try not to make too many defensive copies due to the large performance hit.

adata.raw is adata

I don't think that happens? The behavior being described is that adata.raw.X is adata.X can happen, which I think is expected. Similarly: AnnData(X).X is X, adata.X = X; adata.X is X, adata.obsm = d; adata.obsm["mtx"] is d["mtx"].

gtca · 2023-10-06T11:31:03Z

@ivirshup Nevertheless I think the issue for the user persists as adata.raw = adata is in many tutorials, even on scanpy.rtfd.io. And considering the in-place modifications in workflows this is quite an issue in the end.

Given this, I think this behaviour change should come with some note or a warning (not sure there was one?).

Should we now open a cascade of issues to fix the tutorials? That's another topic but shouldn't we aim towards deprecating raw altogether?

Just searching adata.raw = adata line on GitHub returns 1.1k files that seem to have been relying on the old behaviour.

flying-sheep · 2023-10-06T11:37:46Z

The behavior being described is that adata.raw.X is adata.X can happen

When X is a mutable DataFrame, that’s a problem (one of the reasons I didn’t want to allow data frames as .X)

shouldn't we aim towards deprecating raw altogether?

I’d say yes

ivirshup · 2023-10-06T11:38:01Z

Shouldn't we aim towards deprecating raw altogether?

Yes.

Should we now open a cascade of issues to fix the tutorials?

Maybe a comment in the tutorial would be appropriate? I don't think we should be recommending that people make big copies when it's not necessary.

I think typically there is no unintended mutation here since people subset the anndata after assigning to raw, which makes adata.X is not adata.raw.X. Sure this can lead to unintentional mutation when adata.raw = adata and then you immediately mutate X, but then you're skipping the use case for raw (different shape).

I would like to replace the scanpy tutorial with the new tutorial, but the tutorial workflow got changed and needs fixing.

ivirshup · 2023-10-06T14:20:33Z

When X is a mutable DataFrame

@flying-sheep Can X be a dataframe? I'm pretty sure we aren't supporting that in X or layers?

gtca added the Bug 🐛 label Sep 12, 2023

gtca mentioned this issue Sep 12, 2023

raw is also normalized with clr scverse/muon#126

Closed

ivirshup added question and removed Bug 🐛 labels Oct 4, 2023

ivirshup closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AnnData raw is a reference when a copy is expected #1139

AnnData raw is a reference when a copy is expected #1139

gtca commented Sep 12, 2023 •

edited

Loading

ivirshup commented Oct 4, 2023

gtca commented Oct 4, 2023

flying-sheep commented Oct 6, 2023

ivirshup commented Oct 6, 2023

gtca commented Oct 6, 2023 •

edited

Loading

flying-sheep commented Oct 6, 2023 •

edited

Loading

ivirshup commented Oct 6, 2023

ivirshup commented Oct 6, 2023

AnnData raw is a reference when a copy is expected #1139

AnnData raw is a reference when a copy is expected #1139

Comments

gtca commented Sep 12, 2023 • edited Loading

Please make sure these conditions are met

Report

Versions

ivirshup commented Oct 4, 2023

gtca commented Oct 4, 2023

flying-sheep commented Oct 6, 2023

ivirshup commented Oct 6, 2023

gtca commented Oct 6, 2023 • edited Loading

flying-sheep commented Oct 6, 2023 • edited Loading

ivirshup commented Oct 6, 2023

ivirshup commented Oct 6, 2023

gtca commented Sep 12, 2023 •

edited

Loading

gtca commented Oct 6, 2023 •

edited

Loading

flying-sheep commented Oct 6, 2023 •

edited

Loading