Add obsm, etc to ExperimentAxisQuery #179

ebezzi · 2023-11-06T23:44:54Z

Adds obsm, obsp, varm, varp to ExperimentAxisQuery._read and .to_anndata.

Unit tests can be found here: single-cell-data/TileDB-SOMA#1934

ebezzi · 2023-11-06T23:53:45Z

python-spec/src/somacore/query/query.py

+        joinids = getattr(self._joinids, axis.value)
+        return axism[layer].read((joinids, col_joinids))
+
+    def _axism_inner_csr(


This method exists purely to be passed to _read, which loads everything into memory. This works for now, but I am not sure this is the right format for obsm. In principle, a CSC matrix would suit obsm better, since it has far more rows than columns. Most embeddings are dense, though, so maybe we could just use a NumPy array here?

I suggest looking at the AnnData documentation. obsm/varm/obsp/varp are ndarray in typical use, and I don't see any reason they would not be here... Using a sparse matrix is likely to create incompatibility with downstream user assumptions.

Good point, for some reason I thought AnnData didn't enforce the typing here, but it looks like it does. I'll convert to ndarray.

It's more than "enforce typing": it is conceptually a dense array in the AnnData data model (but not in SOMA).
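To illustrate the point about dense embeddings, here is a minimal sketch using NumPy only. The shape and values are made up for illustration. It shows that a typical embedding has essentially no zeros, so a coordinate-style representation stores three values per element rather than one, and that scattering the triples back yields the same ndarray.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding: 1000 observations x 50 components (e.g. a PCA).
n_obs, n_comp = 1000, 50
embedding = rng.normal(size=(n_obs, n_comp)).astype(np.float32)

# Coordinate-style triples, similar in spirit to
# (soma_dim_0, soma_dim_1, soma_data) in a sparse read.
rows, cols = np.nonzero(embedding)
vals = embedding[rows, cols]

# Fraction of elements that are actually stored in the sparse form.
density = vals.size / embedding.size

# Scatter the triples back into a dense array.
dense = np.zeros((n_obs, n_comp), dtype=np.float32)
dense[rows, cols] = vals

print(f"density: {density:.3f}")
print("round trip ok:", np.array_equal(dense, embedding))
```

With a density this close to 1, the sparse form carries two index arrays on top of the data and saves nothing.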

bkmartinjr · 2023-11-22T18:39:32Z

python-spec/src/somacore/query/query.py

    def to_anndata(
        self,
        X_name: str,
        *,
        column_names: Optional[AxisColumnNames] = None,
        X_layers: Sequence[str] = (),
+        obsm_keys: Sequence[str] = [],


Need to use consistent names for X and obsm:

X_layers

obsm_keys

Suggest simply using _layers

Also, use a tuple, not a list, as the default, a la X_layers. I'm surprised this passed mypy linting...

Also, missing the rest of the layers (varm, etc). Any reason we should not add all of it in one pass?

It is probably worth thinking about how to do this, e.g.,

simply add all four to the function as optional args,

or, add a dict-like structure a la column_names.

@thetorpedodog - style suggestions?

Before I add the other layers, I'd like the approach to be validated

By the way, it looks like AnnData uses "keys" to define the members of obsm, etc. layers seems to specifically refer to X. I'll leave the naming as is right now, but open to more discussion.
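On the tuple-versus-list default raised above, a small sketch of why a mutable default is risky. The function names here are hypothetical and unrelated to somacore; they only demonstrate the Python behavior.

```python
from typing import Sequence


def collect_bad(key: str, keys: list = []) -> list:
    # The default list is created once, when the function is defined,
    # so every call that omits `keys` mutates the same object.
    keys.append(key)
    return keys


def collect_ok(key: str, keys: Sequence[str] = ()) -> tuple:
    # A tuple default is immutable; build a new tuple instead of mutating.
    return (*keys, key)


first = collect_bad("X_pca")
second = collect_bad("X_umap")
print(second)            # ['X_pca', 'X_umap']
print(first is second)   # True: both names point at the shared default

print(collect_ok("X_pca"))   # ('X_pca',)
print(collect_ok("X_umap"))  # ('X_umap',)
```

Annotating the parameter as Sequence[str] and defaulting to () matches the existing X_layers argument in the signature shown above.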

python-spec/src/somacore/query/query.py

bkmartinjr

several concerns:

this doesn't produce a "legal" AnnData, which specifies that obsm/obsp/varm/varp are ndarray
the purpose/signature of query._read seems to morphed over time (not just from this PR), such that the docstrings and sig don't match the original "produces pure Arrow objects" intent. We should make an intentional decision to either a) dedicate this code path to the to_anndata use case exclusively, or b) refactor it back to generating Arrow objects, with the conversion to AnnData wrapped around it. I'm OK with either.

python-spec/src/somacore/query/query.py

thetorpedodog

I’m liking what I see in general. I have a few points of feedback, both API design–related and otherwise.

python-spec/src/somacore/query/query.py

thetorpedodog · 2023-11-30T16:14:25Z

python-spec/src/somacore/query/query.py

+        obsm = dict()
+        for key in obsm_keys:
+            obsm[key] = self._axism_inner_ndarray(_Axis.OBS, key)


non–API style note: this can be made into a comprehension:

obsm = {key: self._axism...(..., key) for key in obsm_keys}
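A runnable sketch of the loop and the equivalent comprehension. The _Axis enum and axism_inner_ndarray function below are simplified stand-ins (assumptions for illustration), not the code in query.py.

```python
from enum import Enum


class _Axis(Enum):
    # Stand-in for the private enum used in query.py.
    OBS = "obs"
    VAR = "var"


def axism_inner_ndarray(axis: _Axis, key: str) -> str:
    # Placeholder for self._axism_inner_ndarray; returns a label
    # instead of reading an array.
    return f"{axis.value}m[{key}]"


obsm_keys = ("X_pca", "X_umap")

# Loop form, as in the diff.
loop_result = dict()
for key in obsm_keys:
    loop_result[key] = axism_inner_ndarray(_Axis.OBS, key)

# Comprehension form, as suggested in the review.
comp_result = {key: axism_inner_ndarray(_Axis.OBS, key) for key in obsm_keys}

print(loop_result == comp_result)
```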

thetorpedodog · 2023-11-30T16:17:24Z

python-spec/src/somacore/query/query.py

+        joinids = getattr(self._joinids, axis.value)
+        return axism[layer].read((joinids, slice(None)))
+
+    def _convert_to_ndarray(self, is_obs: bool, T: pa.Table, n_row: int, n_col: int) -> np.ndarray:


More non–API design feedback: to avoid boolean parameters, I would make this either the _Axis you’re using, or pass in the indexer function (looking like self._convert_to_ndarray(self.indexer.by_obs, ...)). Also, to avoid confusion with type names or generic parameters, prefer lowercase for parameter / variable names.

thetorpedodog · 2023-11-30T16:22:28Z

python-spec/src/somacore/query/query.py

+        is_obs = axis is _Axis.OBS
+        n_row = n_col = len(self._joinids.obs) if is_obs else len(self._joinids.var)


this is not something for you to do; just thinking aloud here:

it might make sense to add a function to _Axis that gets the given attribute, so that this could look something like:

n_row = n_col = axis.getattr_from(self._joinids)

so it does self._joinids.obs and self._joinids.var by itself without you having to specify (and thus avoids potential obs/var switcheroos)

(maybe also have it take a suffix, so you could say axis.getattr_from(something, "m") to get something.obsm/something.varm)

decided to see what this would look like here #183

This would be great - I approved #183. If you want to merge it, I can accommodate the changes here, otherwise we can do it later.

This aims to eliminate the annoyance (and potential error) of writing thing = whatever.obs if axis is _Axis.OBS else whatever.var by letting you say thing = axis.getattr_from(whatever) instead.

python-spec/src/somacore/query/query.py

thetorpedodog

A few minor style suggestions; nothing critical. Looks good!

thetorpedodog · 2023-12-01T14:36:49Z

python-spec/src/somacore/query/query.py

+        obsm = obsm_ft.result()
+        obsp = obsp_ft.result()
+        varm = varm_ft.result()
+        varp = varp_ft.result()


It seems like these could be passed as the named arguments to _AxisQueryResult without the need for a temporary variable.
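A sketch of passing the future results straight into the constructor. Result is a simplified, hypothetical stand-in for _AxisQueryResult, and read_mappings is an invented placeholder for the real read function.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class Result:
    # Simplified stand-in for _AxisQueryResult (two fields only).
    obsm: Dict[str, str] = field(default_factory=dict)
    varm: Dict[str, str] = field(default_factory=dict)


def read_mappings(axis: str, keys: Sequence[str]) -> Dict[str, str]:
    # Placeholder for the real read; returns labels instead of arrays.
    return {key: f"{axis}:{key}" for key in keys}


with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit all reads first so they can run concurrently.
    obsm_ft = pool.submit(read_mappings, "obs", ("X_pca",))
    varm_ft = pool.submit(read_mappings, "var", ("PCs",))

    # .result() blocks until each future completes; no temporaries needed.
    result = Result(obsm=obsm_ft.result(), varm=varm_ft.result())

print(result)
```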

thetorpedodog · 2023-12-01T14:39:27Z

python-spec/src/somacore/query/query.py

+        if key not in self._ms:
+            raise ValueError(f"Measurement does not contain {key} data")
+
+        axism = axis.getitem_from(self._ms, suf="m")
+        if not (layer and layer in axism):
+            raise ValueError(f"Must specify '{key}' layer")


In both of these cases, you can use EAFP style:

try:
    axism = axis.getitem_from(...)
except KeyError as ke:
    raise ValueError(...) from ke  # or maybe `from None`
try:
    axism_layer = axism[layer]
except KeyError as ke:
    ...
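A self-contained version of the EAFP pattern, using plain dicts in place of the measurement and its collections. The function name get_layer and the message wording are illustrative assumptions.

```python
def get_layer(measurement: dict, collection: str, layer: str):
    # Try the lookup and translate the failure, instead of checking
    # membership first (look before you leap).
    try:
        axism = measurement[collection]
    except KeyError as ke:
        raise ValueError(f"Measurement does not contain {collection} data") from ke
    try:
        return axism[layer]
    except KeyError as ke:
        raise ValueError(f"Must specify a valid '{collection}' layer") from ke


ms = {"obsm": {"X_pca": 1}}
print(get_layer(ms, "obsm", "X_pca"))

try:
    get_layer(ms, "varm", "PCs")
except ValueError as err:
    # `from ke` keeps the original KeyError available as __cause__.
    print(err, "| cause:", type(err.__cause__).__name__)
```

Using `from None` instead would suppress the chained KeyError in the traceback, which is the alternative mentioned in the comment.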

thetorpedodog · 2023-12-01T14:40:03Z

python-spec/src/somacore/query/query.py

+        if not isinstance(axism[layer], data.SparseNDArray):
+            raise TypeError(f"Unexpected SOMA type stored in '{key}' layer")
+
+        joinids = getattr(self._joinids, axis.value)


axis.getitem_from(self._joinids)

thetorpedodog · 2023-12-01T14:40:29Z

python-spec/src/somacore/query/query.py

+    ) -> np.ndarray:
+        indexer: pd.Index = axis.getattr_from(self.indexer, pre="by_")
+        idx = indexer(table["soma_dim_0"])
+        Z = np.zeros(n_row * n_col, dtype=np.float32)


nit: recommend making z lowercase
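For context, a sketch of how coordinate triples can be scattered into a flat zero buffer and reshaped into a dense array. Only the flat np.zeros(n_row * n_col) buffer comes from the diff; the use of np.searchsorted as the joinid-to-position lookup, np.put, and the sample data are assumptions made for this example.

```python
import numpy as np

# Hypothetical query result: three selected obs joinids (sorted).
obs_joinids = np.array([10, 42, 77], dtype=np.int64)
n_row, n_col = len(obs_joinids), 2

# Coordinate triples, as a sparse read might return them.
soma_dim_0 = np.array([42, 10, 77, 10], dtype=np.int64)
soma_dim_1 = np.array([0, 1, 1, 0], dtype=np.int64)
soma_data = np.array([1.5, 2.5, 3.5, 4.5], dtype=np.float32)

# Map joinids to row positions. searchsorted is a stand-in for the
# indexer; it is valid here because obs_joinids is sorted and every
# soma_dim_0 value is present in it.
idx = np.searchsorted(obs_joinids, soma_dim_0)

# Scatter into a flat buffer using row-major offsets, then reshape.
z = np.zeros(n_row * n_col, dtype=np.float32)
np.put(z, idx * n_col + soma_dim_1, soma_data)
dense = z.reshape(n_row, n_col)

print(dense)
```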

johnkerl

🚢

johnkerl

🚢

pablo-gar · 2023-12-01T16:15:08Z

python-spec/src/somacore/query/query.py

+            _read_axis_mappings, self._axism_inner_ndarray, _Axis.OBS, obsm_keys
+        )
+        obsp_ft = self._threadpool.submit(
+            _read_axis_mappings, self._axisp_inner_ndarray, _Axis.OBS, obsp_keys


@bkmartinjr and @ebezzi I believe you had a conversation about this offline (or maybe here, but I don't see it). We know that the most common use cases of axisp arrays are numerically sparse, and even though AnnData's schema says they should be dense NumPy arrays, scanpy's methods fill them in with SciPy sparse matrices.

Ultimately we would like to move to a world where the AnnData schema accepts either both (dense and sparse) or only sparse. Given that the ecosystem already violates the schema in favor of a better numerical representation, I lean towards tiledbsoma also violating the schema, and then we request that AnnData's schema be relaxed.

What are your thoughts?

In principle I have no issues. You might want to ping Isaac V. and see what he thinks

Thanks, I wanted to make sure I didn't miss anything important. I don't see it as a blocker for now; I will file a few issues (here and in AnnData) to move towards support for sparse matrices in the axisp arrays.

Also look at this page, where they claim that obsm etc can be sparse. This is relative to the on-disk format, but I don't believe there is anything that converts them when loading in memory.
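To put rough numbers on the dense-versus-sparse question for obsp, here is a back-of-the-envelope sketch. All inputs are assumptions chosen for illustration: 100,000 observations, a k-nearest-neighbor graph with k = 15 stored entries per row, float32 values, and int32 indices in a CSR layout.

```python
n_obs = 100_000      # assumed number of observations
k = 15               # assumed stored neighbors per observation
bytes_value = 4      # float32
bytes_index = 4      # int32

# Dense obsp: one value for every pair of observations.
dense_bytes = n_obs * n_obs * bytes_value

# CSR obsp: values + column indices for each stored entry,
# plus one row pointer per row (and one extra).
nnz = n_obs * k
csr_bytes = nnz * (bytes_value + bytes_index) + (n_obs + 1) * bytes_index

print(f"dense: {dense_bytes / 1e9:.1f} GB")
print(f"csr:   {csr_bytes / 1e6:.1f} MB")
```

Under these assumptions the dense form is several thousand times larger, which is the practical reason pairwise arrays are usually kept sparse.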

pablo-gar

LGTM from an API signature viewpoint!

aaronwolen

🙏🏻

Add obsm, etc to ExperimentAxisQuery

068afdb

ebezzi requested a review from johnkerl November 6, 2023 23:44

ebezzi added 2 commits November 6, 2023 15:45

Remove comments

c22a140

linter

7f792dc

ebezzi requested a review from bkmartinjr November 6, 2023 23:56

johnkerl mentioned this pull request Nov 21, 2023

ExperimentAxisQuery._read() and ExperimentAxisQuery.to_anndata() should read and export, respectively, obsm varm obsp and varp. #177

Closed

bkmartinjr requested a review from pablo-gar November 22, 2023 18:38

bkmartinjr reviewed Nov 22, 2023

bkmartinjr requested a review from thetorpedodog November 22, 2023 18:44

bkmartinjr requested changes Nov 22, 2023

ebezzi added 5 commits November 27, 2023 16:44

Fully implement keys + use ndarrays

dbdeb57

Remove type subscription

2c89128

Bugfixes

f9ca66c

type subscription again

fb05ebd

whitespace

862ed4c

ebezzi requested a review from bkmartinjr November 29, 2023 17:10

ebezzi mentioned this pull request Nov 29, 2023

[python] Add unit tests for obsm, obsp, and to_anndata single-cell-data/TileDB-SOMA#1934

Merged

bkmartinjr requested a review from mlin November 29, 2023 17:13

ebezzi marked this pull request as ready for review November 29, 2023 17:14

Add AnnData docstrings

34236eb

bkmartinjr reviewed Nov 29, 2023

PR comments, DRY

9c4b54f

bkmartinjr reviewed Nov 30, 2023

thetorpedodog reviewed Nov 30, 2023

thetorpedodog and others added 4 commits November 30, 2023 12:00

Add _Axis.getattr_from and _Axis.getitem_from.

164e7e7

PR review, part 1

e63d334

Linter

a4ecf94

Thread pool

990530a

refactor _AxisQueryResult

afaafb8

ebezzi requested a review from bkmartinjr November 30, 2023 19:48

Rename variables

0575f7d

ebezzi added 2 commits November 30, 2023 14:51

Move parallel computation early

af2f1ca

Merge branch 'axis-getters' into ebezzi/add-obsm-etc

879378f

ebezzi mentioned this pull request Nov 30, 2023

Add _Axis.getattr_from and _Axis.getitem_from. #183

Merged

Use new getattr_from

32076c6

ebezzi requested a review from bkmartinjr November 30, 2023 23:10

Use new getattr_from (for real, this time)

9a505b1

bkmartinjr approved these changes Dec 1, 2023

thetorpedodog approved these changes Dec 1, 2023

johnkerl reviewed Dec 1, 2023

johnkerl approved these changes Dec 1, 2023

pablo-gar reviewed Dec 1, 2023

pablo-gar approved these changes Dec 1, 2023

ebezzi added 2 commits December 1, 2023 08:54

Last minute changes

ce42934

keys -> layers

94d4fd7

aaronwolen approved these changes Dec 4, 2023

ebezzi merged commit 10dc344 into main Dec 4, 2023
6 checks passed

ebezzi deleted the ebezzi/add-obsm-etc branch December 4, 2023 17:24

johnkerl mentioned this pull request Dec 4, 2023

Provide obsm/varm from ExperimentAxisQuery, as obsp/varp #178

Closed
