(feat): Aggregation via group-by in `sc.get` #2590

ilan-gold · 2023-08-03T15:46:47Z

Modified (for scanpy) version of scverse/anndata#564. Fixes scverse/anndata#556

Big points of change:

No more tuple-indices and related functionality (i.e., scoring pairwise)
Allow for obs and var group-by, plus varm, obsm, and layers as options for data to aggregate
Output is AnnData object instead of DataFrame
scanpy-style public API

TODO (by @ivirshup):

Necessary:

Docs
Aggregate along other axis
Keep grouping cols in result
Reconsider API for non-anndata version (maybe return a dict of arrays?)
Decide on a naming convention for "nonzero" variations: should this be "nonzero_count", so it's a little like "nanmean"?

Optional, can do later:

Weighted aggregation (although... I don't know, maybe we can skip it. Does "weights" affect "count_nonzero"?)
Option for keeping around unseen groups, probably needs fill_value argument for those values
Support for obsm, varm
Directly pass Series to groupby
More aggregation functions (mean_nonzero, min, max, std, nan* variations)
Mask argument
Dask support
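
To make the aggregation functions named in this list concrete, here is a minimal plain-Python sketch of group-by aggregation over rows. It is not the code in `scanpy/get/_aggregated.py`: the helper name `aggregate_rows` and its arguments are hypothetical, and it operates on nested lists rather than on an AnnData object.

```python
from collections import defaultdict


def aggregate_rows(rows, labels, func="sum"):
    """Aggregate the rows of a 2D list by group label (hypothetical helper)."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")

    # Split: collect the rows that belong to each group.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    # Apply and combine: reduce each column within each group.
    result = {}
    for label, members in groups.items():
        columns = list(zip(*members))  # transpose so each tuple is one variable
        if func == "sum":
            result[label] = [sum(col) for col in columns]
        elif func == "mean":
            result[label] = [sum(col) / len(col) for col in columns]
        elif func == "count_nonzero":
            result[label] = [sum(1 for v in col if v != 0) for col in columns]
        else:
            raise ValueError(f"unknown func: {func!r}")
    return result


X = [[1, 0, 2], [0, 0, 4], [3, 5, 0]]
cell_type = ["B", "B", "T"]

sums = aggregate_rows(X, cell_type, "sum")
means = aggregate_rows(X, cell_type, "mean")
nonzero = aggregate_rows(X, cell_type, "count_nonzero")
```

Only groups that actually occur in `labels` appear in the result, which is why keeping unseen groups would need a separate option and a fill value.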

codecov · 2023-08-03T16:04:16Z

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (b4ba81d) 74.62% compared to head (0a7cf85) 74.83%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2590      +/-   ##
==========================================
+ Coverage   74.62%   74.83%   +0.20%     
==========================================
  Files         115      116       +1     
  Lines       12763    12891     +128     
==========================================
+ Hits         9525     9647     +122     
- Misses       3238     3244       +6

| Files | Coverage Δ |
|---|---|
| `scanpy/_utils/__init__.py` | `68.68% <100.00%> (+0.48%)` ⬆️ |
| `scanpy/get/__init__.py` | `100.00% <100.00%> (ø)` |
| `scanpy/get/_aggregated.py` | `95.04% <95.04%> (ø)` |


ilan-gold · 2023-08-04T14:24:21Z

Ok @ivirshup I think we're ready to go here. Thanks for the guidance!

flying-sheep

Please either revert 0aef147 or allow both as in #2771, or wait for scverse/anndata#1245

also as I asked before: why go away from dataclasses?

ivirshup · 2024-01-11T14:25:29Z

> also as I asked before: why go away from dataclasses?

I don't think that switching away from dataclasses removed any meaningful functionality here, but having to use `default_factory`, `InitVar`, and/or `__post_init__` would add more complexity.

I don't think it matters whether some internal classes are dataclasses, especially since this isn't user-visible and may change at any time anyway. I have a few ideas for ways to change the implementation to add more methods, none of which are compatible with `Aggregate` being a dataclass.

One path forward just removes the class entirely, since it doesn't do much now
The other uses a number of cached properties, which I don't think make a ton of sense to use with dataclasses

Is there some functionality the data class was adding that I'm missing?
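
The trade-off described in this comment can be illustrated with a small sketch. Both classes below are hypothetical and are not the `Aggregate` class from this PR; they only show what `InitVar`, `default_factory`, and `__post_init__` look like next to a plain class that uses `functools.cached_property`.

```python
from dataclasses import InitVar, dataclass, field
from functools import cached_property


@dataclass
class GroupsDataclass:
    """Dataclass variant: derived fields are filled in __post_init__."""

    labels: InitVar[list]  # init-only argument, not stored as a field
    names: list = field(default_factory=list)
    codes: list = field(default_factory=list)

    def __post_init__(self, labels):
        self.names = sorted(set(labels))
        index = {name: i for i, name in enumerate(self.names)}
        self.codes = [index[label] for label in labels]


class GroupsPlain:
    """Plain variant: derived values are computed lazily and cached."""

    def __init__(self, labels):
        self.labels = list(labels)

    @cached_property
    def names(self):
        return sorted(set(self.labels))

    @cached_property
    def codes(self):
        index = {name: i for i, name in enumerate(self.names)}
        return [index[label] for label in self.labels]


labels = ["T", "B", "T"]
a = GroupsDataclass(labels)
b = GroupsPlain(labels)
```

Both variants end up with the same `names` and `codes`; the difference is how much dataclass machinery is needed to get there.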

flying-sheep

Regarding dataclasses, as said in person: I tend to use them when

they basically just mean I can delete the `__init__` function and maybe add a `= field(...)`
instances of the class actually end up as dict keys, being compared or so

The former is almost the case here; at the time I wrote it, it was the case.
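
As a sketch of the second use case (instances used as dict keys or compared), a frozen dataclass gets `__eq__` and `__hash__` generated from its fields. The `GroupKey` class is a hypothetical illustration and does not appear in the PR.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GroupKey:
    """Hashable, comparable key; no hand-written __init__, __eq__, or __hash__."""

    column: str
    value: str
    # compare=False excludes this field from both equality and hashing.
    note: str = field(default="", compare=False)


counts = {}
keys = [
    GroupKey("cell_type", "B"),
    GroupKey("cell_type", "T"),
    GroupKey("cell_type", "B", note="seen again"),
]
for key in keys:
    counts[key] = counts.get(key, 0) + 1
```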

Let’s get the consensus axis design into this!
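
The thread does not spell out the axis design. Assuming it means accepting either an integer axis or the names "obs" and "var" (an assumption based on the later "dim -> axis" and "add support for semantic axis" commits), a minimal sketch could look like this. The helper name `resolve_axis` is hypothetical.

```python
def resolve_axis(axis):
    """Map 0/"obs" or 1/"var" to an (index, name) pair (hypothetical helper)."""
    if axis in (0, "obs"):
        return 0, "obs"
    if axis in (1, "var"):
        return 1, "var"
    raise ValueError(f"axis must be 0, 1, 'obs', or 'var', got {axis!r}")


index_from_name = resolve_axis("var")
index_from_int = resolve_axis(0)
```

Returning both forms lets calling code index arrays with the integer and look up attributes with the name.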

scanpy/get/_aggregated.py

flying-sheep · 2024-02-20T09:18:42Z

sparse_indicator doesn’t have its weights branches hit at all, maybe we should remove that? Or will this be used at some point?

ivirshup · 2024-02-20T09:49:46Z

> sparse_indicator doesn’t have its weights branches hit at all, maybe we should remove that? Or will this be used at some point?

I think it will be used at some point, but also happy to remove.

I think parameterizing test_aggregate_axis_specification is overkill for what the test does.

flying-sheep · 2024-02-20T09:55:47Z

> I think it will be used at some point, but also happy to remove.

OK, good to know! Then this PR is fine as far as I’m concerned.
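
For readers unfamiliar with the idea behind the weights branches discussed in this thread, here is a pure-Python stand-in for an indicator matrix with optional weights. It is not the PR's `sparse_indicator`: the signature, the COO-triplet representation, and the `indicator_matmul` helper are all assumptions made for illustration.

```python
def sparse_indicator(codes, n_groups, weights=None):
    """Return COO triplets (rows, cols, data) of a groups x observations matrix."""
    if any(not 0 <= code < n_groups for code in codes):
        raise ValueError("every code must be in range(n_groups)")
    if weights is None:
        data = [1.0] * len(codes)  # unweighted: plain membership indicator
    else:
        if len(weights) != len(codes):
            raise ValueError("weights and codes must have the same length")
        data = [float(w) for w in weights]  # weighted: one weight per observation
    rows = list(codes)              # group index of each observation
    cols = list(range(len(codes)))  # observation index
    return rows, cols, data


def indicator_matmul(indicator, X, n_groups):
    """Multiply the indicator by X, giving per-group (weighted) column sums."""
    rows, cols, data = indicator
    n_vars = len(X[0]) if X else 0
    out = [[0.0] * n_vars for _ in range(n_groups)]
    for group, obs, weight in zip(rows, cols, data):
        for j, value in enumerate(X[obs]):
            out[group][j] += weight * value
    return out


X = [[1, 0, 2], [0, 0, 4], [3, 5, 0]]
codes = [0, 0, 1]

plain = indicator_matmul(sparse_indicator(codes, 2), X, 2)
weighted = indicator_matmul(sparse_indicator(codes, 2, [0.5, 0.5, 2]), X, 2)
```

Without weights the product is the per-group sum; with weights each observation's contribution is scaled before summing.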

ilan-gold added 9 commits August 3, 2023 11:08

(chore): migrate anndata PR

1d4bfd9

(feat): add option for custom data

9eb1993

(chore): remove pair scoring

e1c7eef

(feat): change return types to AnnData

31beb0d

(feat): keep superset columns.

668e725

(chore): remove explode option (i.e., tuples)

b23dd9c

(feat): first pass at var/obs

6177857

(chore): add temporary note for now

b9d75f9

(chore): change df_key -> groupby_df_key

43b5d3f

ilan-gold marked this pull request as draft August 3, 2023 15:47

ilan-gold added 2 commits August 4, 2023 09:49

(chore): clean up public/private methods and do some renaming

2399a5c

(chore): black

f02dacc

ilan-gold marked this pull request as ready for review August 4, 2023 08:03

ilan-gold added 11 commits August 4, 2023 12:14

(feat): refactor to allow for functional API

a50ea3b

(style): use bool for writing to obsm

0060e0e

(refactor): AnnData object separate from groupby

5a56c6a

(chore): export aggregated_from_array

61b9345

(refactor): remove GroupBy dependence of df

6408daf

(chore): black

ed95373

(chore): g(G)roupby -> a(A)ggregated

f1b9d4d

(style): small docstring changes + export docs

8faeec8

(chore): small doc fix

10b2056

(fix): replace Union in singledispatch with classes

225ee79

Merge branch 'master' into split_apply_combine

df78e73

ilan-gold changed the title ~~(feat): GroupBy Aggregation in sc.get~~ (feat): Aggregation via groupby in sc.get Aug 4, 2023

ilan-gold changed the title ~~(feat): Aggregation via groupby in sc.get~~ (feat): Aggregation via group-by in sc.get Aug 4, 2023

ilan-gold added 2 commits August 4, 2023 16:23

(fix): remove final menions of score and other small doc fixes

b8e4fda

Merge branch 'split_apply_combine' of github.com:ilan-gold/scanpy int…

014159d

…o split_apply_combine

ivirshup added 6 commits December 11, 2023 16:20

Remove dead code

bbdbb4c

Remove code for handling weighted mean and variance (put off for later)

edfe57d

Remove change to pyproject.toml

6c7892f

support for obsm/ varm

3764a7f

dim -> axis

0aef147

Add mask argument

062eea9

flying-sheep requested changes Dec 12, 2023


ivirshup added 3 commits December 12, 2023 17:44

release note

86532ac

Merge branch 'master' into split_apply_combine

c413e2f

Merge branch 'master' into split_apply_combine

c81e4df

Merge branch 'master' into split_apply_combine

18bdd0b

flying-sheep requested changes Jan 16, 2024


ivirshup added 5 commits February 19, 2024 13:11

Merge branch 'master' into split_apply_combine

fd0f659

add support for semantic axis

f89c4d3

Fixup signature

30a2f2a

fix error message

f6d5ac9

fix docs error

5f2d063

ivirshup requested a review from flying-sheep February 19, 2024 15:49

flying-sheep added 5 commits February 20, 2024 09:00

even better formatting

d910e74

Some type fixes

ab5aa80

test style

cf9c1eb

test names

64dd227

parametrize test_aggregate_axis_specification

776f420

flying-sheep approved these changes Feb 20, 2024


Merge branch 'master' into split_apply_combine

0a7cf85

ivirshup merged commit 383a61b into scverse:master Feb 20, 2024
13 checks passed
