(feat): Aggregation via group-by in sc.get #2590

Merged
Merged 105 commits on Feb 20, 2024
1d4bfd9
(chore): migrate `anndata` PR
ilan-gold Aug 3, 2023
9eb1993
(feat): add option for custom data
ilan-gold Aug 3, 2023
e1c7eef
(chore): remove pair scoring
ilan-gold Aug 3, 2023
31beb0d
(feat): change return types to `AnnData`
ilan-gold Aug 3, 2023
668e725
(feat): keep `superset` columns.
ilan-gold Aug 3, 2023
b23dd9c
(chore): remove `explode` option (i.e., tuples)
ilan-gold Aug 3, 2023
6177857
(feat): first pass at `var`/`obs`
ilan-gold Aug 3, 2023
b9d75f9
(chore): add temporary note for now
ilan-gold Aug 3, 2023
43b5d3f
(chore): change `df_key` -> `groupby_df_key`
ilan-gold Aug 3, 2023
2399a5c
(chore): clean up public/private methods and do some renaming
ilan-gold Aug 4, 2023
f02dacc
(chore): `black`
ilan-gold Aug 4, 2023
a50ea3b
(feat): refactor to allow for functional API
ilan-gold Aug 4, 2023
0060e0e
(style): use `bool` for writing to `obsm`
ilan-gold Aug 4, 2023
5a56c6a
(refactor): `AnnData` object separate from groupby
ilan-gold Aug 4, 2023
61b9345
(chore): export `aggregated_from_array`
ilan-gold Aug 4, 2023
6408daf
(refactor): remove `GroupBy` dependence of `df`
ilan-gold Aug 4, 2023
ed95373
(chore): `black`
ilan-gold Aug 4, 2023
f1b9d4d
(chore): `g(G)roupby` -> `a(A)ggregated`
ilan-gold Aug 4, 2023
8faeec8
(style): small docstring changes + export docs
ilan-gold Aug 4, 2023
10b2056
(chore): small doc fix
ilan-gold Aug 4, 2023
225ee79
(fix): replace `Union` in `singledispatch` with classes
ilan-gold Aug 4, 2023
df78e73
Merge branch 'master' into split_apply_combine
ilan-gold Aug 4, 2023
b8e4fda
(fix): remove final mentions of `score` and other small doc fixes
ilan-gold Aug 4, 2023
014159d
Merge branch 'split_apply_combine' of github.com:ilan-gold/scanpy int…
ilan-gold Aug 4, 2023
c3a0823
(fix): doc string
ilan-gold Aug 8, 2023
24b0749
Apply suggestions from code review
ilan-gold Aug 8, 2023
cf5e952
(chore): apply style/args changes
ilan-gold Aug 8, 2023
b0a449b
(chore): Remove defaults.
ilan-gold Aug 8, 2023
7c6727d
(chore): `black`
ilan-gold Aug 8, 2023
ee7a44c
Merge branch 'master' into split_apply_combine
ilan-gold Aug 8, 2023
4ad8f14
(chore): `how` -> `func`
ilan-gold Aug 8, 2023
d159166
(feat): allow for list of `AggType`s
ilan-gold Aug 8, 2023
068f13f
Merge branch 'split_apply_combine' of github.com:ilan-gold/scanpy int…
ilan-gold Aug 8, 2023
e9eede5
(chore): `pre-commit`
ilan-gold Aug 8, 2023
19a3da1
(fix): `|` -> `Union` for 3.8
ilan-gold Aug 8, 2023
231612c
Fix doc formatting
flying-sheep Aug 14, 2023
99553ad
more doc format fix
flying-sheep Aug 14, 2023
cdbc228
clearer test data
flying-sheep Aug 14, 2023
4b71da5
Update scanpy/get/_aggregated.py
ilan-gold Aug 14, 2023
4aa5e58
(chore): comment on power for sparse matrices
ilan-gold Aug 14, 2023
8d8eb1e
(chore): add `TODO` for old code
ilan-gold Aug 14, 2023
34b3f21
(style): refactor `_extract_indices` + `_filter_indices` with docstri…
ilan-gold Aug 14, 2023
b6dc026
(style): use generator for `all` argument
ilan-gold Aug 14, 2023
d1cc61b
(fix): use `Iterable` for `func` argument instead of `List`
ilan-gold Aug 14, 2023
57e17ee
(feat): check `layer` `obsm` `varm` combination
ilan-gold Aug 14, 2023
a2c08ab
(style): refactor to use fixtures for backing data
ilan-gold Aug 14, 2023
c5e5b94
(style): don't use spreading in `AnnData` constructor
ilan-gold Aug 14, 2023
9c52e1f
Merge branch 'master' into split_apply_combine
ilan-gold Aug 14, 2023
8d78e36
typo
flying-sheep Aug 18, 2023
d70d086
simplify
flying-sheep Aug 18, 2023
6bac717
condense
flying-sheep Aug 18, 2023
99a3f2e
style
flying-sheep Aug 18, 2023
f89c4d3
minor deduplication
flying-sheep Aug 18, 2023
73aa963
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 18, 2023
025203f
Merge branch 'master' into split_apply_combine
flying-sheep Aug 21, 2023
88521f4
fix check
flying-sheep Aug 21, 2023
06ecfff
simpler
flying-sheep Aug 21, 2023
4ae27a6
(chore): dedup tests
ilan-gold Aug 22, 2023
9e8b983
Merge branch 'master' into split_apply_combine
ilan-gold Aug 22, 2023
6edf202
(chore): update comments/errors
ilan-gold Aug 22, 2023
cc39c01
Merge branch 'split_apply_combine' of github.com:ilan-gold/scanpy int…
ilan-gold Aug 22, 2023
ca2ba76
improve typing
flying-sheep Aug 22, 2023
ed72ef1
Merge branch 'master' into split_apply_combine
ilan-gold Aug 22, 2023
d114600
doc consistency
flying-sheep Aug 24, 2023
d30cd61
improve typing
flying-sheep Aug 24, 2023
834b9f9
correct caching
flying-sheep Aug 24, 2023
14b67f3
simplify
flying-sheep Aug 29, 2023
7872258
Merge branch 'master' into split_apply_combine
ivirshup Nov 27, 2023
d235487
* Don't allocate for indicator matrix
ivirshup Nov 27, 2023
9ae8f88
Start clean up
ivirshup Nov 28, 2023
461f049
Add array types to tests
ivirshup Nov 28, 2023
dc4e920
Fix typo
ivirshup Nov 28, 2023
9c7db29
Fix variance calculation
ivirshup Nov 28, 2023
7012042
Get tests passing
ivirshup Nov 29, 2023
487e4c5
count_nonzero + test speed
ivirshup Nov 29, 2023
b228451
Fix up mean_var
ivirshup Dec 5, 2023
6276f8c
Add test for dim
ivirshup Dec 6, 2023
4ea5471
Remove old test
ivirshup Dec 6, 2023
b78f6bc
Better dim handling + test
ivirshup Dec 6, 2023
560eae2
Retain input categories in result
ivirshup Dec 6, 2023
a173140
Examples in docs + some minor fixes
ivirshup Dec 6, 2023
5e71b63
Merge branch 'master' into split_apply_combine
ivirshup Dec 6, 2023
d00fea1
Rename from aggregated -> aggregate
ivirshup Dec 6, 2023
1f83713
Simplify non-anndata version
ivirshup Dec 6, 2023
bbdbb4c
Remove dead code
ivirshup Dec 11, 2023
edfe57d
Remove code for handling weighted mean and variance (put off for later)
ivirshup Dec 11, 2023
6c7892f
Remove change to pyproject.toml
ivirshup Dec 11, 2023
3764a7f
support for obsm/ varm
ivirshup Dec 11, 2023
0aef147
dim -> axis
ivirshup Dec 12, 2023
062eea9
Add mask argument
ivirshup Dec 12, 2023
86532ac
release note
ivirshup Dec 12, 2023
c413e2f
Merge branch 'master' into split_apply_combine
ivirshup Dec 12, 2023
c81e4df
Merge branch 'master' into split_apply_combine
ivirshup Jan 11, 2024
18bdd0b
Merge branch 'master' into split_apply_combine
ivirshup Jan 16, 2024
fd0f659
Merge branch 'master' into split_apply_combine
ivirshup Feb 19, 2024
f89c4d3
add support for semantic axis
ivirshup Feb 19, 2024
30a2f2a
Fixup signature
ivirshup Feb 19, 2024
f6d5ac9
fix error message
ivirshup Feb 19, 2024
5f2d063
fix docs error
ivirshup Feb 19, 2024
d910e74
even better formatting
flying-sheep Feb 20, 2024
ab5aa80
Some type fixes
flying-sheep Feb 20, 2024
cf9c1eb
test style
flying-sheep Feb 20, 2024
64dd227
test names
flying-sheep Feb 20, 2024
776f420
parametrize test_aggregate_axis_specification
flying-sheep Feb 20, 2024
0a7cf85
Merge branch 'master' into split_apply_combine
flying-sheep Feb 20, 2024
1 change: 1 addition & 0 deletions docs/api/get.md
@@ -19,5 +19,6 @@ useful formats.
get.obs_df
get.var_df
get.rank_genes_groups_df
get.aggregate

```
1 change: 1 addition & 0 deletions docs/release-notes/1.10.0.md
@@ -15,6 +15,7 @@
* Enhanced dask support for some internal utilities, paving the way for more extensive dask support {pr}`2696` {smaller}`P Angerer`
* {func}`scanpy.pp.pca`, {func}`scanpy.pp.scale`, {func}`scanpy.pl.embedding`, and {func}`scanpy.experimental.pp.normalize_pearson_residuals_pca`
now support a `mask` parameter {pr}`2272` {smaller}`C Bright, T Marcella, & P Angerer`
* New function {func}`scanpy.get.aggregate` which allows grouped aggregations over your data. Useful for pseudobulking! {pr}`2590` {smaller}`Isaac Virshup` {smaller}`Ilan Gold` {smaller}`Jon Bloom`
* {func}`scanpy.tl.rank_genes_groups` no longer warns that its default was changed from t-test_overestim_var to t-test {pr}`2798` {smaller}`L Heumos`
* {func}`scanpy.tl.leiden` now offers `igraph`'s implementation of the leiden algorithm via `flavor` when set to `igraph`. `leidenalg`'s implementation is still default, but discouraged. {pr}`2815` {smaller}`I Gold`
* {func}`scanpy.pp.highly_variable_genes` has new flavor `seurat_v3_paper` that is in its implementation consistent with the paper description in Stuart et al 2018. {pr}`2792` {smaller}`E Roellin`
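
The release note above describes grouped aggregation ("pseudobulking"). As a rough, pure-Python sketch of what that means conceptually: rows (cells) sharing a label are collapsed into one row per group. The helper name `grouped_aggregate`, its signature, and the toy data are hypothetical and for illustration only; this is not how `scanpy.get.aggregate` is implemented, and only the aggregation names `mean` and `count_nonzero` are taken from the commit messages in this PR.

```python
from collections import defaultdict


def grouped_aggregate(rows, labels, func="sum"):
    """Aggregate equal-length numeric rows per group label.

    Hypothetical helper for illustration; not scanpy's implementation.
    """
    if len(rows) != len(labels):
        raise ValueError("need exactly one label per row")

    # Collect the rows that belong to each group label.
    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)

    result = {}
    for label, group in members.items():
        columns = list(zip(*group))  # transpose: one tuple per variable
        if func == "sum":
            result[label] = [sum(col) for col in columns]
        elif func == "mean":
            result[label] = [sum(col) / len(col) for col in columns]
        elif func == "count_nonzero":
            result[label] = [sum(1 for v in col if v != 0) for col in columns]
        else:
            raise ValueError(f"unsupported func: {func!r}")
    return result


# Four cells (rows) by three genes (columns), with one cluster label per cell.
counts = [[1, 0, 2], [3, 0, 0], [0, 4, 4], [0, 2, 0]]
clusters = ["a", "a", "b", "b"]

pseudobulk_sum = grouped_aggregate(counts, clusters, func="sum")
pseudobulk_mean = grouped_aggregate(counts, clusters, func="mean")
pseudobulk_nnz = grouped_aggregate(counts, clusters, func="count_nonzero")
```

Each result maps a group label to one aggregated row, so two clusters yield two pseudobulk profiles regardless of how many cells each cluster contains.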
10 changes: 10 additions & 0 deletions scanpy/_utils/__init__.py
@@ -874,3 +874,13 @@ def _choose_graph(adata, obsp, neighbors_key):
"to compute a neighborhood graph."
)
return neighbors["connectivities"]


def _resolve_axis(
    axis: Literal["obs", 0, "var", 1],
) -> tuple[Literal[0], Literal["obs"]] | tuple[Literal[1], Literal["var"]]:
    if axis in {0, "obs"}:
        return (0, "obs")
    if axis in {1, "var"}:
        return (1, "var")
    raise ValueError(f"`axis` must be either 0, 1, 'obs', or 'var', was {axis!r}")
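
To make the helper's contract concrete, here is a small standalone sketch. It restates `_resolve_axis` from the hunk above with a simplified return annotation (an assumption made so the snippet runs on older Python versions without `from __future__ import annotations`); the example inputs, including the invalid value `"cells"`, are illustrative only.

```python
from typing import Literal, Tuple


def _resolve_axis(axis: Literal["obs", 0, "var", 1]) -> Tuple[int, str]:
    # Body restated from the diff; only the return annotation is simplified.
    if axis in {0, "obs"}:
        return (0, "obs")
    if axis in {1, "var"}:
        return (1, "var")
    raise ValueError(f"`axis` must be either 0, 1, 'obs', or 'var', was {axis!r}")


# Integer and string spellings of an axis normalize to the same pair.
resolved = {axis: _resolve_axis(axis) for axis in (0, "obs", 1, "var")}

# Anything else is rejected with a message that echoes the bad value.
try:
    _resolve_axis("cells")
except ValueError as err:
    message = str(err)
```

Returning both the integer and the name lets a caller index arrays with the first element and look up the matching `obs`/`var` attribute with the second, without repeating the validation.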
4 changes: 2 additions & 2 deletions scanpy/get/__init__.py
@@ -1,7 +1,6 @@
# Public
# Private
from __future__ import annotations

from ._aggregated import aggregate
from .get import (
    _check_mask,
    _get_obs_rep,
@@ -15,6 +14,7 @@
    "_check_mask",
    "_get_obs_rep",
    "_set_obs_rep",
    "aggregate",
    "obs_df",
    "rank_genes_groups_df",
    "var_df",