(feat): igraph leiden implementation now included as an option in sc.tl.leiden #2815

Merged
merged 87 commits into from Feb 19, 2024
Changes from 17 commits

87 commits
eba6a9a (feat): igraph as option for leiden [ilan-gold, Jan 12, 2024]
519cad3 (feat): add test for similarity [ilan-gold, Jan 15, 2024]
25b6705 (feat): migrate defaults to `igraph` [ilan-gold, Jan 15, 2024]
00f5904 (chore): add test for `directed` + `igraph` [ilan-gold, Jan 15, 2024]
7f46900 (chore): change expected images [ilan-gold, Jan 16, 2024]
e306ac3 (fix): weights condition bug [ilan-gold, Jan 16, 2024]
642235d Merge branch 'master' into igraph_leiden [ilan-gold, Jan 16, 2024]
5439d9d (fix): change `rank_genes_groups` tolerance and update test images [ilan-gold, Jan 16, 2024]
2449148 Merge branch 'igraph_leiden' of github.com:ilan-gold/scanpy into igra… [ilan-gold, Jan 16, 2024]
2fe2b9a (feat): new violin plot based on redone cluster assignments [ilan-gold, Jan 16, 2024]
a14b13e (chore): check parameters matching [ilan-gold, Jan 16, 2024]
8f3b169 (fix): handle import properly [ilan-gold, Jan 16, 2024]
b89eaa0 (fix): handle `partition_type` with `use_igraph` [ilan-gold, Jan 16, 2024]
f67225d (chore): remove unnecessary test args [ilan-gold, Jan 16, 2024]
202787c (chore): add test for old defaults [ilan-gold, Jan 16, 2024]
d738092 (chore): pre-commit? [ilan-gold, Jan 16, 2024]
2d8ab25 (chore): pre-commit hooks run [ilan-gold, Jan 16, 2024]
ece40bf (chore): make violin plot `expected` correct [ilan-gold, Jan 16, 2024]
b24d1c4 (fix): change `tol` again for violin plots [ilan-gold, Jan 17, 2024]
a4aebfd Merge branch 'master' into igraph_leiden [ilan-gold, Jan 19, 2024]
4fcbcc6 (chore): revert tolerance change - separate issue incoming [ilan-gold, Jan 22, 2024]
7a75fdf Merge branch 'igraph_leiden' of github.com:ilan-gold/scanpy into igra… [ilan-gold, Jan 22, 2024]
a79e00c Merge branch 'master' into igraph_leiden [ilan-gold, Jan 22, 2024]
fd748f4 (chore): release note [ilan-gold, Jan 22, 2024]
be32bc2 (chore): try new plots with random seed set [ilan-gold, Jan 22, 2024]
07ffc84 (test): try publishing artifacts [ilan-gold, Jan 22, 2024]
fbd2173 (fix): publish artifact [ilan-gold, Jan 22, 2024]
214eaa4 Merge branch 'master' into igraph_leiden [ilan-gold, Jan 23, 2024]
345fcf4 (fix): publish other images [ilan-gold, Jan 23, 2024]
1f35a00 (chore): umap [ilan-gold, Jan 23, 2024]
ac75b6b (fix): fix random seeding for `igraph` [ilan-gold, Jan 23, 2024]
488ea75 (fix): import in function [ilan-gold, Jan 23, 2024]
3db3bb5 (fix): remove umap from test [ilan-gold, Jan 23, 2024]
540a204 (fix): try different random? [ilan-gold, Jan 23, 2024]
d3afd43 (feat): try marker gene labeling + write results [ilan-gold, Jan 23, 2024]
0dbfe7b (fix): publish artifacts [ilan-gold, Jan 23, 2024]
26e6540 (fix): try writing out data after relabel [ilan-gold, Jan 23, 2024]
0bcb2b7 (fix): try stable dataset [ilan-gold, Jan 23, 2024]
d583619 (chore): add more writes [ilan-gold, Jan 24, 2024]
cf8449c (fix): sort categories [ilan-gold, Jan 24, 2024]
6a306e8 (fix): require igraph [ilan-gold, Jan 24, 2024]
3cdd337 (chore): remove build artifact [ilan-gold, Jan 24, 2024]
60c53eb (fix): spelling error [ilan-gold, Jan 24, 2024]
bf02b53 (fix): swap changed after re-ordering [ilan-gold, Jan 24, 2024]
8833246 (chore): `use_igraph` -> `use_leidenalg` [ilan-gold, Jan 24, 2024]
015a2ac fmt [flying-sheep, Jan 25, 2024]
3cf18f7 (refactor): `use_leidenalg` -> `backend` [ilan-gold, Jan 25, 2024]
e903794 (refactor): get `objective_function` from `clustering_args` [ilan-gold, Jan 25, 2024]
3dc2d95 (fix): docstring links [ilan-gold, Jan 25, 2024]
0ba3a04 Merge branch 'master' into igraph_leiden [ilan-gold, Jan 25, 2024]
bd8382d (refactor): create rng for igraph [ilan-gold, Jan 25, 2024]
6958e7d (refactor): less lines [ilan-gold, Jan 25, 2024]
4f736df (chore): add test for random state [ilan-gold, Jan 26, 2024]
eb070d6 (refactor): fix initial state settings for other `igraph` methods by … [ilan-gold, Jan 26, 2024]
4b8c823 (refactor): `FLAVORS` reuse in test [ilan-gold, Jan 26, 2024]
66cc1e2 Update scanpy/tools/_leiden.py [ilan-gold, Jan 26, 2024]
f2fc12b Merge branch 'master' into igraph_leiden [ilan-gold, Jan 26, 2024]
5c49c56 Update scanpy/_utils/__init__.py [ilan-gold, Jan 26, 2024]
04cd7f9 Update scanpy/_utils/__init__.py [ilan-gold, Jan 26, 2024]
16d822c (fix): fix heatmap plot [ilan-gold, Jan 26, 2024]
bca86ff (fix): change out images for new random seed method [ilan-gold, Jan 26, 2024]
dd540dc Merge branch 'igraph_leiden' of github.com:ilan-gold/scanpy into igra… [ilan-gold, Jan 26, 2024]
b9b5b19 Merge branch 'master' into igraph_leiden [ilan-gold, Jan 26, 2024]
3482560 Merge branch 'master' into igraph_leiden [ilan-gold, Jan 29, 2024]
b3bb3d2 Merge branch 'master' into igraph_leiden [ilan-gold, Jan 29, 2024]
ebe1c16 Update scanpy/tools/_leiden.py [ilan-gold, Jan 30, 2024]
0736afc Merge branch 'master' into igraph_leiden [ilan-gold, Feb 15, 2024]
de7af6a (chore): switch back to `leidenalg` default [ilan-gold, Feb 15, 2024]
99aa47c (chore): fix clustering tests and update message [ilan-gold, Feb 15, 2024]
726c59e (fix): plotting test [ilan-gold, Feb 15, 2024]
f2da795 (fix): `test_leiden_basic` `directed` arg [ilan-gold, Feb 15, 2024]
13ae6b6 (fix): fix iterations to defaults [ilan-gold, Feb 15, 2024]
cc31a2e (fix): correct category swapping [ilan-gold, Feb 16, 2024]
935d34f (fix): need to reorder categories as well [ilan-gold, Feb 16, 2024]
662f918 Merge branch 'master' into igraph_leiden [ilan-gold, Feb 16, 2024]
5df37d2 (fix): clean up simple tests [ilan-gold, Feb 16, 2024]
51f0a02 Merge branch 'igraph_leiden' of github.com:ilan-gold/scanpy into igra… [ilan-gold, Feb 16, 2024]
9f6b535 (fix): remove unnecessary cluster swap. [ilan-gold, Feb 16, 2024]
102d128 (fix): just use random state that gives same number of categories [ilan-gold, Feb 16, 2024]
84dd615 (fix): use `np.random` instead of `random` module [ilan-gold, Feb 16, 2024]
d6b1dff (chore): remove unnecessary comment in test about state [ilan-gold, Feb 19, 2024]
f2db271 Merge branch 'master' into igraph_leiden [ilan-gold, Feb 19, 2024]
5e09532 (refactor): simplify conditions [ilan-gold, Feb 19, 2024]
579f005 (refactor): `elif` -> `else` when `flavor` already checked [ilan-gold, Feb 19, 2024]
1cefa19 (fix): move leiden import for test [ilan-gold, Feb 19, 2024]
6247d76 (fix): revert unnecessary image changes [ilan-gold, Feb 19, 2024]
2549f61 (chore): address comments [ilan-gold, Feb 19, 2024]
Binary file modified scanpy/tests/_images/heatmap_var_as_dict/expected.png
19 changes: 11 additions & 8 deletions scanpy/tests/notebooks/test_pbmc3k.py
@@ -29,6 +29,9 @@
 @needs.leidenalg
 def test_pbmc3k(image_comparer):
     save_and_compare_images = partial(image_comparer, ROOT, tol=20)
+    save_and_compare_images_rank_genes = partial(
+        image_comparer, ROOT, tol=10
+    ) # 20 is too high for such sparse plots

     adata = sc.read(
         "./data/pbmc3k_raw.h5ad", backup_url="https://falexwolf.de/data/pbmc3k_raw.h5ad"
@@ -115,32 +118,32 @@ def test_pbmc3k(image_comparer):

     sc.tl.rank_genes_groups(adata, "leiden")
     sc.pl.rank_genes_groups(adata, n_genes=20, sharey=False, show=False)
-    save_and_compare_images("rank_genes_groups_1")
+    save_and_compare_images_rank_genes("rank_genes_groups_1")

     sc.tl.rank_genes_groups(adata, "leiden", method="logreg")
     sc.pl.rank_genes_groups(adata, n_genes=20, sharey=False, show=False)
-    save_and_compare_images("rank_genes_groups_2")
+    save_and_compare_images_rank_genes("rank_genes_groups_2")

     sc.tl.rank_genes_groups(adata, "leiden", groups=["0"], reference="1")
     sc.pl.rank_genes_groups(adata, groups="0", n_genes=20, show=False)
-    save_and_compare_images("rank_genes_groups_3")
+    save_and_compare_images_rank_genes("rank_genes_groups_3")

     # gives a strange error, probably due to jitter or something
     # sc.pl.rank_genes_groups_violin(adata, groups='0', n_genes=8)
     # save_and_compare_images('rank_genes_groups_4')

-    if adata[adata.obs["leiden"] == "4", "CST3"].X.mean() < 1:
+    if adata[adata.obs["leiden"] == "3", "CST3"].X.mean() < 1:
         ( # switch clusters
+            adata.obs["leiden"][adata.obs["leiden"] == "3"],
             adata.obs["leiden"][adata.obs["leiden"] == "4"],
-            adata.obs["leiden"][adata.obs["leiden"] == "5"],
-        ) = ("5", "4")
+        ) = ("4", "3")
     new_cluster_names = [
         "CD4 T cells",
-        "CD14+ Monocytes",
-        "B cells",
         "CD8 T cells",
+        "B cells",
         "NK cells",
         "FCGR3A+ Monocytes",
+        "CD14+ Monocytes",
         "Dendritic cells",
         "Megakaryocytes",
     ]
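The cluster swap in this hunk is a tuple assignment whose targets are boolean-mask subscripts. Python assigns such targets left to right and evaluates each target's subscript only when that target is assigned, so the mask for the second target is computed after the first assignment has already relabelled cluster "3" as "4". If the chained assignment modifies the column in place, both clusters therefore end up labelled "3" instead of being swapped (the commit list later includes "(fix): correct category swapping"; that fix is not shown here). The minimal sketch below reproduces the pitfall, assuming NumPy string arrays as a stand-in for the pandas categorical `adata.obs["leiden"]` column; `swap_labels_naive` and `swap_labels` are hypothetical names.

```python
import numpy as np


def swap_labels_naive(labels: np.ndarray, a: str, b: str) -> np.ndarray:
    """Mirror the tuple-assignment pattern from the diff (does not swap)."""
    labels = labels.copy()
    # Targets are assigned left to right: the second mask is evaluated only
    # after the first assignment has already turned every `a` into `b`.
    labels[labels == a], labels[labels == b] = b, a
    return labels


def swap_labels(labels: np.ndarray, a: str, b: str) -> np.ndarray:
    """Swap two labels by computing both masks before any assignment."""
    labels = labels.copy()
    is_a, is_b = labels == a, labels == b  # both masks fixed up front
    labels[is_a], labels[is_b] = b, a
    return labels


leiden = np.array(["3", "4", "5", "3", "4"])
print(swap_labels_naive(leiden, "3", "4"))  # ['3' '3' '5' '3' '3']
print(swap_labels(leiden, "3", "4"))  # ['4' '3' '5' '4' '3']
```

Fixing both masks before assigning is one way to make the swap order-independent; for a pandas categorical column, `Series.cat.rename_categories` with a mapping would be another.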
48 changes: 46 additions & 2 deletions scanpy/tests/test_clustering.py
@@ -1,6 +1,7 @@
 from __future__ import annotations

 import pytest
+from sklearn.metrics.cluster import normalized_mutual_info_score

 import scanpy as sc
 from scanpy.testing._helpers.data import pbmc68k_reduced
@@ -13,8 +14,51 @@ def adata_neighbors():


 @needs.leidenalg
-def test_leiden_basic(adata_neighbors):
-    sc.tl.leiden(adata_neighbors)
+@pytest.mark.parametrize("use_igraph", [True, False, False])
+@pytest.mark.parametrize("resolution", [1, 2])
+@pytest.mark.parametrize("n_iterations", [-1, 3])
+def test_leiden_basic(adata_neighbors, use_igraph, resolution, n_iterations):
+    sc.tl.leiden(
+        adata_neighbors,
+        use_igraph=use_igraph,
+        resolution=resolution,
+        n_iterations=n_iterations,
+    )
+    assert adata_neighbors.uns["leiden"]["params"]["resolution"] == resolution
+    assert adata_neighbors.uns["leiden"]["params"]["n_iterations"] == n_iterations
+
+
+def test_leiden_igraph_directed(adata_neighbors):
+    with pytest.raises(ValueError):
+        sc.tl.leiden(adata_neighbors, directed=True)
+
+
+@needs.leidenalg
+def test_leiden_equal_defaults(adata_neighbors):
+    """Ensure the two implementations are the same for the same args."""
+    leiden_alg_clustered = sc.tl.leiden(adata_neighbors, use_igraph=False, copy=True)
+    igraph_clustered = sc.tl.leiden(adata_neighbors, copy=True)
+    assert (
+        normalized_mutual_info_score(
+            leiden_alg_clustered.obs["leiden"], igraph_clustered.obs["leiden"]
+        )
+        > 0.9
+    )
+
+
+@needs.leidenalg
+def test_leiden_equal_old_defaults(adata_neighbors):
+    """Ensure that the old leidenalg defaults are close enough to the current default outputs."""
+    leiden_alg_clustered = sc.tl.leiden(
+        adata_neighbors, use_igraph=False, directed=True, n_iterations=-1, copy=True
+    )
+    igraph_clustered = sc.tl.leiden(adata_neighbors, copy=True)
+    assert (
+        normalized_mutual_info_score(
+            leiden_alg_clustered.obs["leiden"], igraph_clustered.obs["leiden"]
+        )
+        > 0.9
+    )


 @pytest.mark.parametrize(
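The new equivalence tests compare the two backends with scikit-learn's `normalized_mutual_info_score` and require a score above 0.9 rather than identical labels, since two Leiden runs need not number their clusters the same way or agree on every cell. To make that threshold concrete, here is a minimal pure-Python NMI sketch. It assumes normalization by the arithmetic mean of the two entropies, which is assumed to match scikit-learn's default `average_method="arithmetic"`; the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels: list) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_info(a: list, b: list) -> float:
    """Mutual information between two label assignments of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a: list, b: list) -> float:
    """Mutual information normalized by the arithmetic mean of the entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both assignments put every item in a single cluster
    return mutual_info(a, b) / ((h_a + h_b) / 2)


# Cluster IDs are arbitrary: a pure relabelling gives an NMI of 1
# (up to floating-point rounding).
print(nmi([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9]))
# Labels that carry no information about each other score 0.0.
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the score is invariant to how clusters are numbered, a threshold like 0.9 tolerates label permutations and a small number of cells moving between clusters, which is what a cross-backend comparison needs.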
57 changes: 38 additions & 19 deletions scanpy/tools/_leiden.py
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import random
 from typing import TYPE_CHECKING

 import numpy as np
@@ -34,14 +35,15 @@ def leiden(
     random_state: _utils.AnyRandom = 0,
     key_added: str = "leiden",
     adjacency: sparse.spmatrix | None = None,
-    directed: bool = True,
+    directed: bool = False,
     use_weights: bool = True,
-    n_iterations: int = -1,
+    n_iterations: int = 2,
     partition_type: type[MutableVertexPartition] | None = None,
     neighbors_key: str | None = None,
     obsp: str | None = None,
     copy: bool = False,
-    **partition_kwargs,
+    use_igraph: bool = True,
+    **clustering_args,
 ) -> AnnData | None:
     """\
     Cluster cells into subgroups [Traag18]_.
@@ -96,9 +98,9 @@ def leiden(
         `obsp` and `neighbors_key` at the same time.
     copy
         Whether to copy `adata` or modify it inplace.
-    **partition_kwargs
-        Any further arguments to pass to `~leidenalg.find_partition`
-        (which in turn passes arguments to the `partition_type`).
+    **clustering_args
+        Any further arguments to pass to `~leidenalg.find_partition` (which in turn passes arguments to the `partition_type`)
+        or to `~igraph.Graph.community_leiden`.

     Returns
     -------
@@ -112,13 +114,14 @@ def leiden(
         A dict with the values for the parameters `resolution`, `random_state`,
         and `n_iterations`.
     """
-    try:
-        import leidenalg
-    except ImportError:
-        raise ImportError(
-            "Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`."
-        )
-    partition_kwargs = dict(partition_kwargs)
+    if not use_igraph:
+        try:
+            import leidenalg
+        except ImportError:
+            raise ImportError(
+                "Please install the leiden algorithm: `conda install -c conda-forge leidenalg` or `pip3 install leidenalg`."
+            )
+    clustering_args = dict(clustering_args)

     start = logg.info("running Leiden clustering")
     adata = adata.copy() if copy else adata
@@ -134,22 +137,38 @@ def leiden(
         adjacency=adjacency,
     )
     # convert it to igraph
+    if use_igraph and directed:
+        raise ValueError(
+            "Cannot use igraph's leiden implementation with a directed graph."
+        )
     g = _utils.get_igraph_from_adjacency(adjacency, directed=directed)
     # flip to the default partition type if not overridden by the user
-    if partition_type is None:
+    if partition_type is None and not use_igraph:
         partition_type = leidenalg.RBConfigurationVertexPartition
+    elif use_igraph and partition_type is not None:
+        raise ValueError("Do not pass in partition_type argument when using igraph.")
     # Prepare find_partition arguments as a dictionary,
     # appending to whatever the user provided. It needs to be this way
     # as this allows for the accounting of a None resolution
     # (in the case of a partition variant that doesn't take it on input)
     if use_weights:
-        partition_kwargs["weights"] = np.array(g.es["weight"]).astype(np.float64)
-    partition_kwargs["n_iterations"] = n_iterations
-    partition_kwargs["seed"] = random_state
+        clustering_args["weights"] = (
+            "weight" if use_igraph else np.array(g.es["weight"]).astype(np.float64)
+        )

Review comment (PR author):

I think both of these should be added to `uns` as outputs: `use_weights` in the `params` part of the dict, and `use_igraph` at the top level.

adata.uns['leiden']
# { 'params': { ... 'use_weights': True }, 'use_igraph': False }

+    clustering_args["n_iterations"] = n_iterations
+    if not use_igraph:
+        clustering_args["seed"] = random_state
+    else:
+        random.seed(random_state)
     if resolution is not None:
-        partition_kwargs["resolution_parameter"] = resolution
+        clustering_args[
+            f"resolution{'_parameter' if not use_igraph else ''}"
+        ] = resolution
     # clustering proper
-    part = leidenalg.find_partition(g, partition_type, **partition_kwargs)
+    if use_igraph:
+        part = g.community_leiden(objective_function="modularity", **clustering_args)
+    else:
+        part = leidenalg.find_partition(g, partition_type, **clustering_args)
     # store output into adata.obs
     groups = np.array(part.membership)
     if restrict_to is not None:
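The backend-specific argument handling in this hunk is easier to follow in isolation. The sketch below is a hypothetical, dependency-free helper (`build_clustering_args` and `edge_weights` are illustrative names, not scanpy API) that makes the same decisions as the diff: it rejects `directed=True` and an explicit `partition_type` for igraph, passes weights to igraph as the name of the `"weight"` edge attribute but to leidenalg as values, seeds Python's global `random` module for igraph but passes `seed=` to leidenalg, and spells the resolution keyword `resolution` or `resolution_parameter` accordingly. It assumes, as the diff does, that igraph draws from Python's `random` module by default; later commits in this PR changed how the igraph RNG is seeded.

```python
from __future__ import annotations

import random
from typing import Any


def build_clustering_args(
    *,
    use_igraph: bool,
    edge_weights: list[float] | None,
    n_iterations: int,
    random_state: int,
    resolution: float | None,
    directed: bool = False,
    partition_type: type | None = None,
) -> dict[str, Any]:
    """Hypothetical helper mirroring the argument handling in the diff.

    `edge_weights` stands in for `g.es["weight"]`; pass None for `use_weights=False`.
    """
    if use_igraph and directed:
        raise ValueError(
            "Cannot use igraph's leiden implementation with a directed graph."
        )
    if use_igraph and partition_type is not None:
        raise ValueError("Do not pass in partition_type argument when using igraph.")

    args: dict[str, Any] = {"n_iterations": n_iterations}
    if edge_weights is not None:
        # igraph gets the *name* of the edge attribute holding the weights,
        # leidenalg gets the weight values themselves.
        args["weights"] = "weight" if use_igraph else [float(w) for w in edge_weights]
    if use_igraph:
        # No seed keyword for igraph here: seed the global `random` module,
        # which igraph is assumed to draw from by default.
        random.seed(random_state)
    else:
        args["seed"] = random_state
    if resolution is not None:
        # The two backends spell the resolution keyword differently.
        args["resolution" if use_igraph else "resolution_parameter"] = resolution
    return args


print(build_clustering_args(
    use_igraph=True, edge_weights=[1.0, 0.5], n_iterations=2, random_state=0, resolution=1.0
))  # {'n_iterations': 2, 'weights': 'weight', 'resolution': 1.0}
print(build_clustering_args(
    use_igraph=False, edge_weights=[1, 0.5], n_iterations=-1, random_state=0, resolution=1.0
))  # {'n_iterations': -1, 'weights': [1.0, 0.5], 'seed': 0, 'resolution_parameter': 1.0}
```

Keeping the backend mapping in one place like this lets it be unit-tested without either clustering backend installed.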