Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): igraph leiden implementation now included as an option in sc.tl.leiden #2815

Merged
merged 87 commits into from Feb 19, 2024

Conversation

ilan-gold
Copy link
Contributor

@ilan-gold ilan-gold commented Jan 16, 2024

  • Release notes not necessary because:

@ilan-gold
Copy link
Contributor Author

ilan-gold commented Jan 16, 2024

TODOs:

  1. Figure out why some tests are passing when they shouldn't (hence why I pushed the branch, curious about CI). UPDATE: tol for matplotlib.testing.compare.compare_images is too high for a sparse-ish plot like rank_genes_groups. This is somewhat worrying so will need to be amended. Other than that, changed plotting outputs make sense so this should be resolved.
  2. Check with scanpy tutorials to see what needs to be changed there as well, if anything (if needed, the two PRs should be merged in tandem). The following use leiden in some capacity:
    a. https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html
    b. https://scanpy-tutorials.readthedocs.io/en/latest/plotting/core.html
    c. https://scanpy-tutorials.readthedocs.io/en/latest/spatial/basic-analysis.html
    d. https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html
  3. Do a large dataset test - check NMI for accuracy of the new default against the old one, check speed to confirm what we're doing makes sense (although this was covered, it seems, in Leiden now included in python-igraph #1053), and scalability

@ilan-gold
Copy link
Contributor Author

ilan-gold commented Jan 16, 2024

Failing violin_2 does not fail locally, so that's bizarre.

Copy link

codecov bot commented Jan 16, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (1ac74a7) 74.57% compared to head (2549f61) 74.63%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2815      +/-   ##
==========================================
+ Coverage   74.57%   74.63%   +0.05%     
==========================================
  Files         115      115              
  Lines       12716    12756      +40     
==========================================
+ Hits         9483     9520      +37     
- Misses       3233     3236       +3     
Files Coverage Δ
scanpy/_utils/__init__.py 68.20% <90.90%> (+1.00%) ⬆️
scanpy/tools/_leiden.py 88.40% <94.44%> (+3.29%) ⬆️

Comment on lines 155 to 157
clustering_args["weights"] = (
"weight" if use_igraph else np.array(g.es["weight"]).astype(np.float64)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of these should be added to uns as outputs, weights in the params part of the dict and use_igraph at the top levels

adata['leiden']
# { 'params': { ... 'use_weights': True }, 'use_igraph': False }

@ilan-gold ilan-gold changed the title (feat): igraph leiden implementation is now the default (feat): igraph leiden implementation now included as an option in sc.tl.leiden Feb 16, 2024
Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of branches here like if flavor == "leidenalg", I think it could be cleaner to just have one higher level branch for the two cases.


I think we can change the default for directed to be None then dynamically set the value based on which flavor is passed.

We could do the same with n_iterations, though I'm a little on the fence about this.


A number of plots are being changed that don't seem to be very different. Mostly things pre-clustering, like pca and qc metrics. I'm going to check if these are actually different enough to trigger test failures.

scanpy/tools/_leiden.py Outdated Show resolved Hide resolved
scanpy/tools/_leiden.py Outdated Show resolved Hide resolved
scanpy/tools/_leiden.py Outdated Show resolved Hide resolved
@ilan-gold
Copy link
Contributor Author

@ivirshup You are right about some of the pre-processing plots. I should have created a separate branch for those. I simply think hte outputs have changed and we never caught it but would be curious to see what you get. It could be my M1

@ilan-gold
Copy link
Contributor Author

ilan-gold commented Feb 19, 2024

@ivirshup I simplified the conditionals a bit and there are only two sets now. One to check for various {Value/Import}Errors and another to do the clustering_kwargs building. I think this is cleaner and faster since no code will run that doesn't have to. I didn't really see a way to do it with only one set of conditionals without code duplication. There's some code that's just common to both, but that shouldn't be run in the case of one of the {Value/Import}Errors .

@ivirshup
Copy link
Member

About the preprocessing plots:

scanpy/tests/notebooks/_images_pbmc3k/filter_genes_dispersion/expected.png

  • Text and some points are shifted slightly. I'm not totally sure whether any points are actually in a different place

scanpy/tests/notebooks/_images_pbmc3k/highest_expr_genes/expected.png
scanpy/tests/notebooks/_images_pbmc3k/pca/expected.png
scanpy/tests/notebooks/_images_pbmc3k/pca_variance_ratio/expected.png
scanpy/tests/notebooks/_images_pbmc3k/scatter_2/expected.png

  • Axis text shifted slightly
  • Can probably be reverted if the tests still pass

scanpy/tests/notebooks/_images_pbmc3k/scatter_1/expected.png

  • y axis moved

@ilan-gold
Copy link
Contributor Author

@ivirshup I will revert those, and hopefully tests pass

Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! But I don't think that you addressed these:

I think we can change the default for directed to be None then dynamically set the value based on which flavor is passed.

We could do the same with n_iterations, though I'm a little on the fence about this.

I think I would definitely change directed, and add a note in the docs that setting n_iterations=2 will be faster and is the default for the underlying packages.

@ilan-gold
Copy link
Contributor Author

Ah, you're right. Will change.

Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, looks good!

@ivirshup ivirshup enabled auto-merge (squash) February 19, 2024 16:37
@ivirshup ivirshup merged commit 6ee18b9 into scverse:master Feb 19, 2024
13 checks passed
@ilan-gold ilan-gold deleted the igraph_leiden branch February 19, 2024 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Leiden now included in python-igraph
3 participants