Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Shannon Component Analysis #1780

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Add Shannon Component Analysis #1780

wants to merge 7 commits into from

Conversation

bendemeo
Copy link

@bendemeo bendemeo commented Apr 6, 2021

Integrated Shannon Component Analysis into the external API, with full documentation and references. SCA operates like PCA, storing a lower-dimensional representation of adata.X (or the chosen layer) in adata.obsm[key_added].

Like PCA and ICA, SCA is linear; however, we have found SCA representations better than PCA or ICA at separating true cell types, yielding better clusters downstream. The source repository can be found here and installed using pip. If you decide you'd like to integrate this into the main API (i.e. as sc.tl.sca), I would be happy to assist.

@codecov
Copy link

codecov bot commented Apr 6, 2021

Codecov Report

Merging #1780 (95999f9) into master (7abd072) will decrease coverage by 0.04%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##           master    #1780      +/-   ##
==========================================
- Coverage   71.18%   71.14%   -0.05%     
==========================================
  Files          92       93       +1     
  Lines       11190    11202      +12     
==========================================
+ Hits         7966     7970       +4     
- Misses       3224     3232       +8     
Impacted Files Coverage Δ
scanpy/external/__init__.py 100.00% <ø> (ø)
scanpy/external/tl/_sca.py 27.27% <27.27%> (ø)
scanpy/external/tl/__init__.py 100.00% <100.00%> (ø)

@bendemeo bendemeo marked this pull request as ready for review April 6, 2021 16:46
@ivirshup
Copy link
Member

Thanks for opening this PR!

Similar to #1775, I think this might fit better in ecosystem than external. Initially, we started external as a way of providing a scanpy-like API for tools which didn't use scanpy. Since your tool already has this kind of API, I think it's a better fit for the ecosystem page. We are working on making this page more visible to users (#1801), but the addition of this tool will be in change log and mentioned in the announcement of the next minor release.

How does this sound?

@bendemeo
Copy link
Author

Thanks for the response! The core reduce function of SCA is not scanpy-based, but I wrote a very simple wrapper called reduce_scanpy to make it easier for scanpy users while this pull request is being considered. It would be even easier for scanpy users to access this code natively in sc.tl.external, and it seems odd that the existence of the wrapper (which just runs reduce and adds the result to the input AnnData) should disqualify it. Although the current pull request implements sc.tl.external.scaas an additional wrapper to reduce_scanpy, I could easily write it as a wrapper to reduce, which would remove the redundancy of having separate scanpy interfaces in the base package and in sc.tl.external. I would then mark reduce_scanpy as deprecated in further releases of SCA, and direct the user instead to sc.tl.external.sca. Does this seem reasonable? Of course, I'd be happy to be part of ecosystem if that's still where you think it belongs!

@ivirshup
Copy link
Member

Sorry about the late reply to this!

and it seems odd that the existence of the wrapper (which just runs reduce and adds the result to the input AnnData) should disqualify it.

I guess I wouldn't think of it as disqualification. If a wrapper is added to external, it adds maintanence burden to both of us by giving you multiple sets of documentation and code to keep in sync, and us for issue management and CI. Plus all the documentation you can provide through external is a docstring, while you can offer much more on your own repo.

To us it just seems easier on both of us, especially since you've already implemented the interface with anndata on your side. We're aiming to make the ecosystem documentation much more visible for the next release as well (and are open to input of improving this further), in case that was your concern.

So yes, I would still prefer to have your tool added to the ecosystem.

@bendemeo
Copy link
Author

bendemeo commented Sep 2, 2021

Thanks for your reply! (And no worries - mine is even later).

I see your point about ecosystem vs. external. My main qualm about ecosystem (at least in its current form) is that it's just links to external projects that happen to use scanpy, and the burden of downloading these projects, learning their unique syntax, and seeing how they apply to the scanpy project at hand is off-loaded to the user. The main reason I have pushed for inclusion in external is the convenience of being able to call the function with a single scanpy command, in a format the user is already very familiar with.

On the other hand, I do see your point about code maintenance and syncing between my project and scanpy. Changes in my shannonca project might necessitate changes in the wrapper function. That said, since my wrapper is very agnostic to the underlying methods used, I would hope this wouldn't have to happen very often (basically, it just controls where the inputs are found and where the outputs are deposited. This wouldn't change unless scanpy's architecture did). However, as currently written, the documentation may have to change more frequently since it refers to specific function arguments used in my package.

For now, I am willing to open a new pull request into ecosystem (if that is the correct workflow) and you can feel free to close this issue. For future releases, if you want to combine the convenience of external with the low maintenance burden of ecosystem, you might consider allowing external modules to "outsource" their documentation. So in scanpy's documentation, a function F under external would simply have the format sc.external.tl.F(adata, **kwargs), where **kwargs is passed directly to a method maintained by the tool developer, with a link to a docstring in the external repository. I would happily make this for shannonca as a proof of concept, if you think it's worth trying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants