Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow sc.pp.subsample in backed mode when returning a copy #2624

Merged
merged 8 commits into from Aug 24, 2023

Conversation

eroell
Copy link
Contributor

@eroell eroell commented Aug 18, 2023

Adresses issue #2495.

Suggested change to allow sc.pp.subsample in backed mode of AnnData, when copy=True in sc.pp.subsample

import scanpy as sc

adata = sc.read_h5ad("matrix/scatlas.h5ad", backed="r")
adata_sample = sc.pp.subsample(adata, fraction=0.01, copy=True)

Tagging @ivirshup.

@codecov
Copy link

codecov bot commented Aug 18, 2023

Codecov Report

Merging #2624 (5641e88) into master (32a3567) will increase coverage by 0.00%.
Report is 1 commits behind head on master.
The diff coverage is 100.00%.

❗ Current head 5641e88 differs from pull request most recent head 9764a06. Consider uploading reports for the commit 9764a06 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2624   +/-   ##
=======================================
  Coverage   72.00%   72.01%           
=======================================
  Files         105      105           
  Lines       11840    11844    +4     
=======================================
+ Hits         8525     8529    +4     
  Misses       3315     3315           
Files Changed Coverage Δ
scanpy/preprocessing/_simple.py 80.24% <100.00%> (+0.19%) ⬆️

@nleroy917
Copy link

Appreciate the PR - let me know if I can be of any assistance.

@eroell eroell changed the title allow sc.pp.subsampling in backed mode when returning a copy allow sc.pp.subsample in backed mode when returning a copy Aug 24, 2023
Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this!

What happens if the object is backed, but copy=False?

@ivirshup ivirshup added this to the 1.10.0 milestone Aug 24, 2023
@eroell
Copy link
Contributor Author

eroell commented Aug 24, 2023

When the object is backed, but copy=False, the ValueError, which before occured for both copy=False and copy=True, is shown:
ValueError: To copy an AnnData object in backed mode, pass a filename: '.copy(filename='myfilename.h5ad')'. To load the object into memory, use '.to_memory()'.

@ivirshup
Copy link
Member

Could you throw a more informative error message for copy=False? Maybe:

NotImplementedError("Inplace subsampling is not implemented for backed objects")

?

@eroell eroell marked this pull request as ready for review August 24, 2023 12:08
Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change, I would invert the loops. I think it makes more sense to consider the isbacked cases together. So:

if data.isbacked:
    if copy:
        return data[obs_indices].to_memory()
    else:
        raise NotImplementedError(...)
else:
    if copy:
        ...

Otherwise looking good! One minor change in the tests.

scanpy/tests/test_preprocessing.py Outdated Show resolved Hide resolved
eroell and others added 2 commits August 24, 2023 16:28
Co-authored-by: Isaac Virshup <ivirshup@gmail.com>
@ivirshup ivirshup self-requested a review August 24, 2023 14:40
Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@ivirshup ivirshup enabled auto-merge (squash) August 24, 2023 14:41
@ivirshup ivirshup merged commit 2de9121 into scverse:master Aug 24, 2023
9 checks passed
flying-sheep pushed a commit that referenced this pull request Sep 7, 2023
* allow backed subsampling when returning a copy

* no extra copying, added test

* added description in release-notes

* more informative error message for copy=False

* Update scanpy/tests/test_preprocessing.py

Co-authored-by: Isaac Virshup <ivirshup@gmail.com>

* inverted branching

---------

Co-authored-by: Eljas <eljas.roellin@gmail.com>
Co-authored-by: Isaac Virshup <ivirshup@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants