Clean up dataloader, remove drop_last as int #1975

Merged
merged 7 commits into from
Mar 22, 2023
@adamgayoso adamgayoso (Member) commented Mar 22, 2023

Fixes #1892

  1. Cleanup of AnnDataLoader (typing, moving checks into AnnTorchDataset)
  2. Replacement of our custom batch sampler with PyTorch objects that provide the same functionality
  3. Cleanup of AnnTorchDataset, made some methods private
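One way to replace a custom batch sampler with stock PyTorch pieces is sketched below: a `BatchSampler` wrapping a `RandomSampler` or `SequentialSampler`, passed as `sampler=` together with `batch_size=None`, so the dataset receives a whole list of indices per call. This is a minimal sketch, not the code from this PR; `BatchIndexedDataset` and `make_loader` are hypothetical names, and the sketch assumes the dataset supports indexing by a list of rows.

```python
import numpy as np
from torch.utils.data import (
    BatchSampler,
    DataLoader,
    Dataset,
    RandomSampler,
    SequentialSampler,
)


class BatchIndexedDataset(Dataset):
    """Hypothetical stand-in for AnnTorchDataset: fetches a whole batch per call."""

    def __init__(self, x: np.ndarray):
        self.x = x

    def __len__(self) -> int:
        return self.x.shape[0]

    def __getitem__(self, indices):
        # `indices` is the list of row indices yielded by BatchSampler,
        # so the array is sliced once per minibatch instead of once per row
        return {"X": self.x[indices]}


def make_loader(dataset, batch_size=128, shuffle=False, drop_last=False):
    """Hypothetical helper: builds minibatches from stock PyTorch samplers."""
    base = RandomSampler(dataset) if shuffle else SequentialSampler(dataset)
    sampler = BatchSampler(base, batch_size=batch_size, drop_last=drop_last)
    # batch_size=None turns off automatic batching: each list of indices from
    # `sampler` goes straight to dataset.__getitem__, and the default
    # conversion turns the NumPy arrays in the returned dict into tensors
    return DataLoader(dataset, sampler=sampler, batch_size=None)


x = np.arange(20, dtype=np.float32).reshape(10, 2)
shapes = [tuple(b["X"].shape) for b in make_loader(BatchIndexedDataset(x), batch_size=4)]
print(shapes)  # [(4, 2), (4, 2), (2, 2)]
```

Passing the `BatchSampler` as `sampler=` (rather than `batch_sampler=`) keeps the indexing vectorized: `__getitem__` is called once per minibatch instead of once per observation.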

The most important change here is removing the option to pass an integer as drop_last. We've had this option since the beginning of scVI, and it was more useful when datasets were smaller. At this point the feature is more of a burden, since it would have to be disabled anyway in some multi-GPU training settings that require even batch sizes. I'm curious what everyone thinks about this breaking change.
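The behavioral difference can be modeled with a small pure-Python function that lists the batch sizes of one pass over the data. It assumes the legacy integer meant "drop the final, incomplete batch only if it has fewer than `drop_last` observations"; `epoch_batch_sizes` is a hypothetical helper, not scvi-tools code.

```python
def epoch_batch_sizes(n_obs: int, batch_size: int, drop_last) -> list:
    """Hypothetical helper: sizes of the minibatches in one pass over the data."""
    n_full, remainder = divmod(n_obs, batch_size)
    sizes = [batch_size] * n_full
    if remainder == 0:
        return sizes
    # bool is a subclass of int, so it must be checked first
    if isinstance(drop_last, bool):
        # boolean behavior: always keep or always drop the incomplete batch
        keep_last = not drop_last
    else:
        # assumed legacy behavior: an int keeps the final batch only if it
        # has at least `drop_last` observations
        keep_last = remainder >= drop_last
    if keep_last:
        sizes.append(remainder)
    return sizes


print(epoch_batch_sizes(897, 128, 3)[-1])      # 128 (final batch of 1 dropped)
print(epoch_batch_sizes(1001, 128, 3)[-1])     # 105 (final batch kept)
print(epoch_batch_sizes(897, 128, False)[-1])  # 1
print(epoch_batch_sizes(897, 128, True)[-1])   # 128
```

Under the integer rule, whether the last batch survives depends on the dataset size; with `drop_last=True`, every batch has exactly `batch_size` observations.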

@adamgayoso adamgayoso changed the title from "refactor dataloader" to "Clean up dataloader, remove drop_last as int" Mar 22, 2023
@adamgayoso adamgayoso marked this pull request as ready for review March 22, 2023 03:58
codecov bot commented Mar 22, 2023

Codecov Report

Patch coverage: 93.33% and project coverage change: -0.02 ⚠️

Comparison is base (b928108) 89.14% compared to head (b0aed18) 89.13%.

❗ Current head b0aed18 differs from pull request most recent head 7bdab9a. Consider uploading reports for the commit 7bdab9a to get more accurate results

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1975      +/-   ##
==========================================
- Coverage   89.14%   89.13%   -0.02%
==========================================
  Files         141      141
  Lines       11113    11079      -34
==========================================
- Hits         9907     9875      -32
+ Misses       1206     1204       -2
```
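The headline percentages follow from the hit and line counts above. A quick arithmetic check, assuming the report rounds down to two decimals (an assumption that reproduces all three figures; ordinary rounding would show the base as 89.15%) and computes the change from the unrounded values:

```python
import math


def floor2(pct: float) -> float:
    """Round a percentage down to two decimal places (assumed report behavior)."""
    return math.floor(pct * 100) / 100


base_hits, base_lines = 9907, 11113  # main
head_hits, head_lines = 9875, 11079  # #1975

base_pct = 100 * base_hits / base_lines
head_pct = 100 * head_hits / head_lines

print(floor2(base_pct))             # 89.14
print(floor2(head_pct))             # 89.13
print(floor2(head_pct - base_pct))  # -0.02
print(base_lines - base_hits, head_lines - head_hits)  # 1206 1204 (misses)
```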

| Flag | Coverage Δ |
|---|---|
| unittests | 89.13% <93.33%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| scvi/dataloaders/_anntorchdataset.py | 90.90% <87.50%> (-1.74%) ⬇️ |
| scvi/dataloaders/_ann_dataloader.py | 96.29% <100.00%> (+2.35%) ⬆️ |
| scvi/dataloaders/_data_splitting.py | 96.98% <100.00%> (ø) |

... and 2 files with indirect coverage changes


@adamgayoso adamgayoso added this to the 1.0.0 milestone Mar 22, 2023
Review context (test diff excerpt):

```
def test_ann_dataloader():
    a = scvi.data.synthetic_iid()
@pytest.mark.parametrize(
    "data", [scvi.data.synthetic_iid(200), scvi.data.synthetic_iid(200, sparse=True)]
```
Contributor

wait this is so simple

Member Author

what's simple? pytest parametrize?
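The `@pytest.mark.parametrize` decorator in the test excerpt runs one test body once per listed input, here a dense and a sparse dataset. A minimal, self-contained sketch of the same pattern: `make_counts` is a hypothetical stand-in for `scvi.data.synthetic_iid` so the example runs without scvi-tools, and SciPy's CSR format is assumed for the sparse case.

```python
import numpy as np
import pytest
import scipy.sparse as sp


def make_counts(sparse: bool):
    """Hypothetical stand-in for scvi.data.synthetic_iid(..., sparse=...)."""
    x = np.arange(12, dtype=np.float32).reshape(6, 2)
    return sp.csr_matrix(x) if sparse else x


# pytest collects one test case per list entry, binding it to `data`
@pytest.mark.parametrize("data", [make_counts(sparse=False), make_counts(sparse=True)])
def test_row_sums_match(data):
    dense = data.toarray() if sp.issparse(data) else np.asarray(data)
    assert dense.shape == (6, 2)
    np.testing.assert_array_equal(dense.sum(axis=1), [1, 5, 9, 13, 17, 21])
```

Running `pytest` on this file reports two test cases, one for each input.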

Review context (AnnDataLoader diff excerpt):

```
    iter_ndarray: bool = False,
    **data_loader_kwargs,
):
    if adata_manager.adata is None:
```
Contributor

surprised that you can delete this w/o any test breaking. it's been too long so I forget why this code is here in the first place

Contributor

nvm i can see it was moved to dataset
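This thread is about moving the `adata_manager.adata is None` check out of the loader and into the dataset. A minimal sketch of that design, using hypothetical classes (`StubManager`, `SketchDataset`, `SketchLoader`) and a paraphrased error message rather than the actual scvi-tools code: because the loader always constructs a dataset, a single check in the dataset constructor covers every loader.

```python
class StubManager:
    """Hypothetical stand-in for an AnnDataManager; only `.adata` is used."""

    def __init__(self, adata=None):
        self.adata = adata


class SketchDataset:
    def __init__(self, adata_manager):
        # The registration check lives in the dataset constructor...
        if adata_manager.adata is None:
            raise ValueError("Run `register_fields` on the AnnDataManager first.")
        self.adata_manager = adata_manager


class SketchLoader:
    def __init__(self, adata_manager, **data_loader_kwargs):
        # ...so the loader needs no copy of it: building the dataset raises.
        self.dataset = SketchDataset(adata_manager)
        self.data_loader_kwargs = data_loader_kwargs


try:
    SketchLoader(StubManager())
except ValueError as err:
    print(err)  # Run `register_fields` on the AnnDataManager first.
```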

@martinkim0 martinkim0 (Contributor) left a comment

LGTM

@adamgayoso adamgayoso enabled auto-merge (squash) March 22, 2023 18:13
@adamgayoso adamgayoso merged commit 6f4cda2 into main Mar 22, 2023
@adamgayoso adamgayoso deleted the dataloaders branch March 22, 2023 18:33
Development: successfully merging this pull request may close the issue "Clean up torch samplers".
3 participants