Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(model): support batch key in condscvi #2626

Merged
merged 11 commits into from Mar 26, 2024
Merged

feat(model): support batch key in condscvi #2626

merged 11 commits into from Mar 26, 2024

Conversation

martinkim0
Copy link
Contributor

closes #2623

@martinkim0
Copy link
Contributor Author

@canergen The main backwards compatibility concern right now is that, with the current AnnDataManager setup with a batch key, the model will receive a tensor of zeroes for batch_index even if batch_key is None.

There's not a straightforward way to distinguish this case from the case where a batch_key is registered but all observations have the same batch assignment. However, we'd like to treat the latter differently from the former because of backwards compatibility.

This is because a batch_index of zeros will still be passed into the encoder (if encode_covariates is True) and the decoder, and will affect model training. So anyone repeating model training from a previous version will see different results.

There are two potential ways to address this:

  1. Modify CategoricalObsField so that it is evaluated to None in AnnDataLoader (similar to CategoricalJointObsField) if registry_key is None. More complicated but the better long-term solution. I need to look into this one more.
  2. Just don't register a new CategoricalObsField if batch_key is None in CondSCVI.setup_anndata. Easiest solution but you have to do a couple of checks in CondSCVI and VAEC, potentially buggy.

@martinkim0 martinkim0 marked this pull request as draft March 22, 2024 22:10
Copy link

codecov bot commented Mar 25, 2024

Codecov Report

Attention: Patch coverage is 97.56098% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 88.68%. Comparing base (6bdf1ec) to head (0d5a75d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2626      +/-   ##
==========================================
- Coverage   89.18%   88.68%   -0.51%     
==========================================
  Files         159      159              
  Lines       13308    13306       -2     
==========================================
- Hits        11869    11800      -69     
- Misses       1439     1506      +67     
Files Coverage Δ
scvi/model/_condscvi.py 96.73% <100.00%> (ø)
scvi/module/_vaec.py 84.70% <97.14%> (-1.51%) ⬇️

... and 1 file with indirect coverage changes

@martinkim0 martinkim0 added this to the scvi-tools 1.2.0 milestone Mar 25, 2024
martinkim0 added a commit that referenced this pull request Mar 25, 2024
adds a basic `CondSCVI` save file for backwards compatibility testing in
#2626. generated as follows:
```
adata = synthetic_iid()
CondSCVI.setup_anndata(adata, "labels")
model = CondSCVI(adata)
model.train(max_epochs=1)
model.save(".../tests/test_data/condscvi_pre_batch")
```
@martinkim0 martinkim0 marked this pull request as ready for review March 25, 2024 19:34
@martinkim0 martinkim0 added the on-merge: backport to 1.2.x on-merge: backport to 1.2.x label Mar 25, 2024
@martinkim0
Copy link
Contributor Author

@canergen Here's an initial exploration with using the batch key. TLDR Seems fine but can lead to overintegration when training for too long.
condscvi_batch.pdf

@martinkim0
Copy link
Contributor Author

Losses are very similar across the runs:
Screenshot 2024-03-26 at 12 35 32

@martinkim0 martinkim0 merged commit 6dedd60 into main Mar 26, 2024
13 checks passed
@martinkim0 martinkim0 deleted the condscvi-batch branch March 26, 2024 19:39
meeseeksmachine pushed a commit to meeseeksmachine/scvi-tools that referenced this pull request Mar 26, 2024
@canergen
Copy link
Contributor

We have quite some different datasets to see better integration. You need to run one UMAP per coarse celltype above. I don't think those are necessary but I requested only support to get the key into DestVI :).

martinkim0 added a commit that referenced this pull request Mar 26, 2024
…condscvi) (#2633)

Backport PR #2626: feat(model): support batch key in condscvi

Co-authored-by: Martin Kim <46072231+martinkim0@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
on-merge: backport to 1.2.x on-merge: backport to 1.2.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for batch_key in CondSCVI
2 participants