batch and max_kl_weight parameters ignored when mapping to scvi reference #2331
Hi, thank you for your question. This is because, based on our current defaults, the scVI encoder does not receive batch assignments. In addition, scArches by default freezes pretrained model parameters (i.e. parameters not related to covariates), which is why training the reference model on the query data with different max KL weights does not change the latent representation (see the last section in our user guide). Both should, however, have an effect on the decoder output (normalized expression).
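As a rough way to see this in practice, here is a minimal sketch with hypothetical variable names: query_model and query_model_alt stand for the same reference mapped onto the same query cells with two different batch assignments.

```python
import numpy as np

# The latent representation comes from the encoder, which with the default
# settings never sees the batch covariate and stays frozen during query
# training, so it should be unaffected by the query batch assignment.
print(np.allclose(
    query_model.get_latent_representation(),
    query_model_alt.get_latent_representation(),
))  # expected: True with the defaults

# The normalized expression comes from the decoder, which does receive the
# batch covariate, so differences should show up here.
print(np.allclose(
    query_model.get_normalized_expression().values,
    query_model_alt.get_normalized_expression().values,
))  # expected: False
```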
Thanks a lot for the quick response! I'm not sure if I understand, though: I get that the model parameters for calculating the reference embedding aren't changed, but this should still allow changing parameters specific to the query, right? That's also how I read the user guide you linked to.
As for the batch covariate not affecting the latent representation of the query: how could we still expect the mapping to perform any batch correction in that case? Is there any way in which I can still change these settings at the stage of mapping, or should it have been done already when the reference model was trained?
Right - since the only query-specific parameters added during transfer learning (by default) are the parameters accommodating the new batch covariate categories, these will be updated. Admittedly, this seems a little limited since we end up only updating the decoder, so I'm not exactly sure why we can expect the model to batch correct query data well. But this is the default we have, and we are currently not changing it for backwards-compatibility reasons. Pre reference model training, you can change this behavior by specifying encode_covariates=True when constructing the model. Post reference model training, we have several options in load_query_data for controlling which parameters are frozen or unfrozen.
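In rough code, the two routes look something like this (ref_adata/query_adata and the batch column are illustrative, and the freeze arguments shown are only a selection of what load_query_data accepts in recent scvi-tools versions):

```python
import scvi

# Option 1 - before reference training: let the encoder see the batch
# covariate as well (encode_covariates is False by default).
scvi.model.SCVI.setup_anndata(ref_adata, batch_key="dataset")
ref_model = scvi.model.SCVI(ref_adata, encode_covariates=True)
ref_model.train()

# Option 2 - after reference training: control which parameters are
# (un)frozen when the query is mapped.
query_model = scvi.model.SCVI.load_query_data(
    query_adata,
    ref_model,
    unfrozen=False,       # True would fine-tune all parameters on the query
    freeze_dropout=True,  # one example freeze option; several others exist
)
query_model.train(max_epochs=200)
```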
Okay, interesting! I'm not so much interested in re-training the reference model, but more in being able to set the way the query model is trained (i.e. including batch already in the encoder, and changing the max_kl_weight setting). If I understand the documentation correctly, that is not really possible with the freeze/unfreeze parameters in load_query_data.
Good point - probably a good idea to point this out in the tutorials. I wasn't involved in writing this part of scvi-tools, so I'll also double check that this is actually what is happening, but from what I recall, this is the case with our defaults. Yeah, encoding batch information when the reference model has already been trained without it is not possible yet. For max KL weight, I do believe the changes should be reflected during query training, but let me double check.
Jumping into the thread here: in all scArches tutorials it's highlighted that covariates are encoded. In our hands it doesn't make a difference for most datasets. Out-of-distribution use should be avoided (like taking a single-cell model and mapping a single-nucleus query). However, generally when reusing a model it always makes sense to check that the integration worked and that posterior predictive checks look good (in short, that the reconstruction loss in query and reference is similar). This will be stressed in our publication encompassing scVI-criticism.
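A minimal sketch of that sanity check, assuming the reference and query models and their AnnData objects are already loaded (names are illustrative):

```python
# Compare reconstruction quality on the reference and the mapped query;
# if the query is reconstructed much worse, it is likely out-of-distribution
# for the reference model and the mapping should not be trusted.
print("reference:", ref_model.get_reconstruction_error(ref_adata))
print("query:", query_model.get_reconstruction_error(query_adata))
```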
Thanks for your responses! Right, in the scArches tutorials the covariates are set that way (although not explained, as far as I remember, so it might not be clear to users that those settings are important not to change), but they're not set that way, nor discussed, in the scVI/scANVI tutorials I think. Most people won't go through an scArches tutorial while they're still integrating their data; they will only realize that their model isn't really scArches-friendly once they have already done a lot of quality checks and downstream analysis on the integration and don't want to re-do it anymore. I have experienced this with two large atlas integrations that I worked with in the past weeks. Also, for me it makes quite a large difference to set these parameters differently in the models where this is possible, especially the batch covariate, but also e.g. the KL divergence, so I wouldn't say it does not make much of a difference whether covariates are encoded.
We added a comment to the tutorial to highlight this. |
Hi! I have been trying to map data to scvi-integrated embeddings using scArches, and have noticed that for the two scVI-based reference models I have used, setting the max_kl_weight parameter differently has no effect on the output. The same holds for changing the batch assignment of the query cells (e.g. using sample rather than dataset as batch covariate). I do not see the same problem with an scANVI model I have used.
Here's a reproducible example of the first issue:
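In outline, the check maps the same query twice with different max_kl_weight values and compares the results. This is a condensed sketch rather than the original snippet; the model path, the epoch count, and passing max_kl_weight through plan_kwargs are assumptions.

```python
import anndata as ad
import numpy as np
import scvi

REF_PATH = "pb_tissue_normal_ref_scvi"  # unzipped reference model directory
query_adata = ad.read_h5ad("PeerMassague2020_subset_for_testing.h5ad")

results = []
for kl_weight in (0.5, 2.0):
    adata = query_adata.copy()
    # align the query genes with the reference model's genes
    scvi.model.SCVI.prepare_query_anndata(adata, REF_PATH)
    model = scvi.model.SCVI.load_query_data(adata, REF_PATH)
    model.train(max_epochs=100, plan_kwargs={"max_kl_weight": kl_weight})
    results.append(
        (model.get_latent_representation(), model.get_normalized_expression().values)
    )

# If max_kl_weight were taken into account, at least one of these should differ.
print(np.allclose(results[0][0], results[1][0]))  # latent representation
print(np.allclose(results[0][1], results[1][1]))  # normalized expression
```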
Running the actual example outputs, most importantly, "True" twice at the bottom.
and of the second issue:
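Sketched the same way for the batch covariate (again a condensed illustration, not the original snippet; the obs column name "batch" expected by the reference registry is a placeholder, and "dataset" and "sample" are the two candidate covariates):

```python
import anndata as ad
import numpy as np
import scvi

REF_PATH = "pb_tissue_normal_ref_scvi"  # unzipped reference model directory
query_adata = ad.read_h5ad("PeerMassague2020_subset_for_testing.h5ad")

latents = []
for source_col in ("dataset", "sample"):
    adata = query_adata.copy()
    # write the chosen covariate into the obs column the reference model
    # expects as its batch key ("batch" is a placeholder for that name)
    adata.obs["batch"] = adata.obs[source_col].values
    scvi.model.SCVI.prepare_query_anndata(adata, REF_PATH)
    model = scvi.model.SCVI.load_query_data(adata, REF_PATH)
    model.train(max_epochs=100)
    latents.append(model.get_latent_representation())

# With the default batch-unaware encoder, the latent space comes out the same
# no matter which covariate is used as the batch.
print(np.allclose(latents[0], latents[1]))
```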
which outputs, again most importantly, "True" at the bottom.
Versions:
scArches: 0.5.8
scvi-tools: 0.20.1
The reference model I used is public, I downloaded it with this link:
https://zenodo.org/records/10139343/files/pb_tissue_normal_ref_scvi.zip?download=1
This is the query dataset I use in the example:
PeerMassague2020_subset_for_testing.h5ad.zip
Any idea why max_kl_weight and the batch covariate have zero effect on the output? This should not be the case, as far as I understand.