batch-effect correction in DSB-normalized data? #12
Hi @massonix, thanks, that is a good question. How much batch variation there is depends on how much experiment-specific and expected biological variability exists between the batches. In the dataset used in the preprint, normalizing all background drops and cells together produced dsb normalized values that were highly concordant with those obtained by normalizing each batch separately, and this held true with multiple definitions of background drops. Those two batches were run on subsequent days with the exact same protocol and pool of antibodies.

I'd recommend trying both single- and multi-batch normalization and seeing which minimizes the batch effect (a snippet for doing this is included below). One piece of advice is to QC the background droplets and remove any with high RNA content, to reduce the impact of potential low-quality cells that could drive the background signal.

As for using a batch correction tool like Seurat or mnnCorrect: those are typically used on data where the batches are completely non-overlapping due to drastic batch effects, e.g. those arising between different species or single-cell technologies (droplet / plate). In Fig S1A, the overlap between cells is already about what you would expect after using one of those tools. They are great for the drastic cases described above on RNA data, but I'm not certain how they would perform on protein data, since they use a low-dimensional representation; depending on the size of your antibody panel, further compression could add significant noise. I have not tried this, though; I'm basing that on the fact that clustering using PCA representations of our protein data with that 80+ antibody panel performed worse than using a euclidean distance matrix calculated on dsb normalized cells x proteins. It is not described in the preprint, but part of the non-overlapping cells are due to biological variation between the different n = 10 donors in each batch, for example recovery of expected donor-specific T cell populations.

If there is a large batch effect, before trying a batch correction tool that relies on a low-dimensional representation of the data, you might first want to try using the dsb normalized values in a simpler linear model batch removal method (see below), though this may not be necessary. Here is a snippet you can use to test both normalization schemes. Feel free to let us know what you find on your data!
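The snippet itself did not survive in this copy of the thread; below is a minimal sketch of the comparison being described, assuming `cells` and `background` are raw ADT count matrices (proteins x cells), `cell_batch` / `background_batch` are batch labels per droplet, and `isotype_controls` is a vector of isotype control names. Those object names are placeholders, not from the original post.

```r
library(dsb)

# scheme 1: normalize all cells and background drops in a single run
dsb_merged = DSBNormalizeProtein(
  cell_protein_matrix = cells,
  empty_drop_matrix   = background,
  denoise.counts      = TRUE,
  use.isotype.control = TRUE,
  isotype.control.name.vec = isotype_controls
)

# scheme 2: normalize each staining batch separately, then recombine cells (columns)
dsb_per_batch = lapply(unique(cell_batch), function(b) {
  DSBNormalizeProtein(
    cell_protein_matrix = cells[, cell_batch == b],
    empty_drop_matrix   = background[, background_batch == b],
    denoise.counts      = TRUE,
    use.isotype.control = TRUE,
    isotype.control.name.vec = isotype_controls
  )
})
dsb_per_batch = do.call(cbind, dsb_per_batch)

# compare the two results, e.g. per-protein density plots split by batch,
# or protein-wise correlations between dsb_merged and dsb_per_batch
```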
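For the linear model batch removal mentioned above, the original comment did not name a specific function; one common choice is limma::removeBatchEffect applied to the dsb normalized matrix. A hedged sketch, assuming `dsb_norm` is a proteins x cells matrix of dsb normalized values and `cell_batch` a batch label per cell (both placeholder names):

```r
library(limma)

# regress out the batch term from the dsb normalized values
dsb_batch_corrected = removeBatchEffect(dsb_norm, batch = cell_batch)

# the corrected matrix can then be used downstream, e.g. a euclidean distance
# matrix on cells x proteins for clustering, as discussed above
```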
Thanks @MattPM! That's very good advice. I will apply it and let you know if I have any further questions.
Closing for now. Referenced this with a figure in the updated readme FAQ section. See the significantly updated documentation in the readme. The package now redirects to the NIAID GitHub: https://github.com/niaid/dsb/
@MattPM Error in (function (..., deparse.level = 1) :
Dear DSB team,
Thanks so much for developing this amazing resource.
I have a question regarding batch effect correction. As specified in the preprint: "we implemented this transformation on each staining batch separately to accommodate potential batch specific ambient noise– this helped mitigate batch-to-batch variation". Thus, would you recommend applying dsb on each batch separately? In your supplementary figure 1A there is still some batch effect. Would you run any single-cell integration tool?
Thanks in advance for your help!