Integration of patient-matched tumor and normal tissue samples #230
Comments
Hi, are you using the latest version of the software? Please send us the output of sessionInfo().
…On Tue, Dec 12, 2023, 04:45 samgest ***@***.***> wrote:
Hi,
I am performing an analysis with several published datasets of kidney
tumor (that is, raw UMI counts coming from public repositories) and I
wanted to integrate them, but I'm having some issues with overcorrection.
I have 68 tumor samples coming from 68 different patients and, of some of
them, I also have a sample coming from the surrounding healthy tissue
(normal). In total, 68 tumor + 19 normal = 87 total scRNA-seq expression
matrices (coming from 9 different datasets). I want to integrate all of
this data together and remove the batch effect (sample-wise) and the
dataset bias (i.e., the bias that arises from using different datasets),
but not the tumor-normal differences.
I tried to integrate with RunHarmony as:
    dataMerged.ref <- dataMerged.ref %>%
      RunHarmony(group.by.vars = c("dataset_id", "sample_id"), plot_convergence = TRUE)
But the results are quite overcorrected (Fig. 1). Despite the fact that
tumor and normal tissues should share some cell types (such as macrophages,
lymphocytes, etc.), the gross bulk of cells should be different.
Rplot05.png (view on web)
<https://github.com/immunogenomics/harmony/assets/150608196/58cf9a4b-d95a-4a4d-aee4-a6f6b1bd7ed3>
Is there any way I could integrate tumor and normal data to remove the
batch-effect without removing the intrinsic differences between sample
types? Perhaps I should integrate them separately (tumor vs normal) and
then integrate the Harmony embeddings with Seurat v5's IntegrateEmbeddings
function?
I'm quite new to scRNA-seq analysis, so any comment or suggestion is more
than appreciated.
Thanks in advance.
There you go:
Yes, that is correct. If you do sample-level correction, Harmony basically corrects the latent dimensions for everything that is nested in that experimental design, and since each sample is either tumor or normal, that includes the tumor-normal difference. Performing the correction separately, as you suggest, would be the way to go. You can, however, do cell abundance investigation within the tumor and normal kidney, which is fine to do this way. A minor comment on your workflow: if you decide to use only latent variables 1:10, run Harmony on just those. I'm not sure how much it will change things, but it may be raising issues with the curse of dimensionality.
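A rough sketch of what the two suggestions above could look like together (correct each sample type separately, and run Harmony on the same 10 dimensions that are used downstream). This is only an illustration under assumptions, not a tested recipe for this dataset: `integrate_subset` is a hypothetical helper, `sample_type` is a hypothetical metadata column holding "tumor" / "normal", and it assumes a harmony version whose `RunHarmony` accepts `dims.use` for Seurat objects.

```r
# Hypothetical helper (not part of Seurat or harmony): redo the standard
# preprocessing on one subset, then run Harmony on the first `n_dims` PCs only.
integrate_subset <- function(obj,
                             batch_vars = c("dataset_id", "sample_id"),
                             n_dims = 10) {
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj)
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj)

  # Restrict Harmony to the PCs that will actually be used afterwards.
  obj <- harmony::RunHarmony(obj,
                             group.by.vars = batch_vars,
                             dims.use = seq_len(n_dims))

  # The corrected embedding is stored as the "harmony" reduction by default.
  obj <- Seurat::RunUMAP(obj, reduction = "harmony", dims = seq_len(n_dims))
  obj
}

# Usage, assuming a `sample_type` column exists in the metadata:
# tumor  <- integrate_subset(subset(dataMerged.ref, subset = sample_type == "tumor"))
# normal <- integrate_subset(subset(dataMerged.ref, subset = sample_type == "normal"))
```

Note that the two resulting embeddings live in separate spaces, so they are not directly comparable without a further step; whether IntegrateEmbeddings is the right tool for that is not settled in this thread.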
Hi,
I am performing an analysis with several published datasets of kidney tumor (that is, raw UMI counts coming from public repositories) and I wanted to integrate them, but I'm having some issues with overcorrection.
I have 68 tumor samples coming from 68 different patients and, from some of them, I also have a sample coming from the surrounding healthy tissue (normal). In total, 68 tumor + 19 normal = 87 total scRNA-seq expression matrices (coming from 9 different datasets). I want to integrate all of this data together and remove the batch effect (sample-wise) and the dataset bias (i.e., the bias that arises from using different datasets), but not the tumor-normal differences.
I tried to integrate with RunHarmony as:
NOTE: I took the first 10 dimensions based on their standard deviation and where it "plateaus" (Fig. 1).
Fig. 1

But the results are quite overcorrected. Despite the fact that tumor and normal tissues should share some cell types (such as lymphocytes, endothelial cells, etc.), there should be at least one big cluster of cells in the tumor samples that is not present in the normal ones (the malignant / tumoral cells themselves). I see very little difference in the UMAP graph (Fig. 2):
Fig. 2

Is there any way I could integrate tumor and normal data to remove the batch-effect without removing the intrinsic differences between sample types? Perhaps I should integrate them separately (tumor vs normal) and then integrate the Harmony embeddings with Seurat v5's IntegrateEmbeddings function?
I'm quite new to scRNA-seq analysis, so any comment or suggestion is more than appreciated.
Thanks in advance.