SCTransform() and applying FindMarkers #5029

Closed
YeonuiKwak opened this issue Sep 1, 2021 · 2 comments

@YeonuiKwak

Hello,
Although I know many existing issues are related to my question, none of them seems to resolve it clearly.
In my Seurat object:

  • Total number of cells: 100k
  • Total human samples: 60 (condition: "Normal" or "Disease")
    - Normal samples: n = 30; Disease samples: n = 30
There are many confounding factors other than condition that affect gene expression in our samples, such as sex, region, and weight.

I defined ~20 clusters after integrating the data using CCA.
Now I want to identify DEGs between "Normal" and "Disease" within a specific cluster (say, cluster 1).

What confuses me about running FindMarkers is this:
it is recommended to use the raw counts (by setting the default assay to "RNA"), and I believe FindMarkers will then use the "data" slot, which is normalized by nUMI.
But in my data, various confounding factors can make it hard to detect DEGs between "Normal" and "Disease" cells in cluster 1. I strongly feel that I need to regress out the effects of those confounding factors, such as sex and batch, when identifying DEGs.

So I am thinking of doing the following. Please correct me if I am wrong and suggest what I should do.

  1. Subset the Seurat object to keep only the cluster 1 cells.
  2. On the subset (sub.seurat), run SCTransform with return.only.var.genes = FALSE so that as many genes as possible are retained in scale.data:
    SCTransform(sub.seurat, vars.to.regress = c("mitoRatio", "Phase", "batch"), return.only.var.genes = FALSE)
  3. Run FindMarkers on the "scale.data" slot of the "SCT" assay:

FindMarkers(sub.seurat,
            ident.1 = "case_D1",
            ident.2 = "control_D1",
            group.by = "condition",
            assay = "SCT",
            slot = "scale.data",
            only.pos = FALSE,
            logfc.threshold = 0)

Hope to hear from you soon!
Thank you!

@saketkc
Collaborator

saketkc commented Sep 3, 2021

One option would be to:

  • Run SCTransform(object_x, vars.to.regress = c("sex", "region", "weight")) on each individual object_x (i.e. the data split by batch)
  • Integrate
  • Subset cluster 1
  • Run FindMarkers on cluster 1 using the "data" slot of the SCT assay, with ident.1 = "Disease" and ident.2 = "Normal" (do not run SCTransform again)

There is an important caveat, however: if library sizes differ greatly across batches, this approach might produce a lot of false positives (see explanation here).
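A minimal R sketch of the four steps above, assuming the Seurat v4-style SCT integration API. The function `sct_integrated_de` and its arguments (`batch_col`, `confounders`, `cluster_col`, `cluster_id`, `condition_col`) are hypothetical placeholders, and the metadata columns they name must already exist in your object. It reuses an existing cluster-label column; if you re-cluster after integrating, point `cluster_col` at the new labels. The `PrepSCTFindMarkers()` step is an addition not mentioned in the thread: it only exists in Seurat 4.1 and later, where, according to its documentation, it re-corrects the per-batch SCT counts to a common sequencing depth, which is relevant to the library-size caveat.

```r
library(Seurat)

# Hypothetical helper: per-batch SCTransform -> SCT integration -> subset one
# cluster -> FindMarkers on the SCT assay. All argument names are placeholders.
sct_integrated_de <- function(obj, batch_col, confounders,
                              cluster_col, cluster_id,
                              condition_col = "condition",
                              ident.1 = "Disease", ident.2 = "Normal",
                              n_features = 3000) {
  # Step 1: split by batch and run SCTransform on each piece, passing the
  # confounders (e.g. c("sex", "region", "weight")) to vars.to.regress.
  obj_list <- SplitObject(obj, split.by = batch_col)
  obj_list <- lapply(obj_list, function(x) {
    SCTransform(x, vars.to.regress = confounders, verbose = FALSE)
  })

  # Step 2: integrate with the SCT-aware integration helpers.
  features <- SelectIntegrationFeatures(object.list = obj_list,
                                        nfeatures = n_features)
  obj_list <- PrepSCTIntegration(object.list = obj_list,
                                 anchor.features = features)
  anchors <- FindIntegrationAnchors(object.list = obj_list,
                                    normalization.method = "SCT",
                                    anchor.features = features)
  integrated <- IntegrateData(anchorset = anchors,
                              normalization.method = "SCT")

  # Seurat >= 4.1 only (assumption: not available in older releases):
  # put the per-batch SCT models on a common sequencing depth before DE.
  if (utils::packageVersion("Seurat") >= "4.1.0") {
    integrated <- PrepSCTFindMarkers(integrated, assay = "SCT")
  }

  # Step 3: keep only cells of the cluster of interest. Assumes the labels
  # live in integrated@meta.data[[cluster_col]]; which() drops NA labels.
  labels <- as.character(integrated@meta.data[[cluster_col]])
  keep <- which(labels == as.character(cluster_id))
  sub <- subset(integrated, cells = colnames(integrated)[keep])

  # Step 4: DE between conditions on the SCT assay. FindMarkers uses the
  # "data" slot by default; SCTransform is NOT re-run on the subset.
  FindMarkers(sub,
              ident.1 = ident.1, ident.2 = ident.2,
              group.by = condition_col,
              assay = "SCT")
}
```

For the setup in the question, a call might look like `sct_integrated_de(obj, batch_col = "batch", confounders = c("sex", "region", "weight"), cluster_col = "seurat_clusters", cluster_id = "1")`, where every column name is an assumption about how the metadata is labelled.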

@saketkc saketkc closed this as completed Sep 3, 2021
@YeonuiKwak
Author

Thank you very much!!
