Question regarding Cell Cycle Scoring and sctransform order #1679

Closed
j-andrews7 opened this issue Jun 12, 2019 · 14 comments

@j-andrews7

Hi,

In the Cell Cycle regression vignette, you run NormalizeData before assigning the cell cycle scores. However, since SCTransform combines the normalization and scaling steps, I was wondering whether it's valid to run cell cycle scoring on the raw data before running SCTransform, so that the scores can then be used for regression. Or would the proper approach be to run SCTransform, add the cell cycle scores, and then run it again to regress out the differences?

Thanks.

@yuhanH
Collaborator

yuhanH commented Jun 12, 2019

SCTransform has a parameter called vars.to.regress.
You can pass it the metadata columns containing the cell cycle scores.

@yuhanH yuhanH closed this as completed Jun 12, 2019
@j-andrews7
Author

Yes, I understand that, but that's not my question. Is running the CellCycleScoring function on unnormalized data prior to SCTransform valid or would it lead to potential problems? Should I run NormalizeData, run CellCycleScoring, and then run SCTransform to scale and regress out the Cell Cycle scores?

@yuhanH yuhanH reopened this Jun 12, 2019
@satijalab
Collaborator

Thanks for the question, and this is indeed a bit complex. I would recommend exactly the procedure you outlined above. Make sure you run NormalizeData prior to CellCycleScoring. This learns cell cycle scores that can be added to the vars.to.regress parameter in SCTransform. For all downstream analyses, you can use the SCT assay.
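The order recommended above (NormalizeData, then CellCycleScoring, then SCTransform with the scores in vars.to.regress) can be sketched in R as follows. This is a minimal sketch under stated assumptions, not an official recipe: the count matrix is hypothetical simulated data (so it carries no real cell cycle signal), and the `cc.genes` lists that ship with Seurat are padded with hypothetical filler genes, because `CellCycleScoring()` samples control genes from expression bins and may fail when the bins hold too few genes. Messages and defaults can differ between Seurat versions.

```r
library(Seurat)
library(Matrix)

set.seed(42)
# Hypothetical data: Seurat's built-in cell cycle genes plus filler genes,
# negative binomial counts for 150 cells.
genes <- unique(c(cc.genes$s.genes, cc.genes$g2m.genes, paste0("GENE", seq_len(3000))))
counts <- Matrix(
  matrix(rnbinom(length(genes) * 150, size = 2, mu = 3), nrow = length(genes),
         dimnames = list(genes, paste0("cell", seq_len(150)))),
  sparse = TRUE
)
seurat <- CreateSeuratObject(counts = counts)

# 1. Log-normalise the RNA assay so the scores are computed on normalised data.
seurat <- NormalizeData(seurat, verbose = FALSE)

# 2. Score cell cycle phases; adds S.Score, G2M.Score and Phase to the metadata.
seurat <- CellCycleScoring(seurat,
                           s.features = cc.genes$s.genes,
                           g2m.features = cc.genes$g2m.genes)

# 3. Run SCTransform on the raw RNA counts, regressing out the two scores.
seurat <- SCTransform(seurat, assay = "RNA", new.assay.name = "SCT",
                      vars.to.regress = c("S.Score", "G2M.Score"),
                      verbose = FALSE)

# SCTransform makes SCT the default assay, so downstream steps
# (RunPCA, FindNeighbors, ...) use it unless told otherwise.
DefaultAssay(seurat)
```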

@atermanini

Hi, why not run SCTransform, then evaluate the cell cycle, then apply SCTransform again using vars.to.regress? This would avoid the older NormalizeData and use the more robust sctransform instead. Am I wrong?

@mistrm82

mistrm82 commented Aug 6, 2019

I had the exact same thought as @atermanini , I'm assuming it should be okay?

@CodeInTheSkies

Same question as raised above by @atermanini! It would be great if @satijalab could comment :)

Thanks a tonne!!

@satijalab
Collaborator

I think both approaches would be valid. We have not tested the latter (running SCTransform twice), but it should in principle be slightly more robust if there are substantial differences in sequencing depth across cells.

@anoronh4

If we are running SCTransform twice, should we set assay = "RNA", new.assay.name = "SCT" the first time and then assay = "SCT", new.assay.name = "SCT" the second time? I'm not sure whether we can overwrite the first assay.

@romanhaa
Copy link

If we are running SCtransform twice, should we set assay = "RNA", new.assay.name = "SCT" the first time and then assay = "SCT", new.assay.name = "SCT" the second time? Not sure if we can overwrite the first assay.

I think you should run SCTransform with the same parameters both times.

seurat <- SCTransform(seurat, assay = 'RNA', new.assay.name = 'SCT', ...)
seurat <- CellCycleScoring(seurat, ...)
seurat <- SCTransform(seurat, assay = 'RNA', new.assay.name = 'SCT', vars.to.regress = ...)

Otherwise you would normalise the already normalised data, no? You just need the first run of SCTransform() to learn the cell cycle status. Then, you normalise the same raw data again but while also regressing for cell cycle effects. At least this is how it makes sense to me.

@anoronh4

Hi romanhaa,

I see your point, but I am not sure whether the "RNA" assay holds the original data or not; I guess that is the main gap in my knowledge. If RNA remains the unnormalized data and I base the cell cycle regression on the SCT assay, then I'm not really combining the power of two SCTransform calculations. I'm just replacing my first dataset, regressed on one set of variables, with a second dataset regressed on another set of variables. Let me go through the two scenarios I'm thinking about:

sample <- SCTransform(sample, vars.to.regress = c("percent.mt", "nFeature_RNA", "nCount_RNA"), assay = 'RNA', new.assay.name = 'SCT') # input: unnormalized, untransformed RNA counts
sample <- CellCycleScoring(sample, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) # presumably running on SCT assay
sample <- SCTransform(sample, vars.to.regress = c("S.Score", "G2M.Score"), assay = 'RNA', new.assay.name = 'SCT') # runs on the same original assay

And here's the next one

sample <- SCTransform(sample, vars.to.regress = c("percent.mt", "nFeature_RNA", "nCount_RNA"), assay = 'RNA', new.assay.name = 'SCT') # input: unnormalized, untransformed RNA counts
sample <- CellCycleScoring(sample, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) # presumably running on SCT assay
sample <- SCTransform(sample, vars.to.regress = c("S.Score", "G2M.Score"), assay = 'SCT', new.assay.name = 'SCT') # runs on the assay already normalized/transformed

Looking forward to hearing from the community!

@romanhaa

Each assay has two slots for the expression data, plus a third that might contain fewer genes than the other two:

  • seurat@assays$RNA@counts
  • seurat@assays$RNA@data
  • seurat@assays$RNA@scale.data

When you initialise your Seurat object, both counts and data contain your raw transcript counts (assuming that's your raw data). While the matrix stored in counts generally remains the raw data, the matrix in the data slot is normalised when you run NormalizeData().
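The slot behaviour described above can be checked directly on a toy object. This is a hedged sketch that assumes the v3-style `Assay` class discussed in this thread: it sets the `Seurat.object.assay.version` option to `"v3"` so that Seurat v5 creates an assay with `counts`/`data` slots (v5's default assays store these as layers instead; Seurat v4 simply ignores the option). The 50 x 20 count matrix is hypothetical.

```r
library(Seurat)
library(Matrix)

# Assumption: request v3-style assays (the slot layout described above);
# Seurat v4 ignores this option.
options(Seurat.object.assay.version = "v3")

set.seed(1)
# Hypothetical raw counts: 50 genes x 20 cells.
counts <- Matrix(
  matrix(rpois(50 * 20, lambda = 2), nrow = 50,
         dimnames = list(paste0("GENE", 1:50), paste0("cell", 1:20))),
  sparse = TRUE
)
seurat <- CreateSeuratObject(counts = counts)

# Right after creation, counts and data hold the same raw values.
same_before <- identical(as.matrix(seurat[["RNA"]]@counts),
                         as.matrix(seurat[["RNA"]]@data))

seurat <- NormalizeData(seurat, verbose = FALSE)

# NormalizeData rewrites data (log-normalised) but leaves counts untouched.
counts_kept  <- all(as.matrix(seurat[["RNA"]]@counts) == as.matrix(counts))
data_changed <- !identical(as.matrix(seurat[["RNA"]]@counts),
                           as.matrix(seurat[["RNA"]]@data))
```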

Now, I don't have a lot of experience with the SCTransform() method, but in the description of the function you find the following part:

Results are saved in a new assay (named SCT by default) with counts being (corrected) counts, data being log1p(counts), scale.data being pearson residuals; sctransform::vst intermediate results are saved in misc slot of new assay.
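The documented layout of the SCT assay can be verified on a small example. A hedged sketch on hypothetical simulated counts: it checks that, within the SCT assay, `data` equals `log1p(counts)`, and that `scale.data` (the Pearson residuals) has no more genes than `counts`, since by default residuals are kept only for variable genes. Reading the slots with `@` assumes the SCT assay is the v3-style assay object that `SCTransform()` creates.

```r
library(Seurat)
library(Matrix)

set.seed(7)
# Hypothetical raw counts: 1000 genes x 150 cells.
counts <- Matrix(
  matrix(rnbinom(1000 * 150, size = 2, mu = 3), nrow = 1000,
         dimnames = list(paste0("GENE", 1:1000), paste0("cell", 1:150))),
  sparse = TRUE
)
seurat <- CreateSeuratObject(counts = counts)
seurat <- SCTransform(seurat, assay = "RNA", new.assay.name = "SCT", verbose = FALSE)

sct <- seurat[["SCT"]]
# data should be log1p() of the corrected counts stored in the same assay.
data_is_log1p <- isTRUE(all.equal(as.matrix(sct@data), log1p(as.matrix(sct@counts)),
                                  check.attributes = FALSE))
# scale.data holds the Pearson residuals; by default only variable genes are
# kept, so it can have fewer rows than counts/data.
resid_rows <- nrow(sct@scale.data)
count_rows <- nrow(sct@counts)
```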

Based on this, my proposal is the following:

# normalize data with SCTransform()
sample <- SCTransform(
  sample,
  assay = 'RNA',
  new.assay.name = 'SCT',
  vars.to.regress = c('percent.mt', 'nFeature_RNA', 'nCount_RNA')
)

# perform cell cycle analysis (make sure to specify the "assay" parameter)
sample <- CellCycleScoring(
  sample,
  s.features = s.genes,
  g2m.features = g2m.genes,
  assay = 'SCT',
  set.ident = TRUE
)

# normalise again but this time including also the cell cycle scores
sample <- SCTransform(
  sample,
  assay = 'RNA',
  new.assay.name = 'SCT',
  vars.to.regress = c('percent.mt', 'nFeature_RNA', 'nCount_RNA', 'S.Score', 'G2M.Score')
)

This should then overwrite the SCT assay with your data normalised for all the factors you provided, while ensuring that the cell cycle analysis was performed on data normalised with SCTransform().

Hopefully a Seurat developer will call me out if I'm wrong :)

@CodeInTheSkies

Hello @romanhaa , @anoronh4 , @j-andrews7 , @yuhanH ,

Just wanted to check in: which of the methods discussed above did you all try, how did your results turn out, and which method (NormalizeData followed by SCTransform, or running SCTransform twice) would you recommend at this point?

Many thanks for any responses; I look forward to hearing your views!

@j-andrews7
Author

I tried both and doing nothing at all with cell cycle, saw little difference, and stopped caring about cell cycle phases in my data. 🤷‍♂

@anoronh4

anoronh4 commented Jan 13, 2020

Similar to what @j-andrews7 said, not much has come of it. I have been trying both of these approaches and have even tried regressing out the difference between the G2M and S scores. I have yet to come across a dataset where it made a difference (the phases don't even look less separated to me in the PCA plot!), and I process datasets from lots of different tissue types. I keep it in my code simply because it is recommended, but that's about it.
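Regressing out the difference between the S and G2M scores corresponds to the alternate workflow in Seurat's cell cycle vignette, which keeps the signal separating cycling from non-cycling cells while removing phase differences among cycling cells. The vignette does this with `ScaleData()`; the sketch below swaps in `SCTransform()`, as discussed in this thread. It uses hypothetical simulated counts and the vignette's `CC.Difference` column name, so it only illustrates the calls, not the biological effect.

```r
library(Seurat)
library(Matrix)

set.seed(3)
# Hypothetical data: Seurat's cell cycle genes plus filler genes, 150 cells.
genes <- unique(c(cc.genes$s.genes, cc.genes$g2m.genes, paste0("GENE", seq_len(3000))))
counts <- Matrix(
  matrix(rnbinom(length(genes) * 150, size = 2, mu = 3), nrow = length(genes),
         dimnames = list(genes, paste0("cell", seq_len(150)))),
  sparse = TRUE
)
seurat <- CreateSeuratObject(counts = counts)
seurat <- NormalizeData(seurat, verbose = FALSE)
seurat <- CellCycleScoring(seurat,
                           s.features = cc.genes$s.genes,
                           g2m.features = cc.genes$g2m.genes)

# Regress only the S - G2M difference, so the cycling vs non-cycling
# signal is kept while phase differences among cycling cells are removed.
seurat$CC.Difference <- seurat$S.Score - seurat$G2M.Score
seurat <- SCTransform(seurat, assay = "RNA", new.assay.name = "SCT",
                      vars.to.regress = "CC.Difference", verbose = FALSE)
```

Whether this visibly changes the PCA will depend on the dataset; as reported above, it may make little difference in practice.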


7 participants