Question regarding Cell Cycle Scoring and sctransform order #1679

j-andrews7 · 2019-06-12T21:43:21Z

Hi,

In the Cell Cycle regression vignette, you run NormalizeData before assigning the Cell Cycle scores. However, since SCTransform combines the normalization and scaling steps, I was wondering if it's valid to perform Cell Cycle scoring on the raw data prior to running SCTransform so that they can properly be used for regression. Or would the proper approach be to run SCTransform, add the Cell Cycle scores, and then run it again to regress out the differences?

Thanks.

yuhanH · 2019-06-12T21:52:53Z

SCTransform has a parameter called vars.to.regress.
You can pass the metadata variables containing the cell cycle scores to it.

j-andrews7 · 2019-06-12T21:57:41Z

Yes, I understand that, but that's not my question. Is running the CellCycleScoring function on unnormalized data prior to SCTransform valid or would it lead to potential problems? Should I run NormalizeData, run CellCycleScoring, and then run SCTransform to scale and regress out the Cell Cycle scores?

satijalab · 2019-06-14T14:52:09Z

Thanks for the question, and this is indeed a bit complex. I would recommend exactly the procedure you list out above. Make sure you run NormalizeData prior to CellCycleScoring. This learns cell cycle scores that can be added to the vars.to.regress parameter in SCTransform. For all downstream analyses, you can use the SCT assay.
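
For illustration, the order recommended above could be sketched as follows. This is a minimal sketch, not code posted in this thread: the wrapper name `score_then_sct` is hypothetical, it assumes a Seurat object whose raw counts live in the `RNA` assay, and it defaults to the `cc.genes` marker lists that ship with Seurat.

```r
library(Seurat)

# Hypothetical helper (the name is not part of Seurat): log-normalize,
# score the cell cycle, then run SCTransform while regressing out the scores.
score_then_sct <- function(obj,
                           s.genes = cc.genes$s.genes,
                           g2m.genes = cc.genes$g2m.genes) {
  # Log-normalize the RNA assay so CellCycleScoring works on normalized data
  obj <- NormalizeData(obj, assay = "RNA")

  # Adds S.Score, G2M.Score and Phase to the cell metadata
  obj <- CellCycleScoring(obj, s.features = s.genes, g2m.features = g2m.genes)

  # SCTransform models the raw counts of the RNA assay and stores its results
  # in the SCT assay, regressing out the two cell cycle scores
  obj <- SCTransform(obj, assay = "RNA", new.assay.name = "SCT",
                     vars.to.regress = c("S.Score", "G2M.Score"))
  obj
}
```

Under these assumptions, `sample <- score_then_sct(sample)` would return the object with the SCT assay ready for downstream analyses.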

atermanini · 2019-07-05T12:36:11Z

Hi, why not run SCTransform, then evaluate the cell cycle, then apply SCTransform again using vars.to.regress? This would avoid using the old NormalizeData in favour of the more robust sctransform. Am I wrong?

mistrm82 · 2019-08-06T17:45:35Z

I had the exact same thought as @atermanini. I'm assuming it should be okay?

CodeInTheSkies · 2019-09-27T15:53:57Z

Same question, as raised above by @atermanini !! Would be great if @satijalab could please comment :)

Thanks a tonne!!

satijalab · 2019-10-11T21:05:26Z

I think both approaches would be valid. We have not tested the latter (running SCTransform twice), but it should in principle be slightly more robust if there are substantial differences in sequencing depth across cells.

anoronh4 · 2019-11-14T19:02:09Z

If we are running SCTransform twice, should we set assay = "RNA", new.assay.name = "SCT" the first time and then assay = "SCT", new.assay.name = "SCT" the second time? Not sure if we can overwrite the first assay.

romanhaa · 2019-11-22T13:01:18Z

> If we are running SCTransform twice, should we set assay = "RNA", new.assay.name = "SCT" the first time and then assay = "SCT", new.assay.name = "SCT" the second time? Not sure if we can overwrite the first assay.

I think you should run SCTransform with the same parameters both times.

seurat <- SCTransform(seurat, assay = 'RNA', new.assay.name = 'SCT', ...)
seurat <- CellCycleScoring(seurat, ...)
seurat <- SCTransform(seurat, assay = 'RNA', new.assay.name = 'SCT', vars.to.regress = ...)

Otherwise you would normalise the already normalised data, no? You just need the first run of SCTransform() to learn the cell cycle status. Then, you normalise the same raw data again but while also regressing for cell cycle effects. At least this is how it makes sense to me.

anoronh4 · 2019-11-22T23:10:58Z

Hi romanhaa,

I see your point, but I am not sure whether the "RNA" assay is the original data or not; I guess that is the main gap in my knowledge. If RNA remains "unnormalized" data and I calculate cell cycle regression based on the SCT assay, then I'm not really combining the power of two SCTransform calculations. I'm just replacing my first dataset, regressed on one set of variables, with a second dataset regressed on another set of variables. Let me go through the two scenarios I'm thinking about:

sample <- SCTransform(sample, vars.to.regress = c("percent.mt", "nFeature_RNA", "nCount_RNA"), assay = 'RNA', new.assay.name = 'SCT') # runs on the unnormalized, untransformed RNA assay
sample <- CellCycleScoring(sample, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) #presumably running on SCT assay
sample <- SCTransform(sample, vars.to.regress = c("S.Score", "G2M.Score"), assay = 'RNA', new.assay.name = 'SCT') #runs on same original assay.

And here's the next one

sample <- SCTransform(sample, vars.to.regress = c("percent.mt", "nFeature_RNA", "nCount_RNA"), assay = 'RNA', new.assay.name = 'SCT') # runs on the unnormalized, untransformed RNA assay
sample <- CellCycleScoring(sample, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) #presumably running on SCT assay
sample <- SCTransform(sample, vars.to.regress = c("S.Score", "G2M.Score"), assay = 'SCT', new.assay.name = 'SCT') #runs on the assay already normalized/transformed.

Looking forward to hearing from the community!

romanhaa · 2019-11-23T09:25:41Z

Each assay has two slots for the expression data (counts and data), plus an additional one (scale.data) that might contain fewer genes than the other two.

seurat@assays$RNA@counts
seurat@assays$RNA@data
seurat@assays$RNA@scale.data

When you initialise your Seurat object, both counts and data contain your raw transcript counts (assuming that's your raw data). While the matrix stored in counts generally remains the raw data, the matrix in the data slot will be normalised when you run NormalizeData().

Now, I don't have a lot of experience with the SCTransform() method, but in the description of the function you find the following part:

Results are saved in a new assay (named SCT by default) with counts being (corrected) counts, data being log1p(counts), scale.data being pearson residuals; sctransform::vst intermediate results are saved in misc slot of new assay.

Based on this, my proposal is the following:

# normalize data with SCTransform()
sample <- SCTransform(
  sample,
  assay = 'RNA',
  new.assay.name = 'SCT',
  vars.to.regress = c('percent.mt', 'nFeature_RNA', 'nCount_RNA')
)

# perform cell cycle analysis (make sure to specify the "assay" parameter)
sample <- CellCycleScoring(
  sample,
  s.features = s.genes,
  g2m.features = g2m.genes,
  assay = 'SCT',
  set.ident = TRUE
)

# normalise again but this time including also the cell cycle scores
sample <- SCTransform(
  sample,
  assay = 'RNA',
  new.assay.name = 'SCT',
  vars.to.regress = c('percent.mt', 'nFeature_RNA', 'nCount_RNA', 'S.Score', 'G2M.Score')
)

This should then overwrite the data in the SCT assay slot with your data after normalisation for all those factors you provided while ensuring that the cell cycle analysis was performed on your data normalised with SCTransform().

Hopefully a Seurat developer will call me out if I'm wrong :)

CodeInTheSkies · 2020-01-10T20:57:19Z

Hello @romanhaa , @anoronh4 , @j-andrews7 , @yuhanH ,

Just wanted to check in as to which of the above-discussed methods you all tried, and at this date, how did your results turn out, and which of the methods (NormalizeData followed by SCT, or running SCT twice) would you recommend?

Many thanks for any responses, and am looking forward to hearing your views!

j-andrews7 · 2020-01-10T21:01:14Z

I tried both and doing nothing at all with cell cycle, saw little difference, and stopped caring about cell cycle phases in my data. 🤷‍♂

anoronh4 · 2020-01-13T15:08:48Z

> Hello @romanhaa , @anoronh4 , @j-andrews7 , @yuhanH ,
>
> Just wanted to check in as to which of the above-discussed methods you all tried, and at this date, how did your results turn out, and which of the methods (NormalizeData followed by SCT, or running SCT twice) would you recommend?
>
> Many thanks for any responses, and am looking forward to hearing your views!

Similar to what @j-andrews7 said, not much has come from it. I have been trying both of these approaches and even tried regressing out the difference between G2M and S scores. I have yet to come across a dataset where it made a difference -- the phases don't even look less separated to me in the PCA plot! -- and I process datasets from lots of different tissue types. I keep it in my code simply because it is recommended, but that's about it.
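
For readers unfamiliar with the "difference between G2M and S scores" option mentioned above: the Seurat cell cycle vignette describes it as an alternative workflow with ScaleData, where a single covariate is regressed out so that the cycling versus non-cycling signal is kept. A rough sketch of how that might be adapted to SCTransform follows. It is an assumption, not the exact code used in this thread; the function name `regress_cc_difference` is hypothetical, and it expects CellCycleScoring to have been run already.

```r
library(Seurat)

# Hypothetical helper (not part of Seurat): regress out only the difference
# between the S and G2M scores instead of both scores separately.
regress_cc_difference <- function(obj) {
  # CellCycleScoring must have added these metadata columns beforehand
  stopifnot(all(c("S.Score", "G2M.Score") %in% colnames(obj[[]])))

  # Single covariate capturing the phase difference among cycling cells
  obj$CC.Difference <- obj$S.Score - obj$G2M.Score

  # Re-run SCTransform on the raw RNA counts with that covariate
  SCTransform(obj, assay = "RNA", new.assay.name = "SCT",
              vars.to.regress = "CC.Difference")
}
```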

yuhanH closed this as completed Jun 12, 2019

yuhanH reopened this Jun 12, 2019

satijalab closed this as completed Jun 14, 2019

CodeInTheSkies mentioned this issue Sep 27, 2019

How to use cell-cycle-scoring with SCTransform? #2146

Closed

michaelp896 mentioned this issue Feb 17, 2020

CellCycleScore function : Error: Insufficient data values to produce 24 bins. #2621

Closed

ktrns mentioned this issue Jun 2, 2020

Remove cell cycle effects ktrns/scrnaseq#22

Merged

MartaBenegas mentioned this issue Nov 6, 2020

SCTransform + CellCycleScoring raises error: Insufficient data values to produce 24 bins #3692

Closed

adri-biochem mentioned this issue Jun 16, 2022

Normalization and Ordering of Cell Cycle Regression, DoubletFinder, and Integration #6087

Closed
