Merging Biological replicates and confounders #462

Closed
pandeyravi15 opened this issue May 8, 2018 · 12 comments
Labels: Analysis Question, duplicate

Comments

@pandeyravi15

pandeyravi15 commented May 8, 2018

Hi

I have 3 biological replicates from 10X Genomics data. I want to pool all of them and remove confounders such as batch effect, cell cycle effect, nGene, and nUMI. I can merge the three objects using MergeSeurat, and then I regress out these three factors:

control.combined <- ScaleData(object = control.combined, vars.to.regress = c("batchid", "nUMI", "percent.mito"))

Should I regress out the cell cycle effect after this step, or combine it into this step, and if so, how? Also, where can I find S- and G2M-phase cell cycle genes for mouse to use for cell cycle regression?

Thanks.

@mojaveazure
Member

Whether you should regress out cell cycle effects depends on your data. For how to actually regress them out, please see #398.

@leonfodoulian
Contributor

@mojaveazure,

It would be helpful if Seurat also included a list of mouse cell cycle gene names. I quickly created a list and stored it in .rds format (mouse_cell_cycle_genes.rds), which you can find below.
mouse_cell_cycle_genes.zip

Also included are the function I used to convert human gene names to mouse gene names (ConvertHumanGeneListToMM.R) and the script I used to get these genes (get_mouse_cell_cycle_genes.R).

I have never regressed out cell cycle effects, so I am not sure to what extent my gene list is valid.

Hope this helps!

Best,
Leon

@pandeyravi15
Author

pandeyravi15 commented May 8, 2018

@mojaveazure
So I first need to check, for each dataset, whether I have to regress out cell cycle effects, using the PCA plot described in your cell cycle regression vignette. If there is a cell cycle effect, what would be the best workflow? For example, if I follow this:

  1. merge all Seurat objects
  2. regress out batch effect, nGene, and percent.mito with ScaleData
  3. score cell cycle on the scaled data
  4. regress out the cell cycle effect with ScaleData again
  5. downstream analysis

would that be good?

Also, why are only S- and G2M-phase cell cycle genes considered?

Also, is the list provided by @leonfodoulian good enough?
@leonfodoulian
Thanks.

@pandeyravi15
Author

Cellcycleeffect_oldR1.pdf
If the PCA plot on cell cycle genes looks like the attached figure, does that mean I do not need to remove the cell cycle effect?

@mojaveazure
Member

We consider S- and G2M-phase genes because these are the cycling genes. If a cell isn't expressing these genes, it is likely in G1 phase or not cycling. Furthermore, by scoring on these genes, we can regress out either all cell cycle signal, or only the difference between S and G2M phase among cycling cells, which keeps the separation between cycling and non-cycling cells (see Alternate Workflow in the cell cycle vignette). As for whether you should regress out cell cycle effects, you can run ScaleData on your scored object and store the result in a new object. That way, you can compare the PCA plots and use the object that produces the better plot.
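
The "compare the PCA plots" suggestion can be illustrated outside Seurat. The base-R sketch below is a stand-in, not Seurat's code: it simulates a genes x cells matrix in which 20 genes follow a made-up S-phase score, regresses the S and G2M scores out of every gene with ordinary least squares (qr.resid), and uses prcomp in place of RunPCA. What it compares is how strongly PC1 tracks the S score before and after regression.

```r
# Simulated data for illustration only: 50 genes x 200 cells (genes in rows,
# cells in columns, like a Seurat expression matrix). Nothing here is real data.
set.seed(1)
n_cells <- 200
n_genes <- 50
s_score <- rnorm(n_cells)
g2m_score <- rnorm(n_cells)
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes, ncol = n_cells)
# Make the first 20 genes follow the S-phase score, i.e. a strong cell cycle signal
expr[1:20, ] <- expr[1:20, ] + 3 * matrix(s_score, nrow = 20, ncol = n_cells, byrow = TRUE)

# Regress covariates out of every gene at once with ordinary least squares
# (intercept + covariates); returns residuals in the same genes x cells layout
regress_out <- function(mat, covariates) {
  X <- cbind(1, covariates)
  t(qr.resid(qr(X), t(mat)))
}

# PC1 score for each cell, with cells as observations and genes as variables
pc1 <- function(mat) prcomp(t(mat))$x[, 1]

expr_regressed <- regress_out(expr, cbind(s_score, g2m_score))

cor_before <- abs(cor(pc1(expr), s_score))
cor_after <- abs(cor(pc1(expr_regressed), s_score))
round(c(before = cor_before, after = cor_after), 3)
```

On real data you would look at the PCA plots of the two objects rather than a single correlation; the point is only that once the scores are regressed out, no principal component can stay linearly associated with them.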

mojaveazure added the duplicate and Analysis Question labels on May 8, 2018
@pandeyravi15
Author

Okay, so my question is when to do this step: before or after filtering out cells and regressing out batch effect and percent.mito?

@mojaveazure
Copy link
Member

Ideally, at the same time. ScaleData scales and regresses the data stored in the data slot and stores the results in the scale.data slot. If you'd like to do this in separate steps, first scale on the other effects, then pass the resulting scale.data matrix to ScaleData through the data.use argument (Seurat v2):

control.combined <- ScaleData(object = control.combined, vars.to.regress = c("batchid", "nUMI", "percent.mito"))
# Pass whatever gene list you're using, e.g. cc.genes$s.genes and cc.genes$g2m.genes (human gene symbols)
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
control.combined <- CellCycleScoring(object = control.combined, s.genes = s.genes, g2m.genes = g2m.genes)
# Regress the scores out of the already-regressed matrix instead of the data slot.
# Could also use the difference between the scores; see Alternate Workflow in the cell cycle scoring vignette.
control.combined <- ScaleData(object = control.combined, data.use = control.combined@scale.data, vars.to.regress = c("S.Score", "G2M.Score"))
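
One reason "at the same time" is the ideal: ordinary least squares done in two passes is not the same as one pass when the covariates are correlated. The base-R sketch below uses simulated numbers (numi and s_score are hypothetical stand-ins for nUMI and S.Score), and qr.resid stands in for ScaleData's regression without the scaling step, which does not change this point. It regresses a gene on a technical covariate, then regresses the result on a correlated S-phase score, and compares that with fitting both covariates together.

```r
# Simulated values for illustration only: a technical covariate (think nUMI)
# and an S-phase score that is partly correlated with it.
set.seed(42)
n_cells <- 300
numi <- rnorm(n_cells)
s_score <- 0.7 * numi + rnorm(n_cells, sd = 0.5)
gene <- 2 * numi + 1.5 * s_score + rnorm(n_cells)

# Least-squares residuals of y on an intercept plus the supplied covariates
resid_on <- function(y, ...) {
  X <- cbind(1, ...)
  qr.resid(qr(X), y)
}

# Two passes: technical covariate first, then the score (the data.use pattern)
sequential <- resid_on(resid_on(gene, numi), s_score)
# One pass: both covariates in the same model
joint <- resid_on(gene, numi, s_score)

# How much technical signal is left in each result
leak_sequential <- abs(cor(sequential, numi))
leak_joint <- abs(cor(joint, numi))
round(c(sequential = leak_sequential, joint = leak_joint), 3)
```

Subtracting the score's fitted contribution in the second pass reintroduces part of the technical covariate, so the two-pass residuals stay correlated with it, while the one-pass residuals are orthogonal to both covariates. How large this is on real data depends on how strongly the cell cycle scores correlate with batch, nUMI, and percent.mito.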

@pandeyravi15
Copy link
Author

Okay, thanks.

Would you recommend just doing a homolog search for mouse cell cycle genes, as @leonfodoulian suggested, or do you have mouse-specific genes?

@mojaveazure
Copy link
Member

We currently do not provide specific genes other than the ones in the cc.genes list. I'd suggest using @leonfodoulian's approach.
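
Leon's conversion function is not reproduced in this thread, so the following is only an illustrative base-R sketch of a homolog lookup, not his method. It assumes a one-to-one human-to-mouse ortholog table; the four rows below are hand-written examples, and FAKE1 is a hypothetical symbol included to show how unmapped genes are reported. A real table would come from an orthology resource such as Ensembl or MGI.

```r
# Tiny hand-written ortholog table, for illustration only (assumed one-to-one)
orthologs <- data.frame(
  human = c("MCM5", "PCNA", "TOP2A", "MKI67"),
  mouse = c("Mcm5", "Pcna", "Top2a", "Mki67"),
  stringsAsFactors = FALSE
)

# Map human symbols to mouse symbols; genes without an entry are reported and dropped
human_to_mouse <- function(genes, table) {
  idx <- match(genes, table$human)
  missing <- genes[is.na(idx)]
  if (length(missing) > 0) {
    message("No mouse ortholog listed for: ", paste(missing, collapse = ", "))
  }
  table$mouse[idx[!is.na(idx)]]
}

# FAKE1 is a hypothetical symbol with no entry in the table
mouse_genes <- human_to_mouse(c("MCM5", "PCNA", "FAKE1"), orthologs)
mouse_genes
```

A common shortcut is to change the case of the symbols (MCM5 to Mcm5), but that only works where human and mouse symbols happen to match, so a lookup table is the safer assumption. Genes with one-to-many orthologs would need extra handling that this sketch leaves out.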

@vondoRishi
Copy link

vondoRishi commented Jun 1, 2018

@mojaveazure It would be nice if data.use = control.combined@scale.data were highlighted in the cell cycle vignette itself. I only noticed this after 3 months.
Thanks

@Sophia409

Sophia409 commented Mar 20, 2019

@mojaveazure @vondoRishi In Seurat v3.0, ScaleData has no data.use parameter. If I don't use data.use = control.combined@scale.data when running these steps separately, does it have any bad effects? Or what should I do?

control.combined <- ScaleData(object = control.combined, vars.to.regress = c("batchid", "nUMI", "percent.mito"))
# Pass whatever gene list you're using, could be cc.genes$s.genes and cc.genes$g2m.genes
# (Seurat v3 names these arguments s.features and g2m.features)
control.combined <- CellCycleScoring(object = control.combined, s.features = s.genes, g2m.features = g2m.genes)
# Could also use difference between scores. See Alternate Workflows of cell cycle scoring vignette
control.combined <- ScaleData(object = control.combined, vars.to.regress = c("S.Score", "G2M.Score"))
ScaleData arguments (Seurat v3 documentation):

object: An object
...: Arguments passed to other methods
features: Vector of feature names to scale/center. Default is all features
vars.to.regress: Variables to regress out (previously latent.vars in RegressOut). For example, nUMI, or percent.mito.
latent.data: Extra data to regress out, should be cells x latent data
model.use: Use a linear model or generalized linear model (poisson, negative binomial) for the regression. Options are 'linear' (default), 'poisson', and 'negbinom'
use.umi: Regress on UMI count data. Default is FALSE for linear modeling, but automatically set to TRUE if model.use is 'negbinom' or 'poisson'
do.scale: Whether to scale the data.
do.center: Whether to center the data.
scale.max: Max value to return for scaled data. The default is 10. Setting this can help reduce the effects of features that are only expressed in a very small number of cells. If regressing out latent variables and using a non-linear model, the default is 50.
block.size: Default size for number of features to scale at in a single computation. Increasing block.size may speed up calculations but at an additional memory cost.
min.cells.to.block: If object contains fewer than this number of cells, don't block for scaling calculations.
verbose: Displays a progress bar for scaling procedure
assay: Name of Assay to scale
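
On the v3 question: ScaleData starts from the data slot on every call and replaces scale.data, so in the snippet above the second call regresses only S.Score and G2M.Score, and the batchid/nUMI/percent.mito regression from the first call is not kept. Without data.use, the straightforward alternative is to regress everything in one call, which matches "ideally, at the same time" above. The base-R function below is a simplified sketch of what a single linear-model call does with the documented vars.to.regress, do.center, do.scale, and scale.max arguments. It is not Seurat's implementation; capping only the upper end is an assumption, and the metadata columns in the example are made up.

```r
# Simplified sketch of linear-model regression followed by scaling, in the
# spirit of ScaleData(model.use = "linear"). Not Seurat's code: no blocking,
# no Poisson/negative binomial models; the cap is assumed to apply to the upper end only.
scale_like <- function(mat, latent = NULL, do.center = TRUE, do.scale = TRUE,
                       scale.max = 10) {
  x <- t(mat)  # cells x features, so each column is one feature
  if (!is.null(latent)) {
    # All covariates enter one model per feature ("regress all at once")
    X <- cbind(1, as.matrix(latent))
    x <- qr.resid(qr(X), x)
  }
  x <- scale(x, center = do.center, scale = do.scale)
  x[x > scale.max] <- scale.max
  t(x)  # back to features x cells
}

# Made-up example: 5 features x 100 cells with hypothetical metadata columns
set.seed(7)
n_cells <- 100
meta <- data.frame(percent.mito = runif(n_cells), S.Score = rnorm(n_cells),
                   G2M.Score = rnorm(n_cells))
mat <- matrix(rnorm(5 * n_cells), nrow = 5)
mat[1, ] <- mat[1, ] + 4 * meta$S.Score
scaled <- scale_like(mat, latent = meta)
capped <- scale_like(mat, latent = meta, scale.max = 1)
```

In Seurat v3 the one-call version would presumably be ScaleData(control.combined, vars.to.regress = c("batchid", "nCount_RNA", "percent.mito", "S.Score", "G2M.Score")), run after CellCycleScoring has added S.Score and G2M.Score to the metadata; CellCycleScoring scores from the normalized data slot, so it does not need scaled data first. nCount_RNA is the v3 name for the UMI count column; adjust it if your metadata still uses nUMI.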

mojaveazure added a commit that referenced this issue Nov 23, 2020
Fix issue in Load10X_Spatial documentation
@Sergio-ote

The best answer to #2493 is the zip file in this thread.
