Seurat3.0 negative value in data slot from IntegrateData() #1057

Closed
zingery opened this issue Jan 14, 2019 · 5 comments

@zingery

zingery commented Jan 14, 2019

Hi,

Thank you for providing the community with a great single-cell analysis tool. I am attempting to integrate two datasets with batch effects in Seurat v3 for evaluation purposes, using the default configuration of each function. However, I am obtaining negative values in the integration result. My understanding from the online FAQ (https://satijalab.org/seurat/faq) is:

The data slot (object@data) stores normalized and log-transformed single cell expression. This maintains the relative abundance levels of all genes, and contains only zeros or positive values

In a second attempt with a different dataset, I also obtained negative values in the data slot. I have confirmed that in all cases the input matrices contain no values < 0 and that the outputs show the expected subpopulations and gene patterns. Could you help pinpoint what may have been configured incorrectly?

Best regards,
Zinger

# Seurat_3.0.0.9000, installed via devtools

obj1 <- NormalizeData(object = obj1, verbose = FALSE)
obj2 <- NormalizeData(object = obj2, verbose = FALSE)

# Collect both objects in a list for anchor finding.
obj_both <- list(obj1, obj2)

obj_both.anchor <- FindIntegrationAnchors(object.list = obj_both)

obj_both.integrated <- IntegrateData(anchorset = obj_both.anchor)

# This is the result table that contains negative values.
obj_both_integrated_data <- obj_both.integrated@assays$integrated@data
@timoast
Collaborator

timoast commented Jan 14, 2019

Hi Zinger,

Does the presence of negative values cause any problem in downstream applications? For expression data, the data slot normally stores log-normalized count data (which would never be negative), but for data integration these values should not be interpreted the same way, and may well contain negative values as we are subtracting values from one dataset to remove technical differences.
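
To illustrate the point above, here is a toy sketch in base R. It is not Seurat's integration code: the count matrix and the per-gene `correction` vector are made-up values, chosen only to show that subtracting an offset from non-negative log-normalized values can push entries below zero.

```r
# Toy illustration only: this is NOT Seurat's integration algorithm.
# Hypothetical counts for 3 genes x 4 cells from one batch.
counts <- matrix(
  c( 0, 5, 2, 0,
    10, 0, 3, 1,
     4, 4, 0, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Log-normalization: scale each cell to 10,000 total counts, then log1p.
# The result can never be negative.
log_norm <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# Hypothetical per-gene batch offset to remove (made-up numbers).
correction <- c(gene1 = 0.5, gene2 = 1.0, gene3 = 0.2)

# Subtracting the offset shifts zeros (and other small values) below zero.
corrected <- sweep(log_norm, 1, correction, "-")

min(log_norm)   # non-negative
min(corrected)  # negative
```

The zeros in `log_norm` become `-correction` after the subtraction, which is why a corrected matrix should not be read as log-normalized expression.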

Tim

@zingery
Author

zingery commented Jan 14, 2019

Hi Tim,

Thank you for the prompt response. We are interested in performing differential expression analysis between clusters, and we are uncertain how to handle the negative values. The need for integration stems from a significant batch effect in the dataset we were provided with.

Best regards,
Zinger

@timoast
Collaborator

timoast commented Jan 14, 2019

In general, we don't recommend using the integrated matrix for differential expression. Instead, you might like to try the logistic regression test with batch as a latent variable:

DefaultAssay(object) <- "RNA"
markers <- FindAllMarkers(object, test.use = "LR", latent.vars = "orig.ident")

This uses the uncorrected, log-normalized, expression data and compares a logistic regression model predicting group membership from the expression of the gene and latent variables with a null model containing only latent variables. The test is based on this paper from Lior Pachter's group: http://dx.doi.org/10.1101/258566
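
For a single gene, the model comparison described above can be sketched in base R roughly as follows. This is a simplified illustration on simulated data, not Seurat's internal implementation; the variable names (`expr`, `batch`, `group`) and the simulated effect sizes are assumptions made for the example.

```r
# Simplified sketch of the likelihood-ratio idea for ONE gene, on simulated data.
set.seed(1)
n <- 200
batch <- factor(rep(c("batch1", "batch2"), each = n / 2))  # latent variable
group <- rep(c(0, 1), times = n / 2)                       # cluster membership (0/1)

# Simulated log-normalized expression, shifted by both batch and group.
expr <- pmax(0, 1 + 0.8 * (batch == "batch2") + 1.0 * group + rnorm(n, sd = 0.7))
dat <- data.frame(group, expr, batch)

# Full model: group membership predicted from expression plus the latent variable.
full <- glm(group ~ expr + batch, data = dat, family = binomial)
# Null model: latent variable only.
null <- glm(group ~ batch, data = dat, family = binomial)

# Likelihood-ratio test: does adding the gene's expression improve the fit?
lr_stat <- deviance(null) - deviance(full)
df_diff <- df.residual(null) - df.residual(full)
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
p_value
```

Because `batch` appears in both models, a gene whose expression differs only between batches adds little to the full model, so the test is less likely to flag it as a marker.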

@zingery
Author

zingery commented Jan 14, 2019

Hi Tim,

Thanks for the clarification and recommendation. We will look into it.

Best,
Zinger

@BioAmelie

Hi Tim,
I have the same confusion as Zinger. I also need to use corrected data to find DE genes. You recommend using uncorrected data in FindAllMarkers, but which data should I use in the two steps FindNeighbors and FindClusters?

Best regards,
minfang
