question about the input of Seurat #668

Closed
xiexiaowei opened this issue Jul 30, 2018 · 3 comments

Comments

@xiexiaowei

Dear authors,

Recently, we have been using your powerful Seurat software, and I have run into some questions.
Your website indicates that counts, TPM, and FPKM are all allowed as input to Seurat, but that the input expression matrix should not be log-transformed.

  1. Because we wanted to know the difference between TPM and logTPM, we ran Seurat on our data in both the TPM and logTPM formats.
    When the data was in logTPM format, we skipped the "NormalizeData" step.

We found that the tSNE results for TPM and logTPM were different, but for specific clusters (we tried C0 and C1), more than 80% of the cells overlapped between the two conditions.
So what is the reason that the "input expression matrix should not be log-transformed"?

  2. According to the Seurat manual, the normalization method in Seurat is log normalization.
    Why, then, are both raw counts and TPM allowed as input to Seurat? Are there any steps that transform them into an equivalent form for later analysis?

I need your help urgently and would greatly appreciate it!
Xiaowei

@xiexiaowei
Author

In fact, we also found that the tSNE results for UMI counts and TPM were different.
So which type of input should I use?

@catsargent

I would also appreciate some clarification on how to use TPM files with Seurat. I have been provided with a log(TPM+1) file. Is it necessary for me to convert this back to just TPM, then proceed with the analysis (skipping normalization)? Or is it possible to import the log(TPM+1) file and use that?
Many thanks,
Catherine

@leonfodoulian
Contributor

Hi,

I guess that using raw counts is the easiest way to process data through Seurat. However, if you have TPM values, I suggest you don't use Seurat::NormalizeData(), since TPM values are already normalized for sequencing depth and transcript/gene length. Note that Seurat::NormalizeData() normalizes the data for sequencing depth and then transforms it to log space. If you have TPM data, you can simply log-transform the gene expression matrix in the object@data slot yourself before scaling the data.
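
To make the difference concrete, here is a minimal Python sketch of the arithmetic described above. It is not Seurat's code: `log_normalize` and `log_transform` are hypothetical helper names, the cell values are made up, and the scale factor of 10,000 and the use of `log1p` are assumptions based on how the default LogNormalize method is documented.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Depth-normalize one cell's raw counts, then apply log1p.

    Assumption: mirrors the documented LogNormalize recipe
    (divide by the cell total, multiply by a scale factor, take log1p).
    """
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


def log_transform(values):
    """Apply log1p only, for data that is already depth-normalized (e.g. TPM)."""
    return [math.log1p(v) for v in values]


# Two hypothetical cells with the same composition, sequenced at 1x and 2x depth.
cell_a = [10, 30, 60]
cell_b = [20, 60, 120]

# Depth normalization removes the 2x difference before the log step.
norm_a = log_normalize(cell_a)
norm_b = log_normalize(cell_b)

# Hypothetical TPM values for one cell: they already sum to one million,
# so only the log step remains.
tpm = [100_000.0, 300_000.0, 600_000.0]
log_tpm = log_transform(tpm)

# Logging data that is already in log space compresses it a second time.
double_logged = log_transform(log_tpm)
```

In this sketch the two raw-count cells end up with the same normalized values, while the TPM cell only needs the log step. Applying a log step to data that is already logged shrinks the values again, which is one plausible reason why already log-transformed input should not go through normalization a second time.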

For more information please check issues #171, #181, and #481.

Hope these help!

Best,
Leon
