About data preprocessing for diffusion pseudotime #26
Comments
Hi ShuhuaGao, thanks for your input! You're right that Monocle 2 has many more options for preprocessing. Still, I believe the limited options in Scanpy are enough for robust pseudotime and branching inference with DPT, simply because DPT itself is very robust. That said, I have to admit that I have not worked with a large number of data types. From that experience, my understanding is the following:
Ask if you have further questions. 😄 |
Hi Alex, many thanks for your quick reply. I only just saw it, as it is almost 10 PM in Singapore now. It makes sense to perform quality control and per-cell normalization and to extract the highly variable genes for ordering; I got your point. Regarding your reply about qPCR: do we need a log normalization? I think a log transform is only required for RNA-Seq data, to make the distribution less skewed and closer to normal. For qPCR data, the delta_Ct value is already on a log scale. The example you mentioned does not call sc.pp.log1p either. Instead, we just read the data by Besides, in many cases no UMI data is available. In that case, per-cell normalization for RNA-Seq amounts to computing FPKM/TPM to compensate for sequencing depth, right? RNA-Seq data is usually already provided in FPKM form in publications, and we then work on this data to find the highly variable genes. (This is just my personal understanding. I am new to this field, coming from mechatronics engineering.) Anyway, thanks again for your help. I noticed that there are no examples of pseudotime ordering with RNA-Seq data. Maybe I can provide one in the near future, as I am working on gene network modeling based on pseudotime information. |
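To make the per-cell normalization idea above concrete, here is a minimal sketch in plain Python. It only rescales each cell to a common total (counts per million), which compensates for sequencing depth; real FPKM/TPM values additionally correct for gene length, which this sketch does not do. The function name `normalize_per_cell` and the `target_sum` parameter are illustrative assumptions, not Scanpy's API.

```python
def normalize_per_cell(counts, target_sum=1e6):
    """Scale every cell (row) so that its values sum to target_sum.

    Hypothetical helper for illustration: it corrects for sequencing
    depth only and ignores gene length (unlike FPKM/TPM).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell cannot be rescaled; keep it as all zeros.
            normalized.append([0.0] * len(cell))
        else:
            normalized.append([value * target_sum / total for value in cell])
    return normalized


# Two cells with the same relative expression but a 10x depth difference.
raw_counts = [
    [10, 30, 60],
    [1, 3, 6],
]
scaled = normalize_per_cell(raw_counts)
```

After scaling, both cells have the same profile, so the depth difference no longer dominates the distance between them.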
Hi! Everything you write makes sense: if the qPCR values are already on a log scale, you shouldn't log-transform them again, and if the RNA-seq data is already in FPKM form, you do not need to account for UMI correction. Regarding the pseudotime example for RNA-seq data: here is a public one. But it would be nice to have more! Thanks for your input! |
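The rule of thumb from this exchange can be sketched as follows: log-transform only data that is not already on a log scale. This is a plain-Python illustration, not Scanpy code; the helper name `prepare_for_dpt` and the `already_log_scale` flag are assumptions made for the example, and `math.log1p` stands in for what `sc.pp.log1p` does to each entry.

```python
import math


def prepare_for_dpt(matrix, already_log_scale):
    """Return a copy of matrix, applying log(1 + x) only when needed.

    Hypothetical helper: the caller states whether the values are
    already on a log scale (e.g. qPCR delta_Ct) or not (e.g. FPKM/TPM).
    """
    if already_log_scale:
        # qPCR-like data: leave the values untouched.
        return [list(row) for row in matrix]
    # RNA-seq-like data: compress the dynamic range with log(1 + x).
    return [[math.log1p(value) for value in row] for row in matrix]


qpcr_like = [[2.5, 7.1], [3.0, 6.4]]        # already log-scaled values
fpkm_like = [[0.0, 99.0], [9.0, 999.0]]     # linear-scale values

qpcr_ready = prepare_for_dpt(qpcr_like, already_log_scale=True)
fpkm_ready = prepare_for_dpt(fpkm_like, already_log_scale=False)
```

Passing the flag explicitly is a deliberate choice here: whether a matrix is already log-scaled usually cannot be inferred reliably from the numbers alone.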
Thanks for your reply. I will try that and may give more feedback. Cheers! |
Hi, first of all, thanks for sharing this analysis tool. I much prefer Python to R, though most bioinformatics tools are written in R. I would like to ask a question about how to process the data before feeding it as adata into dpt for pseudotime ordering.
Since the DPT algorithm can accept multiple types of data, most commonly single-cell qPCR (Ct values) and RNA-Seq (FPKM/TPM) data, is the processing procedure identical for all of them? I have also looked at the Monocle 2 algorithm, where this seems much more complicated. For instance, on the 4th page of its document link, it asks you to specify a different expressionFamily, i.e., the appropriate distribution of the data, for each kind of data. How about the dpt function in scanpy? Does it treat all kinds of data the same way?
According to my understanding,
Is it right?
Any help is appreciated.