input-rna returns zero fold-change from non-rsem input #425
How many samples did you use? If just one, then it normalizes against itself and the result is all zeros. (I'll see about improving the error reporting there.)
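The sketch below shows why a single sample gives all zeros. It assumes, for illustration only, that each sample is compared against a per-gene median of all input samples. CNVkit's actual reference construction may differ, and `log2_ratios_vs_pool` is a hypothetical helper. With one sample the "pool" is the sample itself, so every ratio is 1 and every log2 value is 0.

```python
import numpy as np

def log2_ratios_vs_pool(counts: np.ndarray) -> np.ndarray:
    """Log2 ratio of each sample to the per-gene median of all samples.

    counts: genes x samples matrix of positive expression values.
    Simplified illustration, not CNVkit's exact normalization.
    """
    reference = np.median(counts, axis=1, keepdims=True)  # per-gene pooled reference
    return np.log2(counts / reference)

# One sample: the reference equals the sample, so every log2 ratio is 0.
one_sample = np.array([[10.0], [250.0], [3.5]])

# Four samples: the reference differs from each individual sample.
four_samples = np.array([
    [10.0, 12.0, 9.0, 30.0],
    [250.0, 240.0, 260.0, 120.0],
    [3.5, 3.0, 4.0, 3.8],
])

print(log2_ratios_vs_pool(one_sample).ravel())
print(np.round(log2_ratios_vs_pool(four_samples), 2))
```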
This is using four samples, with this command:
cnvkit.py import-rna -f counts -g ~/cnvkit/data/ensembl-gene-info.mm10.tsv -o ~/output /data/sample1/abundance.tsv /data/sample2/abundance.tsv /data/sample3/abundance.tsv /data/sample4/abundance.tsv
Lines in the gene info file look something like this:
and the sample abundance files look like this:
They are TPMs. I have also tried multiplying them by the read depth to approximate read counts, and have tried the --no-txlen and --no-gc flags, but every combination produces .cnr files with a weight of 1 and a log fold-change of 0 for all genes.
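Rescaling TPMs by read depth would not be expected to change the output if the normalization centers each sample. Multiplying a sample by a constant only adds log2 of that constant to every gene, and per-sample centering removes that offset again. The minimal NumPy sketch below assumes median centering of log2 values for illustration, which is not necessarily what CNVkit does. `median_center_log2`, the TPM values, and the read depth are all hypothetical.

```python
import numpy as np

def median_center_log2(values: np.ndarray) -> np.ndarray:
    """Log2-transform one sample's expression values and subtract their median."""
    logged = np.log2(values)
    return logged - np.median(logged)

tpm = np.array([5.0, 40.0, 120.0, 0.8])   # made-up TPMs for four genes
read_depth = 25_000_000                   # hypothetical library size
pseudo_counts = tpm * read_depth / 1e6    # naive "TPM x depth" rescaling

# The constant factor becomes a constant log2 offset, which centering removes.
print(np.allclose(median_center_log2(tpm), median_center_log2(pseudo_counts)))  # True
```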
hg38-based TPMs generated by kallisto show a similar normalization problem. However, if I submit the .rsem output from the same dataset, it works.
Thanks for the details; this sounds like a bug in reading the plain 2-column format. I'll look into it.
It looks like the 2-column input format is now working with
Given a 2-column input .tsv (from kallisto abundance.tsv, or abundance.h5 processed by sleuth), import-rna generates a .cnr with a log fold-change of zero for every gene. If the normalization step is removed from the code, import-rna returns non-zero values.
I would appreciate any insight as to why values may be thrown out at the quantile step of normalization.
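The quantile step can be illustrated with a generic NumPy sketch. The function names are hypothetical, and this is standard quantile normalization, not CNVkit's implementation. One property is relevant when every log2 value comes out as 0. Quantile normalization gives every sample exactly the same set of values. So if the samples also rank the genes in the same order, the normalized columns are identical and every ratio against the cross-sample median is 0. Samples whose gene rankings differ keep non-zero ratios.

```python
import numpy as np

def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x samples matrix.

    Each value is replaced by the mean, across samples, of the values that
    share its rank within their column. Ties are ranked by position rather
    than averaged, which keeps the sketch short.
    """
    order = np.argsort(matrix, axis=0, kind="stable")    # row indices sorted per column
    sorted_vals = np.take_along_axis(matrix, order, axis=0)
    rank_means = sorted_vals.mean(axis=1)                # shared target distribution
    ranks = np.argsort(order, axis=0, kind="stable")     # rank of each original value
    return rank_means[ranks]

def log2_vs_median(matrix: np.ndarray) -> np.ndarray:
    """Log2 ratio of each sample to the per-gene median across samples."""
    return np.log2(matrix / np.median(matrix, axis=1, keepdims=True))

# Every sample ranks the genes in the same order (gene 0 < gene 1 < gene 2).
same_rank = np.array([[1.0, 2.0, 5.0],
                      [4.0, 8.0, 9.0],
                      [10.0, 20.0, 30.0]])
# Sample 1 ranks gene 0 highest and gene 2 lowest.
diff_rank = np.array([[1.0, 20.0, 5.0],
                      [4.0, 8.0, 9.0],
                      [10.0, 2.0, 30.0]])

print(log2_vs_median(quantile_normalize(same_rank)))   # all zeros
print(log2_vs_median(quantile_normalize(diff_rank)))   # non-zero entries in rows 0 and 2
```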