input-rna returns zero fold-change from non-rsem input #425

Closed
umasstr opened this issue Mar 12, 2019 · 5 comments


umasstr commented Mar 12, 2019

Given a 2-column input .tsv (from kallisto abundance.tsv, or abundance.h5 processed by sleuth), import-rna generates a .cnr with zero log fold-change for every gene. If the normalization step is removed from the code, import-rna returns non-zero values.

I would appreciate any insight as to why values may be thrown out at the quantile step of normalization.

Owner

etal commented Mar 18, 2019

How many samples did you use? If just one, then it normalizes against itself and the result is all zeros. (I'll see about improving the error reporting there.)
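
To illustrate the single-sample case, here is a minimal sketch of per-gene centering across a cohort. It is not CNVkit's code: the function name `center_log2_by_gene`, the pseudocount, and the choice of a per-gene median are assumptions made only to show why one sample compared against itself comes out as all zeros. The expression values are made up.

```python
import math
from statistics import median


def center_log2_by_gene(samples, pseudocount=1.0):
    """Return sample -> gene -> log2 value minus the per-gene cohort median.

    `samples` maps a sample name to a dict of gene -> expression value.
    Hypothetical helper for illustration; not part of CNVkit.
    """
    # Only genes present in every sample can be compared across the cohort.
    shared = set.intersection(*(set(vals) for vals in samples.values()))
    centered = {name: {} for name in samples}
    for gene in sorted(shared):
        logs = {
            name: math.log2(vals[gene] + pseudocount)
            for name, vals in samples.items()
        }
        center = median(logs.values())
        for name, value in logs.items():
            centered[name][gene] = value - center
    return centered


# One sample: the cohort median is the sample itself, so everything is 0.
single = center_log2_by_gene({"s1": {"Xkr4": 7.0, "Gm18956": 1.0}})

# Three samples (made-up values): differences from the median survive.
cohort = center_log2_by_gene({"a": {"g": 1.0}, "b": {"g": 3.0}, "c": {"g": 7.0}})
```

With three samples the same function gives non-zero values for any gene whose expression differs from the cohort median, which is why the sample count is the first thing to check.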

Author

umasstr commented Apr 10, 2019

This is using four samples, with this command:

cnvkit.py import-rna -f counts -g ~/cnvkit/data/ensembl-gene-info.mm10.tsv -o ~/output /data/sample1/abundance.tsv /data/sample2/abundance.tsv /data/sample3/abundance.tsv /data/sample4/abundance.tsv

Lines in the gene info file look something like this:
ENSMUSG00000102693 34.2056074766 1 3073253 3074322 4933401J01Rik 0 1070 tsl5
ENSMUSG00000064842 36.3636363636 1 3102016 3102125 Gm26206 1 110 tsl5
ENSMUSG00000051951 50.1375894331 1 3205901 3671498 Xkr4 2 465598 tsl5
ENSMUSG00000102851 39.7916666667 1 3252757 3253236 Gm18956 3 480 tsl5

and the sample abundance files look like this:
ENSMUSG00000102693 0.006857
ENSMUSG00000064842 0.000000
ENSMUSG00000051951 0.078329
ENSMUSG00000102851 0.019840

The values are TPMs. I have also tried multiplying them by the read depth to approximate counts, and I have tried the --no-txlen and --no-gc flags, but every combination produces .cnr files with a weight of 1 and a log fold-change of 0 for all genes.
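
A quick sanity check on the inputs can help rule out problems on the data side before suspecting the normalization. The sketch below parses a 2-column abundance file and counts how many gene IDs also appear in the gene info file and how many of those have non-zero values. The helper names `read_two_column` and `summarize` are hypothetical, and splitting on any whitespace is an assumption about the file layout; the four rows are the ones quoted above.

```python
import io


def read_two_column(handle):
    """Parse 'gene_id value' lines into a dict, skipping anything else."""
    values = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        gene_id, raw = fields
        try:
            values[gene_id] = float(raw)
        except ValueError:
            continue  # probably a header row
    return values


def summarize(values, known_gene_ids):
    """Count total rows, rows whose ID is known, and known rows above zero."""
    matched = [gene for gene in values if gene in known_gene_ids]
    nonzero = [gene for gene in matched if values[gene] > 0]
    return {"total": len(values), "matched": len(matched), "nonzero": len(nonzero)}


sample_text = """ENSMUSG00000102693 0.006857
ENSMUSG00000064842 0.000000
ENSMUSG00000051951 0.078329
ENSMUSG00000102851 0.019840
"""
known_ids = {
    "ENSMUSG00000102693",
    "ENSMUSG00000064842",
    "ENSMUSG00000051951",
    "ENSMUSG00000102851",
}
report = summarize(read_two_column(io.StringIO(sample_text)), known_ids)
print(report)
```

If `matched` is close to `total` and `nonzero` is well above zero for each sample, the inputs themselves are probably fine and the problem lies in how they are read or normalized.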

Author

umasstr commented Apr 10, 2019

hg38-based TPMs generated by kallisto show the same normalization problem. However, if I submit RSEM output (.rsem) from the same dataset, it works.

etal added the bug label Apr 11, 2019

Owner

etal commented Apr 11, 2019

Thanks for the details. This sounds like a bug in reading the plain 2-column format. I'll look into it.

Owner

etal commented Nov 29, 2019

It looks like the 2-column input format is now working with import-rna -f counts in the development version. I'll roll another release.
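
For anyone wanting to confirm the fix on their own data, a small check like the following can flag a .cnr whose log2 values are all zero. This is a sketch under assumptions: it treats the .cnr as a tab-separated file with a header row that includes a `log2` column, the function name `all_log2_zero` is hypothetical, and the example rows are made up for illustration.

```python
import csv
import io


def all_log2_zero(handle, column="log2"):
    """Return True if the file has rows and every value in `column` is 0."""
    reader = csv.DictReader(handle, delimiter="\t")
    values = [float(row[column]) for row in reader]
    return bool(values) and all(value == 0.0 for value in values)


def make_cnr(log2_values):
    """Build a tiny in-memory example file (made-up rows, assumed columns)."""
    header = ["chromosome", "start", "end", "gene", "log2", "depth", "weight"]
    lines = ["\t".join(header)]
    for index, log2 in enumerate(log2_values):
        row = ["1", str(1000 * index), str(1000 * index + 500),
               "gene%d" % index, str(log2), "1.0", "1.0"]
        lines.append("\t".join(row))
    return io.StringIO("\n".join(lines) + "\n")


broken = all_log2_zero(make_cnr([0.0, 0.0, 0.0]))
working = all_log2_zero(make_cnr([0.0, -0.42, 0.31]))
print(broken, working)
```

To run it on a real file, open the .cnr in text mode and pass the file handle to `all_log2_zero`.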

etal closed this as completed Nov 29, 2019