Remove redundant salmon TPM output #575

Closed
j-andrews7 opened this issue Mar 4, 2021 · 6 comments
Labels
bug Something isn't working
Milestone

Comments

@j-andrews7

Description of the bug

Currently, the pipeline generates multiple TPM outputs from salmon, e.g. salmon.merged.gene_tpm.tsv and salmon.merged.gene_tpm_length_scaled.tsv. These are redundant, as the TPM values don't change. The files are written on both a per-sample and a merged basis, which creates a fair amount of clutter.

Expected behaviour

A single TPM file should be produced instead.
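
One way to check the redundancy described above is to compare two of the TPM tables value by value. The following is a minimal sketch, not part of the pipeline: it assumes each file is tab-separated, that the first column holds the gene identifier, and that every other column is a numeric per-sample value. The helper names `read_matrix` and `tables_match` are hypothetical.

```python
import csv
import math


def read_matrix(path):
    """Read a tab-separated table into {row_id: {column_name: float}}.

    Assumption: the first column is an identifier and all remaining
    columns are numeric.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        id_col = reader.fieldnames[0]
        return {
            row[id_col]: {k: float(v) for k, v in row.items() if k != id_col}
            for row in reader
        }


def tables_match(path_a, path_b, rel_tol=1e-9):
    """Return True if both tables have the same rows, columns, and values."""
    a, b = read_matrix(path_a), read_matrix(path_b)
    if a.keys() != b.keys():
        return False
    for gene, values_a in a.items():
        values_b = b[gene]
        if values_a.keys() != values_b.keys():
            return False
        for sample, value in values_a.items():
            # Tolerate tiny floating-point differences from formatting.
            if not math.isclose(value, values_b[sample],
                                rel_tol=rel_tol, abs_tol=1e-12):
                return False
    return True
```

Calling `tables_match("salmon.merged.gene_tpm.tsv", "salmon.merged.gene_tpm_length_scaled.tsv")` on real pipeline output would then report whether the two merged files carry the same values, provided the files follow the layout assumed here.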

j-andrews7 added the bug label Mar 4, 2021
@grst
Member

grst commented Mar 4, 2021

Actually, they should not be the same...

  • TPM is scaled by effective gene length per sample
  • TPM length-scaled is scaled by average effective gene length across all samples

(see tximport Vignette)

So yes, on a per-sample basis, they are redundant. The merged files should be (slightly?) different.

@j-andrews7
Author

j-andrews7 commented Mar 4, 2021

No, TPMs are TPMs. The counts derived from them will change with those different parameters; the TPM values themselves do not. The salmon.merged.gene_counts.tsv, salmon.merged.gene_counts_length_scaled.tsv, etc. files are still necessary and will change slightly as you describe.

@grst
Member

grst commented Mar 4, 2021

Are you sure? That's not how I understand this section...

We could alternatively generate counts from abundances, using the argument countsFromAbundance, scaled to library size, "scaledTPM", or additionally scaled using the average transcript length, averaged over samples and to library size, "lengthScaledTPM"

@j-andrews7
Author

Yes. That section applies to the counts that are derived from the abundance (TPMs). The abundance values themselves do not change. You can run with countsFromAbundance set to any option - the abundance matrix will be the same for them all.

This is also confirmed by Mike Love here.
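
To illustrate the point, here is a simplified pure-Python sketch of how the countsFromAbundance options behave. It is not tximport's actual code: the function names (`counts_from_abundance`, `summarize`) and the toy matrices are made up for this example, and matrices are plain lists with genes as rows and samples as columns. Only the counts are recomputed; the abundance matrix is passed through untouched for every option.

```python
def column_sums(matrix):
    """Sum each column (sample) of a genes x samples matrix."""
    return [sum(col) for col in zip(*matrix)]


def counts_from_abundance(counts, abundance, lengths, mode):
    """Derive counts from abundance (TPM); the abundance is never modified."""
    if mode == "no":
        return [row[:] for row in counts]
    if mode == "scaledTPM":
        new_counts = [row[:] for row in abundance]
    elif mode == "lengthScaledTPM":
        # Weight each gene's TPM by its length averaged over samples.
        new_counts = [
            [tpm * (sum(len_row) / len(len_row)) for tpm in tpm_row]
            for tpm_row, len_row in zip(abundance, lengths)
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Rescale each sample so its total matches the original library size.
    counts_sum = column_sums(counts)
    new_sum = column_sums(new_counts)
    return [
        [value * counts_sum[j] / new_sum[j] for j, value in enumerate(row)]
        for row in new_counts
    ]


def summarize(counts, abundance, lengths, mode):
    """Return abundance unchanged alongside the (possibly rescaled) counts."""
    return {
        "abundance": abundance,
        "counts": counts_from_abundance(counts, abundance, lengths, mode),
        "length": lengths,
    }


# Toy data (invented): 3 genes x 2 samples.
abundance = [[200000.0, 500000.0], [300000.0, 250000.0], [500000.0, 250000.0]]
counts = [[100.0, 400.0], [300.0, 250.0], [600.0, 350.0]]
lengths = [[1000.0, 1200.0], [2000.0, 1800.0], [1500.0, 1500.0]]

results = {
    mode: summarize(counts, abundance, lengths, mode)
    for mode in ("no", "scaledTPM", "lengthScaledTPM")
}
```

In this sketch, `results[mode]["abundance"]` is the same for all three modes, while `results[mode]["counts"]` differs between them. That is why the separate counts files remain useful and the extra TPM files do not.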

@grst
Member

grst commented Mar 4, 2021

Ah, now I get it! Thanks for clearing that up!

drpatelh added this to the 3.1 milestone Apr 11, 2021
@drpatelh
Member

Fixed in #598
