Hi, Simon!

Sorry to bother you here, but I think you are my best source of help on this subject.

In your paper "Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity", raw counts are first normalized by contig size, and afterwards RPKM normalization (via edgeR) is applied as a correction for different library sizes.

I am confused by one paper (Rasmussen et al., 2019) that also follows yours, although they state that RPKM normalization is done to account for contig size, not library size:

"Prior any analysis the raw read counts in the vOTU-tables were normalized by reads per kilobase per million mapped reads (RPKM) [48], since the size of the viral contigs is highly variable [49]"

Is it correct to divide counts by contig size and then transform them into RPKM, or should one only do as Rasmussen et al. did?

Again, sorry if this is not the right channel for this question.

Thank you very much.
Sorry for the confusion. In short: Rasmussen et al. 2019 is correct. Basically, RPKM (at least the way we use it) provides two corrections:

By library size (this is the "M", for "million mapped reads").

By contig size (this is the "K", for "kilobase of contig"). Note that the original RPKM in, e.g., RNA-Seq does this per gene, but it is the same idea.

In our benchmark, we test both a simple normalization by contig size alone (i.e. "RPK" only) and a normalization by both contig size and library size (proper RPKM). For simplicity, it is probably best to do as Rasmussen et al. did and directly transform your read-mapping data into RPKM, providing edgeR with (i) the library size and (ii) the contig lengths.
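To make the two corrections concrete, here is a minimal sketch of RPKM for a single library of a vOTU count table. This is an illustration of the standard RPKM formula, not the exact pipeline code from either paper; the function name and example numbers are hypothetical.

```python
def rpkm(counts, contig_lengths_bp):
    """Convert raw mapped-read counts to RPKM for one library.

    counts            -- raw mapped-read counts per contig
    contig_lengths_bp -- contig lengths in base pairs, same order

    RPKM = counts * 1e9 / (contig_length_bp * total_mapped_reads),
    which combines the per-kilobase ("K") and per-million-reads ("M")
    corrections in a single step.
    """
    library_size = sum(counts)  # total mapped reads in this library
    return [
        c * 1e9 / (length * library_size)
        for c, length in zip(counts, contig_lengths_bp)
    ]

# Hypothetical example: two contigs with identical raw counts but a
# 10x length difference; RPKM removes the length bias.
counts = [500, 500]
lengths = [5_000, 50_000]
print(rpkm(counts, lengths))  # the shorter contig gets a 10x higher value
```

The point of the example: with raw counts alone, the two contigs look equally abundant, even though the longer contig accumulates reads simply by being larger; after RPKM, the per-kilobase, per-million-reads values differ by the expected factor of ten.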