
Normalize by contig size before RPKM? #1

Closed
Aciole-David opened this issue Apr 26, 2021 · 2 comments

Comments

@Aciole-David

Hi, Simon!
Sorry to bother you here, but I think you are my best source of help on this subject:

In your paper "Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity", raw counts are first normalized by contig size, and afterwards an RPKM normalization (edgeR) is applied as a correction for different library sizes.

I am confused by another paper (Rasmussen et al., 2019), which also follows yours, although they state that RPKM normalization is done to account for contig size, not library size:

"Prior any analysis the raw read counts in the vOTU-tables were normalized by reads per kilobase per million mapped reads (RPKM) [48], since the size of the viral contigs is highly variable [49]"

Is it correct to divide counts by contig size and then transform them into RPKM, or should I only do as Rasmussen et al. did?

Again, sorry if this is not the right channel for this question.
Thank you very much.

@simroux (Owner) commented Apr 26, 2021

Hi @Aciole-David

Sorry for the confusion; in short: Rasmussen et al. 2019 is correct. Basically, RPKM (at least the way we use it) provides two corrections:

  • By library size (this is the "M", for "million mapped reads")
  • By contig size (this is the "K", for "kilobase of contig"). Note that the original RPKM in e.g. RNA-Seq does this per gene, but it is the same idea.

In our benchmark, we test both a simple normalization by contig size (i.e. a kind of "RPK"-only correction) and a normalization by both contig size and library size (proper "RPKM"). For simplicity, it is probably best to do as Rasmussen et al. did and directly transform your read mapping data into RPKM, providing edgeR with (i) the library size and (ii) the contig length.
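To make the arithmetic concrete, here is a minimal sketch of the RPKM calculation described above, written in plain Python rather than with edgeR; the vOTU names, contig lengths, and read counts are made-up example values, and a real pipeline would normally use the total mapped reads per sample as the library size rather than the column sum of a small table.

```python
# Minimal sketch (not the edgeR implementation): RPKM corrects raw counts
# for contig size ("K") and library size ("M") in a single step.

def rpkm(count, contig_length_bp, total_mapped_reads):
    """Reads Per Kilobase of contig per Million mapped reads."""
    kilobases = contig_length_bp / 1_000        # "K": contig size in kb
    millions = total_mapped_reads / 1_000_000   # "M": library size in millions
    return count / (kilobases * millions)

# Hypothetical vOTU table for one sample.
contig_lengths = {"vOTU_1": 5_000, "vOTU_2": 42_000}  # contig sizes in bp
raw_counts = {"vOTU_1": 1_200, "vOTU_2": 300}         # mapped reads per contig

# Library size: here the column sum of the toy table; in practice you would
# usually pass the total number of reads mapped in the sample.
library_size = sum(raw_counts.values())

rpkm_table = {votu: rpkm(n, contig_lengths[votu], library_size)
              for votu, n in raw_counts.items()}
print(rpkm_table)
```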

Let me know if it makes sense!
Best,
Simon

@Aciole-David (Author)

Simon, it makes perfect sense to me.
You just saved me some hours of discussion here!
Thanks a lot for the quick reply.
Cheers
