Lower than expected coverage for Illumina paired-end reads #22

ilyavs · 2021-05-26T05:50:58Z

Hello,
I am using rasusa extensively in my Illumina paired-end pipelines. I have noticed that I sometimes get significantly lower coverage than the value I pass as a parameter. For example, I subsample to 250x and get 240x. This is somewhat of a problem for my applications.
From an anecdotal check, this might be related to poor-quality reads that I quality-trim down to a minimum length with cutadapt. This results in a highly skewed read-length distribution. Here is an example:

Can this issue be resolved within the scope of this project?
Thanks,
Ilya.

mbhall88 · 2021-06-15T04:58:51Z

When you say you get 240x instead of 250x, is that based on the rasusa logs or on some other way of calculating it?

For paired reads, rasusa gathers the read lengths from the first fastq file (R1) and randomly selects reads until the total length of the sampled reads is half of the required total read length for the given genome size and coverage. It then outputs those sampled reads together with their mates from R2. So there is an implicit assumption that each mate has the same read length as its R1 read. If you're seeing 240x instead of 250x, that suggests your R2 reads had more bases trimmed than your R1 reads...
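The R1-only strategy described above can be sketched in Python to show where the shortfall comes from. This is a hypothetical illustration built from the description in this comment, not rasusa's actual (Rust) implementation; the function name `subsample_r1_only` and its parameters are invented for the example, and each read is represented only by its length.

```python
import random


def subsample_r1_only(r1_lens, r2_lens, genome_size, coverage, seed=42):
    """Hypothetical sketch of the R1-only strategy (not rasusa's code).

    R1 reads are drawn in random order until their summed length reaches
    half of genome_size * coverage; the R2 mates of the chosen reads are
    then kept regardless of how long they are.
    Returns (chosen pair indices, achieved coverage).
    """
    half_target = genome_size * coverage / 2
    order = list(range(len(r1_lens)))
    random.Random(seed).shuffle(order)

    chosen = []
    r1_total = 0
    for i in order:
        if r1_total >= half_target:
            break
        chosen.append(i)
        r1_total += r1_lens[i]

    # Mates are output blindly: the implicit assumption is that each R2
    # read is as long as its R1 read, which trimming can break.
    total = r1_total + sum(r2_lens[i] for i in chosen)
    return chosen, total / genome_size
```

With every R1 read at 150 bp and every R2 mate trimmed to 130 bp, a 10x target on a hypothetical 1,000 bp genome selects 34 pairs and yields 9.52x. More generally, if the sampled R2 reads average a fraction r of their mates' length, the output coverage is roughly the requested coverage times (1 + r)/2; a hypothetical r = 0.92 turns 250x into 240x, the same size of shortfall reported in this issue.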

ilyavs · 2021-06-15T10:17:45Z

Hi,
Thank you for your response. R2 typically has lower quality than R1, so one would expect R2 reads to be shorter after quality trimming. Given your explanation of how rasusa works, I understand the reason for the difference in coverage.
I get 240x instead of 250x by taking the total number of base pairs after subsampling with rasusa and dividing it by the genome size provided to rasusa. If I remember correctly, this matched the rasusa logs, but I don't have them at hand.
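The coverage check described here (total subsampled bases in R1 and R2 divided by the genome size) can be written as a short Python sketch. It assumes plain four-line FASTQ records with each sequence on a single line; `total_bases` and `observed_coverage` are hypothetical helper names, not part of rasusa.

```python
from typing import Iterable


def total_bases(fastq_lines: Iterable[str]) -> int:
    """Sum sequence lengths in a FASTQ file given as an iterable of lines.

    Assumes four-line records with the sequence on the second line of
    each record (no wrapped sequences).
    """
    total = 0
    for line_no, line in enumerate(fastq_lines):
        if line_no % 4 == 1:  # sequence line of each record
            total += len(line.strip())
    return total


def observed_coverage(
    r1_lines: Iterable[str], r2_lines: Iterable[str], genome_size: int
) -> float:
    """Bases in R1 plus bases in R2, divided by the genome size."""
    return (total_bases(r1_lines) + total_bases(r2_lines)) / genome_size
```

An open text-mode file handle iterates line by line, so it can be passed in directly; for gzipped output, `gzip.open(path, "rt")` provides lines the same way.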
Thanks,
Ilya.

ilyavs · 2021-06-15T10:20:04Z

I would suggest summing the read length of R1 and R2 and randomly selecting read pairs until the total length of those sampled reads matches the required total read length for the given genome size and coverage.
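One way to implement this suggestion is to treat each pair as a single unit whose length is len(R1) + len(R2) and to compare the running total against the full target rather than half of it. The Python sketch below represents each read only by its length; `subsample_pairs` is an invented name, and this is not presented as the code that shipped in rasusa.

```python
import random


def subsample_pairs(r1_lens, r2_lens, genome_size, coverage, seed=42):
    """Hypothetical pair-aware sketch of the suggested approach.

    Each pair contributes the combined length of R1 and R2, and pairs are
    drawn in random order until that combined total reaches
    genome_size * coverage.
    Returns (chosen pair indices, achieved coverage).
    """
    if len(r1_lens) != len(r2_lens):
        raise ValueError("R1 and R2 must contain the same number of reads")

    target = genome_size * coverage
    order = list(range(len(r1_lens)))
    random.Random(seed).shuffle(order)

    chosen = []
    total = 0
    for i in order:
        if total >= target:
            break
        chosen.append(i)
        # Count both mates, so shorter (more heavily trimmed) R2 reads
        # are compensated for by drawing more pairs.
        total += r1_lens[i] + r2_lens[i]
    return chosen, total / genome_size
```

With 150 bp R1 reads, 130 bp R2 mates, a hypothetical 1,000 bp genome and a 10x target, this selects 36 pairs for 10.08x. Provided the input holds enough reads, the total can only exceed the target by at most one pair's combined length, however unevenly trimming affected R1 and R2.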

mbhall88 · 2021-06-16T01:55:02Z

Ok, that makes sense then.

> I would suggest summing the read length of R1 and R2 and randomly selecting read pairs until the total length of those sampled reads matches the required total read length for the given genome size and coverage.

This is certainly a more accurate solution to the problem than what I currently do. I'll look into changing the implementation to do this.

ilyavs · 2021-06-16T05:19:45Z

Great. Thank you :)

ilyavs · 2021-07-21T09:33:58Z

Hi,
Any idea when this fix will be implemented and distributed?
Thanks,
Ilya.

mbhall88 · 2021-07-22T02:08:02Z

Sorry @ilyavs, I am currently writing up my PhD thesis, so I don't have a huge amount of spare time. However, this is still very much high on my list of things to do, so hopefully I can get around to it soon.

mbhall88 · 2021-07-29T07:26:09Z

Hi @ilyavs, would you be able to do me a favour and test out b72405e to see if it resolves this? You can either build from source at that commit or, if you are using a container, use the following image:

```
# for docker
quay.io/mbhall88/rasusa:b72405e
# for singularity
docker://quay.io/mbhall88/rasusa:b72405e
```

mbhall88 · 2021-08-09T07:06:43Z

This should be sorted in version 0.4.0. Let me know if there are still problems.

mbhall88 added the enhancement (New feature or request) label Jun 16, 2021

mbhall88 closed this as completed Aug 9, 2021

mbhall88 mentioned this issue Aug 13, 2021: Merge fix for #22 onto master #24 (Merged)

mbhall88 added a commit that referenced this issue Aug 13, 2021: Merge pull request #24 from mbhall88/issue22 (72054f4), "Merge fix for #22 onto master"
