add a module for joining overlapping reads #292

Open
levinas opened this issue Feb 12, 2015 · 3 comments

Comments

@levinas
Contributor

levinas commented Feb 12, 2015

and perhaps include it in the smart/auto recipes.

We have a case where SPAdes seems to be confused by the insert size and gets stuck in the final misassembly correction step: it's a small dataset, but SPAdes has already spent 10x as much time in bwa-spades as in the core assembly. I haven't seen this before.

[bwa_read_seq] 1.5% bases are trimmed.
[bwa_read_seq] 5.2% bases are trimmed.
[bwa_sai2sam_pe_core] convert to sequence coordinate...
[infer_isize] (25, 50, 75) percentile: (21933, 47699, 73351)
[infer_isize] low and high boundaries: 151 and 176187 for estimating avg and std
[infer_isize] inferred external isize from 6214 pairs: 48110.862 +/- 29546.256
[infer_isize] skewness: 0.043; kurtosis: -1.214; ap_prior: 1.00e-05
[infer_isize] inferred maximum insert size: 194069 (4.94 sigma)
[bwa_sai2sam_pe_core] time elapses: 2.27 sec
[bwa_sai2sam_pe_core] changing coordinates of 1553 alignments.
[bwa_sai2sam_pe_core] align unmapped mate...
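
As a rough sanity check on the log above, the `[infer_isize]` line can be parsed and the estimate compared against what a short-insert library should look like. This is only a sketch: the function names and both thresholds (`max_expected`, `max_cv`) are assumptions chosen for illustration, not anything SPAdes or bwa does itself.

```python
import re

# Matches the bwa line "[infer_isize] inferred external isize from N pairs: MEAN +/- STD"
ISIZE_RE = re.compile(
    r"inferred external isize from (\d+) pairs: ([\d.]+) \+/- ([\d.]+)"
)


def parse_isize(line):
    """Return (n_pairs, mean, std) from an infer_isize log line, or None."""
    m = ISIZE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def looks_implausible(mean, std, max_expected=1000.0, max_cv=0.5):
    """Flag an estimate that is far too large or far too spread out.

    max_expected and max_cv are hypothetical thresholds for a short-insert
    paired-end library; tune them for the library at hand.
    """
    return mean > max_expected or (std / mean) > max_cv


line = "[infer_isize] inferred external isize from 6214 pairs: 48110.862 +/- 29546.256"
n_pairs, mean, std = parse_isize(line)
print(n_pairs, mean, std, looks_implausible(mean, std))
```

With the values from this run, the mean is tens of kilobases with a standard deviation over half the mean, so the check flags it.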
@cbun
Contributor

cbun commented Feb 12, 2015

Can you elaborate?

@levinas
Contributor Author

levinas commented Feb 12, 2015

So some paired-end libs contain reads that overlap within most pairs (e.g., the classical Broad 100bp x 2 library with an insert size of 180bp, where the two reads overlap by about 20bp). It's probably always good to join these reads into longer single-end reads, and keep the remaining pairs that don't overlap as pairs, to improve assembly quality. In the past, though, I have not seen SPAdes have trouble handling such libraries. In this case, SPAdes is not estimating the insert size correctly, and that may have drastically slowed down the final misassembly correction step.

Tools that join reads include PEAR, FLASH, and maybe some others.
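
To make the idea concrete, here is a minimal sketch of what joining an overlapping pair means. It is not how PEAR or FLASH work internally (they use quality scores and statistical scoring); it is a naive longest-overlap search, and the function names and the `min_overlap` / `max_mismatch_frac` parameters are assumptions for illustration. It also ignores the case where the insert is shorter than a read.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_overlap(read_len, insert_size):
    """Bases shared by the two reads of a pair (0 if they do not overlap)."""
    return max(0, 2 * read_len - insert_size)


def join_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a pair into one sequence, or return None if no overlap is found.

    read2 is given as sequenced (opposite strand), so it is reverse-
    complemented first. The longest acceptable overlap wins.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail, head = read1[-olen:], mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1 whole and append the non-overlapping part of the mate.
            return read1 + mate[olen:]
    return None  # leave the pair as a pair


print(expected_overlap(100, 180))
```

For the 100bp x 2 library with a 180bp insert, `expected_overlap` gives 20, so a successful join yields a single 180bp read; pairs for which `join_pair` returns `None` would stay paired.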

@cbun
Contributor

cbun commented Feb 12, 2015

Ah okay, it's clear now, thanks.
