Simple loop bridge step is stuck #181

Closed
ptric opened this issue Apr 22, 2019 · 1 comment

@ptric

ptric commented Apr 22, 2019

It's my first time using Unicycler (conda, v0.4.7), for a hybrid Nanopore/Illumina assembly of a 5.5 Mb bacterial genome. The analysis always hangs at the same point: creating simple long-read bridges. While evaluating simple loops, after scoring loops 1 to 3, it produces no further output in the simple_bridging folder; the only file there is the earlier all_segments.fasta. There is no error message and no apparent progress; after 24 hours the cluster kills the job.

I've tried several fixes, all unsuccessful: a fresh conda install of the older Unicycler v0.4.4, a single thread instead of 12, subsampling my Nanopore reads to a minimum length of 5,000 bp, running bold mode, skipping the miniasm steps, and supplying a Canu assembly of 17 contigs instead of Nanopore reads. I also didn't find a solution in issue #118.

A partial log file is below:

Creating SPAdes contig bridges (2019-04-22 13:44:28)
SPAdes uses paired-end information to perform repeat resolution (RR) and produce
contigs from the assembly graph. SPAdes saves the graph paths corresponding to these
contigs in the contigs.paths file. When one of these paths contains two or more anchor
contigs, Unicycler can create a bridge from the path.

Start  Path                                                             End   quality
-61    -160                                                             71    48.7
-54    -160 -> 174                                                      67    55.7
-49    -159 -> -116 -> -136                                             56    22.0
-30    139                                                              40    50.8
-12    107                                                              66    1.1
3      154                                                              -70   63.0
15                                                                      -67   44.7
17     -152                                                             -49   53.4
25                                                                      13    63.1
27     -172                                                             48    61.6
30     171                                                              -57   19.3
36     150 -> -137                                                      -65   6.0
47     -167                                                             68    62.9
60     185 -> 166                                                       35    62.7
68     -167                                                             10    61.9
74     173 -> -125 -> 147                                               78    27.9
75     173 -> -126 -> 147                                               77    25.9
77     -151 -> -101 -> 164 -> 92 -> 165 -> -96 -> 161 -> 128 -> -169    85    9.8
78     -151 -> -100 -> 164 -> -91 -> 165 -> -95 -> 161 -> -127 -> -169  84    9.7
83     -158                                                             -85   62.8
84     158                                                              82    63.0

Creating loop unrolling bridges (2019-04-22 13:44:28)
When a SPAdes contig path connects an anchor contig with the middle contig of a
simple loop, Unicycler concludes that the sequences are contiguous (i.e. the loop is
not a separate piece of DNA). It then uses the read depth of the middle and repeat
contigs to guess the number of times to traverse the loop and makes a bridge.

                                  Loop count   Loop count   Loop     Bridge
Start   Repeat   Middle   End     by repeat    by middle    count    quality
-12     107      66       26      10.29        9.06         9        2.1
47      -167     68       10      0.92         1.06         1        40.2
72      -154     70       -3      1.14         0.97         1        35.3

Loading reads (2019-04-22 13:44:28)
187,322 / 187,322 (100.0%) - 2,627,031,784 bp

Creating simple long read bridges (2019-04-22 13:45:42)
Unicycler uses long read alignments (from minimap) to resolve simple repeat
structures in the graph. This takes care of some "low-hanging fruit" of the graph
simplification.

Aligning long reads to graph using minimap
Saving /gpfs/fs0/scratch/d/dguttman/pstapton/temp/b/unicycler_output_100419/simple_bridging/all_segments.fasta
Number of minimap alignments: 187126

Two-way junctions are defined as cases where two graph contigs (A and B) join
together (C) and then split apart again (D and E). This usually represents a simple
2-copy repeat, and there are two possible options for its resolution: (A->C->D and
B->C->E) or (A->C->E and B->C->D). Each read which spans such a junction gets to "vote"
for option 1, option 2 or neither. Unicycler creates a bridge at each junction for the
most voted for option.

                                                    Op. 1  Op. 2  Neither  Final  Bridge
Junction  Option 1             Option 2             votes  votes  votes    op.    quality
107       -12 -> 107 -> 26,    -12 -> 107 -> 66,    319    1      16       1      94.8
          66 -> 107 -> 66      66 -> 107 -> 26
112       -28 -> 112 -> -41,   -28 -> 112 -> -23,   738    0      9        1      98.6
          -14 -> 112 -> -23    -14 -> 112 -> -41
130       -53 -> 130 -> 2,     -53 -> 130 -> 47,    365    377    25       2      0.0
          53 -> 130 -> 47      53 -> 130 -> 2
154       -70 -> 154 -> -72,   -70 -> 154 -> -70,   742    1      5        1      98.9
          3 -> 154 -> -70      3 -> 154 -> -72
158       84 -> 158 -> -83,    84 -> 158 -> 82,     0      694    16       2      97.1
          85 -> 158 -> 82      85 -> 158 -> -83
167       -68 -> 167 -> -68,   -68 -> 167 -> -47,   2      778    3        2      97.3
          -10 -> 167 -> -47    -10 -> 167 -> -68
168       -76 -> 168 -> -76,   -76 -> 168 -> 46,    10     890    17       2      55.8
          16 -> 168 -> 46      16 -> 168 -> -76

Simple loops are parts of the graph where two contigs (A and B) are connected via a
repeat (C) which loops back to itself (via D). It is possible to traverse the loop zero
times (A->C->B), one time (A->C->D->C->B), two times (A->C->D->C->D->C->B), etc. Long
reads which span the loop inform which is the correct number of times through. In this
step, such reads are found and each is aligned against alternative loop counts. A reads
casts its "vote" for the loop count it agrees best with, and Unicycler creates a bridge
using the most voted for count.

                                  Read                       Loop    Bridge
Start   Repeat   Middle   End     count   Read votes         count   quality
-46     -168     76       -16     320     0 loops: 1 vote    1       83.9
                                          1 loop: 299 votes
                                          2 loops: 11 votes
                                          3 loops: 9 votes
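For readers unfamiliar with the step that stalls: the log text above describes simple-loop resolution as aligning each spanning read against alternative loop counts and letting each read vote for the count it matches best. The Python below is a minimal sketch of that idea under stated assumptions, not Unicycler's code. It uses plain Levenshtein edit distance as a stand-in for a real read-to-sequence aligner, and the names `edit_distance`, `loop_candidate`, `vote_loop_count` and `most_voted` are hypothetical. The same majority tally could equally be applied to the two-way junction votes.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via two-row dynamic programming (stand-in for a real aligner)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def loop_candidate(start: str, repeat: str, middle: str, end: str, count: int) -> str:
    """Sequence for traversing the loop `count` times: A C (D C)^count B."""
    return start + repeat + (middle + repeat) * count + end


def vote_loop_count(read: str, start: str, repeat: str, middle: str, end: str,
                    max_count: int) -> int:
    """Return the loop count whose candidate sequence is closest to the read.

    Ties go to the smallest count, because min() keeps the first minimum
    it sees and the dict preserves insertion order (0, 1, 2, ...).
    """
    scores = {
        n: edit_distance(read, loop_candidate(start, repeat, middle, end, n))
        for n in range(max_count + 1)
    }
    return min(scores, key=scores.get)


def most_voted(votes):
    """Tally the votes; return the winning count and the full tally."""
    tally = Counter(votes)
    winner, _ = tally.most_common(1)[0]
    return winner, tally


if __name__ == "__main__":
    # Toy sequences standing in for contigs A (start), C (repeat), D (middle), B (end).
    A, C, D, B = "ACGTACGT", "GGCC", "TTAA", "CATGCATG"
    exact_one = loop_candidate(A, C, D, B, 1)
    two = loop_candidate(A, C, D, B, 2)
    noisy_two = two[:5] + "T" + two[6:]  # one substitution, as a read error
    reads = [exact_one, exact_one, noisy_two]
    votes = [vote_loop_count(r, A, C, D, B, max_count=3) for r in reads]
    winner, tally = most_voted(votes)
    print(winner, dict(tally))  # 1 {1: 2, 2: 1}
```

Plain edit distance costs time proportional to read length times candidate length, and every read is compared against every candidate count, so the work in this sketch grows quickly with long reads; a practical implementation would more likely use a banded or seed-and-extend aligner.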

@rrwick
Owner

rrwick commented Jan 22, 2022

This seems to be the same problem as #256, which I've fixed in the current version of Unicycler (v0.5.0). Thanks for letting me know!

rrwick closed this as completed on Jan 22, 2022.