Skip to content
martinghunt edited this page Oct 23, 2015 · 8 revisions

Troubleshooting

Sometimes, Circlator may not work as expected with the default settings, and miss some contig merges, or miss circularizing a circular sequence. This page describes the most common reasons for this, and how to interpret the log files to find out what happened.

Important parameters

Circlator is most sensitive to the length of contig ends that are reassembled. This is determined by circlator all --b2r_length_cutoff X when running the whole pipeline, or by circlator bam2reads --length_cutoff X, when just extracting reads mapped to contig ends. The value X determines the cutoff between 'short' and 'long' contigs. All reads mapped to contigs no longer than X are used for reassembly. For contigs longer than X, the reads mapped within X/2 of the contig's ends are used for reassembly. A larger value of X means that more sequence is reassembled, which increases the chances of unique sequence in the reassembly (particularly if the input contigs end in repeats). However, making X very large means that a large proportion of the genome is being reassembled, which could result in a longer run time and more assembly errors. Typically, X should be set to at least twice the read length. Relatively small values of X can result in SPAdes failing to produce an assembly.

The other parameters that are most likely to affect the results are:

  • --merge_min_length_merge: the minimum length of nucmer match;
  • --merge_reassemble_end: the maximum allowed distance between a nucmer match and the end of a reassembly SPAdes contig;
  • --merge_ref_end: the maximum allowed distance between a nucmer match and the end of an input assembly contig.

Contig merging

During iterative contig merging, Circlator writes verbose output to a log file called 04.merge.merge.iterations.log (if you ran circlator all, otherwise the filename has a different prefix if you ran circlator merge). There are two things you can do to check what happened, and why merges were made or missed:

  1. Check in the details of 04.merge.merge.iterations.log.
  2. Use ACT (Artemis Comparison Tool) to compare the current state of the assembly against the SPAdes reassembled contigs. At each iteration, Circlator writes a script that will open ACT showing the comparison of the assemblies relevant to that iteration. When ACT starts, if the "Indexed FASTA Found" message appears, choose "Concatenate Sequences", not "Use index".

The first two lines of each iteration detail the file of nucmer matches being used, and how to view it in ACT. For example, iteration 1 will look like this in the log file:

[merge iterative_merge 1]       Using nucmer matches from 04.merge.merge.iter.1.coords
[merge iterative_merge 1]       You can view the nucmer matches with ACT using: ./04.merge.merge.iter.1.start_act.sh

Using ACT is often the most informative method, since assemblies can vary dramatically, and so the log file cannot describe every possible situation. Some possible scenarios are shown below

Potential pairs of nucmer matches

First, Circlator checks each SPAdes reassembly contig in turn, to see if it can be used to join two of the input assembly contigs. It reports any potential pair of hits in the log file, for example:

[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334: checking nucmer matches
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334: potential pair of hits to merge contigs...
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334:    1   58376   58374   1       58376   58374   99.99   62258   78927   1       unitig_5 NODE_4_length_78927_cov_20.3595_ID_2334
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334:    1436        24901   78927   55467   23466   23461   99.94   24901   78927   1       unitig_12        NODE_4_length_78927_cov_20.3595_ID_2334

This means that one of the matches is at the start of the SPAdes contig NODE_4, and the other is at the end. Also, the orientation and distances from the contig ends are OK. Next, Circlator checks if each of these matches have the correct orientation in the input contigs (in this case unitig_5 and unitig_12), and if there are longer hits to elsewhere in the genome. Based on these checks, the pair of matches can be accepted or rejected.

A successful merge

Lines in the log file showing a successful merge will look like this:

[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334: checking nucmer matches
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334: potential pair of hits to merge contigs...
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334:    1   58376   58374   1       58376   58374   99.99   62258   78927   1       unitig_5 NODE_4_length_78927_cov_20.3595_ID_2334
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334:    1436        24901   78927   55467   23466   23461   99.94   24901   78927   1       unitig_12        NODE_4_length_78927_cov_20.3595_ID_2334
[merge iterative_merge 1]       NODE_4_length_78927_cov_20.3595_ID_2334:    might be used - no longer hits elsewhere and orientation/distance to ends OK

and later in the log:

[merge iterative_merge 1]       Merging contigs unitig_5 and unitig_12 using these two hits:
[merge iterative_merge 1]               1       58376   58374   1       58376   58374   99.99   62258   78927   1       unitig_5 NODE_4_length_78927_cov_20.3595_ID_2334
[merge iterative_merge 1]               1436    24901   78927   55467   23466   23461   99.94   24901   78927   1       unitig_12        NODE_4_length_78927_cov_20.3595_ID_2334

The SPAdes contig NODE_4 was used to merge the input assembly contigs unitig_5 and unitig_7.

No merge made

There must be no longer match from the SPAdes contig to somewhere other than the two input assembly contigs that are being considered for merging. If there is a longer match, the log file contains output like this:

[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    1   50006   49851   1       50006   49851   99.67   837058  92383   1       unitig_7 NODE_2_length_92383_cov_16.1266_ID_1432
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    33430       50001   75814   92383   16572   16570   99.82   195489  92383   1       unitig_2 NODE_2_length_92383_cov_16.1266_ID_1432
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    rejected - there is a longer hit to elsewhere

The matches must also have the correct orientation and not be too far from the input assembly contig ends. If this is not the case, then the log file will look like this:

[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    1   50006   49851   1       50006   49851   99.67   837058  92383   1       unitig_7 NODE_2_length_92383_cov_16.1266_ID_1432
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    33430       50001   75814   92383   16572   16570   99.82   195489  92383   1       unitig_2 NODE_2_length_92383_cov_16.1266_ID_1432
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    rejected - orientation/distance from ends not correct to make a merge

Note that there could be a longer hit to elswehere, and the orientation is bad, in which case both 'rejected' lines appear in the output:

[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    1   50006   49851   1       50006   49851   99.67   837058  92383   1       unitig_7 NODE_2_length_92383_cov_16.1266_ID_1432
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    33430       50001   75814   92383   16572   16570   99.82   195489  92383   1       unitig_2 NODE_2_length_92383_cov_16.1266_ID_1432
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    rejected - there is a longer hit to elsewhere
[merge iterative_merge 1]       NODE_2_length_92383_cov_16.1266_ID_1432:    rejected - orientation/distance from ends not correct to make a merge

Using ACT and changing parameters

It is usually easiest to visualise the results in ACT, in addition to using the information in the log file. If what looks like a plausible merge was missed by Circlator, then the reason should be clear when using ACT. For example, the two ACT screenshots below show a failed contig merge. SPAdes did not produce a contig that spanned the ends of unitig_1 and unitig_7 (in light brown at the top), with them assembled into separate contigs.

However, decreasing the value of --b2r_length_cutoff resulted in a more contiguous assembly (see ACT screenshots below), so that the SPAdes contig NODE_2 (in dark brown at the bottom) can be used to merge the input assembly contigs unitig_1 and unitig_7 (in light brown at the top).

If the merge failed because the nucmer matches were too far from the contig ends, then this would also be clear in ACT. In which case, increasing one (or both) of the options --merge_reassemble_end and --merge_ref_end would fix the problem.


Contig Circularization

Circlator writes a verbose log file containing details of the circularization process. This file is called 04.merge.circularise_details.log (if you ran circlator all, otherwise the filename has a different prefix if you ran circlator merge). Similarly to the merge stage, this file can be used in conjunction with ACT to find out why circularizations were or were not made. The script to run ACT is called

04.merge.circularise.start_act.sh

(it will have a different prefix if circlator merge was used instead of circlator all).

Potential pairs of nucmer matches

First, Circlator checks each input assembly contig for a pair of nucmer matches to a SPAdes reassembly contig, that can be used for circularization. The first section of the log file shows the details of this process. For example:

[merge circularise_details]     unitig_0 Checking 0 nucmer hits
[merge circularise_details]     unitig_1 Checking 3 nucmer hits
[merge circularise_details]     unitig_1 potential pair of nucmer hits for circularization:
[merge circularise_details]     unitig_1         1       50000   42532   92375   50000   49844   99.67   5323329 92375   1       unitig_1 NODE_1_length_92375_cov_16.2004_ID_1471
[merge circularise_details]     unitig_1         5286971 5323329 15341   51485   36359   36145   99.38   5323329 92375   1       unitig_1 NODE_1_length_92375_cov_16.2004_ID_1471
[merge circularise_details]     unitig_1 cannot use this pair because positions/orientations of matches no good
[merge circularise_details]     unitig_2 Checking 1 nucmer hits
[merge circularise_details]     unitig_3 Checking 1 nucmer hits
[merge circularise_details]     unitig_4 Checking 2 nucmer hits
[merge circularise_details]     unitig_4 potential pair of nucmer hits for circularization:
[merge circularise_details]     unitig_4         1       28924   59443   88367   28924   28925   99.97   96564   88367   1       unitig_4 NODE_2_length_88367_cov_17.6052_ID_1473
[merge circularise_details]     unitig_4         28221   96564   1       68346   68344   68346   99.98   96564   88367   1       unitig_4 NODE_2_length_88367_cov_17.6052_ID_1473
[merge circularise_details]     unitig_4 can use this pair of hits

In this example:

  • unitig_0 had no nucmer matches, so cannot be circularized using pairs of matches
  • unitig_1 had a pair of nucmer matches with positions that looked OK in the input contig unitig_1. However, the matches were rejected due to their positions and/or orientations in the reassembled contig (NODE_1).
  • unitig_2 and unitig_3 had one match each, so cannot be circularized using pairs of matches
  • unitig_4 has a pair of matches to NODE_2, which can be used to circularize unitig_4.

The above report only includes matches that are longer than the minimum length cutoff, defined by --merge_min_length_merge, which is 4000 by default.

Circular SPAdes contigs

Next, Circlator checks for any reassembly SPAdes contigs that SPAdes called as circular. This is reported in a line in the log file, like this:

[merge circularise_details]     SPAdes reassembly contigs that are circular: NODE_2_length_88367_cov_17.6052_ID_1473,NODE_4_length_267_cov_1.07333_ID_1477

In this example, NODE_2 and NODE_4 are circular.

Circularization

Finally, Circlator tries to circularize each of the input contigs in turn. The results are reported in the log file. For each input contig, Circlator tries to use a match to a SPAdes circular contig. If this is not possible, then it will use a suitable pair of nucmer matches (reported earlier), if such a pair was found.

Continuing the example from above, the output for unitig_4 looks as follows:

[merge circularise_details]     unitig_4 Trying to circularize. Has nucmer hits to check...
[merge circularise_details]     unitig_4 Checking hit:   1       28924   59443   88367   28924   28925   99.97   96564   88367   1       unitig_4 NODE_2_length_88367_cov_17.6052_ID_1473
[merge circularise_details]     unitig_4         Not using hit to call as circular (hit too short)

then several similar lines, all rejecting the nucmer matches to the circular SPAdes contig NODE_2. No suitable match is found to any circular SPAdes contigs, so the pair of matches reported earlier is used to circularize unitig_4:

[merge circularise_details]     unitig_4 No suitable matches to SPAdes circular contigs
[merge circularise_details]     unitig_4 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details]     unitig_4 Circularizing using this pair of nucmer matches to SPAdes contig:
[merge circularise_details]     unitig_4         1       28924   59443   88367   28924   28925   99.97   96564   88367   1       unitig_4 NODE_2_length_88367_cov_17.6052_ID_1473
[merge circularise_details]     unitig_4         28221   96564   1       68346   68344   68346   99.98   96564   88367   1       unitig_4 NODE_2_length_88367_cov_17.6052_ID_1473
[merge circularise_details]     unitig_4 Circularized: yes

However, unitig_0 could not be circularized:

[merge circularise_details]     unitig_0 Trying to circularize. Has nucmer hits to check...
[merge circularise_details]     unitig_0 Checking hit:   15103   16439   85426   86762   1337    1337    100.00  27823   88367   1       unitig_0 NODE_2_length_88367_cov_17.6052_ID_1473
[merge circularise_details]     unitig_0         Not using hit to call as circular (hit too short)
[merge circularise_details]     unitig_0 No suitable matches to SPAdes circular contigs
[merge circularise_details]     unitig_0 Could not circularize using matches to SPAdes circular contigs
[merge circularise_details]     unitig_0 Cannot circularize: no suitable nucmer hits
[merge circularise_details]     unitig_0 Circularized: no

Note that there is a nucmer match to be checked, despite 0 hits reported earlier, because the earlier report only includes matches above a minimum length (default 4000bp). In this part of the pipeline, all matches are considered (so as not to exclude plasmids shorter than 4000bp)

Finally, here is an example where a SPAdes circular contig (NODE_9) is used to circularize an input contig (unitig_6). In this case, unitig_6 is multiple tandem copies of a plasmid, which SPAdes correctly assembled into one copy (NODE_9). This is the situation viewed in ACT:

The report in the log file begins like this:

[merge circularise_details]     unitig_6 Trying to circularize. Has nucmer hits to check...
[merge circularise_details]     unitig_6 Checking hit:   1       4528    4527    1       4528    4527    99.98   18939   4556    1       unitig_6 NODE_9_length_4556_cov_19.9306_ID_2485
[merge circularise_details]     unitig_6         Hit is long enough. Percent of contig covered by hit is 99.36347673397718

unitig_6 has a match to NODE_9 that covers 99.36 percent of NODE_9 (4527 of 4556 bp). Next, Circlator checks if enough of unitig_6 is covered by one or more nucmer matches to NODE_9:

[merge circularise_details]     unitig_6         reference bases covered by spades contig:(0,18938)
[merge circularise_details]     unitig_6            ... which is 100.0 percent of 18939 bases
[merge circularise_details]     unitig_6         Using hit to call as circular (enough bases covered)
[merge circularise_details]     unitig_6 Circularized using matches to SPAdes circular contigs
[merge circularise_details]     unitig_6 Circularized: yes

and we see that unitig_6 was circularized.