Graph paths

Ryan Wick edited this page Sep 15, 2015 · 11 revisions

In Bandage, a path is a means of specifying a sequence which extends through multiple nodes. You can use paths to extract sequences, and Bandage also uses paths to describe the location of BLAST queries (see BLAST searches).

Syntax

[Figure: Path syntax]

Note the following:

  • Node names must be exact and end with '+' or '-' (see single vs double node style).
  • Node positions use a 1-based index: position 1 is the first base in a node's sequence, and the position of a node's last base equals the length of its sequence.
  • A path is only valid if the graph contains the edges needed to connect the nodes in the specified order.
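As a rough illustration of the last rule, the check below treats a path as valid when every consecutive pair of nodes is joined by an edge. It is a minimal Python sketch, not Bandage's implementation; the edge set and the function name `is_valid_path` are hypothetical.

```python
# Hypothetical graph representation: directed edges as (from_node, to_node)
# pairs, where node names carry their '+' or '-' suffix.
def is_valid_path(nodes: list[str], edges: set[tuple[str, str]]) -> bool:
    """Return True if each node in the path is connected to the next by an edge."""
    return all((a, b) in edges for a, b in zip(nodes, nodes[1:]))

# Made-up edges for illustration only.
edges = {("9+", "12-"), ("12-", "8+"), ("8+", "12-"), ("12-", "3-")}
print(is_valid_path(["9+", "12-", "8+", "12-", "3-"], edges))  # True
print(is_valid_path(["9+", "3-"], edges))                      # False: no 9+ -> 3- edge
```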

Examples:

  • 9+, 12-
    • The entirety of node 9+, followed by the entirety of node 12-
  • (51) 9+, 12-
    • From position 51 to the end of node 9+, followed by the entirety of node 12-
  • (51) 9+, 12- (87)
    • From position 51 to the end of node 9+, followed by the first 87 bases of node 12-
  • 9+, 12-, 8+, 12-, 3-
    • This path contains a loop and includes the sequence for node 12- twice.
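The syntax in these examples can be parsed with a few lines of code. The sketch below is a hypothetical Python parser (`parse_path` and `PathSpec` are illustrative names, not part of Bandage) that splits a path string into an optional 1-based start position, the list of node names, and an optional end position. It assumes node names contain no whitespace, commas, or parentheses.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class PathSpec:
    start: Optional[int]  # 1-based start position in the first node (None = whole node)
    nodes: list[str]      # node names, each ending in '+' or '-'
    end: Optional[int]    # 1-based end position in the last node (None = whole node)

# Assumption: node names contain no whitespace, commas or parentheses.
NODE_RE = re.compile(r"[^\s,()]+[+-]")

def parse_path(text: str) -> PathSpec:
    text = text.strip()
    start = end = None
    # Optional "(N)" before the first node.
    m = re.match(r"\((\d+)\)\s*", text)
    if m:
        start = int(m.group(1))
        text = text[m.end():]
    # Optional "(N)" after the last node.
    m = re.search(r"\s*\((\d+)\)$", text)
    if m:
        end = int(m.group(1))
        text = text[:m.start()]
    nodes = [n.strip() for n in text.split(",")]
    for n in nodes:
        if not NODE_RE.fullmatch(n):
            raise ValueError(f"invalid node name: {n!r}")
    if 0 in (start, end):
        raise ValueError("positions are 1-based")
    return PathSpec(start, nodes, end)
```

For example, `parse_path("(51) 9+, 12- (87)")` returns `PathSpec(start=51, nodes=['9+', '12-'], end=87)`. Because positions are 1-based and inclusive, applying them to a node sequence `seq` corresponds to the Python slice `seq[start - 1:end]`.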

Exporting path sequences

Simple paths

In Bandage, you can easily output path sequences for selected nodes.

[Figure: Unambiguous path selection]

If the selected nodes form an unambiguous path, you can copy its sequence to the clipboard or save it to a file using the options in Bandage's 'Output' menu.

[Figure: Copy/save node path sequence]

The resulting path sequence will contain the entirety of the constituent nodes.
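For a simple path, the sequence is the full node sequences joined in order. The sketch below assumes, as in Bandage's double node style, that a '-' node is the reverse complement of the corresponding '+' node; the `forward` dictionary and the function names are hypothetical, and overlaps (see Overlaps) are ignored.

```python
# Hypothetical store: only the '+' strand of each node is kept; the '-' strand
# is derived from it as the reverse complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def node_sequence(name: str, forward: dict[str, str]) -> str:
    base, strand = name[:-1], name[-1]  # e.g. "12-" -> ("12", "-")
    seq = forward[base]
    return seq if strand == "+" else reverse_complement(seq)

def simple_path_sequence(nodes: list[str], forward: dict[str, str]) -> str:
    # Whole nodes concatenated in path order (no overlap handling).
    return "".join(node_sequence(n, forward) for n in nodes)

forward = {"9": "ACGTT", "12": "GGCA"}  # made-up sequences
print(simple_path_sequence(["9+", "12-"], forward))  # ACGTTTGCC
```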

Complex paths

If you wish to export the sequence for a more complex path (containing loops, start/end positions, etc.), the above approach will not work. Instead, you must select 'Specify exact path for copy/save' from the 'Output' menu.

[Figure: Specify exact path]

This will open a new window where you can define a path using the syntax described above. As a shortcut, you can double-click on a node in the visualisation to add it to the path. Bandage will show your specified path by shading it in the visualisation.

[Figure: Complex path]

Since it is necessary to specify exact node names, it may be helpful to first draw the graph in double node style (see single vs double node style).

Overlaps

In graphs made by some assemblers, nodes connected by an edge have overlapping sequences (see assembler differences). Where such overlaps are present, Bandage removes them when creating a path sequence, so a path sequence may be shorter than the combined length of its constituent nodes.

[Figure: Path overlap]
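A minimal sketch of this overlap removal, assuming every edge in the graph has the same fixed overlap length and that the node sequences are already in path orientation. `join_with_overlap` is a hypothetical helper, not Bandage's code.

```python
def join_with_overlap(seqs: list[str], overlap: int) -> str:
    """Concatenate oriented node sequences, dropping the `overlap` bases each
    node shares with the end of the previous node (assumed fixed length)."""
    if not seqs:
        return ""
    parts = [seqs[0]]
    for prev, cur in zip(seqs, seqs[1:]):
        # Sanity check: both nodes should really share the overlapping bases.
        if overlap and prev[-overlap:] != cur[:overlap]:
            raise ValueError("adjacent nodes do not share the expected overlap")
        parts.append(cur[overlap:])
    return "".join(parts)

seqs = ["ACGTAC", "TACGGA"]  # made-up nodes sharing a 3-base overlap, "TAC"
path = join_with_overlap(seqs, 3)
print(path, len(path), sum(map(len, seqs)))  # ACGTACGGA 9 12
```

The path sequence (9 bases) is shorter than the two node sequences combined (12 bases) by exactly one overlap.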

Circular paths

In the 'Specify exact path' window, there is a tick box for 'Circular path'. A circular path forms a loop where the sequence at the end directly leads into the sequence at the beginning. This is useful for extracting circular sequences from an assembly graph, such as bacterial chromosomes or plasmids. Circular paths, by definition, include the entirety of their constituent nodes and therefore cannot have start/end positions.

[Figure: Circular path tick box]

The difference between a circular path and a linear path containing the same nodes is subtle, and it only really matters for graphs in which node sequences overlap (see assembler differences).

Consider two nodes that form a loop in the graph and therefore overlap at both ends. If you make a linear path from the two nodes, the overlap between them is removed in the middle, but the start of the sequence still overlaps with its end:

[Figure: Circular path 1]

In contrast, if a circular path is made with the same two nodes, then the overlap between start and end will also be removed, resulting in a sequence that forms a perfect loop:

[Figure: Circular path 2]
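The two cases can be sketched as follows, again assuming a fixed overlap length and node sequences already in path orientation (`linear_path` and `circular_path` are hypothetical names, not Bandage functions). The two made-up nodes below come from the 8-base circular sequence `ACGTTGCA`, split in half with a 2-base overlap at each junction.

```python
def linear_path(seqs: list[str], overlap: int) -> str:
    # Overlap removed only between consecutive nodes.
    return seqs[0] + "".join(s[overlap:] for s in seqs[1:])

def circular_path(seqs: list[str], overlap: int) -> str:
    # Also remove the overlap between the last node and the first node,
    # so the end of the sequence leads straight back into its start.
    linear = linear_path(seqs, overlap)
    return linear[:len(linear) - overlap]  # safe when overlap == 0

nodes = ["ACGTTG", "TGCAAC"]  # node 1 ends with "TG", node 2 ends with "AC"
print(linear_path(nodes, 2))    # ACGTTGCAAC
print(circular_path(nodes, 2))  # ACGTTGCA
```

The linear path is 10 bases long and both begins and ends with `AC`, while the circular path is the original 8-base loop.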
