In Bandage, a path is a means of specifying a sequence which extends through multiple nodes. You can use paths to extract sequences, and Bandage also uses paths to describe the location of BLAST queries (see BLAST searches).
Note the following:
* The node names must be exact and end with a '+' or '-' (see single vs double node style).
* Node positions use a 1-based index. I.e. position 1 is the first base in a node's sequence and the position of a node's last base is equal to the length of its sequence.
* A path is only valid if the necessary edges exist in the graph to connect the sequences in the specified order.
9+, 12-
- The entirety of node 9+, followed by the entirety of node 12-
(51) 9+, 12-
- From position 51 to the end of node 9+, followed by the entirety of node 12-
(51) 9+, 12- (87)
- From position 51 to the end of node 9+, followed by the first 87 bases of node 12-
9+, 12-, 8+, 12-, 3-
- This path contains a loop and includes the sequence for node 12- twice.
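To make the syntax concrete, here is a minimal Python sketch of how such a path string could be read and checked. It is not Bandage's own code: `parse_path` and `is_connected` are hypothetical helpers, and the graph is assumed to be given as a set of directed `(from_node, to_node)` pairs that already lists every edge, including reverse-complement edges.

```python
import re

# Hypothetical helpers for illustration only; not part of Bandage.
# Optional "(start)" prefix, comma-separated node names, optional "(end)" suffix.
_PATH_RE = re.compile(r"^\s*(?:\((\d+)\)\s*)?(.*?)\s*(?:\((\d+)\))?\s*$")


def parse_path(text):
    """Return (start, node_names, end); start and end are 1-based or None."""
    match = _PATH_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid path: {text!r}")
    start, body, end = match.groups()
    nodes = [name.strip() for name in body.split(",")]
    for name in nodes:
        # Every node name must carry an explicit strand sign.
        if not name or name[-1] not in "+-":
            raise ValueError(f"node name must end with '+' or '-': {name!r}")
    return (int(start) if start else None, nodes, int(end) if end else None)


def is_connected(nodes, edges):
    """True if each consecutive pair of nodes is joined by an edge in `edges`."""
    return all((a, b) in edges for a, b in zip(nodes, nodes[1:]))


print(parse_path("(51) 9+, 12- (87)"))  # (51, ['9+', '12-'], 87)
```

A path that visits a node twice, such as `9+, 12-, 8+, 12-, 3-`, parses into a list in which `12-` simply appears twice; the connectivity check treats each consecutive pair independently.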
Exporting path sequences
In Bandage, you can easily output path sequences for selected nodes.
If the selected nodes form an unambiguous path, then you can copy the sequence to the clipboard or save it to a file using the options in Bandage's 'Output' menu.
The resulting path sequence will contain the entirety of the constituent nodes.
If you wish to export the sequence for a more complex path (containing loops, start/end positions, etc.), the above approach will not work. Instead, you must select 'Specify exact path for copy/save' from the 'Output' menu.
This will open a new window where you can define a path using the syntax described above. As a shortcut, you can double-click on a node in the visualisation to add it to the path. Bandage will show your specified path by shading it in the visualisation.
Since it is necessary to specify exact node names, it may be helpful to first draw the graph in double node style (see single vs double node style).
In graphs made by some assemblers, nodes connected by an edge have overlapping sequences (see assembler differences). Where such overlaps are present, Bandage removes them when creating a path sequence, so a path sequence may be shorter than the combined length of its constituent nodes' sequences.
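The effect of overlap removal on path length can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Bandage's implementation: the function name `linear_path_sequence` is made up, every edge is assumed to have the same overlap length, and the optional start and end positions follow the 1-based convention described earlier.

```python
def linear_path_sequence(node_seqs, overlap=0, start=None, end=None):
    """Join node sequences in path order, keeping each shared overlap once.

    start: 1-based position in the first node (None means its first base).
    end:   1-based position in the last node (None means its last base).
    Assumes a single, fixed overlap length for every edge.
    """
    if not node_seqs:
        raise ValueError("a path needs at least one node")
    full = node_seqs[0]
    for prev, seq in zip(node_seqs, node_seqs[1:]):
        if overlap and prev[-overlap:] != seq[:overlap]:
            raise ValueError("adjacent nodes do not share the expected overlap")
        full += seq[overlap:]  # skip the bases already contributed by `prev`
    first = 0 if start is None else start - 1
    # The last node occupies the final len(node_seqs[-1]) bases of `full`.
    last = len(full) if end is None else len(full) - len(node_seqs[-1]) + end
    return full[first:last]


# Two 6-base nodes sharing a 3-base overlap give 9 bases, not 12.
print(linear_path_sequence(["ACGTAC", "TACGGA"], overlap=3))  # ACGTACGGA
print(linear_path_sequence(["ACGTAC", "TACGGA"], overlap=3, start=3, end=5))  # GTACGG
```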
In the 'Specify exact path' window, there is a tick box for 'Circular path'. A circular path forms a loop where the sequence at the end directly leads into the sequence at the beginning. This is useful for extracting circular sequences from an assembly graph, such as bacterial chromosomes or plasmids. Circular paths, by definition, include the entirety of their constituent nodes and therefore cannot have start/end positions.
The difference between a circular path and a linear path that contain the same nodes is subtle, and the distinction only really matters for graphs where the node sequences overlap (see assembler differences).
Consider two nodes which make a loop in the graph and therefore have overlaps on both ends. If you make a linear path from the two nodes, the overlap will be removed in the middle, but the start will still overlap with the end.
In contrast, if a circular path is made with the same two nodes, then the overlap between start and end will also be removed, resulting in a sequence that forms a perfect loop.
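The two-node case can be worked through with a small Python sketch. The sequences and the 3-base overlap are invented for illustration, and `join_nodes` and `circular_path_sequence` are hypothetical names, not Bandage functions; a single fixed overlap length is assumed for every edge.

```python
def join_nodes(node_seqs, overlap):
    """Linear path: remove the overlap between each pair of adjacent nodes."""
    joined = node_seqs[0]
    for seq in node_seqs[1:]:
        joined += seq[overlap:]
    return joined


def circular_path_sequence(node_seqs, overlap):
    """Circular path: also remove the overlap between the last and first node."""
    linear = join_nodes(node_seqs, overlap)
    # Slice by length so that overlap == 0 keeps the whole sequence.
    return linear[:len(linear) - overlap]


# Node A ends with "TTT", which starts node B; node B ends with "ACG",
# which starts node A, so the two nodes form a loop.
nodes = ["ACGTTT", "TTTGGACG"]

linear = join_nodes(nodes, 3)
print(linear)    # ACGTTTGGACG  (first 3 bases repeat as the last 3)

circular = circular_path_sequence(nodes, 3)
print(circular)  # ACGTTTGG     (end now leads directly into start)
```

In this example the linear result is 11 bases long and repeats `ACG` at both ends, while the circular result is 8 bases long and contains each base of the loop exactly once.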