
Subcommand: graft

Lucas Czech edited this page Sep 26, 2022 · 15 revisions

Make a tree with each of the query sequences represented as a pendant edge.

Usage: gappa examine graft [options]

Options

Input
--jplace-path Required. TEXT:PATH(existing)=[] ...
List of jplace files or directories to process. For directories, only files with the extension .jplace[.gz] are processed.
Settings
--fully-resolve FLAG
If set, branches that contain multiple pqueries are resolved by creating a new branch for each of the pqueries individually, placed according to their distal/proximal lengths. If not set (default), all pqueries at one branch are collected in a subtree that branches off from the branch.
--name-prefix TEXT
Specify a prefix to be added to all new leaf nodes, i.e., to the query sequence names.
Output
--out-dir TEXT=.
Directory to write output files to.
--file-prefix TEXT
File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--file-suffix TEXT
File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
Newick Tree Output
--newick-tree-quote-invalid-chars FLAG
If set, node labels that contain characters that are invalid in the Newick format (i.e., spaces and :;()[],{}) are put into quotation marks. If not set (default), these characters are instead replaced by underscores, which changes the names, but works better with most downstream tools.
Global Options
--allow-file-overwriting FLAG
Allow existing output files to be overwritten instead of aborting the command.
--verbose FLAG
Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to standard output to the terminal.
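As a rough illustration of how the options above combine, the following Python sketch assembles a command line for this subcommand. It uses only flags documented on this page. The helper name `build_graft_command` and its parameters are hypothetical, and the sketch only builds the argument list; it does not run gappa.

```python
def build_graft_command(jplace_paths, out_dir=".", fully_resolve=False, name_prefix=None):
    """Build an argument list for `gappa examine graft` (hypothetical helper)."""
    # --jplace-path accepts one or more files or directories.
    cmd = ["gappa", "examine", "graft", "--jplace-path", *jplace_paths]
    cmd += ["--out-dir", out_dir]
    if fully_resolve:
        # Create one new branch per pquery instead of a collecting subtree.
        cmd.append("--fully-resolve")
    if name_prefix:
        # Prefix added to all new leaf nodes (query sequence names).
        cmd += ["--name-prefix", name_prefix]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_graft_command(["sample.jplace"], fully_resolve=True)))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where gappa is installed.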

Description

The command takes the reference tree of the provided jplace file(s) and, for each pquery, attaches a new leaf node to the tree, positioned according to the proximal length and pendant length of its most likely placement. The resulting tree gives an overview of the distribution of placements. It is mainly intended for viewing a small number of placements; for large samples, the tree can become cluttered.
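To make "most likely placement" concrete, here is a minimal Python sketch that picks the placement with the highest like_weight_ratio from one pquery of a jplace document. It assumes the standard jplace layout (a top-level "fields" list, and per pquery a "p" list of placement rows); the function name `best_placement` is hypothetical, and this is not gappa's own code.

```python
def best_placement(fields, pquery):
    """Return the placement row with the highest like_weight_ratio as a dict."""
    lwr_index = fields.index("like_weight_ratio")
    row = max(pquery["p"], key=lambda r: r[lwr_index])
    return dict(zip(fields, row))


if __name__ == "__main__":
    # Example data in jplace style; the values are made up for illustration.
    fields = ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"]
    pquery = {
        "p": [
            [1, -1234.5, 0.25, 0.01, 0.05],
            [3, -1230.1, 0.75, 0.02, 0.04],
        ],
        "n": ["Query 1"],
    }
    print(best_placement(fields, pquery))
```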

Similar trees are produced by RAxML-EPA, where the file is called RAxML_labelledTree, and by the guppy tog command. The two programs differ in exactly how the placements are added as edges. To choose between these behaviours, use the --fully-resolve option.

Details

The provided jplace files are processed individually, producing one Newick tree for each of them. Each output file is named after its input file, with the file extension replaced by .newick.
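The naming rule can be sketched as follows. This is an illustration only: the function name `newick_name` is hypothetical, and the handling of a trailing .gz is an assumption based on the accepted input extension .jplace[.gz]. The effect of --file-prefix and --file-suffix is not modelled here.

```python
from pathlib import Path


def newick_name(jplace_path):
    """Derive an output file name by replacing the jplace extension with .newick."""
    name = Path(jplace_path).name
    # Assumption: a compressed input drops the .gz part as well.
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    if name.endswith(".jplace"):
        name = name[: -len(".jplace")]
    return name + ".newick"


if __name__ == "__main__":
    print(newick_name("data/sample.jplace"))
```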

Important remark: The grafting simply attaches the pqueries to the tree at their most likely placement position. The phylogeny of the pqueries themselves, however, is not resolved at all.

Without --fully-resolve

If --fully-resolve is not provided (default), all placements at one edge are collected as children of one central base edge:

Multifurcating grafted tree.

This method is similar to the way RAxML-EPA produces a grafted tree, which it calls a "labelled tree".

The base edge is positioned on the original edge at the average proximal_length of the placements. The base edge has a multifurcation if there are more than two placements on the edge.

The pendant length of the placements is used to calculate the branch length of the new placement edges. This calculation subtracts the shortest pendant length of the placements on the edge, so that the base edge is maximally "moved" towards the placement edges. This also implies that at least one of the placement edges has branch length == 0.0. Furthermore, the placements are sorted by their pendant length.

Using this method, the new nodes of the resulting tree are easier to distinguish and collapse, as all placements are collected as children branching off from the base edge. However, this comes at the cost of losing the detailed information about the proximal length of each placement. If you want to keep this information, use --fully-resolve instead.
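The calculation for the default mode can be sketched in a few lines of Python. This is a reading of the description above, not gappa's implementation: placements are given as hypothetical (name, proximal_length, pendant_length) tuples, and using the shortest pendant length as the length of the base edge is an inference from the statement that this value is subtracted from every placement edge.

```python
def graft_multifurcating(placements):
    """Collect all placements of one edge under a single base edge.

    placements: list of (name, proximal_length, pendant_length) tuples.
    Returns (base_position, base_edge_length, children), where children is a
    list of (name, branch_length) sorted by branch length.
    """
    # The base edge attaches at the average proximal length.
    base_position = sum(p[1] for p in placements) / len(placements)
    # Subtract the shortest pendant length from every placement edge,
    # so at least one child ends up with branch length 0.0.
    min_pendant = min(p[2] for p in placements)
    children = sorted(
        ((name, pendant - min_pendant) for name, _, pendant in placements),
        key=lambda child: child[1],
    )
    return base_position, min_pendant, children


if __name__ == "__main__":
    example = [("Query 1", 0.25, 0.5), ("Query 2", 0.75, 0.125), ("Query 3", 0.5, 0.25)]
    print(graft_multifurcating(example))
```

Note that the individual proximal lengths do not appear in the result; only their average does.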

With --fully-resolve

If --fully-resolve is provided, all placements per branch are turned into individual single leaf nodes:

Fully resolved grafted tree.

This method is similar to the way guppy tog produces a grafted tree.

The original edge is split into separate parts where each placement edge is attached. The branch lengths between those parts are calculated using the proximal length of the placements, while the branch lengths of the placement edges use their pendant length.

This method gives the most detailed information, but results in a more crowded tree. The new placement edges are "sorted" along the original edge by their proximal length. For this reason, in the example image above, "Query 2" is closer to "Node A" than "Query 1": it has a higher proximal length. This information was lost in the multifurcating tree shown before (without --fully-resolve).
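A minimal Python sketch of the splitting step follows, under stated assumptions: placements are hypothetical (name, proximal_length, pendant_length) tuples, the proximal length is measured from the start of the original edge, and every proximal length lies within the edge length. It illustrates the description above and is not gappa's implementation.

```python
def graft_fully_resolved(edge_length, placements):
    """Split one edge so that each placement gets its own attachment point.

    Returns (segments, leaves): the branch lengths of the pieces of the
    original edge, in order, and the (name, pendant_length) leaf attached
    after each piece except the last.
    """
    # Sort the placements along the edge by proximal length.
    ordered = sorted(placements, key=lambda p: p[1])
    segments = []
    leaves = []
    previous = 0.0
    for name, proximal, pendant in ordered:
        # Piece of the original edge between two attachment points.
        segments.append(proximal - previous)
        # The placement edge keeps its full pendant length.
        leaves.append((name, pendant))
        previous = proximal
    # Remaining piece up to the far end of the original edge.
    segments.append(edge_length - previous)
    return segments, leaves


if __name__ == "__main__":
    print(graft_fully_resolved(1.0, [("Query 2", 0.75, 0.125), ("Query 1", 0.25, 0.5)]))
```

Two placements with identical proximal lengths produce a piece of length 0.0 between their attachment points.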

Further Details

For edges that contain only a single placement (or none at all), both versions (with and without --fully-resolve) behave the same. In this case, the placement is simply attached using its proximal length and pendant length.

Pqueries with multiple names are treated as if each name were a separate placement, i.e., for each name, a new (identical) edge is added to the tree. With --fully-resolve, this results in a branch length of 0.0 between the nodes of those placements.

--name-prefix

Specify a prefix to be added to all new leaf nodes (the ones that represent placements). This is useful if a pquery name also occurs as a name in the original tree. The default is an empty prefix. To get the same naming as in the grafted trees produced by RAxML, use --name-prefix "QUERY___".
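The effect of the prefix on leaf names can be illustrated with a short Python sketch. The function name `grafted_leaf_names` is hypothetical, and the sketch only mimics the naming described here: one new leaf per pquery name, each with the prefix prepended.

```python
def grafted_leaf_names(pquery_names, prefix=""):
    """Return one leaf name per pquery name, with the prefix prepended."""
    return [prefix + name for name in pquery_names]


if __name__ == "__main__":
    reference_leaves = {"Seq1", "Seq2"}
    # "Seq1" is also the name of a leaf in the reference tree.
    plain = grafted_leaf_names(["Seq1", "Read7"])
    prefixed = grafted_leaf_names(["Seq1", "Read7"], prefix="QUERY___")
    print(reference_leaves & set(plain))     # names that clash without a prefix
    print(reference_leaves & set(prefixed))  # names that clash with the prefix
```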

Citation

When using this method, please do not forget to cite

Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070
