Simon Penel edited this page May 27, 2024 · 145 revisions



Feel free to get in touch if you find a bug or if you have a suggestion, a question or a request


Introduction

The history of a species is closely related to the history of its genes. Connecting the evolution of a genome to the evolution of its genes is a way to describe this relationship. In this context, reconciling the genes with the species consists in mapping the nodes of a gene tree and the associated events (speciation, duplication, loss, transfer) to the nodes of the species tree. Reconciliation can also be used to map the history of a parasite to the history of a host, or to map the history of a protein domain to the history of a sequence.

In the remainder of this document, we adopt gene/species (for two levels) and gene/symbiont/host (for three levels) as a vocabulary, keeping in mind that everything we present applies to reconciliations in general.

Reconciliation is a complex task and many programs are dedicated to it. Recently the XML format recPhyloXML (Duchemin et al. 2018), inspired by the phyloXML format (Han and Zmasek, 2009), has been proposed as a standard to describe phylogenetic reconciliations.

Visualisation of phylogenetic reconciliations is offered by various programs and interfaces such as NOTUNG (Chen et al., 2000), SylvX (Chevenet et al., 2016), Treerecs (Comte et al., 2020), Jane (Conow et al., 2010), eMPRess (Santichaivekin et al., 2021) and Capybara (Wang et al., 2020). However, with the exception of SylvX, all are integrated in a specific reconciliation program and cannot visualise reconciliations produced by others. None of these programs handles recPhyloXML input files, and none of them is generic to any kind of reconciliation (for example SylvX does not allow temporarily free-living symbionts, since genes are not allowed to live outside a genome), nor can they handle multiple horizontal transfers (i.e. several genes transferred with the same donor and recipient) or the consideration of numerous possible scenarios. DoubleRecViz (Kuitche et al., 2021) uses a derived version of recPhyloXML, adding a transcript level to the gene and species format, but without support for horizontal transfers.

Finally, no software is able to combine two nested reconciliations, i.e. to show in a single representation the gene/symbiont reconciliation and the symbiont/host reconciliation.

Here we present Thirdkind, a very simple command-line program allowing the user to easily generate graphical output (svg) from one or several recPhyloXML files with a large choice of options (for example orientation, font size, branch length, multiple gene trees, multiple species trees, multiple files, redundant transfers handling, etc.) and to handle the display of two nested reconciliations.

Methods

We use recPhyloXML, a format which has recently been proposed to describe the reconciliation between a gene (or a symbiont or a domain) and a species (or a host or a sequence). Thirdkind is written in Rust and is thus very easy to install. Thirdkind uses a Rust API we developed to handle phylogenetic trees: light_phylogeny. This API may be used to write Rust code that reads newick, phyloXML and recPhyloXML files and builds, modifies and displays phylogenetic trees.

Results

The program Thirdkind is available at the Rust community’s crate registry: https://crates.io/crates/thirdkind


Source code and input file examples are available here: https://github.com/simonpenel/thirdkind

Graphical interface

A web server dedicated to Thirdkind is available here: http://thirdkind.univ-lyon1.fr/ It focuses on recPhyloXML files and reconciliations.

Installation

To install Thirdkind, you need to install cargo. For Linux and macOS systems type:

curl https://sh.rustup.rs -sSf | sh

For Windows see: https://doc.rust-lang.org/cargo/getting-started/installation.html

Note: Since Rust does not include its own linker yet, building thirdkind requires a C compiler such as gcc to act as the linker. If none is installed, install the build tools needed by Rust (here on a Debian-based system):

sudo apt install build-essential

Once Cargo is installed, just open a new terminal and then type:

cargo install thirdkind

To check that Thirdkind is installed type:

thirdkind

Alternatively it is possible to install Thirdkind from the sources available here with the command cargo build

Contact

https://lbbe.univ-lyon1.fr/fr/annuaire-des-membres/penel-simon

Citation

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac062/6525213

Usage

Usage: thirdkind [OPTIONS] --input-file <INPUT_FILE>

Options:
  -a, --output-transfer-analysis
          Display transfers analysis
  -A, --starting-node <STARTING_NODE>
          Display transfers starting from this node only
  -b, --browser
          Open svg in browser
  -B, --display-br-length
          With option -l, display branch length
  -c, --conf-file <CONF_FILE>
          Use configuration file
  -C, --gene-colors <GENE_COLORS>
          Define colors for gene trees. For example: "red,violet,#4A38C4,orange
  -d, --gene-fontsize <GENE_FONTSIZE>
          Set font size for gene trees
  -D, --species-fontsize <SPECIES_FONTSIZE>
          Set font size for species trees
  -e, --free-living-sup
          "free living" option : nodes associated to FREE_LIVING are drawned in an external tree and superposed in case of multiple genes
  -E, --free-living-shi
          "free living" option : nodes associated to FREE_LIVING are drawned in an external tree and shifted in case of multiple genes
  -f, --input-file <INPUT_FILE>
          Input tree file (accepted format: newick, phyloXML, recPhyloXML)
  -F, --format <FORMAT>
          Force format phyloXML/recPhyloXML
  -g, --nested <NESTED>
          1st level input file (for example a gene-symbiote file with -f defining a 2nd level symbiote-host file)
  -G, --gene-phylo <GENE_PHYLO>
          Display the gene number <GENE_PHYLO> in phyloxml style (no species tree)
  -H, --height <HEIGHT>
          Height:  multiply the tree height by factor <HEIGHT>
  -i, --internal-gene-node
          Display internal gene node names
  -I, --internal-species-node
          Display internal species node names
  -J, --display-transfers-abundance
          With option -t, display the abundance of redudant transfers
  -k, --symbol-size <SYMBOL_SIZE>
          Size of the circles, crosses, squares, etc
  -K, --bezier <BEZIER>
          Bezier parameter: curvature of the transfers and branches leading to free living organisms
  -l, --branch-length <BRANCH_LENGTH>
          Use branch length, multiplied by the given factor
  -L, --landscape
          Display as landscape
  -m, --multiple
          The input file (-f) is a list of recphyloxml files
  -M, --midway
          Display duplication node at midway in the branch
  -n, --gene-tree-list <GENE_TREE_LIST>
          List of the indexes of the gene trees to be displayed. For example: 1,2,6,9
  -N, --ending-node <ENDING_NODE>
          Display transfers ending to this node only
  -o, --output <OUTPUT>
          Set the name of the output file or the prefix of the output files
  -O, --optimise
          Switching nodes in order to minimise transfer crossings (under development)
  -p, --uniform
          Species tree uniformisation. All the branches of species have the same width
  -q, --node-colors <NODE_COLORS>
          Nodes to be coloured : the descendants of each nodes will be drawn with a different colour. For example: "m3,m25,m36"
  -Q, --background <BACKGROUND>
          Background colour
  -r, --ratio <RATIO>
          Set the ratio between width of species and gene tree. Default is 1.0, you usualy do not need to change it
  -s, --species-only
          Display species tree only in phyloxml style
  -S, --node-support
          Display node support
  -t, --threshold <THRESHOLD>
          Redudant transfers are displayed as one, with opacity according to abundance and only if abundance is higher than <THRESHOLD>.
          Only one gene is displayed
  -T, --threshold-select <THRESHOLD_SELECT>
          With option -t, select the index of the gene to display. If set to 0, no gene is displayed
  -u, --threshold-nested <THRESHOLD_NESTED>
          With -g, same as -t, but apply to the '-f' input file, and -t will apply to the '-g' file
  -U, --threshold-nested-select <THRESHOLD_NESTED_SELECT>
          Same as -T with -t, but for -u
  -v, --verbose
          Verbose mode
  -W, --width <WIDTH>
          Width:  multiply the tree height by factor <WIDTH>
  -x, --tidy
          Tidy mode (non-layered tidy tree layout)
  -X, --tidy-clean
          Tidy mode, avoiding leave names superposition
  -z, --gene-thickness <GENE_THICKNESS>
          Thickness of the gene tree
  -Z, --species-thickness <SPECIES_THICKNESS>
          Thickness of the species tree
  -h, --help
          Print help
  -V, --version
          Print version

Font size and colour, symbol size, tree colour and thickness, etc.

It is possible to specify font and symbol sizes with the -d, -D and -k optional arguments. Thickness of trees can be defined with the -z and -Z optional arguments. Curvature of Bezier curves can be defined with the -K optional argument.

Default colours, opacities and Bezier parameters can be defined in a configuration file (default is config_default.txt). Default font sizes are defined in the configuration file too.

Input files

Although Thirdkind is mainly dedicated to recPhyloXML files, it can handle several types of file and format:

  • One newick file.
  • One phyloXML file.
  • One recPhyloXML file.
  • One file describing a set of recPhyloXML files.
  • Two “nested” recPhyloXML files.

Output files

Output files are svg files, which can be visualised with any web browser. You may need to define the default program associated with these files, which can usually be done with a right click on the file.

Examples of use

All the examples given in the following sections are available here: wiki examples, recphyloxml examples, paramecium generax examples, phyloxml examples and newick examples, or in the thirdkind directory if you have cloned/downloaded the repository (https://github.com/simonpenel/thirdkind).

Warning: if you are using your browser to download examples, note that XML files should be saved as raw text and not html!

newick and phyloXML files

Newick is a parenthesis-based format used for phylogenetic trees. Thirdkind will handle a tree stored in newick format only if the tree is rooted, and NHX tags will not be considered. When using a Newick file, the svg output file will display a single tree. PhyloXML is an XML format dedicated to phylogenetic trees that can describe evolutionary events. The svg output file will display a single tree, with a symbol at each node for each evolutionary event (a circle for a speciation, a square for a duplication, a cross for a loss, a diamond for a transfer), and the branch between the two nodes involved in a transfer is drawn as a dotted line. This style will be called “phyloxml svg style” throughout this document.

phyloxml svg style:

thirdkind -f xml_examples/FAM036542_gene.xml -b -F phyloxml -k 15

Figure 1
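The rooted-tree requirement for Newick input can be checked before running Thirdkind. The following Python sketch is not part of Thirdkind and does not reproduce its parser. It assumes the common convention that a rooted tree has exactly two subtrees under its outermost parentheses; the function names are hypothetical, and quoted labels containing commas or parentheses are not handled.

```python
def root_child_count(newick: str) -> int:
    """Count the subtrees directly under the outermost pair of parentheses."""
    text = newick.strip().rstrip(";")
    depth = 0
    children = 0
    for ch in text:
        if ch == "(":
            depth += 1
            if depth == 1:
                children = 1  # the root has at least one child once it opens
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1  # each top-level comma separates two root children
    return children


def looks_rooted(newick: str) -> bool:
    # Assumed convention: a bifurcating root means the tree is rooted.
    return root_child_count(newick) == 2


print(looks_rooted("((A:1,B:2):0.5,(C:1,D:1):0.5);"))
print(looks_rooted("(A,B,C);"))
```

A tree with three subtrees at the top level, such as the second example, is usually an unrooted tree and would need to be rooted before being given to Thirdkind.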

recPhyloXML files

RecPhyloXML is an XML format inspired by phyloXML and dedicated to reconciled phylogenetic trees. A recPhyloXML file contains at least one species tree and one reconciled gene tree mapped to (one of) the species tree(s). In recPhyloXML, a clade (i.e. a node or leaf in the tree) carries several tags, among which a name, a location, a type of event, etc. Each node of the gene tree(s) should carry a “location” tag, the value of which should be the same as the value of the “name” tag of one of the clades in the species tree(s). It is possible to have multiple gene trees and multiple species trees in a single file. The svg consists of one or several reconciled gene trees mapped inside one or several species trees. In this document the trees which are mapped (here the gene trees) will be called 'lower' trees and the trees on which the 'lower' trees are mapped (here the species trees) will be called 'upper' trees. The svg output file will thus display the species tree(s) as 'upper' trees containing the 'lower' gene trees with symbols at their nodes. Duplication nodes are represented as squares, speciation nodes as circles and losses as crosses. Leaves are red squares. Transfers are dotted Bezier curves ending with an arrow. If there is more than one gene tree, the 'lower' trees will have different colours. This style will be called “recphyloxml svg style” throughout this document.

recphyloxml svg style:

thirdkind -f recphylo_examples/FAM001051_FAM000799_reconciliated.xml -k 8 -b

Figure 2
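The rule that every gene node location must match a species clade name can be checked with a few lines of code. The Python sketch below is an illustration only and is not Thirdkind's own parser. It assumes the element names `spTree`, `recGeneTree` and `eventsRec` and the `speciesLocation` attribute as used in recPhyloXML files; the embedded document and the function name are hypothetical, and XML namespaces are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal recPhyloXML document, for illustration only.
# Real files may declare an XML namespace, in which case the tag names
# used below would need the namespace prefix.
REC = """<recPhylo>
  <spTree>
    <phylogeny>
      <clade>
        <name>AB</name>
        <clade><name>A</name></clade>
        <clade><name>B</name></clade>
      </clade>
    </phylogeny>
  </spTree>
  <recGeneTree>
    <phylogeny rooted="true">
      <clade>
        <name>g_root</name>
        <eventsRec><speciation speciesLocation="AB"/></eventsRec>
        <clade>
          <name>g_a</name>
          <eventsRec><leaf speciesLocation="A"/></eventsRec>
        </clade>
        <clade>
          <name>g_b</name>
          <eventsRec><leaf speciesLocation="B"/></eventsRec>
        </clade>
      </clade>
    </phylogeny>
  </recGeneTree>
</recPhylo>"""


def unmapped_gene_nodes(xml_text):
    """Return (gene node name, location) pairs whose location is not a species clade name."""
    root = ET.fromstring(xml_text)
    species_names = {
        clade.findtext("name")
        for sp_tree in root.iter("spTree")
        for clade in sp_tree.iter("clade")
    }
    problems = []
    for gene_tree in root.iter("recGeneTree"):
        for clade in gene_tree.iter("clade"):
            events = clade.find("eventsRec")
            if events is None:
                # A gene node without any event has no location at all.
                problems.append((clade.findtext("name"), None))
                continue
            for event in events:
                location = event.get("speciesLocation")
                if location is not None and location not in species_names:
                    problems.append((clade.findtext("name"), location))
    return problems


print(unmapped_gene_nodes(REC))
```

An empty result means that every gene node in the sample file points to an existing species clade.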

selecting gene trees and choosing colours

You can choose colours for gene trees and/or select the gene trees to be displayed: for example, select genes 2 and 3 with colours #17387A and #8E3B8B:

thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 2,3 -C "#17387A,#8E3B8B" -k8

Figure 2bis

You can also choose colours for parts of gene trees, in combination with selecting the gene trees to be displayed: for example, select gene 3 with colours #17387A, #8E3B8B and green to highlight the duplication leading to the nodes named "5" and "6":

thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 3 -C "#17387A,#8E3B8B,green" -k8 -q "5,6"

Figure 2ter

You can choose the background colour and the thickness of the species tree:

thirdkind -f recphylo_examples/recgs_dtl.recphyloxml -b -n 3 -C "#17387A,#8E3B8B,green" -k12 -q "5,6" -Q "black" -Z 1

Figure 2quarto

landscape view

thirdkind -f example_wiki/recgs_dtl.recphyloxml -b -L

Figure 3bis

multiple 'upper' (spTree) trees

thirdkind -f recphylo_examples/ex7comp.recphyloxml -b

Figure 2bis

real branch lengths (option -l)

Use the real branch lengths, multiplied by the factor given with -l

thirdkind -f xml_examples/apaf.xml -b -F phyloxml -l 5 -W 0.5

Branch lengths in the svg are the real branch lengths of the phyloxml tree multiplied by 5: Figure 2ter

thirdkind -f recphylo_examples/hote_parasite_page4_BL.recphylo -b -l 1

Branch lengths of the pipe tree in the svg are the real branch lengths of the 'upper' tree in recphyloxml multiplied by 1

Figure 2quatro

use compressed ("tidy") mode: (option -x/-X)

The "tidy" mode uses a non-layered tidy tree layout, as described here https://onlinelibrary.wiley.com/doi/10.1002/spe.2213 (van der Ploeg, 2014) and here 10.1093/molbev/msac204 (Penel and de Vienne, 2022).

The [-x/-X] option will compress the tree following a non-layered tidy tree layout, and halve the space between nodes in order to increase the compression:

thirdkind -f newick_examples/virus.nhx -l 4 -b -X

You can visualise the specific effect of non-layered tidy tree compression by comparing with the tree obtained without the [-x/-X] option and a scaling of 0.5

thirdkind -f newick_examples/virus.nhx -l 4 -b -W 0.5

Figure 2cinquoop

The -x option compresses the tree regardless of the length of the leaf names, whereas the -X option takes the length of the leaf names into account in order to avoid overlapping names.

In a "recphyloxml svg style" context, only the -x option is available.

position of duplication nodes in upper tree (option -M)

By convention, duplication nodes of the lower tree are located in the associated node of the upper tree. The option -M places the duplication node in the middle of the branch leading to the associated node instead.

thirdkind -f recphylo_examples/example_dupli.recphylo -b -H 3

default position

thirdkind -f recphylo_examples/example_dupli.recphylo -b -H 3 -M

middle of branch

multiple recPhyloXML files (option -m)

It is possible to use a list of recPhyloXML files instead of a single recPhyloXML file. This will give the same results as a single file containing the species trees and gene trees of the first file and all the gene trees of the other files. This option is useful to handle large sets of reconciliations, in combination with the -t option. Thirdkind is able to handle more than 10,000 gene families.
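Such a list file can be generated with a short script. The Python sketch below assumes that the list file contains one recPhyloXML path per line; this layout is an assumption and should be checked against the example list in the repository. The function name and its parameters are hypothetical.

```python
from pathlib import Path


def write_list_file(directory, list_path, pattern="*.xml", first=None):
    """Write one matching file path per line and return the number of paths."""
    paths = sorted(str(p) for p in Path(directory).glob(pattern))
    if first is not None:
        # The first listed file supplies the species tree(s), so the caller
        # can choose which file comes first.
        first = str(first)
        paths = [first] + [p for p in paths if p != first]
    # Assumed layout: one path per line.
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return len(paths)
```

The resulting file would then be passed with -f together with the -m option, for example `thirdkind -f liste.txt -m -b`.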

dealing with redundant transfers (options -t , -T and -J)

With multiple gene histories, it may be interesting to focus on the gene transfers, especially on redundant transfers. Typically it is useful to highlight frequent transfers. Option -t will draw only one gene history and will draw all the transfers in red according to their abundance, i.e. the number of times the transfer is present in the gene histories: only the transfers with an abundance higher than the threshold given by option -t will be drawn, and the opacity of each transfer reflects its abundance. The option -T selects the gene to display. The option -J displays the abundance of each transfer. This may be useful to deal with GeneRax output, for example.

thirdkind -f paramecium_data/liste.txt -t 1 -m -b

Transfer redundancy in 1000 gene histories: Figure 3

thirdkind -f paramecium_data/liste.txt -t 25 -m -b -J

Transfer redundancy in 1000 gene histories, display only transfers with an abundance higher than 25: Figure 4
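The abundance filter described above can be sketched as follows. This Python example is not Thirdkind's code: it assumes that a transfer is identified by its (donor, recipient) pair and that every occurrence across the gene histories counts towards the abundance. The function name and the sample data are hypothetical, and the mapping from abundance to opacity is not shown.

```python
from collections import Counter


def abundant_transfers(transfers_per_gene, threshold):
    """Keep the (donor, recipient) transfers seen more than `threshold` times."""
    counts = Counter()
    for transfers in transfers_per_gene:
        counts.update(transfers)  # assumption: every occurrence is counted
    return {transfer: n for transfer, n in counts.items() if n > threshold}


# Three hypothetical gene histories, each a list of (donor, recipient) pairs.
histories = [
    [("A", "B"), ("C", "D")],
    [("A", "B")],
    [("A", "B"), ("E", "F")],
]
print(abundant_transfers(histories, 1))
```

With a threshold of 1, only the transfer from "A" to "B", present in all three histories, is kept.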

dealing with ‘free living’ symbionts (options -e or -E )

When reconciling a symbiont with its host, it may happen that part of the symbiont tree is not mapped to the host tree. For example, in the history of an organism, some taxa may be free-living species while others may have evolved to be symbionts of a host. In this case, free-living organisms should have a “Location” tag indicating “FREE_LIVING” instead of the name of a host. Thirdkind will draw the free-living part of the symbiont 'lower' tree outside the host 'upper' tree.

thirdkind -f example_wiki/free_living_reconciliated.recphylo -e -b

Free living organisms: Figure 5

When there are several symbiont histories the -e option will superpose the free-living parts and the option -E will separate them.

modify the curvature of transfers and branches leading to free living organisms (option -K )

thirdkind -f example_wiki/free_living_reconciliated.recphylo -e -K 4 -b

Figure 5bis

thirdkind -f example_wiki/free_living_reconciliated.recphylo -e -K 8 -b

Figure 5bis

nested recPhyloXML files (options -g and -f )

It is possible to combine two reconciliations, for example a gene/species reconciliation and a symbiont/host reconciliation in which the symbiont of the second reconciliation is the species of the first one. This is done with the option -g, which indicates the gene/species file, while -f indicates the symbiont/host file. The software will generate several svg files: one “recphyloxml svg style” file for each of the two inputs, a “phyloxml svg style” file of the reconciled symbiont tree (from the -f file), a “phyloxml svg style” file for each reconciled gene tree (from the -g file), a simple tree of the host, and 3 “mapped” svg files describing the gene/symbiont/host reconciliation.

The first “mapped” svg file is a modified version of the recphyloxml style svg of the gene/symbiont reconciliation: the 'upper' tree of the symbiont presents features describing its reconciliation with the host: a big square for a duplication node, an additional branch coloured in black for a loss, and the segments between the start and end of a transfer coloured in green.

The second “mapped” svg file is a modified version of the recphyloxml style svg of the symbiont/host reconciliation in which gene transfers are mapped to the host nodes and displayed in red. For example, if there is a gene transfer between the symbiont “C” present in host “3” and the symbiont “E” present in host “4”, you will get a red Bezier path between nodes “3” and “4” in the 'upper' host tree.

The third “mapped” svg file is a mapping of the gene trees over the host tree through the symbiont. For example, if genes “B1” and “B2” are associated with the symbiont “B”, and the symbiont “B” is associated with host “4”, the genes “B1” and “B2” are associated with host “4” in the svg. If a gene is transferred between hosts via a symbiont transfer, the transfer starts with a yellow diamond and the stippling is different. A gene transfer across symbionts which is not affected by a transfer of the symbiont across hosts is displayed as a classic gene transfer.
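The mapping of genes and gene transfers onto hosts amounts to composing two associations. The Python sketch below only illustrates that idea with the labels used in the examples above; it is not how Thirdkind is implemented, and the function names and dictionaries are hypothetical.

```python
def map_genes_to_hosts(gene_to_symbiont, symbiont_to_host):
    """Associate each gene with the host of its symbiont."""
    return {
        gene: symbiont_to_host[symbiont]
        for gene, symbiont in gene_to_symbiont.items()
        if symbiont in symbiont_to_host  # symbionts without a host are skipped
    }


def map_transfer_to_hosts(transfer, symbiont_to_host):
    """Translate a (donor symbiont, recipient symbiont) transfer into a pair of hosts."""
    donor, recipient = transfer
    return (symbiont_to_host[donor], symbiont_to_host[recipient])


# Labels taken from the examples in the text.
symbiont_to_host = {"B": "4", "C": "3", "E": "4"}
gene_to_symbiont = {"B1": "B", "B2": "B"}

print(map_genes_to_hosts(gene_to_symbiont, symbiont_to_host))
print(map_transfer_to_hosts(("C", "E"), symbiont_to_host))
```

The first call associates “B1” and “B2” with host “4”, and the second turns the gene transfer between symbionts “C” and “E” into a path between hosts “3” and “4”.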

thirdkind -f example_wiki/publi/parasite_hote.recphylo -g example_wiki/publi/gene_parasite.recphylo -e -b

Figure3


UPDATE IN VERSION 3.3.6: The colours are now different: violet for the host upper tree and pink for the symbiont phyloxml style.


Figure3bis

Real example:

thirdkind -f example_wiki/thirdlevel/hote_parasite_page2.recphylo -g example_wiki/thirdlevel/gene_parasite_page2.recphylo -b

(Origin of data : https://doi.org/10.1038/s41396-019-0533-6)

Gene-Symbiont Reconciliation: Gene-Symbiont Reconciliation

Symbiont-Host Reconciliation: Symbiont-Host Reconciliation

Reconciled Symbiont in phyloxml svg style:

Reconciled Symbiont simple view

Mapped 1: Mapped 1

Mapped 2: Mapped 2

Mapped 3: Mapped 3

Another real example: multiple symbionts

thirdkind -f example_wiki/multi_symbiotes/rechp_dtl.recphyloxml -g example_wiki/multi_symbiotes/recgs_mult_host_dtl.recphyloxml -b

Mapped 1:

Mapped 1

Mapped 2:

Mapped 2

Mapped 3:

Mapped 3

Other Options and Default Configuration

The software has many options described in the help message. Among them, in addition to the previously described options, the following are the most useful:

  • -c configfile: use a configuration file
  • -d fontsize: set font size for gene trees
  • -D fontsize: set font size for species trees
  • -k size: set the size of event symbols (crosses, circles, squares, etc.)
  • -G n : (recPhyloXML format only) draw only gene number n in phyloxml svg style
  • -i : display internal gene nodes
  • -I : display internal species nodes
  • -p : (recPhyloXML format only) 'upper' species tree uniformisation (all the species branches have the same width)
  • -s : (recphyloXML format only) drawing only the species tree

It is possible to configure some default features with a configuration file.

thirdkind -f recphylo_examples/FAM000600_reconciliated_big.recphylo -c my_config.txt -b

Contents of the default configuration file: config_default.txt

Converting files into recPhyloXML format

The XML format “recPhyloXML” has been proposed as a standard to describe phylogenetic reconciliations and is now produced directly or via translation scripts by a majority of reconciliation software. Translation scripts are available here: https://github.com/WandrilleD/recPhyloXML

Execution time

Thirdkind was able to process 5,000 reconciled trees of 50 nodes in 2 seconds and to process a tree of 7,000 nodes in 1 second.