SCORPiOs is a synteny-guided gene tree correction pipeline for clades that have undergone a whole-genome duplication event. SCORPiOs identifies gene trees where the whole-genome duplication is missing or incorrectly placed, based on the genomic locations of the duplicated genes across the different species. SCORPiOs then builds an optimized gene tree consistent with the known WGD event, the species tree, local synteny context, as well as gene sequence evolution.
SCORPiOs is implemented as a Snakemake pipeline. SCORPiOs takes as input either gene trees or multiple alignments, and outputs the corresponding optimized gene trees.
To learn how to use SCORPiOs, take a look at the SCORPiOs documentation!
If you use SCORPiOs, please cite:
Parey E, Louis A, Cabau C, Guiguen Y, Roest Crollius H, Berthelot C, Synteny-guided resolution of gene trees clarifies the functional impact of whole genome duplications, Molecular Biology and Evolution, msaa149, https://doi.org/10.1093/molbev/msaa149.
Below is a quick start guide to using SCORPiOs. We recommend reading the SCORPiOs documentation for detailed instructions.
The Miniconda3 package management system manages all SCORPiOs dependencies, including Python packages and other software.
To install Miniconda3:
- Download the Miniconda3 installer for your system here
- Run the installation script:
bash Miniconda3-latest-Linux-x86_64.sh
or
bash Miniconda3-latest-MacOSX-x86_64.sh
and accept the defaults
- Open a new terminal, run
conda update conda
and press y to confirm updates
- Clone the repository and go to the SCORPiOs root folder
git clone https://github.com/DyogenIBENS/SCORPIOS.git
cd SCORPIOS
- Create the main conda environment.
We recommend using Mamba for a faster installation:
conda install -c conda-forge mamba
mamba env create -f envs/scorpios.yaml
Alternatively, you can use conda directly (solving dependencies may take up to an hour):
conda env create -f envs/scorpios.yaml
Before any SCORPiOs run, you should:
- go to the SCORPiOs root folder,
- activate the conda environment with
conda activate scorpios
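As a minimal sketch of the second check above, the snippet below tests whether the scorpios environment is currently active. The helper name check_conda_env is hypothetical and not part of SCORPiOs; it relies only on the CONDA_DEFAULT_ENV variable that conda sets on activation.

```shell
# Hypothetical helper (not part of SCORPiOs): succeeds only when the
# currently active conda environment has the expected name ($1).
# CONDA_DEFAULT_ENV is set by `conda activate`.
check_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if check_conda_env scorpios; then
    echo "conda environment 'scorpios' is active"
else
    echo "run 'conda activate scorpios' first" >&2
fi
```

A check like this can be placed at the top of a personal wrapper script so that a run is not started from the wrong environment.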
Before using SCORPiOs on your data, we recommend running a test with our example data to ensure that installation was successful and to get familiar with the pipeline, inputs and outputs.
SCORPiOs uses a YAML configuration file to specify inputs and parameters for each run. An example configuration file is provided: config_example.yaml. This configuration file executes SCORPiOs on toy example data located in data/example/, which you can use as a reference for input formats.
The only required Snakemake arguments to run SCORPiOs are --configfile and the --use-conda flag. Optionally, you can specify the number of threads via the --cores option. For more advanced options, you can look at the Snakemake documentation.
To run SCORPiOs on example data:
snakemake --configfile config_example.yaml --use-conda --cores 4
The following output should be generated: SCORPiOs_example/SCORPiOs_output_0.nhx
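To confirm that the test run produced the expected file, a quick check along the following lines can help. The helper name check_output is hypothetical; only the output path comes from the example run described here.

```shell
# Hypothetical helper (not part of SCORPiOs): report whether a file
# exists and is non-empty.
check_output() {
    if [ -s "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Expected output of the example run; "|| true" keeps the snippet from
# aborting a script when the file has not been generated yet.
check_output SCORPiOs_example/SCORPiOs_output_0.nhx || true
```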
SCORPiOs can run in iterative mode: SCORPiOs corrects the gene trees a first time, then uses the corrected set of gene trees as input for a new correction run, and repeats until convergence. Correcting gene trees improves orthology accuracy, which in turn makes synteny conservation patterns more informative and so improves gene tree reconstruction over successive runs. Usually, a small number of iterations (2-3) suffices to reach convergence.
To run SCORPiOs in iterative mode on example data, execute the wrapper bash script iterate_scorpios.sh:
bash iterate_scorpios.sh --snake_args="--configfile config_example.yaml"
Command-line arguments:
Required
--j=JOBNAME, specifies the SCORPiOs jobname, which should be the same as in the configuration file
--snake_args="SNAKEMAKE ARGUMENTS", should at minimum contain --configfile
Optional
--max_iter=MAXITER, maximum number of iterations to run, default=5.
--min_corr=MINCORR, minimum number of corrected sub-trees to continue to the next iteration, default=1.
--starting_iter=ITER, starting iteration, to resume a run at a given iteration, default=1.
The following output should be generated: SCORPiOs_example/SCORPiOs_output_2_with_tags.nhx
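The way the optional arguments interact can be illustrated with a short sketch of the stopping logic. This is not the actual iterate_scorpios.sh: run_correction is a hypothetical stand-in for one SCORPiOs run, and the numbers of corrected sub-trees it prints are simulated values chosen for illustration.

```shell
# Sketch of the iteration logic only; NOT the actual iterate_scorpios.sh.
# run_correction is a hypothetical stand-in for one SCORPiOs run: it prints
# the number of sub-trees corrected at iteration $1 (simulated values).
run_correction() {
    case "$1" in
        1) echo 120 ;;
        2) echo 15 ;;
        *) echo 0 ;;
    esac
}

# iterate MAX_ITER MIN_CORR STARTING_ITER
iterate() {
    max_iter="$1"; min_corr="$2"; iter="$3"
    while [ "$iter" -le "$max_iter" ]; do
        corrected=$(run_correction "$iter")
        echo "iteration $iter: $corrected corrected sub-trees"
        # Stop once fewer than min_corr sub-trees were corrected.
        if [ "$corrected" -lt "$min_corr" ]; then
            break
        fi
        iter=$((iter + 1))
    done
}

# Defaults listed above: max_iter=5, min_corr=1, starting_iter=1.
iterate 5 1 1
```

With these simulated counts, the loop stops at the third iteration because no sub-tree is corrected. Raising the min_corr value stops it earlier, and passing a higher starting iteration resumes from that point.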
To run SCORPiOs on your data, format your input data appropriately and create a new configuration file for your run, using the provided example config_example.yaml as a guide.
- Copy the example config file
cp config_example.yaml config.yaml
- Open and edit config.yaml to specify paths, files and parameters for your data
To check your configuration, you can execute a dry-run with -n:
snakemake --configfile config.yaml --use-conda -n
Finally, you can run SCORPiOs as described above:
snakemake --configfile config.yaml --use-conda
or in iterative mode:
bash iterate_scorpios.sh --snake_args="--configfile config.yaml"
- Elise Parey
- Alexandra Louis
- Hugues Roest Crollius
- Camille Berthelot
This code may be freely distributed and modified under the terms of the GNU General Public License version 3 (GPL v3)
SCORPiOs uses the following tools to build and test gene trees:
- ProfileNJ: Noutahi et al. (2016) Efficient Gene Tree Correction Guided by Genome Evolution. PLOS ONE, 11, e0159559.
- RAxML: Stamatakis (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30, 1312–1313.
- PhyML: Guindon et al. (2010) New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Syst Biol, 59, 307–321.
- TreeBeST: Vilella et al. (2009) EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res., 19, 327–335.
- CONSEL: Shimodaira and Hasegawa (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics, 17, 1246–1247.