Splicer scales up phylogenetic placement with EPA-ng and pplacer, making it applicable to datasets with millions of reference sequences. Using a decomposition approach, splicer performs placement in sub-linear time without losing accuracy on very large datasets. Additionally, splicer can automatically classify new sequences using your predefined clades file.
We recommend using a conda environment to install splicer together with epa-ng and pplacer.
If you haven't already, configure bioconda.
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
Then, create a new environment with required dependencies and install splicer inside that environment.
git clone https://github.com/flu-crew/splicer.git
cd splicer
conda create -n splicer-env --file conda-requirements.txt
conda activate splicer-env
pip install .
<Run splicer to place/classify sequences>
conda deactivate
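Before running splicer, it can help to confirm that the activated environment actually provides the executables you need. The function below is a small sketch (check_splicer_tools is a hypothetical name, not part of splicer) that reports whether splicer, epa-ng, and pplacer are on your PATH. Run it after "conda activate splicer-env".

```shell
# Hypothetical helper (not part of splicer): report which required executables are on PATH.
check_splicer_tools() {
  missing=0
  for tool in splicer epa-ng pplacer; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Exit status is 0 only if every tool was found.
  return "$missing"
}

check_splicer_tools || echo "Some tools are missing; is splicer-env activated?" >&2
```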
To run splicer, you need:
- A reference tree (a maximum-likelihood or Bayesian tree built from your reference sequences)
- A reference alignment file in FASTA format
- A substitution model file from RAxML. If you did not infer your tree with RAxML or did not save the log file, splicer provides an option to infer the substitution model on your reference tree (see "splicer model -h" for details).
- (Optional) A clades definition file in tab-separated format, with one "<reference-name>\t<clade-name>" pair per line, e.g., "CY040559<tab>1A.3.3". See below for the recommended clade-naming convention.
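A clades file is plain text with one tab-separated pair per line. The snippet below writes a small example and checks that every non-empty line has exactly two fields. Only the CY040559 / 1A.3.3 row comes from the example above; REF_0002 and REF_0003 are placeholder names that you would replace with names from your reference alignment, and check_clades_file is a hypothetical helper, not a splicer command.

```shell
# Write a small clades file: one "<reference-name><TAB><clade-name>" pair per line.
# Only the CY040559 row is from the README; the other two rows are hypothetical placeholders.
printf '%s\t%s\n' \
  CY040559 1A.3.3 \
  REF_0002 1A.3.3 \
  REF_0003 1B.1.1 > myclades.tsv

# Hypothetical sanity check: every non-empty line must have exactly two tab-separated fields.
check_clades_file() {
  awk -F '\t' 'NF && NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 } END { exit bad }' "$1"
}

check_clades_file myclades.tsv && echo "myclades.tsv looks well-formed"
```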
The first step is to decompose your reference dataset into smaller subtrees and create a scaffold tree. This is done with splicer's decomp command:
splicer decomp -t reference.tre -s reference.fasta --clades myclades.tsv -n mydecomposition
The name of the decomposition specified with the -n flag will then be used in subsequent placement steps to tell splicer which decomposition to use.
Now you are ready to place your new sequences (e.g., stored in query.fasta) with splicer:
splicer place -n mydecomposition -q query.fasta -m epa-ng -s raxml-info.log
If you prefer pplacer, you can use -m pplacer instead.
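If you run these steps repeatedly, you may want to wrap them in a script. The sketch below (run_splicer_pipeline is a hypothetical name) checks that the input files exist and then calls decomp and place with exactly the flags shown above; it passes --clades only when a clades file is given, since that file is optional.

```shell
# Hypothetical wrapper around the two documented splicer steps (decomp, then place).
# Usage: run_splicer_pipeline TREE ALIGNMENT CLADES NAME QUERY MODEL_LOG [epa-ng|pplacer]
# Pass "" as CLADES to skip the optional clades file.
run_splicer_pipeline() {
  local tree="$1" aln="$2" clades="$3" name="$4" query="$5" model="$6" method="${7:-epa-ng}"

  # Fail early if a required input file (or a given clades file) is missing.
  for f in "$tree" "$aln" "$query" "$model" ${clades:+"$clades"}; do
    if [ ! -f "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done

  case "$method" in
    epa-ng|pplacer) ;;
    *) echo "unsupported placement method: $method" >&2; return 1 ;;
  esac

  # Step 1: decompose the reference dataset into subtrees plus a scaffold tree.
  if [ -n "$clades" ]; then
    splicer decomp -t "$tree" -s "$aln" --clades "$clades" -n "$name" || return 1
  else
    splicer decomp -t "$tree" -s "$aln" -n "$name" || return 1
  fi

  # Step 2: place the query sequences against the named decomposition.
  splicer place -n "$name" -q "$query" -m "$method" -s "$model"
}
```

For example: run_splicer_pipeline reference.tre reference.fasta myclades.tsv mydecomposition query.fasta raxml-info.log pplacer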
After the placements are complete, you will see information about each query sequence in standard output, and you can find the final JPLACE file in <mydecomposition>/splicer.jplace.
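JPLACE is a JSON format: its "placements" array holds one entry per placed query (or group of queries sharing the same placements), with candidate edges in "p" and query names in "n" or "nm". As a quick check that every query made it into the output, the hypothetical helper below lists the query names found in a JPLACE file. It assumes python3 is on your PATH, which should hold inside splicer-env because splicer itself is installed with pip.

```shell
# Hypothetical helper (not part of splicer): print every query name recorded in a JPLACE file.
# Assumes python3 is available on PATH.
list_placed_queries() {
  python3 -c '
import json
import sys

with open(sys.argv[1]) as fh:
    data = json.load(fh)

# Each placement entry names its queries in "n" (a list of names)
# and/or "nm" (a list of [name, multiplicity] pairs).
for entry in data["placements"]:
    names = entry.get("n", [])
    if isinstance(names, str):
        names = [names]
    names = list(names) + [pair[0] for pair in entry.get("nm", [])]
    for name in names:
        print(name)
' "$1"
}
```

For example, "list_placed_queries mydecomposition/splicer.jplace | wc -l" should match the number of sequences in query.fasta if every query was placed.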
If you provide a clades-definition file at the decomposition step, splicer can classify your new sequences immediately upon placement. We recommend a PANGO-like clade-naming format, with names such as 1B.1.1.2 or BA.1.2. In this convention, clade BA is ancestral to clade BA.1, BA.1 is ancestral to BA.1.2, and so on. Splicer then infers clade names for the ancestral nodes according to this example:


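To make the naming convention concrete, the two hypothetical shell functions below encode it: is_ancestor succeeds when one clade name extends another by at least one dot-separated component, and common_ancestor prints the longest shared dot-separated prefix of two names (e.g., BA.1.1 and BA.1.2 share BA.1). They only illustrate the convention; they are not splicer's own procedure for naming ancestral nodes.

```shell
# Hypothetical helpers illustrating the dot-separated (PANGO-like) clade-naming convention.

# Succeeds if clade $1 is a strict ancestor of clade $2, e.g. BA is ancestral to BA.1.2.
# Matching "$1." (with the dot) keeps 1B.1 from matching 1B.10.
is_ancestor() {
  case "$2" in
    "$1".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the longest shared dot-separated prefix of two clade names (empty if none).
common_ancestor() {
  local a="$1" b="$2" out="" ha hb
  while [ -n "$a" ] && [ -n "$b" ]; do
    ha=${a%%.*}
    hb=${b%%.*}
    [ "$ha" = "$hb" ] || break
    out=${out:+$out.}$ha
    # Drop the first component; a name without a dot is fully consumed.
    case "$a" in *.*) a=${a#*.} ;; *) a="" ;; esac
    case "$b" in *.*) b=${b#*.} ;; *) b="" ;; esac
  done
  printf '%s\n' "$out"
}
```

For example, "common_ancestor 1B.1.1.2 1B.1.2" prints 1B.1, and "is_ancestor BA BA.1.2" succeeds.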