Splicer

Splicer scales up phylogenetic placement with EPA-ng and pplacer, making it applicable to datasets with millions of reference sequences. Using a decomposition approach, Splicer performs placement in sub-linear time without losing accuracy on very large datasets. Additionally, Splicer can automatically classify new sequences using your pre-defined clades file.

Installation

We recommend using a conda environment to install splicer together with epa-ng and pplacer.

If you haven't already, configure bioconda.

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

Then, create a new environment with required dependencies and install splicer inside that environment.

git clone https://github.com/flu-crew/splicer.git
cd splicer
conda create -n splicer-env --file conda-requirements.txt
conda activate splicer-env
pip install .
# Run splicer to place/classify sequences
conda deactivate
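Before running anything, it can help to confirm that the tools are visible inside the activated environment. The following is a minimal sketch, not part of splicer itself; it assumes the executables are named splicer, epa-ng, and pplacer, as the commands in this README suggest.

```python
import shutil

# Executable names are assumptions based on the tools named in this README.
REQUIRED_TOOLS = ("splicer", "epa-ng", "pplacer")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it after "conda activate splicer-env"; a tool reported as missing usually means the environment was not activated or the install step did not complete.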

Usage

To run splicer, you need:

A reference tree (a maximum-likelihood or Bayesian tree with your reference sequences)
A reference alignment file in FASTA format
A substitution model file from RAxML. If you did not infer your tree with RAxML or do not have a log file saved, we provide an option in splicer to infer the substitution model over your reference tree (see "splicer model -h" for more details).
(Optional) A clades definition file in tab-separated format, with one "<reference-name>\t<clade-name>" pair per line. E.g., "CY040559<tab>1A.3.3". See below for the recommended clade-naming convention.
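To catch formatting problems in a clades definition file early, a small check along these lines may be useful. It is a sketch under assumptions: it treats the file as exactly two tab-separated columns and rejects a reference name assigned to two different clades, which may be stricter than what splicer itself enforces. The second example row is made up for illustration.

```python
import csv
import io


def read_clades(handle):
    """Parse tab-separated (reference name, clade name) lines into a dict."""
    clades = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {len(row)}"
            )
        name, clade = (cell.strip() for cell in row)
        if name in clades and clades[name] != clade:
            raise ValueError(f"line {line_no}: {name!r} is assigned to two clades")
        clades[name] = clade
    return clades


# In-memory example; HYPOTHETICAL01 is a made-up reference name.
example = io.StringIO("CY040559\t1A.3.3\nHYPOTHETICAL01\t1B.1.1.2\n")
print(read_clades(example))
```

For a real file, pass an open handle instead, for example open("myclades.tsv", newline="").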

The first step is to decompose your reference dataset into smaller subtrees and create a scaffold tree. This is done with the decomp command in splicer:

splicer decomp -t reference.tre -s reference.fasta --clades myclades.tsv -n mydecomposition

The name of the decomposition specified with the -n flag will then be used in subsequent placement steps to tell splicer which decomposition to use.

Now you are ready to place your new sequences (e.g., stored in the file query.fasta) with splicer:

splicer place -n mydecomposition -q query.fasta -m epa-ng -s raxml-info.log

If you prefer pplacer, you can use -m pplacer instead.
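If you want to script the two steps, the commands above can be assembled in Python. This is a rough sketch that uses only the flags shown in this README; the helper names (decomp_command, place_command, run_pipeline) are hypothetical, and dropping --clades when no clades file is given is an assumption based on that file being optional.

```python
import subprocess


def decomp_command(tree, alignment, name, clades=None):
    """Build the 'splicer decomp' argument list from the flags shown above."""
    cmd = ["splicer", "decomp", "-t", tree, "-s", alignment]
    if clades is not None:
        # Assumption: the flag can simply be omitted when there is no clades file.
        cmd += ["--clades", clades]
    return cmd + ["-n", name]


def place_command(name, queries, model_file, method="epa-ng"):
    """Build the 'splicer place' argument list; method is epa-ng or pplacer."""
    if method not in ("epa-ng", "pplacer"):
        raise ValueError(f"unsupported placement method: {method!r}")
    return ["splicer", "place", "-n", name, "-q", queries, "-m", method, "-s", model_file]


def run_pipeline(tree, alignment, name, queries, model_file, clades=None, method="epa-ng"):
    """Run decomposition, then placement; raises CalledProcessError on failure."""
    subprocess.run(decomp_command(tree, alignment, name, clades), check=True)
    subprocess.run(place_command(name, queries, model_file, method), check=True)


print(" ".join(place_command("mydecomposition", "query.fasta", "raxml-info.log")))
```

Note that -s means the reference alignment for decomp but the substitution model file for place, so keeping the two builders separate avoids mixing them up.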

After the placements are complete, you will see information about each query sequence in standard output, and you can find the final JPLACE file in <mydecomposition>/splicer.jplace.
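The JPLACE file is JSON, so it can be inspected with the standard library. The sketch below picks, for each query, the placement with the highest like_weight_ratio. It assumes the file follows the standard JPLACE layout ("fields", "placements" with "p" and "n" or "nm") and that like_weight_ratio and edge_num are among the fields, as is typical for EPA-ng and pplacer output; the example record uses made-up values.

```python
import json


def best_placements(jplace):
    """Map each query name to its placement with the highest like_weight_ratio."""
    fields = jplace["fields"]
    lwr = fields.index("like_weight_ratio")
    edge = fields.index("edge_num")
    result = {}
    for placement in jplace["placements"]:
        best = max(placement["p"], key=lambda row: row[lwr])
        # Names are stored under "n" (list of names) or "nm" (name, multiplicity pairs).
        names = placement.get("n") or [pair[0] for pair in placement.get("nm", [])]
        for name in names:
            result[name] = {"edge_num": best[edge], "like_weight_ratio": best[lwr]}
    return result


# Minimal in-memory record with made-up values, for illustration only.
example = {
    "fields": ["edge_num", "likelihood", "like_weight_ratio", "distal_length", "pendant_length"],
    "placements": [
        {"p": [[3, -1200.5, 0.2, 0.01, 0.02], [7, -1199.0, 0.8, 0.03, 0.01]], "n": ["query1"]},
    ],
}
print(json.dumps(best_placements(example), indent=2))
```

For real output, load the file first, for example with json.load(open("mydecomposition/splicer.jplace")), and pass the resulting dict to best_placements.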

Sequence classification with Splicer

If you provide a clades definition file to Splicer at the decomposition step, it can classify your new sequences immediately upon placement. We recommend a PANGO-like clade-naming format, in which clades have names such as 1B.1.1.2 or BA.1.2. In this convention, clade BA is ancestral to clade BA.1, BA.1 is ancestral to BA.1.2, and so on. Splicer then infers clade names for the ancestral nodes according to this example:

[Figure: example of clade names inferred for ancestral nodes]
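The dotted naming convention can be made concrete with a few lines of Python. This sketch only illustrates the convention (a clade's ancestors are its dotted prefixes); it is not Splicer's implementation, and the function names are hypothetical. How Splicer actually labels ancestral nodes may differ in detail.

```python
def ancestors(clade):
    """List the ancestral clades implied by a dotted name, nearest first."""
    parts = clade.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_ancestor(ancestor, clade):
    """True if `ancestor` is a proper dotted prefix of `clade`."""
    return clade.startswith(ancestor + ".")


def common_clade(clades):
    """Return the longest dotted prefix shared by all names, or None."""
    split = [name.split(".") for name in clades]
    if not split:
        return None
    prefix = []
    for level in zip(*split):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return ".".join(prefix) if prefix else None


print(ancestors("BA.1.2"))                 # ['BA.1', 'BA']
print(is_ancestor("BA", "BA.1.2"))         # True
print(common_clade(["BA.1.2", "BA.1.3"]))  # BA.1
```

Under this reading, a node whose descendants are all in BA.1.2 or BA.1.3 would naturally be labeled BA.1, while descendants with no shared prefix (say BA.1 and 1B.1) yield no common clade.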
