ewedell/BSCAMPP_code

BSCAMPP_code

New Code for EPA-ng-BSCAMPP

This is based on the code from EPA-ng-SCAMPP available at https://github.com/chry04/PLUSplacer.

The algorithm is described in detail at https://doi.org/10.1101/2022.10.26.513936

Based on current best results, we recommend running EPA-ng-BSCAMPP with a subtree size of 2,000 and 5 votes, especially if the sequences are fragmentary. These values are also the defaults for the subtree size and the number of votes.

The only required arguments are the RAxML-ng info file (-i, typically with the suffix bestModel), the RAxML-ng tree (-t, with the suffix bestTree), a multiple sequence alignment in FASTA format containing both the aligned queries and the reference sequences (-a), and the output directory for the EPA-ng-BSCAMPP.jplace output file (-d).
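For scripted runs, the required arguments can be assembled into a command line programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_command` and its keyword parameters are hypothetical, and the flags (-i, -t, -d, -a, -b, -V) are taken from the usage section of this README.

```python
import shlex


def build_command(info, tree, outdir, alignment,
                  subtree_size=None, votes=None,
                  script="EPA-ng-BSCAMPP.py"):
    """Return the argument list for one EPA-ng-BSCAMPP run.

    Hypothetical helper: only the flag names come from the README.
    """
    cmd = ["python3", script,
           "-i", info,        # RAxML-ng info file (bestModel)
           "-t", tree,        # reference tree with branch lengths
           "-d", outdir,      # output directory
           "-a", alignment]   # alignment with queries and references
    if subtree_size is not None:
        cmd += ["-b", str(subtree_size)]
    if votes is not None:
        cmd += ["-V", str(votes)]
    return cmd


# Build (but do not run) the command for the bundled test data.
example_cmd = build_command(
    "./testing/aln_dna.fa.raxml.bestModel",
    "./testing/backbone.tree",
    "./",
    "./testing/aln_dna.fa",
    subtree_size=100,
)
print(shlex.join(example_cmd))
```

The returned list can be passed to `subprocess.run(example_cmd, check=True)` from inside the BSCAMPP_code directory; leaving out `subtree_size` and `votes` lets the script fall back to its own defaults.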

REQUIREMENTS

Python 3 and TreeSwift.

TreeSwift can be installed using pip. See https://github.com/niemasd/TreeSwift for details.
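Before a first run, it can help to confirm that the dependency is importable. This is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical, and `treeswift` is the import name of the TreeSwift package.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


missing = missing_modules(["treeswift"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```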

USAGE

To get started, a test tree and MSA are available in the testing folder (originally from the RNAsim-VS datasets, https://doi.org/10.1093/sysbio/syz063).

Change into the BSCAMPP_code directory and run the following command:

python3 EPA-ng-BSCAMPP.py -i ./testing/aln_dna.fa.raxml.bestModel -t ./testing/backbone.tree -d ./ -a ./testing/aln_dna.fa -b 100

The output jplace file should appear in the BSCAMPP_code directory.
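A jplace file is JSON, so the result can be inspected with the standard library. The sketch below assumes the output follows the usual jplace layout (a `fields` list naming the columns, and `placements` entries with `p` rows plus `n` or `nm` names) and that the `edge_num` and `like_weight_ratio` columns are present, as they are in typical EPA-ng output. The function names are hypothetical, and the small in-memory document is made-up illustration data, not real output.

```python
import json


def load_jplace(path):
    """Parse a jplace file from disk into a dictionary."""
    with open(path) as handle:
        return json.load(handle)


def best_edges(jplace):
    """Map each query name to the edge with the highest like_weight_ratio."""
    fields = jplace["fields"]
    edge_idx = fields.index("edge_num")
    lwr_idx = fields.index("like_weight_ratio")
    best = {}
    for placement in jplace["placements"]:
        top = max(placement["p"], key=lambda row: row[lwr_idx])
        names = placement.get("n")
        if names is None:
            # "nm" entries are [name, multiplicity] pairs
            names = [entry[0] for entry in placement.get("nm", [])]
        elif isinstance(names, str):
            names = [names]
        for name in names:
            best[name] = top[edge_idx]
    return best


# Made-up example document for illustration only.
example = {
    "tree": "((A:0.1{0},B:0.1{1}):0.1{2},C:0.2{3});",
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[3, -100.0, 0.7, 0.01, 0.02],
               [0, -101.0, 0.3, 0.01, 0.02]], "n": ["q1"]},
        {"p": [[0, -90.0, 1.0, 0.0, 0.01]], "nm": [["q2", 1]]},
    ],
    "version": 3,
    "metadata": {},
}
print(best_edges(example))
```

For a real run, `best_edges(load_jplace("EPA-ng-BSCAMPP.jplace"))` would give the same kind of summary, assuming the default output file name.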

The full usage is as follows:

usage: EPA-ng-BSCAMPP.py [-h] -i INFO -t TREE -d OUTDIR -a ALIGNMENT [-o OUTPUT] [-m MODEL] [-b SUBTREESIZE] [-V VOTES] [-s SIMILARITYFLAG] [-n TMPFILENBR] [-q QALIGNMENT] [-f FRAGMENTFLAG] [-v]

arguments:

  • -h, --help show this help message and exit
  • -i INFO, --info INFO Path to model parameters
  • -t TREE, --tree TREE Path to reference tree with estimated branch lengths
  • -d OUTDIR, --outdir OUTDIR Directory path for output
  • -a ALIGNMENT, --alignment ALIGNMENT Path for query and reference sequence alignment in fasta format
  • -o OUTPUT, --output OUTPUT Output file name
  • -m MODEL, --model MODEL Model used for edge distances
  • -b SUBTREESIZE, --subtreesize SUBTREESIZE Integer size of the subtree
  • -V VOTES, --votes VOTES Integer number of votes per query sequence
  • -s SIMILARITYFLAG, --similarityflag SIMILARITYFLAG boolean, False if maximizing sequence similarity instead of simple Hamming distance (ignoring gap sites in the query)
  • -n TMPFILENBR, --tmpfilenbr TMPFILENBR tmp file number
  • -q QALIGNMENT, --qalignment QALIGNMENT Path to query sequence alignment in fasta format (ref alignment separate)
  • -f FRAGMENTFLAG, --fragmentflag FRAGMENTFLAG boolean, True if queries contain fragments
  • -v, --version show the version number and exit
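The -s option contrasts two ways of comparing a query with a reference sequence: a simple Hamming distance and a sequence similarity score. As a rough illustration only, the sketch below shows one plausible reading of each on aligned sequences of equal length, skipping sites where the query has a gap. The function names, the gap character `-`, and the decision to skip query gaps in both measures are assumptions; this is not the repository's implementation.

```python
def hamming_ignoring_query_gaps(query, reference, gap="-"):
    """Count mismatching sites, skipping sites where the query is a gap."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for q, r in zip(query, reference)
               if q != gap and q != r)


def similarity_ignoring_query_gaps(query, reference, gap="-"):
    """Count matching sites, skipping sites where the query is a gap."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for q, r in zip(query, reference)
               if q != gap and q == r)


query = "AC-T"
reference = "AGGT"
print(hamming_ignoring_query_gaps(query, reference))     # 1 mismatch (C vs G)
print(similarity_ignoring_query_gaps(query, reference))  # 2 matches (A and T)
```

For a short fragment inside a long alignment, the two measures can rank references differently: a fragment has few non-gap sites, so its mismatch count is small for every reference, while the match count rewards references that agree with it at more sites.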