ProPhex is an efficient k-mer index with a small memory footprint. It uses the BWA implementation of the BWT-index. ProPhex is designed as a core computational component of ProPhyle, a phylogeny-based metagenomic classifier allowing fast and accurate read assignment.
git clone https://github.com/prophyle/prophex
cd prophex && make -j
Alternative ways of installation
conda install prophex
# Build a ProPhex index
./prophex index -k 25 index.fa

# Query reads from index.fq for k=25 (with k-LCP and 4 threads)
./prophex query -k 25 -u -t 4 index.fa index.fq

# Query reads from index.fq for k=20 (without k-LCP)
./prophex query -k 20 index.fa index.fq
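When the query step is driven from a script rather than typed by hand, it can help to assemble the command line in one place. The following is a minimal sketch of such a wrapper. The function `prophex_query_cmd` and its parameter names are hypothetical and not part of ProPhex; it only combines the `-k`, `-u` and `-t` options listed in the help output.

```python
import shlex


def prophex_query_cmd(index, reads, k, use_klcp=False, threads=None,
                      prophex="./prophex"):
    """Build the argument list for `prophex query` (hypothetical helper)."""
    cmd = [prophex, "query", "-k", str(k)]
    if use_klcp:
        cmd.append("-u")                 # use k-LCP for querying
    if threads is not None:
        cmd += ["-t", str(threads)]      # number of threads
    cmd += [index, reads]                # <idxbase> <in.fq>
    return cmd


cmd = prophex_query_cmd("index.fa", "index.fq", k=25, use_klcp=True, threads=4)
print(shlex.join(cmd))

# To actually run the command (requires a compiled ./prophex and a built index):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.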
Program: prophex (a lossless k-mer index)
Version: 0.1.1
Authors: Kamil Salikhov, Karel Brinda, Simone Pignotti, Gregory Kucherov
Contact: firstname.lastname@example.org

Usage:   prophex <command> [options]

Command: index           construct a BWA index and k-LCP
         query           query reads against index
         klcp            construct an additional k-LCP
         bwtdowngrade    downgrade .bwt to the old, more compact format without Occ
         bwt2fa          reconstruct FASTA from BWT
Usage:   prophex index [options] <idxbase>

Options: -k INT    k-mer length for k-LCP
         -s        construct k-LCP and SA in parallel
         -i        sampling distance for SA
         -h        print help message
Usage:   prophex query [options] <idxbase> <in.fq>

Options: -k INT    length of k-mer
         -u        use k-LCP for querying
         -v        output set of chromosomes for every k-mer
         -p        do not check whether k-mer is on border of two contigs, and show such k-mers in output
         -b        print sequences and base qualities
         -l STR    log file name to output statistics
         -t INT    number of threads
         -h        print help message
Usage:   prophex klcp [options] <idxbase>

Options: -k INT    length of k-mer
         -s        construct k-LCP and SA in parallel
         -i        sampling distance for SA
         -h        print help message
Usage:   prophex bwtdowngrade <input.bwt> <output.bwt>

         -h        print help message
Usage:   prophex bwt2fa <idxbase> <output.fa>

         -h        print help message
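The `-p` option of `prophex query` concerns k-mers that lie on the border of two contigs. The sketch below illustrates why such k-mers are normally filtered out, assuming that the indexed contigs are stored back to back, as is usual for BWA-style indexes. It is a toy illustration on plain strings, not ProPhex's implementation, and it considers the forward strand only; the function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def border_kmers(contigs, k):
    """k-mers present in the concatenation of the contigs but in no single contig."""
    genuine = set()
    for contig in contigs:
        genuine |= kmers(contig, k)
    return kmers("".join(contigs), k) - genuine


contigs = ["ACGTAC", "GGATCC"]          # made-up toy contigs
print(sorted(border_kmers(contigs, 4)))  # ['ACGG', 'CGGA', 'TACG']
```

The three reported k-mers span the junction between the two toy contigs and occur in neither of them, so a match against them would be an artifact of the concatenation.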
Matches are reported in an extended Kraken format. ProPhex produces a tab-delimited file with the following columns:
- Category (unused, U as a legacy value)
- Sequence name
- Final decision (unused, 0 as a legacy value)
- Sequence length
- Assigned k-mers. Space-delimited list of k-mer blocks with the same assignments. Each block has the following format: comma-delimited list of sets (or A for ambiguous, or 0 for no matches), colon, length. Example: 2157,393595:1 393595:1 0:16 (the first k-mer is assigned to the nodes 2157 and 393595, the second k-mer is assigned to 393595, and the subsequent 16 k-mers are unassigned)
- Bases (optional)
- Base qualities (optional)
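The column layout above can be read back with a few lines of code. The following is a minimal parsing sketch based only on the column description in this section. The names `Record`, `parse_blocks` and `parse_line` are hypothetical, and the read name `read1` and the length `42` in the sample line are made up; only the k-mer block field is taken from the example above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    category: str                          # legacy value, unused
    name: str                              # sequence name
    decision: str                          # legacy value, unused
    length: int                            # sequence length
    blocks: List[Tuple[List[str], int]]    # (assignments, number of k-mers)
    bases: Optional[str]                   # optional column
    qualities: Optional[str]               # optional column


def parse_blocks(field: str) -> List[Tuple[List[str], int]]:
    """Split 'a,b:1 c:2 0:16' into [(['a', 'b'], 1), (['c'], 2), (['0'], 16)]."""
    blocks = []
    for block in field.split():
        ids, count = block.rsplit(":", 1)
        blocks.append((ids.split(","), int(count)))
    return blocks


def parse_line(line: str) -> Record:
    """Parse one tab-delimited output line into a Record."""
    cols = line.rstrip("\n").split("\t")
    return Record(
        category=cols[0],
        name=cols[1],
        decision=cols[2],
        length=int(cols[3]),
        blocks=parse_blocks(cols[4]),
        bases=cols[5] if len(cols) > 5 else None,
        qualities=cols[6] if len(cols) > 6 else None,
    )


# Sample line: name and length are invented for illustration.
rec = parse_line("U\tread1\t0\t42\t2157,393595:1 393595:1 0:16")
print(rec.blocks)
print(sum(n for _, n in rec.blocks))   # total number of k-mers covered by the blocks
```

Assignments are kept as strings so that the special values A and 0 need no separate handling.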
Can I remove duplicate k-mers from the index in order to use less memory when querying?
Yes, duplicate k-mers can be removed using ProphAsm, which assembles contigs by greedy enumeration of disjoint paths in the associated de Bruijn graph. BCalm is another tool that can be used with ProPhex. Compared to ProphAsm, BCalm has a smaller memory footprint. On the other hand, the resulting FASTA file can be significantly bigger (when assembling, BCalm stops at every branching k-mer).
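To judge whether such a preprocessing step is worthwhile, one can estimate how redundant the input is. The sketch below counts total and distinct k-mers in a set of sequences. It is not the algorithm used by ProphAsm or BCalm; it only measures the redundancy those tools remove. The function name is hypothetical, the sequences are toy data, and only the forward strand is considered.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every k-mer occurrence across all sequences (forward strand only)."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


seqs = ["ACGTACGT", "CGTACGTT"]     # made-up toy sequences
counts = kmer_counts(seqs, 4)
total = sum(counts.values())        # all k-mer occurrences
distinct = len(counts)              # k-mers that would remain after deduplication
print(total, distinct, total - distinct)   # 10 5 5
```

Here half of the k-mer occurrences are duplicates, so an index built from deduplicated contigs would have to store correspondingly fewer k-mers.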
Please use GitHub issues.
Kamil Salikhov <email@example.com>
Simone Pignotti <firstname.lastname@example.org>