Skip to content
ProphAsm – a tool for computing simplitigs from k-mer sets and for k-mer sets manipulation
C++ C Python Makefile Shell
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Build Status


ProphAsm is a tool for computing simplitigs from k-mer sets and for k-mer set manipulation. Simplitigs are genomic sequences computed as disjoint paths in a bidirectional vertex-centric de Bruijn graph. Compared to unitigs, simplitigs provide an improvement in the total number of sequences and their cumulative length, while both representations contain exactly the same k-mers

Upon execution, ProphAsm first loads all specified datasets (see the -i param) and computes their k-mer sets (see the -k param). If the -x param is provided, ProphAsm then computes their intersection, subtracts the intersection from the individual k-mer sets and computes unitigs for the intersection. If output files are specified (see the -o param).


If you want to cite ProphAsm, please use the following reference:

Brinda K, Baym M, and Kucherov G. Simplitigs as an efficient and scalable representation of de Bruijn graphs . bioRxiv 2020.01.12.903443, 2020.

If you want to cite the concept of simplitigs, please include also the parallel manuscript (the same concept discovered independently and simultaneously):

Rahman A and Medvedev P. Representation of k-mer sets using spectrum-preserving string sets . bioRxiv 2020.01.07.896928, 2020.


  • GCC 4.8+ or equivalent
  • ZLib

Getting started

git clone
cd prophasm && make -j
./prophasm -k 15 -i tests/test1.fa -i tests/test2.fa -o _out1.fa -o _out2.fa -x _intersect.fa -s _stats.tsv


prophasm -k 31 -i input.fa -o simplitigs.fa  # compute simplitigs for a single dataset
prophasm -k 31 -i inset1.fa -i inset2.fa -o outset1.fa outset2.fa  # compute simplitigs for two datasets
prophasm -k 31 -i inset1.fa -i inset2.fa -x intersect.fa -o outset1.fa outset2.fa  # compute simplitigs for two datasets and subtract their intersection

Command line parameters

Program:  prophasm (a greedy assembler for k-mer set compression)
Version:  0.1.0
Contact:  Karel Brinda <>

Usage:    prophasm [options]

Examples: prophasm -k 15 -i f1.fa -i f2.fa -x fx.fa
             - compute intersection of f1 and f2
          prophasm -k 15 -i f1.fa -i f2.fa -x fx.fa -o g1.fa -o g2.fa
             - compute intersection of f1 and f2, and subtract it from them
          prophasm -k 15 -i f1.fa -o g1.fa
             - re-assemble f1 to g1

Command-line parameters:
 -k INT   K-mer size.
 -i FILE  Input FASTA file (can be used multiple times).
 -o FILE  Output FASTA file (if used, must be used as many times as -i).
 -x FILE  Compute intersection, subtract it, save it.
 -s FILE  Output file with k-mer statistics.
 -S       Silent mode.

Note that '-' can be used for standard input/output.


def extend_simplitig_forward (K, simplitig):
	extending = True
	while extending:
		extending = False
		q = simplitig[-k+1:]
		for x in [‘A’, ‘C’, ‘G’, ‘T’]:
			kmer = q + x
			if kmer in K:
				extending = True
				simplitig = simplitig + x
				S.remove (kmer)
				S.remove (reverse_complement (kmer))
	return S, s
def get_maximal_simplitig (K, initial_kmer):
	simplitig = initial_kmer
	K.remove (initial_kmer)
	K.remove (reverse_completement (initial_kmer))
	K, simplitig = extend_simplitig_forward (K, simplitig)
	simplitig = reverse_completent (simplitig)
	K, simplitig = extend_simplitig_forward (K, simplitig)
	return K, simplitig
def compute_simplitigs (kmers):
	K = set()
	for kmer in kmers:
		K.add (kmer)
		K.add (reverse_completement(kmer))
	simplitigs = set()
	while |K|>0:
		initial_kmer = K.random()
		K, simplitig = get_maximal_simplitig (K, initial_kmer)
		simplitigs.add (simplitig)
	return simplitigs

Similar programs


Please use Github issues.


See Releases.




Karel Brinda <>

You can’t perform that action at this time.