ProphAsm – a tool for computing simplitigs from k-mer sets and for k-mer set manipulation


ProphAsm is a tool for computing simplitigs from k-mer sets and for k-mer set manipulation. Simplitigs are genomic sequences computed as disjoint paths in a bidirectional vertex-centric de Bruijn graph. Compared to unitigs, simplitigs reduce both the total number of sequences and their cumulative length, while both representations contain exactly the same k-mers.
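To make the comparison with unitigs concrete, here is a minimal sketch on a toy 4-mer set with one branching vertex. It checks that a hand-written unitig set and a hand-written simplitig set contain the same canonical k-mers, and reports the number of sequences and cumulative length of each. The helper names (`canonical`, `kmer_set`, `summarize`) and the toy sequences are illustrative assumptions, not part of ProphAsm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(sequences, k):
    """Collect the canonical k-mers of all sequences."""
    return {
        canonical(seq[i:i + k])
        for seq in sequences
        for i in range(len(seq) - k + 1)
    }


def summarize(sequences):
    """Return (number of sequences, cumulative length)."""
    return len(sequences), sum(len(seq) for seq in sequences)


K = 4
# Toy graph: CAAG -> AAGT -> {AGTC, AGTG}. Unitigs must stop at the branch.
unitigs = ["CAAGT", "AGTC", "AGTG"]
# Simplitigs may continue through the branch along one of the two edges.
simplitigs = ["CAAGTC", "AGTG"]

assert kmer_set(unitigs, K) == kmer_set(simplitigs, K)
print(summarize(unitigs))     # (3, 13)
print(summarize(simplitigs))  # (2, 10)
```

In this toy case the simplitig representation saves one sequence and three characters while encoding the same four k-mers.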

Upon execution, ProphAsm first loads all specified datasets (see the -i param) and computes their k-mer sets (see the -k param). If the -x param is provided, ProphAsm then computes their intersection, subtracts the intersection from the individual k-mer sets, and computes simplitigs for the intersection. If output files are specified (see the -o param), it also computes simplitigs for the individual k-mer sets and writes them to these files.
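The order of operations described above can be sketched with plain Python sets. This is only an illustration of the set logic (load, intersect, subtract), not ProphAsm's actual implementation; the function names `canonical_kmers` and `split_by_intersection` and the toy inputs are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(sequence, k):
    """Return the set of canonical k-mers of one sequence."""
    kmers = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def split_by_intersection(kmer_sets):
    """Return (shared k-mers, list of per-dataset k-mers without the shared ones)."""
    shared = set.intersection(*kmer_sets)
    specific = [kmers - shared for kmers in kmer_sets]
    return shared, specific


k = 4
datasets = ["CAAGTC", "CAAGTG"]  # stand-ins for two input FASTA files
kmer_sets = [canonical_kmers(seq, k) for seq in datasets]
shared, specific = split_by_intersection(kmer_sets)

print(sorted(shared))                 # ['AAGT', 'CAAG']
print([sorted(s) for s in specific])  # [['AGTC'], ['AGTG']]
```

Simplitigs would then be computed separately for `shared` and for each entry of `specific`.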


If you want to cite ProphAsm, please use the following reference:

Brinda K, Baym M, and Kucherov G. Simplitigs as an efficient and scalable representation of de Bruijn graphs. bioRxiv 2020.01.12.903443, 2020.

If you want to cite the concept of simplitigs, please also include the parallel manuscript (the same concept discovered independently and simultaneously):

Rahman A and Medvedev P. Representation of k-mer sets using spectrum-preserving string sets. bioRxiv 2020.01.07.896928, 2020.


Prerequisites

  • GCC 4.8+ or equivalent
  • ZLib

Getting started

git clone
cd prophasm && make -j
./prophasm -k 15 -i tests/test1.fa -i tests/test2.fa -o _out1.fa -o _out2.fa -x _intersect.fa -s _stats.tsv


prophasm -k 31 -i input.fa -o simplitigs.fa  # compute simplitigs for a single dataset
prophasm -k 31 -i inset1.fa -i inset2.fa -o outset1.fa -o outset2.fa  # compute simplitigs for two datasets
prophasm -k 31 -i inset1.fa -i inset2.fa -x intersect.fa -o outset1.fa -o outset2.fa  # compute simplitigs for two datasets and subtract their intersection

Command line parameters

Program:  prophasm (a greedy assembler for k-mer set compression)
Version:  0.1.0
Contact:  Karel Brinda <>

Usage:    prophasm [options]

Examples: prophasm -k 15 -i f1.fa -i f2.fa -x fx.fa
             - compute intersection of f1 and f2
          prophasm -k 15 -i f1.fa -i f2.fa -x fx.fa -o g1.fa -o g2.fa
             - compute intersection of f1 and f2, and subtract it from them
          prophasm -k 15 -i f1.fa -o g1.fa
             - re-assemble f1 to g1

Command-line parameters:
 -k INT   K-mer size.
 -i FILE  Input FASTA file (can be used multiple times).
 -o FILE  Output FASTA file (if used, must be used as many times as -i).
 -x FILE  Compute intersection, subtract it, save it.
 -s FILE  Output file with k-mer statistics.
 -S       Silent mode.

Note that '-' can be used for standard input/output.
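When ProphAsm is called from a script, the option rules listed above can be checked before the program is launched. The following is a minimal sketch of such a helper; the function `build_prophasm_command` and its parameters are hypothetical, and only the flags themselves (-k, -i, -o, -x, -s, -S) come from the help text.

```python
def build_prophasm_command(k, inputs, outputs=(), intersection=None,
                           stats=None, silent=False, executable="prophasm"):
    """Build an argument list for prophasm (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required")
    # -o, if used, must be given as many times as -i.
    if outputs and len(outputs) != len(inputs):
        raise ValueError("number of -o files must match number of -i files")

    cmd = [executable, "-k", str(k)]
    for path in inputs:
        cmd += ["-i", path]
    for path in outputs:
        cmd += ["-o", path]
    if intersection is not None:
        cmd += ["-x", intersection]
    if stats is not None:
        cmd += ["-s", stats]
    if silent:
        cmd.append("-S")
    return cmd


cmd = build_prophasm_command(
    15, ["f1.fa", "f2.fa"], outputs=["g1.fa", "g2.fa"], intersection="fx.fa"
)
print(" ".join(cmd))
# prophasm -k 15 -i f1.fa -i f2.fa -o g1.fa -o g2.fa -x fx.fa
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the `prophasm` binary is on the PATH.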


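# Python sketch of the greedy simplitig computation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
	# Complement every base, then reverse the string.
	return seq.translate(COMPLEMENT)[::-1]

def extend_simplitig_forward(K, simplitig, k):
	# Greedily append bases while the next k-mer is still unused.
	extending = True
	while extending:
		extending = False
		q = simplitig[len(simplitig) - k + 1:]  # last k-1 bases
		for x in "ACGT":
			kmer = q + x
			if kmer in K:
				extending = True
				simplitig = simplitig + x
				# discard() tolerates k-mers equal to their own reverse complement
				K.discard(kmer)
				K.discard(reverse_complement(kmer))
				break  # recompute q before trying the next extension
	return K, simplitig

def get_maximal_simplitig(K, initial_kmer):
	k = len(initial_kmer)
	simplitig = initial_kmer
	K.discard(initial_kmer)
	K.discard(reverse_complement(initial_kmer))
	# Extend to the right, then flip the strand and extend again,
	# which extends the original left end.
	K, simplitig = extend_simplitig_forward(K, simplitig, k)
	simplitig = reverse_complement(simplitig)
	K, simplitig = extend_simplitig_forward(K, simplitig, k)
	return K, simplitig

def compute_simplitigs(kmers):
	# K holds every k-mer together with its reverse complement.
	K = set()
	for kmer in kmers:
		K.add(kmer)
		K.add(reverse_complement(kmer))
	simplitigs = set()
	while len(K) > 0:
		initial_kmer = next(iter(K))  # any remaining k-mer works as a seed
		K, simplitig = get_maximal_simplitig(K, initial_kmer)
		simplitigs.add(simplitig)
	return simplitigs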
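The output of a routine such as compute_simplitigs can be sanity-checked against the defining property of simplitigs: because they correspond to disjoint paths, every canonical k-mer of the input should occur exactly once across all output sequences. The following is a minimal, self-contained sketch of such a check; the names `canonical`, `canonical_kmer_counts` and `is_simplitig_set` are assumptions made for this example and are not part of ProphAsm.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def canonical_kmer_counts(sequences, k):
    """Count how often each canonical k-mer occurs across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def is_simplitig_set(sequences, kmers, k):
    """True if the sequences contain exactly the given k-mers, each exactly once."""
    counts = canonical_kmer_counts(sequences, k)
    expected = {canonical(kmer) for kmer in kmers}
    return set(counts) == expected and all(c == 1 for c in counts.values())


kmers = ["CAAG", "AAGT", "AGTC", "AGTG"]
print(is_simplitig_set(["CAAGTC", "AGTG"], kmers, 4))   # True
print(is_simplitig_set(["CAAGTC", "AAGTG"], kmers, 4))  # False: AAGT occurs twice
print(is_simplitig_set(["CAAGTC"], kmers, 4))           # False: AGTG is missing
```

The check does not test maximality, so a set of isolated k-mers would also pass; it only verifies that no k-mer is lost or duplicated.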

Karel Brinda <>

