k-mer based assembly evaluation
Evaluate genome assemblies with k-mers and more

Genome assembly projects often have Illumina whole genome sequencing reads available for the assembled individual. The k-mer spectrum of this read set can be used to evaluate assembly quality independently, without needing a high-quality reference. Merqury provides a set of tools for this purpose.


Dependencies

  • gcc 4.8 or higher
  • meryl
  • Java Runtime Environment (JRE)
  • R with argparse, ggplot2, and scales (tested on R 3.6.1)
  • bedtools
  • samtools
  • igvtools
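
Before installing anything, it can help to see which of the listed tools are already reachable. The following is a minimal sketch, not part of Merqury: the function name check_tools is made up for this example, and the executable names (for instance Rscript for R) are assumptions that may differ on your system.

```shell
# Report which command-line tools are on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names below are assumptions based on the list above.
check_tools gcc meryl java Rscript bedtools samtools igvtools \
    || echo "Install the missing tools or add them to PATH."
```

This only checks that the executables exist; it does not verify versions or the R packages.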


Get a working meryl in your PATH

Download meryl release:

If the binary doesn't work, download the source and compile:

cd meryl/src
make -j 24
export PATH=/path/to/meryl/…/bin:$PATH

Check that running meryl prints its help message.

Add a path variable MERQURY

git clone
cd merqury

Add the “export” part to your environment (~/.bash_profile or ~/.profile), together with the installation directory paths for bedtools, samtools and igvtools. Then source the file.
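
As a rough sketch, the profile entries could look like the lines below. Every path here is a placeholder invented for illustration; replace each one with the directory where the tool actually lives on your machine.

```shell
# Example entries for ~/.bash_profile or ~/.profile.
# /path/to/merqury is a placeholder for wherever the repository was cloned.
export MERQURY=/path/to/merqury

# Placeholder install directories for bedtools, samtools and igvtools.
export PATH=/path/to/bedtools/bin:/path/to/samtools/bin:/path/to/igvtools:$PATH
```

After editing the file, run `source ~/.bash_profile` (or open a new shell) so that the variables take effect.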


  • !! Merqury assumes that the names of all meryl dbs (directories) end with .meryl. !!
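
A quick way to catch a naming problem before starting a run is to test each db argument yourself. This is a minimal sketch and not Merqury's own check; is_meryl_db is a hypothetical helper name, and the db names in the loop are the example names used in this README.

```shell
# Return 0 only if the argument ends in .meryl and is an existing directory.
is_meryl_db() {
    case "$1" in
        *.meryl|*.meryl/) [ -d "$1" ] ;;
        *) return 1 ;;
    esac
}

# Warn about every db that is missing or does not follow the naming rule.
for db in read-db.meryl mat.meryl pat.meryl; do
    is_meryl_db "$db" || echo "not a .meryl directory: $db" >&2
done
```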

On a single machine:

ln -s $MERQURY/		# Link merqury
./ <read-db.meryl> [<mat.meryl> <pat.meryl>] <asm1.fasta> [asm2.fasta] <out>

Usage: <read-db.meryl> [<mat.meryl> <pat.meryl>] <asm1.fasta> [asm2.fasta] <out>
	<read-db.meryl>	: k-mer counts of the read set
	<mat.meryl>		: k-mer counts of the maternal haplotype (ex. mat.only.meryl or mat.hapmer.meryl)
	<pat.meryl>		: k-mer counts of the paternal haplotype (ex. pat.only.meryl or pat.hapmer.meryl)
	<asm1.fasta>	: Assembly fasta file (ex. pri.fasta, hap1.fasta or maternal.fasta)
	[asm2.fasta]	: Additional fasta file (ex. alt.fasta, hap2.fasta or paternal.fasta)
	*asm1.meryl and asm2.meryl will be generated. Avoid using the same names as the hap-mer dbs
	<out>		: Output prefix

< > : required
[ ] : optional

1. I have one assembly (pseudo-haplotype or mixed-haplotype)

# I don't have the hap-mers
./ read-db.meryl asm1.fasta out_prefix

# I have the hap-mers
./ read-db.meryl mat.meryl pat.meryl asm1.fasta out_prefix

2. I have two assemblies (diploid)

# I don't have the hap-mers
./ read-db.meryl asm1.fasta asm2.fasta out_prefix

# I have the hap-mers
./ read-db.meryl mat.meryl pat.meryl asm1.fasta asm2.fasta out_prefix
  • Note: there is no need to run Merqury again for each assembly. Provide both fasta files, and Merqury generates stats for each assembly and for the two combined.
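
The four invocation patterns above differ only in how many arguments are given. The sketch below makes that mapping explicit; describe_run is a hypothetical helper written for this illustration and does not reproduce how Merqury itself parses its arguments.

```shell
# Classify a Merqury-style argument list by its length.
describe_run() {
    case "$#" in
        3) echo "one assembly, no hap-mers" ;;       # read-db asm1 out
        4) echo "two assemblies, no hap-mers" ;;     # read-db asm1 asm2 out
        5) echo "one assembly, with hap-mers" ;;     # read-db mat pat asm1 out
        6) echo "two assemblies, with hap-mers" ;;   # read-db mat pat asm1 asm2 out
        *) echo "unexpected number of arguments: $#" >&2; return 1 ;;
    esac
}

describe_run read-db.meryl asm1.fasta out_prefix
describe_run read-db.meryl mat.meryl pat.meryl asm1.fasta asm2.fasta out_prefix
```

Counting arguments alone cannot tell a fasta file from a meryl db, so a real wrapper would also want to check the file names.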

How to parallelize

Merqury starts with eval/. When hap-mers are provided, Merqury runs modules under trio/ in addition to eval/.

The following can run at the same time. A module that other modules depend on is followed by an arrow (->) pointing to the module that must wait for it.

  • eval/ -> trio/
  • trio/
  • trio/ per assembly -> trio/
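
On a single machine, this kind of scheme can be approximated with shell background jobs: independent steps start together, and a dependent step is chained behind the one it needs. The sketch below uses placeholder functions (step_a, step_b, step_c) that only print a message; they are not Merqury scripts, and in practice each would call the corresponding module.

```shell
# Placeholder steps standing in for module scripts.
step_a() { echo "step A done"; }             # an independent module
step_b() { echo "step B done"; }             # another independent module
step_c() { echo "step C done (after A)"; }   # depends on step A

( step_a && step_c ) &   # dependent chain: C starts only if A succeeds
step_b &                 # runs concurrently with the chain
wait                     # block until both background jobs have finished
echo "all steps finished"
```

Because the two background jobs run concurrently, the order of their messages can vary between runs; only "step C" following "step A" is guaranteed.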

Meryl, the k-mer counter used internally, uses all available CPUs by default. To limit it, set OMP_NUM_THREADS; for example, OMP_NUM_THREADS=24 uses 24 threads.
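
A minimal sketch of setting the thread count in a shell session follows. The cap of 24 comes from the example above; capping it at the number of online CPUs, and using getconf to find that number, are choices made for this illustration rather than anything Merqury requires.

```shell
# Use at most 24 threads, or fewer if the machine has fewer online CPUs.
max_threads=24
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

if [ "$cpus" -lt "$max_threads" ]; then
    export OMP_NUM_THREADS="$cpus"
else
    export OMP_NUM_THREADS="$max_threads"
fi

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The variable must be exported in the same shell (or job script) that later launches Merqury so that meryl inherits it.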

In a Slurm environment, simply run:

ln -s $MERQURY/	# Link merqury
./ <read-db.meryl> [<mat.meryl> <pat.meryl>] <asm1.fasta> [asm2.fasta] <out>

Change the sbatch settings to match your environment (e.g. the partition).

Outputs from each module

  • eval/ k-mer completeness, qv, spectra-cn and spectra-asm plots, asm-only .bed and .tdf for tracking errors
  • eval/ just get the qv stats and quit.
  • trio/ hap-mer level spectra-cn plots, hap-mer completeness
  • trio/ blob plots of the hap-mers in each contig/scaffold
  • trio/ phase block statistics, phase block N* plots, hap-mer tracks (.bed and .tdf files)
  • trio/ continuity plots (phase block N* or NG* plots, phase block vs. contig/scaffold plots)
  • trio/ this is run part of, however can be re-run with desired short-range switch parameters. Run trio/ along with it to get the associated plots.

Tips for getting help

If you are not sure what to do, run a script without any parameters. For example, ./trio/ will print a help message and quit.

The following wiki pages have more detailed examples.

1. Prepare meryl dbs (details)

  1. Get the right k size
  2. Build k-mer dbs with meryl
  3. Build hap-mers for trios

2. Overall assembly evaluation (details)

  1. Reference free QV estimate
  2. k-mer completeness (recovery rate)
  3. Spectra copy number analysis
  4. Track error bases in the assembly

3. Phasing assessment with hap-mers (details)

  1. Inherited hap-mer plots
  2. Hap-mer blob plots
  3. Hap-mer completeness (recovery rate)
  4. Spectra copy number analysis per hap-mers
  5. Phased block statistics and switch error rates
  6. Track each haplotype block in the assembly

Available pre-built meryl dbs

Meryl dbs built from Illumina WGS reads, along with hap-mers, are available here for:

  • A. thaliana COL-0 x CVI-0 F1
  • NA12878 (HG001)
  • HG002

Citing merqury

Please use the following preprint to cite Merqury:

Arang Rhie, Brian P. Walenz, Sergey Koren, Adam M. Phillippy, Merqury: reference-free quality and phasing assessment for genome assemblies, bioRxiv (2020). doi:

