A WGS de novo assembler based on the FMD-index for large genomes


Getting Started

  1. Acquire the fermi source code from the download page and compile with (x.y is the version number):

    tar -jxf fermi-x.y.tar.bz2
    (cd fermi-x.y; make)
    
  2. Download the C. elegans reads SRR065390 from SRA and convert to the FASTQ format with the fastq-dump tool from the SRA toolkit:

    fastq-dump --split-spot SRR065390.lite.sra
    
  3. Perform assembly with:

    fermi-x.y/run-fermi.pl -ct8 -e fermi-x.y/fermi SRR065390.fastq > fmdef.mak
    make -f fmdef.mak -j 8 > fmdef.log 2>&1
    

The entire procedure takes several hours with 8 CPU cores. The file fmdef.p5.fq.gz contains the final contigs. The quality line in the FASTQ-like format gives the per-base read depth computed from non-redundant error-corrected reads.
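As a rough illustration of reading the depth track, the sketch below parses FASTQ-like records and converts each character of the quality line to an integer. It assumes that each record occupies exactly four lines and that the depth is stored as one printable character offset by 33, in the same way as Phred+33 base qualities (so high depths would saturate). Neither assumption is stated above, so check the manpage before relying on them. The function name and the example record are made up.

```python
import io

DEPTH_OFFSET = 33  # ASSUMPTION: depth is stored as chr(depth + 33)


def read_contig_depths(handle):
    """Yield (name, sequence, depths) from a 4-line FASTQ-like text stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        name = header[1:].split()[0]
        depths = [ord(c) - DEPTH_OFFSET for c in qual]
        yield name, seq, depths


# Made-up record for illustration only.
record = "@contig1\nACGT\n+\n5?IS\n"
for name, seq, depths in read_contig_depths(io.StringIO(record)):
    print(name, seq, depths)
```

For a real run, the same function could be given `gzip.open("fmdef.p5.fq.gz", "rt")` instead of the in-memory string.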

FAQ

0. In addition to this FAQ, is there any other documentation?

The algorithms and evaluations are described in the fermi paper with the preprint available from arXiv. The detailed usage is documented in the fermi manpage.

1. What is fermi?

Fermi is a de novo assembler for Illumina reads from whole-genome shotgun sequencing. It also provides tools for error correction, sequence-to-read alignment and comparison between read sets. It uses the FMD-index, a novel compressed data structure, as the key data representation.
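The FMD-index itself is described in the fermi paper. As a loose illustration of the general idea only, the toy below builds an uncompressed FM-index over a read together with its reverse complement and counts pattern occurrences by backward search. Because both strands are indexed together, a pattern and its reverse complement always have the same count. This is not fermi's implementation: fermi's index is compressed and is built with different algorithms, and every name in the sketch is invented.

```python
def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def build_index(read):
    """Build a toy FM-index over read + '$' + revcomp(read) + '$'."""
    text = read + "$" + revcomp(read) + "$"
    # Naive suffix array: fine for a toy, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # smaller[c] = number of symbols in text that are smaller than c
    smaller, total = {}, 0
    for c in alphabet:
        smaller[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return smaller, occ, len(text)


def count(pattern, index):
    """Count occurrences of pattern on either strand via backward search."""
    smaller, occ, n = index
    lo, hi = 0, n  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in smaller:
            return 0
        lo = smaller[ch] + occ[ch][lo]
        hi = smaller[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_index("GATTACA")
print(count("TTA", index), count(revcomp("TTA"), index))
```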

2. How is fermi different from other assemblers?

For small genomes, fermi is not much different from other assemblers in terms of performance. For mammalian genomes, however, fermi is one of the few choices that can do the job with a relatively small memory footprint. It can assemble 35-fold human data in 90 GB of shared memory, with contiguity and accuracy broadly similar to those of other mainstream assemblers.

In addition to de novo assembly, fermi ultimately aims to preserve all the information in the raw reads, in particular heterozygous events. SNP and INDEL calling can be achieved by aligning the fermi unitigs to the reference genome and has been shown to be advantageous over other approaches in some aspects (see also the preprint).

3. What is the relationship between fermi and SGA?

Fermi is substantially influenced by SGA. It follows a similar workflow, including the idea of contrasting read sets. On the other hand, the internal implementation of fermi is distinct from that of SGA. Fermi is based on a novel data structure and uses different algorithms for almost every step. As to the end results, fermi has a similar performance to SGA for features shared between them, and is arguably easier to use. In all, both fermi and SGA are viable options for de novo assembly and contrast variant calling.

4. Are there release notes?

Yes, below this FAQ.

5. How to install fermi?

You may clone the fermi GitHub repository to get the latest source code, or acquire the source code of stable releases from the download page. You can compile fermi by invoking make in the source code directory. The only library dependency is zlib. After compilation, you may copy fermi and run-fermi.pl to a directory on your PATH or simply use the executables in the source code directory.

6. How to run fermi for de novo assembly?

The fermi manpage shows an example. Briefly, if you have Illumina short-insert paired-end reads read1.fq.gz and read2.fq.gz, you can run:

    run-fermi.pl -Pe ./fermi -t12 read1.fq.gz read2.fq.gz > fmdef.mak
    make -f fmdef.mak -j 12

to perform assembly using 12 CPU cores. The file fmdef.p5.fq.gz contains the final contigs built using the paired-end information. If you only want to correct errors, you may use

    make -f fmdef.mak -j 12 fmdef.ec.fq.gz

7. What is contrast assembly? How can I use it?

The idea of contrast assembly was first proposed and implemented by Jared Simpson and Richard Durbin. It works by assembling reads that contain a k-mer present in one set of reads but absent from another. The resulting contigs span variants, including mutations and breakpoints, seen only in the first set of reads. Mapping the contigs back to the reference genome provides their locations. This approach focuses directly on the differences between read sets and helps to reduce the complications caused by structural variations and an imperfect reference genome.
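The read selection step can be sketched with plain Python sets. This is only a conceptual model under stated assumptions: fermi works on FMD-indices of error-corrected reads rather than on in-memory sets, the reads here contain only A, C, G and T, and treating a k-mer and its reverse complement as the same k-mer is a choice made for this sketch. The function names and the toy reads are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def canonical_kmers(read, k):
    """Return the set of strand-independent k-mers found in a read."""
    kmers = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def contrast(reads_a, reads_b, k):
    """Return the reads in reads_a holding at least one k-mer absent from reads_b."""
    seen_in_b = set()
    for read in reads_b:
        seen_in_b |= canonical_kmers(read, k)
    return [r for r in reads_a if canonical_kmers(r, k) - seen_in_b]


# Toy data: the second read of sample1 carries a single-base difference.
sample1 = ["ACGTTGCA", "ACGTAGCA"]
sample2 = ["ACGTTGCA"]
print(contrast(sample1, sample2, k=5))  # ['ACGTAGCA']
```

On real data, sequencing errors also create k-mers unique to one set, which is why this sketch assumes the reads were error-corrected first.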

To perform contrast assembly given two sets of reads, we need to generate an error-corrected FMD-index for each set, use the contrast command to pick reads unique to one read set, and then apply the sub command to extract the FMD-index of the selected reads. The following shows an example:

    # error correction for sample1; paired reads are interleaved in sample1.fq.gz
    run-fermi.pl -ct12 -p sample1 sample1.fq.gz > sample1.mak
    make -f sample1.mak -j 12 sample1.ec.rank
    # error correction for sample2
    run-fermi.pl -ct12 -p sample2 sample2.fq.gz > sample2.mak
    make -f sample2.mak -j 12 sample2.ec.rank
    # identify reads unique to one sample
    fermi contrast -t12 sample1.ec.fmd sample1.ec.rank sample1.sub sample2.ec.fmd sample2.ec.rank sample2.sub
    # generate the FMD-index for reads unique to sample1; the same applies to sample2
    fermi sub -t12 sample1.fmd sample1.sub > sample1.sub.fmd
    # assemble unique reads and perform graph simplification
    fermi unitig -l50 -t12 sample1.sub.fmd > sample1.sub.mag
    fermi clean -CA -l150 sample1.sub.mag > sample1-cleaned.sub.mag

We can align the resulting contigs sample1-cleaned.sub.mag to the reference genome with BWA-SW to pinpoint the mutations and breakpoints. It is also possible to compare one sample to multiple samples by intersecting the selected reads using the bitand command and then performing the assembly.
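Conceptually, comparing one sample against several others keeps only the reads that were selected in every pairwise contrast. The sketch below models each selection as a bit vector with one bit per read and intersects them with a bitwise AND. The on-disk format of the .sub files is not described here, so Python integers stand in for the bit vectors, and the helper names and read indices are hypothetical.

```python
def to_bits(selected, n_reads):
    """Pack a list of selected read indices into an integer bit vector."""
    bits = 0
    for i in selected:
        if not 0 <= i < n_reads:
            raise ValueError(f"read index {i} out of range")
        bits |= 1 << i
    return bits


def from_bits(bits):
    """Unpack an integer bit vector into a sorted list of read indices."""
    indices, i = [], 0
    while bits:
        if bits & 1:
            indices.append(i)
        bits >>= 1
        i += 1
    return indices


n_reads = 8
# Reads of sample1 that look unique when contrasted with sample2 and with sample3.
vs_sample2 = to_bits([0, 2, 5, 7], n_reads)
vs_sample3 = to_bits([2, 3, 5], n_reads)

# Keep only the reads that are unique with respect to both other samples.
unique_to_sample1 = vs_sample2 & vs_sample3
print(from_bits(unique_to_sample1))  # [2, 5]
```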

A more convenient command-line interface is likely to be added in the future.

Release Notes

Release 1.1 (2012-08-22)

This release reduces the runtime of assembly by introducing an improved version of the BCR algorithm for constructing the FMD-index and by deploying heuristics in error correction. On two human data sets, fermi takes 30% less wall-clock time and produces slightly longer scaftigs than release 1.0, though at the cost of marginally more assembly breakpoints.

(1.1: 2012-08-22, r744)

Release 1.0 (2012-04-09)

This is the first public release of fermi, a de novo assembler and analysis tool for whole-genome shotgun sequencing. Source code can be acquired from the download page. Please read the manpage and the FAQ for detailed usage.

(1.0: 2012-04-09, r700)
