Fast multi-line FASTA/Q reader in several programming languages

Readfq is a collection of routines for parsing the FASTA/FASTQ format. It
seamlessly parses both FASTA and multi-line FASTQ with a simple interface.

Readfq was first implemented in a single C header file and then ported to Lua,
Perl and Python as a single function of fewer than 50 lines. For users of
scripting languages, I encourage you to copy and paste the function instead of
using readfq as a library. It is always good to avoid unnecessary library
dependencies.
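
To illustrate what such a single function has to do, here is a minimal Python 3
sketch of a reader that accepts both FASTA and multi-line FASTQ. It is not the
readfq code itself: the name read_fastx and its internal variables are
hypothetical, and it assumes the input is an iterable of text lines. The key
idea is that FASTQ quality lines are consumed until they cover the sequence
length, because '@' is also a valid quality character and cannot be trusted as
a record delimiter.

```python
import io


def read_fastx(handle):
    """Yield (name, sequence, quality); quality is None for FASTA records.

    Illustrative sketch only, not the readfq implementation.
    """
    pending = None  # header or '+' line read ahead while scanning a record
    while True:
        if pending is None:
            # Skip forward to the next header ('>' for FASTA, '@' for FASTQ).
            for line in handle:
                if line[:1] in (">", "@"):
                    pending = line.rstrip("\r\n")
                    break
            if pending is None:
                return  # end of input
        # The record name is the header text up to the first whitespace.
        fields = pending[1:].split(None, 1)
        name = fields[0] if fields else ""
        pending = None
        # Sequence lines run until the next header or the '+' separator.
        chunks = []
        for line in handle:
            if line[:1] in (">", "@", "+"):
                pending = line.rstrip("\r\n")
                break
            chunks.append(line.rstrip("\r\n"))
        seq = "".join(chunks)
        if pending is None or pending[0] != "+":
            yield name, seq, None  # FASTA record
            if pending is None:
                return
            continue
        # FASTQ: read quality lines until they cover the sequence length.
        pending = None
        qual_chunks, qual_len = [], 0
        for line in handle:
            text = line.rstrip("\r\n")
            qual_chunks.append(text)
            qual_len += len(text)
            if qual_len >= len(seq):
                break
        # If the input ends before enough quality is read, report no quality.
        complete = qual_len >= len(seq)
        yield name, seq, ("".join(qual_chunks) if complete else None)
```

Under these assumptions, list(read_fastx(io.StringIO(text))) returns one tuple
per record, and the same loop works whether the input is FASTA, FASTQ, or a
mixture of both.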

Readfq also strives for efficiency. The C implementation is among the fastest
(if not the fastest). The Python and Perl implementations are several to tens
of times faster than the official Bio* implementations. If you can speed up
readfq further, please let me know. I am not good at optimizing programs in
scripting languages. Thank you.

As to licensing, the C implementation is distributed under the MIT license.
Implementations in other languages are released without a license. Just copy
and paste. You do not need to acknowledge me. The following shows a brief
example for each programming language:

  # Perl
  my @aux = undef; # this is for keeping intermediate data
  while (my ($name, $seq, $qual) = readfq(\*STDIN, \@aux)) { print "$seq\n"; }

  # Python: generator function
  for name, seq, qual in readfq(sys.stdin): print(seq)

  -- Lua: closure
  for name, seq, qual in readfq(io.stdin) do print(seq) end

  /* Go */
  package main

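  import (
    "bufio"
    "os"
    // import path of the package that provides fasta.FqReader goes here
  )

  func main() {
    var fqr fasta.FqReader
    fqr.Reader = bufio.NewReader(os.Stdin)
    for r, done := fqr.Iter(); !done; r, done = fqr.Iter() {
      _ = r // use the record r here
    }
  }
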
  /* C */
  #include <zlib.h>
  #include <stdio.h>
  #include "kseq.h"
  KSEQ_INIT(gzFile, gzread)

  int main() {
    gzFile fp;
    kseq_t *seq;
    fp = gzdopen(fileno(stdin), "r");
    seq = kseq_init(fp);
    while (kseq_read(seq) >= 0) puts(seq->seq.s);
    kseq_destroy(seq);
    gzclose(fp);
    return 0;
  }

Some naive benchmarks. To convert a FASTQ file containing 25 million 100bp
reads to FASTA, FASTX-Toolkit (which parses 4-line FASTQ only) takes 325.0 CPU
seconds and EMBOSS' seqret takes 247.8 seconds. My seqtk, which uses the kseq.h
library, finishes the task in 24.6 seconds, about 10X faster. For retrieving
25k sequences by name from the same FASTQ, BioPython takes 963 seconds, while
readfq's Python implementation takes 136 seconds; BioPerl takes more than 40
minutes (killed), while readfq's Perl implementation takes 273 seconds. Seqtk
takes 29 seconds.
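
The ratios behind these figures can be checked with a few lines of Python. The
timings are copied from the paragraph above; the function name slowdown_vs and
the dictionary labels are only illustrative, and the 136-second and 273-second
figures are assumed to belong to readfq's Python and Perl implementations.
BioPerl is left out because its run was killed before finishing.

```python
# Timings in seconds, copied from the benchmark paragraph.
fastq_to_fasta = {"FASTX-Toolkit": 325.0, "EMBOSS seqret": 247.8, "seqtk": 24.6}
fetch_by_name = {
    "BioPython": 963.0,
    "readfq (Python)": 136.0,  # assumed attribution
    "readfq (Perl)": 273.0,    # assumed attribution
    "seqtk": 29.0,
}


def slowdown_vs(timings, baseline):
    """Return each tool's running time divided by the baseline tool's time."""
    base = timings[baseline]
    return {tool: secs / base for tool, secs in timings.items() if tool != baseline}


for task, timings in (("FASTQ to FASTA", fastq_to_fasta),
                      ("fetch by name", fetch_by_name)):
    for tool, ratio in sorted(slowdown_vs(timings, "seqtk").items()):
        print(f"{task}: {tool} takes {ratio:.1f}x as long as seqtk")
```

On these numbers seqtk is roughly 10 to 13 times faster than the other
converters, and BioPython takes about 7 times as long as the Python figure.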
