A lightweight and high-performance bioinformatics package in Golang

bio


A lightweight and high-performance (see seqkit benchmark) bioinformatics package.

FASTA/Q parsing

The FASTA/Q parser in this package performs close to the well-known C library kseq.h.

To test the performance, three datasets are used:

  • dataset_A, bacterial genomes, 2.7 GB
  • dataset_B, human genome, 2.9 GB
  • dataset_C, Illumina reads, 2.2 GB

Summary by seqkit:

file           seq_format   seq_type   num_seqs   min_len        avg_len       max_len
dataset_A.fa   FASTA        DNA          67,748        56       41,442.5     5,976,145
dataset_B.fa   FASTA        DNA             194       970   15,978,096.5   248,956,422
dataset_C.fq   FASTQ        DNA       9,186,045       100            100           100

seqtk (version 1.1-r92-dirty, using kseq.h) and seqkit (version v0.3.1.1, using this package) were benchmarked. Note that seqtk does not support wrapped (fixed line width) output, so seqkit was run with -w 0 to disable output wrapping. The memusg script was used to measure running time and peak memory usage.

Commands

Each test was repeated 5 times, and the average running time and peak memory usage were computed.
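
The averaging step can be sketched as follows. This is only an illustration, not the benchmark script used here: the Run type and MeanRun function are hypothetical, and the values in main are placeholders, not measurements from this benchmark.

```go
package main

import (
	"errors"
	"fmt"
)

// Run is one measurement of a benchmarked command (hypothetical type).
type Run struct {
	Seconds float64 // elapsed wall-clock time in seconds
	PeakMB  float64 // peak memory usage in megabytes
}

// MeanRun returns the arithmetic mean of time and peak memory over all runs.
func MeanRun(runs []Run) (Run, error) {
	if len(runs) == 0 {
		return Run{}, errors.New("no runs to average")
	}
	var sum Run
	for _, r := range runs {
		sum.Seconds += r.Seconds
		sum.PeakMB += r.PeakMB
	}
	n := float64(len(runs))
	return Run{Seconds: sum.Seconds / n, PeakMB: sum.PeakMB / n}, nil
}

func main() {
	// Placeholder values for illustration only; NOT results from this benchmark.
	runs := []Run{{10, 100}, {12, 102}, {11, 101}, {10, 100}, {12, 102}}
	mean, err := MeanRun(runs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("mean time: %.1f s, mean peak memory: %.1f MB\n", mean.Seconds, mean.PeakMB)
}
```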

Results:

Figure: benchmark.tsv.png

Install

This package is "go-gettable"; just run:

go get -u github.com/shenwei356/bio

More

See the README of each sub-package.

Documentation

See the documentation on GoDoc for more details.

Copyright (c) 2013-2016, Wei Shen (shenwei356@gmail.com)

MIT License