michieldhadamus/bio

bio

A lightweight, high-performance bioinformatics package (see the seqkit benchmark).

FASTA/Q parsing

This package's FASTA/Q parser performs close to the well-known C library kseq.h.

To test the performance, three datasets are used:

  • dataset_A, bacteria genomes, 2.7G
  • dataset_B, human genome, 2.9G
  • dataset_C, Illumina reads, 2.2G

Summary by seqkit:

file           seq_format   seq_type   num_seqs   min_len        avg_len       max_len
dataset_A.fa   FASTA        DNA          67,748        56       41,442.5     5,976,145
dataset_B.fa   FASTA        DNA             194       970   15,978,096.5   248,956,422
dataset_C.fq   FASTQ        DNA       9,186,045       100            100           100
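The numeric columns in the summary come from simple arithmetic over sequence lengths: `avg_len` is the total length divided by `num_seqs`. The sketch below shows that computation; the `Stats` type and `summarize` function are hypothetical illustrations, not seqkit's code. For a read set like dataset_C, where every read is 100 bp long, `min_len`, `avg_len` and `max_len` all coincide.

```go
package main

import "fmt"

// Stats mirrors the numeric columns of the summary table.
// It is a hypothetical type for illustration, not seqkit's own.
type Stats struct {
	NumSeqs int
	MinLen  int
	AvgLen  float64
	MaxLen  int
}

// summarize computes num_seqs, min_len, avg_len and max_len from sequence lengths.
func summarize(lengths []int) Stats {
	if len(lengths) == 0 {
		return Stats{}
	}
	s := Stats{NumSeqs: len(lengths), MinLen: lengths[0], MaxLen: lengths[0]}
	total := 0 // int is 64-bit on common platforms, enough for genome-scale totals
	for _, l := range lengths {
		total += l
		if l < s.MinLen {
			s.MinLen = l
		}
		if l > s.MaxLen {
			s.MaxLen = l
		}
	}
	s.AvgLen = float64(total) / float64(len(lengths))
	return s
}

func main() {
	// Fixed-length reads: min, average and max are identical.
	fmt.Printf("%+v\n", summarize([]int{100, 100, 100}))
	// Variable-length sequences, e.g. assembled contigs or chromosomes.
	fmt.Printf("%+v\n", summarize([]int{970, 2000, 3000}))
}
```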

seqtk (version 1.1-r92-dirty, which uses kseq.h) and seqkit (version v0.3.1.1, which uses this package) were compared. Because seqtk does not support wrapped (fixed line width) output, seqkit was run with -w 0 to disable output wrapping. The script memusg was used to measure running time and peak memory usage.
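Output wrapping means splitting each sequence into lines of a fixed width; a width of 0 means the whole sequence is written on one line, which is the behaviour seqkit's -w 0 selects for this benchmark. The hypothetical `wrap` helper below is only a sketch of that idea, not seqkit's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// wrap returns seq split into lines of at most width characters, each ending
// in '\n'. width <= 0 disables wrapping, analogous to seqkit's -w 0.
// Hypothetical helper for illustration only.
func wrap(seq string, width int) string {
	if width <= 0 || len(seq) <= width {
		return seq + "\n"
	}
	var b strings.Builder
	b.Grow(len(seq) + len(seq)/width + 1) // sequence bytes plus newlines
	for i := 0; i < len(seq); i += width {
		end := i + width
		if end > len(seq) {
			end = len(seq)
		}
		b.WriteString(seq[i:end])
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	seq := "ACGTACGTAC"
	fmt.Print(wrap(seq, 4)) // wrapped: 4 bases per line
	fmt.Print(wrap(seq, 0)) // unwrapped: one line, like -w 0
}
```

Disabling wrapping in seqkit means both tools write each sequence on a single line, so neither pays for extra line breaks the other does not produce.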

Commands

Each test was repeated 5 times, and the average running time and memory usage were computed.

Results:

[Figure: benchmark results (benchmark.tsv.png)]

Install

This package is "go-gettable"; just run:

go get -u github.com/shenwei356/bio

More

See the README of each sub-package.

Documentation

See the documentation on GoDoc for more details.

Copyright (c) 2013-2016, Wei Shen (shenwei356@gmail.com)

MIT License