A lightweight and high-performance (see seqkit benchmark) bioinformatics package.
This package has high performance close to the famous C lib
To test the performance, three datasets are used:
- dataset_A, bacteria genomes, 2.7G
- dataset_B, human genome, 2.9G
- dataset_C, Illumina reads, 2.2G
file          seq_format  seq_type  num_seqs   min_len  avg_len       max_len
dataset_A.fa  FASTA       DNA       67,748     56       41,442.5      5,976,145
dataset_B.fa  FASTA       DNA       194        970      15,978,096.5  248,956,422
dataset_C.fq  FASTQ       DNA       9,186,045  100      100           100
using this package) were used to test.
seqtk does not support wrapped (fixed line width) output, so
-w 0 was used to disable output wrapping.
memusg is used to assess running time
and peak memory usage.
Tests were repeated 5 times and average time and memory usage were computed.
This package is "go-gettable"; just run:
go get -u github.com/shenwei356/bio
See the README of each sub-package.
Copyright (c) 2013-2016, Wei Shen (email@example.com)