prepare for v0.11.0
shenwei356 committed Sep 24, 2019
1 parent 9fbbf99 commit a2f4343
Showing 4 changed files with 179 additions and 18 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)

-Copyright © 2016 Wei Shen
+Copyright © 2016-2019 Wei Shen, 2019 Oxford Nanopore Technologies.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
13 changes: 10 additions & 3 deletions README.md
@@ -51,6 +51,7 @@ enable researchers to rapidly accomplish common FASTA/Q file manipulations.
- [Usage && Examples](#usage--examples)
- [Benchmark](#benchmark)
- [Citation](#citation)
- [Contributors](#contributors)
- [Acknowledgements](#acknowledgements)
- [Contact](#contact)
- [License](#license)
@@ -115,17 +116,17 @@ enable researchers to rapidly accomplish common FASTA/Q file manipulations.

## Subcommands

-28 functional subcommands in total.
+32 functional subcommands in total.

**Sequence and subsequence**

- [`seq`](https://bioinf.shenwei.me/seqkit/usage/#seq) transform sequences (reverse, complement, extract ID...)
- [`subseq`](https://bioinf.shenwei.me/seqkit/usage/#subseq) get subsequences by region/gtf/bed, including flanking sequences
- [`sliding`](https://bioinf.shenwei.me/seqkit/usage/#sliding) sliding sequences, circular genome supported
- [`stats`](https://bioinf.shenwei.me/seqkit/usage/#stats) simple statistics of FASTA/Q files
- [`watch`](https://bioinf.shenwei.me/seqkit/usage/#watch) monitoring and online histograms of sequence features
- [`sana`](https://bioinf.shenwei.me/seqkit/usage/#sana) sanitize broken single line fastq files
- [`faidx`](https://bioinf.shenwei.me/seqkit/usage/#faidx) create FASTA index file and extract subsequence
- [`watch`](https://bioinf.shenwei.me/seqkit/usage/#watch) monitoring and online histograms of sequence features
- [`sana`](https://bioinf.shenwei.me/seqkit/usage/#sana) sanitize broken single line fastq files

**Format conversion**

@@ -382,6 +383,12 @@ FASTQ:
**W Shen**, S Le, Y Li\*, F Hu\*. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
***PLOS ONE***. [doi:10.1371/journal.pone.0163962](https://doi.org/10.1371/journal.pone.0163962).

## Contributors

- [Wei Shen](https://github.com/shenwei356)
- [Botond Sipos](https://github.com/bsipos) for commands: bam, fish, sana, watch.
- [others](https://github.com/shenwei356/seqkit/graphs/contributors)

## Acknowledgements

We thank [Lei Zhang](https://github.com/jameslz) for testing of SeqKit,
38 changes: 25 additions & 13 deletions doc/docs/download.md
@@ -6,13 +6,19 @@ SeqKit is implemented in [Go](https://golang.org/) programming language,

## Latest Version

- [SeqKit v0.10.2](https://github.com/shenwei356/seqkit/releases/tag/v0.10.2)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.10.2/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.10.2)
- `seqkit`: fix bug of parsing sequence ID delimited by tab (`\t`). [#78](https://github.com/shenwei356/seqkit/issues/78)
- `seqkit grep`: better logic of `--delete-matched`.
- `seqkit common/rmdup/split`: use xxhash to replace MD5 when comparing with sequence, discard flag `-m/--md5`.
- `seqkit stats`: new flag `-b/--basename` for outputting basename instead of full path.

- [SeqKit v0.11.0](https://github.com/shenwei356/seqkit/releases/tag/v0.11.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.11.0/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.11.0)
    - `seqkit`: fix hanging when reading from a truncated gzip file.
    - new commands:
        - `seqkit amplicon`: retrieve amplicon (or specific region around it) via primer(s).
        - [new commands by @bsipos](https://github.com/shenwei356/seqkit/pull/81):
            - `seqkit watch`: monitoring and online histograms of sequence features.
            - `seqkit sana`: sanitize broken single line fastq files.
            - `seqkit fish`: look for short sequences in larger sequences using local alignment.
            - `seqkit bam`: monitoring and online histograms of BAM record features.
    - `seqkit grep/locate`: reduce memory usage when using flag `-m/--max-mismatch`.
    - `seqkit seq`: fix panic when computing the complement of long sequences that contain illegal letters while flag `-v` is not on. [#84](https://github.com/shenwei356/seqkit/issues/84)

### Please cite

- **W Shen**, S Le, Y Li\*, F Hu\*. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
@@ -28,12 +34,12 @@ SeqKit is implemented in [Go](https://golang.org/) programming language,

OS |Arch |File, 中国镜像 |Download Count
:------|:---------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Linux |32-bit |[seqkit_linux_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_linux_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_linux_386.tar.gz)
Linux |**64-bit**|[**seqkit_linux_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_linux_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_linux_amd64.tar.gz)
OS X |32-bit |[seqkit_darwin_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_darwin_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_darwin_386.tar.gz)
OS X |**64-bit**|[**seqkit_darwin_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_darwin_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_darwin_amd64.tar.gz)
Windows|32-bit |[seqkit_windows_386.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_windows_386.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_386.exe.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_386.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_windows_386.exe.tar.gz)
Windows|**64-bit**|[**seqkit_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_windows_amd64.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.10.2/seqkit_windows_amd64.exe.tar.gz)
Linux |32-bit |[seqkit_linux_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_linux_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_linux_386.tar.gz)
Linux |**64-bit**|[**seqkit_linux_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_linux_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_linux_amd64.tar.gz)
OS X |32-bit |[seqkit_darwin_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_darwin_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_darwin_386.tar.gz)
OS X |**64-bit**|[**seqkit_darwin_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_darwin_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_darwin_amd64.tar.gz)
Windows|32-bit |[seqkit_windows_386.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_windows_386.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_386.exe.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_386.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_windows_386.exe.tar.gz)
Windows|**64-bit**|[**seqkit_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_windows_amd64.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.11.0/seqkit_windows_amd64.exe.tar.gz)


## Installation
@@ -103,6 +109,12 @@ Howto:

## Release History

- [SeqKit v0.10.2](https://github.com/shenwei356/seqkit/releases/tag/v0.10.2)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.10.2/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.10.2)
- `seqkit`: fix bug of parsing sequence ID delimited by tab (`\t`). [#78](https://github.com/shenwei356/seqkit/issues/78)
- `seqkit grep`: better logic of `--delete-matched`.
- `seqkit common/rmdup/split`: use xxhash to replace MD5 when comparing with sequence, discard flag `-m/--md5`.
- `seqkit stats`: new flag `-b/--basename` for outputting basename instead of full path.
- [SeqKit v0.10.1](https://github.com/shenwei356/seqkit/releases/tag/v0.10.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.10.1/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.10.1)
- `seqkit fx2tab`: new option `-q/--avg-qual` for outputting average read quality. [#60](https://github.com/shenwei356/seqkit/issues/60)
144 changes: 143 additions & 1 deletion doc/docs/usage.md
@@ -14,6 +14,8 @@
- [sliding](#sliding)
- [stats](#stats)
- [faidx](#faidx)
- [watch](#watch)
- [sana](#sana)

**Format conversion**

@@ -26,8 +28,13 @@

- [grep](#grep)
- [locate](#locate)
- [fish](#fish)
- [amplicon](#amplicon)

**BAM processing and monitoring**

- [bam](#bam)

**Set operations**

- [head](#head)
@@ -165,7 +172,7 @@ reproduced in different environments with same random seed.
``` text
SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
-Version: 0.10.2
+Version: 0.11.0
Author: Wei Shen <shenwei356@gmail.com>
@@ -177,11 +184,14 @@ Usage:
seqkit [command]
Available Commands:
amplicon retrieve amplicon (or specific region around it) via primer(s)
bam monitoring and online histograms of BAM record features
common find common sequences of multiple files by id/name/sequence
concat concatenate sequences with same ID from multiple files
convert convert FASTQ quality encoding between Sanger, Solexa and Illumina
duplicate duplicate sequences N times
faidx create FASTA index file and extract subsequence
fish look for short sequences in larger sequences using local alignment
fq2fa convert FASTQ to FASTA
fx2tab convert FASTA/Q to tabular format (with length/GC content/GC skew)
genautocomplete generate shell autocompletion script
@@ -196,6 +206,7 @@ Available Commands:
restart reset start position for circular genome
rmdup remove duplicated sequences by id/name/sequence
sample sample sequences by number or proportion
sana sanitize broken single line fastq files
seq transform sequences (revserse, complement, extract ID...)
shuffle shuffle sequences
sliding sliding sequences, circular genome supported
@@ -207,6 +218,7 @@ Available Commands:
tab2fx convert tabular format to FASTA/Q format
translate translate DNA/RNA to protein sequence (supporting ambiguous bases)
version print version information and check for update
watch monitoring and online histograms of sequence features
Flags:
--alphabet-guess-seq-length int length of sequence prefix of the first FASTA record based on which seqkit guesses the sequence type (0 for whole seq) (default 10000)
@@ -780,6 +792,60 @@ Example
file format type num_seqs sum_len min_len avg_len max_len
- FASTA RNA 1,881 154,002 41 81.9 180

## watch

Usage

``` text
monitoring and online histograms of sequence features

Usage:
  seqkit watch [flags]

Flags:
  -B, --bins int                  number of histogram bins (default -1)
  -W, --delay int                 sleep this many seconds after online plotting (default 1)
  -y, --dump                      print histogram data to stderr instead of plotting
  -f, --fields string             target fields (default "ReadLen")
  -h, --help                      help for watch
  -O, --img string                save histogram to this PDF/image file
  -H, --list-fields               print out a list of available fields
  -L, --log                       log10(x+1) transform numeric values
  -x, --pass                      pass through mode (write input to stdout)
  -p, --print-freq int            print/report after this many records (-1 for print after EOF) (default -1)
  -b, --qual-ascii-base int       ASCII BASE, 33 for Phred+33 (default 33)
  -Q, --quiet-mode                supress all plotting to stderr
  -R, --reset                     reset histogram after every report
  -v, --validate-seq              validate bases according to the alphabet
  -V, --validate-seq-length int   length of sequence to validate (0 for whole seq) (default 10000)
```

Examples
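
A minimal sketch of how the flags above might be combined; it is not taken from the SeqKit documentation. It assumes `seqkit` is on `PATH` and that `watch` reads FASTQ from standard input like the other subcommands. The file name `reads.fq` and its two records are made up for illustration.

```sh
# Write a tiny two-record FASTQ file so the example is self-contained
# (reads.fq and its contents are hypothetical).
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n' > reads.fq

# Print the read-length histogram data to stderr instead of plotting it.
# -f/--fields, -B/--bins and -y/--dump are the flags listed in the usage text.
if command -v seqkit >/dev/null 2>&1; then
    seqkit watch -f ReadLen -B 10 -y < reads.fq
fi
```

Going by the flag descriptions, replacing `-y` with `-O` and a file name should save the histogram to a PDF/image file instead.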



## sana

Usage

``` text
sanitize broken single line fastq files

Usage:
  seqkit sana [flags]

Flags:
  -h, --help                  help for sana
  -b, --qual-ascii-base int   ASCII BASE, 33 for Phred+33 (default 33)
```

Examples
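
A hedged sketch of a typical call, not an example from the SeqKit documentation. It assumes `seqkit` is on `PATH`, that `sana` reads from standard input, and that it writes the records it keeps to standard output. `broken.fq` and `rescued.fq` are hypothetical file names; the second record is deliberately given a quality line shorter than its sequence.

```sh
# Build a single-line FASTQ file whose second record is malformed:
# its quality string (II) is shorter than its sequence (ACGT).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII\n' > broken.fq

# Sanitize the file; which records survive depends on seqkit's own rules.
if command -v seqkit >/dev/null 2>&1; then
    seqkit sana < broken.fq > rescued.fq
fi
```

If the quality scores are not Phred+33, the `-b/--qual-ascii-base` flag from the usage text is the place to say so.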


## fq2fa

Usage
@@ -1452,6 +1518,40 @@ Examples
seq ACGA ACGA + 1 4 ACGA
seq ACGA ACGA + 7 10 ACGA

## fish

Usage

``` text
look for short sequences in larger sequences using local alignment

Usage:
  seqkit fish [flags]

Flags:
  -a, --all                       search all
  -p, --aln-params string         alignment parameters in format "<match>,<mismatch>,<gap_open>,<gap_extend>" (default "4,-4,-2,-1")
  -h, --help                      help for fish
  -i, --invert                    print out references not matching with any query
  -q, --min-qual float            minimum mapping quality (default 5)
  -b, --out-bam string            save aligmnets to this BAM file (memory intensive)
  -x, --pass                      pass through mode (write input to stdout)
  -g, --print-aln                 print sequence alignments
  -D, --print-desc                print full sequence header
  -f, --query-fastx string        query fasta
  -F, --query-sequences string    query sequences
  -r, --ranges string             target ranges, for example: ":10,30:40,-20:"
  -s, --stranded                  search + strand only
  -v, --validate-seq              validate bases according to the alphabet
  -V, --validate-seq-length int   length of sequence to validate (0 for whole seq) (default 10000)
```

Examples
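
A small sketch of a search for one short query in a reference, offered as an illustration rather than as documented usage. It assumes `seqkit` is on `PATH` and that `fish` reads the reference sequences from standard input. The file `ref.fa`, the record name `ref`, and both sequences are invented for this example.

```sh
# Write a one-record FASTA reference that contains the query GACCAGTACGG.
printf '>ref\nTTTTACGTGACCAGTACGGATCCAAGTTTT\n' > ref.fa

# Search the reference for the query and print the alignments.
# -F/--query-sequences, -a/--all and -g/--print-aln come from the usage text.
if command -v seqkit >/dev/null 2>&1; then
    seqkit fish -a -g -F GACCAGTACGG < ref.fa
fi
```

Hits are filtered by `-q/--min-qual` (default 5), so lowering that value is the first thing to try if a short query returns nothing.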


## amplicon

Usage
@@ -1564,6 +1664,48 @@ Examples
$ echo -ne ">seq\nacgcccactgaaatga\n" \
| seqkit amplicon -F aaa -f -r 2:5 -s

## bam

Usage

``` text
monitoring and online histograms of BAM record features

Usage:
  seqkit bam [flags]

Flags:
  -B, --bins int             number of histogram bins (default -1)
  -c, --count string         count reads per reference and save to this file
  -W, --delay int            sleep this many seconds after plotting (default 1)
  -y, --dump                 print histogram data to stderr instead of plotting
  -e, --exec-after string    execute command after reporting
  -E, --exec-before string   execute command before reporting
  -f, --field string         target fields
  -h, --help                 help for bam
  -C, --idx-count            fast read per reference counting based on the BAM index
  -i, --idx-stat             fast statistics based on the BAM index
  -O, --img string           save histogram to this PDF/image file
  -H, --list-fields          list all available BAM record features
  -L, --log                  log10(x+1) transform numeric values
  -q, --map-qual int         minimum mapping quality
  -x, --pass                 passthrough mode (forward filtered BAM to output)
  -F, --prim-only            filter out non-primary alignment records
  -p, --print-freq int       print/report after this many records (-1 for print after EOF) (default -1)
  -Q, --quiet-mode           supress all plotting to stderr
  -M, --range-max float      discard record with field (-f) value greater than this flag (default NaN)
  -m, --range-min float      discard record with field (-f) value less than this flag (default NaN)
  -R, --reset                reset histogram after every report
  -s, --stat                 print BAM satistics of the input files
  -@, --top-bam string       save the top -? records to this bam file
  -?, --top-size int         size of the top-mode buffer (default 100)
```

Examples


## duplicate

Usage