Martin Asser Hansen edited this page Oct 2, 2015 · 9 revisions

Biopiece: read_kiss

Description

read_kiss reads records in KISS format from one or more files.

The KISS format (Keep It Simple, Stupid) is a text-based format for describing generic feature information, with one feature per line in 12 tab-separated columns:

  1. S_ID: Subject ID - e.g. chr12.
  2. S_BEG: Begin position of a feature relating to the subject sequence. 0-based.
  3. S_END: End position of a feature relating to the subject sequence.
  4. Q_ID: Query ID - e.g. a Solexa read ID such as a3_2VCOjxwXsN1.
  5. SCORE: A float that can describe e.g. a BLAT score.
  6. STRAND: Denotes which strand a feature relates to. + or -.
  7. HITS: Number of times a feature is found in the subject sequence.
  8. ALIGN: Comma-separated list of alignment descriptors for mismatches, insertions, and deletions *).
  9. BLOCK_COUNT: Number of blocks in a feature (i.e. exons).
  10. BLOCK_BEGS: Comma-separated list of block begin positions. Offset is S_BEG.
  11. BLOCK_LENS: Comma-separated list of block lengths.
  12. BLOCK_TYPE: Comma-separated list of block types (0=Gap,1=Non-gap,2=CDS,3=5'UTR,4=3'UTR).
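
The block fields (9-12) can be illustrated with a short Python sketch. It assumes that "Offset is S_BEG" means each BLOCK_BEGS value is relative to S_BEG, so the absolute begin position is S_BEG plus the offset. The function name expand_blocks and the sample values are hypothetical and not part of Biopieces.

```python
# Block type codes as listed for the BLOCK_TYPE column.
BLOCK_TYPES = {0: "Gap", 1: "Non-gap", 2: "CDS", 3: "5'UTR", 4: "3'UTR"}

def expand_blocks(s_beg, block_begs, block_lens, block_type):
    """Return (absolute_begin, length, type_name) for each block.

    Assumption: BLOCK_BEGS values are offsets relative to S_BEG.
    """
    begs = [int(value) for value in block_begs.split(",")]
    lens = [int(value) for value in block_lens.split(",")]
    types = [int(value) for value in block_type.split(",")]
    if not (len(begs) == len(lens) == len(types)):
        raise ValueError("block lists must have the same number of items")
    return [(s_beg + beg, length, BLOCK_TYPES[code])
            for beg, length, code in zip(begs, lens, types)]

# Hypothetical feature starting at position 1000 with two CDS blocks.
blocks = expand_blocks(1000, "0,50", "20,30", "2,2")
```

In a complete record, the number of items in each list would be expected to equal BLOCK_COUNT.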

Values in fields 4-12 are optional; an empty field must contain a '.'.
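
As a rough illustration of the column layout and the '.' placeholder, the following Python sketch splits one KISS line into named fields. It is not the read_kiss implementation; the function name parse_kiss_line and the sample line (including its positions 100 and 119) are made up for this example.

```python
# Column names in the order given in the field list.
KISS_FIELDS = [
    "S_ID", "S_BEG", "S_END", "Q_ID", "SCORE", "STRAND", "HITS",
    "ALIGN", "BLOCK_COUNT", "BLOCK_BEGS", "BLOCK_LENS", "BLOCK_TYPE",
]

def parse_kiss_line(line):
    """Split one KISS line into a dict; a '.' placeholder becomes None."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(KISS_FIELDS):
        raise ValueError(f"expected 12 columns, got {len(columns)}")
    record = {name: (None if value == "." else value)
              for name, value in zip(KISS_FIELDS, columns)}
    # Only fields 4-12 are optional, so the first three must be present.
    for name in ("S_ID", "S_BEG", "S_END"):
        if record[name] is None:
            raise ValueError(f"{name} is mandatory")
    record["S_BEG"] = int(record["S_BEG"])
    record["S_END"] = int(record["S_END"])
    return record

# Hypothetical line: fields 5, 7 and 9-12 are left empty with '.'.
example = "chr12\t100\t119\ta3_2VCOjxwXsN1\t.\t+\t.\t0:C>T,13:G>C\t.\t.\t.\t."
record = parse_kiss_line(example)
```

The remaining optional fields are left as strings here; a fuller parser would convert SCORE, HITS and the block lists as needed.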

*) Alignment descriptors:

  • mismatch: (offset:S-base>Q-base) - e.g. 0:C>T,13:G>C
  • insertion: (offset:->Q-base) - e.g. 8:->G,18:->A
  • deletion: (offset:S-base>-) - e.g. 5:A>-,16:T>-

The offset position is relative to S_BEG and does not change with insertions or deletions. Alignment descriptors are based on the + strand.

Descriptors should be sorted by offset position.
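
The descriptor syntax can be sketched in Python as follows. This is only an illustration under the assumption that each base is a single letter; the names parse_align and is_sorted_by_offset are hypothetical and do not come from Biopieces.

```python
import re

# offset ':' S-base '>' Q-base, where '-' marks the missing base.
DESCRIPTOR = re.compile(r"^(\d+):([A-Za-z-])>([A-Za-z-])$")

def parse_align(align):
    """Return a list of (offset, kind, s_base, q_base) from an ALIGN field."""
    if align == ".":
        return []  # empty field
    parsed = []
    for item in align.split(","):
        match = DESCRIPTOR.match(item)
        if match is None:
            raise ValueError(f"malformed descriptor: {item!r}")
        offset = int(match.group(1))
        s_base, q_base = match.group(2), match.group(3)
        if s_base == "-" and q_base == "-":
            raise ValueError(f"malformed descriptor: {item!r}")
        if s_base == "-":
            kind = "insertion"   # base present in query only
        elif q_base == "-":
            kind = "deletion"    # base present in subject only
        else:
            kind = "mismatch"
        parsed.append((offset, kind, s_base, q_base))
    return parsed

def is_sorted_by_offset(descriptors):
    """Check that descriptors appear in non-decreasing offset order."""
    offsets = [descriptor[0] for descriptor in descriptors]
    return offsets == sorted(offsets)

mismatches = parse_align("0:C>T,13:G>C")
```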

Read more about the KISS format here:

http://code.google.com/p/biopieces/wiki/KissFormat

Usage

read_kiss [options] -i <KISS file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma separated list of files or glob expression to read.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To read all KISS entries from a file:

read_kiss -i test.kiss

To read in only 10 records from a KISS file:

read_kiss -n 10 -i test.kiss

To read all KISS entries from multiple files:

read_kiss -i test1.kiss,test2.kiss

To read KISS entries from multiple files using a glob expression:

read_kiss -i '*.kiss'
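
Quoting the glob expression keeps the shell from expanding it, so the pattern reaches read_kiss unchanged. How read_kiss resolves the argument internally is not described here; the Python sketch below only illustrates one way a comma-separated list of paths and glob patterns could be expanded. The function name expand_data_in is hypothetical.

```python
import glob
import os
import tempfile

def expand_data_in(argument):
    """Expand a comma-separated list of paths and glob patterns into file paths."""
    files = []
    for pattern in argument.split(","):
        # glob.glob returns an empty list when nothing matches the pattern.
        files.extend(sorted(glob.glob(pattern)))
    return files

# Demonstration in a throwaway directory holding two empty, hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test1.kiss", "test2.kiss"):
        open(os.path.join(tmp, name), "w").close()
    found = expand_data_in(os.path.join(tmp, "*.kiss"))
    names = [os.path.basename(path) for path in found]
```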

See also

write_kiss

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

October 2009

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

read_kiss is part of the Biopieces framework.

http://www.biopieces.org
