read_kiss
Read KISS entries from one or more files.
The KISS format (Keep It Simple Stupid) is a text-based data format for describing generic feature information, with one feature per line in 12 tab-separated columns:
- S_ID: Subject ID - e.g. chr12.
- S_BEG: Begin position of a feature relating to the subject sequence. 0-based.
- S_END: End position of a feature relating to the subject sequence.
- Q_ID: Query ID - e.g. a Solexa read ID such as a3_2VCOjxwXsN1.
- SCORE: A float that can describe e.g. a BLAT score.
- STRAND: Denotes which strand a feature relates to. + or -.
- HITS: Number of times a feature is found in the subject sequence.
- ALIGN: Comma-separated list of alignment descriptors for mismatches, insertions, and deletions (see *).
- BLOCK_COUNT: Number of blocks in a feature (e.g. exons).
- BLOCK_BEGS: Comma-separated list of block begin positions. Offset is S_BEG.
- BLOCK_LENS: Comma-separated list of block lengths.
- BLOCK_TYPE: Comma-separated list of block types (0=Gap,1=Non-gap,2=CDS,3=5'UTR,4=3'UTR).
Values in fields 4-12 are optional and empty fields must contain a '.'.
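As an illustration of the column layout, here is a minimal Python sketch that splits one KISS line into named fields, turns '.' placeholders into None, and converts the block offsets into absolute subject positions (BLOCK_BEGS are offsets from S_BEG). The function names parse_kiss_line and block_coords and the sample line are hypothetical; this is not the Biopieces implementation, and it assumes blocks and lengths are plain integer lists.

```python
from __future__ import annotations

# Column names in the order listed in the KISS format description above.
KISS_FIELDS = [
    "S_ID", "S_BEG", "S_END", "Q_ID", "SCORE", "STRAND",
    "HITS", "ALIGN", "BLOCK_COUNT", "BLOCK_BEGS", "BLOCK_LENS", "BLOCK_TYPE",
]


def parse_kiss_line(line: str) -> dict:
    """Hypothetical helper: split one KISS line into a dict; '.' becomes None."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(KISS_FIELDS):
        raise ValueError(f"expected {len(KISS_FIELDS)} columns, got {len(cols)}")
    rec = {name: (None if val == "." else val) for name, val in zip(KISS_FIELDS, cols)}
    # Only fields 4-12 may be '.', so S_ID, S_BEG and S_END must be present.
    for name in KISS_FIELDS[:3]:
        if rec[name] is None:
            raise ValueError(f"{name} is mandatory")
    rec["S_BEG"] = int(rec["S_BEG"])  # 0-based
    rec["S_END"] = int(rec["S_END"])
    # Convert the optional numeric fields only when they are present.
    if rec["SCORE"] is not None:
        rec["SCORE"] = float(rec["SCORE"])
    for name in ("HITS", "BLOCK_COUNT"):
        if rec[name] is not None:
            rec[name] = int(rec[name])
    return rec


def block_coords(rec: dict) -> list[tuple[int, int]]:
    """Hypothetical helper: (absolute_begin, length) per block, using S_BEG as offset."""
    if rec["BLOCK_BEGS"] is None or rec["BLOCK_LENS"] is None:
        return []
    begs = [int(x) for x in rec["BLOCK_BEGS"].split(",") if x]
    lens = [int(x) for x in rec["BLOCK_LENS"].split(",") if x]
    return [(rec["S_BEG"] + beg, length) for beg, length in zip(begs, lens)]


# Made-up example line with two blocks.
line = "chr12\t100\t199\tread1\t42.5\t+\t1\t0:C>T\t2\t0,60\t40,40\t1,1"
rec = parse_kiss_line(line)
print(rec["S_ID"], rec["SCORE"], block_coords(rec))  # chr12 42.5 [(100, 40), (160, 40)]
```

The optional fields are left as strings (or None) unless a numeric type is obvious from the description, so a caller can decide how strictly to validate STRAND or BLOCK_TYPE.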
(*) Alignment descriptors:
- mismatch: (offset:S-base>Q-base) - e.g. 0:C>T,13:G>C
- insertion: (offset:->Q-base) - e.g. 8:->G,18:->A
- deletion: (offset:S-base>-) - e.g. 5:A>-,16:T>-
Offset positions are relative to S_BEG and do not change with insertions or deletions. Alignment descriptors are based on the + strand.
Descriptors should be sorted by offset position.
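The descriptor grammar above is regular enough to parse with plain string splitting. The following Python sketch (parse_align and AlignOp are hypothetical names, not part of Biopieces) classifies each descriptor as a mismatch, insertion, or deletion from where the '-' appears, and checks that offsets are in non-decreasing order. It assumes repeated offsets are allowed (e.g. several insertions at one position), and, as stated above, the offsets and bases refer to the + strand even for features on the - strand.

```python
from typing import List, NamedTuple, Optional


class AlignOp(NamedTuple):
    offset: int  # position relative to S_BEG, + strand
    s_base: str  # subject base, '-' for an insertion
    q_base: str  # query base, '-' for a deletion
    kind: str    # 'mismatch', 'insertion' or 'deletion'


def parse_align(field: Optional[str]) -> List[AlignOp]:
    """Hypothetical helper: parse an ALIGN field such as '0:C>T,8:->G,5:A>-'."""
    if field is None or field == ".":
        return []
    ops = []
    for desc in field.split(","):
        offset_str, change = desc.split(":", 1)
        s_base, q_base = change.split(">", 1)
        if s_base == "-":
            kind = "insertion"  # base present in the query only
        elif q_base == "-":
            kind = "deletion"   # base present in the subject only
        else:
            kind = "mismatch"
        ops.append(AlignOp(int(offset_str), s_base, q_base, kind))
    # Descriptors should be sorted by offset (ties assumed to be allowed).
    offsets = [op.offset for op in ops]
    if offsets != sorted(offsets):
        raise ValueError("alignment descriptors are not sorted by offset")
    return ops


for op in parse_align("0:C>T,5:A>-,8:->G,13:G>C"):
    print(op.offset, op.kind, op.s_base, op.q_base)
```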
Read more about the KISS format here:
http://code.google.com/p/biopieces/wiki/KissFormat
read_kiss [options] -i <KISS file(s)>
[-? | --help] # Print full usage description.
[-i <files!> | --data_in=<files!>] # Comma separated list of files or glob expression to read.
[-n <uint> | --num=<uint>] # Limit number of records to read.
[-I <file!> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
To read all KISS entries from a file:
read_kiss -i test.kiss
To read in only 10 records from a KISS file:
read_kiss -n 10 -i test.kiss
To read all KISS entries from multiple files:
read_kiss -i test1.kiss,test2.kiss
To read KISS entries from multiple files using a glob expression:
read_kiss -i '*.kiss'
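The -i and -n options accept either a comma-separated list of files or a glob expression, plus an optional record limit. The Python sketch below mimics that behavior under stated assumptions: each comma-separated part is expanded as a glob (falling back to the literal name when nothing matches), files are read in that order, blank lines are skipped, and reading stops after num lines. The helpers expand_inputs and read_lines are hypothetical and are not how read_kiss itself is implemented.

```python
import glob
import itertools
from typing import Iterator, List, Optional


def expand_inputs(spec: str) -> List[str]:
    """Hypothetical: split an -i style argument on commas and expand each glob."""
    paths: List[str] = []
    for part in spec.split(","):
        matches = sorted(glob.glob(part))
        # Keep the literal name if nothing matched so open() reports the error.
        paths.extend(matches if matches else [part])
    return paths


def read_lines(spec: str, num: Optional[int] = None) -> Iterator[str]:
    """Hypothetical: yield non-blank KISS lines, stopping after num lines (None = all)."""
    def all_lines() -> Iterator[str]:
        for path in expand_inputs(spec):
            with open(path) as fh:
                for line in fh:
                    if line.strip():
                        yield line.rstrip("\n")
    return itertools.islice(all_lines(), num)
```

For example, list(read_lines("*.kiss", num=10)) loosely corresponds to read_kiss -n 10 -i '*.kiss', with each returned line ready to pass to a per-line parser.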
Martin Asser Hansen - Copyright (C) - All rights reserved.
October 2009
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
read_kiss is part of the Biopieces framework.