blat_seq

Biopiece: blat_seq

Description

blat_seq uses the UCSC Genome Browser's BLAT to search all sequences in the stream for matches in a specified database. The sequence type of the query sequences is guessed automatically.

Resulting records look like this:

S_BEGS: 83755,
Q_ID: 5_gECOjxwXsN1/1
S_LEN: 159109
Q_LEN: 35
REPMATCHES: 0
MATCHES: 34
S_ID: M1_c17
NCOUNT: 1
SPAN: 34
Q_END: 34
STRAND: -
SCORE: 34
BLOCK_LENS: 35,
REC_TYPE: PSL
QNUMINSERT: 0
Q_BEG: 0
S_BEG: 83755
MISMATCHES: 0
SBASEINSERT: 0
Q_BEGS: 0,
SNUMINSERT: 0
BLOCK_COUNT: 1
QBASEINSERT: 0
S_END: 83789
---
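
Because each record is a block of KEY: VALUE lines closed by a line of three dashes, the stream can be summarized with standard tools. The following is a minimal sketch, not part of Biopieces: the function name records_to_tsv, the choice of columns, and the final column (MATCHES as a percentage of Q_LEN, which is not BLAT's own identity measure) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not a Biopiece): read records from stdin and print
# one tab-separated line per record:
#   Q_ID  S_ID  STRAND  S_BEG  S_END  MATCHES-as-percent-of-Q_LEN
records_to_tsv() {
    LC_ALL=C awk '
        # A line of three dashes closes the current record.
        /^---$/ {
            if (seen) {
                qlen = rec["Q_LEN"] + 0        # force numeric context
                pct  = (qlen > 0) ? 100 * rec["MATCHES"] / qlen : 0
                printf "%s\t%s\t%s\t%s\t%s\t%.1f\n",
                       rec["Q_ID"], rec["S_ID"], rec["STRAND"],
                       rec["S_BEG"], rec["S_END"], pct
            }
            split("", rec)                     # portable way to clear the array
            seen = 0
            next
        }
        # Any other line is "KEY: VALUE"; split at the first ": ".
        {
            sep = index($0, ": ")
            if (sep > 0) {
                rec[substr($0, 1, sep - 1)] = substr($0, sep + 2)
                seen = 1
            }
        }
    '
}

# Demonstration on a subset of the fields from the record shown above.
records_to_tsv <<'EOF'
Q_ID: 5_gECOjxwXsN1/1
S_ID: M1_c17
STRAND: -
S_BEG: 83755
S_END: 83789
Q_LEN: 35
MATCHES: 34
---
EOF
```

With the function defined, the output of blat_seq could be piped straight into it, for example: read_fasta -i query_sequences.fna | blat_seq -d subject_sequences.fna | records_to_tsv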

BLAT must be installed in order for blat_seq to work.
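
A quick way to confirm this requirement before running a pipeline is to look for the executable on PATH. This is only a sketch: it assumes the BLAT executable is installed under the name blat, and the helper name have_tool is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named executable is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# "blat" is assumed to be the name of the installed BLAT executable.
if have_tool blat; then
    echo "BLAT found: $(command -v blat)"
else
    echo "BLAT not found on PATH; blat_seq will not be able to run" >&2
fi
```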

Usage

... | blat_seq [options] -d <database> | -g <genome>

Options

[-?          | --help]                 #  Print full usage description.
[-d <file>   | --database=<file>]      #  BLAT against FASTA file.
[-g <genome> | --genome=<genome>]      #  BLAT against genome.
[-f          | --fast_map]             #  Fast DNA/DNA mapping with high %ID and without introns.
[-c          | --ooc]                  #  Use overused tile file (faster, but less sensitive).
[-i <uint>   | --intron_max=<uint>]    #  Maximum intron size                       -  Default=750000
[-t <uint>   | --tile_size=<uint>]     #  Size of match that triggers an alignment  -  Default=11
[-s <uint>   | --step_size=<uint>]     #  Spacing between tiles                     -  Default=11
[-m <uint>   | --min_identity=<uint>]  #  Minimum sequence identity in percent      -  Default=90
[-M <uint>   | --min_score=<uint>]     #  Minimum score                             -  Default=0
[-N          | --allow_N_blocks]       #  Allow alignment extension through N blocks.
[-o <uint>   | --one_off=<uint>]       #  Allows one mismatch in tile               -  Default=0
[-I <file!>  | --stream_in=<file!>]    #  Read input from stream file               -  Default=STDIN
[-O <file>   | --stream_out=<file>]    #  Write output to stream file               -  Default=STDOUT
[-v          | --verbose]              #  Verbose output.

Examples

To BLAT sequences against a FASTA file do:

read_fasta -i query_sequences.fna | blat_seq -d subject_sequences.fna

To BLAT sequences against a genome previously formatted with format_genome, do:

read_fasta -i query_sequences.fna | blat_seq -g <genome>

Use write_psl to output data in BLAT's native format.

To list available genomes, use list_genomes.

Author

mail@maasha.dk

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

blat_seq is part of the Biopieces framework.

http://www.biopieces.org
