Martin Asser Hansen edited this page Oct 2, 2015 · 8 revisions

Biopiece: read_bed

Description

The BED (Browser Extensible Data) format is a tabular format for data pertaining to one of the eukaryotic genomes in the UCSC Genome Browser. A BED entry consists of up to 12 columns, of which the first three are mandatory.

  1. CHR - the name of the chromosome.
  2. CHR_BEG - the chromosome begin position.
  3. CHR_END - the chromosome end position.
  4. Q_ID - the name of the feature.
  5. SCORE - a score between 0 and 1000.
  6. STRAND - the orientation of the feature.
  7. THICK_BEG - begin position of the 'thick' drawing (typically the coding region, with UTRs drawn thin).
  8. THICK_END - end position of the 'thick' drawing.
  9. ITEMRGB - RGB color code for feature.
  10. BLOCKCOUNT - number of exon blocks.
  11. BLOCKSIZES - list of block sizes.
  12. Q_BEGS - list of block begin positions, relative to CHR_BEG.
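
As a rough illustration of how the columns line up with the key names listed above, the sketch below splits one tab-separated BED line into a dictionary. It is not the read_bed implementation: the function name parse_bed_line and the choice to convert the single-number columns to integers are assumptions made for this example.

```python
# Key names as listed above, in BED column order.
BED_KEYS = [
    "CHR", "CHR_BEG", "CHR_END", "Q_ID", "SCORE", "STRAND",
    "THICK_BEG", "THICK_END", "ITEMRGB", "BLOCKCOUNT", "BLOCKSIZES", "Q_BEGS",
]

# Columns holding a single integer (assumption: all others stay strings).
INT_KEYS = ("CHR_BEG", "CHR_END", "SCORE", "THICK_BEG", "THICK_END", "BLOCKCOUNT")


def parse_bed_line(line):
    """Map the fields of one tab-separated BED line onto the key names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least 3 columns")
    # zip() stops at the shorter input, so columns beyond 12 are dropped.
    record = dict(zip(BED_KEYS, fields))
    for key in INT_KEYS:
        if key in record:
            record[key] = int(record[key])
    return record


if __name__ == "__main__":
    print(parse_bed_line("chr4\t31176\t31601\tAA695812\t0\t-"))
```

A line with only the three mandatory columns yields a record with only CHR, CHR_BEG and CHR_END.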

In addition, read_bed adds three helper columns to each record:

  1. REC_TYPE - the type of record, here BED.
  2. BED_LEN - the length of the entire feature.
  3. BED_COLS - the number of BED columns (for speed).
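
The helper columns can be pictured with the following sketch. It is only an assumption about how they relate to the BED columns: the function name add_helper_keys is hypothetical, and the BED_LEN formula (CHR_END - CHR_BEG + 1) is inferred from the example record on this page, where 31601 - 31176 + 1 = 426, rather than taken from the read_bed source.

```python
HELPER_KEYS = ("REC_TYPE", "BED_LEN", "BED_COLS")


def add_helper_keys(record):
    """Return a copy of a BED record dict with the three helper keys added."""
    # Count only the BED columns, not helper keys that may already be present.
    bed_cols = len([key for key in record if key not in HELPER_KEYS])
    out = dict(record)
    out["REC_TYPE"] = "BED"
    # Assumption: inclusive length, matching the example record on this page.
    out["BED_LEN"] = int(record["CHR_END"]) - int(record["CHR_BEG"]) + 1
    out["BED_COLS"] = bed_cols
    return out


if __name__ == "__main__":
    print(add_helper_keys({"CHR": "chr4", "CHR_BEG": 31176, "CHR_END": 31601}))
```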

A typical 12-column BED record looks like this:

STRAND: -
Q_ID: AA695812
CHR_END: 31601
THICK_END: 31601
SCORE: 0
CHR_BEG: 31176
BED_LEN: 426
REC_TYPE: BED
BLOCKCOUNT: 1
CHR: chr4
THICK_BEG: 31176
Q_BEGS: 0,
BLOCKSIZES: 426,
ITEMRGB: 0
BED_COLS: 12
---
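
The record above is written as KEY: value lines terminated by a --- line. A minimal sketch of reading such text back into dictionaries, assuming exactly this layout (the function name parse_records is hypothetical and not part of Biopieces, and all values are kept as strings):

```python
def parse_records(text):
    """Yield one dict per record from 'KEY: value' lines ended by '---'."""
    record = {}
    for line in text.splitlines():
        if line.strip() == "---":
            # A separator line closes the current record.
            if record:
                yield record
            record = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            record[key] = value


if __name__ == "__main__":
    sample = "CHR: chr4\nCHR_BEG: 31176\nCHR_END: 31601\n---\n"
    for rec in parse_records(sample):
        print(rec)
```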

For more about the BED format:

http://genome.ucsc.edu/FAQ/FAQformat#format1

Usage

read_bed [options] -i <BED file(s)>

Options

[-?          | --help]               #  Print full usage description.
[-i <files!> | --data_in=<files!>]   #  Comma separated list of files or glob expression to read.
[-c <uint>   | --cols=<uint>]        #  Number of columns to read.
[-n <uint>   | --num=<uint>]         #  Limit number of records to read.
[-C          | --check]              #  Check integrity of BED entries.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To read all BED entries from a file:

read_bed -i test.bed

To read in only 10 records from a BED file:

read_bed -n 10 -i test.bed

To read in only 3 columns from a BED file:

read_bed -c 3 -i test.bed
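
The effect of -n and -c can be sketched as follows: stop after a given number of lines and keep only the leading columns of each. This is an illustration only. read_limited is a hypothetical helper, and how read_bed treats comment or track lines is not covered here.

```python
import io
from itertools import islice


def read_limited(handle, num=None, cols=None):
    """Yield lists of fields: at most `num` lines, at most `cols` columns each."""
    # islice() with num=None reads every line; fields[:None] keeps every column.
    for line in islice(handle, num):
        fields = line.rstrip("\n").split("\t")
        yield fields[:cols]


if __name__ == "__main__":
    data = io.StringIO("chr1\t10\t20\tfeat1\nchr2\t30\t40\tfeat2\n")
    print(list(read_limited(data, num=1, cols=3)))
```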

To check the integrity of the BED entries, use the -C switch, which raises an error if a BED entry is malformed:

read_bed -C -i test.bed
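
The exact checks performed by -C are not listed on this page. As an assumption, the sketch below shows the kind of rules that follow from the column descriptions: integer coordinates with the end not before the begin, a score between 0 and 1000, a strand of + or -, and block lists that agree with BLOCKCOUNT. The function name check_bed_fields is hypothetical.

```python
def check_bed_fields(fields):
    """Raise ValueError if a list of BED fields breaks a basic rule."""
    if not 3 <= len(fields) <= 12:
        raise ValueError("expected 3 to 12 columns, got %d" % len(fields))
    # int() itself raises ValueError for non-integer coordinates.
    beg, end = int(fields[1]), int(fields[2])
    if beg < 0 or end < beg:
        raise ValueError("bad coordinates: %d..%d" % (beg, end))
    if len(fields) >= 5 and not 0 <= int(fields[4]) <= 1000:
        raise ValueError("score out of range: %s" % fields[4])
    if len(fields) >= 6 and fields[5] not in ("+", "-"):
        raise ValueError("bad strand: %s" % fields[5])
    if len(fields) == 12:
        count = int(fields[9])
        # Lists may carry a trailing comma, so drop empty items.
        sizes = [item for item in fields[10].split(",") if item]
        begs = [item for item in fields[11].split(",") if item]
        if len(sizes) != count or len(begs) != count:
            raise ValueError("block lists do not match BLOCKCOUNT")


if __name__ == "__main__":
    check_bed_fields(["chr4", "31176", "31602", "AA695812", "0", "-"])
    print("entry looks well formed")
```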

To read all BED entries from multiple files:

read_bed -i test1.bed,test2.bed

To read BED entries from multiple files using a glob expression (quoted so that the shell passes the pattern to read_bed unexpanded):

read_bed -i '*.bed'
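
One way to picture how a value given to -i could be turned into a list of files is sketched below: split on commas, then expand each item as a glob pattern. This is an assumption for illustration, not the Biopieces code, and expand_inputs is a hypothetical name.

```python
import glob


def expand_inputs(data_in):
    """Expand a comma-separated list of paths or glob patterns to file names."""
    files = []
    for pattern in data_in.split(","):
        matches = sorted(glob.glob(pattern))
        # Keep the literal name when nothing matches, so a later open() reports it.
        files.extend(matches or [pattern])
    return files


if __name__ == "__main__":
    print(expand_inputs("test1.bed,test2.bed"))
    print(expand_inputs("*.bed"))
```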

See also

write_bed

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

read_bed is part of the Biopieces framework.

http://www.biopieces.org
