read_tab
Tabular input can be read with read_tab which will read in chosen rows and chosen columns (separated by a given delimiter) from a table in ASCII text format.
If no --keys are given and there is a comment line beginning with #, the fields in that line will be used as keys.
read_tab [options] -i <table file(s)>
[-? | --help] # Print full usage description.
[-i <files!> | --data_in=<files!>] # Read tabular data from file.
[-d <string> | --delimit=<string>] # Changes delimiter - Default='\s+'
[-c <string> | --cols=<list>] # Comma separated list of cols to read in that order.
[-k <string> | --keys=<list>] # Comma separated list of keys to use for each column.
[-s <uint> | --skip=<uint>] # Skip number of initial records - Default=0.
[-n <uint> | --num=<uint>] # Limit number of records to read.
[-I <file!> | --stream_in=<file!>] # Read input stream from file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output stream to file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following table from the file test.tab:
Organism Sequence Count
Human ATACGTCAG 23524
Dog AGCATGAC 2442
Mouse GACTG 234
Cat AAATGCA 2342
Reading the entire table:
read_tab -i test.tab
The above command will result in 5 records, one for each row, where the keys V0, V1, V2 are the default keys for the columns:
V0: Organism
V2: Count
V1: Sequence
---
V0: Human
V2: 23524
V1: ATACGTCAG
---
V0: Dog
V2: 2442
V1: AGCATGAC
---
V0: Mouse
V2: 234
V1: GACTG
---
V0: Cat
V2: 2342
V1: AAATGCA
---
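The default behaviour shown above can be approximated in a few lines of Python. This is only an illustrative sketch, not the Biopieces implementation: the helper name `rows_to_records` is hypothetical, and it assumes that each line is split on the default delimiter pattern `\s+`, that blank lines are ignored, and that column N receives the key VN.

```python
import re

def rows_to_records(lines, delimit=r"\s+"):
    """Hypothetical helper: split each line on the delimiter and key the fields V0, V1, ..."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = re.split(delimit, line)
        records.append({f"V{i}": field for i, field in enumerate(fields)})
    return records

# The contents of test.tab from the example above.
table = [
    "Organism Sequence Count",
    "Human ATACGTCAG 23524",
    "Dog AGCATGAC 2442",
    "Mouse GACTG 234",
    "Cat AAATGCA 2342",
]

records = rows_to_records(table)
for record in records:
    for key, value in record.items():
        print(f"{key}: {value}")
    print("---")
```

The sketch yields five records, one per row, including the header row. The order of the keys within a printed record may differ from the listing above; only the key/value pairs matter.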
However, if the first line is a header or comment line, it can be skipped with the -s switch, which skips the specified number of initial lines before reading. So to get only the rows with data do:
read_tab -i test.tab -s 1
V0: Human
V2: 23524
V1: ATACGTCAG
---
V0: Dog
V2: 2442
V1: AGCATGAC
---
V0: Mouse
V2: 234
V1: GACTG
---
V0: Cat
V2: 2342
V1: AAATGCA
---
To explicitly name the columns (or the keys) use the -k switch:
read_tab -i test.tab -s 1 -k ORGANISM,SEQ,COUNT
SEQ: ATACGTCAG
ORGANISM: Human
COUNT: 23524
---
SEQ: AGCATGAC
ORGANISM: Dog
COUNT: 2442
---
SEQ: GACTG
ORGANISM: Mouse
COUNT: 234
---
SEQ: AAATGCA
ORGANISM: Cat
COUNT: 2342
---
It is possible to select a subset of columns to read by using the -c switch, which takes a comma-separated list of column numbers (the first column is designated 0) as argument. So to read in only the sequence and the count, with the count placed before the sequence, do:
read_tab -i test.tab -s 1 -c 2,1
V0: 23524
V1: ATACGTCAG
---
V0: 2442
V1: AGCATGAC
---
V0: 234
V1: GACTG
---
V0: 2342
V1: AAATGCA
---
It is also possible to rename the columns with the -k switch:
read_tab -i test.tab -s 1 -c 2,1 -k COUNT,SEQ
SEQ: ATACGTCAG
COUNT: 23524
---
SEQ: AGCATGAC
COUNT: 2442
---
SEQ: GACTG
COUNT: 234
---
SEQ: AAATGCA
COUNT: 2342
---
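How the -s, -c and -k switches combine can be sketched in Python as follows. This is a rough model under stated assumptions, not the actual read_tab code: the function `read_rows` and its parameter names are hypothetical, lines are assumed to be split on `\s+`, and keys (default or explicit) are assumed to be assigned after column selection, which is consistent with the V0/V1 output of the -c 2,1 example.

```python
import re

def read_rows(lines, skip=0, cols=None, keys=None, delimit=r"\s+"):
    """Hypothetical model of -s (skip), -c (cols) and -k (keys)."""
    records = []
    for line in lines[skip:]:                      # -s: drop the first `skip` lines
        fields = re.split(delimit, line.strip())
        if cols is not None:                       # -c: pick columns in the given order
            fields = [fields[c] for c in cols]
        if keys is not None:                       # -k: explicit key per selected column
            names = keys
        else:                                      # default keys V0, V1, ...
            names = [f"V{i}" for i in range(len(fields))]
        records.append(dict(zip(names, fields)))
    return records

table = [
    "Organism Sequence Count",
    "Human ATACGTCAG 23524",
    "Dog AGCATGAC 2442",
    "Mouse GACTG 234",
    "Cat AAATGCA 2342",
]

# Roughly: read_tab -i test.tab -s 1 -c 2,1
default_keys = read_rows(table, skip=1, cols=[2, 1])

# Roughly: read_tab -i test.tab -s 1 -c 2,1 -k COUNT,SEQ
named_keys = read_rows(table, skip=1, cols=[2, 1], keys=["COUNT", "SEQ"])

print(default_keys[0])
print(named_keys[0])
```

Because the keys are paired with the columns after selection, the list given to -k follows the order of the -c list, not the order of the columns in the file.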
Last, if we change the first line in the test.tab file to include a # like this:
#Organism Sequence Count
Human ATACGTCAG 23524
Dog AGCATGAC 2442
Mouse GACTG 234
Cat AAATGCA 2342
...then the fields in this line will be used as keys:
read_tab -i test.tab
Organism: Human
Count: 23524
Sequence: ATACGTCAG
---
Organism: Dog
Count: 2442
Sequence: AGCATGAC
---
Organism: Mouse
Count: 234
Sequence: GACTG
---
Organism: Cat
Count: 2342
Sequence: AAATGCA
---
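The header-as-keys rule can be modelled with a small Python sketch. It is an illustration only: `read_with_header` is a hypothetical name, and the sketch assumes that the first line beginning with # supplies the keys when none are given explicitly, that any comment line is excluded from the records, and that fields are split on `\s+`.

```python
import re

def read_with_header(lines, keys=None, delimit=r"\s+"):
    """Hypothetical model: take keys from a leading '#' comment line if none are given."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            if keys is None:
                # Assumption: the fields of the first comment line become the keys.
                keys = re.split(delimit, line.lstrip("#").strip())
            continue  # assumption: comment lines never become records
        fields = re.split(delimit, line)
        names = keys if keys is not None else [f"V{i}" for i in range(len(fields))]
        records.append(dict(zip(names, fields)))
    return records

table = [
    "#Organism Sequence Count",
    "Human ATACGTCAG 23524",
    "Dog AGCATGAC 2442",
    "Mouse GACTG 234",
    "Cat AAATGCA 2342",
]

records = read_with_header(table)
for record in records:
    print(record)
```

With the commented header the sketch produces four records keyed by Organism, Sequence and Count, so no -s or -k switch is needed.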
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
read_tab is part of the Biopieces framework.