Martin Asser Hansen edited this page Oct 1, 2015 · 5 revisions

Biopiece: uniq_vals

Description

uniq_vals selects records from the stream by checking the value of a given key. When several records share the same value for that key, only the first of them is output. uniq_vals therefore does not single out records whose value for the key occurs only once (see count_vals for that). If the -i switch is used, the selection is inverted and the records that repeat an already seen value are output instead.

Usage

... | uniq_vals [options]

Options

[-?          | --help]               #  Print full usage description.
[-k <string> | --key=<string>]       #  Key for which the value is checked for uniqueness.
[-i          | --invert]             #  Display non-unique records.
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following two-column table in the file test.tab:

Human   H1
Human   H2
Human   H3
Dog     D1
Dog     D2
Mouse   M1

To locate all unique values of the first column, we use read_tab and pipe the result to uniq_vals:

read_tab -i test.tab | uniq_vals -k V0

V0: Human
V1: H1
---
V0: Dog
V1: D1
---
V0: Mouse
V1: M1
---

The result is three records, one for each distinct value of V0. For each value, it is the first record in the stream that is kept.

If we instead want the non-unique records, that is, the records that repeat a V0 value seen earlier in the stream, we use the -i switch with uniq_vals:

read_tab -i test.tab | uniq_vals -k V0 -i

V0: Human
V1: H2
---
V0: Human
V1: H3
---
V0: Dog
V1: D2
---

The result shows the records whose V0 value duplicates that of an earlier record.
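
The selection rule behind the two examples above can be sketched in a few lines of Python. This is only an illustrative model, not the Biopieces implementation: records are represented as plain dicts, the function name and the `invert` parameter are hypothetical stand-ins for the tool and its -i switch, and every record is assumed to contain the key.

```python
# Illustrative model only: plain dicts stand in for Biopieces records.
def uniq_vals(records, key, invert=False):
    """Return the first record per value of `key`, or the repeats if invert is True."""
    seen = set()
    selected = []
    for record in records:
        value = record[key]          # assumption: every record has the key
        is_repeat = value in seen
        seen.add(value)
        if is_repeat == invert:      # keep first occurrences, or repeats when inverted
            selected.append(record)
    return selected

# The contents of test.tab from the example above.
rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1")]
records = [{"V0": v0, "V1": v1} for v0, v1 in rows]

first_seen = uniq_vals(records, "V0")
repeats = uniq_vals(records, "V0", invert=True)

print([r["V1"] for r in first_seen])  # ['H1', 'D1', 'M1']
print([r["V1"] for r in repeats])     # ['H2', 'H3', 'D2']
```

Together the two modes split the stream in two: every record ends up either among the first occurrences or among the repeats, which is why the Mouse record appears only in the first list.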

So how do we get the one record whose V0 value is not duplicated (Mouse)? That is in fact not a job for uniq_vals, but rather for count_vals and grab:

read_tab -i test.tab | count_vals -k V0 | grab -e 'V0_COUNT=1'

V0: Mouse
V1: M1
V0_COUNT: 1
---
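
The counting approach can be modeled the same way. The sketch below is an assumption-laden illustration rather than the actual count_vals or grab code: `count_vals` is a hypothetical Python function that adds a `V0_COUNT` field to each record, and a list comprehension stands in for the grab filter without modeling its expression syntax.

```python
from collections import Counter

# Illustrative model only: plain dicts stand in for Biopieces records.
def count_vals(records, key):
    """Return copies of the records with a <key>_COUNT field added."""
    counts = Counter(record[key] for record in records)
    return [{**record, key + "_COUNT": counts[record[key]]} for record in records]

# The contents of test.tab from the example above.
rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1")]
records = [{"V0": v0, "V1": v1} for v0, v1 in rows]

counted = count_vals(records, "V0")

# Stand-in for the grab step: keep records whose count is exactly 1.
singletons = [r for r in counted if r["V0_COUNT"] == 1]
print(singletons)  # [{'V0': 'Mouse', 'V1': 'M1', 'V0_COUNT': 1}]
```

Counting needs to see the whole stream before any record can be labeled, whereas the first-occurrence rule of uniq_vals can decide on each record as it arrives. That difference is why the two tasks are handled by different tools.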

However, by combining count_vals and uniq_vals we can list how many times each value of the first column occurs. In this table every V0 value has a different count, so selecting the unique values of V0_COUNT yields one record per V0 value:

read_tab -i test.tab | count_vals -k V0 | uniq_vals -k V0_COUNT

V0: Human
V1: H1
V0_COUNT: 3
---
V0: Dog
V1: D1
V0_COUNT: 2
---
V0: Mouse
V1: M1
V0_COUNT: 1
---
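
One consequence of the first-occurrence rule is worth spelling out for this pipeline: because it keys on V0_COUNT, two V0 values with the same count would collapse into a single record. The Python sketch below models that behavior under the rule as described on this page. The functions are hypothetical stand-ins for the tools, and the two Cat rows are invented purely to create a tie.

```python
from collections import Counter

# Illustrative models only: plain dicts stand in for Biopieces records.
def count_vals(records, key):
    """Return copies of the records with a <key>_COUNT field added."""
    counts = Counter(record[key] for record in records)
    return [{**record, key + "_COUNT": counts[record[key]]} for record in records]

def uniq_vals(records, key):
    """Return the first record seen for each value of `key`."""
    seen = set()
    selected = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            selected.append(record)
    return selected

rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1")]
records = [{"V0": v0, "V1": v1} for v0, v1 in rows]

by_count = uniq_vals(count_vals(records, "V0"), "V0_COUNT")
print([(r["V0"], r["V0_COUNT"]) for r in by_count])
# [('Human', 3), ('Dog', 2), ('Mouse', 1)]

# Hypothetical extra rows: Cat occurs twice, the same count as Dog.
extra = records + [{"V0": "Cat", "V1": "C1"}, {"V0": "Cat", "V1": "C2"}]
counted = count_vals(extra, "V0")
tied_by_count = [r["V0"] for r in uniq_vals(counted, "V0_COUNT")]
tied_by_value = [r["V0"] for r in uniq_vals(counted, "V0")]
print(tied_by_count)  # ['Human', 'Dog', 'Mouse']
print(tied_by_value)  # ['Human', 'Dog', 'Mouse', 'Cat']
```

If one record per V0 value is wanted regardless of ties, keying the final step on V0 rather than V0_COUNT gives that in this model.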

See also

read_tab

count_vals

grab

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

uniq_vals is part of the Biopieces framework.

http://www.biopieces.org
