uniq_vals
uniq_vals selects records from the stream by checking the value of a given key. When several records share the same value
for that key, only the first of them is output. Note that uniq_vals does not single out records whose value for the
specified key occurs only once in the stream (see count_vals for that). If the -i switch is used, the selection is
inverted and the non-unique records are output instead, that is, every record after the first for each value.
... | uniq_vals [options]
[-? | --help] # Print full usage description.
[-k <string> | --key=<string>] # Key for which the value is checked for uniqueness.
[-i | --invert] # Display non-unique records.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following two-column table in the file test.tab:
Human H1
Human H2
Human H3
Dog D1
Dog D2
Mouse M1
To locate the first record for each unique value in the first column, we use read_tab and pipe the result to uniq_vals:
read_tab -i test.tab | uniq_vals -k V0
V0: Human
V1: H1
---
V0: Dog
V1: D1
---
V0: Mouse
V1: M1
---
The result is three records, one for each unique value of V0.
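The selection above can be modeled in a few lines of Python. This is only a sketch of the behavior described on this page, not the actual uniq_vals implementation; the function name `first_per_value` is made up for illustration, and the sketch assumes every record carries the key.

```python
def first_per_value(records, key):
    """Yield only the first record seen for each distinct value of `key`."""
    seen = set()
    for record in records:
        value = record[key]  # assumption: every record has this key
        if value not in seen:
            seen.add(value)
            yield record


# The contents of test.tab, as read_tab would name the columns (V0, V1).
rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1")]
records = [{"V0": v0, "V1": v1} for v0, v1 in rows]

unique = list(first_per_value(records, "V0"))
print([r["V1"] for r in unique])  # ['H1', 'D1', 'M1']
```

Because a record is kept or dropped based only on the values seen so far, the stream can be processed in a single pass, and stream order decides which record represents each value.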
If we instead want the non-unique records, we use the -i switch with uniq_vals:
read_tab -i test.tab | uniq_vals -k V0 -i
V0: Human
V1: H2
---
V0: Human
V1: H3
---
V0: Dog
V1: D2
---
... and the result shows the records that repeat a V0 value already seen earlier in the stream.
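The inverted selection can be sketched the same way in Python. Again, this models the behavior shown in the example rather than the tool's own code; `repeated_records` is a hypothetical name, and every record is assumed to carry the key.

```python
def repeated_records(records, key):
    """Yield every record whose value for `key` was already seen earlier."""
    seen = set()
    for record in records:
        value = record[key]  # assumption: every record has this key
        if value in seen:
            yield record     # a repeat: the inverted selection keeps it
        else:
            seen.add(value)  # first occurrence: remembered, not output


rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1")]
records = [{"V0": v0, "V1": v1} for v0, v1 in rows]

repeats = list(repeated_records(records, "V0"))
print([r["V1"] for r in repeats])  # ['H2', 'H3', 'D2']
```

Note that the first Human and Dog records are not part of this output, even though their values are duplicated later in the stream.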
So, how do we get the non-duplicated Mouse record? That is in fact not a job
for uniq_vals, but rather for count_vals and grab:
read_tab -i test.tab | count_vals -k V0 | grab -e 'V0_COUNT=1'
V0: Mouse
V1: M1
V0_COUNT: 1
---
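The reason this needs count_vals is that whether a value occurs only once cannot be known until the whole stream has been seen. A rough Python model of the pipeline above, assuming count_vals attaches the total number of occurrences to every record (as the V0_COUNT value in the output suggests); the function below is an illustrative stand-in, not the Biopieces code, and the list comprehension stands in for the grab step.

```python
from collections import Counter


def count_vals(records, key):
    """Return copies of the records with a `<key>_COUNT` field added."""
    records = list(records)                       # the stream is read twice
    totals = Counter(r[key] for r in records)     # pass 1: tally each value
    count_key = f"{key}_COUNT"
    return [{**r, count_key: totals[r[key]]} for r in records]  # pass 2


rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1")]
records = [{"V0": v0, "V1": v1} for v0, v1 in rows]

# Stand-in for grab -e 'V0_COUNT=1': keep records whose count is exactly 1.
singletons = [r for r in count_vals(records, "V0") if r["V0_COUNT"] == 1]
print(singletons)  # [{'V0': 'Mouse', 'V1': 'M1', 'V0_COUNT': 1}]
```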
However, if we combine count_vals and uniq_vals, we can list the distinct numbers of times that values in the first column occur, with the first record found for each count:
read_tab -i test.tab | count_vals -k V0 | uniq_vals -k V0_COUNT
V0: Human
V1: H1
V0_COUNT: 3
---
V0: Dog
V1: D1
V0_COUNT: 2
---
V0: Mouse
V1: M1
V0_COUNT: 1
---
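One point to keep in mind with this last pipeline: since uniq_vals keeps only the first record per distinct V0_COUNT, two first-column values that occur equally often would be represented by a single record. The Python sketch below illustrates this with two hypothetical Cat rows added to the table (they are not in test.tab); the helper functions are illustrative models of the behavior described on this page, not the Biopieces implementation.

```python
from collections import Counter


def count_vals(records, key):
    """Return copies of the records with a `<key>_COUNT` field added."""
    records = list(records)
    totals = Counter(r[key] for r in records)
    return [{**r, f"{key}_COUNT": totals[r[key]]} for r in records]


def first_per_value(records, key):
    """Return the first record seen for each distinct value of `key`."""
    seen, kept = set(), []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            kept.append(record)
    return kept


rows = [("Human", "H1"), ("Human", "H2"), ("Human", "H3"),
        ("Dog", "D1"), ("Dog", "D2"), ("Mouse", "M1"),
        ("Cat", "C1"), ("Cat", "C2")]  # Cat rows are hypothetical additions
counted = count_vals([{"V0": v0, "V1": v1} for v0, v1 in rows], "V0")

by_count = first_per_value(counted, "V0_COUNT")  # like uniq_vals -k V0_COUNT
by_value = first_per_value(counted, "V0")        # like uniq_vals -k V0

print([(r["V0"], r["V0_COUNT"]) for r in by_count])
# [('Human', 3), ('Dog', 2), ('Mouse', 1)]
print([(r["V0"], r["V0_COUNT"]) for r in by_value])
# [('Human', 3), ('Dog', 2), ('Mouse', 1), ('Cat', 2)]
```

In this model, selecting on V0 after counting gives one record per first-column value together with its count, whereas selecting on V0_COUNT gives one record per distinct count.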
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
uniq_vals is part of the Biopieces framework.