Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: random_records

Description

random_records samples a number of random records from the stream, which may be useful for inspecting large datasets. All records in the stream are first written to a temporary file. Next, a list of random record numbers is created; the temporary file is then read, and the records whose numbers are in the list are emitted. This is robust, but time-consuming for big datasets.

random_records does not randomize the order of records in the stream.
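
The two-pass procedure described above can be sketched in Python. This is only an illustration, not the actual random_records implementation: the function name `sample_records`, the `seed` parameter, the one-record-per-line representation, and the choice to emit every record when fewer than `num` are available are all assumptions made for the example.

```python
import random
import tempfile


def sample_records(records, num=10, seed=None):
    """Yield up to `num` randomly chosen records, keeping stream order.

    Hypothetical sketch: each record is assumed to be a single-line string.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryFile(mode="w+") as tmp:
        # Pass 1: spool every record to a temporary file and count them.
        count = 0
        for record in records:
            tmp.write(record + "\n")
            count += 1

        # Draw distinct record numbers (assumption: clamp to the stream size).
        wanted = set(rng.sample(range(count), min(num, count)))

        # Pass 2: re-read the file and emit only the selected record numbers.
        # Reading sequentially is what leaves the original order untouched.
        tmp.seek(0)
        for index, line in enumerate(tmp):
            if index in wanted:
                yield line.rstrip("\n")


if __name__ == "__main__":
    stream = (f"record {i}" for i in range(1000))
    for rec in sample_records(stream, num=5):
        print(rec)
```

Because the selection is applied while reading the temporary file from start to end, the sampled records come out in the same relative order in which they entered the stream.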

Usage

... | random_records [options]

Options

[-?         | --help]               #  Print full usage description.
[-n <uint>  | --num=<uint>]         #  Number of random records to select  -  Default=10
[-I <file!> | --stream_in=<file!>]  #  Read input from stream file         -  Default=STDIN
[-O <file>  | --stream_out=<file>]  #  Write output to stream file         -  Default=STDOUT
[-v         | --verbose]            #  Verbose output.

Examples

To obtain 100 randomly chosen records from the stream, do:

... | random_records -n 100

See also

shuffle_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

August 2007

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

random_records is part of the Biopieces framework.

http://www.biopieces.org
