random_records
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
random_records samples a number of random records from the stream, which is useful for inspecting large datasets. All records in the stream are first written to a temporary file. Next, a list of random record numbers is created, and the temporary file is read back, emitting only the records whose numbers are on the list. This is robust, but time-consuming for large datasets, since every record is written to disk and read again.
random_records does not randomize the order of records in the stream.
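The two-pass approach described above can be sketched in Python. This is only an illustration, not the Biopieces implementation: the function name `random_records`, the `seed` parameter, the use of JSON lines for the temporary file, and the representation of records as dictionaries are all assumptions made for the example. It also assumes that record numbers are drawn without replacement and that all records are returned when the stream holds fewer than requested.

```python
import json
import random
import tempfile


def random_records(records, num=10, seed=None):
    """Yield up to `num` randomly chosen records, keeping stream order.

    Hypothetical sketch: records are assumed to be JSON-serializable dicts.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as tmp:
        # Pass 1: spool every record to a temporary file and count them.
        count = 0
        for record in records:
            tmp.write(json.dumps(record) + "\n")
            count += 1

        # Pick distinct record numbers (assumption: sampling without
        # replacement, capped at the number of records available).
        wanted = set(rng.sample(range(count), min(num, count)))

        # Pass 2: re-read the file and emit only the selected records.
        tmp.seek(0)
        for index, line in enumerate(tmp):
            if index in wanted:
                yield json.loads(line)


# Example: draw 100 records from a stream of 1000 made-up records.
stream = ({"SEQ_NAME": f"seq{i}"} for i in range(1000))
sample = list(random_records(stream, num=100, seed=1))
```

Because the second pass reads the temporary file from start to end, the selected records come out in the same relative order as they had in the stream, which matches the note above that the order is not randomized.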
... | random_records [options]
[-? | --help] # Print full usage description.
[-n <uint> | --num=<uint>] # Number of random records to select - Default=10
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
To obtain 100 randomly chosen records from the stream, do:
... | random_records -n 100
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
random_records is part of the Biopieces framework.