miku/randlines


randlines

WIP: currently broken.

Print a random subset of lines from a line-oriented file. Intended for inputs large enough that shuf(1) gets killed.

Installation

$ cargo install randlines

Usage

$ randlines -h
randlines 0.1.1

Emit a random subset of lines from a file. This is a probabilistic program, you
will not get exactly `n` lines.

Typically, you can use shuf(1) which uses reservoir sampling and is very
efficient. However, if we want to extract 10M random lines from a file of 100M
lines, shuf(1) might be killed. However, randlines will not shuffle lines, just
skip over random number of lines.

USAGE:
    randlines [OPTIONS] [input]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -n <n>                          [default: 16]
    -s, --size-hint <size-hint>

ARGS:
    <input>

Emit a random subset of lines from a file. This is a probabilistic program: you will not get exactly n lines.

Typically, you can use shuf(1), which uses reservoir sampling and is very efficient. However, when extracting 10M random lines from a file of 100M lines, shuf(1) might be killed. randlines does not shuffle lines; it just skips over a random number of lines between the ones it emits, so the output keeps the input order.
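
The idea of emitting an approximate number of lines without shuffling can be illustrated with a minimal sketch. This is not the actual randlines implementation. It assumes a simple per-line coin flip with probability p = n / (estimated line count), which gives roughly n lines in input order while holding only one line in memory. The names `XorShift64` and `sample_lines` are hypothetical, and the small xorshift64* generator is used only to keep the example free of external crates.

```rust
use std::io::{self, BufRead, Write};

/// Tiny xorshift64* generator (hypothetical helper, not part of randlines).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The internal state must never be zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Returns a float uniformly distributed in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D);
        // Keep the top 53 bits so the result fits an f64 mantissa exactly.
        (bits >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Copies each line of `input` to `output` with probability `p`,
/// preserving input order. Returns the number of lines written.
fn sample_lines<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    p: f64,
    rng: &mut XorShift64,
) -> io::Result<u64> {
    let mut kept = 0;
    for line in input.lines() {
        let line = line?;
        if rng.next_f64() < p {
            writeln!(output, "{}", line)?;
            kept += 1;
        }
    }
    Ok(kept)
}

fn main() -> io::Result<()> {
    // Assumed values for illustration: aim for about 16 of 1000 lines.
    let (n, total): (f64, f64) = (16.0, 1000.0);
    let p = (n / total).min(1.0);

    // In-memory input standing in for a large file.
    let data: String = (1..=1000).map(|i| format!("line {}\n", i)).collect();

    let mut rng = XorShift64::new(42);
    let stdout = io::stdout();
    let kept = sample_lines(data.as_bytes(), stdout.lock(), p, &mut rng)?;
    eprintln!("kept {} lines", kept);
    Ok(())
}
```

Because each line is decided independently, the number of lines emitted varies from run to run around n, which matches the "you will not get exactly n lines" caveat. To sample from a real stream, `io::stdin().lock()` could be passed as `input` instead of the in-memory string.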

TODO

  • compress temporary output when reading from stdin
  • make --size-hint actually work
