samplr

samplr is a CLI tool for randomly sampling data: it generates a fixed-size sample of input lines, with every line equally likely to be selected.

Installation

Source

Requires Rust to be installed.

git clone https://github.com/SteadBytes/sample.git
cd sample
cargo install --path .

Examples

Sample 15 lines from a file:

sample -n 15 things.txt

Sample 15 lines from standard input:

sample -n 15 < things.txt

Sample 15 lines from multiple files:

sample -n 15 things.txt other_things.txt

Sampling Algorithm

samplr uses a reservoir sampling algorithm to generate fixed-size samples from an input stream of unknown length. For more details, see the implementation and the linked blog article.
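
As a rough illustration of the idea, here is a minimal sketch of one classic reservoir sampling variant (often called Algorithm R). It is not taken from the samplr source, which may use a different variant. The names `XorShift64` and `reservoir_sample` are hypothetical, and the small hand-rolled generator is only there to keep the sketch free of external crates; a real tool would more likely use the `rand` crate.

```rust
/// Tiny xorshift64* generator so the sketch needs no external crates.
/// Assumption for illustration only; not the RNG samplr uses.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift requires a non-zero state, so substitute a constant for 0.
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Returns a value in `0..bound`. Plain modulo has a negligible bias
    /// for small bounds, which is acceptable in a sketch.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Algorithm R: keep the first `k` items, then for the item at zero-based
/// index `i >= k`, overwrite a random slot with probability k / (i + 1).
/// The input length never needs to be known in advance.
fn reservoir_sample<T, I>(items: I, k: usize, rng: &mut XorShift64) -> Vec<T>
where
    I: IntoIterator<Item = T>,
{
    let mut reservoir: Vec<T> = Vec::with_capacity(k);
    for (i, item) in items.into_iter().enumerate() {
        if i < k {
            // Fill phase: the first k items always enter the reservoir.
            reservoir.push(item);
        } else {
            // Pick j uniformly from 0..=i; the item survives only if j < k.
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```

Calling `reservoir_sample(lines, 15, &mut XorShift64::new(seed))` on any iterator of lines returns at most 15 of them in a single pass, using memory proportional to the sample size rather than the input size. If the input has fewer than 15 lines, all of them are returned.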

Development

Tests

Run unit tests:

cargo test

Run all tests (including potentially CPU-intensive statistical tests):

cargo test --all-features --release
