Repository for a stream sampler that draws a fixed-size sample from a dataset of unknown size, using a fast reservoir sampling technique.
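For orientation, here is a minimal sketch of reservoir sampling in its simplest form (often called Algorithm R). It is not necessarily the variant this repository implements, which is described as a fast one. The function name `reservoir_sample` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly at random from stream.

    Hypothetical helper: a plain Algorithm R sketch, not the repository's code.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the item is kept only if the slot
            # falls inside the reservoir, i.e. with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

The stream is consumed once and only `k` items are held in memory, so the total size never needs to be known in advance. If the stream has fewer than `k` items, all of them are returned.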
- Python 3.7.4
- Unix/Linux
- Pytest
Executing tests:
pytest tests
Executing from within Python on the sample string "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG":
python3 run.py
Executing stream-sampler for values piped from a file:
cat line_letters.txt | ./stream-sampler.py 5
- The Python interpreter is assumed to be at /usr/local/bin/python3.
- Encoding is set to UTF-8.
- Files are assumed to contain strings separated by line breaks.
- Character/word spaces in the same line aren't trimmed or ignored.
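The input assumptions above can be sketched as follows. This is an illustration only, not the code in stream-sampler.py: the names `read_items` and `utf8_stdin` are hypothetical, and removing just the trailing "\n" is an assumption based on the Unix/Linux requirement.

```python
import io
import sys
from typing import Iterable, Iterator


def read_items(lines: Iterable[str]) -> Iterator[str]:
    """Yield one item per input line, removing only the trailing line break.

    Hypothetical helper: spaces inside or around the text are left untouched.
    """
    for line in lines:
        yield line.rstrip("\n")


def utf8_stdin() -> io.TextIOWrapper:
    """Wrap standard input so it is decoded as UTF-8 regardless of locale."""
    return io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")


# Example with an in-memory file standing in for piped input.
sample = io.StringIO("a b\n c \nd\n")
items = list(read_items(sample))
```

Because only the line break is removed, a line such as " c " keeps its leading and trailing spaces and is sampled as a distinct value from "c".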