pip install -r requirements.txt
huggingface-cli login
python prepare_imagefolder.py
python prepare_webdataset.py
python prepare_arrow.py
Each script prepares the data in a certain format:
prepare_imagefolder: downloads 10k images from imagenet into one folder
prepare_webdataset: prepares the 10k images in 4 webdataset shards (TAR) using JPEG/quality=100 from the images from prepare_imagefolder
prepare_arrow: prepares the 10k images in 4 datasets arrow shards using JPEG/quality=100 from the images from prepare_imagefolder
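
To illustrate what splitting images into a fixed number of TAR shards can look like, here is a minimal sketch using only the Python standard library. It is not the code from prepare_webdataset.py; the function name write_tar_shards, the shard file naming, and the contiguous split are assumptions made for illustration. It relies on the WebDataset convention that each TAR member is named after its sample key, with the extension identifying the field.

```python
import io
import tarfile
from pathlib import Path


def write_tar_shards(samples, out_dir, num_shards=4):
    """Write (key, jpeg_bytes) pairs into num_shards TAR files.

    Hypothetical helper: samples are split into contiguous chunks, and each
    image is stored as "<key>.jpg" inside its shard.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    samples = list(samples)
    per_shard = -(-len(samples) // num_shards)  # ceiling division
    paths = []
    for shard_idx in range(num_shards):
        chunk = samples[shard_idx * per_shard:(shard_idx + 1) * per_shard]
        path = out_dir / f"shard-{shard_idx:05d}.tar"  # assumed naming scheme
        with tarfile.open(path, "w") as tar:
            for key, data in chunk:
                info = tarfile.TarInfo(name=f"{key}.jpg")
                info.size = len(data)  # size must be set before adding bytes
                tar.addfile(info, io.BytesIO(data))
        paths.append(path)
    return paths
```

With 10k samples and num_shards=4, this layout would put 2,500 images in each shard.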
python benchmark_imagefolder.py
python benchmark_webdataset.py
python benchmark_arrow.py
python benchmark_from_generator.py
Each benchmark computes the average examples/sec using a single process:
benchmark_imagefolder: uses datasets and the local images
benchmark_webdataset: uses the webdataset data
benchmark_arrow: uses datasets and the arrow data
benchmark_from_generator: uses datasets IterableDataset.from_generator and the TAR data
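
The average examples/sec measurement described above can be sketched as follows. This is a hedged illustration rather than the code of the benchmark scripts: the function name examples_per_sec and the max_examples parameter are assumptions, and any iterable (a dataset, a data loader, a generator) can stand in for the data source.

```python
import time


def examples_per_sec(iterable, max_examples=None):
    """Iterate once in the current process and return average examples/sec.

    Hypothetical helper: max_examples optionally stops the loop early.
    """
    count = 0
    start = time.perf_counter()
    for _ in iterable:
        count += 1
        if max_examples is not None and count >= max_examples:
            break
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very small inputs.
    return count / elapsed if elapsed > 0 else float("inf")
```

Because the loop runs in a single process with no prefetching, the result reflects the raw read and decode speed of whatever iterable is passed in.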
TBA