Use HSC1k dataset for DataCache ASV benchmark #674
Co-authored-by: mtauraso <31012+mtauraso@users.noreply.github.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #674   +/-   ##
=======================================
  Coverage   62.95%   62.95%
=======================================
  Files          58       58
  Lines        5782     5782
=======================================
  Hits         3640     3640
  Misses       2142     2142
Pull request overview
This PR updates the DataCache ASV benchmarks to use the HSC1k sample dataset instead of CIFAR, enabling realistic cache performance measurements in CI. The change addresses an issue where the original benchmarks weren't working properly on main.
Changes:
- Switch benchmark dataset from HyraxCifarDataset to HSCDataSet
- Add pooch download logic with HSC1k archive constants (URL, hash, directory name)
- Update configuration to use HSC1k data directory and appropriate fields (image only, no label field)
- Rename benchmark methods from *_cifar to *_hsc1k
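The download step listed above could look roughly like the sketch below. It assumes the archive is a tarball fetched with `pooch.retrieve` and unpacked with `pooch.Untar`. The constant names, the URL, and the `fetch_hsc1k` and `hsc1k_data_dir` helpers are hypothetical placeholders, since the PR's actual values are not shown in this conversation.

```python
from pathlib import Path

# Hypothetical placeholders: the PR defines its own URL, hash, and directory
# constants; none of the real values appear in this conversation.
HSC1K_URL = "https://example.org/hsc1k.tar.gz"
HSC1K_HASH = None  # None makes pooch skip verification; pin a real hash in practice
HSC1K_DIR = "hsc1k"


def hsc1k_data_dir(cache_dir):
    """Return the directory the archive is unpacked into."""
    return Path(cache_dir) / HSC1K_DIR


def fetch_hsc1k(cache_dir):
    """Download the archive into cache_dir (if not already cached) and unpack it."""
    import pooch  # imported lazily so this module loads even without pooch

    pooch.retrieve(
        url=HSC1K_URL,
        known_hash=HSC1K_HASH,
        fname="hsc1k.tar.gz",
        path=Path(cache_dir),
        # extract_dir is interpreted relative to `path`
        processor=pooch.Untar(extract_dir=HSC1K_DIR),
    )
    return hsc1k_data_dir(cache_dir)
```

A benchmark's setup could then call `fetch_hsc1k(...)` once and point the dataset configuration at the returned directory. Whether the real archive unpacks into a single top-level folder is not stated here.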
Force-pushed from 1901c10 to 54a5407
drewoldag left a comment
Looks good, only one small comment.
)

self.h.config["general"]["results_dir"] = str(data_dir)
self.h.config["general"]["data_dir"] = str(hsc_data_dir)
Minor point - if ['general']['data_dir'] isn't otherwise needed, we could step back from using it.
General question: is the benefit of using HSC1k just to provide a realistic dataset for benchmarking? Are there additional reasons?

AFAICT, this succeeds simply because the dataset is smaller. I looked into several other reasons why the old version wasn't producing numbers, none of which ultimately led anywhere. The idea to switch CIFAR -> HSC1k was just to make the problem smaller in case the underlying issue was somehow resource-related.
Change Description
DataCache benchmarks on main needed a working dataset download path; this switches the benchmark to the HSC1k sample so CI can measure realistic cache preload timing and memory.
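For readers unfamiliar with ASV, timing and memory are usually measured by separate methods on one benchmark class: ASV times methods prefixed `time_` and records peak memory for methods prefixed `peakmem_`. The sketch below is illustrative only. The class name, method names, and the in-memory stand-in for the dataset and cache are assumptions, not the PR's code.

```python
class DataCacheBenchmarks:
    """Hypothetical ASV benchmark class; names are illustrative, not the PR's."""

    def setup(self):
        # ASV runs setup() before each benchmark; its runtime is not part of
        # the time_* measurement.
        # Stand-in for a 1k-sample dataset: 1000 small byte buffers.
        self.samples = [bytes(64) for _ in range(1000)]
        self.cache = {}

    def _preload(self):
        # Stand-in for the real cache preload: copy every sample into a dict.
        self.cache = {i: sample for i, sample in enumerate(self.samples)}
        return self.cache

    def time_preload_cache_hsc1k(self):
        # ASV reports the wall-clock time of this method.
        self._preload()

    def peakmem_preload_cache_hsc1k(self):
        # ASV reports the peak resident memory of the benchmark process.
        self._preload()
```

Keeping the preload in one private helper means the timing and memory benchmarks exercise exactly the same work.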
Solution Description
HSCDataSet.
Code Quality