Tests in experimental/regression_suite/ are using a cache incorrectly
#18336
Comments
Reading through the code more, I have a fix in mind... but it's going to be tricky to test. We can keep using the cache, but bugs are still possible if we change the remote file contents. Using a cache implementation like huggingface's (git LFS) would be better - that downloads versions keyed by git hashes and then creates symlinks into the refs.
Alternate approach (closer to huggingface): always use a workspace-relative location, but create symlinks from the cache into that directory before the tests are run.
That would be neat - especially if the cache paths are uniqued per hash, such that you can be updating the cache with live symlinks to older versions concurrently (similar to what bazel does when linking from the sandbox to its storage).
See what huggingface does here: https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache
They have tools for choosing where you want files to appear: https://huggingface.co/docs/huggingface_hub/en/guides/download#download-files-to-a-local-folder . The recommended options use symlinks, with all the details about refs and blobs hidden from the user.
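A minimal sketch of that blob + symlink idea, loosely modeled on the huggingface_hub cache layout linked above. The cache root, helper names, and directory layout here are made up for illustration; they are not IREE or huggingface_hub APIs.

```python
import hashlib
import os
from pathlib import Path

# Hypothetical cache root; the real suite currently uses IREE_TEST_FILES / Azure paths.
CACHE_ROOT = Path.home() / ".cache" / "regression_suite"


def store_blob(data: bytes) -> Path:
    """Store bytes under a blob keyed by SHA-256, reusing it if already present."""
    digest = hashlib.sha256(data).hexdigest()
    blob_path = CACHE_ROOT / "blobs" / digest
    if not blob_path.exists():
        blob_path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a per-process temp name and rename atomically so concurrent
        # jobs never observe a partially written blob.
        tmp_path = blob_path.with_name(blob_path.name + f".tmp.{os.getpid()}")
        tmp_path.write_bytes(data)
        os.replace(tmp_path, blob_path)
    return blob_path


def link_into_workspace(blob_path: Path, workspace_dir: Path, name: str) -> Path:
    """Expose a cached blob at a stable workspace-relative path via a symlink."""
    target = workspace_dir / name
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        target.unlink()
    target.symlink_to(blob_path)
    return target
```

Because blobs are content-addressed, updating the cache never invalidates symlinks that point at older versions, which is the concurrency property mentioned above.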
Sketches of what a fix could look like here: https://github.com/iree-org/iree/compare/main...ScottTodd:infra-regression-suite-fix?expand=1. This is messy though - needs some deeper thinking. I also haven't tried running these tests locally - setup steps are complicated x_x. We've been bouncing these tests back and forth between this repo and https://github.com/nod-ai/SHARK-TestSuite/ (https://github.com/iree-org/iree-test-suites could also be an option, if the Azure paths were replaced with something easier to modify for contributors). The repository-specific things are compile flags / spec files and expected dispatch counts / benchmark metrics.
We can also just rework this a little bit: https://github.com/iree-org/iree/blob/main/experimental/regression_suite/ireers_tools/fixtures.py. Instead of making it output a produced artifact, it can just save the vmfb in the local dir. We basically don't use ProducedArtifact at all. I can get a fix out for that. Should be pretty easy and clean.
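A hedged sketch of that simpler direction: a pytest fixture that compiles into the test's own temporary directory instead of the shared cache. The `mlir_source` fixture, the target backend, and the bare `iree-compile` invocation are placeholders; the real flags and plumbing live in ireers_tools/fixtures.py and the individual tests.

```python
import subprocess

import pytest


@pytest.fixture
def compiled_vmfb(tmp_path, mlir_source):
    """Compile into a per-test temporary directory, never into the shared cache."""
    out_path = tmp_path / "module.vmfb"
    subprocess.run(
        [
            "iree-compile",
            str(mlir_source),
            # Target chosen only as an example; the suite's real flags differ.
            "--iree-hal-target-backends=llvm-cpu",
            "-o",
            str(out_path),
        ],
        check=True,
    )
    return out_path
```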
Okay yeah, I think that sounds good.
We probably do need a better long-term solution for the mlirs and weights changing in the persistent cache (huggingface seems good), but we can be strategic about when we update the mlirs and weights in Azure for now (it also doesn't happen often).
Here's another failure: https://github.com/iree-org/iree/actions/runs/10530365837/job/29180735869#step:6:180
Maybe another run started compiling to the same path, overwriting/deleting the file that this run tried to use. |
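One hedged sketch of how that collision could be avoided while still sharing a cache: derive the output path from a hash of the inputs, so two runs only write to the same location when they would produce the same artifact anyway. The function and layout are illustrative, not existing suite code.

```python
import hashlib
from pathlib import Path


def unique_output_path(cache_root: Path, source: Path, flags: list[str]) -> Path:
    """Derive a collision-free output location from the inputs that produce it."""
    key = hashlib.sha256(source.read_bytes() + " ".join(flags).encode()).hexdigest()
    # Two jobs only share this path if they compile the same source with the same flags.
    out_dir = cache_root / key[:16]
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / "module.vmfb"
```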
What happened?
Based on logs at https://github.com/iree-org/iree/actions/runs/10527576602/job/29171566486?pr=18152#step:9:40, it seems that we now have multiple jobs reading and writing into a single persistent cache without watching for collisions.
Steps to reproduce your issue
What component(s) does this issue relate to?
Other
Version information
Tip of tree (e.g. a0945cc), since we added multiple runners.
Additional context
Test compiles a .vmfb file into the location referenced in the IREE_TEST_FILES env var:

iree/experimental/regression_suite/shark-test-suite-models/sdxl/test_unet.py (lines 123 to 128 in a0945cc)
iree/experimental/regression_suite/ireers_tools/artifacts.py (lines 40 to 43 in a0945cc)
Benchmark (which could be running minutes later) assumes that the test already ran and put files in that location:

iree/experimental/benchmarks/sdxl/benchmark_sdxl_rocm.py (lines 18 to 22 in a0945cc)
iree/experimental/benchmarks/sdxl/benchmark_sdxl_rocm.py (lines 69 to 73 in a0945cc)
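To illustrate the coupling (this is not the exact code in those files, and the path layout is assumed): the benchmark rebuilds a path from the same env var and simply trusts that the earlier test left a vmfb there.

```python
import os
from pathlib import Path

# Assumed layout for illustration only; the real paths are built in the
# test and benchmark scripts referenced above.
vmfb_path = Path(os.environ["IREE_TEST_FILES"]) / "artifacts" / "sdxl_unet" / "model.vmfb"
assert vmfb_path.exists(), "the benchmark depends on the earlier test having run"
```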
These tests/benchmarks need to be refactored to be hermetic. They can read from a shared cache (checking that the files match expected hashes) and could write into a shared cache using carefully constructed keys/namespaces/hashes/subdirectories. What is usually safer is for tests to write into a build/test directory, not a cache.
Having the benchmarks reuse outputs from tests is also sketchy. I suspect those tests/scripts aren't easily runnable by hand if they require some specific sequencing and environment variables, and making them easy to run by hand is what we should optimize for first.
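A hedged sketch of the "read from a shared cache, verify the hash, write locally" pattern described above. Expected hashes would be checked into the repo next to the tests; the helper name here is hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def fetch_verified(cache_path: Path, expected_sha256: str, test_dir: Path) -> Path:
    """Copy a shared-cache input into the test's own directory after checking its hash."""
    actual = hashlib.sha256(cache_path.read_bytes()).hexdigest()
    if actual != expected_sha256:
        raise RuntimeError(
            f"Cache entry {cache_path} is stale or corrupt: "
            f"expected sha256 {expected_sha256}, got {actual}"
        )
    test_dir.mkdir(parents=True, exist_ok=True)
    local_copy = test_dir / cache_path.name
    shutil.copy2(cache_path, local_copy)
    return local_copy
```

The test then reads only its own copy, so a concurrent cache update can never change or delete a file mid-run.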