Once we have `test_kernel_bench` running well, we also need to add a benchmarking version of it.
The idea is to run it in the public CI as a test (to make sure it still works: a single run with CHECK lines on the output) and inside Intel as a benchmark (to report and track performance).
It's hard to track performance with LIT tests alone, so the benchmark should be run manually and simply print the GFLOPS in the test result, as we do for the tpp-mlir benchmarks. It should also run on different architectures, so it depends on #141 to be a good tracker.
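To make the two modes concrete, here is a minimal sketch. It assumes a Python harness and a matmul-shaped kernel. The names `run`, `gflops` and `matmul_flops`, and the `benchmark`/`iterations` parameters, are hypothetical and are not part of `test_kernel_bench`. In test mode the kernel runs once and output checking is left to the CHECK lines. In benchmark mode it does one warm-up run, times repeated runs, and prints GFLOPS.

```python
import time
from typing import Callable, Optional


def gflops(flop_count: int, seconds: float) -> float:
    """Convert a FLOP count and an elapsed time (seconds) into GFLOPS."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / seconds / 1e9


def matmul_flops(m: int, n: int, k: int) -> int:
    """A GEMM C[m,n] += A[m,k] * B[k,n] does one multiply and one add per (i, j, l)."""
    return 2 * m * n * k


def run(kernel: Callable[[], None], flop_count: int,
        benchmark: bool, iterations: int = 100) -> Optional[float]:
    """Hypothetical harness: single run for CI, timed loop for benchmarking."""
    if not benchmark:
        # Test mode (public CI): execute once; CHECK lines verify the output.
        kernel()
        return None
    # Benchmark mode: one warm-up run, then average over `iterations` runs.
    kernel()
    start = time.perf_counter()
    for _ in range(iterations):
        kernel()
    per_run = (time.perf_counter() - start) / iterations
    result = gflops(flop_count, per_run)
    print(f"{result:.2f} GFLOPS")
    return result
```

Keeping a single entry point with a mode switch means the CI test and the benchmark exercise the same kernel code. Only the timing loop and the GFLOPS report differ between the two modes.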
Once we have the XeGPU version, we can do the same in our cluster. If others want to add Arm, AMD or NV testing on their own local infrastructure, the schedules and tests should still live upstream and simply be run elsewhere.