This issue outlines the required steps to add an XeGPU matrix multiplication benchmark.
- `mlir-gen` should be a Python module.
  - Move it into the `python` dir tree as a module, e.g., `python/lighthouse/payload_generator/payload_generator.py`.
  - Add a command line interface, e.g., `python/lighthouse/payload_generator/cli.py` (see the sketch after this list).
  - Map it to an executable script, e.g., `mlir-gen` in `pyproject.toml`: `[project.scripts] mlir-gen = "lighthouse.python.ingress.payload_generator:cli"`.
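A minimal sketch of what the CLI shim could look like; `generate_payload` and the flags below are illustrative placeholders, since the actual `mlir-gen` options are not spelled out in this issue:

```python
# Hypothetical python/lighthouse/payload_generator/cli.py.
# `generate_payload` and the flags below are placeholders, not existing code.
import argparse

from lighthouse.payload_generator.payload_generator import generate_payload  # assumed helper


def cli():
    parser = argparse.ArgumentParser(
        prog="mlir-gen", description="Generate MLIR payload IR."
    )
    parser.add_argument("--kernel", default="matmul", help="Kernel to generate.")
    parser.add_argument(
        "--sizes", type=int, nargs="+", default=[4096, 4096, 4096],
        help="Problem sizes, e.g., M N K.",
    )
    args = parser.parse_args()
    print(generate_payload(kernel=args.kernel, sizes=args.sizes))


if __name__ == "__main__":
    cli()
```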
- Generic infrastructure to define and execute workloads (sketched after this list):
  - A `Workload` object to hold the payload IR and fixed metadata (e.g., problem size).
    - Provides methods for getting payload IR, schedule IR, input arguments for calling the payload, (optionally) correctness verification, etc.
  - A generic `execution_engine` wrapper that can execute a `Workload` and can, e.g., run it in a timer loop for benchmarking.
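A rough sketch of these two pieces; all names are proposals for this issue, not an existing Lighthouse API:

```python
# Proposed shapes only; none of these names exist in Lighthouse yet.
import time
from dataclasses import dataclass


@dataclass
class Workload:
    """Holds the payload IR and fixed metadata (e.g., problem size)."""
    sizes: tuple[int, ...]
    dtypes: tuple[str, ...]

    def payload_ir(self) -> str:
        """Return the payload module to compile and run."""
        raise NotImplementedError

    def schedule_ir(self) -> str:
        """Return the transform-dialect lowering schedule."""
        raise NotImplementedError

    def inputs(self):
        """Return the input arguments for calling the payload."""
        raise NotImplementedError

    def verify(self, outputs) -> bool:
        """Optional correctness verification of payload outputs."""
        return True


def run_timed(compiled_fn, workload: Workload, warmup: int = 10, iters: int = 100) -> float:
    """Timer loop for benchmarking; returns mean wall time in milliseconds."""
    args = workload.inputs()
    for _ in range(warmup):
        compiled_fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        compiled_fn(*args)
    return (time.perf_counter() - start) / iters * 1e3
```

Keeping compilation out of the timer loop lets the same wrapper serve both one-shot execution and repeated benchmarking.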
- XeGPU matmul `Workload` and benchmarking tools:
  - Currently supports only matmul + elementwise post ops.
  - Location: `python/lighthouse/benchmark/xegpu/matmul.py` (?)
  - Generates the payload, defines the lowering schedule, compiles, executes, and measures performance (see the sketch after this list).
  - Uses existing functionality: workload, payload generator, execution tools, etc.
  - Exposed as another CLI command, e.g., `benchmark_xegpu_matmul`.
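As a sketch of how the pieces could compose, assuming the `Workload`/`run_timed` proposal above; `XeGPUMatmulWorkload` and `compile_workload` are assumed names, not existing code:

```python
# Hypothetical glue in python/lighthouse/benchmark/xegpu/matmul.py.
# XeGPUMatmulWorkload and compile_workload are illustrative placeholders.
def benchmark_xegpu_matmul(sizes=(4096, 4096, 4096), dtypes=("f16", "f32")):
    wl = XeGPUMatmulWorkload(sizes=sizes, dtypes=dtypes)
    fn = compile_workload(wl)  # compile the payload using its schedule IR
    ms = run_timed(fn, wl)     # timer loop from the generic wrapper above
    m, n, k = sizes
    gflops = 2 * m * n * k / (ms * 1e-3) / 1e9  # matmul does 2*M*N*K flops
    print(f"sizes={m},{n},{k} dt={','.join(dtypes)} time(ms): {ms:.2f} GFLOPS: {gflops:.0f}")
```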
- Installation mechanism for XeGPU support:
  - Compile LLVM with the LevelZero runtime and the necessary flags.
  - Hook up the LLVM Python bindings with Lighthouse:
    - Simply set `PYTHONPATH="$LLVM_INSTALL_PATH/python_packages/mlir_core/"`.
    - The `mlir-python-bindings` Python package is not installed in this case (?).
  - Provide an easy-to-use install mechanism:
    - Build instructions in the README (?).
    - Or a generic build script, invoked on install (?).
  - When executing kernels, raise a descriptive error if the Xe GPU device/drivers are missing (see the sketch after this list).
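One possible shape for that device check; probing the Level Zero loader library is an assumption here, not a settled mechanism:

```python
import ctypes


def require_xegpu_device():
    """Fail early with a descriptive error when no Xe GPU runtime is available."""
    try:
        # The Level Zero loader ships as libze_loader; the exact soname may
        # differ per distro, so this probe is only illustrative.
        ctypes.CDLL("libze_loader.so.1")
    except OSError as err:
        raise RuntimeError(
            "No Intel Xe GPU runtime found. Install the Level Zero driver "
            "and build LLVM with the LevelZero runtime enabled."
        ) from err
```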
Examples of the benchmark command line tool usage:

```console
$ benchmark_xegpu_matmul
sizes=4096,4096,4096 dt=f16,f32 wg-tile=256,256 sg-tile=32,32 ... time(ms): 1.76 GFLOPS: 78072

$ benchmark_xegpu_matmul --sizes 4096 2048 1024 --wg-tile-size 256 256 ...
...

$ benchmark_xegpu_matmul --dump-kernel {initial,tiled,vectorized,bufferized,xegpu-wg,...}
<prints payload IR at this stage of lowering>

$ benchmark_xegpu_matmul --dump-kernel xegpu-wg --dump-schedule
<also dumps the transform schedule used to lower to this level>
```