Description
We need a generic mechanism for defining new workloads and executing them through a unified execution engine interface.
With such an interface, it should be easy to get workload performance metrics, define benchmark suites, hook up workloads to CI and/or an autotuner, etc.
A workload is defined by a specific payload kind (e.g., a matmul or an MLP) with a fixed problem size. For the time being, we can assume that all input shapes are static, implying that the payload IR is fixed. A workload may involve an arbitrary lowering schedule (potentially with parameters, e.g., for tile sizes) and target arbitrary hardware. As such, workloads may require custom tooling, such as custom methods to allocate/deallocate the payload input arrays before/after execution.
The Workload object could define the following methods (a sketch follows the list):
- Constructor: Sets problem size and other payload options (e.g., elementwise post-processing ops). These are fixed.
- payload_module(): Returns a new module with the payload function and any necessary MLIR utility functions. This module will be lowered in-place.
- schedule_module(parameters): Returns a new transform schedule module. The provided parameters set all optional schedule parameters (if any).
- get_input_arrays(execution_engine): Returns handles to the payload function's input arrays. The execution engine is provided in case MLIR helper functions need to be called.
- allocate(execution_engine): A context manager that performs any necessary allocation/deallocation of resources for execution.
- verify(execution_engine): Runs a correctness test, e.g., against a reference solution. The execution engine is provided in case MLIR helper functions need to be called.
- requirements(): Returns a list of ExecutionEngine requirements, e.g., MLIR shared libs or a GPU runtime lib.
- get_complexity(): Returns the computational complexity of the workload, e.g., the number of floating-point operations and the number of bytes read from/written to memory.
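To make this concrete, here is a minimal Python sketch of what such a Workload base class could look like, assuming the MLIR Python bindings. The `Complexity` helper type and the default implementations are illustrative additions, not part of the proposal:

```python
import abc
import contextlib
from dataclasses import dataclass


@dataclass
class Complexity:
    """Hypothetical container for get_complexity() results."""
    flops: int        # floating-point operations per execution
    bytes_moved: int  # bytes read from / written to memory


class Workload(abc.ABC):
    def __init__(self, problem_size, **payload_options):
        # Problem size and payload options (e.g., elementwise
        # post-processing ops) are fixed at construction time.
        self.problem_size = problem_size
        self.payload_options = payload_options

    @abc.abstractmethod
    def payload_module(self):
        """Return a new module with the payload function and any MLIR
        utility functions; the module is later lowered in-place."""

    @abc.abstractmethod
    def schedule_module(self, parameters):
        """Return a new transform schedule module with all optional
        schedule parameters (e.g., tile sizes) set from `parameters`."""

    @abc.abstractmethod
    def get_input_arrays(self, execution_engine):
        """Return handles to the payload function's input arrays."""

    @contextlib.contextmanager
    def allocate(self, execution_engine):
        # Default: nothing to allocate. Subclasses override this to
        # manage e.g. device buffers around an execution.
        yield

    @abc.abstractmethod
    def verify(self, execution_engine):
        """Run a correctness test, e.g., against a reference solution."""

    def requirements(self):
        # ExecutionEngine requirements, e.g., MLIR shared libs or a GPU
        # runtime lib. Empty by default.
        return []

    @abc.abstractmethod
    def get_complexity(self) -> Complexity:
        """Return flop and byte counts for throughput computations."""
```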
The execution engine interface could provide high-level methods such as the following (a sketch follows the list):
- lower_payload(workload, schedule_parameters): Returns the payload IR lowered using the transform schedule with the given parameters.
- execute(workload, schedule_parameters, check_correctness=True): Lowers the payload IR, allocates resources, executes the payload, (optionally) checks correctness, and deallocates resources.
- benchmark(workload, schedule_parameters, check_correctness=True): Executes the workload in a timer loop and returns the measured timings. The get_complexity() method can be used to compute throughput and other derived metrics.
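A sketch of how these three methods could fit together, under the same assumptions as the Workload sketch above. The `WorkloadRunner` name, the `_make_engine` helper, the `num_iterations` knob, and the `"payload"` entry-point name are all hypothetical; how the transform schedule gets applied is left open:

```python
import time

from mlir.execution_engine import ExecutionEngine


class WorkloadRunner:
    """Hypothetical wrapper around the MLIR ExecutionEngine."""

    def lower_payload(self, workload, schedule_parameters):
        payload = workload.payload_module()
        schedule = workload.schedule_module(schedule_parameters)
        # Apply the transform schedule to lower the payload in-place;
        # details depend on the transform-interpreter entry point used.
        ...
        return payload

    def _make_engine(self, module, requirements):
        # Hypothetical mapping from requirements() to engine options,
        # assuming the requirements are shared-library paths.
        return ExecutionEngine(module, shared_libs=requirements)

    def execute(self, workload, schedule_parameters, check_correctness=True):
        module = self.lower_payload(workload, schedule_parameters)
        engine = self._make_engine(module, workload.requirements())
        with workload.allocate(engine):
            args = workload.get_input_arrays(engine)
            engine.invoke("payload", *args)  # args as ctypes-packed handles
            if check_correctness:
                workload.verify(engine)

    def benchmark(self, workload, schedule_parameters,
                  check_correctness=True, num_iterations=100):
        module = self.lower_payload(workload, schedule_parameters)
        engine = self._make_engine(module, workload.requirements())
        timings = []
        with workload.allocate(engine):
            args = workload.get_input_arrays(engine)
            for _ in range(num_iterations):
                start = time.perf_counter()
                engine.invoke("payload", *args)
                timings.append(time.perf_counter() - start)
            if check_correctness:
                workload.verify(engine)
        return timings
```

Combining the returned timings with get_complexity() then yields derived metrics, e.g., `complexity.flops / min(timings)` for peak achieved FLOP/s.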
One useful property is that the same Workload object can be re-used between execution calls, so some data can optionally be cached in the object. In some cases it is possible to cache the payload input arrays and the reference solution, which speeds up autotuning workflows where the same workload may be executed thousands of times.
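One possible way to realize this caching, sketched with functools.cached_property and NumPy inputs; the `MatmulWorkload` name and its attributes are hypothetical, and the remaining Workload methods are omitted for brevity:

```python
import functools

import numpy as np


class MatmulWorkload(Workload):  # other Workload methods omitted
    @functools.cached_property
    def _inputs(self):
        # Computed once per Workload object, then reused across the
        # thousands of execute()/benchmark() calls of an autotuning run.
        m, n, k = self.problem_size
        return (np.random.rand(m, k).astype(np.float32),
                np.random.rand(k, n).astype(np.float32))

    @functools.cached_property
    def _reference(self):
        # Reference solution for verify(), also computed only once.
        a, b = self._inputs
        return a @ b
```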
Addresses point 2. of #15.