Add deferred calls to the sequencer IR and runtime #40
Labels
compiler/dialects
Relating to the IREE compiler dialects (flow, hal, vm)
enhancement ➕
New feature or request
Projects
A deferred call allows for explicit compile-time indication of which parts of the sequencer execution graph are optimal for coalescing and possible batching. To start we can use heuristics to identify candidates (large conv/matmul/etc) while in the future we can add cost analysis and profile-guided annotation. The runtime can trigger fiber yielding and manage the policy used to flush pending deferred calls.
Dynamic shapes will be required to effectively perform batching, however coalescing should be possible even with fully static shapes. Ideally we would be able to loosen static shaping of call trees to allow batching even when the input HLO is fully shaped by either inserting dynamic dimensions or making outer dimensions dynamic when it would cause no observable changes.
The text was updated successfully, but these errors were encountered: