-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finish support for devirtualizing the HAL API. #6641
Comments
Hello, I had a look under the (note: I am new the runtime world so I hope you bear with the possibly stupid questions I might have :)) |
Did some tests with the current style and the struct handle style: https://godbolt.org/z/3Wr6Ko5ET We'd need to analyze our usage to see if this optimization is able to kick in - if it does work then we're done 🎉 |
The update: c++ needs it to be extern; I can fix up all the C impls though |
The extra validation can still be toggled with the build option to remove the binary size overhead but now is enabled by default unless requested per-command buffer. This avoids one additional allocation and the additional indirect jump that was defeating devirtualization analysis. Progress on #6641.
The extra validation can still be toggled with the build option to remove the binary size overhead but now is enabled by default unless requested per-command buffer. This avoids one additional allocation and the additional indirect jump that was defeating devirtualization analysis. Progress on #6641.
The extra validation can still be toggled with the build option to remove the binary size overhead but now is enabled by default unless requested per-command buffer. This avoids one additional allocation and the additional indirect jump that was defeating devirtualization analysis. Progress on #6641.
Not going to pursue this for now. Making the vtables smaller would be more worthwhile. |
The HAL C interface was designed to be devirtualized in size-optimized builds where only a single HAL backend is linked in. The implementation was not completed, though, and needs finishing.
The intent is that all of the vtables used within the HAL like iree_hal_buffer_vtable_t are stripped and dispatches to the vtables that all route through IREE_HAL_VTABLE_DISPATCH become direct function calls.
Instead of having a vtable that references some implementation function like
iree_hal_heap_buffer_map_range
, an alias foriree_hal_buffer_map_range_direct
that points atiree_hal_heap_buffer_map_range
would be created. Then any dispatch to the vtable through that macro becomes a direct call totype_prefix##method_name#_direct
. The vtable is then never used, stripped, and LTO takes care of stripping unused HAL implementations and such.This would reduce total binary size as only those HAL methods used would be required, improves call overhead as direct functions are used vs indirections through the vtables, and reduces binary size as no per-type vtables need to be stored.
There's a few gotchas that were never finished - like we still may need distinct vtable-like identifiers for type checking - but they can be unique byte memory locations vs. a full vtable with function pointers such that simple pointer equality works. We'll also need something that emits the
_direct
aliases - either automatically, via copy/paste, or macro magic around vtable definition. There may be some easier techniques for that too like making the vtable static and having_direct
calls go through the vtable - in theory a compiler should be plenty smart to devirtualize that for us (so we still have the vtable struct but it's only ever used from within the compilation unit).The text was updated successfully, but these errors were encountered: