Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vulkan phase2] Vulkan Runtime #6924

Merged
merged 176 commits into from
Apr 25, 2023
Merged

[vulkan phase2] Vulkan Runtime #6924

merged 176 commits into from
Apr 25, 2023

Conversation

derek-gerstmann
Copy link
Contributor

@derek-gerstmann derek-gerstmann commented Aug 5, 2022

This set of changes is Phase2 for adding Vulkan support, which adds the runtime class interfaces for interacting with the Vulkan API. Specifically, it adds the following files:

src/CodeGen_Vulkan_Dev.cpp
src/runtime/HalideRuntimeVulkan.h
src/runtime/mini_vulkan.h
src/runtime/vulkan.cpp
src/runtime/vulkan_context.h
src/runtime/vulkan_extensions.h
src/runtime/vulkan_functions.h
src/runtime/vulkan_interface.h
src/runtime/vulkan_internal.h
src/runtime/vulkan_memory.h
src/runtime/vulkan_resources.h
src/runtime/windows_vulkan.cpp

The forthcoming Phase3 will address testing and CI modifications.

@derek-gerstmann derek-gerstmann marked this pull request as draft August 5, 2022 19:37
@steven-johnson
Copy link
Contributor

C:\build_bot\worker\halide-testbranch-main-llvm16-x86-64-windows-cmake\halide-source\src\CodeGen_Vulkan_Dev.cpp(1082): error C2398: Element '3': conversion from 'SpvBuiltIn_' to '_Ty' requires a narrowing conversion
with
[
_Ty=uint32_t
]

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks very nice; my comments are mostly nits but a few actionable items. My main concern here is that there are a lot of TODO comments (and also some if 0 unimplemented sections) in Codegen_Dev, and it's not clear to me which/how-many of them are "need to complete before this will work" vs "this should be improved someday but isn't blocking a functional initial release" -- clarifying those would IMHO make this easier for other folks to read and grasp the current state.

Makefile Show resolved Hide resolved
src/CodeGen_Vulkan_Dev.cpp Outdated Show resolved Hide resolved
src/CodeGen_Vulkan_Dev.cpp Outdated Show resolved Hide resolved
src/CodeGen_Vulkan_Dev.cpp Outdated Show resolved Hide resolved
src/CodeGen_Vulkan_Dev.cpp Outdated Show resolved Hide resolved
src/runtime/vulkan_context.h Outdated Show resolved Hide resolved
src/runtime/vulkan_extensions.h Outdated Show resolved Hide resolved
src/runtime/vulkan_functions.h Outdated Show resolved Hide resolved
src/runtime/vulkan_memory.h Outdated Show resolved Hide resolved
src/runtime/vulkan_memory.h Outdated Show resolved Hide resolved
@derek-gerstmann
Copy link
Contributor Author

C:\build_bot\worker\halide-testbranch-main-llvm16-x86-64-windows-cmake\halide-source\src\CodeGen_Vulkan_Dev.cpp(1082): error C2398: Element '3': conversion from 'SpvBuiltIn_' to '_Ty' requires a narrowing conversion with [ _Ty=uint32_t ]

Oooh, thanks! I'll fix this!

@derek-gerstmann
Copy link
Contributor Author

Generally looks very nice; my comments are mostly nits but a few actionable items. My main concern here is that there are a lot of TODO comments (and also some if 0 unimplemented sections) in Codegen_Dev, and it's not clear to me which/how-many of them are "need to complete before this will work" vs "this should be improved someday but isn't blocking a functional initial release" -- clarifying those would IMHO make this easier for other folks to read and grasp the current state.

Thanks! Okay, cool! Yes, the CodeGen_Dev is in the process of being refactored to use the SPIRV_IR classes and to address most of the TODO comments. Once this is in place, I'll update the docs and status for what's working, etc!

@steven-johnson
Copy link
Contributor

Monday Morning Review Ping -- where does this PR stand?

@derek-gerstmann
Copy link
Contributor Author

Monday Morning Review Ping -- where does this PR stand?

Apologies … I’m on vacation but will be back later this week! Most of the requested changes are pushed … remaining items are on the codegen, and docs!

@derek-gerstmann
Copy link
Contributor Author

Understandable, but... if a device doesn't have the uint8/uint16 extensions, then what other alternative is there?

With the devices I have at hand (Snapdragon 855/865), they support the feature shaderInt16 which allows them to use 16 bit integers in shader code. They are not able to use 16 bit buffers for input/output. Additionally, they support the extension VK_KHR_shader_float16_int8 which allows me to read 32 bit ints into 2 float16 and get better throughput.

Hmmm. I'm not sure how this might be added as a feature ... we'd need to know that you intend to pass a buffer of 32bit integers with the intent of indexing it as two float16s? And we'd have to know at CodeGen time, which would imply that you'd need some set of feature flags to trigger this. Feel free to add additional suggestions and use cases to the ticket I created:
#7478

Derek Gerstmann added 8 commits April 7, 2023 10:38
Cleanup constant value declarations with template helper methods
Add comments on workgroup size usage
Add nearest_multiple constraint to block/region allocator
Add nearest_multiple constrating to vulkan memory allocatori
+ fixes correctness/multiple_outputs test
Add vkCreateBuffer/vkDestroyBuffer debug output i
+ for gpu_object_lifetime_tracker
Cleanup shutdown for shader_module destruction
@derek-gerstmann derek-gerstmann dismissed steven-johnson’s stale review April 24, 2023 20:33

Changes applied. Ready to merge

@derek-gerstmann derek-gerstmann requested review from steven-johnson and removed request for steven-johnson April 24, 2023 20:34
@derek-gerstmann
Copy link
Contributor Author

@steven-johnson I don't have write access to do the merge ... could you approve this PR when you get a chance?

@steven-johnson
Copy link
Contributor

@steven-johnson I don't have write access to do the merge ... could you approve this PR when you get a chance?

Sure, but did the other reviewers approve it? It still says "needs 1 approving review"

@derek-gerstmann
Copy link
Contributor Author

@steven-johnson I don't have write access to do the merge ... could you approve this PR when you get a chance?

Sure, but did the other reviewers approve it? It still says "needs 1 approving review"

Nope. Nobody that has made comments has hit the approval button, even though all comments were addressed so far.

@steven-johnson
Copy link
Contributor

@rootjalex @shoaibkamil One of you, please approve this so we can land it :-)

Copy link
Member

@rootjalex rootjalex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback!

@derek-gerstmann derek-gerstmann merged commit 4d86539 into main Apr 25, 2023
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
…ide#7347)

In PyBind11, if you specify a default argument for a method, it is evaluated when the Python module is initialized, *not* when the method is called (as you might expect in C++). For defaults that are just constants/literals, this is no big deal, but when calling get_*_target_from_environment, this means it is called at module init time -- also normally not a big deal (since the values ~never change at runtime anyway), with one big exception (no pun intended): if the function throws an exception (e.g. via calling user_assert() or similar), that exception is thrown at Module-initialization time, which is a much more inscrutable crash, and one that is very hard to recover from.

This may seem unlikely, but can happen pretty easily if you set (say) HL_JIT_TARGET=host-cuda (or other gpu) and the given GPU runtime isn't present on the given system; the current behavior is basically "make if impossible for the libHalidePython bindings to run", whereas what we want is "runtime exception thrown when you call the method".

This changes the relevant methods to use `Target()` as the default value, and inside the method wrapper, if the value passed equals `Target()`, it replaces the value with the righ `get_*_target_from_environment()` call.

(This turned up while doing some testing of halide#6924 on a system without Vulkan available)
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Import Vulkan runtime changes from personal branch

* Fix build to work with latest changes in main

* Hookup Vulkan into Target, DeviceInterface and OffloadGPULoops

* Add Vulkan runtime to Makefile

* Add Vulkan target to Python bindings

* Add runtime linker support to target Vulkan CodeGen

* Add Vulkan windows decorator to runtime targets

* Wrap debug messages for internal runtime classes with DEBUG_INTERNAL
Error on failed string termination

* Silence clang-tidy warnings for redundant expressions on Vulkan enum values

* Clang tidy & format pass

* Fix formatting for single line statements

* Move Vulkan option to top-level CMakeLists.txt and enable SPIR-V as needed

* Fix Vulkan & SPIRV dependencies for makefile

* Add Halide version info to Makefile
Add HALIDE_VERSION compiler definitions to compilation

* Add HL_VERSION_FLAGS to RUNTIME_CXX_FLAGS

* Finish refactoring of Vulkan CodeGen to use SpirV-IR.
Added splitmix64 based hashing scheme for types and constants.
Numerous fixes to instruction packing.
Added debug symbols to all variables.

* Clang tidy/format pass.

* Fix formatting

* Remove leftover ifdef

* Fix build error for clang OSX for mismatched type comparison

* Refactor loops and conditionals to use blocks

* Clang tidy/format pass

* Add detailed comments for acquire context parameters

* Add comments describing loader method exports and dynamically resolved function pointers
Other minor cleanups

* Change aborts to debug asserts for context parameters.
Add error handling to acquire context.

* Cache Vulkan descriptor sets and other shader module objects in compilation cache for reuse

* Replace platform specific strncpy for grabbing Extension strings with StringUtils::copy_upto

* Enable device features for selected device

* Fix alignment constraints for to match Vulkan buffer memory requirements.
Add env vars to control Vulkan Memory Allocator config.

* Add Vulkan to list of supported APIs in README.md
Add Vulkan specific README_vulkan.md

* Clang tidy/format pass

* Fix conform_alignment to handle zero values

* Fix declaration of custom_allocation_callbacks to be static.
Change to constexpr for invalid values

* Whitespace change to trigger build.

* Handle Vulkan kernels that don't require storage buffers.
Updated test status. Fixes 7 test cases.

* Add src/mini_vulkan.h Apache 2.0 license requirements to License file

* Add descriptor set binding info as pre-amble to SPIR-V code module
Fix shared memory allocation to use global variables in workgroup storage space
Add extern calls for spirv and glsl builtins
Add memory fence call to gpu thread barrier
Add missing visitors to Vulkan CodeGen
Add scalar index & vector index methods for load/store

* Clang tidy & format pass

* Update test results for Vulkan docs. Passing: 326 Failing: 39

* Fix formatting

* Remove extraneous parentheses for is_array_type()

* Add Vulkan library to linkage fo Halide generator helpers

* Add SPIR-V formatted output (for debugging)

* Only declare SIMT intrinics that are actually used.
Cleanup & refactor add_kernel method.

* Add Vulkan handler to test targets

* Clang format/tidy pass

* Add doc-strings to SPIR-V interface

* Adjust runtime array to widest vector width based on alignment and dense vector loads/stores
Fix scalar and vector load/stores
Fix casts for vectors
Add missing nan, inf, neg_inf, is_finite builtins

* Add missing bitwise and logical and methods.
Cleanups.

* Add comments about necessary packages on Ubuntu v22.04 vs earlier versions

* Clang tidy & format pass.

* Update Vulkan test results. Pass: 329 Fail: 36

* Remove unused Produce/Consume visitor method

* Fix Molten VK initialization to work with v1.3+ loader
Add support for direct casts for same-size types
Add missing mux, mix, lerp, sinh, tanh, etc intrinsics
Add explicit storage access for variables
Add a macro to enable debug messages in Vulkan Memory Allocator

* Disable dynamic shared memory portion of test for Vulkan (since its not supported yet)

* Disable uncached portion of test for Vulkan (since it may OOM)

* Disable float64 support in Type::supports_type() for Vulkan target since it's not widely supported

* Fix Shuffle to handle all known cases
Hookup VulkanMemoryAllocator to gpu allocation cache.
Fix if_then_else to allow calls and statements to be used
Fix loop counter comparison, and don't allow dynamic loops to be unrolled.
Fix scalarize to use CompositeInsert instead of VectorInsertDynamic
Fix FMod to use FRem (cause SPIR-V's FMod doesn't do what you'd expect ... but FRem does?!)
Use exact same sematics for barriers as GLSL Compute ... still not passing everything
Fix SPIR-V block termination checks, keys for null constants, and other cleanups

* Clang tidy & format pass

* Update correctness test results.  PASS: 338, FAIL: 27

* Move counter inside debug #define to fix build

* Relax tolerance for newton's method to match other GPU APIs
Skip gpu dynamic shared testfor Vulkan (since dynamic shared allocations aren't supported yet)
Update correctness test status. PASS: 340, FAIL: 25

* Clang format/tidy pass

* Skip Vulkan for float64 for correctness test round (since f64 is optional)

* Skip Vulkan for tests that rely upon device crop, and slice.

* Only test small vector widths for Vulkan (since widths >=8 are optional)

* Caninicalize gpu vars for Vulkan

* Fix loop initialization, and increments
Add all explicit types, and fix constant declarations
Add missing fast intrinsics
Convert results of logical ops into expected types (instead of bools)

* Add SpvInstruction::add_operands(), add_immediates() and template based
append()
Make integer logical operations explicit.
Better handling of constant data.

* Clang format & tidy pass

* Fix windows build ... refactor convert_to_bool to use std::vectors
rather than dynamic fixed sized arrays

* Skip asyn_device_copy, device_buffer_copy, device_crop, and device_slice
tests for Vulkan (for now).

* Don't test large vector widths for Vulkan (since they are optionally
supported)

* Clear Vulkan buffer allocations prior to use (tbd if this is necessary)

* Skip Vulkan for async copy chain test

* Skip Vulkan for interpreter test

* Clang tidy/format pass

* Fix formatting

* Fix build ... use error messages for errors

* Separate shared memory resources by element type for Vulkan.

* Add Vulkan to conditional for fusing gpu loops

* Reorder reset method to match declaration ordering.

* Cleanup debug log messages for Vulkan resources

* Assert alignment is power of two

* Only split regions that have already been freed.
Add more debug messages to log

* Explicitly cleanup Vulkan command buffers as after they are used
Avoid recreating descriptor sets
Tidy up Vulkan debug messages

* Fix Div, Mod, and div_round_to_zero for integer cases
Cleanup reset method

* Skip Vulkan for async_copy_chain

* Skip 64-bit values on Vulkan since they are optionally supported

* Skip interleave_rgb for Vulkan (which doesn't support cropping)

* Skip interpreter for Vulkan (which doesn't support dynamic allocation of
shared mem).

* Clang Tidy/Format pass

* Handle calls to pow with negative values for Vulkan
Add integer and float constant helpers to SPIRV

* Only test real numbers for pow with Vulkan

* Clang tidy/format pass

* Fix logic so a region request of an entire block matches if exactly the same size as an empty block

* Create a zero size buffer to check for alignment
Return null handles after freeing

* Add more verbose debug output for malloc

* Fix UConvert logic to avoid narrowing an integer type less than 8 bits
Remove optimization path for division which seems to fail worse than DIV
Cleanup DIV and MOD operators

* Clang format/tidy pass

* Fix SConvert & UConvert ops

* Add retain semantics to block allocator interface
Update test to validate retain/release/reclaim functionality

* Implement device_crop, device_slice and release_crop for Vulkan.
Re-enable device_crop, device_slice and interleave_rgb tests.

* Clang format/tidy pass

* Implement device copy for Vulkan.
Enable device copy test.

* Clang format/tidy pass

* Fix signed mod operator and use euclidean identity (just like glsl)

* Clang format/tidy pass

* Fix to handle Mod on vectors (use vector constant for bitwise and)

* Fix pow operator for Vulkan, and re-enable math test to full range.

* Add error checking for return types for conditionals
Use bool types for ops that require them, and adapt to expected return
types

* Handle deallocation for existing regions prior to coalescing.
Cleanup region allocator logic for availability.
Augment block_allocator test to cover allocation reuse.

* Clang tidy/format pass

* Fix reserved accounting for regions

* Add more details to Windows specific Vulkan build config

* Update SPIR-V headers to v1.6

* Add support for dynamic shared memory allocations for Vulkan
Add dynamic workgroup dispatching to Vulkan
Add optional feature flags for Vulkan capabilities
Add Vulkan API version flags for target features
Enable v1.3 path if requested
Re-enable tests for added features
Update Vulkan docs with status updates and feature flags

* Enable Vulkan asyc_device_copy test.

* Disable Vulkan performance test for async gpu (for now).

* Disable Vulkan from python AOT tests and tutorials (since it requires linkage against the vulkan loader system library).

* Update Vulkan readme with latest status.  Everything works!  More or less. =)

* Clang format pass

* Cleanup formatting for Halide version info in Makefile

* Fix typos and address review comments for Vulkan readme

* Change value casts to match Halide conventions

* Fix typos in comments

* Add static_assert to rotl to make compilation errors clearer (instead of using enable_if)
Fix debug(3) formatting to avoid super long messages
Use lookup table for SPIR-V op code names

* Fix typos and logic for Vulkan capabilities

* Remove leftover debug ifdef

* Fix typo in comments

* Rename copy_upto(...) method to be copy_up_to(...)

* Handle error case for uninitialized buffer allocation (rather than abort)
Fix typos in comments

* Support any arbitary number of devices and queues for context creation
Fix typos in comments

* Add get/set alloc_config methods and API hooks for configuring the VulkanMemoryAllocator

* Remove leftover debug ifdef

* Hookup API methods for get/set alloc_config when initializing the VulkanMemoryAllocator

* Remove empty lines in main

* Add required capability flags for 8-bit and 16-bit uniform and storage buffer access
Handle casts for GLSL ops (spec requires all args to be the same type as the return type)

* Add VkPhysicalDevice8BitStorageFeaturesKHR and related constants

* Query for 8-bit and 16-bit uniform and storage access support.
Enable these as part of the device feature query chain.

* Use VK_WHOLE_SIZE for setting buffer (to pass validation ... otherwise size has to be a multiple of alignment)
Remove useless debug asserts for static variables
Fix debug logging messages for allocations of scalars (which may not have a dim array)

* Query for device limits to enforce min alignment constraints for storage and uniform buffers

* Fix shutdown sequence to iterate over descriptor sets
Avoid bug in validation layer by reordering destruction sequence

* Clang format & tidy pass

* Fix logic for locating entry point shader binding ... assume exact match for entry point name
Cleanup entry point binding variables and clarify usage

* Remove accidentally uncommented debug statements

* Cleanup debug output for buffer related updates

* Fix split and allocate methods in region allocator to fix issues with alignment constraints
- discovered a hang if requested size couldn't be fulfilled after adjusting to aligned sizes
- cause was incorrect splitting of existing regions
Cleanup region allocator iteration, cleanup and shutdown
Added maximum_pool_size configuration option to Vulkan Memory Allocator to restrict pool sizes

* Added notes about TARGET_VULKAN=ON being the default now
Added links to LunarG MoltenVK SDK installer, and brew packages

* Fix markdown formatting

* Fix error code handling in Vulkan runtime and internal datastructures.
Refactor all (well nearly all) return values to use halide error codes.
Reduce the usage of abort_if() for recoverable errors.

* Fix typo in error message

* Fix typo in readme

* Skip GPU allocation cache test on MacOSX since MoltenVK only supports 30
buffers to be allocated

* Skip widening reduction test on Vulkan for Mac OSX/IOS since MoltenVK
fails to translate calls with vector types for builtins like min/max. etc

* Skip doubles in vector cast test on Vulkan for Mac OSX/IOS since Molten
doesn't support them

* Skip gpu_dynamic_shared and gpu_specialize test for Vulkan on Mac
OSX/IOS since MoltenVK doesn't support the dynamic shared memory
allocation or dynamic grid size.

* Clang format / tidy pass

* Resolve conflicts for mini_webgpu.h ... revert to main

* Use unique intrinsic var names for each kernel
Cleanup constant value declarations with template helper methods
Add comments on workgroup size usage

* Wrap debug output under ifdef DEBUG_RUNTIME_INTERNAL macro guard
Add nearest_multiple constraint to block/region allocator

* Add vk_clear_device_buffer utility method
Add nearest_multiple constrating to vulkan memory allocatori
+ fixes correctness/multiple_outputs test
Add vkCreateBuffer/vkDestroyBuffer debug output i
+ for gpu_object_lifetime_tracker
Cleanup shutdown for shader_module destruction

* Add note about nearest_multiple constraint for vulkan memory allocator

* Hookup gpu_object_lifetime_tracker with Vulkan debug statements

* Skip dynamic shared memory portion of test for Vulkan on iOS/OSX.

* Fix stale comment for float type support.
Fix incorrect lowering for intrinsic.

---------

Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>
Co-authored-by: Steven Johnson <srj@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants