AppVeyor: try to produce a better build version #1

Merged

tangent-vector

I don't want to have to try and manually keep a version number in `appveyor.yml` up to date with any versioning for releases, etc.
This change tries to derive a reasonable version name from a Git tag, a pull request number, or a branch name (in that order).
I then append the AppVeyor build number to the end to try to ensure that we always have something unique (which is a requirement for AppVeyor).
- Also add link to build "badge" so that I can more easily watch build status.
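
The fallback order described above (tag, then pull request number, then branch, with the build number appended) can be sketched as follows. This is a minimal Python illustration, not the script used in `appveyor.yml`. It assumes AppVeyor's standard environment variables; the function name, the `pr-` prefix, the `-` separator, and the `unknown` fallback are made up for the example.

```python
def derive_build_version(env):
    """Pick tag, then PR number, then branch; always append the build number."""
    tag = env.get("APPVEYOR_REPO_TAG_NAME")
    pr = env.get("APPVEYOR_PULL_REQUEST_NUMBER")
    branch = env.get("APPVEYOR_REPO_BRANCH")
    if tag:
        base = tag
    elif pr:
        base = "pr-" + pr          # hypothetical naming scheme
    elif branch:
        base = branch
    else:
        base = "unknown"           # hypothetical fallback
    # The build number keeps the version unique across builds.
    build = env.get("APPVEYOR_BUILD_NUMBER", "0")
    return base + "-" + build
```

On a build machine this would be called with `os.environ`; passing the mapping in explicitly keeps the precedence rule easy to check.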
tangent-vector merged commit 4d63b6f into shader-slang:master Jun 12, 2017
tangent-vector added a commit to tangent-vector/slang that referenced this pull request May 2, 2018
This change adds support for specifying explicit register spaces, like:

```hlsl
// Bind to texture register #2 in space #1
Texture2D t : register(t2, space1);
```

I added a test case to confirm that the register space is properly propagated through the Slang reflection API.

This change also adds proper error messages for some error/unsupported cases that weren't being diagnosed:

* Specifying a completely bogus register "class" (e.g., `register(bad99)`)
* Failing to specify a register index (`register(u)`)
* Specifying a component mask (`register(t0.x)`)
* Using `packoffset` bindings

I added test cases to cover all of these, as well as the new errors around support for register `space` bindings.
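
As a rough illustration of the diagnostics listed above, here is a small Python sketch that parses the text inside `register(...)`. It is not the Slang parser. The function name, the accepted class letters, and the error strings are assumptions for the example, and `packoffset` is not modelled.

```python
import re

# Assumption: the register classes this sketch accepts.
REGISTER_CLASSES = {"b", "t", "s", "u"}

def parse_register(args):
    """Parse e.g. "t2, space1" and return (class, index, space).

    Raises ValueError for a bogus class, a missing index,
    a component mask, or a malformed space argument.
    """
    parts = [p.strip() for p in args.split(",")]
    name = parts[0]
    if "." in name:
        raise ValueError("component masks are not supported")
    m = re.fullmatch(r"([A-Za-z]+)(\d*)", name)
    if m is None or m.group(1) not in REGISTER_CLASSES:
        raise ValueError("unknown register class")
    if m.group(2) == "":
        raise ValueError("missing register index")
    space = 0  # no explicit space given
    if len(parts) > 1:
        s = re.fullmatch(r"space(\d+)", parts[1])
        if s is None:
            raise ValueError("expected 'spaceN'")
        space = int(s.group(1))
    return m.group(1), int(m.group(2)), space
```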

In order to get the existing tests to pass, I had to remove explicit `packoffset` bindings from some DXSDK test shaders.
None of these `packoffset` bindings were semantically significant (they matched what the compiler would do anyway, for both Slang and the standard HLSL compiler). Removing them is required for Slang now that we give an explicit error about our lack of `packoffset` support.
In a future change we might add logic to either detect semantically insignificant `packoffset`s, or to just go ahead and support them properly (as a general feature on `struct` types).
tangent-vector pushed a commit that referenced this pull request May 2, 2018
arquelion pushed a commit to arquelion/slang that referenced this pull request Jan 5, 2022
ArielG-NV referenced this pull request in ArielG-NV/slang Feb 20, 2024
…entation + changed incorrect HLSL.meta assumptions

(#1) `property` declaration as *non-member* implementation change/fix (all of the changes to `slang-lower-to-ir.cpp`)

Using (#1), implemented subgroup builtins for GLSL/SPIR-V; did not implement the builtins completely for HLSL/CUDA because the implementations are non-trivial. CPP has no implementation due to missing support for system values.

Changed some incorrect HLSL.meta subgroup implementation assumptions about type usage (bit casting 8-bit -> 32-bit, wrong capabilities causing errors).

Fixed a crash when dumping the AST with SPIR-V when using builtins, by adding the `builtin` SPIR-V case (all of the changes to `slang-ast-dump.cpp`).

Added `[ForceInline]` to functions that were missing it.

Return instead of using `spirv_asm` when empty blocks are used.
csyonghe added a commit that referenced this pull request Feb 27, 2024
…3548 (#3580)

* Partially implement, with tests, functions and built-in variables that are part of GL_KHR_shader_subgroup; partially resolves #3548

GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt

Implementation is broken down into separate GLSL extensions due to the ***large differences*** in implementation, functionality, and testing of each section.

GL_KHR_shader_subgroup_basic{
**Partially implemented**

Implementation:
    * All 9 built-in variables have been stubbed without proper values; these system variables still need to be implemented; related to #411.

    * Functions were reimplemented, despite nearly identical HLSL functions already existing, because:
        * hlsl.meta implementations target workgroups rather than a warp/wave/subgroup:
            * `__syncwarp` vs `__syncthreads`
            * `SubgroupMemory` vs `WorkgroupMemory`
            * etc.
        * hlsl.meta implementations block on broader SPIR-V memory targets:
            * ImageMemory|UniformMemory combined, whereas SPIR-V allows a barrier on ImageMemory and, separately, a barrier on UniformMemory
        * `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`, because the spec states that `subgroupElect()` returns true only for the active invocation with the lowest `gl_SubgroupInvocationID`; we therefore have to fetch the current active mask, since some invocations may be turned off by branches
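
The election rule can be modelled on the CPU. The Python sketch below is only an illustration of "lowest active invocation wins", not the CUDA code from this commit; `active_mask` stands in for whatever active-lane mask the target provides, and the function name is hypothetical.

```python
def subgroup_elect(active_mask, lane):
    """True only for the lowest-numbered lane whose bit is set in active_mask."""
    if active_mask == 0:
        return False
    # active_mask & -active_mask isolates the lowest set bit.
    lowest_active = (active_mask & -active_mask).bit_length() - 1
    return lane == lowest_active
```

With lanes 0 and 1 switched off by a branch (`active_mask = 0b1100`), lane 2 is elected rather than lane 0.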

Testing:
tests for the variables -- `tests/glsl/shader-subgroup-built-in-variables.slang`
    * these tests do not test functionality, since the variables are not implemented yet

tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
    * concurrency is tested for SubgroupMemory and UniformMemory by attempting to create a GPU-side race condition between memory writes and reads
        * due to the testing tools available, there are no tests for ImageMemory
    * `subgroupElect` is tested to return invocation #0, the lowest invocation; the wave size is 32, so #0 is always active and is always the elected invocation.

}

GL_KHR_shader_subgroup_vote{
**Fully implemented**

Implementation:
    * 3/3 functions are using the hlsl.meta implementation

Testing:
`tests/glsl/shader-subgroup-vote.slang`
    * Each function is tested with a positive (returns true) and a negative (returns false) case to ensure vote results are correct
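
For reference, the semantics that the positive and negative cases exercise can be written as a CPU model. This Python sketch is an assumption-labelled illustration, not the hlsl.meta implementation: `values` holds one entry per invocation, `active` marks which invocations participate, and the function names are hypothetical.

```python
def subgroup_all(values, active):
    """True if the value is true in every active invocation."""
    return all(v for v, a in zip(values, active) if a)

def subgroup_any(values, active):
    """True if the value is true in at least one active invocation."""
    return any(v for v, a in zip(values, active) if a)

def subgroup_all_equal(values, active):
    """True if every active invocation holds the same value."""
    seen = [v for v, a in zip(values, active) if a]
    return all(v == seen[0] for v in seen)
```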

}

GL_KHR_shader_subgroup_ballot{
**Partially implemented**

Implementation:
    All 10 functions are implemented:
    * 3 use the hlsl.meta implementation
    * 7 use new implementations -- these only support GLSL, SPIR-V, HLSL, CUDA
        * These implementations did not exist in hlsl.meta, so they were added
        * `subgroupInverseBallot` has no analogous function to call, so it was emulated:
            * in CUDA, warps are 32 lanes wide and lanes are 0-indexed, which implies that `(ballotResult >> YOUR_INVOCATION) & 1` checks whether your invocation's bit is set. For example, in `(0b11001 >> 3) & 1` the ballot `0b11001` means that only the fifth, fourth, and first invocations are set, and `3` means `YOUR_INVOCATION` is the fourth invocation in the subgroup. The expression evaluates to `0b11 & 0b1` and returns true, since your bit is set
            * in HLSL, test whether the wave lane count is 32 or less (in which case the same logic as CUDA is used); otherwise find the component of the ballot vector that `YOUR_INVOCATION` corresponds to, where each component holds 32 bits (32 lanes), avoiding division in the process, then run the same algorithm CUDA employs
            * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
        * 5 functions do not have a CUDA, HLSL, or CPP implementation yet (`subgroupBallotFindMSB`, `subgroupBallotFindLSB`, `subgroupBallotExclusiveBitCount`, `subgroupBallotInclusiveBitCount`, `subgroupBallotBitCount`) because they were out of scope for the commit
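
The two emulation paths described above can be sketched in Python. This is a CPU model under stated assumptions, not the code added to hlsl.meta: the function names are hypothetical, the 32-lane case takes a single integer mask, and the wider case takes a list of four 32-bit components.

```python
def inverse_ballot_32(ballot, lane):
    """Single 32-bit mask (the CUDA case): test the bit belonging to `lane`."""
    return ((ballot >> lane) & 1) == 1

def inverse_ballot_vec4(ballot, lane):
    """Four 32-bit components, for subgroups wider than 32 lanes."""
    component = lane >> 5   # same as lane // 32, without a division
    bit = lane & 31         # same as lane % 32, without a division
    return ((ballot[component] >> bit) & 1) == 1
```

For the example in the text, `inverse_ballot_32(0b11001, 3)` is true because `0b11001 >> 3` is `0b11` and its lowest bit is set.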

Testing:
`tests/glsl/shader-subgroup-ballot.slang`
    * the tests check each ballot function against an expected value; they also pass in values with more than 32 bits set, to ensure the implementation only considers bits up to the subgroup invocation count, as per the extension specification (otherwise the functionality is fairly trivial to test)
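
The property being tested, that bits at or above the subgroup size are ignored, can be illustrated with a CPU model of the five counting and searching functions. This Python sketch is not the Slang implementation. It assumes the ballot is flattened into one integer rather than a `uvec4`, uses hypothetical function names, and returns -1 when no bit is set, which is this sketch's own choice.

```python
def _visible(ballot, subgroup_size):
    """Keep only the bottom `subgroup_size` bits of the ballot."""
    return ballot & ((1 << subgroup_size) - 1)

def ballot_bit_count(ballot, subgroup_size):
    return bin(_visible(ballot, subgroup_size)).count("1")

def ballot_inclusive_bit_count(ballot, subgroup_size, lane):
    # Counts bits for lanes 0..lane.
    return ballot_bit_count(ballot & ((1 << (lane + 1)) - 1), subgroup_size)

def ballot_exclusive_bit_count(ballot, subgroup_size, lane):
    # Counts bits for lanes 0..lane-1.
    return ballot_bit_count(ballot & ((1 << lane) - 1), subgroup_size)

def ballot_find_lsb(ballot, subgroup_size):
    v = _visible(ballot, subgroup_size)
    return (v & -v).bit_length() - 1   # -1 when no bit is set (sketch's choice)

def ballot_find_msb(ballot, subgroup_size):
    return _visible(ballot, subgroup_size).bit_length() - 1
```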

}

GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**

Implementation:
    * There are 21 functions to implement:
        * 14 functions are using the hlsl.meta implementation
        * 7 functions are new implementations -- only implemented for GLSL and SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required
            * CUDA, CPP, HLSL are out of scope for the commit
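
The 21 functions in the extension are seven operations (add, mul, min, max, and, or, xor), each in a reduce, inclusive-scan, and exclusive-scan form. The Python sketch below models the three forms on the CPU for any binary operation. It is an illustration only: the function names are hypothetical, and inactive invocations are represented as `None` in the output.

```python
def subgroup_reduce(values, active, op):
    """Combine the values of all active invocations."""
    result = None
    for v, a in zip(values, active):
        if a:
            result = v if result is None else op(result, v)
    return result

def subgroup_inclusive_scan(values, active, op):
    """Each active invocation gets the combination up to and including itself."""
    out, acc = [], None
    for v, a in zip(values, active):
        if a:
            acc = v if acc is None else op(acc, v)
            out.append(acc)
        else:
            out.append(None)
    return out

def subgroup_exclusive_scan(values, active, op, identity):
    """Each active invocation gets the combination of the active ones before it."""
    out, acc = [], identity
    for v, a in zip(values, active):
        if a:
            out.append(acc)
            acc = op(acc, v)
        else:
            out.append(None)
    return out
```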

Testing:
`tests/glsl/shader-subgroup-arithmetic.slang`
    * all tests silently kill the shader; the generated GLSL was checked and no issue could be found
    * these tests only check basic functionality and correctness of the implemented functions; they are not exhaustive [continued in "Other notes of worthy" at end of commit]

}

GL_KHR_shader_subgroup_shuffle{
**Partially implemented**

Implementation:
    * There are 2 functions to implement:
        * 1 function is using the existing hlsl.meta implementation
        * 1 function is using a new implementation (`subgroupShuffleXor`) -- only implemented for GLSL & SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required
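
As a reminder of what the two functions compute, here is a CPU model in Python. It is a sketch, not the Slang code: the function names are hypothetical, every invocation is assumed active, and a read from a lane outside the subgroup (undefined in the extension) is shown as `None`.

```python
def subgroup_shuffle(values, source_ids):
    """Invocation i reads the value held by invocation source_ids[i]."""
    return [values[s] for s in source_ids]

def subgroup_shuffle_xor(values, mask):
    """Invocation i reads the value held by invocation i ^ mask."""
    n = len(values)
    return [values[i ^ mask] if (i ^ mask) < n else None for i in range(n)]
```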

Testing:
`tests/glsl/shader-subgroup-shuffle.slang`
    * these tests only check basic functionality and correctness of the implemented functions; they are not exhaustive [continued in "Other notes of worthy" at end of commit]
    * tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called

}

GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**

Implementation:
    * There are 2 functions to implement:
        * both functions are using a new implementation -- only implemented for GLSL & SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required
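
The two relative shuffles (`subgroupShuffleUp` and `subgroupShuffleDown` in the extension) can be modelled the same way. This Python sketch assumes every invocation is active, uses hypothetical function names, and marks reads that fall outside the subgroup, which the extension leaves undefined, as `None`.

```python
def subgroup_shuffle_up(values, delta):
    """Invocation i reads from invocation i - delta."""
    return [values[i - delta] if i - delta >= 0 else None
            for i in range(len(values))]

def subgroup_shuffle_down(values, delta):
    """Invocation i reads from invocation i + delta."""
    n = len(values)
    return [values[i + delta] if i + delta < n else None for i in range(n)]
```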

Testing:
`tests/glsl/shader-subgroup-shuffle-relative.slang`
    * these tests only check basic functionality and correctness of the implemented functions; they are not exhaustive [continued in "Other notes of worthy" at end of commit]

}

GL_KHR_shader_subgroup_clustered{
**Partially implemented**

Implementation:
    * There are 7 functions to implement:
        * all 7 functions are using a new implementation -- only implemented for GLSL & SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required
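
The clustered functions apply one of the seven arithmetic operations within fixed-size groups of consecutive invocations. A minimal CPU model in Python follows. It is an illustration rather than the Slang code, the function name is hypothetical, and it assumes every invocation is active and that `cluster_size` is a power of two.

```python
def subgroup_clustered_reduce(values, cluster_size, op):
    """Every invocation receives the reduction of its own cluster."""
    out = []
    for start in range(0, len(values), cluster_size):
        cluster = values[start:start + cluster_size]
        acc = cluster[0]
        for v in cluster[1:]:
            acc = op(acc, v)
        # Broadcast the cluster's result to each of its members.
        out.extend([acc] * len(cluster))
    return out
```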

Testing:
`tests/glsl/shader-subgroup-shuffle-clustered.slang`
    * these tests only check basic functionality and correctness of the implemented functions; they are not exhaustive [continued in "Other notes of worthy" at end of commit]

}

GL_KHR_shader_subgroup_quad{
**Partially implemented**

Implementation:
    * There are 4 functions to implement:
        * all 4 functions are using hlsl.meta implementations -- only implemented for GLSL, SPIR-V, and HLSL
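
The four quad functions in the extension are a broadcast and three swaps. The Python sketch below models them on the CPU, treating each run of four consecutive invocations as one quad. It is a sketch under assumptions: the function names are hypothetical, all invocations are active, and the subgroup size is a multiple of four.

```python
def quad_broadcast(values, quad_lane):
    """Each invocation reads the value of lane `quad_lane` (0..3) of its quad."""
    return [values[(i & ~3) | quad_lane] for i in range(len(values))]

def quad_swap_horizontal(values):
    """Swap within each horizontal pair: 0<->1, 2<->3."""
    return [values[i ^ 1] for i in range(len(values))]

def quad_swap_vertical(values):
    """Swap within each vertical pair: 0<->2, 1<->3."""
    return [values[i ^ 2] for i in range(len(values))]

def quad_swap_diagonal(values):
    """Swap across the diagonal: 0<->3, 1<->2."""
    return [values[i ^ 3] for i in range(len(values))]
```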

Testing:
`tests/glsl/shader-subgroup-shuffle-quad.slang`
    * these tests only check basic functionality and correctness of the implemented functions; they are not exhaustive [continued in "Other notes of worthy" at end of commit]

}

---------
Failing tests and why:

Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
    tests/glsl/shader-subgroup-arithmetic.slang.3
    tests/glsl/shader-subgroup-arithmetic.slang.4
    tests/glsl/shader-subgroup-ballot.slang.4
    tests/glsl/shader-subgroup-basic.slang.3
    tests/glsl/shader-subgroup-basic.slang.4
    tests/glsl/shader-subgroup-quad.slang.3
    tests/glsl/shader-subgroup-quad.slang.4
    tests/glsl/shader-subgroup-vote.slang.3
    tests/glsl/shader-subgroup-vote.slang.4
}

Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{
    tests/glsl/shader-subgroup-shuffle.slang.4
}

Note: due to an unknown silent error the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
    tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
    tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}

Other notes of worthy:{

    * only a few types are currently checked in the tests, because the equality templates do not allow free casting to int/uint; testing types en masse is therefore not trivial, and the tests will most likely be completely replaced once templates can cast and check equality more freely.

    * did not implement vector types for any functions that may use them (mostly in reference to SPIR-V, since many may accept scalar or vector inputs); applicable to subgroup-shuffle, subgroup-shuffle-relative, subgroup-arithmetic, subgroup_clustered, subgroup_quad

    * did not implement checks for half floats

    * CUDA, CPP, HLSL implementations were largely out of scope; where one is missing, it is because the implementation is not trivial

}

Random fixes encountered:{
    * hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}

* added vector types and tests;

GL_KHR_shader_subgroup_* & GLSL ref:
    * https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
    * https://www.khronos.org/blog/vulkan-subgroup-tutorial
    * https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf

HLSL ref:
    * https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions
    * https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics

CUDA ref:
    * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

SPIR-V ref:
    * https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id

---------
Failing tests and why:

Note: test numbers assume that none of the existing tests are disabled

Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
    tests/glsl/shader-subgroup-arithmetic.slang.3
    tests/glsl/shader-subgroup-arithmetic.slang.4
    tests/glsl/shader-subgroup-ballot.slang.4
    tests/glsl/shader-subgroup-basic.slang.3
    tests/glsl/shader-subgroup-basic.slang.4
    tests/glsl/shader-subgroup-quad.slang.3
    tests/glsl/shader-subgroup-quad.slang.4
    tests/glsl/shader-subgroup-vote.slang.3
    tests/glsl/shader-subgroup-vote.slang.4
}

Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{
    tests/glsl/shader-subgroup-shuffle.slang.4
    tests/glsl/shader-subgroup-shuffle-relative.slang.4
    tests/glsl/shader-subgroup-basic.slang.4
}

Note: due to an unknown silent error the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
    tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
    tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}

Other notes of worthy:{

    * only a few types are currently checked in the arithmetic test, because the test silently fails, meaning I can't actually test anything implemented

    * did not implement checks for half floats

    * CUDA, CPP, HLSL implementations were largely out of scope and not implemented, because the implementation is non-trivial for many functions

}

Random fixes encountered:{
    * hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}

* Partially implement, with tests, functions and built-in variables that are part of GL_KHR_shader_subgroup; partially resolves #3548

Partially Implement with tests, functions and built-in variables apart of GL_KHR_shader_subgroup; Partially resolves #3548

GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt

GL_KHR_shader_subgroup_* & GLSL ref:
    * https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
    * https://www.khronos.org/blog/vulkan-subgroup-tutorial
    * https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf

HLSL ref:
    * https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions
    * https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics

CUDA ref:
    * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

SPIR-V ref:
    * https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id

Implementation is broken down into seperate glsl extensions due to the ***large differences*** in implementation of each section, and functionality/testing.

GL_KHR_shader_subgroup_basic{
**Partially implemented**

Implementation:
    * All 9 built-in variables have been stubbed without proper value; implementation is still required for these system variables; related to #411.

    * Functions were reimplemented despite nearly mirrored HLSL functions due to:
        * hlsl.meta implementations targetting workgroups rather than a warp/wave/subgroup:
            * `__syncwarp` vs `__syncthreads`
            * `SubgroupMemory` vs `WorkgroupMemory`
            * etc.
        * hlsl.meta implementations target broader SPIR-V memory targets to block on:
            * ImageMemory|UniformMemory versus SPIR-V specifying barriers for ImageMemory and seperately an option for UniformMemory
        * `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`, this is because spec states that `subgroupElect()` only returns the lowest active gl_SubgroupInvocationID; therefore we are supposed to fetch the current active mask even if some invocations are turned off by branches

Testing:
tests for the variable -- `tests/glsl/shader-subgroup-built-in-variables.slang`
    * these tests do not test functionality since not implemented yet

tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
    * concurrency is tested for using SubgroupMemory, UniformMemory through attempting to create a GPU side race condition with writing and reading memory
        * due to testing tools avaible there are no tests for ImageMemory
    * subgroupElect is tested to return invocation #0, the lowest invocation that will always run; wave size is 32, therefore #0 is always active and will always be the elected invocation.

}

GL_KHR_shader_subgroup_vote{
**Fully implemented**

Implementation:
    * 3/3 functions are using the hlsl.meta implementation

Testing:

`tests/glsl/shader-subgroup-vote.slang`
    * Testing each a positive (returns true) and negative (returns false) test case to ensure vote results are correct

}

GL_KHR_shader_subgroup_ballot{
**Partially implemented**

Implementation:
    There are 10/10 functions that are implemented:
    * 3 are using hlsl.meta implementation
    * 7 are using new implementations -- only support GLSL, SPIR-V, HLSL, CUDA
        * These implementations do not exist in hlsl.meta, so they were added
        * `subgroupInverseBallot` lacks an analog function to call; this feature was emulated:
            * in CUDA through knowing waves are 32bit and lanes are 0 indexed, this implys that `   (ballotResult >> YOUR_INVOCATION) & 1` checks if your invocation is active, for example, `(0b11001 >> 3) & 1` would mean that only invocation 5, 4, and 1 is active, 3 would mean `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001>>3) & 1` would return true since your bit is toggled and evaluates to `0b11 & 0b1`
            * in HLSL through testing if the wave count is 32 or less (use the same logic as CUDA in this case); else find the index `YOUR_INVOCATION` corrisponds with where each vector has 32bits (32 waves); avoid division in the process. then run the same algorithm cuda employs.
            * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
        * 5 implementations do not have a CUDA, HLSL, and CPP imlementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) due to being out of scope for the commit

Testing:
`tests/glsl/shader-subgroup-ballot.slang`
    * the function tests for an expected value of each ballot function; tests try inputting larger than 32 toggled bits as function parameters to ensure the implementation correctly identifies values up to a maximum of the subgroup invocation count as per extension specification (otherwise the functionality is fairly trivial to test)

}

GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**

Implementation:
    * There are 21 functions to implement:
        * 14 functions are using the hlsl.meta implementation
        * 7 functions are new implementations -- only implemented for GLSL and SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required
            * CUDA, CPP, HLSL are out of scope for the commit

Testing:
`tests/glsl/shader-subgroup-arithmetic.slang`
    * all tests silently kill the shader; outputted GLSL was checked, could not see an issue
    * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]

}

GL_KHR_shader_subgroup_shuffle{
**Partially implemented**

Implementation:
    * There are 2 functions to implement:
        * 1 function is using the existing hlsl.meta implmentation
        * 1 function is using a new implmentation (subgroupShuffleXor) -- only implmented for GLSL & SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required

Testing:
`tests/glsl/shader-subgroup-shuffle.slang`
    * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
    * tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called

}

GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**

Implementation:
    * There are 2 functions to implement:
        * all 2 functions are using a new implmentation -- only implmented for GLSL & SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required

Testing:
`tests/glsl/shader-subgroup-shuffle-relative.slang`
    * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]

}
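
The two relative shuffles are `subgroupShuffleUp` and `subgroupShuffleDown`: a lane reads from the lane `delta` positions below or above its own index. The Python sketch below models this under the assumption that all lanes are active; `None` marks results that are undefined in GLSL because the source lane falls outside the subgroup. The helper names are hypothetical.

```python
def subgroup_shuffle_up(values, delta):
    """Lane i returns the value held by lane (i - delta)."""
    return [values[lane - delta] if lane - delta >= 0 else None
            for lane in range(len(values))]


def subgroup_shuffle_down(values, delta):
    """Lane i returns the value held by lane (i + delta)."""
    count = len(values)
    return [values[lane + delta] if lane + delta < count else None
            for lane in range(count)]
```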

GL_KHR_shader_subgroup_clustered{
**Partially implemented**

Implementation:
    * There are 7 functions to implement:
        * all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
            * GLSL & SPIR-V both use their related functions, no emulation required

Testing:
`tests/glsl/shader-subgroup-shuffle-clustered.slang`
    * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]

}
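
The seven clustered functions apply the same seven operations as the arithmetic extension, but reduce over consecutive clusters of lanes instead of the whole subgroup; the cluster size must be a power of two. A minimal Python model, assuming all lanes are active and using a hypothetical helper name:

```python
from functools import reduce
import operator


def subgroup_clustered(values, op, cluster_size):
    """Each lane receives op reduced over its own cluster of consecutive lanes."""
    if cluster_size < 1 or cluster_size & (cluster_size - 1):
        raise ValueError("cluster_size must be a power of two")
    result = []
    for start in range(0, len(values), cluster_size):
        cluster = values[start:start + cluster_size]
        result.extend([reduce(op, cluster)] * len(cluster))
    return result
```

For eight lanes holding 1 through 8, a clustered add with a cluster size of 4 yields 10 in the first four lanes and 26 in the last four.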

GL_KHR_shader_subgroup_quad{
**Partially implemented**

Implementation:
    * There are 4 functions to implement:
        * all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL

Testing:
`tests/glsl/shader-subgroup-shuffle-quad.slang`
    * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]

}
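
The four quad functions are `subgroupQuadBroadcast` and the horizontal, vertical, and diagonal swaps. Within each group of four consecutive lanes, the swaps correspond to XOR-ing the lane index with 1, 2, and 3 respectively. The Python sketch below models this; it assumes the lane count is a multiple of four and that all lanes are active, and the helper names are illustrative.

```python
def quad_broadcast(values, index):
    """Every lane returns the value held by lane `index` (0..3) of its own quad."""
    return [values[(lane & ~3) | index] for lane in range(len(values))]


def quad_swap(values, xor_mask):
    """xor_mask 1 = horizontal, 2 = vertical, 3 = diagonal swap within a quad."""
    return [values[lane ^ xor_mask] for lane in range(len(values))]
```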

---------
Failing tests and why:

Note: test numbers are assuming none of the existing tests are toggled off

Note: because system variables are largely unimplemented for CUDA and CPP, these tests will fail (#3 and #4){
    tests/glsl/shader-subgroup-arithmetic.slang.3
    tests/glsl/shader-subgroup-arithmetic.slang.4
    tests/glsl/shader-subgroup-ballot.slang.4
    tests/glsl/shader-subgroup-basic.slang.3
    tests/glsl/shader-subgroup-basic.slang.4
    tests/glsl/shader-subgroup-quad.slang.3
    tests/glsl/shader-subgroup-quad.slang.4
    tests/glsl/shader-subgroup-vote.slang.3
    tests/glsl/shader-subgroup-vote.slang.4
}

Note: because kIROp_WaveGetActiveMask is not loaded for cpp, the following tests will fail{
    tests/glsl/shader-subgroup-shuffle.slang.4
    tests/glsl/shader-subgroup-shuffle-relative.slang.4
    tests/glsl/shader-subgroup-basic.slang.4
}

Other notes of worthy:{

    * added preamble function and macros for implementing subgroup functionality (and tests) to make it possible to iterate on the functionality with reasonable effort in the future

    * CUDA, CPP, HLSL implementations were largely out of scope and not implemented, because the implementation is non-trivial for many functions

    * doubles cause a silent crash on most subgroup functions tested (silent shader hang)

    * __requireGLSLExtension does not work as intended inside glsl.meta; as a result half, int16, int64, and int8 are all omitted from testing

}

Random fixes encountered:{
    * hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
    * hlsl.meta incorrectly used OpGroupNonUniformBitwiseAnd instead of OpGroupNonUniformBitwiseOr for WaveMaskPrefixBitOr (SPIR-V); this was fixed
}

* redesign tests following suggestions that they should be smaller, more maintainable, and test as much data as reasonably possible (balanced against fast iteration);

optional double testing

varying parameter testing

most tests chain results now

* fix missing impl and merge conflict resolutions

* redundant test code cleanup and organization

move tests to proper location (glsl-intrinsic)

clean up redundant code (input buffers)

* add missing logical operands support (and remove hlsl/cuda code reuse due to the functional differences) under all And, Or, Xor ops

redesign tests to conform to a better testing paradigm

* testing code style change to not use white space as a toggle for tests

* provided crash reason for doubles (Intel Iris GPUs crash in GLSL with doubles due to missing support in device caps [as per Vulkan validation layer])

uncommented the `__requireGLSLExtension` code so that once it is fixed, int16/8/64/half will work with subgroup without requiring future intervention

* fixing some vk validation layer errors (OpMemoryBarrier, Shuffle operations)

modified style of tests; removed redundancy (extra code that does nothing); fixed some incorrect run targets; added error reasons for all encountered problems (and if needed, a #define/#if toggle)

* replace commented-out important tests with a #define toggle for the broken extended shader_subgroup types feature

* removed macros inside glsl.meta

removed erroneous __target_switch to directly call hlsl.meta function

added elaboration on the problem with __requireGLSLExtension

changed WaveMaskPrefixBit[or|and|xor] to support the expected type of <int> only as per `HLSL Shader Model 6.5` specs

removed "precision highp" since it does not affect tests

* changed some of the hlsl.meta functions used to more appropriate ones (as suggested)
WaveMask -> WaveActive.*
WaveMaskPrefix.* -> WavePrefix.*

remove __target_switch cases for unimplemented intrinsics

fix _getLaneId() being removed from some regex used earlier

* fix usage of __target_intrinsic instead of __intrinsic_asm; it would silently cause only the arguments to be emitted as the return value

changed usage of `__requireGLSLExtension` because now it causes a crash from the missing intrinsic (instead of a silent error)

* fix shader subgroup extended types support for GLSL and SPIR-V:
1. separate the intrinsic/__requireGLSL generating functionality of shader_subgroup_preamble into child function calls, because otherwise `__requireGLSLExtension` is ignored if the function calling shader_subgroup_preamble calls an `__intrinsic_asm`
2. fixed HLSL.meta logic for wave operations (Add, Mul, exclusiveAdd, exclusiveMul) to no longer cast the input type T into a uint, due to the cost of the op and a crash.
    * An int8_t bit-cast into uint32_t crashed the compiler. As per the SPIR-V spec, OpGroupNonUniformI.* work on uint and int types, meaning the function has no need to cast to a uint.
3. removed erroneous __target_switch for subgroupShuffle

* 1. ignore tests gracefully
2. remove un-needed SPIRV capability specifying (with OpCapability)
3. clean up structure of typeRequireChecks_shader_subgroup_GLSL
4. explain why HLSL/CUDA are not targeted for shader-subgroup-arithmetic.slang

* syntax changes + `property` declaration fix + builtin var glsl implementation + changed incorrect HLSL.meta assumptions

(#1)`property` declaration as *non member* implementation change/fix (all of the changes to `slang-lower-to-ir.cpp`)

using (#1), implemented subgroup builtins for GLSL/SPIR-V; did not implement builtins completely for HLSL/CUDA due to non-trivial implementations. CPP has no implementation due to missing support for system values

changed some incorrect HLSL.meta subgroup implementation assumptions of type usage (bit casting 8bit->32bit, wrong capabilities causing errors)

fixed a crash when dumping the AST with SPIR-V when using builtins by adding the `builtin` spirv case (all of the changes to `slang-ast-dump.cpp`)

[ForceInline] addition to functions missing it

return instead of spirv_asm when empty blocks are used

* syntax & organization of tests adjustment (specifically how ifdefs are managed)

* figuring out where ci fails

* figuring out where ci fails -- testing with enclusive & regular

* testing CI with exclusive, regular, inclusive

* remove unneeded white space

test CI inconsistency issues further with arithmetic.slang

* testing if the ci run fails due to some timeout/recovery issue

* split up arithmetic tests and push to test with CI

---------

Co-authored-by: Yong He <yonghe@outlook.com>