-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sycl::reduction for float not working on AMD gpu #6054
Comments
I could reproduce the error, which may be related to the support of floating-point atomic operations. I asked a question here |
Thanks for your response :-) Now I would like to provide an additional information: I also have a HIP version for the same code which uses thrust::transform_reduce (get sum of absolutes) and it works without any issues
I was thinking maybe sycl::reduction when running on AMD GPUs uses something similar internally so I am not sure if the issue is on the HIP side or on the SYCL/DPC++ side. Can you please have a look? |
@mgrabban @zjin-lcf I was able to successfully test the above with the latest commit Environment: I've simplified the above test a bit, (a) headers, (b) since now sycl::range, sycl::item is supported now #include <sycl/sycl.hpp>
int main()
{
const int N = 8;
sycl::queue q {sycl::gpu_selector{}};
{
auto signal = sycl::malloc_shared<int>( N, q);
auto sum = sycl::malloc_shared<int>( 1, q);
for (unsigned int i = 0; i < N; ++i) {
signal[i] = i;
}
sum[0] = 0;
q.submit([&](sycl::handler& cgh) {
auto sumReduction = sycl::reduction(sum, std::plus<int>(), sycl::property::reduction::initialize_to_identity{});
cgh.parallel_for(sycl::range<1>(N),
sumReduction,
[=](sycl::id<1> idx, auto& sumresult) {
sumresult.combine( signal[idx] );
});
});
q.wait();
std::cout << "sum of int array: " << sum[0] << std::endl;
}
//----------------------------------------------------------------------------
{
auto signal = sycl::malloc_shared<float>(N, q);
auto sum = sycl::malloc_shared<float>(1, q);
for (unsigned int i = 0; i < N; ++i) {
signal[i] = (float)i;
}
sum[0] = 0.0;
q.submit([&](sycl::handler& cgh) {
auto sumReduction = sycl::reduction(sum, std::plus<float>(), sycl::property::reduction::initialize_to_identity{});
cgh.parallel_for(sycl::range<1>(N),
sumReduction,
[=](sycl::id<1> idx, auto& sumresult) {
sumresult.combine( signal[idx] );
});
});
q.wait();
std::cout << "sum of float array: " << sum[0] << std::endl;
}
return 0;
} To Compile: abagusetty@amdgpu02:~/soft/sycl2020_reductions$ $DPCPP_HOME/llvm_hip/build_rocmopt/bin/clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx908 sycl_test.cpp -o sycl_test_hip.exe
warning: linking module '/gpfs/jlse-fs0/users/abagusetty/compilers/llvm_hip/build_rocmopt/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc': Linking two modules of different target triples: '/gpfs/jlse-fs0/users/abagusetty/compilers/llvm_hip/build_rocmopt/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc' is 'amdgcn-unknown-amdhsa' whereas 'sycl_test.cpp' is 'amdgcn-amd-amdhsa'
[-Wlinker-warnings]
1 warning generated. To Run: abagusetty@amdgpu02:~/soft/sycl2020_reductions$ LD_LIBRARY_PATH=$DPCPP_HOME/llvm_hip/build_rocmopt/lib:$LD_LIBRARY_PATH SYCL_DEVICE_FILTER=hip:gpu ./sycl_test_hip.exe
sum of int array: 28
sum of float array: 28 |
Closing this as it looks like it was fixed a long time ago, please feel free to re-open if the issue persists. |
Describe the bug
sycl::reduction for float data type is not working on AMD (MI100) GPU
Summing int array works but summing float array fails.
To Reproduce
I used this code to test it. Block above dashed line does int array summing while that below dashed line does float array summing.
Sum of int array should be 28 while sum of float array should be 28.0
Compile command and output:
test_reduction/src $ clang++ test4.cpp -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx908
warning: linking module '/usr/DPA/tools/syclos_amd/20220406/sycl_workspace/llvm/build/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc': Linking two modules of different target triples: '/usr/DPA/tools/syclos_amd/20220406/sycl_workspace/llvm/build/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc' is 'amdgcn-unknown-amdhsa' whereas 'test4.cpp' is 'amdgcn-amd-amdhsa'
[-Wlinker-warnings]
1 warning generated.
Run command and output:
test_reduction/src $ SYCL_DEVICE_FILTER=hip:gpu ./a.out
sum of int array: 28
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted (core dumped)
Environment (please complete the following information):
test_reduction/src $ clang++ --version
clang version 15.0.0 (https://github.com/intel/llvm 433a073)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/DPA/tools/syclos_amd/20220406/sycl_workspace/llvm/build/bin
The text was updated successfully, but these errors were encountered: