
how to set gridSize and groupSize? #8404

Open
wangzy0327 opened this issue Feb 20, 2023 · 5 comments
Labels
bug (Something isn't working), confirmed, good first issue (Good for newcomers)

Comments


wangzy0327 commented Feb 20, 2023

Describe the bug
When I set gridSize and groupSize as shown below, the following error occurs.

exception code : sycl:4 exception message : Non-uniform work-groups are not supported by the target device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)


How can I solve this problem? How should gridSize and groupSize be set?

To Reproduce

CMakeLists.txt
cmake_minimum_required(VERSION 2.8.12)

set(DPCPP_HOME "/home/wzy/sycl_workspace")
set(DPCPP_SYCL_HOME "${DPCPP_HOME}/build-cuda-2022-12")

set(CMAKE_C_COMPILER "${DPCPP_SYCL_HOME}/bin/clang")
set(CMAKE_CXX_COMPILER "${DPCPP_SYCL_HOME}/bin/clang++")
set(CMAKE_CXX_STANDARD 17)

project(test_sycl_kernel)

include_directories("${DPCPP_SYCL_HOME}/include/sycl")
include_directories("${DPCPP_SYCL_HOME}/include")

message(STATUS "dpcpp_home : ${DPCPP_HOME}")
message(STATUS "dpcpp_sycl_home : ${DPCPP_SYCL_HOME}")


message(STATUS "find library path : ${DPCPP_SYCL_HOME}/lib")
set(CMAKE_BUILD_RPATH "${DPCPP_SYCL_HOME}/lib;${CMAKE_BUILD_RPATH}")
message(STATUS "cmake build rpath : ${CMAKE_BUILD_RPATH}")


set(CMAKE_BUILD_TYPE "Debug")
set(CMAKE_CXX_FLAGS "-fsycl -fsycl-targets=nvptx64-nvidia-cuda")
set(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -ggdb -std=c++17 -Wno-sycl-target -Wno-linker-warnings")
set(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -std=c++17")


link_directories("${DPCPP_SYCL_HOME}/lib")

aux_source_directory(. DIR_SRCS)
add_executable(test_sycl_kernel ${DIR_SRCS})
target_include_directories(test_sycl_kernel PRIVATE "${DPCPP_SYCL_HOME}/include/sycl")
target_include_directories(test_sycl_kernel PRIVATE "${DPCPP_SYCL_HOME}/include")
target_link_libraries(test_sycl_kernel PRIVATE sycl )

add_2.cc
#include <iostream>
#include <random>

#include <CL/sycl.hpp>
using namespace sycl;

void add_2(queue &myQueue, sycl::range<3> dimGrid, sycl::range<3> dimBlock, void** void_args) {
   float* device_input1 = (float*)(void_args[0]);
   float* device_input2 = (float*)(void_args[1]);
    try{
      myQueue.submit([&](sycl::handler& h){
        h.parallel_for<class test_kernel>(sycl::nd_range<3>{dimGrid,dimBlock},[=](nd_item<3> item){
            int i = item.get_global_linear_id();
            device_input2[i] = device_input1[i] + 2;
        });
      });
    }catch(sycl::exception& e){
      std::cout<<"exception code : "<<e.code()<<"  exception message : "<<e.what()<<std::endl;
    }
}


int main(){
    sycl::queue myQueue;
    auto myContext = myQueue.get_context();
    auto myDev = myQueue.get_device();

    sycl::range<3> dimGrid(1,1,16);
    sycl::range<3> dimBlock(1,1,64);

    int N = 1024;
    float VAL = 1.0f;
    float VAL2 = 2.0f;

    std::vector<float> host_input (N,VAL);
    std::vector<float> host_input2 (N,VAL2);
    std::vector<float> host_output (N);


    auto device_input1 = sycl::malloc_device<float>(N,myQueue);
    auto device_input2 = sycl::malloc_device<float>(N,myQueue);
    

    myQueue.memcpy(device_input1,host_input.data(),N*sizeof(float));
    myQueue.memcpy(device_input2,host_input2.data(),N*sizeof(float));

    void* data_device[2] = {device_input1,device_input2};


    add_2(myQueue,dimGrid,dimBlock,data_device);

    myQueue.wait();

    myQueue.memcpy(host_output.data(),device_input2,N*sizeof(float));

    myQueue.wait();

    for(int i = 0;i < N;i++){
      std::cout << "hostData[" << i << "] = " << host_output[i] << std::endl;
    }
}

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Target device and vendor: Nvidia GPU
  • DPC++ version: clang-16 2022-12
  • Dependencies version: cuda-11


wangzy0327 added the bug label on Feb 20, 2023
AlexeySachkov (Contributor) commented:

Hi @wangzy0327,

Thanks for the report. This seems like a bug to me:

throw sycl::nd_range_error(
"Non-uniform work-groups are not supported by the target device",
PI_ERROR_INVALID_WORK_GROUP_SIZE);

If I read all those deeply nested if-else branches correctly, that code throws an exception for any (?) combination of global and local sizes on non-OpenCL backends. The line was introduced almost two years ago in 03ef819. I feel like I'm missing something, or have we really had this bug for almost two years?

Pennycook (Contributor) commented Feb 24, 2023

I think there is a bug in the code:

sycl::range<3> dimGrid(1,1,16);
sycl::range<3> dimBlock(1,1,64);

This is how CUDA specifies the range of a launch, but in SYCL you have to specify the total size of the range and the work-group size, as:

sycl::range<3> dimGrid(1, 1, 16 * 64);
sycl::range<3> dimBlock(1, 1, 64);

I think the DPC++ error message could definitely be clearer. It should probably say that the sizes don't divide evenly instead of referencing non-uniform work-group sizes.

zjin-lcf (Contributor) commented:

@AlexeySachkov

Is it right that "LocalExceedsGlobal" is an error for any backend? Thanks.

if (HasLocalSize) {
  const bool LocalExceedsGlobal =
      (NDRDesc.LocalSize[0] > NDRDesc.GlobalSize[0] ||
       NDRDesc.LocalSize[1] > NDRDesc.GlobalSize[1] ||
       NDRDesc.LocalSize[2] > NDRDesc.GlobalSize[2]);
}

wangzy0327 (Author) commented:

I think there is a bug in the code:

sycl::range<3> dimGrid(1,1,16);
sycl::range<3> dimBlock(1,1,64);

This is how CUDA specifies the range of a launch, but in SYCL you have to specify the total size of the range and the work-group size, as:

sycl::range<3> dimGrid(1, 1, 16 * 64);
sycl::range<3> dimBlock(1, 1, 64);

I think the DPC++ error message could definitely be clearer. It should probably say that the sizes don't divide evenly instead of referencing non-uniform work-group sizes.

Yes, I have fixed the code around these lines:

sycl::range<3> dimGrid(1,1,16);
sycl::range<3> dimBlock(1,1,64);

The total size of the range is the product of the grid size and the block size.

try {
  myQueue.submit([&](sycl::handler& h){
    h.parallel_for<class test_kernel>(sycl::nd_range<3>{dimGrid, dimBlock}, [=](nd_item<3> item){
        int i = item.get_global_linear_id();
        device_input2[i] = device_input1[i] + 2;
    });
  });
} catch (sycl::exception& e) {
  std::cout << "exception code : " << e.code() << "  exception message : " << e.what() << std::endl;
}

AlexeySachkov (Contributor) commented:

I think there is a bug in the code:

Oh, thanks for pointing it out, @Pennycook. I don't have a CUDA background, so I didn't even think to check which exact value was passed where; I assumed the bigger one was the global size.

If I read all those huge nested if-else branches correctly, that code throws an exception for any (?) combination of global and local sizes on non-OpenCL backends. The line was introduced almost two years ago 03ef819 and I feel like I'm missing something or is it really the case that we have had this dummy bug for almost two years now?

I took another look at the code, and I was indeed missing the fact that the function is only called after the native API has failed, not before we call into the native API. So everything is correct there.

@zjin-lcf,

Is it right that "LocalExceedsGlobal" is an error for any backend? Thanks.

That is correct. From 4.9.4.2, "SYCL functions for invoking kernels":

template <typename KernelName, int Dimensions, typename... Rest>
void parallel_for(nd_range<Dimensions> executionRange, Rest&&... rest)

Throws an exception with the errc::nd_range error code if the global size defined in the associated executionRange defines a non-zero index space which is not evenly divisible by the local size in each dimension.

If the local size exceeds the global size, then the global size is not evenly divisible by the local size, which is invalid in SYCL.

Yes, I have fixed the code.

@wangzy0327, was your problem resolved with that fix? Can we close the issue?
