
Error message: "An item with the same key has been added" #51

Closed
Costigan opened this issue Dec 9, 2019 · 4 comments · Fixed by #354
Closed

Error message: "An item with the same key has been added" #51

Costigan opened this issue Dec 9, 2019 · 4 comments · Fixed by #354
Assignees
Labels

Comments


Costigan commented Dec 9, 2019

Hello. I'm using 0.7.0.0 and the Cuda accelerator. I'm getting a runtime error with the message "An item with the same key has been added". The error doesn't occur when using the CPU accelerator.

I don't think this is an ILGPU bug, but I'm uncertain how to debug my code given the lack of specificity in the error message.

The error is generated on this line, the first (and only) time an action is generated from my kernel.

var landing_probability_kernel = accelerator.LoadStreamKernel<Index2, ArrayView2D<byte>, ArrayView2D<short>, int, int>(EstimateLandingProbability);

My question is 'how should I debug this message'? What sorts of things should I look for?
(I'm including the code for completeness, but I'm not asking you to look at it if you can suggest a direction to look.)
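
One general technique (an editor's sketch, not a suggestion from the thread): this message is typically a .NET ArgumentException raised by a dictionary insert, so catching it around the load call and printing the exception shows from the stack trace whether the failure originates inside the ILGPU compiler or in user code:

    // Editor's sketch: localize the failure by catching the exception at load time.
    try
    {
        var kernel = accelerator.LoadStreamKernel<Index2, ArrayView2D<byte>, ArrayView2D<short>, int, int>(
            EstimateLandingProbability);
    }
    catch (ArgumentException e)
    {
        Console.WriteLine(e);  // prints the message plus the stack trace
        throw;
    }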

One possibility I'm unsure about is that this kernel allocates shared memory and creates a 2D view of it. I don't know for sure whether creating 2D views is allowed inside kernels.

    // landing_radius, patch_width, dem_width, and dem_height are constants
    // defined elsewhere in the enclosing class.
    static void EstimateLandingProbability(
        Index2 index,
        ArrayView2D<byte> global,
        ArrayView2D<short> counts,
        int start_row,
        int start_col)  // landing_radius
    {
        const int radius = landing_radius;
        var row = index.Y;
        var col = index.X;

        var shared_width = patch_width + radius * 2;
        var shared_length = shared_width * shared_width;
        Debug.Assert(shared_length <= 64000);
        var shared_linear = SharedMemory.Allocate<byte>(shared_length);
        var shared = shared_linear.As2DView(shared_width, shared_width);

        // Load the slope buffer.  This loads a square section of global memory into shared memory four times with offsets.
        var offset = radius + radius;
        var row0 = start_row - radius;
        var col0 = start_col - radius;
        LoadSquare(global, shared, index, 0, 0, row0, col0);                             // top left
        LoadSquare(global, shared, index, 0, offset, row0, col0 + offset);               // top right
        LoadSquare(global, shared, index, offset, 0, row0 + offset, col0);               // bottom left
        LoadSquare(global, shared, index, offset, offset, row0 + offset, col0 + offset); // bottom right

        // Count
        var sum = 0;
        var d2 = radius * radius;
        var high = radius + radius;
        for (var y = -radius; y <= high; y++)
            for (var x = -radius; x <= high; x++)
                sum += x * x + y * y <= d2 ? shared[row + y + radius, col + x + radius] : 0;

        // Write the resulting counts to global memory
        var r1 = start_row + row;
        var c1 = start_col + col;
        Debug.Assert(r1 >= 0);
        Debug.Assert(c1 >= 0);
        Debug.Assert(r1 < dem_width);
        Debug.Assert(c1 < dem_width);
        counts[start_row + row, start_col + col] = (short)sum;
    }

    static void LoadSquare(ArrayView2D<byte> global, ArrayView2D<byte> shared, Index2 index, int shared_row0, int shared_col0, int global_row0, int global_col0)
    {
        var row = index.Y;
        var col = index.X;
        var target_row = shared_row0 + row;
        var target_col = shared_col0 + col;
        var source_row = global_row0 + row;
        var source_col = global_col0 + col;
        shared[target_row, target_col] = source_row < 0 || source_row >= dem_height || source_col < 0 || source_col >= dem_width 
            ? (byte)0 
            : global[source_row, source_col];
    }
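
For context, an editor's sketch (not from the original report) of how the loaded action would be dispatched; demView and countsView are hypothetical device views, and the patch_width-squared extent is assumed from the indexing in the kernel:

    // Hypothetical launch of the action loaded above.
    landing_probability_kernel(
        new Index2(patch_width, patch_width),  // assumed: one thread per output cell
        demView, countsView, start_row, start_col);
    accelerator.Synchronize();  // block until the default stream has finished
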
m4rs-mt self-assigned this Dec 9, 2019
m4rs-mt added the bug label Dec 9, 2019
m4rs-mt (Owner) commented Dec 10, 2019

@Costigan Thank you for submitting this bug report. It seems to be an internal ILGPU compiler issue that should not occur; it is not caused by an error in your program. I have investigated the problem and suspect it was caused by a synchronization issue in the ABI class (fixed in d149410). After fixing this issue, I can load the kernel using the CudaAccelerator and CLAccelerator classes without any problems.

Can you test whether the latest commits solve the problem? Alternatively, rolling back to version 0.6 should circumvent it as well.
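
For reference, such a rollback could be expressed as a package pin in the project file; the exact 0.6.x version number below is illustrative:

    <!-- Illustrative pin to a 0.6 release of the ILGPU NuGet package. -->
    <ItemGroup>
      <PackageReference Include="ILGPU" Version="0.6.0" />
    </ItemGroup>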

m4rs-mt (Owner) commented Dec 10, 2019

Some additional remarks: It looks like you are not using any Group synchronization instructions after loading data into shared memory. Is this intended?

Note that using shared memory in combination with implicitly grouped kernels can lead to undefined and unintended behavior. We should add an error message to ILGPU that simply forbids such kernels. Please use an explicitly grouped kernel to get fine-grained control over the number of threads per group.
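
A minimal sketch of that pattern, assuming the 0.7-era explicitly grouped API (GroupedIndex, Group.Barrier); the group size and buffer names are placeholders:

    const int GroupSize = 256;  // placeholder compile-time group size

    static void ExplicitlyGroupedKernel(GroupedIndex index, ArrayView<byte> data)
    {
        var globalIndex = index.ComputeGlobalIndex();
        var shared = SharedMemory.Allocate<byte>(GroupSize);

        // Cooperative load: each thread writes one element of shared memory.
        shared[index.GroupIdx] = data[globalIndex];

        // Make the loads visible to every thread in the group before reading.
        Group.Barrier();

        // ... all threads may now safely read any element of 'shared' ...
    }

    // Loading and launching with an explicit group size (also assumed API):
    // var kernel = accelerator.LoadStreamKernel<GroupedIndex, ArrayView<byte>>(ExplicitlyGroupedKernel);
    // kernel(new GroupedIndex(numGroups, GroupSize), bufferView);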

m4rs-mt (Owner) commented Dec 23, 2019

I'll close this issue for now, since the above-mentioned commit solves this problem on all test machines. Should you encounter further problems, we can always take up this topic again.

m4rs-mt closed this as completed Dec 23, 2019
Costigan (Author) commented

@m4rs-mt I'm sorry for not responding more quickly. I ended up going with a multi-core solution, and its performance was good enough. However, getting this implementation working will help me in the future; I'll get it running and add another comment confirming that. Regarding the synchronization instructions, you're right: I should have included one after loading shared memory. Thank you.
