Creative solution for Debugging out of Kernels? Here is mine. Any interesting ideas you have come up with? #15

@jonmdev

Again, I can't thank you enough for this project and all your many replies and posts on StackExchange and even Reddit which I have found while researching how to do basic things. I now have a working implementation of my project, and only 8 days after I first started. To get up and running with OpenCL and convert a project over in only 8 days is a testament to your good design and explanations.

I just have one more question.

It is challenging to see what is happening inside the Kernels. For example, if you access whatever[i] with i out of range in ordinary C++ code, Visual Studio will typically report an error, but the Kernel says nothing if you do the same thing inside it.

It is also hard to see which points and which if branches are being hit. My best idea was the following:

1) Create Memory objects and pass them into the Kernel:

For example, in the Kernel definition, add the following parameters:

//debug
global char* debugChar,
global float* debugFloat, 
global int* dbgIndexC,
global int* dbgIndexF,

int maxDebugChar

Here debugChar and debugFloat are Memory<char>(device, maxDebugChar) and Memory<float>(device, maxDebugChar).

dbgIndexC and dbgIndexF are each a Memory<int>(device, 1) holding a single index initialized to 0, so that every new entry written during a kernel run can increment one shared global index.

2) Use inside Kernel:

I have found debugFloat the most helpful: using just one buffer preserves the chronology, and writing strings one char at a time is too hard. Writing floats or ints into a char buffer is also too hard.

So for example, you can do:

const uint i = get_global_id(0); //gets index of array 0 to n
const uint g = get_group_id(0); //get workgroup
const uint groupNums = get_num_groups(0); //get num groups

// Use any unusual number you can search for as the "start" of a debug statement (here 1.333)
if (dbgIndexF[0] < maxDebugChar) { debugFloat[dbgIndexF[0]] = 1.333f; dbgIndexF[0]++; }
if (dbgIndexF[0] < maxDebugChar) { debugFloat[dbgIndexF[0]] = i; dbgIndexF[0]++; }
if (dbgIndexF[0] < maxDebugChar) { debugFloat[dbgIndexF[0]] = g; dbgIndexF[0]++; }
if (dbgIndexF[0] < maxDebugChar) { debugFloat[dbgIndexF[0]] = groupNums; dbgIndexF[0]++; }
if (dbgIndexF[0] < maxDebugChar) { debugFloat[dbgIndexF[0]] = 4.222f; dbgIndexF[0]++; }
// Use any unusual number you can search for as the "end" of a debug statement (here 4.222)

Alternatively, you can use the char buffer, but this is very tedious, and since you can't easily add floats/doubles/ints to the char array it is less useful:

if (dbgIndexC[0] < maxDebugChar) { debugChar[dbgIndexC[0]] = 'S'; dbgIndexC[0]++; }
if (dbgIndexC[0] < maxDebugChar) { debugChar[dbgIndexC[0]] = 'T'; dbgIndexC[0]++; }
if (dbgIndexC[0] < maxDebugChar) { debugChar[dbgIndexC[0]] = 'A'; dbgIndexC[0]++; } 
if (dbgIndexC[0] < maxDebugChar) { debugChar[dbgIndexC[0]] = 'R'; dbgIndexC[0]++; } 
if (dbgIndexC[0] < maxDebugChar) { debugChar[dbgIndexC[0]] = 'T'; dbgIndexC[0]++; } 

3) Get and print out the Debug Info:

After the Kernel runs, call a function that processes and prints the debug info in whatever way your system needs. For example:

void readDebugData() {

	// Copy the char log back to the host, build one string from it, and blank the host copy.
	debugChar->read_from_device();
	std::string debugString = "";
	for (int i = 0; i < maxDebugChar; i++) {
		debugString += (*debugChar)[i];
		(*debugChar)[i] = ' '; // clear it
	}
	DBG(debugString);

	// Same for the float log, with " | " between values.
	debugFloat->read_from_device();
	debugString = "";
	for (int i = 0; i < maxDebugChar; i++) {
		debugString += " | " + cl_to_string((*debugFloat)[i]);
		(*debugFloat)[i] = 4; // clear it by overwriting with a filler value
	}
	DBG(debugString);

	// Reset both indexes on the host, then push everything back to clear the device side.
	(*dbgIndexC)[0] = 0;
	(*dbgIndexF)[0] = 0;
	debugChar->write_to_device(); //clear
	debugFloat->write_to_device(); //clear
	dbgIndexC->write_to_device(); //clear
	dbgIndexF->write_to_device(); //clear
}
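
Since the float log is just a flat list with marker values around each debug statement, the host side could also split it into records instead of printing one long string. Below is a minimal sketch of that idea in plain C++; split_records is a hypothetical helper (not part of any library), and it assumes the markers are values the payload never takes and that they are compared as float (1.333f, 4.222f), since that is how they end up stored in the float buffer.

```cpp
#include <vector>

// Hypothetical helper: splits a flat debug buffer into records.
// Each record is the run of values between a start marker and the next end marker.
std::vector<std::vector<float>> split_records(const std::vector<float>& buf,
                                              float start_marker,
                                              float end_marker) {
    std::vector<std::vector<float>> records;
    std::vector<float> current;
    bool inside = false;  // true while we are between a start and an end marker
    for (float v : buf) {
        if (!inside) {
            // Skip filler values until a start marker opens a new record.
            if (v == start_marker) {
                inside = true;
                current.clear();
            }
        } else if (v == end_marker) {
            // End marker closes the record; keep only the payload.
            records.push_back(current);
            inside = false;
        } else {
            current.push_back(v);
        }
    }
    // A trailing record without an end marker (buffer ran full) is discarded.
    return records;
}
```

With the kernel snippet above, each record would hold three values: the global id, the group id, and the number of groups.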

Ideas?

That was my best idea and it works at least okay. Without it I could never have figured out how to use the kernels or how they were allocating into workgroups etc.

However, it can also crash the Kernel, causing it to hang for 400 ms (which I presume is the device timeout), after which OpenCL just stops responding to further requests. I.e. this is not "workgroup safe."

I presume this is triggered when multiple workgroups all try to write to the same debug index/array at once. So this method needs to be improved, though it is at least somewhat useful.

Additionally, apart from compilation errors, I still can't think of any good way to be alerted when you do something wrong, such as an out-of-range read, so Kernel code mistakes are hard to find. E.g. reading dbgIndexF[-1] (which doesn't exist) inside the kernel creates no error. Interestingly, this returns 0 for me when I debug out the value using the method above, i.e.:
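
If concurrent writes are indeed the cause, one thing that stands out is that the range check and the increment are separate steps, so two work-items can both pass the check and then write to the same slot or past the end of the buffer. A possible fix is to reserve all slots of one debug statement in a single atomic step. In OpenCL C, atomic_add on a global int returns the value before the addition, which can serve as the base index. The sketch below only emulates that idea on the CPU in C++, with std::atomic standing in for the device atomic; DebugLog, work_item and run_emulation are hypothetical names made up for illustration, and the work-items are run one after another rather than in parallel.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the debug buffer and its index.
struct DebugLog {
    std::vector<float> data;   // plays the role of debugFloat
    std::atomic<int> next{0};  // plays the role of dbgIndexF[0]

    explicit DebugLog(int capacity) : data(capacity, 0.0f) {}

    // Reserve `count` consecutive slots in one atomic step.
    // Returns the first reserved index, or -1 if the record would not fit.
    int reserve(int count) {
        assert(count > 0);
        const int base = next.fetch_add(count);  // returns the value before the add
        if (base + count > static_cast<int>(data.size())) return -1;
        return base;
    }
};

// Emulates one work-item writing a 5-value record:
// start marker, global id, group id, number of groups, end marker.
void work_item(DebugLog& log, int i, int group, int num_groups) {
    const int base = log.reserve(5);
    if (base < 0) return;  // buffer full: drop the record rather than write out of range
    log.data[base + 0] = 1.333f;
    log.data[base + 1] = static_cast<float>(i);
    log.data[base + 2] = static_cast<float>(group);
    log.data[base + 3] = static_cast<float>(num_groups);
    log.data[base + 4] = 4.222f;
}

// Runs n emulated work-items and returns how many complete records were stored.
int run_emulation(DebugLog& log, int n, int group_size) {
    const int num_groups = (n + group_size - 1) / group_size;
    for (int i = 0; i < n; i++) work_item(log, i, i / group_size, num_groups);
    int complete = 0;
    for (std::size_t b = 0; b + 5 <= log.data.size(); b += 5) {
        if (log.data[b] == 1.333f && log.data[b + 4] == 4.222f) complete++;
    }
    return complete;
}
```

Because every record claims its whole range up front, the values of one debug statement stay contiguous even when many work-items log at once, and a record that does not fit is dropped whole. The index keeps growing after the buffer is full, so it would still need to be reset to 0 between kernel runs.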

	int testBreak = dbgIndexF[-1]; //doesn't exist
	if (dbgIndexF[0] < maxDebugChar) { debugFloat[dbgIndexF[0]] = testBreak; dbgIndexF[0]++; }
	//this returns 0 when I read it outside the kernel

However, I presume this is just "undefined behavior". The only way I caught some of my mistakes was by copying my kernel out, rewriting it as regular code, and running it on the CPU to see what would happen.
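
For anyone wanting to try the same CPU approach, here is a rough sketch of how the rewritten kernel could be made to fail loudly on a bad index. CheckedBuffer and run_kernel_on_cpu are hypothetical names invented for this example, not part of OpenCL or of this project; the idea is simply to replace each global pointer with a wrapper whose operator[] throws on any index outside the buffer, including negative ones.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel buffer: every access is range checked.
template <typename T>
class CheckedBuffer {
public:
    CheckedBuffer(std::string name, std::size_t n) : name_(std::move(name)), data_(n) {}

    // Signed index on purpose, so that buffer[-1] is caught instead of wrapping around.
    T& operator[](long long i) {
        if (i < 0 || static_cast<std::size_t>(i) >= data_.size()) {
            throw std::out_of_range(name_ + "[" + std::to_string(i) +
                                    "] is out of range, size is " +
                                    std::to_string(data_.size()));
        }
        return data_[static_cast<std::size_t>(i)];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::string name_;
    std::vector<T> data_;
};

// Calls the kernel body once per global id, one after another on the CPU.
inline void run_kernel_on_cpu(std::size_t global_size,
                              const std::function<void(std::size_t)>& body) {
    for (std::size_t i = 0; i < global_size; i++) body(i);
}
```

A read such as dbgIndexF[-1] then throws std::out_of_range with the buffer name and index in the message, instead of silently returning whatever happens to be in memory. This only checks indexing; it does not reproduce parallel execution or workgroup behavior.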

You have obviously been at this longer than me and understand the system better.

I am just wondering if you have come up with any different or better methods for (1) Debugging things out, and (2) Catching Kernel code errors.

Thanks for any thoughts as usual, and thanks again for letting me get into GPU work so quickly and (relatively) painlessly. 🙂
