Skip to content
Permalink
Browse files

Add a minimally intrusive and easy-to-use kernel execution time profiler

Setting POCL_TRACING=cq collects kernel execution times by force
enabling the command queue profiling feature, and dumps collected stats
atexit(). The purpose of this feature is to enable implementation of
minimally intrusive profile collection; the profile data collector can
choose the occasions when it gathers the time stamp data from the events.
The impact to the observed execution profile is minimized by avoiding writing
any logs, copying objects or such while collecting the data during
execution.

It relies on the standard event timestamps to enable devices update them
as (and when) they see fit during the execution.

The drawback is accumulation of cl_object garbage, which should be taken
in account in the data collection interval; the collector should release the
events and the extra data objects they hold often enough to avoid
memory consumption to become a problem.

The current version does not perform garbage collection, but assumes
the alive OpenCL objects that are kept until the exit is a non-problem,
which is clearly the case with most of the OpenCL programs which are rather
simple; not long running, nor launch a lot of commands over their lifetime.

The default profile data collector counts only kernel commands at the moment.
Collecting stats of data transfers would be a useful addition.
  • Loading branch information...
pjaaskel committed Aug 20, 2019
1 parent 790df88 commit 0c3147ce422673f8de151c0c10395213603b4d9a
16 CHANGES
@@ -1,8 +1,22 @@
Changes updated until commit 7288619eb1b (May 25, 2019).

1.5 unreleased
==============

Notable User Facing Changes
---------------------------
- POCL_TRACE_EVENT, POCL_TRACE_EVENT_OPT and POCL_TRACE_EVENT_FILTER
environment variables were renamed to
POCL_TRACING, POCL_TRACING_OPT and POCL_TRACING_FILTER, respectively.

Usability
---------
- A simple per-kernel execution time statistics atexit() for quick and easy
low-impact per-device profiling purposes (relies on event time stamps purely).
It can be enabled by setting POCL_TRACING env to 'cq'.

1.4 September 2019
==============
==================

Highlights
----------
@@ -246,18 +246,24 @@ pocl.
Default 0. If set to an integer N > 0, libpocl will make a pause of N seconds
once, when it's loading. Useful e.g. to set up a LTTNG tracing session.

- **POCL_TRACE_EVENT**, **POCL_TRACE_EVENT_OPT** and **POCL_TRACE_EVENT_FILTER**
- **POCL_TRACING**, **POCL_TRACING_OPT** and **POCL_TRACING_FILTER**

If POCL_TRACE_EVENT is set to some tracer name, then all events
If POCL_TRACING is set to some tracer name, then all events
will be traced automatically. Depending on the backend, traces
may be output in different formats.
POCL_TRACE_EVENT_FILTER is a comma separated list of string to
may be output in different formats and collected in a different way.
POCL_TRACING_FILTER is a comma separated list of string to
indicate which event status should be filtered. For instance to trace
complete and running events POCL_TRACE_EVENT_FILTER should be set
complete and running events POCL_TRACING_FILTER should be set
to "complete,running". Default behavior is to trace all events.

cq -- Dumps a simple per-kernel execution time statistics at the
program exit time which is collected from command queue
start and finish time stamps. Useful for quick and easy profiling
purposes with accurate kernel execution time stamps produced
in a per device way. Currently only tracks kernel timings, and
POCL_TRACING_FILTER has no effect.
text -- Basic text logger for each events state
Use POCL_TRACE_EVENT_OPT=<file> to set the
Use POCL_TRACING_OPT=<file> to set the
output file. If not specified, it defaults to
pocl_trace_event.log
lttng -- LTTNG tracepoint support. When activated, a lttng session
@@ -138,7 +138,7 @@ set(POCL_LIB_SOURCES "clCreateContextFromType.c"
"clEnqueueSVMMap.c" "clEnqueueSVMUnmap.c"
"clEnqueueSVMMemcpy.c" "clEnqueueSVMMemFill.c"
"clSetKernelArgSVMPointer.c" "clSetKernelExecInfo.c"
"pocl_binary.c" "pocl_opengl.c")
"pocl_binary.c" "pocl_opengl.c" "pocl_cq_profiling.c")

if(ANDROID)
list(APPEND POCL_LIB_SOURCES "pocl_mkstemp.c")
@@ -8,10 +8,10 @@
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
@@ -22,6 +22,7 @@
*/

#include "pocl_cl.h"
#include "pocl_cq_profiling.h"
#include "pocl_util.h"

CL_API_ENTRY cl_command_queue CL_API_CALL
@@ -44,7 +45,7 @@ POname(clCreateCommandQueue)(cl_context context,
POCL_GOTO_ERROR_ON((properties > (1<<2)-1), CL_INVALID_VALUE,
"Properties must be <= 3 (there are only 2)\n");

if (POCL_DEBUGGING_ON)
if (POCL_DEBUGGING_ON || pocl_cq_profiling_enabled)
properties |= CL_QUEUE_PROFILING_ENABLE;

for (i=0; i<context->num_devices; i++)
@@ -23,13 +23,14 @@
*/

#include "config.h"
#include "pocl_binary.h"
#include "pocl_cache.h"
#include "pocl_cl.h"
#include "pocl_context.h"
#include "pocl_cq_profiling.h"
#include "pocl_llvm.h"
#include "pocl_util.h"
#include "pocl_cache.h"
#include "utlist.h"
#include "pocl_binary.h"
#include "pocl_context.h"

#ifndef _MSC_VER
# include <unistd.h>
@@ -603,6 +604,13 @@ if (local_##c1 > 1 && local_##c1 <= local_##c2 && local_##c1 <= local_##c3 && \

POname(clRetainKernel) (kernel);

if (pocl_cq_profiling_enabled)
{
pocl_cq_profiling_register_event (command_node->event);
clRetainKernel (kernel);
command_node->event->meta_data->kernel = kernel;
}

pocl_command_enqueue (command_queue, command_node);
errcode = CL_SUCCESS;

@@ -3,10 +3,10 @@
Copyright (c) 2011-2019 pocl developers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
@@ -16,20 +16,16 @@
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
*/

#include "pocl_util.h"

CL_API_ENTRY cl_int CL_API_CALL
POname(clGetEventInfo)(cl_event event,
cl_event_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret)
CL_API_SUFFIX__VERSION_1_0
CL_API_ENTRY cl_int CL_API_CALL POname (clGetEventInfo) (
cl_event event, cl_event_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
POCL_RETURN_ERROR_COND((event == NULL), CL_INVALID_EVENT);
POCL_LOCK_OBJ (event);
@@ -4,10 +4,10 @@
2011-2019 pocl developers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
@@ -17,14 +17,14 @@
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
*/

#include "devices/devices.h"
#include "pocl_cl.h"
#include "pocl_util.h"
#include "devices/devices.h"

#include <unistd.h>

@@ -455,7 +455,7 @@ pocl_init_devices ()
POCL_GOTO_ERROR_ON ((pocl_cache_init_topdir ()), CL_DEVICE_NOT_FOUND,
"Cache directory initialization failed");

pocl_event_tracing_init ();
pocl_tracing_init ();

#ifdef HAVE_SLEEP
int delay = pocl_get_int_option ("POCL_STARTUP_DELAY", 0);
@@ -1581,7 +1581,7 @@ pocl_hsa_submit (_cl_command_node *node, cl_command_queue cq)
pocl_hsa_device_data_t *d = device->data;
unsigned added_to_readylist = 0;

PTHREAD_CHECK (pthread_mutex_lock(&d->list_mutex));
PTHREAD_CHECK (pthread_mutex_lock (&d->list_mutex));

node->ready = 1;
if (pocl_command_is_ready (node->event))
@@ -1139,6 +1139,14 @@ struct event_node
event_node * volatile next;
};

/* Optional metadata for events for improved profile data readability etc. */
typedef struct _pocl_event_md
{
/* The kernel executed by the NDRange command associated with the event,
if any. */
cl_kernel kernel;
} pocl_event_md;

typedef struct _cl_event _cl_event;
struct _cl_event {
POCL_ICD_OBJECT
@@ -1169,9 +1177,12 @@ struct _cl_event {
cl_ulong time_start; /* the time the command actually started executing */
cl_ulong time_end; /* the finish time of the command */

void *data; /* Device specific data */
void *data; /* Device specific data. */

/* Additional (optional data) used to make profile data more readable etc. */
pocl_event_md *meta_data;

/* impicit event = an event for pocl's internal use, not visible to user */
/* Event for pocl's internal use, not visible to user. */
int implicit_event;
_cl_event * volatile next;
_cl_event * volatile prev;
@@ -0,0 +1,138 @@
/* OpenCL runtime library: integrated command queue profile collecting
functionality
Copyright (c) 2019 Pekka Jääskeläinen
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
*/

#include "pocl_cq_profiling.h"
#include "pocl_cl.h"
#include "pocl_util.h"

/* Maximum number of events collected. */
#define POCL_CQ_PROFILING_MAX_EVENTS 1000000
#define POCL_CQ_PROFILING_MAX_KERNELS 1000

int pocl_cq_profiling_enabled = 0;
static unsigned cq_events_collected = 0;
static cl_event *profiled_cq_events = NULL;

struct kernel_stats
{
cl_kernel kernel;
unsigned long time;
unsigned long launches;
};

static int
order_by_time (const void *a, const void *b)
{
if (((struct kernel_stats *)a)->time < ((struct kernel_stats *)b)->time)
return 1;
else if (((struct kernel_stats *)a)->time > ((struct kernel_stats *)b)->time)
return -1;
else
return 0;
}

static void
pocl_atexit ()
{
unsigned long total_time = 0;
unsigned long total_commands = 0;
unsigned long different_kernels = 0;

struct kernel_stats kernel_statistics[cq_events_collected];
bzero (kernel_statistics, sizeof (kernel_statistics));

/* First statistics computation round. */
for (unsigned i = 0; i < cq_events_collected; ++i)
{
cl_event e = profiled_cq_events[i];
cl_kernel kernel = e->meta_data->kernel;
unsigned long kernel_t = e->time_end - e->time_start;
total_time += kernel_t;
total_commands++;

unsigned k_i = 0;
while (k_i < different_kernels
&& strcmp (kernel_statistics[k_i].kernel->name, kernel->name)
!= 0)
++k_i;

if (kernel_statistics[k_i].kernel == NULL)
{
kernel_statistics[k_i].kernel = kernel;
different_kernels++;
}
kernel_statistics[k_i].time += kernel_t;
kernel_statistics[k_i].launches++;
}

printf ("\n");
printf (" %-30s %10s %15s %3s %10s\n", "kernel", "launches", "total us",
"", "avg us");
qsort (kernel_statistics, different_kernels, sizeof (struct kernel_stats),
order_by_time);

for (int i = 0; i < different_kernels; ++i)
{
printf ("%3d) %-30s %10lu %15lu %3lu%% %10lu\n", i + 1,
kernel_statistics[i].kernel->name, kernel_statistics[i].launches,
kernel_statistics[i].time,
kernel_statistics[i].time * 100 / total_time,
kernel_statistics[i].time / kernel_statistics[i].launches);
}
printf (" %-30s %10s %15s %3s %10s\n", "",
"==========", "==========", "====", "==========");

printf (" %-30s %10lu %15lu %4s %10lu\n", "", total_commands, total_time,
"100%", total_time / total_commands);

/* TODO: Critical path information of the task graph. */
}

/* Initialize the profiling data structures, if not yet done. */

void
pocl_cq_profiling_init ()
{
profiled_cq_events
= (cl_event *)malloc (sizeof (cl_event *) * POCL_CQ_PROFILING_MAX_EVENTS);
atexit (pocl_atexit);
pocl_cq_profiling_enabled = 1;
}

/* Registers the event for profiling. Retains it to keep it alive until stats
have been collected.
*/

void
pocl_cq_profiling_register_event (cl_event event)
{
clRetainEvent (event);
if (event->meta_data == NULL)
event->meta_data = (pocl_event_md *)malloc (sizeof (pocl_event_md));

unsigned cq_events_pos = POCL_ATOMIC_INC (cq_events_collected) - 1;
if (cq_events_pos >= POCL_CQ_PROFILING_MAX_EVENTS)
POCL_ABORT ("CQ profiler reached the limit on tracked events.");
profiled_cq_events[cq_events_pos] = event;
}

0 comments on commit 0c3147c

Please sign in to comment.
You can’t perform that action at this time.