# Using binary kernels in OpenCL

There are at least two reasons to store pre-compiled kernel code in binary form: faster program setup and protection of intellectual property. Some measure of intellectual property protection can be achieved by compiling the kernel source code into the application binary; however, this increases the size of the application, which may not be acceptable in some situations. Furthermore, in applications where the setup time of the OpenCL kernel is crucial, one can save time by pre-compiling the OpenCL kernel to an intermediate binary format. The OpenCL runtime provides the function [clCreateProgramWithBinary](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateProgramWithBinary.html) to complement the slower [clCreateProgramWithSource](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateProgramWithSource.html).

The programs [mat_mult_create_binary](code/mat_mult_create_binary.cpp) and [mat_mult_use_binary](code/mat_mult_use_binary.cpp) illustrate the use of [clCreateProgramWithBinary](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateProgramWithBinary.html) for the matrix multiplication example. In [mat_mult_create_binary](code/mat_mult_create_binary.cpp), after OpenCL setup, a program is created from source with [clCreateProgramWithSource](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateProgramWithSource.html) and compiled with [clBuildProgram](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clBuildProgram.html). The following code in [mat_mult_create_binary](code/mat_mult_create_binary.cpp) then extracts the compiled binaries from the program and saves them to file, one file per device.

```C++
    // Get the number of devices in the program. 
    // It should be 1, but we pretend there are many devices 
    // attached to the program
    cl_uint nprogram_devices;
    errchk(clGetProgramInfo(program, 
                            CL_PROGRAM_NUM_DEVICES, 
                            sizeof(cl_uint), 
                            &nprogram_devices, 
                            NULL),
                            "Getting the number of program devices");

    // Get the size of binary code for all devices in the program
    // (std::vector requires #include <vector>; variable-length arrays are not standard C++)
    std::vector<size_t> binary_sizes(nprogram_devices);
    errchk(clGetProgramInfo(program,
                            CL_PROGRAM_BINARY_SIZES,
                            sizeof(size_t)*nprogram_devices,
                            binary_sizes.data(),
                            NULL),
                            "Getting the size of compiled binaries");

    // Make an array for each binary created
    std::vector<unsigned char*> binary_codes(nprogram_devices);
    for (cl_uint n=0; n<nprogram_devices; n++) {
        binary_codes[n]=(unsigned char*)calloc(binary_sizes[n], sizeof(unsigned char));
    }

    // Fill the arrays with binary information
    errchk(clGetProgramInfo(program,
                            CL_PROGRAM_BINARIES,
                            sizeof(unsigned char*)*nprogram_devices,
                            binary_codes.data(),
                            NULL),
                            "Retrieving the compiled binaries");
    
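    // Now save the compiled binaries to disk
    for (cl_uint n=0; n<nprogram_devices; n++) {
        char filename[50];
        snprintf(filename, sizeof(filename), "kernels_device_%u.bin", n);
        // Open in binary mode so the bytes are written unmodified on all platforms
        FILE* fp=fopen(filename,"wb");
        assert(fp!=NULL);
        // Write binary_sizes[n] elements of one byte each
        fwrite(binary_codes[n], sizeof(unsigned char), binary_sizes[n], fp);
        fclose(fp);
    }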

```
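The save step can also be written with the C++ standard library. The following is a minimal sketch, not code from [mat_mult_create_binary](code/mat_mult_create_binary.cpp): the helper name `write_binary_file` is hypothetical, and it assumes the binary for one device is already available as a pointer and a byte count. It opens the file in binary mode and reports whether every byte was written.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (an assumption, not part of the original program):
// write nbytes bytes starting at data to filename, with no newline
// translation. Returns true only if the file was opened and fully written.
bool write_binary_file(const std::string& filename,
                       const unsigned char* data,
                       std::size_t nbytes) {
    // std::ios::binary prevents any platform-specific byte translation
    std::ofstream out(filename, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out.write(reinterpret_cast<const char*>(data),
              static_cast<std::streamsize>(nbytes));
    // close() flushes the stream; a failed write or flush sets the fail state
    out.close();
    return !out.fail();
}
```

In the loop above, a call such as `write_binary_file(filename, binary_codes[n], binary_sizes[n])` could stand in for the `fopen`, `fwrite` and `fclose` calls, with the return value checked before moving on.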

If we look at the generated file [kernels_device_0.bin](code/kernels_device_0.bin), we see that on this NVIDIA platform the "binary" is human-readable PTX (Parallel Thread Execution) intermediate assembly, generated by NVIDIA's LLVM-based NVVM compiler.


```assembly
//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-23076110
// Driver 387.26
// Based on LLVM 3.4svn
//

.version 6.1
.target sm_61, texmode_independent
.address_size 64

	// .globl	mat_multiply

.entry mat_multiply(
	.param .u64 .ptr .global .align 4 mat_multiply_param_0,
	.param .u64 .ptr .global .align 4 mat_multiply_param_1,
	.param .u64 .ptr .global .align 4 mat_multiply_param_2,
	.param .u32 mat_multiply_param_3,
	.param .u32 mat_multiply_param_4
)
{
	.reg .pred 	%p<7>;
	.reg .f32 	%f<36>;
	.reg .b32 	%r<40>;
	.reg .b64 	%rd<49>;


	ld.param.u64 	%rd8, [mat_multiply_param_0];
	ld.param.u64 	%rd9, [mat_multiply_param_1];
	ld.param.u64 	%rd10, [mat_multiply_param_2];
	ld.param.u32 	%r14, [mat_multiply_param_3];
	ld.param.u32 	%r15, [mat_multiply_param_4];
	mov.b32	%r16, %envreg3;
	mov.u32 	%r17, %ctaid.x;
	mov.u32 	%r18, %ntid.x;
	mad.lo.s32 	%r19, %r17, %r18, %r16;
	mov.u32 	%r20, %tid.x;
	add.s32 	%r21, %r19, %r20;
	cvt.s64.s32	%rd1, %r21;
	mov.u32 	%r22, %ctaid.y;
	mov.u32 	%r23, %ntid.y;
	mov.b32	%r24, %envreg4;
	mad.lo.s32 	%r1, %r22, %r23, %r24;
	mov.u32 	%r2, %tid.y;
    ...
```

Now in [mat_mult_use_binary](code/mat_mult_use_binary.cpp) we load the binary kernels from file using the following code:

```C++
    // Load kernel 0 from file
    char filename[50];
    snprintf(filename, sizeof(filename), "kernels_device_%01d.bin", 0);

    // Get the size of the file in a portable way
    struct stat stat_buf;
    errcode = stat(filename, &stat_buf);
    assert(errcode==0);
    const size_t file_bytes=stat_buf.st_size;

    // Create a character buffer 
    unsigned char* buffer=(unsigned char *)calloc(file_bytes, sizeof(unsigned char));

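    // Open the file for reading in binary mode
    fp=fopen(filename, "rb");
    assert(fp!=NULL);

    // Read the binary code into the buffer and check that all bytes arrived
    size_t bytes_read=fread(buffer, sizeof(unsigned char), file_bytes, fp);
    assert(bytes_read==file_bytes);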
  
    // Close the file
    fclose(fp);

    const unsigned char* binary_source=buffer; 
    
    // Turn the binary code into a program
    cl_program program=clCreateProgramWithBinary(   context, 
                                                    1,
                                                    &device,
                                                    &file_bytes,
                                                    &binary_source,
                                                    NULL,
                                                    &errcode);

    errchk(errcode, "Creating program from binary");

    // Free the binary source code 
    free(buffer);
```
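The file-loading step can be sketched with the C++ standard library as well, which avoids the separate `stat` call and the manual `calloc`/`free` pair. This is an illustrative sketch rather than the code in [mat_mult_use_binary](code/mat_mult_use_binary.cpp); the helper name `read_binary_file` is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not part of the original program):
// read the whole of filename into bytes. Returns true on success.
bool read_binary_file(const std::string& filename,
                      std::vector<unsigned char>& bytes) {
    // Open in binary mode, positioned at the end so tellg() gives the size
    std::ifstream in(filename, std::ios::binary | std::ios::ate);
    if (!in) {
        return false;
    }
    const std::streamoff size = in.tellg();
    if (size < 0) {
        return false;
    }
    bytes.resize(static_cast<std::size_t>(size));

    // Rewind and read the file contents in one call
    in.seekg(0, std::ios::beg);
    if (size > 0 &&
        !in.read(reinterpret_cast<char*>(bytes.data()),
                 static_cast<std::streamsize>(size))) {
        return false;
    }
    return true;
}
```

After a successful call, `bytes.data()` and `bytes.size()` provide the pointer and length whose addresses are passed to `clCreateProgramWithBinary`, and the vector releases its memory automatically when it goes out of scope.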

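A program created with [clCreateProgramWithBinary](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateProgramWithBinary.html) must still be built before kernels can be created from it, so [clBuildProgram](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clBuildProgram.html) is called as before and the rest of the program runs unchanged.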


<address>
&copy; 2018 by Dr. Toby Potter<br>
email: <a href="mailto:tobympotter@gmail.com">tobympotter@gmail.com</a><br>
Visit us at: <a href="https://www.pelagos-consulting.com">www.pelagos-consulting.com</a><br>
</address>