# OpenCL data types

We saw in the survival C++ notebook that the number of bits used by some C data types depends on the platform and operating system in use. Within an OpenCL kernel a particular C data type always uses the same number of bits, but in the host program that same data type might use a different number of bits. This is a problem for OpenCL applications, which promise a level of portability across implementations. To remedy this, the latest [OpenCL C specification](https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_C.pdf) pairs each kernel type with a standard host data type that fixes the number of bits used and keeps it consistent with the type in the kernel. The tables below list commonly used OpenCL data types and how many bits they use. It is **good practice** to use these data types wherever practical in OpenCL applications.

## Scalar types

| kernel C type | OpenCL standard type | Description and bits used |  
| :- | :- | :- |
| bool | NA | undefined |
| char | cl_char | 8 bits, signed two's complement integer  |
| unsigned char, uchar | cl_uchar | 8 bits, unsigned integer |
| short | cl_short | 16 bits, signed two's complement integer |
| unsigned short, ushort | cl_ushort | 16 bits, unsigned integer |
| int | cl_int | 32 bits, signed two's complement integer |
| unsigned int, uint | cl_uint | 32 bits, unsigned integer |
| long | cl_long | 64 bits, signed two's complement integer |
| unsigned long, ulong | cl_ulong | 64 bits, unsigned integer |
| half | cl_half | 16 bits, floating point number |
| float | cl_float | 32 bits, floating point number |
| double | cl_double | 64 bits, floating point number |
| size_t | NA | 32 or 64 bits (matches the device address width), unsigned result of the **sizeof** operator |
| ptrdiff_t | NA | 32 or 64 bits (matches the device address width), signed integer type from the subtraction of one pointer from another |
| intptr_t | NA | 32 or 64 bits (matches the device address width), pointer storage in a signed integer type |
| uintptr_t | NA | 32 or 64 bits (matches the device address width), pointer storage in an unsigned integer type |
| void | void | no size, incomplete type |
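
To make the sizing issue concrete, here is a minimal host-side sketch. The aliases `my_cl_int`, `my_cl_long` and `my_cl_float` are hypothetical stand-ins built from `<cstdint>` so that the snippet compiles without the OpenCL headers; a real host program would include the OpenCL headers and use **cl_int**, **cl_long** and **cl_float** directly. The helper `long_buffer_bytes` is also an illustrative name, not part of any API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins so this compiles without the OpenCL headers.
// In a real host program use cl_int, cl_long and cl_float instead.
using my_cl_int   = std::int32_t; // plays the role of cl_int   (kernel int)
using my_cl_long  = std::int64_t; // plays the role of cl_long  (kernel long)
using my_cl_float = float;        // plays the role of cl_float (kernel float)

// Compile-time checks that the sizes match the table above.
static_assert(sizeof(my_cl_int) == 4, "kernel int is 32 bits");
static_assert(sizeof(my_cl_long) == 8, "kernel long is 64 bits");
static_assert(sizeof(my_cl_float) == 4, "kernel float is 32 bits");

// Bytes needed for a buffer that a kernel will read as n values of type long.
constexpr std::size_t long_buffer_bytes(std::size_t n) {
    return n * sizeof(my_cl_long);
}
```

Sizing the same buffer with `n * sizeof(long)` would allocate only half the required bytes on platforms where the host `long` is 32 bits, such as 64-bit Windows.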

## Vector types

In addition to the scalar types above, the OpenCL standard also defines a number of vector types with **n** = 2, 3, 4, 8, or 16 elements. Amazingly, CUDA's built-in vector types currently stop at 4 elements. Vector types are useful because they can help unlock vector optimisations and better memory fetching patterns.

| kernel C type | OpenCL standard type | Description and bits used |  
| :- | :- | :- |
| char**n** | cl_char**n** | **n** x 8 bits, signed two's complement integers  |
| uchar**n** | cl_uchar**n** | **n** x 8 bits, unsigned integers |
| short**n** | cl_short**n** | **n** x 16 bits, signed two's complement integers |
| ushort**n** | cl_ushort**n** | **n** x 16 bits, unsigned integers |
| int**n** | cl_int**n** | **n** x 32 bits, signed two's complement integers |
| uint**n** | cl_uint**n** | **n** x 32 bits, unsigned integers |
| long**n** | cl_long**n** | **n** x 64 bits, signed two's complement integers |
| ulong**n** | cl_ulong**n** | **n** x 64 bits, unsigned integers |
| float**n** | cl_float**n** | **n** x 32 bits, floating point numbers |
| double**n** | cl_double**n** | **n** x 64 bits, floating point numbers |

There is a sophisticated means of indexing into a vector type within an OpenCL kernel. From the host, however, one has to use **.s[index]** indexing to get at individual elements.

```C++
// Code from the host

// Declare an initialised vector; the inner braces fill the s[] member
cl_float4 f = {{0.0f, 1.0f, 2.0f, 3.0f}};
    
// Could also have zeroed every element like this
//cl_float4 f = {{0.0f}};

// Print out the last element
std::printf("%f\n", f.s[3]);
    
// Store a value in the last element
f.s[3] = 10.0f;
    
// Print out the last element again
std::printf("%f\n", f.s[3]);
```

You might have noticed that the OpenCL standard currently has no complex type. However, we can use a two-component vector such as **float2** to represent complex numbers.
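
As a minimal sketch of that idea, the snippet below multiplies two complex numbers stored as (real, imaginary) pairs. `Float2` is a hypothetical stand-in for a two-component vector so the code compiles without the OpenCL headers, and `complex_mul` is an illustrative helper, not an OpenCL function.

```cpp
// Hypothetical stand-in for a two-component float vector; not the OpenCL type.
struct Float2 {
    float x; // real part
    float y; // imaginary part
};

// Complex product (a.x + i*a.y) * (b.x + i*b.y).
inline Float2 complex_mul(Float2 a, Float2 b) {
    return Float2{a.x * b.x - a.y * b.y,   // real part
                  a.x * b.y + a.y * b.x};  // imaginary part
}
```

For example, `complex_mul(Float2{1.0f, 2.0f}, Float2{3.0f, 4.0f})` gives the pair (-5, 10). In a kernel the same arithmetic would be written with **float2** operands and their **.x** and **.y** components.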

### Vector indexing within a kernel

Access to a vector type from within a kernel is done using dot notation. You can use **.x .y .z** and **.w** for the first four elements, or you can use **.s0, .s1, .s2, .s3, .s4, .s5, .s6, .s7, .s8, .s9, .sa, .sb, .sc, .sd, .se, .sf** to access values up to the 16th element. The cool thing about OpenCL vectors is that you can "swizzle", or permute indices (using either .xyzw or .s* but not both in the same swizzle), to mix up the order of the vector.

```C++
// Code within a kernel

// Explicit declaration
float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f);

// Declaration that sets every element to the same value
float4 v = (float4)(1.0f);

// Valid examples of swizzling
v.xyzw = f.wzyx;
v.xyzw = f.s3210;
```
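
Arbitrary swizzles are a kernel-side feature, so on the host the same permutation has to be spelled out element by element. The sketch below shows what the kernel statement `v.xyzw = f.wzyx` computes. `Float4` is a hypothetical stand-in for the **s[]** member of a 4-element vector and `reverse4` is an illustrative helper name.

```cpp
#include <array>

// Hypothetical stand-in for the s[] member of a 4-element float vector.
using Float4 = std::array<float, 4>;

// Host-side equivalent of the kernel statement v.xyzw = f.wzyx:
// element 0 receives element 3, element 1 receives element 2, and so on.
inline Float4 reverse4(const Float4& f) {
    return Float4{f[3], f[2], f[1], f[0]};
}
```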

You can also load and store vectors from a memory allocation using the **vloadn** and **vstoren** functions. 

```C++

// OpenCL kernel code

// Assuming arr is a memory allocation from global memory

// Load a float4 vector from elements offset*4 to offset*4+3 of arr
float4 f = vload4(offset, arr);

// Store a float4 vector into elements offset*4 to offset*4+3 of arr
vstore4(f, offset, arr);
```

In order to avoid undefined behaviour, the address being read or written (**arr + offset*n**) must be aligned to the size of the vector's element type, for example 4 bytes for **float**; it does not need to be aligned to the size of the whole vector. The OpenCL implementation and the **calloc** function usually align the start of a memory allocation generously enough for any of these element types. So as long as you use the allocated address as the pointer for the **vstoren** and **vloadn** functions you will be fine.
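
The index arithmetic behind the 4-element load and store can be emulated on the host, which makes the meaning of the offset argument easier to see. This is only a sketch of the addressing: `Float4`, `emulate_vload4` and `emulate_vstore4` are hypothetical names, and the functions assume the caller has made sure that elements `offset*4` to `offset*4 + 3` lie inside the allocation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 4-element float vector.
using Float4 = std::array<float, 4>;

// Emulates the addressing of vload4(offset, p):
// reads p[offset*4], p[offset*4 + 1], p[offset*4 + 2], p[offset*4 + 3].
inline Float4 emulate_vload4(std::size_t offset, const float* p) {
    const float* base = p + offset * 4;
    return Float4{base[0], base[1], base[2], base[3]};
}

// Emulates the addressing of vstore4(data, offset, p):
// writes the four elements of data to p[offset*4] .. p[offset*4 + 3].
inline void emulate_vstore4(const Float4& data, std::size_t offset, float* p) {
    float* base = p + offset * 4;
    for (std::size_t i = 0; i < 4; ++i) {
        base[i] = data[i];
    }
}
```

With an 8-element array, an offset of 1 therefore touches elements 4 to 7, not elements 1 to 4.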