GPUPluginMemoryFormats

GPU memory formats

The memory format descriptor in clDNN usually uses the following letters:

b - batch
f - features/channels
w, z, y, x - spatial dimensions
i - input channels (for weights layout only)
o - output channels (for weights layout only)
g - groups (for weights layout only)

The combination of the characters above defines the tensor format, i.e. the actual layout of tensor values in the memory buffer. For example, the bfyx format means that the tensor has 4 dimensions in planar layout, and the x coordinate changes faster than y, y faster than f, and so on. For a tensor with size [b: 2; f: 2; y: 2; x: 2], this gives a linear memory buffer of size 16 where:

i = 0  => [b=0; f=0; y=0; x=0];
i = 1  => [b=0; f=0; y=0; x=1];

i = 2  => [b=0; f=0; y=1; x=0];
i = 3  => [b=0; f=0; y=1; x=1];

i = 4  => [b=0; f=1; y=0; x=0];
i = 5  => [b=0; f=1; y=0; x=1];

i = 6  => [b=0; f=1; y=1; x=0];
i = 7  => [b=0; f=1; y=1; x=1];

i = 8  => [b=1; f=0; y=0; x=0];
i = 9  => [b=1; f=0; y=0; x=1];

i = 10 => [b=1; f=0; y=1; x=0];
i = 11 => [b=1; f=0; y=1; x=1];

i = 12 => [b=1; f=1; y=0; x=0];
i = 13 => [b=1; f=1; y=0; x=1];

i = 14 => [b=1; f=1; y=1; x=0];
i = 15 => [b=1; f=1; y=1; x=1];
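
The mapping listed above can be written as a small index computation. The sketch below is only an illustration in Python, not clDNN code; the helper names `bfyx_offset` and `bfyx_coords` are hypothetical.

```python
def bfyx_offset(b, f, y, x, sizes):
    """Linear offset of element (b, f, y, x) in a planar bfyx buffer.

    `sizes` is a (B, F, Y, X) tuple. x is the fastest-changing coordinate,
    then y, then f, then b. (Hypothetical helper, not a clDNN function.)
    """
    _, F, Y, X = sizes
    return ((b * F + f) * Y + y) * X + x


def bfyx_coords(i, sizes):
    """Inverse of bfyx_offset: recover (b, f, y, x) from a linear offset."""
    _, F, Y, X = sizes
    i, x = divmod(i, X)
    i, y = divmod(i, Y)
    b, f = divmod(i, F)
    return b, f, y, x


sizes = (2, 2, 2, 2)
# Walk the buffer in memory order and print the same listing as above.
for i in range(16):
    b, f, y, x = bfyx_coords(i, sizes)
    print(f"i = {i:<2} => [b={b}; f={f}; y={y}; x={x}];")
```

The same pattern extends to any planar format: multiply by the size of each dimension from left to right and add the next coordinate.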

Usually, planar memory formats are not very efficient for DNN operations, so clDNN has plenty of blocked formats. Blocking means that we take some tensor dimension and put blocks of adjacent elements closer together in memory (in a format with single blocking, the elements of each block are stored linearly in memory).

Consider the most widely used blocked format in clDNN: b_fs_yx_fsv16. First, let's understand what the additional letters mean. We have b, f, y, x dimensions here, so this is a 4D tensor. fs means feature slice, the blocked dimension: fs = CeilDiv(f, block_size). The block size is specified in the format name: fsv16 means that block_size = 16 and the blocked dimension is f; fsv means feature slice vector. Just like with any other layout, the coordinate of the rightmost dimension (fsv) changes first, then the coordinate to its left (x), and so on.

Note: if the original f dimension is not divisible by the block size (16 in this case), then it is aligned up to the next multiple of the block size. The padding elements are filled with zeros.
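
The alignment rule in the note can be sketched in a few lines. This is an illustrative Python sketch under the assumption of a single blocked feature dimension; `ceil_div`, `padded_features`, and `blocked_buffer_size` are hypothetical helper names, not clDNN functions.

```python
def ceil_div(a, b):
    """Integer division rounded up, i.e. CeilDiv(a, b) for positive integers."""
    return (a + b - 1) // b


def padded_features(f, block_size=16):
    """Feature count after aligning f up to a multiple of block_size."""
    return ceil_div(f, block_size) * block_size


def blocked_buffer_size(b, f, y, x, block_size=16):
    """Number of elements in a buffer whose feature dimension is blocked."""
    return b * padded_features(f, block_size) * y * x


for f in (2, 16, 17):
    print(f"f = {f:<2} -> fs = {ceil_div(f, 16)}, padded f = {padded_features(f)}")
print("elements for [b: 2; f: 2; y: 2; x: 2]:", blocked_buffer_size(2, 2, 2, 2))
```

With f = 2 and a block size of 16, only 2 of the 16 values in each feature block carry data; the rest is zero padding.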

Let's look at the changes with the tensor above if we reorder it into b_fs_yx_fsv16 format:

The actual buffer size becomes [b: 2; f: 16; y: 2; x: 2], so the total size is 128 elements.
The order of elements in memory changes:

// first batch
i = 0   => [b=0; f=0;  y=0; x=0] == [b=0; fs=0; y=0; x=0; fsv=0];
i = 1   => [b=0; f=1;  y=0; x=0] == [b=0; fs=0; y=0; x=0; fsv=1];
i = 2   => [b=0; f=2;  y=0; x=0] == [b=0; fs=0; y=0; x=0; fsv=2];
...
i = 15  => [b=0; f=15; y=0; x=0] == [b=0; fs=0; y=0; x=0; fsv=15];

i = 16  => [b=0; f=0;  y=0; x=1] == [b=0; fs=0; y=0; x=1; fsv=0];
i = 17  => [b=0; f=1;  y=0; x=1] == [b=0; fs=0; y=0; x=1; fsv=1];
i = 18  => [b=0; f=2;  y=0; x=1] == [b=0; fs=0; y=0; x=1; fsv=2];
...
i = 31  => [b=0; f=15; y=0; x=1] == [b=0; fs=0; y=0; x=1; fsv=15];

i = 32  => [b=0; f=0;  y=1; x=0] == [b=0; fs=0; y=1; x=0; fsv=0];
i = 33  => [b=0; f=1;  y=1; x=0] == [b=0; fs=0; y=1; x=0; fsv=1];
i = 34  => [b=0; f=2;  y=1; x=0] == [b=0; fs=0; y=1; x=0; fsv=2];
...
i = 47  => [b=0; f=15; y=1; x=0] == [b=0; fs=0; y=1; x=0; fsv=15];

i = 48  => [b=0; f=0;  y=1; x=1] == [b=0; fs=0; y=1; x=1; fsv=0];
i = 49  => [b=0; f=1;  y=1; x=1] == [b=0; fs=0; y=1; x=1; fsv=1];
i = 50  => [b=0; f=2;  y=1; x=1] == [b=0; fs=0; y=1; x=1; fsv=2];
...
i = 63  => [b=0; f=15; y=1; x=1] == [b=0; fs=0; y=1; x=1; fsv=15];

// second batch
i = 64  => [b=1; f=0;  y=0; x=0] == [b=1; fs=0; y=0; x=0; fsv=0];
i = 65  => [b=1; f=1;  y=0; x=0] == [b=1; fs=0; y=0; x=0; fsv=1];
i = 66  => [b=1; f=2;  y=0; x=0] == [b=1; fs=0; y=0; x=0; fsv=2];
...
i = 79  => [b=1; f=15; y=0; x=0] == [b=1; fs=0; y=0; x=0; fsv=15];

i = 80  => [b=1; f=0;  y=0; x=1] == [b=1; fs=0; y=0; x=1; fsv=0];
i = 81  => [b=1; f=1;  y=0; x=1] == [b=1; fs=0; y=0; x=1; fsv=1];
i = 82  => [b=1; f=2;  y=0; x=1] == [b=1; fs=0; y=0; x=1; fsv=2];
...
i = 95  => [b=1; f=15; y=0; x=1] == [b=1; fs=0; y=0; x=1; fsv=15];

i = 96  => [b=1; f=0;  y=1; x=0] == [b=1; fs=0; y=1; x=0; fsv=0];
i = 97  => [b=1; f=1;  y=1; x=0] == [b=1; fs=0; y=1; x=0; fsv=1];
i = 98  => [b=1; f=2;  y=1; x=0] == [b=1; fs=0; y=1; x=0; fsv=2];
...
i = 111 => [b=1; f=15; y=1; x=0] == [b=1; fs=0; y=1; x=0; fsv=15];

i = 112 => [b=1; f=0;  y=1; x=1] == [b=1; fs=0; y=1; x=1; fsv=0];
i = 113 => [b=1; f=1;  y=1; x=1] == [b=1; fs=0; y=1; x=1; fsv=1];
i = 114 => [b=1; f=2;  y=1; x=1] == [b=1; fs=0; y=1; x=1; fsv=2];
...
i = 127 => [b=1; f=15; y=1; x=1] == [b=1; fs=0; y=1; x=1; fsv=15];
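
The element order listed above follows from a single offset formula. The following Python sketch is an illustration only, not the clDNN reorder implementation; the function names are hypothetical, and it assumes no padding other than the feature alignment described in the note.

```python
def b_fs_yx_fsv16_offset(b, f, y, x, sizes, block_size=16):
    """Linear offset of logical element (b, f, y, x) in b_fs_yx_fsv16 layout.

    `sizes` is the logical (B, F, Y, X) shape. The order of dimensions in
    memory is b, fs, y, x, fsv, with fsv changing fastest.
    """
    _, F, Y, X = sizes
    fs_count = (F + block_size - 1) // block_size  # CeilDiv(F, block_size)
    fs, fsv = divmod(f, block_size)
    return (((b * fs_count + fs) * Y + y) * X + x) * block_size + fsv


def reorder_bfyx_to_b_fs_yx_fsv16(src, sizes, block_size=16):
    """Copy a planar bfyx buffer into a zero-padded blocked buffer."""
    B, F, Y, X = sizes
    fs_count = (F + block_size - 1) // block_size
    dst = [0] * (B * fs_count * block_size * Y * X)  # padding stays zero
    for b in range(B):
        for f in range(F):
            for y in range(Y):
                for x in range(X):
                    src_i = ((b * F + f) * Y + y) * X + x  # planar bfyx offset
                    dst[b_fs_yx_fsv16_offset(b, f, y, x, sizes, block_size)] = src[src_i]
    return dst


sizes = (2, 2, 2, 2)
src = list(range(1, 17))  # 16 non-zero values, so padding is easy to spot
dst = reorder_bfyx_to_b_fs_yx_fsv16(src, sizes)
print(len(dst))   # total number of elements in the blocked buffer
print(dst[:16])   # first feature block: two real values, then zero padding
```

In the printed block, only the first two entries come from the source tensor (f=0 and f=1 at b=0, y=0, x=0); the remaining 14 entries are the zero padding added by the alignment.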

All clDNN formats are specified in the inference-engine/thirdparty/clDNN/api/cldnn/runtime/tensor.hpp file. Most of the formats there follow the notation above.
