# 3D Convolution

We covered 2D convolution earlier as the fundamental recurring operation in today's deep learning stacks for image processing and vision analytics. Deep learning is "deep" because it organizes the flow of data into multiple layers ordered one after another. Typical deep learning networks are simple linear pipelines, but others can be arbitrary acyclic graphs. In all these cases, the convolution in any layer can be thought of as a batch of multiple 2D convolutions, or a 3D convolution. We still operate on 2D images and 2D kernels, but we compute several of them in batched fashion. This is good, as it represents another easy form of parallelism that we can readily exploit. But it is also challenging when we consider the storage requirements that the third dimension imposes on the kernels as well as the resulting image maps.

If we package up a 2D convolution in the **convolve2D** method we saw earlier, it is possible to represent the 3D convolution computation as shown below: 

In [None]:
// loop over all the output maps of the convolution layer
for (int out_map = 0; out_map < M; out_map++) {
    // loop over all the input maps (once for each output map)
    for (int in_map = 0; in_map < N; in_map++) {
        // convolve each input map with its 2D kernel and accumulate the
        // result in output_maps[out_map]; the 2D filter kernels are stored
        // in the kernel[][] structure.
        convolve2D(input_maps[in_map], kernel[in_map][out_map], output_maps[out_map]);
    }
}

While the code above looks simple, it performs M × N 2D convolutions per layer, which implies a high operation count along with significant memory bandwidth and storage costs. For an efficient mapping, we must carefully reason about which loop to parallelize and how to parallelize it. We can visually represent this operation in the figure shown below.

![](convolve3d.1.png) 
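
To make the cost of the loop nest concrete, here is a minimal, self-contained sketch in C. It is only an illustration under stated assumptions: the sizes `H`, `W`, `K`, `OH` and `OW` are hypothetical values chosen for the example, and the body of `convolve2D` is a stand-in written for this sketch (no padding, stride 1, no kernel flip, accumulating into its output), which may differ from the version covered earlier. The helpers `convolve3D`, `mac_count` and `weight_count` are likewise names introduced here.

```c
/* Hypothetical sizes for illustration only: N input maps, M output maps,
 * H x W input images and K x K kernels. */
enum { N = 2, M = 3, H = 4, W = 4, K = 3 };
/* Output size with no padding and stride 1 (an assumption of this sketch). */
enum { OH = H - K + 1, OW = W - K + 1 };

/* Assumed stand-in for convolve2D: slides the kernel over the input without
 * flipping it and ADDS the result into out, so repeated calls accumulate. */
void convolve2D(float in[H][W], float k[K][K], float out[OH][OW]) {
    for (int r = 0; r < OH; r++) {
        for (int c = 0; c < OW; c++) {
            float acc = 0.0f;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += in[r + i][c + j] * k[i][j];
            out[r][c] += acc;
        }
    }
}

/* The 3D convolution loop nest from the text, with the output cleared first.
 * Iterations of the out_map loop write disjoint output maps, while iterations
 * of the in_map loop all accumulate into the same output map. */
void convolve3D(float input_maps[N][H][W], float kernel[N][M][K][K],
                float output_maps[M][OH][OW]) {
    for (int out_map = 0; out_map < M; out_map++) {
        for (int r = 0; r < OH; r++)
            for (int c = 0; c < OW; c++)
                output_maps[out_map][r][c] = 0.0f;
        for (int in_map = 0; in_map < N; in_map++)
            convolve2D(input_maps[in_map], kernel[in_map][out_map],
                       output_maps[out_map]);
    }
}

/* Multiply-accumulate operations performed by one call to convolve3D. */
long mac_count(void) { return (long)M * N * OH * OW * K * K; }

/* Number of kernel weights that must be stored for the layer. */
long weight_count(void) { return (long)M * N * K * K; }
```

With these sizes, filling every input and every kernel with 1.0 makes each output element equal to N * K * K. Both counts scale with the product M * N, which is why the third dimension dominates the operation count and the kernel storage as the number of maps grows.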