# A Basic Convolutional NN
Let's generalize an image object based on the figure below. 
![BasicConvo](BasicConvo.png "Convolutional Definition")

In [1]:
import tensorflow as tf
import numpy as np

Each image in this example has multiple channels (RGB) and is 28x28 pixels, so every pixel carries 3 intensities (R, G, B) instead of a single grayscale value. Let's say there are 4 images (batch size = 4). The placeholder then looks like this:

In [2]:
n_of_input_images=4

x = tf.placeholder(tf.float32, [None, 28, 28, 3])

`None` above leaves the batch dimension (the number of images) unspecified, so a batch of any size can be fed in. The other dimensions are the image size (28x28 pixels) and the 3 RGB values per pixel.

Now we want to apply 4x4 filters across these channels (RGB), so each filter needs 3 weight matrices (4x4), one for each input channel. We also want 2 filters (output channels). The weight variable therefore has shape [4, 4, 3, 2] (filter size = 4x4, input channels = 3, output channels, i.e. the depth of the output volume, = 2).

We're going to initialize all of these weights to 1 (instead of random values), just to make the computations easy to check by hand.

The biases are also initialized to 1, one for each output channel. All neurons in a given output channel share the same bias.

In [13]:
W = tf.Variable(tf.ones([4, 4, 3, 2]))
b = tf.Variable(tf.ones([2]))
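
To make the layout of the weight tensor concrete, here is a minimal NumPy sketch. NumPy stands in for TensorFlow here, and the names `W_np`, `b_np`, `first_filter`, and `n_params` are illustrative only, not part of the original notebook. It shows that slicing the last axis yields one 4x4x3 filter, and it counts the trainable parameters.

```python
import numpy as np

# Same shapes as W and b above:
# [filter_height, filter_width, in_channels, out_channels]
W_np = np.ones((4, 4, 3, 2), dtype=np.float32)
b_np = np.ones(2, dtype=np.float32)

# Slicing the last axis gives the filter for one output channel:
# three 4x4 weight matrices, one per input channel (R, G, B).
first_filter = W_np[:, :, :, 0]
print(first_filter.shape)  # (4, 4, 3)

# Total trainable parameters: 4*4*3*2 weights plus 2 biases.
n_params = W_np.size + b_np.size
print(n_params)  # 98
```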

Let's define the TF conv2d operation. The strides `[1, 1, 1, 1]` move the filter one pixel at a time in the x and y directions, and `'VALID'` means no padding (see [this](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)).

For the 'VALID' padding, the output height and width are computed as:

1. out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
2. out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
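
As a quick check of the two formulas above, the following sketch plugs in the numbers from this example (28x28 input, 4x4 filter, stride 1). The helper name `valid_output_size` is hypothetical and only used for illustration.

```python
import math

def valid_output_size(in_size, filter_size, stride):
    """Output length along one spatial axis for 'VALID' padding."""
    return math.ceil((in_size - filter_size + 1) / stride)

# 28x28 input, 4x4 filter, stride 1 in both directions
out_height = valid_output_size(28, 4, 1)
out_width = valid_output_size(28, 4, 1)
print(out_height, out_width)  # 25 25
```

This matches the 25x25 spatial size in the tensor shape reported for `m`.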

In [27]:
m = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='VALID') + b
m

<tf.Tensor 'add_16:0' shape=(?, 25, 25, 2) dtype=float32>

Let's create the session object and evaluate m by feeding x with all values set to one (4 images, matching n_of_input_images).

In [7]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())  # Initialize W and b
z = sess.run(m, feed_dict={x:np.ones([n_of_input_images, 28, 28, 3])})
z.shape

(4, 25, 25, 2)

Each element should be 49. Let's verify this: for one channel, a 4x4 weight matrix of all ones applied to a 4x4 input patch of all ones gives a weighted sum of 16. With 3 channels (so three such weighted sums) we get 16 + 16 + 16 = 48, and finally adding the bias of 1 gives 49.
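
The same arithmetic can be cross-checked without TensorFlow. The following is a naive NumPy sketch of a 'VALID' convolution for NHWC inputs and [height, width, in_channels, out_channels] filters. It is an illustration under those layout assumptions, not how `tf.nn.conv2d` is implemented, and the names `conv2d_valid`, `x_np`, `w_np`, `b_np`, and `z_np` are made up for this example.

```python
import numpy as np

def conv2d_valid(x, w, b, stride=1):
    """Naive 'VALID' convolution: x is NHWC, w is [f_h, f_w, in_c, out_c]."""
    n, in_h, in_w, in_c = x.shape
    f_h, f_w, w_in_c, out_c = w.shape
    assert in_c == w_in_c, "input channels must match the filter"
    out_h = (in_h - f_h) // stride + 1
    out_w = (in_w - f_w) // stride + 1
    out = np.empty((n, out_h, out_w, out_c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Patch under the filter for every image: (n, f_h, f_w, in_c)
            patch = x[:, r:r + f_h, c:c + f_w, :]
            # Weighted sum over height, width and input channels -> (n, out_c)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out + b  # one bias per output channel, broadcast over positions

x_np = np.ones((4, 28, 28, 3), dtype=np.float32)  # 4 all-ones RGB images
w_np = np.ones((4, 4, 3, 2), dtype=np.float32)    # all-ones filters
b_np = np.ones(2, dtype=np.float32)               # all-ones biases

z_np = conv2d_valid(x_np, w_np, b_np)
print(z_np.shape)       # (4, 25, 25, 2)
print(np.unique(z_np))  # [49.]
```

Every output position sees 4 * 4 * 3 = 48 ones multiplied by weights of 1, plus the bias of 1, which is where the 49 comes from.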

In [24]:
z

array([[[[ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.],
         ..., 
         [ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.]],

        [[ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.],
         ..., 
         [ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.]],

        [[ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.],
         ..., 
         [ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.]],

        ..., 
        [[ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.],
         ..., 
         [ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.]],

        [[ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.],
         ..., 
         [ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.]],

        [[ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.],
         ..., 
         [ 49.,  49.],
         [ 49.,  49.],
         [ 49.,  49.]]],


       [[[ 49.,  49.],
         [ 49.,  49.],
        

The output holds 2 activation maps (or output channels), each 25x25, for each of the n_of_input_images input images.