[Last time](https://github.com/opetrova/TweakingInception/blob/master/GettingStarted.ipynb) we went over loading the pre-trained [Inception5h model](https://arxiv.org/abs/1409.4842) and running it on our own input images in excruciating detail. First, let's go ahead and load the model:

In [1]:
import tensorflow as tf
import numpy as np

model_fn = 'tensorflow_inception_graph.pb'
mygraph = tf.Graph()
sess = tf.InteractiveSession(graph=mygraph)

# Read the serialized graph definition from disk
with tf.gfile.GFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Feed our own placeholder into the model's 'input' node;
# imported ops get the default 'import/' name prefix
t_input = tf.placeholder(np.float32, name='input')
tf.import_graph_def(graph_def, {'input': t_input})

# Collect the names of all convolution ops in the imported graph
layers = [op.name for op in mygraph.get_operations() if op.type=='Conv2D' and 'import/' in op.name]
print('Number of Conv2D layers: ', len(layers))

Number of Conv2D layers:  59


Andrew Ng has an excellent course on [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks) on Coursera. If you just want to know what convolution layers are, the [second video of the course](https://www.coursera.org/lecture/convolutional-neural-networks/edge-detection-example-4Trod) goes through an example of a convolution operation starting around 01:20. Basically, you cover a part of your input image with a *filter*, multiply the overlapping numbers element by element, and add the products up to get a single "output" number (both the input image and the filter are nothing but arrays of numbers). Then you slide the filter to a different part of the image and repeat until you've dragged the filter all over the (say, two-dimensional) input and produced the two-dimensional output.
Naturally, the filter should be smaller than your input image: it could be 3x3, 5x5, etc. (even 1x1, which is very useful for some very specific purposes that I might discuss some other time). The thing about filters is that different filters detect different things: a vertical edge in the input, a horizontal line, a wavy squiggle at 45 degrees, and so on. Roughly speaking, if the input contains what the filter is looking for, the output is maximized. As a soon-to-be-former-physicist, I like convolutional NNs because they make use of the [translational invariance](https://en.wikipedia.org/wiki/Translational_symmetry) property: if you are interested in whether the input image contains a cat, you don't care *where* the cat is located in the image. That's why filters that *slide* over the image come in handy!
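To make the sliding picture concrete, here is a minimal NumPy sketch of the operation described above. It is not how TensorFlow implements `Conv2D`; the function name `slide_filter`, the toy 6x6 image, and the hand-written vertical-edge filter are all made up for illustration, and the sketch assumes no padding and a stride of one.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Cover a patch of the image with the filter,
            # multiply element by element, and sum to one output number
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy image: bright left half, dark right half, so there is a vertical edge in the middle
image = np.zeros((6, 6))
image[:, :3] = 1.0

# Hand-written filter that responds to bright-to-dark vertical edges
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])

response = slide_filter(image, vertical_edge)
print(response)
```

Every row of `response` comes out as `[0, 3, 3, 0]`: the output is largest where the filter sits on top of the edge and zero where the image is uniform.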

Continuing [my hand-waving guide to deep learning](https://www.olgapaints.net/blog/2018/7/27/inception-getting-started), let's see what the computer is actually learning in this scenario. When you start showing it images, at first the computer has no idea what kind of features to look for, so initially the filters are just matrices populated with random numbers (this should start to sound familiar). In an image classification problem, the network classifies each image *(input)* as belonging to a particular category *(output)*. While training, the computer compares its output to the correct answers, and adjusts the *trainable variables* **W** to get its output(s) closer to those correct values.
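The "start random, compare to the correct answers, adjust **W**" loop can be sketched in a few lines of NumPy. This is a deliberately simplified stand-in for real training: instead of classifying images, it fits a single 3x3 filter to numeric targets with plain gradient descent on a mean squared error. The hidden `true_filter`, the random patches, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "correct answers" come from a hidden 3x3 filter (a vertical-edge detector here)
true_filter = np.array([[1.0, 0.0, -1.0]] * 3)

# 200 random 3x3 image patches, each flattened to 9 numbers
patches = rng.normal(size=(200, 9))
targets = patches @ true_filter.ravel()

# The trainable variables W start out as random numbers
W = rng.normal(size=9)

def loss(W):
    """Mean squared difference between the filter's output and the correct answers."""
    return np.mean((patches @ W - targets) ** 2)

initial_loss = loss(W)
learning_rate = 0.1
for step in range(200):
    error = patches @ W - targets                      # compare output to correct answers
    gradient = 2.0 * patches.T @ error / len(patches)  # direction that increases the loss
    W -= learning_rate * gradient                      # nudge W the opposite way

final_loss = loss(W)
print(initial_loss, final_loss)
print(W.reshape(3, 3).round(2))
```

After training, the learned `W` should closely match `true_filter`, even though it started as noise. A real network does the same thing with many filters at once and a classification loss.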

In convolutional layers, **W** are the filters: for a 3x3 filter, there is a total of nine values the computer will learn and later use to classify new input images it has not seen before. In a given layer, there are often multiple filters of the same size that detect different features (e.g., one for vertical and one for horizontal lines, etc.). Let's see what kinds of convolutional layers the Inception network is made up of:
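As a quick illustration of several same-sized filters living in one layer, the sketch below stacks two hand-written 3x3 filters into one array and counts the values that would be trainable. The filters, the toy image, and the helper `apply_bank` are hypothetical, and the input is assumed to have a single channel (a real layer also has one set of filter values per input channel).

```python
import numpy as np

# Hypothetical bank of two 3x3 filters for a single-channel input
vertical = np.array([[1.0, 0.0, -1.0]] * 3)   # responds to vertical edges
horizontal = vertical.T                        # responds to horizontal edges
filters = np.stack([vertical, horizontal], axis=-1)  # shape (3, 3, 2)

# Nine values per filter, two filters: 18 trainable numbers in this toy layer
n_trainable = filters.size
print(n_trainable)

def apply_bank(image, filters):
    """Slide every filter in the bank over the image (no padding, stride 1)."""
    kh, kw, n_filters = filters.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n_filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Multiply the patch by each filter and sum over the 3x3 window
            out[i, j] = np.tensordot(patch, filters, axes=([0, 1], [0, 1]))
    return out

# Toy image with a horizontal edge: bright top two rows, dark below
image = np.zeros((5, 5))
image[:2, :] = 1.0

maps = apply_bank(image, filters)
print(maps[..., 0])  # response of the vertical-edge filter
print(maps[..., 1])  # response of the horizontal-edge filter
```

Each filter produces its own output map: here only the horizontal-edge filter responds, since the toy image contains no vertical edges.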

In [4]:
import re

# Keep only the layer name between the 'import/' prefix and the '/conv' suffix
layers = [re.search(r'import/(.+?)/conv', name).group(1) for name in layers]

print(layers)

['conv2d0_pre_relu', 'conv2d1_pre_relu', 'conv2d2_pre_relu', 'mixed3a_1x1_pre_relu', 'mixed3a_3x3_bottleneck_pre_relu', 'mixed3a_3x3_pre_relu', 'mixed3a_5x5_bottleneck_pre_relu', 'mixed3a_5x5_pre_relu', 'mixed3a_pool_reduce_pre_relu', 'mixed3b_1x1_pre_relu', 'mixed3b_3x3_bottleneck_pre_relu', 'mixed3b_3x3_pre_relu', 'mixed3b_5x5_bottleneck_pre_relu', 'mixed3b_5x5_pre_relu', 'mixed3b_pool_reduce_pre_relu', 'mixed4a_1x1_pre_relu', 'mixed4a_3x3_bottleneck_pre_relu', 'mixed4a_3x3_pre_relu', 'mixed4a_5x5_bottleneck_pre_relu', 'mixed4a_5x5_pre_relu', 'mixed4a_pool_reduce_pre_relu', 'mixed4b_1x1_pre_relu', 'mixed4b_3x3_bottleneck_pre_relu', 'mixed4b_3x3_pre_relu', 'mixed4b_5x5_bottleneck_pre_relu', 'mixed4b_5x5_pre_relu', 'mixed4b_pool_reduce_pre_relu', 'mixed4c_1x1_pre_relu', 'mixed4c_3x3_bottleneck_pre_relu', 'mixed4c_3x3_pre_relu', 'mixed4c_5x5_bottleneck_pre_relu', 'mixed4c_5x5_pre_relu', 'mixed4c_pool_reduce_pre_relu', 'mixed4d_1x1_pre_relu', 'mixed4d_3x3_bottleneck_pre_relu', 'mixed4d_3

In [None]:
sess.close()