## Tensors

* constant tensor `tf.constant()`
    * Value of tensor never changes, hence *constant*.
    * `tf.constant(1234)` is a 0-dimensional int32 tensor
    * `tf.constant([1,2,3,4])` is a 1-dimensional int32 tensor with 4 elements (shape `(4,)`)
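
The number of dimensions (rank) is easy to confuse with the number of elements. As a rough analogy only, the sketch below uses NumPy arrays instead of TensorFlow tensors; `ndim` and `shape` are NumPy attributes, and the variable names are made up for illustration.

```python
import numpy as np

scalar = np.array(1234)              # analogous to tf.constant(1234)
vector = np.array([1, 2, 3, 4])      # analogous to tf.constant([1, 2, 3, 4])
matrix = np.array([[1, 2], [3, 4]])  # a 2-dimensional example for comparison

for name, arr in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # ndim is the number of dimensions; shape lists the size along each one
    print(name, "ndim:", arr.ndim, "shape:", arr.shape)
```

The vector has four elements but only one dimension.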

Sample code:

In [398]:
import tensorflow as tf
import numpy as np

# Create a constant tensor named hello_constant
hello_constant = tf.constant('Hello World!')

## Session
* An environment for running a graph. In charge of allocating the operations to GPU(s) and/or CPU(s).

Continuing our example:

In [2]:
with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)

Hello World!


## Input

* `tf.placeholder()`: returns a tensor that gets its value from data passed to `tf.Session.run()`, allowing you to set the input right before the session runs.
* `feed_dict`: use the `feed_dict` parameter of `tf.Session.run()` to set the value of a placeholder tensor.

Example: 


In [3]:
a = tf.placeholder(tf.string)
b = tf.placeholder(tf.int32)
c = tf.placeholder(tf.float32)
with tf.Session() as sess:
    output = sess.run(a, feed_dict={a: 'hi', b: 23, c: 32.0})
    print(output)

hi


It also works if you feed only `{a: 'hi'}`: only the placeholders that the fetched tensor depends on need values.


## Maths

In [3]:
# Add, subtract, multiply and divide operations
# tf.sub and tf.mul have been removed (as of v1.0.1); use tf.subtract and tf.multiply
add = tf.add(5, 2) # 7
# sub = tf.sub(10, 4) # 6
# mul = tf.mul(2, 5)  # 10
sub = tf.subtract(10, 4) # 6
mul = tf.multiply(2, 5)  # 10
div = tf.div(10, 5) # 2

with tf.Session() as sess:
    output = [sess.run(add), sess.run(sub), sess.run(mul), 
              sess.run(div)]
    print(output)

[7, 6, 10, 2]


[TF Math documentation](https://www.tensorflow.org/versions/r0.11/api_docs/python/math_ops.html)

## Variables

* `tf.Variable()` creates a tensor with an initial value that can be modified later, much like a normal Python variable. This tensor stores its state in the session, so you must initialize it with `tf.global_variables_initializer()` (the replacement for the deprecated `tf.initialize_all_variables()`) before using it.


In [7]:
# Initialisation

def variables():
    output = None
    
    x = tf.Variable([1, 2, 3, 4])
    
    # Initialise all variables.
    # tf.initialize_all_variables() is deprecated ("will be removed after
    # 2017-03-02"), so tf.global_variables_initializer() is used instead.
    init = tf.global_variables_initializer()
    
    with tf.Session() as sess:
        sess.run(init)
        output = sess.run(x)
    
    return output

variables()

array([1, 2, 3, 4], dtype=int32)

In [9]:
# Logistic Regression

def logits():
    output = None
    x_data = [[1.0, 2.0], [2.5, 6.3]]
    test_weights = [[-0.3545495, -0.17928936], [-0.63093454, 0.74906588]]
    class_size = 2
    
    
    x = tf.placeholder(tf.float32, shape=(2, 2))
    weights = tf.Variable(test_weights)
    biases = tf.Variable(tf.zeros([class_size]))
    
    # Compute Wx. The bias is all zeros here, so leaving it out does not change the output.
    logits = tf.matmul(weights, x)
    
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        output = sess.run(logits, feed_dict={x: x_data})
        
    return output

logits()

array([[-0.80277288, -1.83862185],
       [ 1.24173021,  3.4572463 ]], dtype=float32)

In [38]:
# Logistic Regression, modified


output = None
x_data = np.array([1., 2.]).reshape((2, 1))
test_weights = [[1., 0.], [2., 1.]]
class_size = 2


#     x = tf.placeholder(tf.float32)
x = tf.placeholder(tf.float32, shape=(2, 1))
w = tf.Variable(test_weights)
# Column-vector bias of shape (2, 1), so it lines up with matmul(w, x)
b = tf.Variable([[0.], [0.]])
# tf.assign only creates an op; it takes effect when the op is run in a session
b_update = tf.assign(b, [[10.], [3.]])


# Implement wx + b in TensorFlow
logits = ((tf.matmul(w, x) + b) )
logits = tf.sigmoid(logits)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    output = sess.run(logits, feed_dict={x: x_data})
    print('Before update:\n{}'.format(output))
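    sess.run(b_update)  # critical: without running the op, b is never updated
    output = sess.run(logits, feed_dict={x: x_data})
    print('After update:\n{}'.format(output))
    # tf.Print returns a new tensor and prints only when that tensor is evaluated,
    # so this bare call produces no output
    tf.Print(logits, [logits], message='This is logits:')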


Before update:
[[ 0.7310586 ]
 [ 0.98201376]]
After update:
[[ 0.99998331]
 [ 0.999089  ]]


In [32]:
tf.Print(logits, [logits])

<tf.Tensor 'Print_1:0' shape=(2, 1) dtype=float32>

In [34]:
print(logits)

Tensor("Sigmoid_14:0", shape=(2, 1), dtype=float32)


In [61]:
import tensorflow as tf
import numpy as np

x = tf.Variable([1, 2])
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

print(x.eval())

# x.assign(1)
sess.run(tf.assign(x, [3, 4]))
print(x.eval())

[1 2]
[3 4]


In [57]:
a = tf.Variable([[1], [4], [10]])
b = tf.Variable(tf.zeros([3, 1], dtype=np.int32))
c = tf.Variable(tf.zeros([1, 3], dtype=np.int32))
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    output = sess.run(b + c)
    print('b = {}'.format(sess.run(b)))
    print('c = {}'.format(sess.run(c)))
    print('b+c = {}'.format(sess.run(b+c))) # uses broadcast

b = [[0]
 [0]
 [0]]
c = [[0 0 0]]
b+c = [[0 0 0]
 [0 0 0]
 [0 0 0]]


## Difference between placeholder and Variable (and Tensor)
[Link](http://stackoverflow.com/questions/36693740/whats-the-difference-between-tf-placeholder-and-tf-variable)

- A placeholder holds the input training data and the given labels. Variables hold the weights to be trained. In other words, **placeholder**s are used to train **Variable**s.
- Variables have to be initialized, while a placeholder only needs its datatype (and optionally its shape) specified.

### How about Tensor itself?
- A Tensor object is a symbolic handle to the result of an operation, but does not actually hold the values of the operation's output.
- Therefore it is impossible to check the value of a tensor without running the graph (a collection of operations on tensors).
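
The idea of a symbolic handle can be illustrated with a toy deferred-evaluation graph. This is only a sketch of the concept, not how TensorFlow is implemented: the `Node` class and the `constant`, `add`, `mul`, and `run` helpers are hypothetical names invented for this example.

```python
class Node:
    """Hypothetical stand-in for a tensor: it records how to compute a value."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add" or "mul"
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # only set for constants


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", inputs=(a, b))


def mul(a, b):
    return Node("mul", inputs=(a, b))


def run(node):
    """Hypothetical stand-in for sess.run(): evaluate a node recursively."""
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == "add" else left * right


total = add(constant(2), mul(constant(3), constant(4)))  # builds the graph only
print(total.value)  # None: nothing has been computed yet
print(run(total))   # 14: the value exists only after running the graph
```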


In [72]:
a = tf.constant([np.log(100), np.log(10)])
b = tf.nn.softmax(a)
print(tf.Session().run(a))
print(tf.Session().run(b))

[ 4.60517025  2.30258512]
[ 0.90909088  0.09090908]
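
The second output can be checked by hand. Softmax exponentiates each logit, so feeding it logarithms cancels the exponential and leaves a plain normalization (the symbol $z$ for the logit vector is introduced here for the derivation):

$$
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
z = (\ln 100,\ \ln 10) \;\Rightarrow\;
\mathrm{softmax}(z) = \left(\frac{100}{110},\ \frac{10}{110}\right) \approx (0.909,\ 0.091)
$$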


## Softmax

Turns logits into probabilities that sum to 1.
* `tf.nn.softmax()`.

Example of how it works:

```
# logits is a one-dimensional array with 3 elements
logits = [1.0, 2.0, 3.0]
# softmax will return a one-dimensional array with 3 elements
print(softmax(logits))

[ 0.09003057  0.24472847  0.66524096]

# logits is a two-dimensional array
logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
# softmax will return a two-dimensional array with the same shape
# (in this example each column, i.e. the first axis, is normalized to sum to 1)
print(softmax(logits))


[
    [ 0.09003057  0.00242826  0.01587624  0.33333333]
    [ 0.24472847  0.01794253  0.11731043  0.33333333]
    [ 0.66524096  0.97962921  0.86681333  0.33333333]
]
```

In [161]:
# Softmax function in raw Python (with NumPy)

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    # S(y_i) = e**(y_i) / sum_over_j(e**(y_j))
    # Sum over the last dimension; keepdims=True keeps that axis so the division broadcasts
    return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)
    # To sum over the first dimension instead:
    # return np.exp(x) / np.sum(np.exp(x), axis=0)

In [162]:
logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
softmax(logits)

array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00446236,  0.66227241,  0.24363641,  0.08962882]])

That's some elegant Numpy code.

In [165]:
# Softmax with TF

import tensorflow as tf


def run():
    output = None
    logit_data = np.log([10.0, 1.0, 0.1])
    logits = tf.placeholder(tf.float32)
    
    # ToDo: Calculate the softmax of the logits
    softmax = tf.nn.softmax(logits)    
    
    with tf.Session() as sess:
        # ToDo: Feed in the logits data
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output
run()

array([ 0.9009009 ,  0.09009008,  0.00900901], dtype=float32)

### Scaling and Softmax
* When you divide all the logits by e.g. 10, the probabilities get closer to the uniform distribution.
* When you multiply all the logits by e.g. 10, the probabilities get closer to 0.0 or 1.0.
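
The effect of scaling can be seen numerically. The sketch below is a NumPy illustration rather than the TensorFlow op; it subtracts the maximum logit before exponentiating, a common numerical-stability trick that is not used in the notebook's own `softmax` but leaves the result unchanged.

```python
import numpy as np


def softmax(x):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)


logits = np.array([1.0, 2.0, 3.0])
base = softmax(logits)
flatter = softmax(logits / 10)  # smaller logits: closer to uniform
sharper = softmax(logits * 10)  # larger logits: closer to 0.0 or 1.0

for name, probs in [("logits", base), ("logits / 10", flatter), ("logits * 10", sharper)]:
    print(name, np.round(probs, 4))
```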

## One-Hot Encodings
* Vectors with one 1.0 and 0.0 everywhere else.
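
A minimal NumPy sketch of one-hot encoding integer class labels follows. The `one_hot` helper and the sample labels are made up for illustration.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Return an array of shape (len(labels), n_classes) with a single 1.0 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, n_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.0
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


encoded = one_hot([7, 3, 4], n_classes=10)
print(encoded)
print(np.argmax(encoded, axis=1))  # argmax recovers the original labels
```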


## ReLUs: f(x) = max(0,x)
*Adding nonlinearities*

A rectified linear unit (ReLU) is a type of **activation function** that is defined as `f(x) = max(0, x)`. The function returns 0 if `x` is negative, otherwise it returns `x`. TensorFlow provides the ReLU function as `tf.nn.relu()`, as shown below.

![](images/relu.png)

In [None]:
# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)

# Output layer (each layer has its own weights and biases)
output = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

The above code applies the `tf.nn.relu()` function to the `hidden_layer`, effectively turning off any negative values and acting like an on/off switch. Adding additional layers, like the output layer, after an activation function turns the model into a nonlinear function. This nonlinearity allows the network to solve more complex problems.
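
What ReLU does to individual values can be shown with a small NumPy sketch. The `relu` helper and the sample values are illustrative and are not part of the TensorFlow code.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)


pre_activation = np.array([[-2.0, -0.5, 0.0, 1.5, 3.0]])
activated = relu(pre_activation)
print(activated)  # negatives become 0, non-negative values pass through unchanged
```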



It's interesting how you just add `hidden_layer=tf.nn.relu(hidden_layer)`.

In [168]:
def run():
    output = None
    hidden_layer_weights = [
        [0.1, 0.2, 0.4],
        [0.4, 0.6, 0.6],
        [0.5, 0.9, 0.1],
        [0.8, 0.2, 0.8]]
    out_weights = [
        [0.1, 0.6],
        [0.2, 0.1],
        [0.7, 0.9]]

    # Weights and biases
    weights = [
        tf.Variable(hidden_layer_weights),
        tf.Variable(out_weights)]
    biases = [
        tf.Variable(tf.zeros(3)),
        tf.Variable(tf.zeros(2))]

    # Input
    features = tf.Variable([[1.0, 2.0, 3.0, 4.0], 
                            [-1.0, -2.0, -3.0, -4.0], 
                            [11.0, 12.0, 13.0, 14.0]])

    # Model
    hidden_layer = tf.matmul(features, weights[0]) + biases[0]
    # ToDo: Apply activation using a single Relu
    hidden_layer = tf.nn.relu(hidden_layer)
    logits = tf.matmul(hidden_layer, weights[1]) + biases[1]

    # Calculate logits
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output = sess.run(logits)

    return output


In [169]:
run()

array([[  5.11000013,   8.44000053],
       [  0.        ,   0.        ],
       [ 24.01000214,  38.23999786]], dtype=float32)
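
The result can be cross-checked without TensorFlow. The NumPy sketch below repeats the same forward pass with the same weights and features; the biases are all zeros in the example, so they are left out.

```python
import numpy as np

hidden_layer_weights = np.array([
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]])
out_weights = np.array([
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]])
features = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [-1.0, -2.0, -3.0, -4.0],
    [11.0, 12.0, 13.0, 14.0]])

# Hidden layer: linear step followed by ReLU
hidden = np.maximum(0, features @ hidden_layer_weights)
# Output layer: linear step only
logits = hidden @ out_weights
print(logits)
```

The second row is all zeros because every hidden unit is negative for that sample, so ReLU switches all of them off.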

## DNN in TensorFlow

In [170]:
import tensorflow as tf

### Import data

In [188]:
help(tf.zeros)
# tf.Session().run(tf.zeros((4, 3)))

Help on function zeros in module tensorflow.python.ops.array_ops:

zeros(shape, dtype=tf.float32, name=None)
    Creates a tensor with all elements set to zero.
    
    This operation returns a tensor of type `dtype` with shape `shape` and
    all elements set to zero.
    
    For example:
    
    ```python
    tf.zeros([3, 4], tf.int32) ==> [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    ```
    
    Args:
      shape: Either a list of integers, or a 1-D `Tensor` of type `int32`.
      dtype: The type of an element in the resulting `Tensor`.
      name: A name for the operation (optional).
    
    Returns:
      A `Tensor` with all elements set to zero.



In [172]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True)
# Udacity version included reshape=False but this got the 
# 'unexpected keyword' error

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./t10k-labels-idx1-ubyte.gz


### Quick peep into data

In [228]:
%matplotlib inline
import pandas as pd
pd.DataFrame(mnist.train._labels).head()


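|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |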


In [226]:
mnist.train._images.shape

(55000, 784)

### Learning parameters

In [267]:
# Learning Parameters
## Usually we have to find these.
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28), i.e., input feature dimension
n_classes = 10  # MNIST total classes (0-9 digits)

### Hidden layer width

In [232]:
n_hidden_layer = 256 # layer number of features (width of a layer)

### Weights and biases

**Note:** 
Every matrix has (sample x feature) shape, or (input x output) shape for a weight matrix.
- Every row is a sample vector (input)
- Every column is a feature vector (output)

Broadcast rules:
```
m samples, k features
(m, k) + (m,) --> (m, k) + (1, m) --> broadcast failure!
(m, k) + (k,) --> (m, k) + (1, k) --> broadcast success!
```
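
The same rules can be tried out in NumPy, whose broadcasting behaves the same way for these cases. The array names are illustrative, and the failure case assumes `m != k` (when `m == k` the shapes happen to line up).

```python
import numpy as np

m, k = 4, 3  # m samples, k features (m != k)
activations = np.ones((m, k))

# (m, k) + (k,): the bias is treated as (1, k) and repeated for every row
bias_per_feature = np.arange(k)
ok_shape = (activations + bias_per_feature).shape
print(ok_shape)

# (m, k) + (m,): treated as (1, m), which cannot be matched against k columns
bias_per_sample = np.arange(m)
broadcast_failed = False
try:
    activations + bias_per_sample
except ValueError as err:
    broadcast_failed = True
    print("broadcast failure:", err)

# An explicit column vector of shape (m, 1) does broadcast across the columns
column_shape = (activations + bias_per_sample.reshape(m, 1)).shape
print(column_shape)
```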

In [255]:
# Store layers weight & bias
# random initialization
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

### Input

In [270]:
# tf Graph input
# the number of samples (batch size) is not fixed in advance, hence `None`
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
x_flat = x

### Shape
- `tf.shape()` creates an op. It has to be run in a session to return the actual (dynamic) shape of the tensor.
- `x.get_shape()` returns the static shape known when the graph is built, without running anything.

In [299]:
tf.shape(weights['hidden_layer']) # returns an op; its own shape (2,) means the tensor is 2-D

<tf.Tensor 'Shape_56:0' shape=(2,) dtype=int32>

In [285]:
tf.Session().run(tf.shape(weights['hidden_layer']))

array([784, 256], dtype=int32)

In [292]:
((weights['hidden_layer'])._variable).shape

TensorShape([Dimension(784), Dimension(256)])

In [290]:
weights['hidden_layer'].get_shape()

TensorShape([Dimension(784), Dimension(256)])

In [298]:
# convert to normal python shape
weights['hidden_layer'].get_shape().as_list()

[784, 256]

In [248]:
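# tf Graph input (variant 1: each image as a 784 x 1 column)
x = tf.placeholder("float", [None, 784, 1])
y = tf.placeholder("float", [None, n_classes])
x_flat = x

# tf Graph input (variant 2: each image as 28 x 28 x 1; these definitions override the ones above)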
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

# The MNIST data is made up of 28px by 28px images with a single channel. 
# `tf.reshape` reshapes a batch of 28*28 pixels, x, to a batch of 784 pixels.
x_flat = tf.reshape(x, [-1, n_input])

### Multilayer Perceptron

In [345]:
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.matmul(layer_1, weights['out']) + biases['out']

*So we're putting ReLUs in between layers with weights and biases in them to allow for more complexity. Here we have two layers sandwiching a ReLU.*

In [346]:
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
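
The cost above applies softmax to the logits, takes the cross-entropy against the one-hot labels for each sample, and then averages over the batch. A NumPy sketch of that computation follows; the `softmax_cross_entropy` helper and the sample values are illustrative, and the log-sum-exp form is an assumption made here for numerical stability, not a claim about TensorFlow's internals.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-sample cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    # log(softmax(z)) = z - log(sum(exp(z))), computed on the shifted logits
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(labels * log_probs, axis=-1)


logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

per_sample = softmax_cross_entropy(logits, labels)
cost = np.mean(per_sample)  # plays the role of tf.reduce_mean(...)
print(per_sample, cost)
```

For the second sample the logits are uniform, so the loss is `-log(1/3) = log(3)` regardless of which class is correct.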

In [347]:
# batch_x and batch_y come from mnist.train.next_batch(batch_size), as in the training loop below
print(batch_size)
print(batch_x.shape)
print(batch_y.shape)

100
(100, 784)
(100, 10)


In [404]:
# Initializing the variables
init = tf.global_variables_initializer()


display_step = 1

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        total_c = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
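            # Run optimization op (backprop) and cost op (to get loss value).
            # batch_x is already flat (batch_size, 784), so it is fed to x_flat directly,
            # bypassing the reshape of x
            _, c = sess.run([optimizer, cost], feed_dict={x_flat: batch_x, y: batch_y})
            total_c += c
        if (epoch % display_step == 0):
            # c is the cost of the last batch only, which is why the log is noisy;
            # total_c / total_batch would give the epoch average
            print('epoch\t{}\tcost\t{:.3f}'.format(epoch, c))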

epoch	0	cost	35.951
epoch	1	cost	24.361
epoch	2	cost	17.062
epoch	3	cost	14.131
epoch	4	cost	15.133
epoch	5	cost	12.878
epoch	6	cost	7.694
epoch	7	cost	9.774
epoch	8	cost	11.218
epoch	9	cost	18.743
epoch	10	cost	6.224
epoch	11	cost	5.963
epoch	12	cost	6.953
epoch	13	cost	8.625
epoch	14	cost	7.692


In [358]:
'''
A Multilayer Perceptron implementation example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))


Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 183.440410611
Epoch: 0002 cost= 40.923661626
Epoch: 0003 cost= 25.367347529
Epoch: 0004 cost= 17.637425760
Epoch: 0005 cost= 12.756124606
Epoch: 0006 cost= 9.527275549
Epoch: 0007 cost= 6.966637792
Epoch: 0008 cost= 5.269151712
Epoch: 0009 cost= 3.955956824
Epoch: 0010 cost= 2.966537620
Epoch: 0011 cost= 2.245004738
Epoch: 0012 cost= 1.623642844
Epoch: 0013 cost= 1.213023317
Epoch: 0014 cost= 1.068061601
Epoch: 0015 cost= 0.871373281
Optimization Finished!
Accuracy: 0.9475
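
The accuracy computation at the end of the script compares the predicted class (the argmax of the logits) with the true class (the argmax of the one-hot label) and averages the matches. A NumPy sketch on made-up data:

```python
import numpy as np

# Made-up logits for 4 samples and 3 classes
pred = np.array([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.1]])
# One-hot labels: the true classes are 1, 0, 2, 1
labels = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0]])

# Mirrors tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
correct_prediction = np.argmax(pred, axis=1) == np.argmax(labels, axis=1)
# Mirrors tf.reduce_mean(tf.cast(correct_prediction, "float"))
accuracy = np.mean(correct_prediction.astype(np.float32))
print(accuracy)
```

Three of the four predictions match, so the accuracy is 0.75.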


## Dropout

Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all their incoming and outgoing connections, as shown in Figure 1.
*The units to drop are chosen at random on every training step; they are not selected based on which ones give better results.*

`tf.nn.dropout()`


In [None]:
keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

The first parameter, `hidden_layer`, is the tensor that is regularized using dropout.

The second parameter, `keep_prob`, is the probability of keeping (i.e. not dropping) any given unit.

`keep_prob` allows you to adjust the number of units to drop. In order to compensate for dropped units, `tf.nn.dropout()` multiplies all units that are kept (i.e. not dropped) by `1/keep_prob`.

During training, a good starting value for `keep_prob` is 0.5.

During testing, use a `keep_prob` value of 1.0 to keep all units and maximize the power of the model.
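
The scaling by `1/keep_prob` can be illustrated with a NumPy sketch of the same idea. This is not TensorFlow's implementation; the `dropout` helper, the random generator argument, and the fixed seed are assumptions made for this example.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = rng.random(x.shape) < keep_prob  # True where a unit is kept
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
hidden = np.ones((1000, 100))

dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(np.unique(dropped))  # kept units are scaled to 1 / 0.5 = 2.0, dropped units are 0.0
print(dropped.mean())      # close to 1.0: the expected activation is preserved

unchanged = dropout(hidden, keep_prob=1.0, rng=rng)
print(np.array_equal(unchanged, hidden))  # keep_prob = 1.0 keeps every unit
```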

Added:
* `hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)`
* the `feed_dict` portion of `output = sess.run(logits, feed_dict={keep_prob: 0.5})`

In [410]:
import tensorflow as tf


def run():
    output = None
    hidden_layer_weights = [
        [0.1, 0.2, 0.4],
        [0.4, 0.6, 0.6],
        [0.5, 0.9, 0.1],
        [0.8, 0.2, 0.8]]
    out_weights = [
        [0.1, 0.6],
        [0.2, 0.1],
        [0.7, 0.9]]

    # Weights and biases
    weights = [
        tf.Variable(hidden_layer_weights),
        tf.Variable(out_weights)]
    biases = [
        tf.Variable(tf.zeros(3)),
        tf.Variable(tf.zeros(2))]
        
    keep_prob = tf.placeholder(tf.float32)

    # Input
    features = tf.Variable([[0.0, 2.0, 3.0, 4.0], 
                            [0.1, 0.2, 0.3, 0.4], 
                            [11.0, 12.0, 13.0, 14.0]])

    # Model
    hidden_layer = tf.matmul(features, weights[0]) + biases[0]
    hidden_layer = tf.nn.relu(hidden_layer)
    # Add dropout
    hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
    
    logits = tf.matmul(hidden_layer, weights[1]) + biases[1]

    # Calculate logits
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output = sess.run(logits, feed_dict={keep_prob: 0.5})

    return output
run()  # dropout is random, so the output differs between runs

array([[  1.10000002,   6.60000038],
       [  0.11200001,   0.67200011],
       [  4.71999979,  28.31999969]], dtype=float32)

## Convolution layer
* `tf.nn.conv2d()`
    * Computes a 2-D convolution. TensorFlow uses one stride per input dimension: `[batch, input_height, input_width, input_channels]`.
* `tf.nn.bias_add()`
    * Adds a 1-D bias to the last dimension of a tensor.
    


You'll generally change only the `input_height` and `input_width` strides, leaving the `batch` and `input_channels` strides at 1. The `input_height` and `input_width` strides control how the filter moves over the input. The example code uses a stride of 2 with a 5x5 filter.



In [417]:
# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    # NHWC order [batch, height, width, channels]
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
# Filter shape is [filter_height, filter_width, in_channels, out_channels]
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
# strides must be a 1-D list of length 4 with strides[0] = strides[3] = 1.
# For the most common case of equal horizontal and vertical strides,
# strides = [1, stride, stride, 1].
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

In [415]:
print(input.get_shape())
print(weight.get_shape())
print(conv_layer.get_shape())

(?, 10, 10, 3)
(5, 5, 3, 64)
(?, 5, 5, 64)
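
The 5x5 spatial size follows from the padding rule. The sketch below uses the output-size formulas given in TensorFlow's documentation for `'SAME'` and `'VALID'` padding; the `conv_output_size` helper is a hypothetical name for this example and handles one spatial dimension at a time.

```python
import math


def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size along one dimension for 'SAME' or 'VALID' padding."""
    if padding == 'SAME':
        # The input is padded so that only the stride affects the output size
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # No padding: the filter must fit entirely inside the input
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


same_size = conv_output_size(10, 5, 2, 'SAME')
valid_size = conv_output_size(10, 5, 2, 'VALID')
print(same_size)   # 5, matching the (?, 5, 5, 64) shape above
print(valid_size)  # 3
```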


## Max Pooling

`tf.nn.max_pool()`

The image above is an example of max pooling with a 2x2 filter and stride of 2. The four 2x2 colors represent each time the filter was applied to find the maximum value.

* **Benefits of max pooling**: reduces the size of the input and allows the neural network to focus on only the most important elements.
* **Method**: Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.



In [None]:
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],    # size of window
    strides=[1, 2, 2, 1],  # size of stride
    padding='SAME')

The tf.nn.max_pool() function performs max pooling with the ksize parameter as the size of the filter and the strides parameter as the length of the stride. 2x2 filters with a stride of 2x2 are common in practice.

The ksize and strides parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). For both ksize and strides, the batch and channel dimensions are typically set to 1.
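
What a 2x2 window with a stride of 2 does to a single channel can be sketched in NumPy. The `max_pool_2x2` helper is illustrative only: it handles one 2-D array with even height and width, not the 4-D batches that `tf.nn.max_pool()` works on.

```python
import numpy as np


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    height, width = image.shape
    # Split into non-overlapping 2x2 blocks, then take the max inside each block
    blocks = image.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))


image = np.array([[1, 0, 2, 3],
                  [4, 6, 6, 8],
                  [3, 1, 1, 0],
                  [1, 2, 2, 4]])
pooled = max_pool_2x2(image)
print(pooled)
# [[6 8]
#  [3 4]]
```

Each output value is the maximum of one 2x2 block, so the height and width are both halved.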