# Chapter 6 - Deep Feedforward Networks

## Table of Contents

1. [Introduction](#intro)
2. [References](#ref)

### 2.1 Introduction and gradient-based learning <a id = "intro"></a>

#### 2.1.1 Synthesis

* Feedforward networks aim to approximate some target function
* The hidden layers transform the raw input values into features such that, in the output layer, the problem becomes linearly solvable
* Feedforward networks have no feedback connections: information flows only from the input towards the output
* Gradient-based learning: the nonlinearity of the network makes the loss functions nonconvex, so gradient-based optimizers only drive the loss to a low value (possibly a local optimum) rather than guaranteeing a global optimum
* It is important to initialize the weights to small random values, and the biases to 0 or small positive values
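The initialization advice in the last bullet can be sketched in NumPy. This is only an illustration: the standard deviation of 0.01, the bias value of 0.1, the seed, and the layer sizes are assumptions chosen for the example, not values taken from these notes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible

n_inputs, n_hidden = 2, 2  # assumed sizes, matching a small 2-input network

# Small random weights break the symmetry between hidden units:
# if all weights started equal, every unit would receive the same update.
W = rng.normal(loc=0.0, scale=0.01, size=(n_inputs, n_hidden))

# A small positive bias keeps ReLU units active at the start of training.
c = np.full(n_hidden, 0.1)

print("W =", W)
print("c =", c)
```

Starting the biases at a small positive value rather than exactly 0 is one common choice for ReLU units; either option is consistent with the bullet above.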

#### 2.1.2 Experiments

##### 2.1.2.1 Learning XOR

In [1]:
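# TensorFlow 1.x graph API (placeholders, sessions), via compat.v1 on TensorFlow 2.x
import tensorflow.compat.v1 as tf
import time
import numpy as np

tf.disable_v2_behavior()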


* The XOR function returns 1 when exactly one of the two inputs is 1, and 0 otherwise
* A hidden layer is needed because the XOR function cannot be represented as a direct linear transformation of the inputs

![Why XOR cannot be represented as a linear function](XOR_Linear_Problem.png)
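One way to see the limitation numerically is to fit the best linear model to the four XOR points by least squares. The sketch below uses NumPy's `np.linalg.lstsq`. The least-squares criterion is an assumption made for this illustration and is not the loss minimized by the network trained later in these notes.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the intercept b is fitted together with w1 and w2.
A = np.hstack([X, np.ones((4, 1))])

# Solve min ||A @ params - y||^2 for params = [w1, w2, b].
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:2], params[2]
predictions = A @ params

print("w =", w, "b =", b)
print("predictions =", predictions)
```

The fit sets both weights to zero and the intercept to 0.5, so the linear model predicts 0.5 for every input and cannot separate the two classes.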

* This is because, when x1 = 0, the model's output must *increase* as x2 increases, whereas when x1 = 1, the model's output must *decrease* as x2 increases. A linear model, however, must apply a fixed coefficient w2 to x2, so it cannot change the direction of x2's effect depending on x1
* Instead, the hidden layer maps the two inputs \[0,1\] and \[1,0\] to the same point in the learned feature space: h = \[1,0\]
* A linear model can now be applied in this space: its output increases in h1 and decreases in h2
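The feature-space argument can be checked with a hand-picked set of parameters. The values below are the well-known textbook solution for a 2-2-1 ReLU network on XOR. They are an assumption for this sketch and are not the weights learned by the training code in these notes. The output here is linear, with no sigmoid, to keep the arithmetic exact.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked parameters (not learned): hidden layer W, c and output layer w_out, b_out.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w_out = np.array([1.0, -2.0])
b_out = 0.0

# Hidden features: ReLU applied to the affine transformation of each input row.
h = np.maximum(0.0, X @ W + c)
# rows of h: [0, 0], [1, 0], [1, 0], [2, 1]

# Linear readout: increasing in h1, decreasing in h2.
y = h @ w_out + b_out

print("h =", h)
print("y =", y)  # [0, 1, 1, 0], the XOR targets
```

The inputs \[0,1\] and \[1,0\] land on the same feature vector, and the remaining three distinct points can be separated by a single linear function of h.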

![A simple network to solve the XOR problem](XOR_network.png)


**First layer**
The first layer is the input layer, which takes the 4 possible inputs, each of dimension 2

In [10]:
# First input layer
X = tf.placeholder(tf.float32, shape = [4,2], name = 'X')

# Actuals

Y = tf.placeholder(tf.float32, shape=[4,1], name = 'Y')

**Second layer**
* The second layer contains the hidden units, which compute an affine transformation of the raw inputs and then apply a ReLU nonlinearity
* An affine transformation performs a linear transformation and adds an intercept: xT \* W + c, where W and c are the parameters to be learnt; the hidden activation is h = relu(xT \* W + c)
* W is the weight matrix that performs the linear transformation on X
* With n = 2 inputs and m = 2 hidden units, W has shape 2 x 2 and is therefore square in this network
* c is a bias vector of length 2, with one entry per hidden unit
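The shapes involved in the affine transformation can be checked with a short NumPy sketch. The numeric values of `W` and `c` below are arbitrary placeholders chosen so the arithmetic is easy to follow. The point is only that a length-2 bias is broadcast across all 4 input rows.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # shape (4, 2)
W = np.arange(4, dtype=float).reshape(2, 2)  # placeholder weights [[0, 1], [2, 3]]
c = np.array([10.0, 20.0])                   # placeholder bias, shape (2,)

# (4, 2) @ (2, 2) -> (4, 2); the bias c is added to every row by broadcasting.
pre_activation = X @ W + c

print(pre_activation.shape)
print(pre_activation)
```

Because the same `c` is added to every row, the bias is shared by all examples and has one entry per hidden unit, independent of how many examples are in the batch.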

In [11]:
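# Hidden layer parameters: W is 2 x 2 (inputs x hidden units)
W = tf.Variable(tf.truncated_normal([2,2]), name = 'W')
# NOTE: shape [4,2] gives each of the 4 training examples its own bias row;
# a bias shared across examples, as described above, would have shape [2]
c = tf.Variable(tf.truncated_normal([4,2]), name = 'c')

# Output layer parameters: wt is 2 x 1 (hidden units x outputs)
wt = tf.Variable(tf.zeros([2,1]), name = 'w')
# NOTE: likewise, shape [4,1] is one bias per example; a shared bias would have shape [1]
b = tf.Variable(tf.zeros([4,1]), name = 'b')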

In [12]:
## Defining the layers

with tf.name_scope("hidden_layer") as scope:
    h = tf.nn.relu(tf.add(tf.matmul(X,W), c))
    
with tf.name_scope("output") as scope:
    y_estimated = tf.sigmoid(tf.add(tf.matmul(h, wt), b))
    
with tf.name_scope("loss") as scope:
    loss = tf.reduce_mean(tf.squared_difference(y_estimated, Y))
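As a sanity check on the loss definition, the starting loss can be worked out by hand. Because `wt` and `b` are initialized to zeros, the output pre-activation is 0 for every example regardless of the hidden features, so every prediction is sigmoid(0) = 0.5. The plain-Python sketch below reproduces that calculation. The helper `sigmoid` is defined here for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function, the same nonlinearity applied in the output layer."""
    return 1.0 / (1.0 + math.exp(-z))

targets = [0.0, 1.0, 1.0, 0.0]  # XOR outputs

# With zero output weights and zero output bias, the pre-activation is 0 for every example.
predictions = [sigmoid(0.0) for _ in targets]

# Mean squared error, matching reduce_mean(squared_difference(...)).
loss = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

print(predictions, loss)
```

The result is 0.25, which is consistent with the loss of about 0.2498 logged at epoch 0, a value recorded after the first gradient step.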


In [14]:
# Training

with tf.name_scope("train") as scope:
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    
INPUT_XOR = [[0,0],[0,1],[1,0],[1,1]]
OUTPUT_XOR = [[0],[1],[1],[0]]

init = tf.global_variables_initializer()
sess = tf.Session()

writer = tf.summary.FileWriter("./logs/xor_logs", sess.graph)

sess.run(init)

t_start = time.perf_counter()  # time.clock() was removed in Python 3.8
for epoch in range(100001):
    sess.run(train_step, feed_dict={X: INPUT_XOR, Y: OUTPUT_XOR})
    if epoch % 10000 == 0:
        print("_"*80)
        print('Epoch: ', epoch)
        print('   y_estimated: ')
        for element in sess.run(y_estimated, feed_dict={X: INPUT_XOR, Y: OUTPUT_XOR}):
            print('    ',element)
        print('   W: ')
        for element in sess.run(W):
            print('    ',element)
        print('   c: ')
        for element in sess.run(c):
            print('    ',element)
        print('   w: ')
        for element in sess.run(wt):
            print('    ',element)
        print('   b ')
        for element in sess.run(b):
            print('    ',element)
        print('   loss: ', sess.run(loss, feed_dict={X: INPUT_XOR, Y: OUTPUT_XOR}))
t_end = time.perf_counter()
print("_"*80)
print('Elapsed time ', t_end - t_start)

________________________________________________________________________________
Epoch:  0
   y_estimated: 
     [0.49983603]
     [0.5001549]
     [0.5001562]
     [0.49984103]
   W: 
     [ 0.06276221 -1.0465262 ]
     [-1.9994133   0.46935537]
   c: 
     [ 0.22249052 -0.17276908]
     [-0.1103125  -0.37792176]
     [-1.8173399  0.6337322]
     [-0.23770733  0.762648  ]
   w: 
     [-0.00013906]
     [-5.87772e-05]
   b 
     [-0.000625]
     [0.000625]
     [0.000625]
     [-0.000625]
   loss:  0.24984151
________________________________________________________________________________
Epoch:  10000
   y_estimated: 
     [0.10032695]
     [0.87701094]
     [0.87692654]
     [0.09273354]
   W: 
     [ 0.06276221 -0.74257016]
     [-1.9994133  0.5756058]
   c: 
     [ 0.6710684  -0.17276908]
     [-0.1103125  -0.57562596]
     [-1.8173399  0.6337322]
     [-0.23770733  1.0666034 ]
   w: 
     [-0.6331325]
     [-0.55834955]
   b 
     [-1.7687219]
     [1.9644246]
     [1.9636419]
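
To read the epoch 10000 log as class labels, the logged `y_estimated` values can be thresholded and compared with the XOR targets. The sketch below copies the four predictions from the log above. The 0.5 decision threshold is an assumption, and the mean squared error it prints is recomputed from the rounded logged values, so it is only approximate.

```python
# Predictions copied from the epoch 10000 log above.
y_estimated = [0.10032695, 0.87701094, 0.87692654, 0.09273354]
targets = [0, 1, 1, 0]  # XOR outputs

# Assumed decision rule: predict 1 when the sigmoid output is at least 0.5.
predicted_labels = [1 if p >= 0.5 else 0 for p in y_estimated]

# Mean squared error of the logged predictions against the targets.
mse = sum((p - t) ** 2 for p, t in zip(y_estimated, targets)) / len(targets)

print(predicted_labels, predicted_labels == targets)
print("mean squared error:", mse)
```

Under this threshold all four inputs are classified correctly at epoch 10000, even though the raw outputs are still some distance from 0 and 1.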
   

## References <a id = "ref"></a>