# Backpropagation from scratch with TensorFlow

## (1A) Implementing your own network and its backward pass

You are asked to write the backward pass for a different neural network architecture, similar to how I did it in the session 02 notebook. The differences are the number of units, the activation functions, and the loss function (here we use cross-entropy, a loss commonly used for binary classification).

**Layer 1**: 2 inputs, 12 outputs

**Nonlinearity 1**: max(0, x)

**Layer 2**: 12 inputs, 32 outputs

**Nonlinearity 2**: max(0, x)

**Layer 3**: 32 inputs, 1 output

**Nonlinearity 3**: sigmoid(x)
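
As a quick sanity check on the architecture above, the sketch below lists the weight-matrix shapes and counts the trainable parameters. It assumes that a layer's weights have shape (inputs, outputs), as in the template's `tf.random.uniform((2, 12))` example, and that each layer has one bias per output unit; the names `layer_sizes`, `weight_shapes`, and `n_params` are illustrative only.

```python
# Layer widths from the spec above: 2 inputs -> 12 -> 32 -> 1 output
layer_sizes = [2, 12, 32, 1]

# Assumed convention: weights of shape (n_in, n_out), one bias per output unit
weight_shapes = [(n_in, n_out) for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
n_params = sum(n_in * n_out + n_out for n_in, n_out in weight_shapes)

print(weight_shapes)  # [(2, 12), (12, 32), (32, 1)]
print(n_params)       # 485
```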

**Loss function**: $\mathcal{L}_{\text{Crossentropy}}(y, p) = -y \circ \log(p) - (1-y) \circ \log(1-p)$

for a label $y \in \{0, 1\}$ and the model prediction $p$, where $\circ$ denotes element-wise multiplication.
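
To see what this loss rewards, the sketch below evaluates it element-wise for the same prediction $p = 0.9$ and both labels: the prediction costs about 0.105 ($-\log 0.9$) when the label is 1, but about 2.303 ($-\log 0.1$) when the label is 0. The function `binary_crossentropy` is a hypothetical helper written for illustration, not part of the exercise.

```python
import tensorflow as tf

def binary_crossentropy(y, p):
    # Element-wise cross-entropy: -y*log(p) - (1-y)*log(1-p)
    return -y * tf.math.log(p) - (1.0 - y) * tf.math.log(1.0 - p)

y = tf.constant([[1.0], [0.0]])  # two labels
p = tf.constant([[0.9], [0.9]])  # the same prediction for both labels
losses = binary_crossentropy(y, p)
print(losses.numpy())  # approximately [[0.1054], [2.3026]]
```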

The local derivative of $\mathcal{L}_{\text{Crossentropy}}$ with respect to $p$ is $-\frac{y}{p} + \frac{1-y}{1-p}$.
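
Chaining this with the local derivative of the sigmoid, $\frac{\partial p}{\partial z} = p \circ (1-p)$, where $z$ is the output of Layer 3 (layer3_out) and $p = \text{sigmoid}(z)$, the two factors collapse into a much simpler expression. This is not required by the exercise, but it is a useful check on the gradient that flows into Layer 3:

$$
\frac{\partial \mathcal{L}}{\partial z}
= \left(-\frac{y}{p} + \frac{1-y}{1-p}\right) \circ p \circ (1-p)
= -y \circ (1-p) + (1-y) \circ p
= p - y
$$

Computing the two local derivatives separately, as the exercise asks, gives the same result up to floating-point error; the separate form can blow up when $p$ is very close to 0 or 1, which the combined form $p - y$ avoids.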

For the derivative of max(0, x), you can set the derivative to 0 at x = 0, even though it is mathematically undefined there.
You can use tf.nn.relu(x) for the nonlinearity.
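
A minimal sketch of the ReLU backward step, using a hypothetical helper `relu_derivative`: the local derivative is a 0/1 mask computed from the pre-activation (0 at x = 0, as allowed above), and the backward pass multiplies it element-wise with the upstream gradient.

```python
import tensorflow as tf

def relu_derivative(x):
    # 1.0 where x > 0, 0.0 elsewhere (including exactly x == 0)
    return tf.cast(x > 0, x.dtype)

pre_activation = tf.constant([[-1.0, 0.0, 2.0]])
upstream_grad = tf.constant([[5.0, 5.0, 5.0]])

activation = tf.nn.relu(pre_activation)  # forward: [[0., 0., 2.]]
# Backward: the upstream gradient only passes through units that were active
grad_pre_activation = upstream_grad * relu_derivative(pre_activation)
print(grad_pre_activation.numpy())  # [[0. 0. 5.]]
```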

## Step 1: Instantiate the weights

Instantiate a tf.Variable for each weight and bias tensor. Then put them into a tuple of tuples with one inner tuple (weights, biases) per layer. Call this tuple of tuples "variables".
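
A minimal sketch of Step 1, assuming weights of shape (inputs, outputs) so that a batch of row vectors can be multiplied as inputs @ weights, biases of shape (outputs,), and a small random-normal initialization. The helper name `init_layer`, the seed, and the 0.1 standard deviation are illustrative choices, not requirements; random (rather than constant) weights break the symmetry between units in a layer, which would otherwise all receive identical updates.

```python
import tensorflow as tf

tf.random.set_seed(42)  # reproducible initialization

def init_layer(n_in, n_out):
    # Assumed scheme: small normal weights of shape (n_in, n_out), zero biases of shape (n_out,)
    weights = tf.Variable(tf.random.normal((n_in, n_out), stddev=0.1))
    biases = tf.Variable(tf.zeros((n_out,)))
    return (weights, biases)

# One (weights, biases) tuple per layer: 2 -> 12 -> 32 -> 1
variables = (
    init_layer(2, 12),
    init_layer(12, 32),
    init_layer(32, 1),
)

for i, (w, b) in enumerate(variables, start=1):
    print(f"layer {i}: weights {tuple(w.shape)}, biases {tuple(b.shape)}")
```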

## Step 2: Write the forward pass, returning results from intermediate computations

The function should take an input tensor, a target tensor and the nested variables tuple, perform the forward computation and return a tuple of all intermediate results, i.e. 

(loss, nonlinearity3_out, layer3_out, nonlinearity2_out, layer2_out, nonlinearity1_out, layer1_out, inputs)
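
One possible Step 2 solution, sketched under the same layout assumptions as Step 1 and assuming the loss is averaged over the batch (for a batch of one, mean and sum coincide). The random variables in the usage part are illustrative only.

```python
import tensorflow as tf

def compute_forward_pass(inputs, targets, variables):
    (w1, b1), (w2, b2), (w3, b3) = variables
    layer1_out = tf.matmul(inputs, w1) + b1             # (batch, 12)
    nonlinearity1_out = tf.nn.relu(layer1_out)
    layer2_out = tf.matmul(nonlinearity1_out, w2) + b2  # (batch, 32)
    nonlinearity2_out = tf.nn.relu(layer2_out)
    layer3_out = tf.matmul(nonlinearity2_out, w3) + b3  # (batch, 1)
    nonlinearity3_out = tf.nn.sigmoid(layer3_out)       # prediction p
    p = nonlinearity3_out
    # Binary cross-entropy, averaged over the batch (assumed reduction)
    loss = tf.reduce_mean(-targets * tf.math.log(p) - (1.0 - targets) * tf.math.log(1.0 - p))
    return (loss, nonlinearity3_out, layer3_out, nonlinearity2_out,
            layer2_out, nonlinearity1_out, layer1_out, inputs)

# Usage with illustrative random variables in the (weights, biases) layout
tf.random.set_seed(0)
variables = tuple(
    (tf.Variable(tf.random.normal((n_in, n_out), stddev=0.1)), tf.Variable(tf.zeros((n_out,))))
    for n_in, n_out in [(2, 12), (12, 32), (32, 1)]
)
x = tf.constant([[0.3, 0.9]])
y = tf.constant([[1.0]])
data_from_forward = compute_forward_pass(x, y, variables)
print(data_from_forward[0].numpy())  # scalar loss
```

TensorFlow also provides tf.nn.sigmoid_cross_entropy_with_logits, which computes the same loss directly from layer3_out in a numerically stable way; the explicit log form is kept here because the exercise asks you to backpropagate through the sigmoid and the loss as separate steps.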

## Step 3: Write the compute_gradients function, chaining the gradients

The function should take the data returned by the forward function, the variables, the target, and a batch size as arguments. It should compute the gradients as was done in session 02 and return them in the same format as the variables (a tuple of tuples).
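
One possible Step 3 solution, sketched under the same assumptions as above: weights of shape (inputs, outputs), biases of shape (outputs,), and a loss averaged over the batch, so every gradient is scaled by 1/batch_size (how session 02 handles the batch size is not shown here, so this scaling is an assumption). Each layer applies the same three rules: multiply by the local derivative of the nonlinearity, form the weight gradient as (layer input)ᵀ @ (upstream gradient) and the bias gradient as a sum over the batch, and pass (upstream gradient) @ Wᵀ further back. The short names (l1, a1, ...) are illustrative, and the forward pass is repeated so the block runs on its own.

```python
import numpy as np
import tensorflow as tf

def compute_forward_pass(inputs, targets, variables):
    # Same forward pass as in Step 2, repeated so this block runs on its own
    (w1, b1), (w2, b2), (w3, b3) = variables
    l1 = tf.matmul(inputs, w1) + b1
    a1 = tf.nn.relu(l1)
    l2 = tf.matmul(a1, w2) + b2
    a2 = tf.nn.relu(l2)
    l3 = tf.matmul(a2, w3) + b3
    p = tf.nn.sigmoid(l3)
    loss = tf.reduce_mean(-targets * tf.math.log(p) - (1.0 - targets) * tf.math.log(1.0 - p))
    return (loss, p, l3, a2, l2, a1, l1, inputs)

def compute_gradients(variables, data_from_forward, target, batch_size=1):
    (w1, b1), (w2, b2), (w3, b3) = variables
    _, p, _, a2, l2, a1, l1, inputs = data_from_forward

    # Loss -> sigmoid output; divide by batch_size because the loss is a batch mean
    grad_p = (-target / p + (1.0 - target) / (1.0 - p)) / batch_size
    # Sigmoid: dp/dl3 = p * (1 - p)
    grad_l3 = grad_p * p * (1.0 - p)
    # Layer 3: l3 = a2 @ w3 + b3
    grad_w3 = tf.matmul(a2, grad_l3, transpose_a=True)
    grad_b3 = tf.reduce_sum(grad_l3, axis=0)
    grad_a2 = tf.matmul(grad_l3, w3, transpose_b=True)
    # ReLU 2 (derivative 0 at l2 == 0) and Layer 2
    grad_l2 = grad_a2 * tf.cast(l2 > 0, l2.dtype)
    grad_w2 = tf.matmul(a1, grad_l2, transpose_a=True)
    grad_b2 = tf.reduce_sum(grad_l2, axis=0)
    grad_a1 = tf.matmul(grad_l2, w2, transpose_b=True)
    # ReLU 1 and Layer 1
    grad_l1 = grad_a1 * tf.cast(l1 > 0, l1.dtype)
    grad_w1 = tf.matmul(inputs, grad_l1, transpose_a=True)
    grad_b1 = tf.reduce_sum(grad_l1, axis=0)

    return ((grad_w1, grad_b1), (grad_w2, grad_b2), (grad_w3, grad_b3))

# Check against TensorFlow's automatic differentiation on a batch of 4 samples
tf.random.set_seed(1)
variables = tuple(
    (tf.Variable(tf.random.normal((n_in, n_out), stddev=0.5)), tf.Variable(tf.zeros((n_out,))))
    for n_in, n_out in [(2, 12), (12, 32), (32, 1)]
)
x = tf.random.uniform((4, 2))
y = tf.constant([[1.0], [0.0], [1.0], [0.0]])

manual = compute_gradients(variables, compute_forward_pass(x, y, variables), y, batch_size=4)
with tf.GradientTape() as tape:
    loss = compute_forward_pass(x, y, variables)[0]
oracle = tape.gradient(loss, variables)

all_match = all(
    np.allclose(m.numpy(), o.numpy(), rtol=1e-4, atol=1e-6)
    for m_layer, o_layer in zip(manual, oracle)
    for m, o in zip(m_layer, o_layer)
)
print(all_match)  # True
```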

## Step 4: Evaluate your implementation with the cell below
If no error occurs, it means you succeeded!

```python
# Use something like this for testing your results. x should have shape (batch_size, 2).
import tensorflow as tf

# Instantiate variables for the layers
# YOUR CODE HERE

# e.g.
# weights_1 = tf.Variable(tf.random.uniform((2, 12)))
# ...
# variables = ((weights_1, bias_1), (weights_2, bias_2), (weights_3, bias_3))

def compute_forward_pass(inputs, targets, variables):
    # YOUR CODE HERE
    ...

def compute_gradients(variables, data_from_forward, target, batch_size=1):
    # YOUR CODE HERE
    ...
```

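```python
# TEST YOUR SOLUTION:
import numpy as np

x = tf.constant([[0.3, 0.9]])  # shape (1, 2): one sample with two features
y = tf.constant([[1.0]])       # shape (1, 1): binary label

data_from_forward = compute_forward_pass(x, y, variables)
your_solution_gradients = compute_gradients(variables, data_from_forward, y, batch_size=1)

# Oracle: let TensorFlow's automatic differentiation compute the same gradients
with tf.GradientTape() as tape:
    tape.watch(variables)  # needed only if the weights are plain tensors, not tf.Variable
    data_from_forward = compute_forward_pass(x, y, variables)
    loss = data_from_forward[0]
gradients = tape.gradient(loss, variables)

# Compare every weight and bias gradient with a floating-point tolerance;
# `==` compares tensors element-wise and exactly, which is not a reliable pass/fail check
for layer, (oracle_layer, your_layer) in enumerate(zip(gradients, your_solution_gradients), start=1):
    for name, oracle_grad, your_grad in zip(("weights", "biases"), oracle_layer, your_layer):
        assert np.allclose(np.asarray(oracle_grad), np.asarray(your_grad), rtol=1e-4, atol=1e-6), \
            f"layer {layer} {name}: computed gradients do not match with the TensorFlow oracle!"
```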