## 2. Multilayer Perceptron

[Similar tutorial](https://www.tensorflow.org/get_started/mnist/pros)

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Import data

The MNIST dataset can be downloaded automatically via TensorFlow with the following:

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


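Try plotting one of the datapoints (from `mnist.train.images`) with matplotlib's `plt.imshow`. Each image is stored as a flat vector of 784 values, so it has to be reshaped to 28×28 before plotting.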

In [8]:
# Reshape the flat 784-vector to a 28x28 image before plotting
plt.imshow(mnist.train.images[0].reshape(28, 28), cmap='gray')

## Network

Define placeholders for the data and labels. Instead of feeding in one datapoint at a time, define the placeholders for minibatches of size 200.
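To make the minibatch shapes concrete, here is a small NumPy sketch using random stand-in data rather than real MNIST. It assumes the flattened 784-pixel images and the 10-class one-hot labels loaded above; the names `BATCH_SIZE`, `N_PIXELS` and `N_CLASSES` are illustrative. In TensorFlow 1.x the corresponding placeholders would have the same shapes, for example `tf.placeholder(tf.float32, shape=[200, 784])` for the images.

```python
import numpy as np

BATCH_SIZE = 200    # minibatch size from the text
N_PIXELS = 28 * 28  # each image is flattened to 784 values
N_CLASSES = 10      # digits 0-9

# Random stand-ins for one minibatch (assumption: not real MNIST data).
rng = np.random.default_rng(0)
x_batch = rng.random((BATCH_SIZE, N_PIXELS), dtype=np.float32)
labels = rng.integers(0, N_CLASSES, size=BATCH_SIZE)
y_batch = np.eye(N_CLASSES, dtype=np.float32)[labels]  # one-hot rows

print(x_batch.shape, y_batch.shape)
```

Each row of `x_batch` is one image and each row of `y_batch` is its one-hot label, so the first axis is always the position within the minibatch.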

Now define a two-layer perceptron: a hidden layer followed by an output layer, where each layer is an affine transformation followed by a nonlinearity (a sigmoid in the hidden layer, a softmax in the output layer):

$$y=(\sigma_{softmax}\circ A_2 \circ \sigma \circ A_1)(\mathbf{x})$$

where 
- $A_i(\mathbf{x})=\mathbf{W}_i \mathbf{x}+\mathbf{b}_i$,
- $\sigma$ is a [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) function (`tf.nn.sigmoid`) and
- $\sigma_{softmax}$ is a [softmax](https://en.wikipedia.org/wiki/Softmax_function) function (`tf.nn.softmax`).
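The composition above can be sketched in plain NumPy to check shapes before writing the TensorFlow version. This is a minimal sketch, not the exercise solution: the hidden size `n_hidden = 100` is an arbitrary assumption, the input is random, and the weights are drawn from a plain normal for brevity. Because each row of `x` is one datapoint, the affine maps are written as `x @ W + b` rather than $\mathbf{W}\mathbf{x}+\mathbf{b}$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the row maximum leaves the result unchanged but avoids overflow.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    hidden = sigmoid(x @ W1 + b1)      # sigma(A_1(x))
    return softmax(hidden @ W2 + b2)   # softmax(A_2(hidden))

rng = np.random.default_rng(0)
n_hidden = 100  # hypothetical hidden size, not given in the text
x = rng.random((200, 784))  # random stand-in for a minibatch of images
W1 = 0.1 * rng.standard_normal((784, n_hidden))
b1 = np.full(n_hidden, 0.1)
W2 = 0.1 * rng.standard_normal((n_hidden, 10))
b2 = np.full(10, 0.1)

y = forward(x, W1, b1, W2, b2)
print(y.shape)
```

Every row of `y` is a probability distribution over the 10 classes, so it sums to 1.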

As initial values for the parameters, use a truncated normal with `stddev = 0.1` for the weights and a constant `0.1` for the biases.
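For intuition about what a truncated normal is, the following NumPy sketch redraws every sample that lands more than two standard deviations from the mean, which is the behaviour documented for `tf.truncated_normal`. The helper name `truncated_normal` and the layer size of 100 are assumptions made for this illustration.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, rng=None):
    """Zero-mean normal samples, redrawn until all lie within 2 * stddev."""
    rng = np.random.default_rng() if rng is None else rng
    values = rng.normal(0.0, stddev, size=shape)
    too_far = np.abs(values) > 2.0 * stddev
    while too_far.any():
        # Redraw only the samples that fell outside the allowed range.
        values[too_far] = rng.normal(0.0, stddev, size=int(too_far.sum()))
        too_far = np.abs(values) > 2.0 * stddev
    return values

rng = np.random.default_rng(0)
W1 = truncated_normal((784, 100), stddev=0.1, rng=rng)  # weights
b1 = np.full(100, 0.1)                                  # constant biases

print(np.abs(W1).max() <= 0.2)
```

The small positive bias and the bounded weights keep the initial sigmoid inputs away from the saturated regions, where gradients are close to zero.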

## Loss

Define a cross-entropy loss function 
$$-\sum y_{true} \cdot \log y$$
and an optimizer. The loss should be minimised on average over the minibatch. Try both the gradient descent optimizer from the previous exercise and the [Adam](https://arxiv.org/abs/1412.6980) optimizer. The loss function is [potentially numerically unstable](https://deepnotes.io/softmax-crossentropy) if it is defined directly with the above equation. This can be circumvented, for example, by using the built-in `tf.nn.softmax_cross_entropy_with_logits` in the loss. Note that this function expects the logits, i.e. the output of $A_2$ before the softmax is applied.
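The instability comes from exponentiating large logits: `exp(1000)` overflows in floating point, so the softmax becomes `inf / inf`. A common remedy, sketched below in NumPy, is to compute the log-softmax directly with the maximum logit subtracted first. This illustrates the idea only and is not necessarily how the TensorFlow built-in is implemented; the function name `softmax_cross_entropy` is made up for this example.

```python
import numpy as np

def softmax_cross_entropy(logits, y_true):
    """Mean cross-entropy over a minibatch, computed from raw logits."""
    # Shifting by the row maximum does not change the softmax,
    # but keeps every exponent <= 0 so exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(y_true * log_softmax).sum(axis=1).mean()

# The first row has extreme logits that would overflow a naive softmax.
logits = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

loss = softmax_cross_entropy(logits, y_true)
print(np.isfinite(loss))
```

The first row is classified with near certainty and contributes zero loss; the second row is a uniform prediction over three classes and contributes $\log 3$, so the mean is $\tfrac{1}{2}\log 3$.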

## Training

Finally, write a training loop where you initialize the variables and iterate over minibatches of data. You can get the next minibatch of size `batch_size` with:
```
x_batch, y_batch = mnist.train.next_batch(batch_size)
```
Print the loss and the train and test errors at every n-th step, either to stdout or to TensorBoard. As a guide, a test accuracy of at least 97% is possible with this setup.
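The overall shape of such a loop (draw a minibatch, take a gradient step, log every n-th step) can be sketched without TensorFlow. The example below is a sketch under several assumptions: it trains the same two-layer architecture with hand-written gradients and plain gradient descent on a small synthetic dataset of three Gaussian blobs, not on MNIST. The `next_batch` helper is a hypothetical stand-in for `mnist.train.next_batch` that simply samples random indices, and the weights use a plain normal rather than a truncated one.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])

def make_data(n):
    # Synthetic stand-in for MNIST: 3 classes, 2 input features.
    labels = rng.integers(0, 3, size=n)
    x = centers[labels] + 0.5 * rng.standard_normal((n, 2))
    return x, np.eye(3)[labels]

x_train, y_train = make_data(600)
x_test, y_test = make_data(150)

def next_batch(batch_size):
    # Hypothetical helper: random sampling instead of epoch-wise shuffling.
    idx = rng.integers(0, len(x_train), size=batch_size)
    return x_train[idx], y_train[idx]

def forward(x, p):
    h = 1.0 / (1.0 + np.exp(-(x @ p["W1"] + p["b1"])))
    logits = h @ p["W2"] + p["b2"]
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return h, probs

def loss_and_error(x, y, p):
    _, probs = forward(x, p)
    loss = -np.mean(np.sum(y * np.log(probs + 1e-12), axis=1))
    error = np.mean(probs.argmax(axis=1) != y.argmax(axis=1))
    return loss, error

p = {"W1": 0.1 * rng.standard_normal((2, 16)), "b1": np.full(16, 0.1),
     "W2": 0.1 * rng.standard_normal((16, 3)), "b2": np.full(3, 0.1)}

initial_loss, _ = loss_and_error(x_train, y_train, p)
learning_rate, batch_size, n_steps, log_every = 0.5, 50, 500, 100

for step in range(1, n_steps + 1):
    x_batch, y_batch = next_batch(batch_size)
    h, probs = forward(x_batch, p)
    # Gradients of the mean cross-entropy, derived by hand.
    d_logits = (probs - y_batch) / batch_size
    d_hidden = (d_logits @ p["W2"].T) * h * (1.0 - h)
    grads = {"W2": h.T @ d_logits, "b2": d_logits.sum(axis=0),
             "W1": x_batch.T @ d_hidden, "b1": d_hidden.sum(axis=0)}
    for name in p:
        p[name] -= learning_rate * grads[name]
    if step % log_every == 0:
        train_loss, train_err = loss_and_error(x_train, y_train, p)
        _, test_err = loss_and_error(x_test, y_test, p)
        print(f"step {step}: loss {train_loss:.3f}, "
              f"train error {train_err:.3f}, test error {test_err:.3f}")

final_loss, _ = loss_and_error(x_train, y_train, p)
```

In the TensorFlow version the gradient computation and parameter update are replaced by a single run of the optimizer's training op, but the batching and logging structure stays the same.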