# Simple Linear Classifier
In this notebook we build a simple linear classifier for MNIST digit classification. 

This classifier calculates $$\sigma(Wx+b)$$
where $\sigma$ is the _softmax_ nonlinearity:
$$
\sigma(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}
$$
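
As a quick check of the formula, softmax can be sketched in plain Python. This is only an illustration and not how `tf.nn.softmax` is implemented; the function name `softmax` and the sample inputs are chosen here for the example. Subtracting the maximum before exponentiating is a common stabilization that does not change the result.

```python
import math

def softmax(x):
    """Softmax of a list of floats, following the formula above."""
    # Shifting by the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three positive values, the largest for the largest input
print(sum(probs))  # approximately 1.0
```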

We optimize this using _gradient descent_ with respect to the _cross entropy loss_
$$
X(p, q) = -\sum_{i=1}^n p_i \log q_i.
$$
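
To make the loss concrete, here is a minimal sketch of the cross entropy in its standard form, $-\sum_i p_i \log q_i$, where $p$ is the target distribution and $q$ the predicted one. The function name and the example distributions are made up for illustration.

```python
import math

def cross_entropy(p, q):
    """Cross entropy between target distribution p and prediction q."""
    # Terms with p_i == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

target = [0.0, 1.0, 0.0]        # one-hot label for class 1
confident = [0.05, 0.90, 0.05]  # prediction that favors the correct class
unsure = [0.40, 0.30, 0.30]     # prediction that does not

print(cross_entropy(target, confident))  # smaller loss
print(cross_entropy(target, unsure))     # larger loss
```

A prediction that puts more probability on the correct class gives a smaller loss, which is what gradient descent exploits.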

First, we import TensorFlow (with the shorthand `tf`) and `tflearn` for the dataset. We will also use NumPy later.
Further, we define the number of training steps.

In [None]:
import tensorflow as tf
import tensorflow.contrib.learn as tflearn
import numpy as np

NUM_STEPS = 50

## Building the Forward Network
We define the _placeholders_ for feeding in the data, and the Variables `W` and `b`, which we initialize with random values drawn with NumPy. Then we calculate the _logits_ by the linear transformation `Wx + b`. We get the corresponding probabilities by applying the `softmax` nonlinearity. 

Most functions typically used for neural networks can be found in the `tf.nn` module. Mathematical functions (`matmul`, `sqrt`, ...) are typically found directly under `tf`. 

In [None]:
# placeholders
x = tf.placeholder(tf.float32, (None, 784))
y = tf.placeholder(tf.int64, (None,))  # 1-D vector of integer labels

# variables
W = tf.Variable(np.random.random((784, 10)), dtype=tf.float32)
b = tf.Variable(np.random.random(10), dtype=tf.float32)

# calculation
l = tf.matmul(x, W) + b
p = tf.nn.softmax(l)

## Loss Function and Optimizer
We calculate the cross entropy not by the formula given above, but directly from the logits. This is numerically more stable and faster to compute. However, to do that, the function needs to know which nonlinearity is applied to the logits. 
This yields the somewhat cumbersome function name `softmax_cross_entropy_with_logits`. Since the labels we receive from the dataset are integers, we convert them to a probability distribution using the `one_hot` function.
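
One common way to see why working from the logits is more stable is the log-sum-exp identity $-\log \sigma(l)_k = \log \sum_j \exp(l_j) - l_k$. The sketch below applies it in plain Python; it is an illustration of the idea under that assumption, not TensorFlow's actual implementation, and the function name is hypothetical.

```python
import math

def cross_entropy_from_logits(logits, label):
    """Cross entropy of softmax(logits) against an integer class label."""
    # Shifting by the maximum keeps every exp argument <= 0, so exp cannot overflow.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

# Computing exp(1000.0) directly would overflow; the shifted form stays finite.
print(cross_entropy_from_logits([1000.0, 0.0, -1000.0], 0))  # close to 0
```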

This yields the loss for each image individually, but we want the mean so that the loss magnitude becomes independent of the number of examples passed in. (If we did not perform a reduction here, the minimization step would implicitly sum up all individual contributions.) 
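
The two helper steps, one-hot encoding and taking the mean, can be sketched in plain Python as follows. The helper names mirror the TensorFlow functions only loosely, and the per-example loss values are invented for the example.

```python
def one_hot(label, depth):
    """Return a list of length depth with 1.0 at index label and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(depth)]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

print(one_hot(3, depth=10))    # 1.0 at position 3, 0.0 everywhere else

per_example_loss = [0.2, 0.4, 0.9]  # made-up losses for three images
print(mean(per_example_loss))       # one scalar, independent of the batch size
```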

After calculating the loss we define the optimizer. In this case we use vanilla gradient descent with a learning rate of `0.1`. The `minimize` function then extends the computation graph with operations that calculate the gradients and update the variables. The resulting `train_op` has control dependencies on all these update steps, so when it is fetched, all updates have to be performed.
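
The update rule behind vanilla gradient descent is $w \leftarrow w - \eta \nabla L(w)$ with learning rate $\eta$. The following sketch applies it to a toy one-dimensional loss in plain Python; the function name and the quadratic loss are assumptions made for illustration, and the gradient is written by hand rather than derived automatically as TensorFlow does.

```python
def gradient_descent(grad, w, learning_rate=0.1, steps=50):
    """Repeatedly apply the update w <- w - learning_rate * grad(w)."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)  # close to 3
```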

**Nothing has been calculated yet; we are just building the necessary computation graph.**

In [None]:
loss = tf.nn.softmax_cross_entropy_with_logits(labels=tf.one_hot(y, depth=10), logits=l)
loss = tf.reduce_mean(loss)

optimizer = tf.train.GradientDescentOptimizer(0.1)
train_op = optimizer.minimize(loss)

## Visualization
There is another file `show_graph.py` in the directory of these notebooks. The code is taken from
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb 
and embeds JavaScript to show the graph. Unfortunately, Firefox blocks this script, so it does not work there.

In [None]:
import show_graph
show_graph.show_graph(tf.get_default_graph())

## Getting the Data
We use the predefined MNIST dataset given by `tflearn`. 
The `load_dataset` function will download MNIST and save it into the current directory if it is not yet present.

In [None]:
mnist = tflearn.datasets.load_dataset("mnist")
images = mnist.train.images
labels = mnist.train.labels

## Create the Session
Now we create an `InteractiveSession` so we can directly run Tensors and Ops. 
Then we create the initialization operation for all global variables (this again just builds an Op that has control dependencies on all the individual init ops) and run it immediately. We never need it again, so we don't even need to keep a reference to the op.

In [None]:
session = tf.InteractiveSession()

# Important: Initialize the Variables.
tf.global_variables_initializer().run()

## Training
Now we do the training. In this very first example, we push the whole dataset through for each iteration. A more efficient (and for bigger datasets necessary) approach is to split the data into minibatches. 
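
For reference, splitting a dataset into minibatches can be sketched as a generator over index ranges. This is a minimal plain-Python illustration with hypothetical names and sizes; it does not shuffle the data, which real training loops usually do.

```python
def minibatches(num_examples, batch_size):
    """Yield (start, stop) index pairs that together cover range(num_examples)."""
    for start in range(0, num_examples, batch_size):
        # The last batch may be smaller than batch_size.
        yield start, min(start + batch_size, num_examples)

# Hypothetical sizes, chosen only for illustration.
for start, stop in minibatches(num_examples=10, batch_size=4):
    print(start, stop)  # prints "0 4", then "4 8", then "8 10"
```

Each pair could then be used to slice the data, for example `images[start:stop]` and `labels[start:stop]`, before feeding it to a training step.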

For a training step, we run the `train_op` in the graph, which causes all Variable updates to be performed. We also fetch the value of the loss tensor. Note that this does not cause any additional computations to be performed, since we need the loss anyway to calculate the Variable updates (well, to be exact, we only need the gradient, so a smart implementation might be able to skip some steps in the loss computation.)

To monitor the progress we simply print the loss after each step. This is very unsatisfying. We could do a matplotlib plot that is interactively updated within the loop, which is quite a lot of effort and does not scale well with the number of quantities that we want to track over the course of training.

This is what motivates the introduction of *summaries* and *tensorboard* in the next notebook.

In [None]:
for i in range(NUM_STEPS):
    loss_v, _ = session.run([loss, train_op], {x: images, y: labels})
    print(loss_v)