# Softmax regression

This is the second introduction to TensorFlow for RVL224 students.

In this section, you will learn:

* What softmax regression is.
* The basic concept behind softmax regression.
* How to implement it using TensorFlow to classify the **MNIST** dataset.


**Softmax regression** is a classifier that assigns a probability to each class given an input. For example, if you feed the following picture
<img src="http://bradleymitchell.me/wp-content/uploads/2014/06/decompressed.jpg" width="128" height="128" />

into a trained softmax classifier, it might assign a probability of 0.96 to the digit being a 5, 0.03 to it being a 9, and small probabilities to the other digits, with all probabilities summing to 1.0.

A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities.

The evidence for a class $i$ given an input $x$ is:
$$
\text{evidence}_i = \sum_j W_{i,j} x_j + b_i
$$
where $W_i$ is the row of weights and $b_i$ is the bias for class $i$, and $j$ is the index for summing over the pixels of the input image $x$. We then convert the evidence tallies into the predicted probabilities $y$ using the "softmax" function:

$$
y = \text{softmax}(\text{evidence})
$$
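
The softmax function exponentiates each evidence value and divides by the sum of all the exponentials, $\text{softmax}(e)_i = \exp(e_i) / \sum_j \exp(e_j)$, so the outputs are positive and sum to 1. A minimal NumPy sketch follows. The function name `softmax` and the sample evidence values are illustrative and not part of this tutorial's TensorFlow code. Subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(evidence):
    """Convert a 1-D array of evidence values into probabilities."""
    # Shifting by the maximum avoids overflow in exp() without changing the result.
    shifted = evidence - np.max(evidence)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical evidence values for three classes.
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # the class with the largest evidence gets the largest probability
print(probs.sum())  # sums to 1, up to floating-point rounding
```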



You can picture our softmax regression as looking something like the following:

<img src="https://www.tensorflow.org/images/softmax-regression-scalargraph.png" width="323" height="129" />

where $x$ holds the pixels of one input picture and $y$ holds the probabilities for each class. We can write that out as an equation:

<img src="https://www.tensorflow.org/images/softmax-regression-vectorequation.png" width="328" height="80" />
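
More compactly, this is $y = \text{softmax}(Wx + b)$. The TensorFlow code in this section stores one image per row, so it computes `x W + b` with `x` of shape `[batch, 784]` and `W` of shape `[784, 10]`, which is the transpose of the convention used in the equation. The NumPy sketch below only checks these shapes; it uses random stand-in data rather than MNIST, and assumes the same zero initialization that the graph uses.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 784))      # stand-in batch of 4 flattened 28x28 images
W = np.zeros((784, 10))       # weights, zero-initialized
b = np.zeros(10)              # biases, zero-initialized

evidence = x @ W + b          # shape (4, 10): one row of evidence per image

# Row-wise softmax: normalize each image's evidence independently.
shifted = evidence - evidence.max(axis=1, keepdims=True)
prob = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(evidence.shape, prob.shape)
print(prob[0])                # every class gets 0.1 before any training
```

With all-zero parameters the classifier has no preference, so each of the 10 classes receives probability 0.1.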




Now, let's implement it using TensorFlow to classify handwritten digits from the MNIST dataset.


In [2]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data


## Download and extract dataset

In [3]:
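# Load MNIST from /tmp/data/ with one-hot encoded labels.
mnist = input_data.read_data_sets('/tmp/data/', one_hot=True)
train_img   = mnist.train.images   # training images, one flattened 784-pixel row each
train_label = mnist.train.labels   # training labels, one-hot vectors of length 10
test_img    = mnist.test.images
test_label  = mnist.test.labels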

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
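
With `one_hot=True`, each label is a vector of length 10 with a 1 at the index of the digit and 0 everywhere else. The sketch below illustrates that encoding with NumPy on made-up digit labels. `to_one_hot` is a hypothetical helper written for this illustration and is not part of `input_data`.

```python
import numpy as np

def to_one_hot(digits, num_classes=10):
    """Return an array with one row per label and a single 1 per row."""
    # Row d of the identity matrix is the one-hot vector for digit d.
    return np.eye(num_classes)[digits]

labels = to_one_hot(np.array([5, 0, 9]))
print(labels[0])                  # the single 1 is at index 5
print(np.argmax(labels, axis=1))  # argmax recovers the digits 5, 0, 9
```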


## Build the computational graph for softmax regression

In [19]:
# Create an input node with shape [None, 784] so we can feed images later.
# 'None' means the first dimension (the batch size) can be any size.
x = tf.placeholder("float", [None, 784])
print(tf.__version__)

# Create an input node so we can feed the one-hot labels later.
y_ = tf.placeholder("float", [None, 10])

# Weights and biases, both initialized to zero.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Evidence (logits) and the class probabilities derived from it.
evidence = tf.matmul(x, W) + b
prob = tf.nn.softmax(evidence)

# COST FUNCTION: mean cross-entropy between softmax(evidence) and the labels
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=evidence, labels=y_))
# OPTIMIZER
learning_rate = 0.005
optm = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

1.0.0
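
`tf.nn.softmax_cross_entropy_with_logits` takes the raw evidence (the logits), applies softmax internally, and returns the cross-entropy $-\sum_i y'_i \log y_i$ for each example, where $y'$ is the one-hot label and $y$ is the predicted probability vector. `tf.reduce_mean` then averages over the batch. A rough NumPy equivalent is sketched below for illustration only; the function name `mean_cross_entropy` and the sample values are made up.

```python
import numpy as np

def mean_cross_entropy(logits, labels):
    """Average softmax cross-entropy over a batch of one-hot labels."""
    # Stable log-softmax: shift by the row maximum, then subtract log-sum-exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Only the log-probability of the true class contributes for one-hot labels.
    return -(labels * log_prob).sum(axis=1).mean()

logits = np.zeros((2, 10))     # zero evidence, as with zero-initialized W and b
labels = np.eye(10)[[3, 7]]    # one-hot labels for the digits 3 and 7
print(mean_cross_entropy(logits, labels))  # log(10), about 2.3026
```

The value log(10) is the cost of a classifier that spreads its probability evenly over the 10 classes, which is the starting point before training.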


In [20]:
# PREDICTION: True where the most probable class matches the label
pred = tf.equal(tf.argmax(prob, 1), tf.argmax(y_, 1))
# ACCURACY: fraction of correct predictions
accr = tf.reduce_mean(tf.cast(pred, "float"))
# INITIALIZER (tf.initialize_all_variables is deprecated)
init = tf.global_variables_initializer()


## Train the classifier

In [21]:
training_epochs = 50
batch_size      = 100
display_step    = 5
# SESSION
sess = tf.Session()
sess.run(init)
# MINI-BATCH LEARNING
for epoch in range(training_epochs):
    avg_cost = 0.
    num_batch = mnist.train.num_examples // batch_size
    for i in range(num_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feeds = {x: batch_xs, y_: batch_ys}
        # One gradient descent step, then the cost on the same batch.
        sess.run(optm, feed_dict=feeds)
        avg_cost += sess.run(cost, feed_dict=feeds) / num_batch

    # DISPLAY
    if epoch % display_step == 0:
        # train_acc is measured on the last mini-batch only, so it fluctuates.
        feeds_train = {x: batch_xs, y_: batch_ys}
        feeds_test = {x: mnist.test.images, y_: mnist.test.labels}
        train_acc = sess.run(accr, feed_dict=feeds_train)
        test_acc = sess.run(accr, feed_dict=feeds_test)
        print("Epoch: %03d/%03d cost: %.9f train_acc: %.3f test_acc: %.3f"
              % (epoch, training_epochs, avg_cost, train_acc, test_acc))
print("DONE")


Epoch: 000/050 cost: 1.477954281 train_acc: 0.810 test_acc: 0.832
Epoch: 005/050 cost: 0.533739885 train_acc: 0.890 test_acc: 0.882
Epoch: 010/050 cost: 0.446352980 train_acc: 0.870 test_acc: 0.893
Epoch: 015/050 cost: 0.408236921 train_acc: 0.900 test_acc: 0.899
Epoch: 020/050 cost: 0.385582237 train_acc: 0.870 test_acc: 0.904
Epoch: 025/050 cost: 0.370123066 train_acc: 0.950 test_acc: 0.907
Epoch: 030/050 cost: 0.358701173 train_acc: 0.870 test_acc: 0.909
Epoch: 035/050 cost: 0.349740249 train_acc: 0.850 test_acc: 0.910
Epoch: 040/050 cost: 0.342516444 train_acc: 0.900 test_acc: 0.912
Epoch: 045/050 cost: 0.336515801 train_acc: 0.920 test_acc: 0.913
DONE
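
Each `sess.run(optm, ...)` call performs one gradient descent update on a mini-batch. For softmax regression with a cross-entropy cost, the gradient of the mean cost with respect to the evidence is `(prob - y_) / batch_size`, so the update can also be written by hand. The NumPy sketch below runs that update on small synthetic data: random inputs whose labels come from a hypothetical linear rule, not MNIST. The sizes, learning rate, and epoch count are illustrative assumptions and differ from the values used above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3                            # examples, features, classes (toy sizes)
X = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, k))               # hypothetical rule, used only to make labels
Y = np.eye(k)[np.argmax(X @ true_W, axis=1)]   # one-hot labels

def softmax_rows(z):
    """Row-wise softmax with the usual max-shift for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_cost(W, b):
    """Mean cross-entropy over the whole toy dataset."""
    p = softmax_rows(X @ W + b)
    return -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))

W, b = np.zeros((d, k)), np.zeros(k)           # zero initialization, as in the graph
learning_rate, batch_size = 0.1, 20
cost_before = mean_cost(W, b)

for epoch in range(20):
    order = rng.permutation(n)                 # visit the examples in a new order each epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], Y[idx]
        grad_evidence = (softmax_rows(xb @ W + b) - yb) / len(idx)
        W -= learning_rate * (xb.T @ grad_evidence)       # gradient of the cost w.r.t. W
        b -= learning_rate * grad_evidence.sum(axis=0)    # gradient of the cost w.r.t. b

cost_after = mean_cost(W, b)
print(cost_before, cost_after)                 # the cost starts at log(3) and should decrease
```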


You will find that it achieves a little over 91% test accuracy after 50 epochs. Next, we will implement a neural network that can achieve even better accuracy.

## Exercise

Please adjust the hyperparameters, such as the batch size, the number of training epochs, and the learning rate, and observe the effects.

Also, randomly choose images from the test dataset and feed them into the trained softmax classifier to see whether it actually predicts the correct labels.
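
One possible starting point for the second exercise is sketched below in NumPy. So that it runs on its own, it uses random stand-in arrays in place of the test images, the test labels, and the trained parameters, which means its predictions are no better than chance. With the session from this section, the trained values could be obtained with `sess.run([W, b])` and the real data taken from `test_img` and `test_label`. The helper `predict_digits` and the names `W_val` and `b_val` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins only. Replace with test_img, test_label and
# W_val, b_val = sess.run([W, b]) when running inside the notebook.
test_images = rng.random((1000, 784))
test_labels = np.eye(10)[rng.integers(0, 10, size=1000)]
W_val = rng.normal(scale=0.01, size=(784, 10))
b_val = np.zeros(10)

def predict_digits(images, W, b):
    """Return the most probable class index for each image row."""
    # Softmax is monotonic, so the argmax of the evidence equals
    # the argmax of the probabilities.
    return np.argmax(images @ W + b, axis=1)

picked = rng.choice(len(test_images), size=5, replace=False)
predicted = predict_digits(test_images[picked], W_val, b_val)
actual = np.argmax(test_labels[picked], axis=1)

for i, p, a in zip(picked, predicted, actual):
    print("image %d: predicted %d, label %d" % (i, p, a))
print("accuracy on picked images:", np.mean(predicted == actual))
```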