## TensorFlow basics: Lab 6
### Softmax classification: multinomial classification

Here we are interested in classification with more than two classes. One way to do this is to run several binary classifiers, one per class. For implementation, all of them can be written as a single matrix multiplication:

\begin{align}
\begin{pmatrix}
W_{A1} & W_{A2} & W_{A3} \\
W_{B1} & W_{B2} & W_{B3} \\
W_{C1} & W_{C2} & W_{C3} 
\end{pmatrix}
\cdot
\begin{pmatrix}
x_1\\
x_2 \\
x_3
\end{pmatrix}
=
\begin{pmatrix}
W_{A1} x_1 + W_{A2} x_2 + W_{A3} x_3\\
W_{B1} x_1 + W_{B2} x_2 + W_{B3} x_3\\
W_{C1} x_1 + W_{C2} x_2 + W_{C3} x_3
\end{pmatrix}
=
\begin{pmatrix}
\bar{y}_1\\
\bar{y}_2 \\
\bar{y}_3
\end{pmatrix}
=
\begin{pmatrix}
H_A(x)\\
H_B(x) \\
H_C(x)
\end{pmatrix}
\end{align}
where $\bar{y}$ is the hypothesized value (one score per class). Just as binary classification uses a sigmoid to squash its score, here we need a function that normalizes $\bar{y}$. For this we apply the softmax function

\begin{align}
s(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}
\end{align}

This gives two properties: (1) all values lie in $[0,1]$ and (2) the values sum to 1, so they can be read as probabilities. Schematically (values rounded),

\begin{align}
XW=y=
\begin{pmatrix}
2.0\\
1.0\\
0.1
\end{pmatrix}
\xrightarrow{s(y)}
\begin{pmatrix}
0.7\\
0.2\\
0.1
\end{pmatrix}
\end{align}
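
As a quick check of the schematic above, here is a minimal sketch of the softmax formula in plain Python. The function name `softmax` and the max-subtraction step are illustrative choices, not part of the lab code; subtracting the maximum leaves the result unchanged and only guards against overflow.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the maximum does not change the ratios but keeps exp() small.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # 1 up to floating-point rounding
```

Rounded to one decimal place, these are the values 0.7, 0.2 and 0.1 shown in the schematic.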

If we want a single predicted class from the output above, we take the position of the largest value (argmax); one-hot encoding sets that entry to one and all others to zero.
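
A minimal sketch of this step in plain Python, assuming the probabilities are given as a list. The helper name `one_hot_argmax` is hypothetical and only used for illustration.

```python
def one_hot_argmax(probs):
    """Return a one-hot list with a 1 at the position of the largest value."""
    # Index of the largest probability (the first one wins on ties).
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1 if i == best else 0 for i in range(len(probs))]

print(one_hot_argmax([0.7, 0.2, 0.1]))  # [1, 0, 0]
```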

Now consider the cost function. Here, we use cross-entropy

\begin{align}
D(S,L) = - \sum_i L_i \log(S_i)
\end{align}

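where $S_i$ is the value from the hypothesis and $L_i$ is the actual label for class $i$. For two classes this is the same as the logistic cost function. The total cost function (or loss function) is the average over the $N$ training samples, indexed here by $n$:
\begin{align}
\text{cost} = \frac{1}{N} \sum_n D(S(W x_n + b),L_n)
\end{align}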
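
To see how the cross-entropy behaves, here is a small sketch in plain Python. The names `cross_entropy` and `mean_cost` are illustrative, and the labels are assumed to be one-hot lists.

```python
import math

def cross_entropy(S, L):
    """D(S, L) = -sum_i L_i * log(S_i) for one sample."""
    # Terms with L_i == 0 contribute nothing, so they are skipped;
    # this also avoids evaluating log(0) for classes that are not the label.
    return -sum(l * math.log(s) for s, l in zip(S, L) if l > 0)

def mean_cost(predictions, labels):
    """Average the per-sample cross-entropy over all samples."""
    return sum(cross_entropy(S, L) for S, L in zip(predictions, labels)) / len(labels)

S = [0.7, 0.2, 0.1]
print(cross_entropy(S, [1, 0, 0]))  # -log(0.7), about 0.357: confident and correct
print(cross_entropy(S, [0, 0, 1]))  # -log(0.1), about 2.303: confident but wrong
print(mean_cost([S, S], [[1, 0, 0], [0, 0, 1]]))
```

A prediction that puts high probability on the true class gives a small cost, while a prediction that puts low probability on it is penalized heavily.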

Then we can minimize this cost with the gradient descent algorithm.
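
As a sketch of what gradient descent needs here, consider one sample with scores $y$, $S_i = s(y_i)$ and a one-hot label $L$ (so $\sum_i L_i = 1$). The learning rate symbol $\alpha$ is introduced here for illustration; it corresponds to `learning_rate` in the code below.

\begin{align}
D(S,L) &= - \sum_i L_i \log(S_i) = - \sum_i L_i y_i + \log \sum_j e^{y_j} \\
\frac{\partial D}{\partial y_k} &= S_k - L_k \\
W &\leftarrow W - \alpha \frac{\partial\, \text{cost}}{\partial W}
\end{align}

The gradient with respect to each score is simply the predicted probability minus the label, and the update for $W$ (and likewise for $b$) follows from the chain rule.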

In [1]:
import tensorflow as tf

In [19]:
x_data = [[1,2,1,1],[2,1,3,2],[3,1,3,4],[4,1,5,5],[1,7,5,5],[1,2,5,6],[1,6,6,6],[1,7,7,7]]
y_data = [[0,0,1],[0,0,1],[0,0,1],[0,1,0],[0,1,0],[0,1,0],[1,0,0],[1,0,0]]

X = tf.placeholder("float",[None,4])
Y = tf.placeholder("float",[None,3])
nb_classes = 3 # number of classes in the data set

W = tf.Variable(tf.random_normal([4,nb_classes]),name='weight')
b = tf.Variable(tf.random_normal([nb_classes]),name='bias')

# Use the tf library to build the softmax function
# softmax = exp(logits) / reduce_sum(exp(logits), axis)
logits = tf.matmul(X,W)+b
hypothesis = tf.nn.softmax(logits)

# Cross-entropy for cost/loss
cost = tf.reduce_mean(-tf.reduce_sum(Y*tf.log(hypothesis),axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for step in range(2001):
        sess.run(optimizer,feed_dict={X:x_data,Y:y_data})
        if step % 200 == 0:
            print("step:",step, "cost:",sess.run(cost, feed_dict={X:x_data,Y:y_data}))       
   
    print('--------------')

    # Testing & One-hot encoding
    a = sess.run(hypothesis, feed_dict={X: [[1, 11, 7, 9]]})
    print(a, sess.run(tf.argmax(a, 1))) # argmax returns the index of the largest probability

    print('--------------')

    # Note: this rebinds the Python name b (the bias Variable); the graph itself is unaffected
    b = sess.run(hypothesis, feed_dict={X: [[1, 3, 4, 3]]})
    print(b, sess.run(tf.argmax(b, 1))) # tf.arg_max is a deprecated alias of tf.argmax

    print('--------------')

    c = sess.run(hypothesis, feed_dict={X: [[1, 1, 0, 1]]})
    print(c, sess.run(tf.argmax(c, 1)))

    print('--------------')

    # Named all_probs to avoid shadowing the built-in all()
    all_probs = sess.run(hypothesis, feed_dict={
                         X: [[1, 11, 7, 9], [1, 3, 4, 3], [1, 1, 0, 1]]})
    print(all_probs, sess.run(tf.argmax(all_probs, 1)))


step: 0 cost: 9.8787
step: 200 cost: 0.649903
step: 400 cost: 0.550687
step: 600 cost: 0.461039
step: 800 cost: 0.37225
step: 1000 cost: 0.283444
step: 1200 cost: 0.2301
step: 1400 cost: 0.209406
step: 1600 cost: 0.191962
step: 1800 cost: 0.177076
step: 2000 cost: 0.164239
--------------
[[  2.80454066e-02   9.71945643e-01   8.90843421e-06]] [1]
--------------
[[ 0.69767439  0.27606395  0.02626167]] [0]
--------------
[[  1.76797368e-08   3.65601503e-04   9.99634385e-01]] [2]
--------------
[[  2.80454215e-02   9.71945643e-01   8.90844331e-06]
 [  6.97674155e-01   2.76064128e-01   2.62616724e-02]
 [  1.76797030e-08   3.65600950e-04   9.99634385e-01]] [1 0 2]


In [35]:
%reset -f

Next, some more advanced functions for the softmax classifier. We will use existing functions in the tf library (`tf.one_hot` and `tf.nn.softmax_cross_entropy_with_logits`).

In this example, we are doing animal classification with `data-04-zoo.csv`.
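
One commonly cited reason to compute the cross-entropy directly from the logits, rather than taking `log` of a separately computed softmax, is numerical stability. The following plain-Python sketch shows the idea with the log-sum-exp trick; it is an illustration of the technique under that assumption, not TensorFlow's actual implementation, and the name `cross_entropy_from_logits` is hypothetical.

```python
import math

def cross_entropy_from_logits(logits, label):
    """Cross-entropy for one sample, given raw logits and an integer class label."""
    # -log(softmax(logits)[label]) = log(sum_j exp(z_j)) - z_label.
    # Shifting by the maximum keeps every exp() argument <= 0, so nothing overflows.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

print(cross_entropy_from_logits([2.0, 1.0, 0.1], 0))     # about 0.417, i.e. -log(0.659)
print(cross_entropy_from_logits([1000.0, 0.0, 0.0], 1))  # 1000.0, with no overflow
```

A naive version that calls `math.exp(1000.0)` directly would raise an `OverflowError` on the second input.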

In [40]:
import tensorflow as tf
import numpy as np

In [50]:
xy = np.loadtxt('../data/data-04-zoo.csv',delimiter=',',dtype=np.float32)
x_data = xy[:,0:-1]
y_data = xy[:,[-1]]

nb_classes = 7
X = tf.placeholder(tf.float32,[None,16])
Y = tf.placeholder(tf.int32,[None,1]) # labels 0~6, shape = (?,1)

# Create one-hot labels
Y_one_hot = tf.one_hot(Y,nb_classes) # shape = (?,1,7)
# If the input indices have rank N, the output will have rank N+1.
# The new axis is created at dimension `axis`
# (default: the new axis is appended at the end)
Y_one_hot = tf.reshape(Y_one_hot,[-1,nb_classes]) # shape = (?,7)

W = tf.Variable(tf.random_normal([16,nb_classes]),name='weight')
b = tf.Variable(tf.random_normal([nb_classes]),name='bias')

# Use the tf library to build the softmax function
# softmax = exp(logits) / reduce_sum(exp(logits), axis)
logits = tf.matmul(X,W)+b
hypothesis = tf.nn.softmax(logits)

# Cross-entropy using the existing tf library function
cost_i = tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=Y_one_hot)
cost = tf.reduce_mean(cost_i)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Check accuracy
prediction = tf.argmax(hypothesis,1)
correct_prediction = tf.equal(prediction, tf.argmax(Y_one_hot,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for step in range(2001):
        sess.run(optimizer,feed_dict={X:x_data,Y:y_data})
        if step % 200 == 0:
            loss, acc = sess.run([cost,accuracy],feed_dict={X:x_data,Y:y_data})
            print("Step: {:5}\tLoss: {:.3f}\tAcc: {:2%}".format(step,loss,acc))
            
    # Let's see if we can predict
    pred = sess.run(prediction,feed_dict={X:x_data})
    # y_data has shape (N,1); flatten() gives shape (N,), which matches pred.shape
    for p,y in zip(pred,y_data.flatten()):
        print("[{}] Prediction: {} True Y: {}".format(p==int(y),p,int(y)))
    

Step:     0	Loss: 7.713	Acc: 3.960396%
Step:   200	Loss: 0.422	Acc: 90.099013%
Step:   400	Loss: 0.258	Acc: 93.069309%
Step:   600	Loss: 0.184	Acc: 95.049506%
Step:   800	Loss: 0.143	Acc: 98.019803%
Step:  1000	Loss: 0.117	Acc: 98.019803%
Step:  1200	Loss: 0.098	Acc: 99.009901%
Step:  1400	Loss: 0.084	Acc: 99.009901%
Step:  1600	Loss: 0.074	Acc: 100.000000%
Step:  1800	Loss: 0.066	Acc: 100.000000%
Step:  2000	Loss: 0.059	Acc: 100.000000%
[True] Prediction: 0 True Y: 0
[True] Prediction: 0 True Y: 0
[True] Prediction: 3 True Y: 3
[True] Prediction: 0 True Y: 0
[True] Prediction: 0 True Y: 0
[True] Prediction: 0 True Y: 0
[True] Prediction: 0 True Y: 0
[True] Prediction: 3 True Y: 3
[True] Prediction: 3 True Y: 3
[True] Prediction: 0 True Y: 0
[True] Prediction: 0 True Y: 0
[True] Prediction: 1 True Y: 1
[True] Prediction: 3 True Y: 3
[True] Prediction: 6 True Y: 6
[True] Prediction: 6 True Y: 6
[True] Prediction: 6 True Y: 6
[True] Prediction: 1 True Y: 1
[True] Prediction: 0 True Y: 0
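
The accuracy node in the cell above chains `tf.equal`, `tf.cast` and `tf.reduce_mean`. The same computation can be sketched in plain Python as follows; the function name `accuracy` and the sample values are made up for illustration.

```python
def accuracy(predictions, targets):
    """Fraction of positions where the predicted class equals the true class."""
    # Compare element-wise (like tf.equal), turn True/False into 1/0 (like tf.cast),
    # then average (like tf.reduce_mean).
    matches = [int(p == t) for p, t in zip(predictions, targets)]
    return sum(matches) / len(matches)

print(accuracy([0, 3, 1, 6], [0, 3, 2, 6]))  # 0.75
```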
