# TensorFlow.jl

This notebook introduces the TensorFlow.jl package using Julia 0.6.2. It uses the TensorFlow, MNIST, and Distributions packages.

Created by Lijing Wang, based on the TensorFlow.jl instructions and examples. 

In [None]:
Pkg.add("TensorFlow")
Pkg.add("MNIST")
Pkg.add("Distributions")

## 1. Load MNIST dataset

This part is based on the tutorial for the MNIST package.

In [1]:
using MNIST

#Define a type that tracks the current position in a shuffled order of the 60000 training samples
type DataLoader
    cur_id::Int
    order::Vector{Int}
end

DataLoader() = DataLoader(1, shuffle(1:60000))

DataLoader

In [2]:
#Read the next batch_size training samples, following the shuffled order
function next_batch(loader::DataLoader, batch_size)
    x = zeros(Float32, batch_size, 784)
    y = zeros(Float32, batch_size, 10)
    for i in 1:batch_size
        x[i, :] = trainfeatures(loader.order[loader.cur_id])
        label = trainlabel(loader.order[loader.cur_id])
        #One-hot encode the label; digit d goes in column d+1 (1-based indexing)
        y[i, Int(label)+1] = 1.0
        loader.cur_id += 1
        #Wrap around to the start after the last training sample
        if loader.cur_id > 60000
            loader.cur_id = 1
        end
    end
    x, y
end

next_batch (generic function with 1 method)
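
The loader stores each label as a one-hot row: digit `d` sets column `d + 1` to 1, because Julia arrays are 1-based. The following is a minimal sketch of that encoding in plain Julia. The helper name `onehot` is hypothetical and is not part of the MNIST or TensorFlow packages.

```julia
# Hypothetical helper (not part of the MNIST package): one-hot encode digit labels 0-9.
function onehot(labels, nclasses)
    y = zeros(Float32, length(labels), nclasses)
    for (i, label) in enumerate(labels)
        # Digit d is stored in column d + 1 because Julia indexing starts at 1.
        y[i, Int(label) + 1] = 1.0
    end
    return y
end

y = onehot([0, 3, 9], 10)
println(y[2, :])  # row for digit 3: a single 1.0 in column 4
```

Each row sums to 1, which is the form the cross-entropy loss in Section 3 expects for `y_`.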

In [3]:
#Load test set
function load_test_set(N=10000)
    x = zeros(Float32, N, 784)
    y = zeros(Float32, N, 10)
    for i in 1:N
        x[i, :] = testfeatures(i)
        label = testlabel(i)
        #Labels are digits 0-9, but Julia arrays are 1-based, so shift by one
        y[i, Int(label)+1] = 1.0
    end
    x,y
end

load_test_set (generic function with 2 methods)

In [4]:
loader = DataLoader()

DataLoader(1, [30662, 41926, 20864, 58356, 1826, 38976, 33608, 47393, 12972, 10720  …  30512, 38238, 42194, 17062, 44210, 58204, 2111, 1841, 33052, 56974])

## 2. Start TensorFlow session
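If you have a GPU, you can enable it for TensorFlow.jl by setting the `TF_USE_GPU` environment variable before the package is built (for example, before running `Pkg.build("TensorFlow")`).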

#### Julia Code
ENV["TF_USE_GPU"] = "1"

In [5]:
using TensorFlow
sess = Session()

2018-01-31 15:02:18.625400: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 15:02:18.625433: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 15:02:18.625440: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 15:02:18.625446: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.


Session(Ptr{Void} @0x0000000118be7090)

## 3. Build a softmax regression Model

### Set up Placeholders

In [6]:
x = placeholder(Float32)
y_ = placeholder(Float32) #True Y

<Tensor placeholder_2:1 shape=unknown dtype=Float32>

### Initialize Parameters

In [7]:
W = Variable(zeros(Float32, 784, 10))
b = Variable(zeros(Float32, 10))

run(sess, global_variables_initializer())

### Predicted Class and Loss Function

In [8]:
y = nn.softmax(x*W + b) #Predict Y

#Cross Entropy Loss Function
cross_entropy = reduce_mean(-reduce_sum(y_ .* log(y), axis=[2]))

<Tensor reduce_2:1 shape=unknown dtype=Float32>
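
To make the softmax and cross-entropy formulas concrete, here is a small worked example for a single sample in plain Julia. The function names `softmax_row` and `cross_entropy_row` are illustrative and are not TensorFlow.jl functions. Subtracting the maximum logit is a common numerical-stability step added for this sketch; the notebook code does not do it explicitly.

```julia
# Illustrative plain-Julia versions (not TensorFlow.jl API).
function softmax_row(logits)
    # Subtracting the maximum does not change the result but avoids overflow.
    e = exp.(logits .- maximum(logits))
    return e ./ sum(e)
end

# Cross entropy for one sample: -sum(y_true .* log(y_pred)).
function cross_entropy_row(y_true, y_pred)
    return -sum(y_true .* log.(y_pred))
end

logits = [2.0, 1.0, 0.1]
y_true = [1.0, 0.0, 0.0]   # one-hot label for the first class
p = softmax_row(logits)
loss = cross_entropy_row(y_true, p)
println("probabilities: ", p)
println("loss: ", loss)
```

Because `y_true` is one-hot, the loss reduces to the negative log of the probability assigned to the true class. The notebook's `reduce_mean` then averages this quantity over the batch.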

### Train the model with mini-batch and Gradient Descent Optimizer
This step may take a while. 

In [9]:
train_step = train.minimize(train.GradientDescentOptimizer(.00001), cross_entropy)
for i in 1:1000
    batch = next_batch(loader, 100)
    run(sess, train_step, Dict(x=>batch[1], y_=>batch[2]))
end
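
As a rough picture of what `train.minimize` with a gradient descent optimizer automates, the sketch below runs plain gradient descent on softmax regression by hand. Everything here is an assumption made for illustration: the function names, the toy data, and the learning rate are invented, and TensorFlow derives the gradients automatically rather than from a hand-written formula.

```julia
# Illustrative plain-Julia gradient descent for softmax regression
# (a sketch only; not TensorFlow.jl API).
function softmax_rows(z)
    p = zeros(size(z))
    for i in 1:size(z, 1)
        e = exp.(z[i, :] .- maximum(z[i, :]))
        p[i, :] = e ./ sum(e)
    end
    return p
end

function mean_cross_entropy(x, y_true, W, b)
    p = softmax_rows(x * W .+ b')
    return -sum(y_true .* log.(p)) / size(x, 1)
end

function gradient_step(x, y_true, W, b, lr)
    n = size(x, 1)
    p = softmax_rows(x * W .+ b')
    # Gradient of the mean cross entropy with respect to the logits.
    dz = (p .- y_true) ./ n
    dW = x' * dz
    db = dz' * ones(n)   # column sums of dz
    return W .- lr .* dW, b .- lr .* db
end

function train_toy(steps, lr)
    # Toy data, made up for illustration: 4 samples, 2 features, 2 classes.
    x = [1.0 0.0; 0.9 0.1; 0.0 1.0; 0.1 0.9]
    y_true = [1.0 0.0; 1.0 0.0; 0.0 1.0; 0.0 1.0]
    W = zeros(2, 2)
    b = zeros(2)
    before = mean_cross_entropy(x, y_true, W, b)
    for i in 1:steps
        W, b = gradient_step(x, y_true, W, b, lr)
    end
    after = mean_cross_entropy(x, y_true, W, b)
    return before, after
end

before, after = train_toy(100, 0.5)
println("loss before: ", before, ", after: ", after)
```

With zero initial weights every class gets equal probability, so the starting loss is `log(2)` for two classes; each step then moves the parameters against the gradient and the loss falls.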

### Evaluate the model with test set

In [10]:
correct_prediction = equal(indmax(y,2), indmax(y_, 2))
accuracy=reduce_mean(cast(correct_prediction, Float32))
testx, testy = load_test_set()

println(run(sess, accuracy, Dict(x=>testx, y_=>testy)))

0.9093
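
The accuracy above is the fraction of test rows where the column with the largest predicted probability matches the column of the one-hot label. A minimal plain-Julia sketch of the same computation follows. The name `accuracy_rows` and the sample values are made up for illustration.

```julia
# Illustrative plain-Julia accuracy: the fraction of rows whose largest
# predicted probability is in the same column as the one-hot label.
function accuracy_rows(y_pred, y_true)
    correct = 0
    for i in 1:size(y_pred, 1)
        # findmax returns (value, index); the index plays the role of indmax.
        if findmax(y_pred[i, :])[2] == findmax(y_true[i, :])[2]
            correct += 1
        end
    end
    return correct / size(y_pred, 1)
end

y_pred = [0.7 0.2 0.1; 0.1 0.3 0.6; 0.2 0.5 0.3; 0.4 0.5 0.1]
y_true = [1.0 0.0 0.0; 0.0 0.0 1.0; 0.0 1.0 0.0; 1.0 0.0 0.0]
acc = accuracy_rows(y_pred, y_true)
println("accuracy: ", acc)  # 3 of the 4 rows match
```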


This model is a single softmax layer, that is, a linear classifier, so its accuracy is limited: about 91% on the test set.

## 4. Build a multi-layer convolutional network

Here we set up a CNN model for MNIST.

In [11]:
loader = DataLoader()

session = Session(Graph())


Session(Ptr{Void} @0x0000000117270150)

### Define initializers for weight_variable W and bias_variable b

In [12]:
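using Distributions #provides Normal, used below

function weight_variable(shape)
    #Small random normal initial weights (mean 0, standard deviation .001)
    initial = map(Float32, rand(Normal(0, .001), shape...))
    return Variable(initial)
end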

function bias_variable(shape)
    initial = fill(Float32(.1), shape...)
    return Variable(initial)
end

bias_variable (generic function with 1 method)

### Define the 2D convolution and 2x2 max-pooling helpers

In [13]:
function conv2d(x, W)
    nn.conv2d(x, W, [1, 1, 1, 1], "SAME")
end

function max_pool_2x2(x)
    nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
end

max_pool_2x2 (generic function with 1 method)

### Build your CNN

In [14]:
using Distributions


x = placeholder(Float32)
y_ = placeholder(Float32)

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = reshape(x, [-1, 28, 28, 1])
    
h_conv1 = nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = reshape(h_pool2, [-1, 7*7*64])
h_fc1 = nn.relu(h_pool2_flat * W_fc1 + b_fc1)

keep_prob = placeholder(Float32)
h_fc1_drop = nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = nn.softmax(h_fc1_drop * W_fc2 + b_fc2)

cross_entropy = reduce_mean(-reduce_sum(y_.*log(y_conv), axis=[2]))

train_step = train.minimize(train.AdamOptimizer(1e-4), cross_entropy)

correct_prediction = equal(indmax(y_,2), indmax(y_conv, 2))

accuracy = reduce_mean(cast(correct_prediction, Float32))


<Tensor reduce_3:1 shape=unknown dtype=Float32>
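
The `7*7*64` in `W_fc1` follows from the layer shapes. With "SAME" padding, the output size along each spatial dimension is `ceil(input / stride)`, so the stride-1 convolutions keep the size and each 2x2 max pool halves it. The sketch below checks that arithmetic in plain Julia. The helper names are hypothetical.

```julia
# Output size along one spatial dimension with "SAME" padding:
# ceil(input / stride), written with integer ceiling division.
same_output_size(input, stride) = cld(input, stride)

function cnn_flat_size(side, channels)
    side = same_output_size(side, 1)  # conv1, stride 1: 28 -> 28
    side = same_output_size(side, 2)  # 2x2 max pool:    28 -> 14
    side = same_output_size(side, 1)  # conv2, stride 1: 14 -> 14
    side = same_output_size(side, 2)  # 2x2 max pool:    14 -> 7
    return side, side * side * channels
end

side, flat = cnn_flat_size(28, 64)
println("final side: ", side, ", flattened length: ", flat)
```

The flattened length is the first dimension of `W_fc1` and the second dimension in the `reshape` that produces `h_pool2_flat`.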

### Initialize parameters

In [15]:
run(session, global_variables_initializer())

### Training with Minibatch, dropout and AdamOptimizer
This step may take a while. 

In [17]:
for i in 1:1000
    batch = next_batch(loader, 50)
    if i%100 == 1
        train_accuracy = run(session, accuracy, Dict(x=>batch[1], y_=>batch[2], keep_prob=>1.0))
        info("step $i, training accuracy $train_accuracy")
    end
    run(session, train_step, Dict(x=>batch[1], y_=>batch[2], keep_prob=>.5))
end

INFO: step 1, training accuracy 0.12
INFO: step 101, training accuracy 0.84
INFO: step 201, training accuracy 0.96
INFO: step 301, training accuracy 0.96
INFO: step 401, training accuracy 0.96
INFO: step 501, training accuracy 0.98
INFO: step 601, training accuracy 0.96
INFO: step 701, training accuracy 0.9
INFO: step 801, training accuracy 0.96
INFO: step 901, training accuracy 0.96

### Test accuracy

In [18]:
testx, testy = load_test_set()
test_accuracy = run(session, accuracy, Dict(x=>testx, y_=>testy, keep_prob=>1.0))
info("test accuracy $test_accuracy")

INFO: test accuracy 0.9784

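The CNN reaches about 97.8% test accuracy on MNIST, compared with about 90.9% for the softmax regression model.

The test accuracy is in line with the training-batch accuracies logged during training, so there is no sign of overfitting; dropout likely contributes to this.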
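
The `keep_prob` placeholder controls dropout: 0.5 during training and 1.0 for evaluation. The sketch below shows the usual "inverted dropout" formulation in plain Julia, where each unit is kept with probability `keep_prob` and the kept values are scaled by `1 / keep_prob`. This is an assumption about the standard technique written for illustration. It is not the TensorFlow.jl source, and the function name `dropout` is a local helper.

```julia
# Illustrative inverted dropout in plain Julia (a sketch of the usual
# formulation, not TensorFlow.jl source code).
function dropout(h, keep_prob)
    mask = rand(size(h)...) .< keep_prob   # keep each unit with probability keep_prob
    return h .* mask ./ keep_prob          # rescale so the expected value is unchanged
end

h = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(h, 0.5)  # each entry is either 0.0 or twice the input
test_out = dropout(h, 1.0)   # nothing is dropped
println(train_out)
println(test_out)
```

Setting `keep_prob` to 1.0 makes the layer a no-op, which is why the accuracy evaluations pass `keep_prob=>1.0`.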