# Identity Function approximation with Julia

In this notebook, I approximate the identity function, $f(x) = x$, with a single-layer neural network in Julia. I use the Knet library and its autograd module.

In [1]:
using Knet

$x$ is a vector of 5 data samples and $w$ is a 2×1 array of weights. I initialized both randomly.

In [2]:
x = randn(5)
w = randn(2,1)

2×1 Array{Float64,2}:
  1.7601 
 -1.13114

A simple predict function for the forward pass. $w[1]$ is the weight (the multiplier) and $w[2]$ is the bias.

In [3]:
predict(w,x) = (w[1] .* x) .+ w[2]

predict (generic function with 1 method)

A simple loss function that measures the distance between the prediction and the true value. The loss is the squared error: the difference is raised to the power of 2.

In [4]:
loss(w,x) = (predict(w,x) .- x).^2

loss (generic function with 1 method)
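For reference, the gradient of this per-sample loss can also be written out by hand. This is a short derivation under the assumption that $x$ is a single scalar sample; $w_1$ and $w_2$ stand for $w[1]$ and $w[2]$. With $L(w) = (w_1 x + w_2 - x)^2$, the chain rule gives:

$$
\frac{\partial L}{\partial w_1} = 2\,(w_1 x + w_2 - x)\,x, \qquad
\frac{\partial L}{\partial w_2} = 2\,(w_1 x + w_2 - x)
$$

Both partial derivatives vanish at $w = (1, 0)$, which is the identity function.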

This is where the magic happens. Knet's autograd automatically calculates the gradient of the loss with respect to its first argument, namely $w$. This way, we can easily obtain the update values during the stochastic gradient descent phase of training.

In [5]:
gradloss = grad(loss)

(::gradfun) (generic function with 1 method)
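As a sanity check on what an automatic gradient should return, here is a minimal sketch that compares a hand-written gradient with a finite-difference estimate. It assumes Julia 1.x and deliberately avoids Knet so that it runs on its own; `manualgrad`, `numgrad`, `w0`, and `x0` are hypothetical names introduced only for this illustration.

```julia
# Self-contained gradient check; redefines predict and loss so it runs without Knet.
predict(w, x) = (w[1] .* x) .+ w[2]
loss(w, x) = (predict(w, x) .- x) .^ 2

# Analytic gradient of the per-sample loss (hypothetical helper, not part of Knet).
function manualgrad(w, x)
    r = w[1] * x + w[2] - x      # residual: prediction minus target
    return [2r * x, 2r]          # [dL/dw1, dL/dw2]
end

# Central finite differences as an independent estimate (hypothetical helper).
function numgrad(w, x; h = 1e-6)
    g = zeros(length(w))
    for i in eachindex(w)
        wp = copy(w); wp[i] += h     # nudge one weight up
        wm = copy(w); wm[i] -= h     # and down
        g[i] = (loss(wp, x) - loss(wm, x)) / (2h)
    end
    return g
end

w0 = [1.5, -0.5]   # illustrative weights, not the notebook's random values
x0 = 0.8           # a single scalar sample
g_manual = manualgrad(w0, x0)
g_numeric = numgrad(w0, x0)
println(g_manual)
println(g_numeric)
```

The two printed vectors should agree to several decimal places; if they did not, either the derivation or the loss definition would be suspect.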

The train function is composed of two loops. The outer loop iterates over the whole data set, and each full pass over the data is called an **epoch**. The inner loop iterates over individual samples. In practice, the loss is summed over a small group of samples, called a **batch**, and the weights are updated once per batch. However, for our simple purpose, an update at every data point is OK.

Here I use the _gradloss_ function to calculate the gradient with respect to the weight and the bias. Then I update the weights in the opposite direction of the gradient, scaled by a **learning rate** of 0.01.

In [6]:
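function train!(w,x)
    for epoch = 1:100                # outer loop: one pass over all samples per epoch
        for sample in x              # inner loop: one weight update per sample
            # step against the gradient with a learning rate of 0.01
            w[:] = w[:] - gradloss(w,sample)[:] .* 0.01
        end
        if epoch % 20 == 0           # report progress every 20 epochs
            println(w)
        end
    end
end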

train! (generic function with 1 method)
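The batch idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it targets Julia 1.x, uses a hand-written gradient instead of Knet's `grad` so that it is self-contained, and the names `batchgrad`, `train_batches!`, `xs`, and `wb`, as well as the batch size and epoch count, are hypothetical choices rather than part of the notebook.

```julia
# Gradient of the squared error summed over one batch (hypothetical helper).
function batchgrad(w, xb)
    r = w[1] .* xb .+ w[2] .- xb             # residuals for every sample in the batch
    return [2 * sum(r .* xb), 2 * sum(r)]    # [dL/dw1, dL/dw2]
end

# Mini-batch gradient descent: one weight update per batch instead of per sample.
function train_batches!(w, x; batchsize = 2, lr = 0.01, epochs = 100)
    for epoch in 1:epochs
        for start in 1:batchsize:length(x)
            stop = min(start + batchsize - 1, length(x))   # last batch may be smaller
            w .-= lr .* batchgrad(w, x[start:stop])
        end
    end
    return w
end

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]   # fixed illustrative samples, not the notebook's data
wb = [1.8, -1.1]                   # illustrative starting weights
train_batches!(wb, xs; epochs = 2000)
println(wb)
```

With enough epochs the weights should approach `[1, 0]`, just as with per-sample updates; the difference is only in how often the weights change.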

While training, I printed the value of $w$ every 20 epochs. The last line is the final value of $w$. I expected the optimal value to be [1,0], since $f(x) = 1 \cdot x + 0 = x$, and my results are pretty close given such a small data sample.
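The expectation that the optimum is [1,0] can be checked without any training, because this model is linear in $w$ and the problem is an ordinary least-squares fit. The sketch below assumes Julia 1.x; `xs`, `A`, and `w_opt` are hypothetical names, and the samples are fixed illustrative values rather than the random ones used above.

```julia
using LinearAlgebra  # assumes Julia 1.x

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]    # illustrative samples
A = [xs ones(length(xs))]           # design matrix: inputs in column 1, ones for the bias in column 2
w_opt = A \ xs                      # least-squares solution of A * w ≈ xs
println(w_opt)
```

Because the targets equal the inputs exactly, the least-squares solution should match `[1, 0]` up to floating-point rounding, which is the value gradient descent is slowly approaching.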

In [7]:
train!(w,x)

[1.3162; -0.134838]
[1.34641; -0.0134722]
[1.10106; -0.0413009]
[1.04679; -0.0187434]
[1.02162; -0.00861958]


The total training error (the sum of squared errors over the training samples) after training.

In [8]:
sum(loss(w,x))

0.0010307501187531232

The original values and predicted values.

In [9]:
println(x)
predict(w,x)'

[-0.882686,0.836563,-0.229076,0.159088,1.07268]


1×5 Array{Float64,2}:
 -0.676439  0.84603  -0.242648  0.153909  1.08725

I created test data to see how well my net performs on unseen data. Here are the test data, the predictions, and the total loss.

In [10]:
test_x = randn(3,1)
println(test_x)
println(predict(w,test_x)')
println(sum(loss(w,test_x)))

[1.30348; -0.296061; -1.24957]
[1.32304 -0.311082 -1.2852]
0.0030683044463786714
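The training loss above is summed over 5 samples and the test loss over 3, so the two totals are not directly comparable. One possible way to put them on the same scale is to divide by the number of samples. The sketch below assumes Julia 1.x; `mse`, `w_demo`, and `x_demo` are hypothetical names with made-up values chosen so the arithmetic is easy to follow.

```julia
# Redefined here so the snippet runs on its own.
predict(w, x) = (w[1] .* x) .+ w[2]
loss(w, x) = (predict(w, x) .- x) .^ 2

# Mean squared error: total loss divided by the number of samples (hypothetical helper).
mse(w, x) = sum(loss(w, x)) / length(x)

w_demo = [2.0, 0.0]              # deliberately wrong weights: predicts 2x instead of x
x_demo = [1.0, 2.0]
println(mse(w_demo, x_demo))     # squared errors are 1 and 4, so the mean is 2.5
```

Calling `mse(w, x)` and `mse(w, test_x)` in the notebook would then give per-sample figures for the training and test sets.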
