In this assignment, we will walk you through the process of implementing 

- A softmax function
- A simple neural network
- Back propagation
- Word2vec models

and training your own word vectors with stochastic gradient descent (SGD) for a sentiment analysis task.

In [2]:
using PyPlot;

INFO: Loading help data...


## 1. Softmax

>If you want the outputs of a network to be interpretable as posterior
>probabilities for a categorical target variable, it is highly desirable for
>those outputs to lie between zero and one and to sum to one. The purpose of
>the softmax activation function is to enforce these constraints on the
>outputs. 

http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-12.html


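Softmax is invariant to a constant offset in its input. For any input vector $x$ and any constant $c$,

$$\mathrm{softmax}(x) = \mathrm{softmax}(x + c)$$
where $x + c$ means adding the constant $c$ to every dimension of $x$.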
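
To see why this holds, here is a short derivation sketch. It writes the $i$-th output as $\mathrm{softmax}(x)_i = e^{x_i} / \sum_j e^{x_j}$, which is the standard definition and is assumed here because the text does not spell it out:

$$\mathrm{softmax}(x + c)_i = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}} = \mathrm{softmax}(x)_i$$

The factor $e^{c}$ appears in both the numerator and the denominator and cancels, so the output does not depend on $c$.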

Note: In practice, we make use of this property and choose $c = -\max_i x_i$ when computing softmax probabilities for numerical stability (i.e. subtracting its maximum element from all elements of $x$).

>Hence you can always pick one of the output units, and
>add an appropriate constant to each net input to produce any desired net
>input for the selected output unit, which you can choose to be zero or
>whatever is convenient. You can use the same trick to make sure that none of
>the exponentials overflows.
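
The overflow that this trick avoids can be seen in a small sketch. The helper names `naive_softmax` and `stable_softmax` are made up for this illustration and are not part of the assignment code. They operate on a plain vector and use comprehensions, so they do not rely on how a particular Julia version vectorizes `exp`.

```julia
# Hypothetical helpers for illustration only (not part of the assignment).

# Direct translation of the definition: exp(v_i) / sum_j exp(v_j).
function naive_softmax(v)
    e = [exp(vi) for vi in v]
    return e / sum(e)
end

# Same computation after subtracting the maximum element, i.e. c = -max_i v_i.
function stable_softmax(v)
    m = maximum(v)
    e = [exp(vi - m) for vi in v]   # the largest term is exp(0) = 1, so nothing overflows
    return e / sum(e)
end

scores = [1001.0, 1002.0]
println(naive_softmax(scores))    # exp(1001.0) overflows to Inf, and Inf / Inf is NaN
println(stable_softmax(scores))   # approximately [0.2689, 0.7311]
```

Both functions compute the same quantity mathematically; only the shifted version stays within floating-point range for large inputs.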

Given an input matrix of *N* rows and *d* columns, compute the softmax prediction for each row. That is, when the input is

    [[1,2],
    [3,4]]
    
the output of your function should be

    [[0.2689, 0.7311],
    [0.2689, 0.7311]]
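
As a quick check of these numbers, here is a worked calculation for the first row. It uses the standard definition $\mathrm{softmax}(x)_i = e^{x_i} / \sum_j e^{x_j}$, which is assumed here:

$$\mathrm{softmax}([1, 2]) = \left[\frac{e^{1}}{e^{1} + e^{2}},\ \frac{e^{2}}{e^{1} + e^{2}}\right] = \left[\frac{1}{1 + e},\ \frac{e}{1 + e}\right] \approx [0.2689,\ 0.7311]$$

The second row, $[3, 4]$, is the first row plus the constant 2, so by the offset invariance stated earlier it yields the same probabilities.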

In [3]:
function softmax(x)
    # Softmax function #
    ###################################################################
    # Compute the softmax function for the input here.                #
    # It is crucial that this function is optimized for speed because #
    # it will be used frequently in later code.                       #
    # You might find numpy functions np.exp, np.sum, np.reshape,      #
    # np.max, and numpy broadcasting useful for this task. (numpy     #
    # broadcasting documentation:                                     #
    # http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)  #
    # You should also make sure that your code works for one          #
    # dimensional inputs (treat the vector as a row), you might find  #
    # it helpful for your later problems.
    #
    # http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression
    ###################################################################
    ### YOUR CODE HERE
    # softmax of each row, shifted by the row maximum for numerical stability
    row = size(x,1);
    probs = zeros(size(x));
    for r = 1:row
        e = exp(x[r,:] - maximum(x[r,:]));  # compute the shifted exponentials once per row
        probs[r,:] = e / sum(e);
    end
    x = probs;
    ### END YOUR CODE
    return x;
end

softmax (generic function with 1 method)

In [4]:
# Verify your softmax implementation

println("=== For autograder ===");
println(softmax([[1 2],[3 4]]));
println(softmax([[1001 1002],[3 4]]));
println(softmax([[-1001 -1002]]));

=== For autograder ===
[0.2689414213699951 0.7310585786300049
 0.2689414213699951 0.7310585786300049]
[0.2689414213699951 0.7310585786300049
 0.2689414213699951 0.7310585786300049]
[0.7310585786300049 0.2689414213699951]


## 2. Neural network basics

In this part, we're going to implement

* A sigmoid activation function and its gradient
* Forward propagation for a simple neural network with cross-entropy cost
* Backward propagation to compute gradients for the parameters
* A gradient (derivative) check

In [5]:
function sigmoid(x)
    # Sigmoid function #
    ###################################################################
    # Compute the sigmoid function for the input here.                #
    ###################################################################
    
    ### YOUR CODE HERE
    x = 1.0./(1.0+exp(-x));
    ### END YOUR CODE
    
    return x;
end

sigmoid (generic function with 1 method)

In [6]:
function sigmoid_grad(f)
    # Sigmoid gradient function #
    ###################################################################
    # Compute the gradient for the sigmoid function here. Note that   #
    # for this implementation, the input f should be the sigmoid      #
    # function value of your original input x.                        #
    ###################################################################
    
    ### YOUR CODE HERE
    f = f.*(1.0-f);
    ### END YOUR CODE
    
    return f;
end

sigmoid_grad (generic function with 1 method)
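
The reason `sigmoid_grad` can take the sigmoid value $f$ rather than the original input $x$ follows from a short derivation (a sketch, writing $\sigma(x) = 1 / (1 + e^{-x})$ as in the `sigmoid` cell):

$$\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

With $f = \sigma(x)$ this is $f\,(1 - f)$, which is the expression used in the cell.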

In [7]:
# Check your sigmoid implementation
x = [[1 2], [-1 -2]];
f = sigmoid(x);
g = sigmoid_grad(f);
println("=== For autograder ===");
println(f);
println(g);

=== For autograder ===
[0.7310585786300049 0.8807970779778823
 0.2689414213699951 0.11920292202211755]
[0.19661193324148185 0.10499358540350662
 0.19661193324148185 0.1049935854035065]
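
One way to convince yourself that these gradient values are right is to compare them against a finite-difference estimate. The sketch here does this for a single scalar input. `sigmoid_scalar` and `central_diff` are hypothetical names introduced only for this example, and the step size `1e-4` and tolerance `1e-6` are arbitrary choices.

```julia
# Hypothetical scalar helpers for illustration (not part of the assignment code).
sigmoid_scalar(z) = 1.0 / (1.0 + exp(-z))

# Central-difference estimate of the derivative of g at z.
central_diff(g, z, h) = (g(z + h) - g(z - h)) / (2 * h)

z = 1.0
s = sigmoid_scalar(z)
analytic = s * (1.0 - s)                           # gradient expressed via the sigmoid value
numeric = central_diff(sigmoid_scalar, z, 1e-4)    # numerical estimate of the same derivative
println(abs(analytic - numeric) < 1e-6)            # expected to print true
```

The analytic value at `z = 1.0` corresponds to the first entry of the gradient printed by the check cell.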


Use the functions implemented above to implement a neural network with one sigmoid hidden layer.

In [8]:
# First implement a gradient checker by filling in the following function
function gradcheck_naive(f, x)
    ###
    # Gradient check for a function f
    # - f should be a function that takes a single argument and outputs the cost and its gradients
    # - x is the point (a floating-point array) to check the gradient at
    ###

    fx, grad = f(x) # Evaluate function value and gradient at the original point
    h = 1e-4

    # Iterate over all (linear) indexes in x
    for ix = 1:length(x)
        ### YOUR CODE HERE: try modifying x[ix] with h defined above to compute numerical gradients
        ### if f uses randomness, reseed the random number generator with the same seed before
        ### calling f(x) each time; this makes it possible to test cost functions with built in
        ### randomness later
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)[1]       # cost at x + h along index ix
        x[ix] = old - h
        fxmh = f(x)[1]       # cost at x - h along index ix
        x[ix] = old          # restore the original value
        numgrad = (fxph - fxmh) / (2 * h)   # central difference
        ### END YOUR CODE

        # Compare gradients
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5
            println("Gradient check failed.");
            println("First gradient error found at index $ix");
            println("Your gradient: $(grad[ix]) \t Numerical gradient: $numgrad");
            return
        end
    end
    println("Gradient check passed!");
end

gradcheck_naive (generic function with 1 method)
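
For reference, a common way to obtain the numerical gradient is the central difference. This choice is an assumption; the exercise itself only says to modify `x[ix]` by `h`. With $e_i$ denoting the unit vector for index $i$ and $J$ the cost returned by `f`:

$$\text{numgrad}_i = \frac{J(x + h\,e_i) - J(x - h\,e_i)}{2h}$$

The check then compares this value with the analytic gradient through the relative difference used in the code,

$$\text{reldiff}_i = \frac{\lvert \text{numgrad}_i - \text{grad}_i \rvert}{\max\bigl(1,\ \lvert \text{numgrad}_i \rvert,\ \lvert \text{grad}_i \rvert\bigr)}$$

and reports a failure when it exceeds $10^{-5}$.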

In [None]:
# Sanity check for the gradient checker
quad(x) = (sum(x .^ 2), x * 2);   # cost sum(x^2) and its gradient 2x

println("=== For autograder ===");
gradcheck_naive(quad, [123.456]);      # scalar test (as a 1-element array)
gradcheck_naive(quad, randn(3));       # 1-D test
gradcheck_naive(quad, randn(4,5));     # 2-D test