## Do not use other NN (or similar) packages for implementation

You should implement the forward pass and the backward (gradient) computation yourself. Your implementation should <b>NOT</b> be a wrapper around an existing package (such as TensorFlow or PyTorch). For example, the following code:

    import some_lib
    def compute_gradients(x, y):
        return some_lib.compute_gradients(x, y)
        
will receive 0 points.
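
By contrast, a hand-written implementation spells out the arithmetic itself. The following is only a minimal sketch of what that could look like for a single dense layer with a sigmoid activation. The function names `dense_forward` and `dense_backward`, the cache layout, and the toy shapes are assumptions made for illustration, not a required interface.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(X, W, b):
    """Forward pass: A = sigmoid(X W + b). Returns the output and a cache."""
    Z = X @ W + b              # shape (batch, n_out)
    A = sigmoid(Z)
    return A, (X, A)           # keep what the backward pass needs

def dense_backward(dA, cache, W):
    """Backward pass: given dLoss/dA, return dLoss/dX, dLoss/dW, dLoss/db."""
    X, A = cache
    dZ = dA * A * (1.0 - A)    # derivative of the sigmoid
    dW = X.T @ dZ              # shape (n_in, n_out)
    db = dZ.sum(axis=0)        # shape (n_out,)
    dX = dZ @ W.T              # shape (batch, n_in)
    return dX, dW, db

# Toy data (hypothetical shapes): 4 samples, 3 inputs, 2 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
b = np.zeros(2)

A, cache = dense_forward(X, W, b)
# Use loss = sum(A), so dLoss/dA is an array of ones.
dX, dW, db = dense_backward(np.ones_like(A), cache, W)
print(dW.shape, db.shape, dX.shape)
```

Every step here uses only numpy array operations; no gradient is obtained by calling into a neural network library.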

## Numpy

For matrix/vector computations, you must use the package "numpy". For example, you may need to compute:

$$C_{j, k} = \sum_i A_{i, j}\times B_{i, k}.$$

Observing that this is a matrix multiplication $C = A^TB$, you can compute it by:

    import numpy
    C = numpy.dot(A.T, B)
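
If you want to convince yourself that the matrix product really matches the index formula, one possible check is to evaluate the sum directly with `numpy.einsum` and compare. The array shapes below are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # indexed as A[i, j]
B = rng.normal(size=(5, 4))   # indexed as B[i, k]

# Matrix-product form: C = A^T B
C_dot = np.dot(A.T, B)

# Index form: C[j, k] = sum_i A[i, j] * B[i, k]
C_sum = np.einsum('ij,ik->jk', A, B)

print(C_dot.shape)                 # one row per j, one column per k
print(np.allclose(C_dot, C_sum))   # the two forms agree up to rounding
```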
    
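Do not implement basic matrix/vector computations with explicit loops, for example:

    # Slow: avoid element-wise loops like this
    C = numpy.zeros((A.shape[1], B.shape[1]))
    for j in range(A.shape[1]):
        for k in range(B.shape[1]):
            for i in range(A.shape[0]):
                C[j, k] += A[i, j] * B[i, k]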
                
The loop implementation in Python can be about 1000 times slower than the numpy version.
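
To see the gap on your own machine, you could time both versions. This is a rough sketch: the helper name `matmul_loops` and the matrix sizes are assumptions, and the measured ratio will depend on the hardware and the array sizes.

```python
import time
import numpy as np

def matmul_loops(A, B):
    """Compute C = A^T B with explicit Python loops (slow, for comparison only)."""
    n_i, n_j = A.shape
    n_k = B.shape[1]
    C = np.zeros((n_j, n_k))
    for j in range(n_j):
        for k in range(n_k):
            for i in range(n_i):
                C[j, k] += A[i, j] * B[i, k]
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 50))
B = rng.normal(size=(60, 40))

t0 = time.perf_counter()
C_loop = matmul_loops(A, B)
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
C_fast = np.dot(A.T, B)
t_fast = time.perf_counter() - t0

# Both versions give the same result; only the running time differs.
print(np.allclose(C_loop, C_fast))
print(f"loops: {t_loop:.4f}s, numpy.dot: {t_fast:.6f}s")
```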

## Verify your implementation

You can check the correctness of your code by comparing the results of running your implementation with those from an existing NN package. The following code builds a network using Keras and trains it on the digits data. You can compare the calculated loss and the parameter values between your code and the Keras model. (Both implementations should use the same initial values and the same minibatches for the comparison to be meaningful.)

In [1]:
import numpy as np
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.callbacks import TensorBoard

def make_model(config):
    layers = config['layers']
    m = Sequential()
    m.add(Dense(layers[0], input_dim=config['input_dim'], activation='relu'))
    for n in layers[1:]:
        m.add(Dense(n, activation='relu'))
    m.add(Dense(1, activation='sigmoid'))
    sgd = SGD(lr=1e-3)
    m.compile(loss='mse', optimizer=sgd)
    return m


m = make_model({'input_dim':28*28, 'layers':[20]})

Using TensorFlow backend.


In [2]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [3]:
def select_data(X, y, i, n, ny):
    ixs = np.nonzero(y == i)[0]
    Xn = X[ixs[np.random.permutation(len(ixs))[:n]]]
    yn = ny*np.ones((n,))
    return Xn, yn

X0, y0 = select_data(x_train, y_train, 2, 1000, 0)
X1, y1 = select_data(x_train, y_train, 6, 1000, 1)
X = np.concatenate([X0, X1], axis=0)
y = np.concatenate([y0, y1], axis=0)

# shuffle data
ixs = np.random.permutation(len(y))
X = X[ixs].reshape((len(y), -1))
y = y[ixs]

In [4]:
tb = TensorBoard(log_dir='./logs')
m.fit(X, y, batch_size=32, epochs=10, shuffle=False, callbacks=[tb])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2acf9089748>
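
For the comparison itself, one possible approach is sketched below. It assumes that the Keras parameters are read with `m.get_weights()`, which returns a list of numpy arrays, and that reading them before calling `fit` gives you the initial values to copy into your own code. It also assumes that with `shuffle=False` Keras takes consecutive slices of the data as minibatches. The helpers `iter_minibatches` and `compare_params`, the tolerances, and the stand-in arrays are hypothetical and only illustrate the idea.

```python
import numpy as np

def iter_minibatches(X, y, batch_size=32):
    """Yield consecutive (X, y) slices, mirroring fit(..., shuffle=False)."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

def compare_params(params_a, params_b, rtol=1e-4, atol=1e-6):
    """Return (all_close, max_abs_diffs) for two lists of parameter arrays."""
    if len(params_a) != len(params_b):
        return False, []
    diffs = [float(np.max(np.abs(a - b))) for a, b in zip(params_a, params_b)]
    all_close = all(np.allclose(a, b, rtol=rtol, atol=atol)
                    for a, b in zip(params_a, params_b))
    return all_close, diffs

# Stand-in arrays so the sketch runs without Keras. In practice,
# keras_params would come from m.get_weights() and my_params from your own code.
keras_params = [np.ones((3, 2)), np.zeros(2)]
my_params = [np.ones((3, 2)) + 1e-8, np.zeros(2)]
all_close, diffs = compare_params(my_params, keras_params)
print(all_close, diffs)

# 2000 samples with batch_size=32 give 62 full batches plus one batch of 16.
X_demo = np.zeros((2000, 784))
y_demo = np.zeros(2000)
batch_sizes = [len(xb) for xb, _ in iter_minibatches(X_demo, y_demo)]
print(len(batch_sizes), batch_sizes[-1])
```

Small differences are expected because Keras typically computes in 32-bit floats, so compare with a tolerance rather than testing for exact equality.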