This project walks you through building a deep neural network (DNN) from scratch (using NumPy rather than a deep learning framework) to classify images of dogs and cats. The accuracy of this DNN is then compared with the accuracy of scikit-learn's Logistic Regression.
Please unzip the data.zip file into the data folder before running this Jupyter notebook. Download data.zip: https://drive.google.com/file/d/1Mkqf8r_8mQLjUEsvkydGIoDxPIsXbb3Z/view?usp=sharing
- The dataset consists of a train set (5,000 images) and a test set (500 images), both stored in H5 format. H5 (HDF5) is an efficient format for storing large arrays, and the Python library h5py can read and write it.
- The train and test files each contain 2 keys: `images` (the image instances) and `labels` (with 2 possible values: 0 represents dogs and 1 represents cats).
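Before training, the images have to be turned into a 2-D matrix with one row per image. The following is a minimal sketch. It assumes that `images` has shape (m, 128, 128, 3) with uint8 pixel values; 128 × 128 × 3 = 49,152 matches the input-layer size used later, but the exact image size is an assumption. It also assumes that `labels` has shape (m,). The helper name `prepare_data` and the scaling of pixels to [0, 1] are illustrative choices rather than the project's own code. In the notebook, the arrays would come from the H5 files. Here they are faked with random data.

```python
import numpy as np

def prepare_data(images, labels):
    """Hypothetical helper: flatten images and reshape labels for the network.

    Assumes images has shape (m, height, width, channels) with uint8 pixels
    and labels has shape (m,) with 0 = dog, 1 = cat.
    """
    m = images.shape[0]
    # One row per image; scale pixel values from [0, 255] to [0, 1] (assumed preprocessing)
    X = images.reshape(m, -1).astype(np.float64) / 255.0
    # Labels as a row vector of shape (1, m), matching the shape of the predictions
    y = labels.reshape(1, m).astype(np.float64)
    return X, y

# Fake data standing in for the H5 contents: 4 images of 128 x 128 x 3
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 128, 128, 3), dtype=np.uint8)
fake_labels = np.array([0, 1, 1, 0])

X_train, y_train = prepare_data(fake_images, fake_labels)
print(X_train.shape, y_train.shape)  # (4, 49152) (1, 4)
```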
For the first part, we are going to build a simple neural network with 2 hidden layers.
If you are not familiar with neural networks, this article might help.
- X: our input layer, in this case the images of dogs and cats.
- Yh, Y: the predicted label (produced by the neural network, y_hat in the code) and the real label (y_train in the code).
- W1, W2, W3: the weights we need to optimize (together with the biases b1, b2, b3); they are the backbone of the neural network.
- Z, A: the values passed through the network from the input data. At each layer, Z is the result of multiplying the previous layer's output by W and adding b, and A is the activation function applied to Z. Together they produce the prediction, which is then used to optimize W.
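The notation fits together as follows. Each layer l computes a linear step and then an activation. This summary matches the feed-forward code used in this project. It uses one column per example, $A_0 = X^{T}$, ReLU as the activation $g_l$ of the two hidden layers, and the sigmoid $\sigma$ at the output:

$$Z_l = W_l A_{l-1} + b_l, \qquad A_l = g_l\left(Z_l\right), \qquad l = 1, 2, 3$$

$$\hat{Y} = A_3 = \sigma\left(Z_3\right) = \frac{1}{1 + e^{-Z_3}}$$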
We initialize W and b (the bias) for every layer after the input layer, that is, for the two hidden layers and the output layer.
- The input layer has 49,152 nodes, while each hidden layer has 3. The output layer contains 1 node that represents the label (0 for dog and 1 for cat). Therefore the shapes of W1, W2 and W3 are (3, 49152), (3, 3) and (1, 3) respectively, and the shapes of b1, b2 and b3 are (3, 1), (3, 1) and (1, 1).
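```python
import numpy as np

num_hidden = 3  # number of nodes in each hidden layer

# X_train has shape (m, 49152): one row per image.
# Each W is scaled by 1/sqrt(number of inputs to the layer) to keep Z at a moderate size.
W1 = np.random.randn(num_hidden, X_train.shape[1]) / np.sqrt(X_train.shape[1])
b1 = np.random.randn(num_hidden, 1)
W2 = np.random.randn(num_hidden, num_hidden) / np.sqrt(num_hidden)
b2 = np.random.randn(num_hidden, 1)
W3 = np.random.randn(1, num_hidden) / np.sqrt(num_hidden)
b3 = np.random.randn(1, 1)
```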
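Dividing each weight matrix by the square root of its fan-in (the number of inputs to the layer) keeps the pre-activations Z at a reasonable scale even with 49,152 inputs. This scaling matches what is often called LeCun initialization. The sketch below assumes inputs with unit variance, using random numbers in place of real images and 50 toy examples. It compares the standard deviation of Z1 with and without the scaling.

```python
import numpy as np

rng = np.random.default_rng(42)
n_x, m, num_hidden = 49152, 50, 3   # 49,152 inputs as in this project; 50 toy examples

# Stand-in inputs with unit variance (random numbers instead of real images)
X = rng.standard_normal((m, n_x))

W1_unscaled = rng.standard_normal((num_hidden, n_x))
W1_scaled = W1_unscaled / np.sqrt(n_x)   # the 1/sqrt(fan-in) scaling

Z1_unscaled = np.dot(W1_unscaled, X.T)
Z1_scaled = np.dot(W1_scaled, X.T)

print(f"std of Z1 without scaling: {Z1_unscaled.std():.1f}")  # on the order of sqrt(49152), about 222
print(f"std of Z1 with scaling:    {Z1_scaled.std():.2f}")    # close to 1
```

Without the scaling, Z1 would be hundreds of times larger. Values that large propagate through the network and push the output sigmoid into saturation, where y_hat is numerically 0 or 1.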
- Remember that in a matrix product the inner dimensions must match (the number of columns of W must equal the number of rows of the input), so we should check the shapes before making any calculation.
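A small illustration of the shape rule for `np.dot`, using toy sizes (5 input features and 4 examples instead of 49,152 and 5,000):

```python
import numpy as np

num_hidden, n_x, m = 3, 5, 4      # toy sizes
W1 = np.ones((num_hidden, n_x))
X = np.ones((m, n_x))             # one row per example, like X_train

# (3, 5) dot (5, 4): inner dimensions match, so the result is (3, 4)
Z1 = np.dot(W1, X.T)
print(Z1.shape)  # (3, 4)

# (3, 5) dot (4, 5): inner dimensions 5 and 4 differ, so NumPy raises ValueError
try:
    np.dot(W1, X)
except ValueError as err:
    print("shape mismatch:", err)
```

This is why X_train is transposed before being multiplied by W1.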
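Feed forward:

```python
def Relu(Z):
    return np.maximum(0, Z)

def Sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

# Transpose X_train so that each column is one example
Z1 = np.dot(W1, X_train.T) + b1   # (3, m)
A1 = Relu(Z1)
Z2 = np.dot(W2, A1) + b2          # (3, m)
A2 = Relu(Z2)
Z3 = np.dot(W3, A2) + b3          # (1, m)
y_hat = Sigmoid(Z3)               # predicted probability of "cat"
```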
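The following self-contained sketch checks the shapes along the forward pass without loading the real data. It runs the same computation on a tiny random batch. The toy sizes (10 features, 4 examples) and the helper name `forward` are assumptions made for illustration.

```python
import numpy as np

def Relu(Z):
    return np.maximum(0, Z)

def Sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward(X, W1, b1, W2, b2, W3, b3):
    """Hypothetical helper: the feed-forward pass, returning all intermediate values."""
    Z1 = np.dot(W1, X.T) + b1
    A1 = Relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = Relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    y_hat = Sigmoid(Z3)
    return Z1, A1, Z2, A2, Z3, y_hat

rng = np.random.default_rng(1)
n_x, m, num_hidden = 10, 4, 3     # toy sizes
X = rng.standard_normal((m, n_x))
W1 = rng.standard_normal((num_hidden, n_x)) / np.sqrt(n_x)
b1 = rng.standard_normal((num_hidden, 1))
W2 = rng.standard_normal((num_hidden, num_hidden)) / np.sqrt(num_hidden)
b2 = rng.standard_normal((num_hidden, 1))
W3 = rng.standard_normal((1, num_hidden)) / np.sqrt(num_hidden)
b3 = rng.standard_normal((1, 1))

Z1, A1, Z2, A2, Z3, y_hat = forward(X, W1, b1, W2, b2, W3, b3)
print(Z1.shape, A1.shape, Z2.shape, A2.shape, Z3.shape, y_hat.shape)
# (3, 4) (3, 4) (3, 4) (3, 4) (1, 4) (1, 4)
```

Every hidden-layer value has shape (3, m), and the prediction has shape (1, m): one probability per example.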
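Calculate the cost function (binary cross-entropy, averaged over the m training examples):

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)} + \left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$$

```python
# np.sum adds up the per-example losses; y_train has shape (1, m), so y_train.shape[1] is m
J = np.sum(- np.multiply(y_train, np.log(y_hat)) - np.multiply(1 - y_train, np.log(1 - y_hat))) / y_train.shape[1]
```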
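As a quick sanity check, the sketch below computes the cost on two hand-picked examples. It also clips y_hat away from exactly 0 and 1, because a saturated sigmoid would otherwise make `np.log` return -inf. The clipping and the `eps` value are assumptions and are not part of the original code.

```python
import numpy as np

def cost(y_hat, y, eps=1e-12):
    """Binary cross-entropy averaged over the m examples (columns)."""
    m = y.shape[1]
    # Clip to avoid log(0) when the sigmoid saturates
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)) / m

y = np.array([[1.0, 0.0]])       # a cat and a dog
y_hat = np.array([[0.9, 0.2]])   # predicted probabilities of "cat"

J = cost(y_hat, y)
print(round(J, 4))  # (-log 0.9 - log 0.8) / 2, prints 0.1643
```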
Once we have the cost J (which measures how far the network's predictions are from the real labels), we compute the derivatives of J with respect to W1, W2, W3 and b1, b2, b3. These derivatives tell us how to adjust W and b.
- Note that the back-propagation formulas have the same form at every layer except the last one. At the output layer, the sigmoid and the cross-entropy cost combine into the simple error y_hat - y.
- At each step we should check the shapes of all factors to make sure that we are on the right track.
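For reference, applying the chain rule to the cost J gives the gradients below. In these equations, $m$ is the number of training examples, $\odot$ is element-wise multiplication, $A_0 = X^{T}$, and each sum runs over the m columns (the examples). At the output layer, the error is $\hat{Y} - Y$. At each hidden layer, the error is sent back through $W^{T}$ and multiplied by the ReLU derivative, which is 1 where $Z > 0$ and 0 elsewhere:

$$e_3 = \hat{Y} - Y, \qquad dW_3 = \frac{1}{m}\, e_3 A_2^{T}, \qquad db_3 = \frac{1}{m}\sum_{i=1}^{m} e_3^{(i)}$$

$$e_l = \left(W_{l+1}^{T} e_{l+1}\right) \odot \mathbf{1}\left[Z_l > 0\right], \qquad dW_l = \frac{1}{m}\, e_l A_{l-1}^{T}, \qquad db_l = \frac{1}{m}\sum_{i=1}^{m} e_l^{(i)}, \qquad l = 2, 1$$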
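Compute the derivatives:

```python
m = X_train.shape[0]  # number of training examples

# Output layer: for sigmoid + cross-entropy, the error simplifies to y_hat - y
e3 = y_hat - y_train                          # (1, m)
dW3 = e3.dot(A2.T) / m                        # (1, 3)
db3 = np.sum(e3, axis=1, keepdims=True) / m   # (1, 1)

# Hidden layers: propagate the error back and multiply by the ReLU derivative (Z > 0)
e2 = W3.T.dot(e3) * (Z2 > 0)                  # (3, m)
dW2 = e2.dot(A1.T) / m                        # (3, 3)
db2 = np.sum(e2, axis=1, keepdims=True) / m   # (3, 1)

e1 = W2.T.dot(e2) * (Z1 > 0)                  # (3, m)
dW1 = e1.dot(X_train) / m                     # (3, 49152)
db1 = np.sum(e1, axis=1, keepdims=True) / m   # (3, 1)
```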
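A common way to verify back-propagation is numerical gradient checking. It nudges one parameter by a tiny amount h, measures the change in J, and compares the result with the analytic derivative. The sketch below applies this check to all six parameters of a tiny random network. The toy sizes, h = 1e-5 and the helper names `forward_backward` and `numerical_grad` are assumptions made for illustration.

```python
import numpy as np

def forward_backward(params, X, y):
    """Hypothetical helper: cost J and its gradients for the 2-hidden-layer network.

    X has shape (m, n_x) and y has shape (1, m).
    """
    W1, b1, W2, b2, W3, b3 = (params[k] for k in ("W1", "b1", "W2", "b2", "W3", "b3"))
    m = X.shape[0]
    # Feed forward
    Z1 = np.dot(W1, X.T) + b1
    A1 = np.maximum(0, Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = np.maximum(0, Z2)
    y_hat = 1 / (1 + np.exp(-(np.dot(W3, A2) + b3)))
    J = np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)) / m
    # Back-propagation with the ReLU derivative (Z > 0), averaged over m examples
    e3 = y_hat - y
    e2 = np.dot(W3.T, e3) * (Z2 > 0)
    e1 = np.dot(W2.T, e2) * (Z1 > 0)
    grads = {
        "W3": np.dot(e3, A2.T) / m, "b3": np.sum(e3, axis=1, keepdims=True) / m,
        "W2": np.dot(e2, A1.T) / m, "b2": np.sum(e2, axis=1, keepdims=True) / m,
        "W1": np.dot(e1, X) / m,    "b1": np.sum(e1, axis=1, keepdims=True) / m,
    }
    return J, grads

def numerical_grad(params, X, y, name, h=1e-5):
    """Central-difference estimate of dJ/d(params[name])."""
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(grad.shape):
        original = params[name][idx]
        params[name][idx] = original + h
        J_plus, _ = forward_backward(params, X, y)
        params[name][idx] = original - h
        J_minus, _ = forward_backward(params, X, y)
        params[name][idx] = original  # restore the parameter
        grad[idx] = (J_plus - J_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
n_x, m, num_hidden = 5, 8, 3          # toy sizes
X = rng.standard_normal((m, n_x))
y = rng.integers(0, 2, size=(1, m)).astype(np.float64)
params = {
    "W1": rng.standard_normal((num_hidden, n_x)) / np.sqrt(n_x),
    "b1": rng.standard_normal((num_hidden, 1)),
    "W2": rng.standard_normal((num_hidden, num_hidden)) / np.sqrt(num_hidden),
    "b2": rng.standard_normal((num_hidden, 1)),
    "W3": rng.standard_normal((1, num_hidden)) / np.sqrt(num_hidden),
    "b3": rng.standard_normal((1, 1)),
}

_, grads = forward_backward(params, X, y)
errors = {}
for name in params:
    num = numerical_grad(params, X, y, name)
    diff = np.linalg.norm(num - grads[name])
    scale = np.linalg.norm(num) + np.linalg.norm(grads[name])
    errors[name] = diff / max(scale, 1e-12)  # relative error (0 if both gradients are zero)
    print(f"{name}: relative error = {errors[name]:.1e}")
```

As a rule of thumb, relative errors that are many orders of magnitude below 1 mean the two gradients agree. A large value usually points to a mistake in one formula, such as a wrong activation derivative or dividing by the wrong number of examples.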
Then we update W and b:

```python
W3 -= lr * dW3
W2 -= lr * dW2
W1 -= lr * dW1
b3 -= lr * db3
b2 -= lr * db2
b1 -= lr * db1
```
lr is the "learning rate". It controls how large a step the parameters take at each iteration. If the learning rate is too large, the updates overshoot and the neural network may never converge (or may even diverge). If it is too small, the model will take a very long time to converge.
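The effect of the learning rate is easy to see on a toy problem. The sketch below is an illustration on the function f(w) = w², not on the dog/cat network. It runs gradient descent, w ← w − lr · 2w, from w = 1 with three different learning rates.

```python
def gradient_descent(lr, steps=50, w=1.0):
    """Minimize the toy function f(w) = w**2 (derivative 2w), starting from w."""
    for _ in range(steps):
        w -= lr * 2 * w      # same update rule as W -= lr * dW
    return w

for lr in (1.1, 0.1, 0.001):
    print(f"lr = {lr}: w after 50 steps = {gradient_descent(lr):.3g}")
```

With lr = 1.1, each step overshoots the minimum at 0 by more than the previous one, so |w| grows. With lr = 0.1, w gets very close to 0. With lr = 0.001, w has barely moved from its starting point after 50 steps.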
Congratulations! We have just finished one round of feed-forward and back-propagation. Our parameters have taken a small step toward the optimal parameters, and that is good. Normally we would run a few thousand iterations to get a model that is good enough to distinguish between dogs and cats.
=> So it's time to put everything we have done into a for loop:

```python
lr = 0.01
m = X_train.shape[0]  # number of training examples

for i in range(3000):
    # Feed forward
    Z1 = np.dot(W1, X_train.T) + b1
    A1 = Relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = Relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    y_hat = Sigmoid(Z3)

    # Cost: binary cross-entropy averaged over the m examples
    J = np.sum(- np.multiply(y_train, np.log(y_hat)) - np.multiply(1 - y_train, np.log(1 - y_hat))) / m

    # Back-propagation
    e3 = y_hat - y_train
    dW3 = e3.dot(A2.T) / m
    db3 = np.sum(e3, axis=1, keepdims=True) / m
    e2 = W3.T.dot(e3) * (Z2 > 0)
    dW2 = e2.dot(A1.T) / m
    db2 = np.sum(e2, axis=1, keepdims=True) / m
    e1 = W2.T.dot(e2) * (Z1 > 0)
    dW1 = e1.dot(X_train) / m
    db1 = np.sum(e1, axis=1, keepdims=True) / m

    # Gradient descent update
    W3 -= lr * dW3
    W2 -= lr * dW2
    W1 -= lr * dW1
    b3 -= lr * db3
    b2 -= lr * db2
    b1 -= lr * db1

    if i % 100 == 0:
        print(f"iteration {i}: cost = {J:.4f}")
```
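After training, the model can be evaluated by running the feed-forward pass on the test images and labelling an image as a cat when y_hat > 0.5. The fraction of correct labels is the accuracy to compare with scikit-learn's Logistic Regression. Below is a minimal sketch. The helper names `predict` and `accuracy`, the 0.5 threshold and the hand-picked toy weights are illustrative assumptions.

```python
import numpy as np

def predict(X, W1, b1, W2, b2, W3, b3):
    """Hypothetical helper: 0/1 labels (0 = dog, 1 = cat) with shape (1, m)."""
    A1 = np.maximum(0, np.dot(W1, X.T) + b1)
    A2 = np.maximum(0, np.dot(W2, A1) + b2)
    y_hat = 1 / (1 + np.exp(-(np.dot(W3, A2) + b3)))
    return (y_hat > 0.5).astype(int)

def accuracy(y_pred, y_true):
    """Fraction of examples whose predicted label equals the real label."""
    return np.mean(y_pred == y_true)

# Hand-picked toy weights: only the first feature x1 matters, and an example
# is labelled "cat" when 10 * max(0, x1) - 1 > 0
W1 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
b1 = np.zeros((3, 1))
W2 = np.eye(3)
b2 = np.zeros((3, 1))
W3 = np.array([[10.0, 0.0, 0.0]])
b3 = np.array([[-1.0]])

X_test = np.array([[2.0, 0.0], [-1.0, 0.0], [0.5, 3.0]])   # 3 examples, 2 features
y_test = np.array([[1, 0, 0]])

y_pred = predict(X_test, W1, b1, W2, b2, W3, b3)
print(y_pred, f"accuracy = {accuracy(y_pred, y_test):.3f}")  # [[1 0 1]] accuracy = 0.667
```

On the real data, this would be called with the trained W and b and with test arrays prepared like X_train and y_train.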
- We should clean up our code by putting these steps into functions (def).
- For now, every parameter has a manually created name (W1, b1, W2, ...). We should store the parameters in a more flexible way, in case we want to increase (or decrease) the number of hidden layers.
- If the training set is large, the training time increases a lot, because every iteration processes all training examples. We could deal with that by using mini-batches. You can read about it here.
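One way to address the last two points is sketched here under our own assumptions: keep the parameters in a dictionary built from a list of layer sizes, and shuffle the training set into mini-batches in each pass over the data. The dictionary layout, the helper names `init_params` and `minibatches`, and the batch size of 64 are illustrative choices, not part of the original project.

```python
import numpy as np

def init_params(layer_dims, rng):
    """Hypothetical helper: W1, b1, ..., WL, bL for layer sizes [n_x, n_h1, ..., 1]."""
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], fan_in)) / np.sqrt(fan_in)
        params[f"b{l}"] = rng.standard_normal((layer_dims[l], 1))
    return params

def minibatches(X, y, batch_size, rng):
    """Hypothetical helper: yield shuffled (X_batch, y_batch) pairs.

    X has shape (m, n_x) and y has shape (1, m).
    """
    m = X.shape[0]
    order = rng.permutation(m)          # new random order of the examples
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[:, idx]

rng = np.random.default_rng(0)
params = init_params([49152, 3, 3, 1], rng)   # same layer sizes as this project
print({name: p.shape for name, p in params.items()})

X = rng.standard_normal((200, 10))            # toy data: 200 examples, 10 features
y = rng.integers(0, 2, size=(1, 200))
sizes = [xb.shape[0] for xb, yb in minibatches(X, y, 64, rng)]
print(sizes)  # [64, 64, 64, 8]
```

Changing the number of hidden layers then only means changing the list passed to `init_params`. Inside the training loop, the feed-forward and back-propagation steps would run on each (X_batch, y_batch) with m replaced by the batch size, so the parameters are updated several times per pass over the data.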