# Implementing a custom solver

Now that we have all the elements, we can write a (very basic) solver in Python.

In [1]:
"""Initialization (see "00 Basic solver usage")."""
import os
import numpy as np
import math

# Silence caffe network loading output. Must be set before importing caffe
os.environ["GLOG_minloglevel"] = '2'
import caffe
CAFFE_ROOT="/caffe"
os.chdir(CAFFE_ROOT) # change the current directory to the caffe root, to help
                     # with the relative paths
USE_GPU = True
if USE_GPU:
    caffe.set_device(0)
    caffe.set_mode_gpu()
else:
    caffe.set_mode_cpu()
# For reproducible results
caffe.set_random_seed(0) # custom modification, remove this line from your code if it doesn't work
np.random.seed(0)

## Loading the data

We are going to load the data from an LMDB database with the python `lmdb` module.

In [2]:
import lmdb

def image_generator(db_path):
    """A generator that yields all the images in the database, normalized."""
    db_handle = lmdb.open(db_path, readonly=True) # We don't need to write in there
    with db_handle.begin() as db:
        cur = db.cursor() # Points to an element in the database
        for _, value in cur: # Iterate over all the images
            # Read the LMDB and transform the protobuf into a numpy array
            datum = caffe.proto.caffe_pb2.Datum()
            datum.ParseFromString(value) # String -> Protobuf
            int_x = caffe.io.datum_to_array(datum) # parse the datum into a nparray
            x = int_x.astype(np.float32) # Convert to float (np.asfarray was removed in NumPy 2.0)
            yield x - 128 # Normalize by removing the mean
            
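def batch_generator(shape, db_path):
    """A generator that yields all the images in the database by batches."""
    gen = image_generator(db_path)
    res = np.zeros(shape) # Result array, reused for every batch
    while True:
        for i in range(shape[0]):
            try:
                res[i] = next(gen) # Set by slices
            except StopIteration:
                # Database exhausted: stop here, dropping any partial last batch.
                # (Letting StopIteration escape raises RuntimeError on Python 3.7+.)
                return
        yield res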
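To see how this kind of batching behaves without an LMDB database at hand, here is a minimal sketch using only NumPy. The names `fake_images` and `batches` are hypothetical stand-ins for `image_generator` and `batch_generator`, not part of Caffe. The sketch catches `StopIteration` explicitly, because on Python 3.7 and later a `StopIteration` that escapes a generator body is turned into a `RuntimeError`.

```python
import numpy as np

def fake_images(count, shape=(1, 28, 28)):
    """Hypothetical stand-in for image_generator: yields `count` constant images."""
    for i in range(count):
        yield np.full(shape, i, dtype=np.float32)

def batches(shape, images):
    """Group images into arrays of `shape`, dropping a trailing partial batch."""
    res = np.zeros(shape, dtype=np.float32)  # buffer reused for every batch
    while True:
        for i in range(shape[0]):
            try:
                res[i] = next(images)
            except StopIteration:
                return  # end the generator cleanly
        yield res

# 10 images in batches of 4: two full batches, the last 2 images are dropped.
batch_firsts = [int(b[0, 0, 0, 0]) for b in batches((4, 1, 28, 28), fake_images(10))]
print(batch_firsts)
```

Because the same buffer is yielded every time, each batch has to be consumed (or copied) before asking for the next one; collecting the batches with `list(...)` would give a list of references to one array.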

## Testing the network

Every so often during training, we will test the network. It is a simple matter of iterating over the whole test database, running it through the network and collecting the results.

In [3]:
def test_network(test_net, db_path_test):
    # Average the accuracy and loss over the number of batches
    accuracy = 0
    loss = 0
    test_batches = 0
    input_shape = test_net.blobs["data"].data.shape
    for test_batch in batch_generator(input_shape, db_path_test):
        test_batches += 1
        # Run the forward step
        test_net.blobs["data"].data[...] = test_batch
        test_net.forward()
        # Collect the outputs
        accuracy += test_net.blobs["accuracy"].data
        loss += test_net.blobs["loss"].data
    return (accuracy / test_batches, loss / test_batches)
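Averaging per-batch accuracies, as done above, matches the accuracy over the whole set only when every sample ends up in a batch of the same size. The following sketch illustrates this with made-up labels and predictions (all names and data here are hypothetical, and no network is involved).

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
# Hypothetical predictions: correct roughly 90% of the time
predictions = np.where(rng.random(1000) < 0.9, labels, (labels + 1) % 10)

def mean_of_batch_accuracies(predictions, labels, batch_size):
    """Average the accuracy of each full batch; a partial last batch is skipped."""
    accs = []
    for start in range(0, len(labels) - batch_size + 1, batch_size):
        sl = slice(start, start + batch_size)
        accs.append(np.mean(predictions[sl] == labels[sl]))
    return float(np.mean(accs))

overall = float(np.mean(predictions == labels))
batched = mean_of_batch_accuracies(predictions, labels, batch_size=100)
truncated = mean_of_batch_accuracies(predictions, labels, batch_size=300)
print(overall, batched, truncated)
```

With a batch size of 100 the two numbers agree up to rounding. With a batch size of 300 only the first 900 samples are scored, so the result can differ slightly from the overall accuracy.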

## Loading the network

Here, we want two networks: the training network, and the testing one. However, they should share the weights of their layers. Fortunately, `pycaffe` provides a `share_with` method to do just that.

In [4]:
net_path = "examples/mnist/lenet_train_test.prototxt"
net = caffe.Net(net_path, caffe.TRAIN)
test_net = caffe.Net(net_path, caffe.TEST) # Testing version
net.share_with(test_net) # Share the weights between the two networks

## Solving the network

Now it is time to do the actual solving. This solver is relatively minimalistic, while still presenting the main features you could expect from a solver.
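Before looking at the Caffe version, the forward / backward / update structure can be sketched on a toy least-squares problem with NumPy only. This is an illustration under simplifying assumptions (full-batch gradient descent, a hand-written gradient, a larger base learning rate than the one used below); `inv_learning_rate` is a hypothetical helper implementing the "inv" policy formula, `base_lr * (1 + gamma * iter) ^ (-power)`.

```python
import math
import numpy as np

def inv_learning_rate(base_lr, gamma, power, iteration):
    """The "inv" policy: base_lr * (1 + gamma * iteration) ** (-power)."""
    return base_lr * math.pow(1.0 + gamma * iteration, -power)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
x = rng.normal(size=(256, 2))
y = x @ true_w                      # noiseless targets

w = np.zeros(2)                     # the "weights" we train
losses = []
for iteration in range(1, 501):
    # Forward step: compute the loss
    residual = x @ w - y
    losses.append(0.5 * np.mean(residual ** 2))
    # Backward step: gradient of the loss with respect to w
    grad = x.T @ residual / len(x)
    # Update step: plain gradient descent with the decayed learning rate
    lr = inv_learning_rate(0.1, 1e-4, 0.75, iteration)
    w -= lr * grad

print(losses[0], losses[-1], w)
```

The loss decreases over the iterations and `w` approaches `true_w`. The Caffe loop follows the same three steps, with the network computing the forward and backward passes.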

In [5]:
num_epochs = 2 # How many times we are going to run through the database
iter_num = 0 # Current iteration number

# Training and testing examples
db_path = "examples/mnist/mnist_train_lmdb"
db_path_test = "examples/mnist/mnist_test_lmdb"

# Learning rate. We are using the lr_policy "inv" here, with no momentum
base_lr = 0.01
# Parameters with which to update the learning rate
gamma = 1e-4
power = 0.75

for epoch in range(num_epochs):
    print("Starting epoch {}".format(epoch))
    # At each epoch, iterate over the whole database
    input_shape = net.blobs["data"].data.shape
    for batch in batch_generator(input_shape, db_path):
        iter_num += 1
        
        # Run the forward step
        net.blobs["data"].data[...] = batch
        net.forward()
        
        # Clear the parameter diffs, then run the backward step.
        # Caffe accumulates parameter gradients across backward passes,
        # so they have to be reset to zero at every iteration.
        for l in net.layers:
            for b in l.blobs:
                b.diff[...] = 0
        net.backward()
        
        # Update the learning rate, with the "inv" lr_policy
        learning_rate = base_lr * math.pow(1 + gamma * iter_num, - power)
        
        # Apply the diffs, with the learning rate
        for l in net.layers:
            for b in l.blobs:
                b.data[...] -= learning_rate * b.diff
        
        # Display the loss every 50 iterations
        if iter_num % 50 == 0:
            print("Iter {}: loss={}".format(iter_num, net.blobs["loss"].data))
            
        # Test the network every 200 iterations
        if iter_num % 200 == 0:
            print("Testing network: accuracy={}, loss={}".format(*test_network(test_net, db_path_test)))

print("Training finished after {} iterations".format(iter_num))
print("Final performance: accuracy={}, loss={}".format(*test_network(test_net, db_path_test)))
# Save the weights
net.save("examples/mnist/lenet_iter_{}.caffemodel".format(iter_num))

Starting epoch 0
Iter 50: loss=1.38813483715
Iter 100: loss=0.689562916756
Iter 150: loss=0.589528501034
Iter 200: loss=0.59515106678
Testing network: accuracy=0.881399998069, loss=0.428480608463
Iter 250: loss=0.606553435326
Iter 300: loss=0.395008981228
Iter 350: loss=0.173302009702
Iter 400: loss=0.336296796799
Testing network: accuracy=0.91219999969, loss=0.309457472917
Iter 450: loss=0.210273712873
Iter 500: loss=0.312768518925
Iter 550: loss=0.262531369925
Iter 600: loss=0.321408331394
Testing network: accuracy=0.922200001478, loss=0.266542272568
Iter 650: loss=0.384231954813
Iter 700: loss=0.22680439055
Iter 750: loss=0.241274178028
Iter 800: loss=0.288684517145
Testing network: accuracy=0.937700000405, loss=0.222113073906
Iter 850: loss=0.224339067936
Iter 900: loss=0.139535531402
Starting epoch 1
Iter 950: loss=0.22338809073
Iter 1000: loss=0.129087716341
Testing network: accuracy=0.94470000267, loss=0.193666659808
Iter 1050: loss=0.261462211609
Iter 1100: loss=0.106695577502


This minimalistic solver shows what can be done from Python. However, a proper solver would need many more features: momentum, other learning rate policies, gradient clipping, batch normalization, and other potential improvements.
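As an idea of what two of these features involve, here is a minimal NumPy sketch of a momentum update combined with gradient clipping by global L2 norm. It is written under assumptions: the function names are hypothetical, the parameters are plain arrays rather than Caffe blobs, and the toy objective is `0.5 * ||p||^2`, whose gradient is `p` itself.

```python
import math
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down if their joint L2 norm exceeds max_norm."""
    total = math.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return grads

def momentum_step(params, grads, velocities, lr, momentum):
    """In-place update: v = momentum * v + lr * grad, then p -= v."""
    for p, g, v in zip(params, grads, velocities):
        v *= momentum
        v += lr * g
        p -= v

params = [np.array([1.0, -2.0])]
velocities = [np.zeros_like(p) for p in params]
for _ in range(500):
    grads = [p.copy() for p in params]          # gradient of 0.5 * ||p||^2
    grads = clip_by_global_norm(grads, max_norm=1.0)
    momentum_step(params, grads, velocities, lr=0.1, momentum=0.9)

print(params[0])
```

Every additional feature means more update logic of this kind to write and maintain.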

One way to do that is to implement everything in Python, but it has its disadvantages:

  - It is slower than C++, even though most of the time will usually be spent on the GPU anyway
  - Re-writing it should not be necessary, since the logic is already there in the C++ code
  - Re-writing it **will** lead to bugs, some of them hard to track down
  
For all these reasons, it is sometimes better to just call the C++ functions directly. Caffe has a relatively limited Python API for now, but it will grow, and it is relatively easy to modify to add the functionality you want. The next tutorial will cover Python API modifications for custom function calls.