In [1]:
import numpy as np
import renom as rm

## ReNom v3.0 for v2.0+ users

For users familiar with the v2.0 framework, this example gives several short side-by-side comparisons of v2.0 and v3.0. The main purpose of v3.0 is to take many of the functions that were previously only loosely related and combine them into a single form. This reduces the overhead of jumping between the different modes, allows for more optimization, and gives better support for multiple devices.
In short, v2.0 worked by gathering the "history" of operations being performed and using this history to determine what calculations to perform. In other words, ReNom depended on the user constantly rebuilding and tearing down the history in order to produce its calculations.

v3.0, on the other hand, introduces the idea of a calculation graph, which allows ReNom to predict what the user wants to do, as long as it is told what the graph should look like at least once. Having control over the graph and being able to observe the inputs and outputs going to and from ReNom is very useful for debugging, so it is still possible to use v3.0 in the same way that v2.0 was used. In addition, v3.0 now allows a different, more lightweight and optimized way of running ReNom.

One of the key goals of the ReNom v3.0 implementation was that previous users of ReNom should not have to adapt much in order to update their code to use the new multi-GPU oriented features. For most of the layers, this means that all that has to change is the name and location of the operations being executed. As an example, a fully-connected layer is no longer constructed using rm.Dense, but should be constructed using rm.graph.DenseGraphElement instead. The new naming convention makes sure that users are aware of the change from v2.0 to v3.0, because they 'opt in' by using the new names.

### Some basic examples

What follows is a side-by-side comparison of v2.0 and v3.0. Very few changes were made to how the forward calculations are performed when constructing the graph, which, as described above, is on purpose.

For a simple Dense network, v2.0 uses the name __Dense__, whereas v3.0 requires the use of __graph.DenseGraphElement__. Both accept NumPy arrays as inputs, and the results of the operations can be printed to the Python interpreter immediately.

In [2]:
arr = np.arange(4).reshape(2,2)
init = rm.utility.initializer.Constant(1)


model_old1 = rm.Dense(3, initializer = init)
model_new1 = rm.graph.DenseGraphElement(3, initializer = init)

ret_old1 = model_old1(arr)
ret_new1 = model_new1(arr)

print('Old result:')
print(ret_old1)
print('New result:')
print(ret_new1)


Old result:
[[ 1.  1.  1.]
 [ 5.  5.  5.]]
New result:
[[ 1.  1.  1.]
 [ 5.  5.  5.]]
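
The printed values can be checked by hand. The following is a minimal NumPy sketch, not ReNom code: it assumes that a dense layer computes `x @ W + b`, that the `Constant(1)` initializer sets every weight to 1, and that the bias starts at zero. The names `weights`, `bias` and `dense_out` are illustrative only.

```python
import numpy as np

# Same input as in the cell above: a 2x2 array holding the values 0..3.
arr = np.arange(4).reshape(2, 2).astype(np.float32)

# Assumption: every weight is 1 (Constant(1) initializer) and the bias is 0.
weights = np.ones((2, 3), dtype=np.float32)
bias = np.zeros((1, 3), dtype=np.float32)

# Assumed dense-layer forward pass: each output is the sum of one input row.
dense_out = arr @ weights + bias
print(dense_out)
```

Under these assumptions each output entry is the sum of the corresponding input row (0 + 1 and 2 + 3), which matches the values printed by both versions.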


One thing to note, however, is that v2.0 returns a Node, which is an instance of NumPy's ndarray class, whereas v3.0 returns a learnable_graph_element, a class that does __not__ inherit from NumPy's ndarray. This means that you should no longer call NumPy functions on the returned value directly; first convert it explicitly into an ndarray using __as_ndarray__.

In [3]:
print('v2.0 return value is NumPy array:',isinstance(ret_old1, np.ndarray))
print('v3.0 return value is NumPy array:',isinstance(ret_new1, np.ndarray))

print()
print('Result from NumPy operation on v2.0 returned value:')
print(np.sum(ret_old1, axis=0))
print('Result from NumPy operation on v3.0 returned value:')
print(np.sum(ret_new1, axis=0))
print('Result from NumPy operation on v3.0 converted value:')
print(np.sum(ret_new1.as_ndarray(), axis=0))

v2.0 return value is NumPy array: True
v3.0 return value is NumPy array: False

Result from NumPy operation on v2.0 returned value:
[ 6.  6.  6.]
Result from NumPy operation on v3.0 returned value:
[[ 1.  1.  1.]
 [ 5.  5.  5.]]
Result from NumPy operation on v3.0 converted value:
[ 6.  6.  6.]
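
The difference between the two return types can be illustrated without ReNom. The sketch below uses two hypothetical stand-in classes, `ArrayNode` and `GraphValue`, which are not part of ReNom: the first subclasses ndarray, as a v2.0 Node does, and the second wraps an array and exposes it through an `as_ndarray` method, mirroring the conversion used above.

```python
import numpy as np


class ArrayNode(np.ndarray):
    """Hypothetical stand-in for a v2.0-style value: a subclass of ndarray."""


class GraphValue:
    """Hypothetical stand-in for a v3.0-style value: wraps an array instead."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=np.float32)

    def as_ndarray(self):
        # Hand back the plain ndarray so NumPy functions can be applied to it.
        return self._data


values = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
node = np.asarray(values, dtype=np.float32).view(ArrayNode)
wrapped = GraphValue(values)

print(isinstance(node, np.ndarray))     # subclass, so NumPy treats it as an array
print(isinstance(wrapped, np.ndarray))  # wrapper, so it is not an array
print(np.sum(node, axis=0))
print(np.sum(wrapped.as_ndarray(), axis=0))
```

A wrapper like this keeps the library free to store the data wherever it likes, at the cost of one explicit conversion step before NumPy is used.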


Like before, we can chain several models together to produce networks capable of training on more complicated data sets than a single layer could handle. This is done in the same way as before: the output of one operation is fed to the input of another.

In [4]:
model_old2 = rm.Dense(5, initializer = init)
model_new2 = rm.graph.DenseGraphElement(5, initializer = init)

ret_old2 = model_old2(ret_old1)
ret_new2 = model_new2(ret_new1)

print('v2.0 result:')
print(ret_old2)
print('v3.0 result:')
print(ret_new2)

v2.0 result:
[[  3.   3.   3.   3.   3.]
 [ 15.  15.  15.  15.  15.]]
v3.0 result:
[[  3.   3.   3.   3.   3.]
 [ 15.  15.  15.  15.  15.]]


One of the major differences in how v2.0 and v3.0 are used comes from the new method for working out how to perform the backward calculations. As explained before, v2.0 uses the history that was built up during the forward pass, following it backwards and picking up the gradients for each _significant_ variable.

These gradients are determined by and stored in a single __Grads__ object, which the user constructs by calling the method __grad__ on a Node. It then follows the history recorded in each Node backwards. In v2.0, the history is not recorded unless we explicitly tell the model to record it by using the model's __train__ context.

In [5]:
with model_old1.train():
    ret_old3 = model_old1(arr)
    ret_old4 = rm.sum(model_old2(ret_old3))
print('Gradient found through old method is:')
ret_old4.grad().get(ret_old3)

Gradient found through old method is:


Reshape([[ 5.,  5.,  5.],
         [ 5.,  5.,  5.]], dtype=float32)
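
The value 5 can be reproduced with the chain rule. This is a minimal NumPy sketch rather than ReNom's implementation; it assumes the second layer holds a 3x5 weight matrix filled with ones and that the loss is the sum of the second layer's output. The names `hidden`, `w2`, `upstream` and `grad_hidden` are illustrative only.

```python
import numpy as np

# Output of the first layer, which is the input to the second layer.
hidden = np.array([[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]], dtype=np.float32)

# Assumption: the second layer's weights are a 3x5 matrix of ones.
w2 = np.ones((3, 5), dtype=np.float32)

# Forward pass: loss = sum(hidden @ w2).
loss = np.sum(hidden @ w2)

# The gradient of a sum with respect to each of its terms is 1.
upstream = np.ones((2, 5), dtype=np.float32)

# Chain rule for a matrix product: dLoss/dhidden = upstream @ w2.T
grad_hidden = upstream @ w2.T
print(grad_hidden)
```

Each hidden value feeds five outputs through a weight of 1, so every entry of the gradient is 5 regardless of the input values.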

In v3.0, we now construct the backward graph _while_ we are constructing the forward calculation graph, meaning that the grad object that we previously used is no longer required. Since we are no longer dealing with a history, but rather a full tree from the first calculated value to the gradient of its input, we instead perform the backward operation going forward from the current point in the graph.

This is done by calling __backward__ on any element returned from a GraphFactory.

In [6]:
ret_new3 = model_new1(arr)
ret_new4 = model_new2(ret_new3)
print('Gradient found through new method is:')
ret_new4.backward().get_gradient(ret_new3.output)

Gradient found through new method is:


[[ 5.  5.  5.]
 [ 5.  5.  5.]]

Lastly, in order to update the networks, we need to include an optimizer in our model. In v2.0, the optimizer is another example of a separate utility. Put simply, the old optimizer accepts a gradient and transforms it according to the chosen method, and the __grad__ object from before uses the result when applying updates to the graph.

First we construct our optimizer object, and then we feed it to the network.

In [7]:
opt_old = rm.Sgd()
ret_old4.grad().update(opt_old)
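
As a rough picture of what such an update does, the sketch below applies one plain gradient-descent step in NumPy. It is an illustration only: `sgd_step` is a hypothetical helper, the learning rate of 0.1 is an assumed value, and momentum or other options an optimizer may support are left out.

```python
import numpy as np


def sgd_step(param, grad, lr=0.1):
    """Return the parameter after one plain SGD step (hypothetical helper)."""
    return param - lr * grad


# Assumed state: second-layer weights of ones and its input from the first layer.
w2 = np.ones((3, 5), dtype=np.float32)
hidden = np.array([[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]], dtype=np.float32)

# For loss = sum(hidden @ w2), the weight gradient is hidden.T @ ones.
grad_w2 = hidden.T @ np.ones((2, 5), dtype=np.float32)

w2_updated = sgd_step(w2, grad_w2)
print(grad_w2[0, 0], w2_updated[0, 0])
```

With these assumed values every weight gradient is 1 + 5 = 6, so each weight moves from 1 to 1 - 0.1 * 6 = 0.4.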

In v3.0, the optimizer is no longer a separate process but is integrated into the calculation graph itself. This change is subtle: the way the user interacts with the update functionality has not changed, but the integration provides more control on the back end. The new graph is updated as follows:

In [8]:
opt_new = rm.graph.sgd_update()
ret_new4.backward().update(opt_new)

### Feeding data

So far, we've seen how to perform the basic operations that we are used to. It is still possible to feed data to the model in the same way as before, using ReNom's __NdarrayDistributor__, but later we show examples of how to use the new __DistributorElement__.

The current method for feeding data to ReNom is to use one of the classes in the __Distributor__ module and iterate over the input values by calling its __batch__ method.

In [10]:
x = np.arange(10).reshape(-1,1)
y = x.copy() * 2

distributor_old = rm.NdarrayDistributor(x,y)
for batch, label in distributor_old.batch(batch_size = 1, shuffle = True):
    print(batch, label)

[[0]] [[0]]
[[1]] [[2]]
[[2]] [[4]]
[[3]] [[6]]
[[4]] [[8]]
[[5]] [[10]]
[[6]] [[12]]
[[7]] [[14]]
[[8]] [[16]]
[[9]] [[18]]
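
To make the batching idea concrete, here is a small NumPy sketch of a mini-batch iterator. It is not the distributor's actual implementation: `iter_batches` and its `seed` parameter are hypothetical, and the sketch only shows the general pattern of optionally shuffling an index order and slicing data and labels with the same indices.

```python
import numpy as np


def iter_batches(x, y, batch_size=1, shuffle=True, seed=None):
    """Yield (data, label) mini-batches; a hypothetical helper, not ReNom code."""
    order = np.arange(len(x))
    if shuffle:
        # Shuffle the indices, not the arrays, so data and labels stay paired.
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


x = np.arange(10).reshape(-1, 1)
y = x.copy() * 2

for batch, label in iter_batches(x, y, batch_size=4, shuffle=False):
    print(batch.ravel(), label.ravel())
```

Because the same index array selects from both `x` and `y`, each label stays attached to its sample even when shuffling is enabled; the last batch is simply shorter when the data size is not a multiple of the batch size.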
