In [1]:
import numpy as np
import renom as rm

## ReNom v3.0 for v2.0+ users

For users familiar with the v2.0 framework, this example gives several short side-by-side comparisons of v2.0 and v3.0. The main purpose of v3.0 is to take many functions that were previously only loosely related and combine them into a single form. This reduces the overhead of jumping between different modes, allows for more optimization, and provides better support for multiple devices.
In short, v2.0 worked by gathering the "history" of the operations being performed and using this history to determine which calculations to perform. In other words, ReNom depended on the user constantly rebuilding and tearing down the history in order to produce its calculations.

v3.0, on the other hand, introduces the idea of a calculation graph, which allows ReNom to predict what the user wants to do, as long as it has been told what the graph should look like at least once. Having control over the graph and being able to observe the inputs and outputs flowing to and from ReNom is very useful for debugging, so v3.0 can still be used in the same way as v2.0. In addition, it offers a different, more light-weight and optimized way of running ReNom.

One of the key goals of the ReNom v3.0 implementation was that previous users should not have to adapt much in order to update their code to use the new multi-GPU oriented features. For most layers, all that has to change is the name and location of the operations being executed. For example, a fully-connected layer is no longer constructed using rm.Dense, but using rm.graph.DenseGraphElement instead. The new naming convention makes sure that users are aware of the change from v2.0 to v3.0 by 'opting in' through the new names.

### Some basic examples

What follows is a side-by-side comparison of v2.0 and v3.0. When constructing the graph, very little changes in how the forward calculations are performed, which, as described above, is intentional.

For a simple Dense network, v2.0 uses the name __Dense__, whereas v3.0 requires the use of __graph.DenseGraphElement__. Both accept NumPy arrays as inputs, and the results of the operations can be printed to the Python interpreter immediately.

In [2]:
arr = np.arange(4).reshape(2,2)
init = rm.utility.initializer.Constant(1)


model_old1 = rm.Dense(3, initializer = init)
model_new1 = rm.graph.DenseGraphElement(3, initializer = init)

ret_old1 = model_old1(arr)
ret_new1 = model_new1(arr)

print('Old result:')
print(ret_old1)
print('New result:')
print(ret_new1)


Old result:
[[ 1.  1.  1.]
 [ 5.  5.  5.]]
New result:
[[ 1.  1.  1.]
 [ 5.  5.  5.]]
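To see where these numbers come from, the same forward pass can be written in plain NumPy. This sketch rests on two assumptions not stated above: that `Constant(1)` fills the weight matrix with ones, and that the bias starts at zero. The names `W`, `b` and `out` are illustrative only.

```python
import numpy as np

# Same input as In [2]: [[0, 1], [2, 3]]
arr = np.arange(4).reshape(2, 2)

# Assumption: Constant(1) gives a (2, 3) weight matrix of ones and the bias is zero.
W = np.ones((2, 3))
b = np.zeros(3)

# A fully-connected layer computes x @ W + b for every row of the batch:
# row [0, 1] sums to 1 in each column, row [2, 3] sums to 5.
out = arr @ W + b
print(out)
```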


Note, however, that v2.0 returns a Node, which is an instance of NumPy's ndarray class, whereas v3.0 returns a learnable_graph_element, a class that does __not__ inherit from ndarray. This means that you should not call NumPy functions directly on the returned value without first explicitly converting it to an ndarray using __as_ndarray__. As the example below shows, calling np.sum on the unconverted value does not raise an error, but it does not return the expected column sums either.

In [3]:
print('v2.0 return value is NumPy array:',isinstance(ret_old1, np.ndarray))
print('v3.0 return value is NumPy array:',isinstance(ret_new1, np.ndarray))

print()
print('Result from NumPy operation on v2.0 returned value:')
print(np.sum(ret_old1, axis=0))
print('Result from NumPy operation on v3.0 returned value:')
print(np.sum(ret_new1, axis=0))
print('Result from NumPy operation on v3.0 converted value:')
print(np.sum(ret_new1.as_ndarray(), axis=0))

v2.0 return value is NumPy array: True
v3.0 return value is NumPy array: False

Result from NumPy operation on v2.0 returned value:
[ 6.  6.  6.]
Result from NumPy operation on v3.0 returned value:
[[ 1.  1.  1.]
 [ 5.  5.  5.]]
Result from NumPy operation on v3.0 converted value:
[ 6.  6.  6.]


As before, we can chain several layers together to build networks capable of training on more complicated data sets than a single layer could handle. This is done in the same way as before, by feeding the output of one operation to the input of another.

In [4]:
model_old2 = rm.Dense(5, initializer = init)
model_new2 = rm.graph.DenseGraphElement(5, initializer = init)

ret_old2 = model_old2(ret_old1)
ret_new2 = model_new2(ret_new1)

print('v2.0 result:')
print(ret_old2)
print('v3.0 result:')
print(ret_new2)

v2.0 result:
[[  3.   3.   3.   3.   3.]
 [ 15.  15.  15.  15.  15.]]
v3.0 result:
[[  3.   3.   3.   3.   3.]
 [ 15.  15.  15.  15.  15.]]


One of the major differences in how v2.0 and v3.0 are used comes from the new way of determining how to perform the backward calculations. As explained before, v2.0 takes the history recorded during the forward pass and follows it backwards, picking up the gradient for each _significant_ variable.

These gradients are computed and stored in a single __Grads__ object, which the user obtains by calling the __grad__ method on a Node. This method follows the history recorded in each Node backwards. In v2.0, the history is not recorded unless we explicitly tell the model to record it by using the model's __train__ context.

In [5]:
with model_old1.train():
    ret_old3 = model_old1(arr)
    ret_old4 = rm.sum(model_old2(ret_old3))
print('Gradient found through old method is:')
ret_old4.grad().get(ret_old3)

Gradient found through old method is:


Reshape([[ 5.,  5.,  5.],
         [ 5.,  5.,  5.]], dtype=float32)
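The gradient of 5 can be checked by hand. The sketch below redoes the two-layer calculation in plain NumPy under the same assumptions as before (all weights equal to one, zero biases); the variable names are illustrative. Because the loss is a plain sum, the gradient flowing into the second layer's output is a matrix of ones, and multiplying it by the transposed weights gives 5 for every entry of ret_old3.

```python
import numpy as np

arr = np.arange(4).reshape(2, 2).astype(float)
W1 = np.ones((2, 3))  # first layer, Constant(1); zero bias assumed
W2 = np.ones((3, 5))  # second layer, Constant(1); zero bias assumed

h = arr @ W1           # plays the role of ret_old3
out = h @ W2           # second layer output
loss = np.sum(out)     # plays the role of rm.sum(model_old2(ret_old3))

# d(loss)/d(out) is all ones because the loss is a plain sum.
grad_out = np.ones_like(out)
# Back through out = h @ W2: d(loss)/dh = grad_out @ W2.T
grad_h = grad_out @ W2.T
print(grad_h)
```

Each entry of `grad_h` sums five ones from a row of `W2`, which is why every value is 5.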

In v3.0, the backward graph is constructed _while_ the forward calculation graph is being constructed, so the Grads object we used previously is no longer required. Since we are no longer dealing with a history, but with a full tree from the first calculated value to the gradient of its input, we instead start the backward operation from the current point in the graph.

This is done by calling __backward__ on any element returned from a GraphFactory.
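The idea of building the backward path during the forward pass can be illustrated with a toy scalar autodiff class. The `Value` class below is hypothetical and is not how ReNom is implemented: each operation stores its own backward rule at the moment it is created, so `backward()` only has to order the nodes and replay those rules in reverse.

```python
class Value:
    """Hypothetical scalar graph node, for illustration only (not ReNom)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_rule = None  # filled in while the forward graph is built

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def rule():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_rule = rule
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def rule():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_rule = rule
        return out

    def backward(self):
        # Order nodes so that inputs come before the nodes that use them.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Replay the stored backward rules from the output towards the inputs.
        for node in reversed(order):
            if node._backward_rule is not None:
                node._backward_rule()
        return self


x = Value(2.0)
y = Value(3.0)
z = x * y + x   # the backward rules are recorded here, during the forward pass
z.backward()
print(z.data, x.grad, y.grad)
```

For z = x·y + x, the gradients are y + 1 for x and x for y, so with x = 2 and y = 3 the sketch yields 4 and 2.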

In [6]:
ret_new3 = model_new1(arr)
ret_new4 = model_new2(ret_new3)
print('Gradient found through new method is:')
ret_new4.backward().get_gradient(ret_new3.output)

Gradient found through new method is:


[[ 5.  5.  5.]
 [ 5.  5.  5.]]

Lastly, in order to update the networks, we need to include an optimizer in our model. In v2.0, the optimizer is another separate utility. Put simply, the old optimizer accepts a gradient and transforms it according to the chosen method, and the __grad__ object from before uses the result when applying updates to the graph.

First we construct the optimizer object, and then we pass it to the __update__ method of the gradients.

In [7]:
opt_old = rm.Sgd()
ret_old4.grad().update(opt_old)
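The split of responsibilities described above, where the optimizer only transforms a gradient and the gradient object applies the result, can be sketched as follows. `PlainSgd`, `apply_update`, the learning rate of 0.1 and the gradient values are all hypothetical and chosen for illustration; they are not ReNom's implementation or defaults.

```python
import numpy as np


class PlainSgd:
    """Hypothetical optimizer: turns a raw gradient into an update step."""

    def __init__(self, lr=0.1):
        self.lr = lr

    def __call__(self, grad):
        return self.lr * grad


def apply_update(params, grads, optimizer):
    """Hypothetical stand-in for the gradient object's update step."""
    for name, grad in grads.items():
        params[name] -= optimizer(grad)


params = {"w": np.ones((2, 3))}
grads = {"w": np.full((2, 3), 5.0)}   # made-up gradient for the example
apply_update(params, grads, PlainSgd(lr=0.1))
print(params["w"])  # every entry is 1 - 0.1 * 5 = 0.5
```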

In v3.0, the optimizer is no longer a separate process but is integrated into the calculation graph itself. This change is subtle: the way the user interacts with the update functionality has not changed, but the integration provides more control on the back end. The new graph is updated as follows:

In [8]:
opt_new = rm.graph.sgd_update()
ret_new4.backward().update(opt_new)

### The Tree

Before we delve deeper into how training works in the new framework, it is prudent to introduce some of the mechanics happening in the background of v3.0. In v2.0, the computation is done using an ad-hoc approach, taking data _as it is_ and transforming it into something else. The previous version of ReNom tried to avoid storing any unnecessary information and to provide a NumPy-like interface for the calculations.

With v3.0, we introduce the concept of the calculation graph, which gives ReNom the ability to predict and optimize the calculations that the user wishes to perform. This does, however, come with some caveats to be careful of, since we now maintain and modify a state.

To print a tree, call the __print_tree__ method on a graph element. It displays the operations that have been gathered so far and how they relate to each other.

In [9]:
a = rm.graph.StaticVariable(np.array([2]))
b = rm.graph.StaticVariable(np.array([3]))


print('a/b')
print(a, b)
print()
print('Tree output for a:')
a.print_tree()
print()
print('Tree output for b:')
b.print_tree()

a/b
[2] [3]

Tree output for a:
I am a Static Variable at depth 0 with tags: ['Forward']

Tree output for b:
I am a Static Variable at depth 0 with tags: ['Forward']


Here, the tree already contains some information, even though nothing has been connected yet. We get the name of the operation (Static Variable), the depth (0), and a single tag (Forward). This means that the operation is a Static Variable (i.e. it does nothing but provide a constant value), that its _depth_ is 0, meaning it sits at the bottom of the tree and requires no prior operations, and that it is tagged _Forward_, indicating that it is part of the forward calculation.

Naturally, since we haven't connected anything yet, nothing special happens, but if we perform an operation of some sort, the tree will change.

In [10]:
c = a + b
print('a + b = c')
print(a, '+', b, '=', c)

print()
print('Tree output for c:')
c.print_tree()

a + b = c
[2] + [3] = [5]

Tree output for c:
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']


The output for c is more complicated than it was for a and b, and it shows how the tree _builds up_ over time. In the tree output above, there are two static variables and one Add operation. The two static variables are a and b from earlier, still at depth 0.

The new element in our tree is the Add operation at depth 1. The Add operation adds together the two previous values, and because it requires these prior elements to perform the operation, it is placed at depth 1. This allows the tree to figure out that the static variables ___must___ be evaluated before it can perform the Add operation.
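The depth rule can be sketched with a hypothetical `Node` class (not ReNom's implementation): leaves sit at depth 0, and every other node sits one level above its deepest input, which guarantees that all inputs are evaluated first. Applied to the example, it gives the depths printed for c, and also for the d computed in the next cell.

```python
class Node:
    """Hypothetical graph node, for illustration only (not ReNom)."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

    @property
    def depth(self):
        # Leaves such as static variables have no prerequisites: depth 0.
        if not self.inputs:
            return 0
        # Otherwise sit one level above the deepest input.
        return 1 + max(node.depth for node in self.inputs)


a = Node("Static Variable")
b = Node("Static Variable")
c = Node("Add", [a, b])          # c = a + b
offset = Node("Static Variable")  # the NumPy array 1.5
d = Node("Add", [c, offset])     # d = c + 1.5
print(a.depth, c.depth, offset.depth, d.depth)
```

Because `offset` has no inputs it lands at depth 0, even though placing it at depth 1 would also be valid.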

Continuing in this fashion, we can make the tree grow even larger.

In [11]:
d = c + np.array([1.5])

print('c + 1.5 = d')
print(d)

print()
print('Tree output for d:')
d.print_tree()

c + 1.5 = d
[ 6.5]

Tree output for d:
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 2 with tags: ['Forward']


Once again, the tree has grown larger, and compared with the previous outputs we notice two things. First, there is a new Add operation at depth 2, indicating that c ___must___ be evaluated before we can evaluate d. Second, the order in which the tree lists its elements does not matter much in itself; what matters is the relation between depths. As such, although the NumPy array that we added to c could be placed at either depth 0 or 1, it is simply placed at 0.

Currently, the tree does not contain any operations tagged Backward, because we have not yet constructed the backward graph. We can construct it automatically by calling the __backward__ method from earlier and then printing the tree.

In [12]:
d.backward()

print('Tree output for d with backwards:')
d.print_tree()

Tree output for d with backwards:
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 2 with tags: ['Forward']
I am a Sum (F) at depth 3 with tags: ['Forward']
I am a Constant (B) at depth 4 with tags: ['Backward']
I am a Add (B) at depth 5 with tags: ['Backward']
I am a Add (B) at depth 6 with tags: ['Backward']
I am a Add (B) at depth 6 with tags: ['Backward']
I am a Add (B) at depth 5 with tags: ['Backward']


We can also build diamond-shaped graphs, among other more complicated structures. This does, however, mean that if the user is not careful, the tree might contain unwanted components. Below is an example that uses three variables to create a different sort of calculation, (v + w) + (v + o) = k.

In [13]:
v = rm.graph.StaticVariable(np.array([1.5]))
w = rm.graph.StaticVariable(np.array([2.5]))
o = rm.graph.StaticVariable(np.array([3.5]))

i = v + w
j = v + o
k = i + j

print('Tree output for diamond shaped k:')
k.print_tree()

Tree output for diamond shaped k:
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']
I am a Add (F) at depth 2 with tags: ['Forward']


In the previous calculation, v appears twice but is only represented once in the tree (since it still only needs to be evaluated once).
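One common way to get this behavior is a depth-first, post-order traversal that remembers which nodes it has already visited. The sketch below assumes a hypothetical dictionary representation of the graph and is not ReNom's implementation; for k it lists v only once, and its ordering happens to match the print_tree output above (three static variables and three Add operations).

```python
# Hypothetical graph representation: each node name maps to its input names.
graph = {
    "v": [], "w": [], "o": [],
    "i": ["v", "w"],   # i = v + w
    "j": ["v", "o"],   # j = v + o
    "k": ["i", "j"],   # k = i + j
}


def evaluation_order(graph, root):
    """Depth-first post-order: each input is listed once, before its users."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return  # shared inputs such as v are scheduled only once
        seen.add(name)
        for parent in graph[name]:
            visit(parent)
        order.append(name)

    visit(root)
    return order


order = evaluation_order(graph, "k")
print(order)
```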

However, if we try to use this v for another calculation, v will _still_ contain its current tree unless we explicitly disconnect it. In the example below, v is connected to yet another calculation, producing a tree that is larger than one might expect.

In [14]:
w = v + np.array([-2])

print('Tree output for new w:')
w.print_tree()

Tree output for new w:
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']
I am a Add (F) at depth 2 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']


The reason for this is that any elements coming _after_ v are still part of the tree that v is currently holding. To remove this prior tree, output elements provide a __disconnect__ method, which cuts off the current tree.

In [15]:
v.disconnect()
w = v + np.array([-2])

print('Tree output for new new w:')
w.print_tree()

Tree output for new new w:
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Static Variable at depth 0 with tags: ['Forward']
I am a Add (F) at depth 1 with tags: ['Forward']


### Feeding data

So far, we've seen how to perform the basic operations we are used to. Data can still be fed to the model the same way as before, using ReNom's __NdarrayDistributor__; later, we show how to use the new __DistributorElement__.

The current method for feeding data to ReNom is to use one of the classes in the __Distributor__ module and iterate over the values returned by its __batch__ method.

In [16]:
x = np.arange(10).reshape(-1,1)
y = x.copy() * 2

distributor_old = rm.NdarrayDistributor(x,y)
for batch, label in distributor_old.batch(batch_size = 1, shuffle = False):
    print(batch, label)

[[0]] [[0]]
[[1]] [[2]]
[[2]] [[4]]
[[3]] [[6]]
[[4]] [[8]]
[[5]] [[10]]
[[6]] [[12]]
[[7]] [[14]]
[[8]] [[16]]
[[9]] [[18]]


This version of the distributor is fairly simple: it returns a series of data and associated target values as NumPy arrays. These arrays can then be used as input for a model, where batch is the input data and label is the target we train against. Put another way, the distributor receives an array of values and simply hands the user several smaller arrays taken from the original. The user is then responsible for feeding these values to the model.

The outputs of the distributor are iterated over using the iterator returned by the __batch__ method. The return signature of this iterator is (data, target).
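The behavior described above can be sketched as a plain Python generator. The function name `batches` and its `seed` parameter are hypothetical, and how ReNom handles a final batch smaller than `batch_size` is not covered here; this sketch simply yields the shorter batch.

```python
import numpy as np


def batches(data, target, batch_size=1, shuffle=False, seed=None):
    """Hypothetical generator mimicking the (data, target) signature of batch()."""
    indices = np.arange(len(data))
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, len(data), batch_size):
        idx = indices[start:start + batch_size]
        # Slice the same rows from data and target so pairs stay aligned.
        yield data[idx], target[idx]


x = np.arange(10).reshape(-1, 1)
y = x * 2
for batch, label in batches(x, y, batch_size=4):
    print(batch.ravel(), label.ravel())
```

With ten samples and `batch_size=4`, this yields batches of 4, 4 and 2 rows.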

In [17]:
init2 = rm.utility.initializer.Constant(3)
simple_model_old = rm.Dense(2, initializer = init2)


print('Input / Output')
for batch, label in distributor_old.batch(batch_size=1, shuffle = False):
    print(batch, simple_model_old(batch))

Input / Output
[[0]] [[ 0.  0.]]
[[1]] [[ 3.  3.]]
[[2]] [[ 6.  6.]]
[[3]] [[ 9.  9.]]
[[4]] [[ 12.  12.]]
[[5]] [[ 15.  15.]]
[[6]] [[ 18.  18.]]
[[7]] [[ 21.  21.]]
[[8]] [[ 24.  24.]]
[[9]] [[ 27.  27.]]


We can feed input to the models constructed using the v3.0 interface in the same fashion.

In [18]:
simple_model_new = rm.graph.DenseGraphElement(2, initializer = init2)


print('Input / Output')
for batch, label in distributor_old.batch(batch_size=1, shuffle = False):
    print(batch, simple_model_new(batch))

Input / Output
[[0]] [[ 0.  0.]]
[[1]] [[ 3.  3.]]
[[2]] [[ 6.  6.]]
[[3]] [[ 9.  9.]]
[[4]] [[ 12.  12.]]
[[5]] [[ 15.  15.]]
[[6]] [[ 18.  18.]]
[[7]] [[ 21.  21.]]
[[8]] [[ 24.  24.]]
[[9]] [[ 27.  27.]]


One issue with this, however, is that the distributor remains a process separate from the calculations. The distributor has received several optimizations, so it is still relatively light-weight and can be used for simpler networks and/or testing purposes. For optimal speed and space efficiency, however, v3.0 offers a new type of distributor that works differently from simply outputting data.

This distributor can be somewhat harder to understand, but using it should not be overwhelmingly different. The new distributor no longer hands data directly to the user; instead, it produces elements that can be inserted into the v3.0 calculation graph. The graph is then responsible for preparing and feeding values, which gives it more control over what is a relatively resource-intensive task.

The user is still responsible for linking up these graph elements, which are obtained with the __getOutputGraphs__ method. The return signature is the same as that of the previous __batch__ method, (data, target), but the returned values are no longer plain data and cannot be used as input for v2.0 models. As the example below shows, the calling method differs from the previous version.

In [19]:
distributor_new = rm.graph.DistributorElement(x, y, batch_size = 1, shuffle = False)
batch, label = distributor_new.getOutputGraphs()
try:
    while(True):
        print(batch, label)
except StopIteration:
    pass

[[0]] [[0]]
[[1]] [[2]]
[[2]] [[4]]
[[3]] [[6]]
[[4]] [[8]]
[[5]] [[10]]
[[6]] [[12]]
[[7]] [[14]]
[[8]] [[16]]
[[9]] [[18]]
[] []


Whereas the previous distributor was an iterator that produced the input values, which the user then fed to any models using this data, the new distributor does not require the user to feed each batch to the model by hand: the model is connected to the distributor's output elements once.

The _batch_ and _label_ outputs given by __getOutputGraphs__, as seen above, are representations of the data that will flow into the model, rather than the data itself. These input graphs continue to produce output as long as there is input data left. Once they run out of data, a StopIteration exception is raised to let the user know that no data is left.

A feature of the new v3.0 execution model is that printing any graph element forces an evaluation of that element. In the while loop above, we repeatedly print the batch and label values, forcing the input elements to evaluate, which produces the ___next___ element of each input. It is not necessary to understand this in detail, however. The following example shows how to use the new distributor.
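The pull-based pattern can be illustrated without ReNom. In the sketch below, the hypothetical `PullSource` class returns the next batch each time it is evaluated and raises StopIteration once the data is exhausted, while a fixed weight matrix of threes stands in for `simple_model_new`. Unlike the ReNom output of In [19], which prints an empty array before stopping, this sketch raises as soon as no data is left.

```python
import numpy as np


class PullSource:
    """Hypothetical pull-based input: each evaluation yields the next batch."""

    def __init__(self, data, batch_size=1):
        self.data = data
        self.batch_size = batch_size
        self.position = 0

    def evaluate(self):
        if self.position >= len(self.data):
            raise StopIteration  # signals that this epoch's data is exhausted
        batch = self.data[self.position:self.position + self.batch_size]
        self.position += self.batch_size
        return batch

    def reset(self):
        self.position = 0


source = PullSource(np.arange(10).reshape(-1, 1), batch_size=1)
weights = np.full((1, 2), 3.0)  # stands in for a Dense layer with Constant(3)
outputs = []

for epoch in range(2):
    try:
        while True:
            # Evaluating the "model" pulls the next batch from the source.
            result = source.evaluate() @ weights
            outputs.append(result)
            print(result)
    except StopIteration:
        source.reset()  # rewind so the next epoch sees the same data
```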

In [20]:
distributor_new.reset()
n = simple_model_new(batch)
epochs = 2

# Each iteration of this outer for-loop represents a single epoch.
for e in range(epochs):
    # The try/except block is important: it catches the StopIteration raised once the data runs out.
    try:
        while True:
            # Each print forces the graph to re-evaluate the full tree, 
            # forcing the data dispatcher to produce a new value.
            print(n)
    # Once the dispatcher runs out of data, it throws an exception, which should be caught by the user.
    except StopIteration:
        # If necessary to evaluate on the same data again, the user can reset the distributor.
        distributor_new.reset()

[[ 0.  0.]]
[[ 3.  3.]]
[[ 6.  6.]]
[[ 9.  9.]]
[[ 12.  12.]]
[[ 15.  15.]]
[[ 18.  18.]]
[[ 21.  21.]]
[[ 24.  24.]]
[[ 27.  27.]]
[]
[[ 0.  0.]]
[[ 3.  3.]]
[[ 6.  6.]]
[[ 9.  9.]]
[[ 12.  12.]]
[[ 15.  15.]]
[[ 18.  18.]]
[[ 21.  21.]]
[[ 24.  24.]]
[[ 27.  27.]]
[]


## The Executor

Everything introduced so far serves essentially one purpose: removing any steps that require the user. In the code block above, a few elements still need to be connected by hand: the user is responsible for resetting the distributor, launching each epoch, and otherwise displaying outputs from the model.

The executor was created to automate and optimize all of these steps, not returning control to the user until the network is fully trained.

The basic executor can be obtained using the __getInferenceExecutor__ method, which returns an object with an __execute__ method that runs the graph. To output a loss value, we use the special ConstantLossElement, which simply produces the sum of the outputs as its loss and does not require a target value.

In [21]:
loss = rm.graph.ConstantLossElement()
n = loss(n)
exe = n.getInferenceExecutor()

exe.execute(epochs = 3)

[ 270.]
[ 270.]
[ 270.]
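The value 270 per epoch is consistent with the earlier outputs: each sample k produces [[3k, 3k]], whose sum is 6k, and 6 × (0 + 1 + ... + 9) = 6 × 45 = 270, which suggests that the loss is accumulated over all batches of an epoch. A toy executor built on that reading might look like the following; `ToyExecutor` and its parameters are hypothetical and only illustrate the idea of moving the epoch loop, the reset and the loss accumulation out of the user's hands.

```python
import numpy as np


class ToyExecutor:
    """Hypothetical executor: owns the epoch loop and loss accumulation."""

    def __init__(self, data, batch_size, forward, loss):
        self.data = data
        self.batch_size = batch_size
        self.forward = forward
        self.loss = loss

    def execute(self, epochs=1):
        history = []
        for _ in range(epochs):
            total = 0.0
            # Starting the slice at 0 every epoch plays the role of reset().
            for start in range(0, len(self.data), self.batch_size):
                batch = self.data[start:start + self.batch_size]
                total += self.loss(self.forward(batch))
            history.append(total)
            print([total])
        return history


weights = np.full((1, 2), 3.0)  # stands in for a Dense layer with Constant(3)
exe = ToyExecutor(np.arange(10).reshape(-1, 1), batch_size=1,
                  forward=lambda b: b @ weights,
                  loss=lambda out: float(np.sum(out)))  # sum of outputs as loss
losses = exe.execute(epochs=3)
```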
