A Minimalist Computation Graph Framework
https://github.com/davidrosenberg/mlcourse/blob/gh-pages/Notebooks/computation-graph/computation-graph-framework.ipynb

In [1]:
import numpy as np
from IPython.display import Image

We start with a completely trivial example: a computation graph with a single node representing a scalar value $x$. To represent $x$, we'll use a "value node", so called because it takes no input and its output is simply the value of $x$. Later, in machine learning contexts, we'll use nodes of this type to represent inputs, outcomes, and parameters.

We will interpret this single-node computation graph as representing the function $f(x)=x$. Thus the "graph output", i.e., the scalar-valued output of the function our computation graph represents, is also the scalar $x$. We will refer to the node whose output is the graph output as the "graph output node". Our computation graphs are designed to represent only scalar-valued functions, since our goal is to find inputs that minimize or maximize the function, and it's not clear what that means for a vector-valued function.

We will represent value nodes by instances of the class `ValueNode`, defined below:

In [2]:
class ValueNode(object):
    """Computation graph node having no input but simply holding a value"""
    def __init__(self, node_name):
        self.node_name = node_name
        self.out = None
        self.d_out = None
    
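    def forward(self):
        # Reset d_out to zeros with the same shape as out
        self.d_out = np.zeros(self.out.shape)
        return self.out
    
    def backward(self):
        # A value node has no predecessors, so there is nothing to propagate
        return self.d_out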
    
    def get_predecessors(self):
        return []
    

To give a `ValueNode` a particular output value, we set its `out` attribute directly. The value should always be a `numpy` array. For example, for a scalar value we can set it as follows:

In [3]:
x = ValueNode("x")
x.out = np.array(3)
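
As a quick check of what `np.array(3)` produces, the sketch below uses only standard `numpy` attributes. It shows that a scalar stored this way is a zero-dimensional array whose shape is the empty tuple, so the call `np.zeros(self.out.shape)` in `forward` also yields a scalar-shaped array. The variable names `scalar` and `zeros_like_scalar` are chosen here for illustration only.

```python
import numpy as np

scalar = np.array(3)            # zero-dimensional array holding a scalar
print(scalar.shape)             # the empty tuple: ()
print(scalar.ndim)              # 0 dimensions

# Same call pattern as in ValueNode.forward
zeros_like_scalar = np.zeros(scalar.shape)
print(zeros_like_scalar.shape)  # also the empty tuple: ()
```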

In backpropagation, we compute the partial derivative of the graph output with respect to the outputs of every node in the graph. Every node in the graph contains an instance variable called `out`, which is a `numpy` array representing the output of that node.

Every node also has an instance variable `d_out`, which during backpropagation is set to contain the partial derivatives of the graph output with respect to the entries of `out` for the same node. In general, a variable named `d_X` will be used to represent the derivative of the graph output with respect to `X`. So if we use $J$ to represent the scalar-valued graph output, as we commonly do, then `d_out`, `d_x`, and `d_y` should be understood to represent $\frac{\partial J}{\partial \text{out}}$, $\frac{\partial J}{\partial \text{x}}$, and $\frac{\partial J}{\partial \text{y}}$, respectively.

In any node, `d_out` will always be a `numpy` array of the same shape as `out`.
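
The shape rule can be checked on a non-scalar value. The sketch below repeats the `ValueNode` class from above so that it runs on its own, then gives a node a vector output and confirms that `forward` leaves `d_out` with the same shape. The node name `"w"` and its values are hypothetical, chosen only for illustration.

```python
import numpy as np

class ValueNode(object):
    """Copy of the ValueNode class defined earlier, repeated so this cell runs on its own"""
    def __init__(self, node_name):
        self.node_name = node_name
        self.out = None
        self.d_out = None

    def forward(self):
        self.d_out = np.zeros(self.out.shape)
        return self.out

    def backward(self):
        return self.d_out

    def get_predecessors(self):
        return []

w = ValueNode("w")                 # hypothetical node, for illustration
w.out = np.array([1.0, 2.0, 3.0])  # a vector-valued output
w.forward()                        # sets d_out to zeros of the same shape as out
print(w.out.shape, w.d_out.shape)  # both shapes are (3,)
```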

Backpropagation always starts by setting `d_out` for the graph output node to the scalar `1`, since it is the partial derivative of the graph output with respect to itself.

Let's do that for the node in our trivial example.  Note that for a `ValueNode`, the `backward` function just returns `d_out`.

In [4]:
x = ValueNode("x")
x.out = np.array(3) # set the value of x
print(x.forward()) # both the node output and the graph output
x.d_out = np.array(1) # initialize backpropagation
print(x.backward()) # the partial derivative of the graph output w.r.t. the output of x

3
1
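
As a sanity check on the value `1`, one could compare it against a numerical derivative of $f(x)=x$. The sketch below is not part of the framework: the helper `f`, the point `x0`, and the step size `eps` are assumptions introduced here for illustration. It uses a central finite difference.

```python
import numpy as np

def f(x):
    """The function represented by the single-node graph: f(x) = x"""
    return x

x0 = 3.0    # the value assigned to x in the example above
eps = 1e-6  # step size, an arbitrary small value chosen for illustration

# Central finite-difference approximation of df/dx at x0
numeric_derivative = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)
print(np.isclose(numeric_derivative, 1.0))
```

The approximation agrees with the `d_out` value used to initialize backpropagation, up to floating-point error.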
