# Automatic Differentiation (AD)
***
**Name**: __Poorwa Hirve__
***

## Goal

The goal of this assignment is to build your own AD package from scratch in Python, using NumPy for vector operations. At the core of the framework you will build lies a technique called _Operator Overloading_. AD using this technique proceeds in 2 stages:

1. Building a **Computational Graph** of operators and variables using **Tensors** and **Functions**.
2. Evaluating the feed-forward and back-propagation routines using this computational graph.

### 1. Computational Graph



A computational graph is a directed acyclic graph (DAG) in which nodes correspond to operators or variables and edges represent the flow of data between these nodes. Implementing a neural network as a computational graph (as in Torch, Chainer, Theano, etc.) allows us to track the operations executed in mapping a data set from $R^n \to R^m$ with significantly less overhead.

Moreover, once a computational graph has been built, back-propagating gradients to the inputs using the chain rule is quite efficient.
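
As a minimal sketch of what the chain rule does on such a graph, the snippet below hand-codes the forward and backward passes for $z = \sum_i (x_i y_i + x_i)$ in plain NumPy. The function and variable names are illustrative assumptions and are not part of the assignment's API.

```python
import numpy as np

def forward_backward(x, y):
    # Forward pass: evaluate each node of the graph in order.
    p = x * y          # multiply node
    s = p + x          # add node
    z = np.sum(s)      # sum node (scalar output)

    # Backward pass: walk the graph in reverse, applying the chain rule.
    grad_z = 1.0                          # dz/dz
    grad_s = grad_z * np.ones_like(s)     # dz/ds_i = 1 for every element
    grad_p = grad_s                       # ds/dp = 1
    # x feeds both the add node and the multiply node, so its two
    # contributions are accumulated (summed).
    grad_x = grad_s + grad_p * y
    grad_y = grad_p * x
    return z, grad_x, grad_y

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z, grad_x, grad_y = forward_backward(x, y)
print(z, grad_x, grad_y)
```

Note how `x` receives gradient from two paths; this is the reason the operations later in this notebook accumulate into `grad` with `+=` instead of overwriting it.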

<img src="./res/mlp_ann.png" alt="mlp_ann" style="width:500px;"/>

Over the course of this assignment, we realize a node in the computational graph as a `Function` object [1] and implement a data container called `Tensor` [2].

In [1]:
import numpy as np

### 1.1 Functions

Every function/node is composed of the following 3 things:

1. The `parents[]`, which are the inputs to that node. Each input is a Tensor object wrapping a NumPy ndarray.
2. The `forward()`, which returns the result of an operator executed with its `parents[]` as the input.
3. The `backward()`, which **accumulates** the gradient from its children (the output of `forward()`) and back-propagates the accumulated gradients to its parents. The math for reverse mode accumulation will be implemented here.
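
To make these three pieces concrete, here is a minimal sketch of a node that squares its input. `Leaf` is a hypothetical stand-in for the `Tensor` class defined later, and `Square` is not one of the operations required by the assignment; both names are assumptions made for illustration only.

```python
import numpy as np

class Leaf:
    """Hypothetical stand-in for a leaf Tensor: holds a value and a gradient."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
        self.grad = np.zeros_like(self.value)
        self.is_leaf = True

class Square:
    """Illustrative node computing y = x ** 2."""
    def __init__(self):
        self.parents = []

    def forward(self, x):
        self.parents = [x]        # 1. remember the input
        return x.value ** 2       # 2. compute the result

    def backward(self, gradient):
        # 3. accumulate dL/dx = dL/dy * 2x into the parent
        parent = self.parents[0]
        parent.grad += gradient * 2 * parent.value
        if not parent.is_leaf:
            parent.backward(parent.grad)

x = Leaf([1.0, -2.0, 3.0])
node = Square()
y = node.forward(x)
node.backward(np.ones_like(y))
print(y, x.grad)
```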

Below is an implementation of the `Function` class, which we will inherit from to implement the following operations in this assignment.

1. Add (Implementation and Usage provided)
2. Subtract
3. Multiply
4. Divide
5. Sum
6. ReLU Activation
7. Mean
8. Pow
9. Dot Product

In [2]:
class Function(object):
    """
    Represents a node in the computational graph that performs
    a computation.

    During forward mode computation, it takes in 
    1 or more inputs/parents, and returns a result of the
    computation as the output.

    In reverse mode accumulation, it takes in the
    gradients w.r.t. the output of the node, accumulates them
    by calculating the gradients w.r.t. its inputs/parents
    and back-propagates the gradients to the parents.
    """
    
    def __init__(self):
        # List of inputs to the node. Created per instance so that
        # nodes never share one mutable class-level list.
        self.parents = []

    def forward(self, *args):
        """
        Forward mode computation of the 'operation'
        on the inputs/parents to be implemented here.
        """
        pass

    def backward(self, gradient):
        """
        Reverse mode computation of the node implemented here.
        """
        pass

### 1.1.1 Add

In [3]:
class Add(Function):
    """
    Add parent inputs and return the result.
    """
    def forward(self, *args):
        """
        Forward computation of the binary Add operation.

        param:
        
        args (n=2 Tensors): 2 Tensors to be added.
        
        returns:
        
        value (ndarray): Result of "+" operation on
            input args.
        
        """
        # Record the inputs as this node's parents.
        self.parents = list(args)
        # Add the values of the 2 input Tensors.
        value = self.parents[0].value + self.parents[1].value
        
        return value

    def backward(self, gradient):
        """
        Accumulates the gradients for the "Add" operation from its
        children/outputs and passes them on to self.parents.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Add"
        
        returns:
        
        None
        
        """
        
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        self.parents[0].grad += gradient
        self.parents[1].grad += gradient
        
        # Back-propagate gradients to the parents of each input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

        if not self.parents[1].is_leaf:
            self.parents[1].backward(self.parents[1].grad)

### 1.1.2 Sub

In [4]:
class Sub(Function):
    """
    Subtract parent inputs and return the result.
    """
    def forward(self, *args):
        """
        Forward computation of the binary Subtract operation.

        param:
        
        args (n=2 Tensors): 2 Tensors to be subtracted.
        
        returns:
        
        value (ndarray): Result of "-" operation on
            input args.
        
        """
        # Record the inputs as this node's parents.
        self.parents = list(args)
        # Subtract the second Tensor's value from the first.
        value = self.parents[0].value - self.parents[1].value
        return value
        
    def backward(self, gradient):
        """
        Accumulates the gradients for the "Subtract" operation from its
        children/outputs and passes them on to self.parents.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Sub"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        self.parents[0].grad += gradient
        self.parents[1].grad -= gradient
        
        # Back-propagate gradients to the parents of each input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

        if not self.parents[1].is_leaf:
            self.parents[1].backward(self.parents[1].grad)

### 1.1.3 Mul

In [5]:
class Mul(Function):
    """
    Multiply parent inputs and return the result.
    """
    def forward(self, *args):
        """
        Forward computation of the binary Multiply operation.
        Hint: You can use np.multiply here
        param:
        
        args (n=2 Tensors): 2 Tensors to be multiplied.
        
        returns:
        
        value (ndarray): Result of "*" operation on
            input args.
        """
        # Record the inputs as this node's parents.
        self.parents = list(args)
        # Multiply the values of the 2 input Tensors element-wise.
        value = self.parents[0].value * self.parents[1].value
        return value

    def backward(self, gradient):
        """
        Accumulates the gradients for the "Multiply" operation from its
        children/outputs and passes them on to self.parents.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Mul"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        self.parents[0].grad += gradient * self.parents[1].value
        self.parents[1].grad += gradient * self.parents[0].value
        
        # Back-propagate gradients to the parents of each input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

        if not self.parents[1].is_leaf:
            self.parents[1].backward(self.parents[1].grad)

### 1.1.4 Div

In [6]:
class Div(Function):
    """
    Divide parent inputs and return the result.
    """
    def forward(self, *args):
        """
        Forward computation of the binary Divide operation.
        
        param:
        
        args (n=2 Tensors): 2 Tensors to be divided.
        
        returns:
        
        value (ndarray): Result of "/" operation on
            input args.
        """
        # Record the inputs as this node's parents.
        self.parents = list(args)
        # Divide the first Tensor's value by the second's element-wise.
        value = self.parents[0].value / self.parents[1].value
        return value

    def backward(self, gradient):
        """
        Accumulates the gradients for the "Divide" operation from its
        children/outputs and passes them on to self.parents.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Div"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        self.parents[0].grad += gradient / self.parents[1].value
        self.parents[1].grad += -gradient * self.parents[0].value / (self.parents[1].value ** 2)
        
        # Back-propagate gradients to the parents of each input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

        if not self.parents[1].is_leaf:
            self.parents[1].backward(self.parents[1].grad)
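
The quotient-rule gradients used above, $\partial (a/b) / \partial a = 1/b$ and $\partial (a/b) / \partial b = -a/b^2$, can be sanity-checked numerically. The sketch below compares them against central finite differences; `div_grads` and `central_diff` are hypothetical helper names introduced only for this check.

```python
import numpy as np

def div_grads(a, b, gradient):
    # Analytic gradients of c = a / b, scaled by the upstream gradient.
    grad_a = gradient / b
    grad_b = -gradient * a / b ** 2
    return grad_a, grad_b

def central_diff(f, x, eps=1e-6):
    # Element-wise central difference; valid because f acts element-wise.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

a = np.array([1.0, -2.0, 0.5])
b = np.array([2.0, 4.0, -1.0])
grad_a, grad_b = div_grads(a, b, np.ones_like(a))

num_a = central_diff(lambda t: t / b, a)
num_b = central_diff(lambda t: a / t, b)
print(np.allclose(grad_a, num_a), np.allclose(grad_b, num_b))
```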

### 1.1.5 Sum

In [7]:
class Sum(Function):
    """
    Sums the elements of a 1xN input vector and returns a result of size 1x1.
    """
    def forward(self, *args):
        """
        Forward computation of the unary Sum operation.
        Hint: You can use np.sum here
        param:
        
        args (n=1 Tensor): Tensor whose elements are to be added.
        
        returns:
        
        value (ndarray): Result of "Sum" operation on
            input args.
        """
        # Record the input as this node's parent.
        self.parents = list(args)
        # Sum all elements of the input Tensor's value.
        value = np.sum(self.parents[0].value)
        return value
        
    def backward(self, gradient = 1):
        """
        Accumulates the gradients for the "Sum" operation from its
        child/output and passes them on to self.parent.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Sum"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        self.parents[0].grad += gradient
        
        # Back-propagate gradients to the parent of the input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

### 1.1.6 ReLU

In [8]:
class ReLU(Function):
    """
    Apply ReLU to the parent input and return the result.
    """
    def forward(self, *args):
        """
        Forward computation of the unary ReLU operation.
        Hint: You can use np.maximum here
        param:
        
        args (n=1 Tensor): Tensor on which ReLU operation
        is to be applied.
        
        returns:
        
        value (ndarray): Result of "ReLU" operation on
            input args.
        """
        # Record the input as this node's parent.
        self.parents = list(args)
        # Apply max(0, x) element-wise to the input Tensor's value.
        value = np.maximum(0, self.parents[0].value)
        return value

    def backward(self, gradient):
        """
        Accumulates the gradients for the "ReLU" operation from its
        child/output and passes them on to self.parent.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "ReLU"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        
        # The gradient passes through only where the input was positive.
        # The boolean mask keeps this vectorized and independent of shape.
        self.parents[0].grad += gradient * (self.parents[0].value > 0)
        
        # Back-propagate gradients to the parent of the input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)
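
As a small standalone sketch of the ReLU gradient rule, the hypothetical helper `relu_backward` below applies a boolean mask so that the same expression handles inputs of any shape. Taking the derivative at exactly 0 to be 0 is a convention assumed here, consistent with the `> 0` test used in this notebook.

```python
import numpy as np

def relu_backward(x, gradient):
    # Pass the upstream gradient through where x > 0 and block it elsewhere.
    # The derivative at exactly 0 is taken to be 0.
    return gradient * (x > 0)

x = np.array([[-1.0, 0.0], [2.0, 3.0]])
upstream = np.full_like(x, 0.5)
masked = relu_backward(x, upstream)
print(masked)
```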


### 1.1.7 Mean

In [9]:
class Mean(Function):
    """
    Computes the mean of a 1xN input vector and returns a result of size 1x1.
    """
    def forward(self, *args):
        """
        Forward computation of the unary Mean operation.
        Hint: You can use np.mean here
        param:
        
        args (n=1 Tensor): Tensor on which Mean operation
        is to be applied.
        
        returns:
        
        value (ndarray): Result of "Mean" operation on
            input args.
        """
        # Record the input as this node's parent.
        self.parents = list(args)
        # Average all elements of the input Tensor's value.
        value = np.mean(self.parents[0].value)
        return value

    def backward(self, gradient):
        """
        Accumulates the gradients for the "Mean" operation from its
        child/output and passes them on to self.parent.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Mean"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        # Each element contributes 1/N to the mean; .size counts every
        # element, whereas len() would only count the first axis.
        self.parents[0].grad += gradient / self.parents[0].value.size
        
        # Back-propagate gradients to the parent of the input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)
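
Since the mean of $N$ elements is $\frac{1}{N}\sum_i x_i$, every element receives the upstream gradient divided by $N$. The sketch below illustrates this with a hypothetical helper, `mean_backward`, and shows why the element count matters once the input has more than one axis.

```python
import numpy as np

def mean_backward(x, gradient):
    # Each of the x.size elements contributes 1/N to the mean.
    return np.full(x.shape, gradient / x.size)

x = np.arange(6.0).reshape(2, 3)
n_first_axis, n_elements = len(x), x.size   # len() sees only the first axis
grad = mean_backward(x, 1.0)
print(n_first_axis, n_elements)
print(grad)
```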

### 1.1.8 Pow

In [10]:
class Pow(Function):
    """
    Implements Power operation on a 1xN input vector.
    Raises the input to an exponent 
    """
    exp = 0
    def forward(self, *args, exp):
        """
        Forward computation of the unary Power operation (with a fixed exponent).
        Hint: You can use np.power here
        param:
        
        args (n=1 Tensor): Tensor on which Power operation
        is to be applied.
        
        exp (int): Exponent of the power function
        
        returns:
        
        value (ndarray): Result of "Power" operation on
            input args.
        """
        # Record the input as this node's parent.
        self.parents = list(args)
        self.exp = exp
        # Raise the input Tensor's value to the given exponent.
        value = np.power(self.parents[0].value, exp)
        return value
        
    def backward(self, gradient):
        """
        Accumulates the gradients for the "Power" operation from its
        child/output and passes them on to self.parent.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Pow"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        
        power = self.exp
        self.parents[0].grad += gradient * power * (np.power(self.parents[0].value,power-1))
        
        # Back-propagate gradients to the parent of the input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

### 1.1.9 Dot

In [11]:
class Dot(Function):
    """
    Computes the dot product of the 2 parent inputs.
    """
    def forward(self, *args):
        """
        Forward computation of the binary Dot operation.
        Hint: You can use np.dot here
        param:
        
        args (n=2 Tensors): Tensors on which Dot operation
        is to be performed.
        
        returns:
        
        value (ndarray): Result of "Dot" operation on
            input args.
        """
        # Record the inputs as this node's parents.
        self.parents = list(args)
        # Take the dot product of the 2 input Tensors' values.
        value = np.dot(self.parents[0].value, self.parents[1].value)
        return value

    def backward(self, gradient):
        """
        Accumulates the gradients for the "Dot" operation from its
        children/outputs and passes them on to self.parent.
        
        param:
        
        gradient (ndarray or scalar): gradient w.r.t output of
        "Dot"
        
        returns:
        
        None
        
        """
        # Accumulate gradient. Notice how the gradients are accumulated
        # and not stored in parents[i].grad directly. This is a key operation
        # in reverse mode gradient accumulation
        self.parents[0].grad += gradient * self.parents[1].value
        self.parents[1].grad += gradient * self.parents[0].value
        
        # Back-propagate gradients to the parents of each input.
        # Stop when a leaf node is reached.
        if not self.parents[0].is_leaf:
            self.parents[0].backward(self.parents[0].grad)

        if not self.parents[1].is_leaf:
            self.parents[1].backward(self.parents[1].grad)
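
The backward rule above covers the dot product of two 1-D vectors, which is what the tests in this notebook exercise. As a hedged aside that goes beyond the assignment, for 2-D matrices with `C = A @ B` the standard result is `dA = G @ B.T` and `dB = A.T @ G`, where `G` is the gradient with respect to `C`. The sketch below checks this numerically; `matmul_grads` and `numeric_grad` are illustrative helper names, not part of the assignment's code.

```python
import numpy as np

def matmul_grads(A, B, G):
    # Gradients of L = sum(G * (A @ B)) with respect to A and B.
    return G @ B.T, A.T @ G

def numeric_grad(f, X, eps=1e-6):
    # Central finite differences of a scalar function f at X,
    # perturbing one entry at a time.
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
G = rng.standard_normal((2, 4))

dA, dB = matmul_grads(A, B, G)
num_dA = numeric_grad(lambda M: np.sum(G * (M @ B)), A)
num_dB = numeric_grad(lambda M: np.sum(G * (A @ M)), B)
print(np.allclose(dA, num_dA, atol=1e-6), np.allclose(dB, num_dB, atol=1e-6))
```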

### 1.2 Tensor

Here, we create a new variable type - **Tensor** that implements all the common primitive operations of numeric types such as `+, -, *, /` as well as some composite functions such as `pow`, `sum`, `mean`, `dot`. To facilitate matrix operations, a **Tensor** wraps around NumPy's ndarrays.
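
The mechanism behind this is Python's operator overloading: writing `a + b` calls `a.__add__(b)`, so a class can intercept arithmetic and record it. The sketch below uses a hypothetical `Traced` class (not the assignment's `Tensor`) that only records which operation produced each value, without computing any gradients.

```python
class Traced:
    """Hypothetical value type that records which operation produced it."""
    def __init__(self, value, op="leaf", inputs=()):
        self.value = value
        self.op = op
        self.inputs = inputs

    def __add__(self, other):
        # Python calls this method for the expression `self + other`.
        return Traced(self.value + other.value, "add", (self, other))

    def __mul__(self, other):
        # Likewise, `self * other` dispatches here.
        return Traced(self.value * other.value, "mul", (self, other))

a, b, c = Traced(2.0), Traced(3.0), Traced(4.0)
out = a * b + c
print(out.value, out.op, [t.op for t in out.inputs])
```

Evaluating an ordinary arithmetic expression is enough to build the graph: `out` remembers that it came from an add whose inputs were a mul result and a leaf.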

In [12]:
class Tensor(object):
    """
    Tensor. A wrapper around NumPy's ndarrays
    which supports a backward call.
    
    Uses the operations defined in section 1.1, which take Tensors
    as arguments and return a new Tensor with a reference to the
    operation's gradient function.
    """
    def __init__(self, value, is_leaf = True, grad_fn = None):
        """
        Constructor
        
        Every Tensor which is a result of the operations in section
        1.1 is a non leaf node.
        A non leaf node holds reference to the backward() of a 
        Function class by which the Tensor has been created. This is
        how the computational graph is built.
        
        params:
        
        value (ndarray or scalar): Holds actual data.
        
        is_leaf (bool): Specifies if the Tensor is a leaf node.
        
        grad_fn (Function.backward method): The bound backward() method of
            the Function that created this Tensor. It is None for leaf nodes.
        
        member variables:
        
        value (ndarray): The value argument, converted to an ndarray
            to support NumPy vectorization.
        
        """
        
        if grad_fn is None and not is_leaf:
            raise ValueError(
                'Non leaf nodes require a grad_fn.'
            )
        
        if np.isscalar(value):
            value = np.ones(1)*value
        
        if not isinstance(value, np.ndarray):
            raise ValueError(
                f'Value should be of type "np.ndarray" or a scalar, but received {type(value)}'
            )
        
        self.value = value
        self.is_leaf = is_leaf
        self.grad_fn = grad_fn
        self.zero_grad()
        
    def zero_grad(self):
        """
        Reset the gradients of this Tensor to 0 taking in consideration
        the dimensions of the data stored by it.
        """
        self.grad = np.zeros(self.value.shape)
    
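    def backward(self, gradient):
        """
        Initiates the chain rule on the computational graph.
        """
        self.grad = gradient
        # Leaf nodes have no grad_fn, so there is nothing further to propagate.
        if self.grad_fn is not None:
            self.grad_fn(self.grad)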
    
    def __add__(self, other):
        """
        Overloaded "+" primitive.
        Example:
        c = a+b
        params:

        self (Tensor): denoted by 'a' in the example expression
        other (Tensor): denoted by 'b' in the example expression

        returns:

        Tensor: denoted by 'c' in the above expression.
        """    
        function = Add()
        forward_value = function.forward(self, other)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def __sub__(self, other):
        """
        Overloaded "-" primitive.
        Example:
        c = a-b
        params:

        self (Tensor): denoted by 'a' in the example expression
        other (Tensor): denoted by 'b' in the example expression

        returns:

        Tensor: denoted by 'c' in the above expression.
        """
        function = Sub()
        forward_value = function.forward(self, other)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def __mul__(self, other):
        """
        Overloaded "*" primitive.
        Example:
        c = a*b
        params:

        self (Tensor): denoted by 'a' in the example expression
        other (Tensor): denoted by 'b' in the example expression

        returns:

        Tensor: denoted by 'c' in the above expression.
        """
        function = Mul()
        forward_value = function.forward(self, other)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def __truediv__(self, other):

        """
        Overloaded "/" primitive.
        Example:
        c = a/b
        params:

        self (Tensor): denoted by 'a' in the example expression
        other (Tensor): denoted by 'b' in the example expression

        returns:

        Tensor: denoted by 'c' in the above expression.
        """
        function = Div()
        forward_value = function.forward(self, other)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def sum(self):
        """
        "Sum elements of this Tensor"
        Example:
        b = a.sum()
        params:

        self (Tensor): denoted by 'a' in the example expression
        
        returns:

        Tensor: denoted by 'b' in the above expression.
        """
        function = Sum()
        forward_value = function.forward(self)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def relu(self):
        """
        "ReLU activation applied on this Tensor"
        Example:
        b = a.relu()
        params:

        self (Tensor): denoted by 'a' in the example expression
        
        returns:

        Tensor: denoted by 'b' in the above expression.
        """
        function = ReLU()
        forward_value = function.forward(self)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def mean(self):
        """
        "Mean of the elements of this Tensor"
        Example:
        b = a.mean()
        params:

        self (Tensor): denoted by 'a' in the example expression
        
        returns:

        Tensor: denoted by 'b' in the above expression.
        """
        function = Mean()
        forward_value = function.forward(self)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def pow(self, exp):
        """
        "Raise the value in this Tensor to an exponent"
        Example:
        b = a.pow(exp)
        params:

        self (Tensor): denoted by 'a' in the example expression
        exp (int): denoted by 'exp' in the example expression
        
        returns:

        Tensor: denoted by 'b' in the above expression.
        """
        function = Pow()
        forward_value = function.forward(self, exp=exp)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
        
    def dot(self, other):
        """
        Dot product of 2 Tensors
        Example:
        c = a.dot(b)
        params:

        self (Tensor): denoted by 'a' in the example expression
        other (Tensor): denoted by 'b' in the example expression

        returns:

        Tensor: denoted by 'c' in the above expression.
        """
        function = Dot()
        forward_value = function.forward(self, other)
        
        return Tensor(
            value = forward_value,
            is_leaf = False,
            grad_fn = function.backward
        )
    
    def __repr__(self):
        return 'value: {}, grad: {}, grad_fn = {}'.format(
            self.value, self.grad, self.grad_fn
        )

In [13]:
import unittest

class TestAD(unittest.TestCase):
    
    a = np.array([-0.09944354, -1.00754719, -0.9861803 , -1.19309044,  0.52475517])
    b = np.array([ 0.07470977, -0.40344353,  1.30297429,  1.2697801 ,  0.79662585])
    
    def test_mul(self):
        x = Tensor(self.a)
        y = Tensor(self.b)
        
        res = (x*y).sum()
        res.backward(1.0)
        
        exp = np.array([-1.98283756])
        exp_grad_x = np.array([ 0.07470977, -0.40344353,  1.30297429,  1.2697801, 0.79662585])
        exp_grad_y = np.array([-0.09944354, -1.00754719, -0.9861803,  -1.19309044,  0.52475517])
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
    
    def test_add_sum(self):
        x = Tensor(self.a) 
        y = Tensor(self.b)
        
        res = (x+y).sum()
        res.backward(1.0)
        
        exp = np.array([0.27914019])
        exp_grad_x = np.ones_like(self.a)
        exp_grad_y = np.ones_like(self.b)
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
    
    def test_sub(self):
        x = Tensor(self.a)
        y = Tensor(self.b)
        
        res = (x-y).sum()
        res.backward(1.0)
        
        exp = np.array([-5.80215279])
        exp_grad_x = np.ones_like(self.a)
        exp_grad_y = -np.ones_like(self.b)
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
    
    def test_truediv(self):
        x = Tensor(self.a)
        y = Tensor(self.b)
        
        res = (x/y).sum()
        res.backward(1.0)
        
        exp = np.array([0.12855352])
        
        exp_grad_x = np.asarray([13.38513022, -2.47866164,  0.76747485,  0.78753794,  1.25529444])
        exp_grad_y = np.asarray([17.81647477,  6.19013165,  0.5808776 ,  0.73997378, -0.82689037])
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
        
    def test_pow(self):
        x = Tensor(self.a)
        
        res = (x.pow(5)).sum()
        res.backward(1.0)
        
        exp = np.array([-4.34881576])
        exp_grad_x = np.array([0.00048896, 5.15266121, 4.72928291, 10.13126015, 0.37913764])
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        
    def test_dot(self):
        x = Tensor(self.a)
        y = Tensor(self.b)
        
        res = x.dot(y)
        res.backward(1.0)
        
        exp = np.array([-1.98283756])
        
        exp_grad_x = np.array([ 0.07470977, -0.40344353, 1.30297429, 1.2697801, 0.79662585])
        exp_grad_y = np.array([-0.09944354, -1.00754719, -0.9861803, -1.19309044, 0.52475517])
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
        
    def test_relu(self):
        x = Tensor(self.a)
        y = Tensor(self.b)
        
        res = (x*y).relu().sum()
        res.backward(1.0)
        
        exp = np.array([0.82452193])
        exp_grad_x = np.array([0, -0.40344353, 0, 0, 0.79662585])
        exp_grad_y = np.array([0, -1.00754719, 0, 0, 0.52475517])
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
        
    def test_mean(self):
        x = Tensor(self.a)
        y = Tensor(self.b)
        
        res = (x+y).mean().sum()
        res.backward(1.0)
        
        exp = np.array([0.05582804])
        exp_grad_x = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
        exp_grad_y = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
        
        self.assertTrue(np.allclose(exp, res.value))
        self.assertTrue(np.allclose(exp_grad_x, x.grad))
        self.assertTrue(np.allclose(exp_grad_y, y.grad))
    
suite = unittest.TestLoader().loadTestsFromTestCase(TestAD)
unittest.TextTestRunner().run(suite)

........
----------------------------------------------------------------------
Ran 8 tests in 0.228s

OK


<unittest.runner.TextTestResult run=8 errors=0 failures=0>