# Automatic Differentiation

Backpropagation is the algorithm most commonly used to train neural networks. It adjusts the parameters (model weights) according to the gradient of the loss function with respect to each parameter.

MindSpore computes first-order derivatives with `mindspore.ops.GradOperation(get_all=False, get_by_list=False, sens_param=False)`:

- `get_all`: if `False`, only the derivative with respect to the first input is computed; if `True`, the derivatives with respect to all inputs are computed.
- `get_by_list`: if `False`, the derivatives with respect to the weights are not computed; if `True`, they are computed.
- `sens_param`: scales the output value of the network, which changes the final gradient. When it is `True`, the gradient function takes an extra input (the sensitivity, that is, the gradient with respect to the output); when it is `False`, an all-ones sensitivity is used.

The following sections analyze these derivatives in depth using a network built from the `MatMul` operator.
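To make the role of the sensitivity concrete, the following plain-Python sketch computes the gradient of `out = x @ y` with respect to `x` for a given sensitivity `sens`, which is `sens @ y^T`. It does not call MindSpore; the helpers `matmul`, `transpose`, and `grad_x_of_matmul` are hypothetical names used only for illustration. With an all-ones sensitivity the result is the gradient of the sum of the output, and doubling the sensitivity doubles the gradient.

```python
# Plain-Python sketch (no MindSpore): the gradient of out = x @ y with respect
# to x, seeded with a sensitivity `sens` of the same shape as out, is sens @ y^T.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Swap the rows and columns of a matrix stored as lists of rows."""
    return [list(row) for row in zip(*m)]

def grad_x_of_matmul(y, sens):
    """Gradient of sum(sens * (x @ y)) with respect to x, i.e. sens @ y^T."""
    return matmul(sens, transpose(y))

y = [[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]]
ones = [[1.0] * 3 for _ in range(2)]  # all-ones sensitivity (as with sens_param=False)
twos = [[2.0] * 3 for _ in range(2)]  # a user-supplied sensitivity

print(grad_x_of_matmul(y, ones))
print(grad_x_of_matmul(y, twos))  # every entry is doubled
```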

## Step 1: Compute the first-order derivative of the input
To compute an input derivative, first define the network to be differentiated. The following example uses the network $f(x, y) = (z \cdot x)\,y$, where $z$ is a trainable weight and the product with $y$ is the matrix multiplication performed by the `MatMul` operator.
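Because `GradOperation` seeds the backward pass with an all-ones sensitivity when `sens_param=False`, differentiating this network is equivalent to differentiating the sum of its output. Treating $z$ as a scalar (it is a one-element tensor broadcast over $x$), with $X \in \mathbb{R}^{2 \times 3}$ and $Y \in \mathbb{R}^{3 \times 3}$, the gradients work out as follows:

$$
\mathrm{out} = (zX)\,Y, \qquad L = \sum_{i,j} \mathrm{out}_{ij} = z \sum_{i,j} \sum_{k} X_{ik} Y_{kj}
$$

$$
\frac{\partial L}{\partial X_{ik}} = z \sum_{j} Y_{kj}, \qquad \frac{\partial L}{\partial z} = \sum_{i,j} (XY)_{ij}
$$

So every row of the gradient with respect to $X$ is the vector of row sums of $Y$ scaled by $z$, and the gradient with respect to $z$ is the sum of all entries of $XY$.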

```python
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor
from mindspore import ParameterTuple, Parameter
from mindspore import dtype as mstype

class Net(nn.Cell):
    """Computes (x * z) @ y, where z is a trainable one-element weight."""
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')

    def construct(self, x, y):
        x = x * self.z           # scale the input by the weight z
        out = self.matmul(x, y)  # matrix product with y
        return out

class GradNetWrtX(nn.Cell):
    """Returns the gradient of Net's output with respect to its first input x."""
    def __init__(self, net):
        super(GradNetWrtX, self).__init__()
        self.net = net
        # Default arguments: differentiate with respect to the first input only
        self.grad_op = ops.GradOperation()

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)

x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)
y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)

output = GradNetWrtX(Net())(x, y)
print(output)
```

Output:

```text
[[4.5099998 2.7       3.6000001]
 [4.5099998 2.7       3.6000001]]
```
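As a sanity check, the hand-derived formula (each gradient entry for `x` is `z` times a row sum of `y`) can be evaluated in plain Python without MindSpore. This sketch uses double-precision floats, so it yields approximately 4.51, 2.7, and 3.6 rather than the float32 values printed above; `x` is used only for its shape.

```python
# Plain-Python check of the Step 1 result (no MindSpore needed).
# With an all-ones sensitivity, d(sum(out))/dx[i][k] = z * sum_j y[k][j].
z = 1.0
x = [[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]]
y = [[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]]

row_sums_of_y = [sum(row) for row in y]
# One gradient row per row of x; every row is identical
grad_x = [[z * row_sums_of_y[k] for k in range(len(y))] for _ in x]
print(grad_x)
```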


## Step 2: Compute the first-order derivative of the weight
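To compute weight derivatives, set `get_by_list=True` in `ops.GradOperation` and pass the weights to differentiate (here a `ParameterTuple` of the network's trainable parameters) when building the gradient function. If the derivatives of certain weights are not required, set `requires_grad` to `False` for those parameters when you define the network.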

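```python
class GradNetWrtWeights(nn.Cell):
    """Returns the gradients of Net's output with respect to its trainable weights."""
    def __init__(self, net):
        super(GradNetWrtWeights, self).__init__()
        self.net = net
        # Collect the weights to differentiate (here only z)
        self.params = ParameterTuple(net.trainable_params())
        # get_by_list=True: differentiate with respect to the weights in self.params
        self.grad_op = ops.GradOperation(get_by_list=True)

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x, y)

output = GradNetWrtWeights(Net())(x, y)
print(output)
```

Output:

```text
(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)
```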
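The result is a tuple with one gradient per parameter in the `ParameterTuple` (only `z` here). Its value can be checked by hand: the derivative of the summed output with respect to `z` is the sum of all entries of `x @ y`. The plain-Python sketch below (no MindSpore) evaluates that sum in double precision, giving approximately 21.536.

```python
# Plain-Python check of the Step 2 result (no MindSpore needed).
# d(sum(out))/dz = sum of all entries of x @ y, independent of z.
x = [[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]]
y = [[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]]

# Matrix product x @ y for a 2x3 by 3x3 multiplication
xy = [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
      for i in range(2)]
grad_z = sum(sum(row) for row in xy)
print(grad_z)  # approximately 21.536
```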
