In [15]:
import torch

### Tensor
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts tracking all operations on it. When you finish your computation, you can call .backward() and have all the gradients computed automatically. The gradient for this tensor is accumulated into its .grad attribute.

To stop a tensor from tracking history, you can call .detach(), which detaches it from the computation history and prevents future computation from being tracked.
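A minimal sketch of what .detach() does, using small illustrative tensors (the names x, y and z are made up for this example): the detached tensor keeps the same values but is cut off from the graph.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3          # tracked operation: y carries a grad_fn
z = y.detach()     # same values as y, but no longer part of the graph

print(y.requires_grad, z.requires_grad)
print(z.grad_fn)   # a detached tensor has no backward function

# detach() does not copy the data: z shares storage with y
print(z.data_ptr() == y.data_ptr())
```

Because the storage is shared, in-place changes to z would also show up in y, so use .detach().clone() when an independent copy is needed.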

To prevent tracking history (and using memory), you can also wrap the code in a with torch.no_grad(): block. This is particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True whose gradients we don’t need during evaluation.
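As a small sketch of this (w is a hypothetical stand-in for a trainable parameter, not a real model), the same computation is tracked outside the block and untracked inside it:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)   # stand-in for a trainable parameter
inputs = torch.randn(3)

tracked = (w * inputs).sum()             # builds a graph
with torch.no_grad():
    untracked = (w * inputs).sum()       # same value, no graph

print(tracked.requires_grad)
print(untracked.requires_grad)
print(w.requires_grad)                   # the parameter itself is unchanged
```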

In [19]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


##### requires_grad: 
If this member is True, PyTorch starts tracking the operation history of the tensor and forms a backward graph for gradient calculation. For an arbitrary tensor a, it can be changed in-place as follows: a.requires_grad_(True).
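A short sketch of toggling the flag in-place on an illustrative tensor a:

```python
import torch

a = torch.randn(2, 2)
flag_before = a.requires_grad     # tensors do not track history by default
a.requires_grad_(True)            # in-place switch, note the trailing underscore
flag_after = a.requires_grad

out = (a * a).sum()               # operations on a are now recorded
print(flag_before, flag_after)
print(out.grad_fn is not None)
```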

##### grad: 
grad holds the value of the gradient. If requires_grad is False, it holds None. Even if requires_grad is True, it holds None until .backward() is called on some other node that depends on this tensor. For example, if you call out.backward() for some variable out that involved x in its calculation, then x.grad will hold ∂out/∂x.
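To make this concrete, here is a small sketch with out = sum(x²), so ∂out/∂x = 2x (the tensor values are chosen only for illustration):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
grad_before = x.grad          # nothing has been backpropagated yet
print(grad_before)            # None

out = (x ** 2).sum()
out.backward()                # fills x.grad with d(out)/dx = 2 * x
print(x.grad)                 # tensor([2., 4., 6.])
```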

##### grad_fn: 
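This is the backward function used to calculate the gradient. It is None for tensors created directly by the user and is set for tensors produced by a tracked operation.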
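A quick way to see this is to print grad_fn along a short chain of operations. This is only a sketch; the exact class names printed (such as AddBackward0) may differ between PyTorch versions.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x + 2
z = y * y

print(x.grad_fn)                   # None: x was created by the user
print(type(y.grad_fn).__name__)    # backward function of the addition
print(type(z.grad_fn).__name__)    # backward function of the multiplication

# Each grad_fn links to the functions of its inputs, which is how the
# backward graph is chained together.
print(z.grad_fn.next_functions)
```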

##### is_leaf:
A node is a leaf if:
1. It was initialized explicitly by some function, like x = torch.tensor(1.0) or x = torch.randn(1, 1).
2. It was created by operations on tensors that all have requires_grad = False.
3. It was created by calling the .detach() method on some tensor.
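The three cases above can be checked directly. The tensor names below are illustrative only; d is included as a counterexample, since it is the result of a tracked operation.

```python
import torch

a = torch.randn(1, 1)                       # case 1: created explicitly
b = torch.randn(1, 1, requires_grad=True)   # case 1, with tracking enabled
c = a * 2                                   # case 2: all inputs have requires_grad=False
d = b * 2                                   # result of a tracked op: NOT a leaf
e = d.detach()                              # case 3: detached from the graph

for name, t in [("a", a), ("b", b), ("c", c), ("d", d), ("e", e)]:
    print(name, t.is_leaf)
```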

##### backward() function
backward() is the function that actually calculates the gradient. It passes its argument (a unit tensor by default, which only works when the calling tensor is a scalar) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in .grad of every leaf node. Remember, the backward graph is already built dynamically during the forward pass. backward() only calculates the gradients using the already built graph and stores them in the leaf nodes.
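The role of the argument can be sketched as follows: a scalar root needs no argument, while a non-scalar root needs an explicit gradient tensor of the same shape. The weighting tensor v below is an arbitrary choice for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# Scalar root: the gradient argument defaults to 1
loss = (x ** 2).sum()
loss.backward()
scalar_grad = x.grad.clone()      # 2 * x
x.grad.zero_()

# Non-scalar root: calling backward() without an argument fails
y = x ** 2
raised = False
try:
    y.backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Passing a tensor of the same shape weights each element's contribution
v = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
y.backward(v)
print(x.grad)                     # v * 2 * x
```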

In [26]:
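y = x ** 2
print(y)
print(y.grad_fn)
# y is not a scalar, so backward() needs an explicit gradient argument
y.backward(torch.ones_like(x))
print(x.grad)
# Reset the accumulated gradient in place
x.grad.zero_()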

tensor([[1., 1.],
        [1., 1.]], grad_fn=<PowBackward0>)
<PowBackward0 object at 0x0000018342B7E1C8>
tensor([[2., 2.],
        [2., 2.]])


tensor([[0., 0.],
        [0., 0.]])

To stop PyTorch from tracking history and building the backward graph, wrap the code in a with torch.no_grad(): block. This makes the code run faster and use less memory whenever gradient tracking is not needed.

OK, we got gradients, but there is one more thing to pay attention to: by default, PyTorch accumulates gradients across calls to .backward(). How do we handle that?
##### zero_
Every time we use the gradients to update the parameters, we need to zero the gradients afterward. That’s what zero_() is good for.
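A minimal sketch of the accumulation behaviour, using a toy function sum(x²) whose gradient at x = 1 is 2:

```python
import torch

x = torch.ones(2, requires_grad=True)

(x ** 2).sum().backward()
first = x.grad.clone()     # tensor([2., 2.])

(x ** 2).sum().backward()
second = x.grad.clone()    # tensor([4., 4.]): added to the previous gradient

x.grad.zero_()             # reset in place
(x ** 2).sum().backward()
third = x.grad.clone()     # tensor([2., 2.]) again

print(first, second, third)
```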

### Linear regression example

In [39]:
# Step 0 - Initializes parameters "b" and "w" randomly
torch.manual_seed(42)
b = torch.randn(1, requires_grad=True, dtype=torch.float)
w = torch.randn(1, requires_grad=True, dtype=torch.float)
lr = 0.1
# Synthetic, noise-free data: y = 2x + 3, so training should recover w = 2 and b = 3
x_train_tensor = torch.randn(120, dtype=torch.float)
y_train_tensor = 2 * x_train_tensor + 3
for epoch in range(200):
    # Step 1 - Computes our model's predicted output - forward pass
    yhat = b + w * x_train_tensor
    # Step 2 - Computes the loss
    error = (y_train_tensor - yhat)
    # It is a regression, so it computes mean squared error (MSE)
    loss = (error ** 2).mean()
    # Step 3 - Computes gradients for both "b" and "w" parameters
    # No more manual computation of gradients!
    loss.backward() 
    # Step 4 - Updates parameters, without tracking the update in the graph
    with torch.no_grad():
        b -= lr * b.grad
        w -= lr * w.grad
    # Step 5 - Zeroes the gradients so they do not accumulate across epochs
    b.grad.zero_()
    w.grad.zero_()
print(w)
print(b)

tensor([2.0000], requires_grad=True)
tensor([3.0000], requires_grad=True)


#### Parameter update
- b -= lr * b.grad 
- w -= lr * w.grad
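##### But
It turns out we cannot simply update the parameters this way. Why not? It is a case of “too much of a good thing”. The culprit is PyTorch’s ability to build a dynamic computation graph from every Python operation that involves any gradient-computing tensor or its dependencies. An in-place update of a leaf tensor that requires gradients raises a RuntimeError, and reassigning instead (b = b - lr * b.grad) turns b into the non-leaf result of a tracked operation, so backward() no longer populates its .grad.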
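Both ways in which a naive update goes wrong can be reproduced on a single toy parameter. This is only a sketch: the loss b² is made up for illustration, and b_new is a hypothetical name used to avoid overwriting b.

```python
import torch

torch.manual_seed(42)
b = torch.randn(1, requires_grad=True)
lr = 0.1

loss = (b ** 2).sum()
loss.backward()

# Attempt 1: in-place update of a leaf tensor that requires grad
raised = False
try:
    b -= lr * b.grad
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Attempt 2: reassignment creates a new, non-leaf tensor that is part of
# the graph instead of updating the original parameter
b_new = b - lr * b.grad
print(b_new.is_leaf)
print(b_new.grad_fn is not None)
```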

##### no_grad
So, how do we tell PyTorch to “back off” and let us update our parameters without messing up its fancy dynamic computation graph? That’s what torch.no_grad() is good for. It allows us to perform regular Python operations on tensors without affecting PyTorch’s computation graph.
This time, the update will work as expected:

- with torch.no_grad():
    - b -= lr * b.grad
    - w -= lr * w.grad

### Another example of using autograd

In [65]:
a = torch.ones(1, requires_grad=True)
w1 = torch.tensor(1)
w2 = torch.tensor(2)
w3 = torch.tensor(3)
w4 = torch.tensor(4)

In [66]:
b = w1 * a
c = w2 * a 
d = (w3 * b) + (w4 * c)
L = d**2

![computational graph example](https://miro.medium.com/max/875/1*40LF-3EKdsZsbTP5JmzVjQ.png)

In [67]:
L.backward()
a.grad

tensor([242.])
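The value 242 can be checked by hand with the chain rule: d = w3·w1·a + w4·w2·a = 11a, so L = 121a² and ∂L/∂a = 242a, which is 242 at a = 1. The sketch below repeats the computation and compares autograd with that manual result. It uses float weights and retain_grad() to inspect the intermediate gradient ∂L/∂d; both are choices made for this illustration, not part of the cells above.

```python
import torch

a = torch.ones(1, requires_grad=True)
w1, w2, w3, w4 = (torch.tensor(float(v)) for v in (1, 2, 3, 4))

b = w1 * a
c = w2 * a
d = (w3 * b) + (w4 * c)
d.retain_grad()            # keep the gradient of this non-leaf tensor
L = d ** 2
L.backward()

# Chain rule by hand: dL/da = dL/dd * dd/da = 2d * (w3*w1 + w4*w2)
manual = 2 * d.detach() * (w3 * w1 + w4 * w2)

print(d.grad)              # dL/dd = 2d = 22
print(a.grad, manual)      # both 242
```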

#### Resources
* https://towardsdatascience.com/pytorch-autograd-understanding-the-heart-of-pytorchs-magic-2686cd94ec95
* https://www.youtube.com/watch?v=MswxJw-8PvE
* https://medium.com/@ODSC/automatic-differentiation-in-pytorch-6131b4581cdf
* https://towardsdatascience.com/getting-started-with-pytorch-part-1-understanding-how-automatic-differentiation-works-5008282073ec