# Examples of Built-In Gradient Descent

## Gradient Calculations
Do not follow the video; read the PyTorch autograd tutorial instead:
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

### 1 - Scalar Output
Given a scalar-valued function y: R^3 --> R, y = y(x), i.e., with scalar output in R (e.g., y = mean(x) = (x1+x2+x3)/3),
compute the gradient of y, dy/dx = (dy/dx1, dy/dx2, dy/dx3).

In [10]:
import torch

# 1 - Define x; requires_grad must be set to True so autograd tracks it:
x = torch.tensor([1.,1.,1.], requires_grad=True)
print(x)
# 2 - Define y
y = x.mean()
print(y)
# 3 - Use backpropagation to compute the gradient of y
y.backward()
# 4 - PyTorch stores the gradient in the .grad attribute of x;
# read it as:
print(x.grad)

tensor([1., 1., 1.], requires_grad=True)
tensor(1., grad_fn=<MeanBackward0>)
tensor([0.3333, 0.3333, 0.3333])
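
As a sanity check, the autograd result can be compared with the analytic gradient of the mean, which is 1/n for every component. The sketch below assumes the same x as above; it uses `torch.autograd.grad`, which returns the gradient instead of storing it in `x.grad`.

```python
import torch

x = torch.tensor([1., 1., 1.], requires_grad=True)
y = x.mean()

# torch.autograd.grad returns a tuple with one gradient per input
(grad_x,) = torch.autograd.grad(y, x)

# Analytic gradient of mean(x): each component equals 1/n
expected = torch.full_like(grad_x, 1.0 / x.numel())

print(grad_x)
print(torch.allclose(grad_x, expected))
```

Because the gradient is returned directly, `x.grad` stays untouched in this variant.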


### 2 - Non-Scalar Output
Given a vector-valued function y: R^3 --> R^3, y = y(x), i.e., with non-scalar output in R^3 (e.g., y = 2x+2),
compute the vector-Jacobian product of a given tensor v with dy/dx.
Because y acts element-wise here, the Jacobian is diagonal, so the result reduces to the component-wise product of v and (dy1/dx1, dy2/dx2, dy3/dx3).

In [11]:
# For a non-scalar y, backward() needs the gradient argument (the tensor v):
x = torch.tensor([1.,1.,1.], requires_grad=True)
y = 2*x+2
v = torch.tensor([2.0,1.0,1.0])
y.backward(gradient=v)
print(x.grad)

tensor([4., 2., 2.])
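
To see what the `gradient` argument does, the following sketch builds the full Jacobian with `torch.autograd.functional.jacobian` and multiplies it by v by hand. The helper name `f` is introduced here only for illustration; it wraps the same expression 2*x+2 used above.

```python
import torch

def f(x):
    # Same element-wise function as in the cell above
    return 2 * x + 2

x = torch.tensor([1., 1., 1.], requires_grad=True)
v = torch.tensor([2.0, 1.0, 1.0])

# Route 1: backward with a gradient argument computes v^T J
f(x).backward(gradient=v)

# Route 2: build the full 3x3 Jacobian explicitly, then multiply by v
J = torch.autograd.functional.jacobian(f, x.detach())
vjp = v @ J

print(J)
print(x.grad, vjp)
```

Both routes give the same vector. The Jacobian is 2 times the identity, which is why the result is simply 2*v.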


### 3 - Chain Rule
Given a composition of functions mean(z(y(x))),
compute d mean/dx using the chain rule.

In [12]:
import torch

In [22]:
x=torch.rand(3,requires_grad=True)
print("x ",x)
y = x+2
print("y ", y)
z = y**2+2
print("z ",z)
z = z.mean()
print("zmean ", z)
z.backward() # computes d zmean/dx = d zmean/dz * dz/dy * dy/dx
print(x.grad)

x  tensor([0.7171, 0.2278, 0.8268], requires_grad=True)
y  tensor([2.7171, 2.2278, 2.8268], grad_fn=<AddBackward0>)
z  tensor([9.3826, 6.9632, 9.9909], grad_fn=<AddBackward0>)
zmean  tensor(8.7789, grad_fn=<MeanBackward0>)
tensor([1.8114, 1.4852, 1.8845])
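
The chain rule can also be applied by hand and compared with autograd. In the sketch below, the seed is an arbitrary choice made only for reproducibility, and `manual` is a name introduced for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.rand(3, requires_grad=True)

y = x + 2
z = (y ** 2 + 2).mean()
z.backward()

# Chain rule by hand: d mean/dz_i = 1/n, dz_i/dy_i = 2*y_i, dy_i/dx_i = 1
with torch.no_grad():
    manual = 2 * (x + 2) / x.numel()

print(x.grad)
print(torch.allclose(x.grad, manual))
```

This matches the output above: for x1 = 0.7171 the gradient is 2 * 2.7171 / 3 = 1.8114.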


In [27]:
# Ways to stop autograd from tracking operations on a tensor
# Option 1: set requires_grad to False in place
print(x)
x.requires_grad_(False)
print(x)

# Option 2: detach() returns a tensor that is not tracked
xd = x.detach()
print(xd)

# Option 3: the torch.no_grad() context manager
with torch.no_grad():
    y = x+2
    print(y)

tensor([0.7171, 0.2278, 0.8268])
tensor([0.7171, 0.2278, 0.8268])
tensor([0.7171, 0.2278, 0.8268])
tensor([2.7171, 2.2278, 2.8268])
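
The printed values alone do not show whether tracking is on or off. A minimal sketch that inspects the `requires_grad` flag for each option, starting from a fresh tracked tensor (the names `y_tracked` and `y_untracked` are introduced here for illustration):

```python
import torch

x = torch.rand(3, requires_grad=True)

# detach() returns a new tensor that shares data but is cut off from the graph
xd = x.detach()

# Operations inside no_grad() are not recorded, even on a tracked tensor
with torch.no_grad():
    y_untracked = x + 2

# The same operation outside no_grad() is recorded
y_tracked = x + 2

print(xd.requires_grad)
print(y_untracked.requires_grad)
print(y_tracked.requires_grad)

# requires_grad_(False) switches tracking off in place on the leaf tensor
x.requires_grad_(False)
print(x.requires_grad)
```

Only `y_tracked` carries a `grad_fn`; the other results are plain tensors that backward() will ignore.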


In [33]:
# Each call to backward() accumulates (adds) into
# the previous value of .grad
# IMPORTANT FOR TRAINING STEPS
weights = torch.ones(4, requires_grad=True)
for epoch in range(3):
    model_output = (weights*3).sum()
    model_output.backward()
    print(weights.grad)
    weights.grad.zero_() # reset .grad so values do not accumulate

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
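
For contrast, here is a sketch of the same loop without the `zero_()` call, so the accumulation becomes visible. The `history` list is added only to keep a copy of each step's gradient.

```python
import torch

weights = torch.ones(4, requires_grad=True)
history = []
for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()
    # No zero_() call here, so each backward() adds another 3 to .grad
    history.append(weights.grad.clone())
    print(weights.grad)
```

The printed gradients grow to 3, then 6, then 9 per component, even though the true gradient of the output is 3 at every step.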


In [47]:
# Example of preventing accumulation with
# the Stochastic Gradient Descent (SGD) optimizer
weights = torch.ones(4, requires_grad=True)
#print(weights)
# SGD expects an iterable of tensors, so wrap weights in a list.
# Passing the bare tensor, torch.optim.SGD(weights, lr=0.01), raises:
# TypeError: params argument given to the optimizer should be an iterable of Tensors or dicts, but got torch.FloatTensor
optimizer = torch.optim.SGD([weights], lr=0.01)
optimizer.step()
optimizer.zero_grad()
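
A fuller sketch of how the optimizer calls fit into a training loop, reusing the toy objective from the accumulation example. This is an illustrative assumption about usage rather than part of the original notebook; note that the parameter is passed as a list.

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01)

for epoch in range(3):
    loss = (weights * 3).sum()   # toy objective, gradient is 3 everywhere
    loss.backward()              # fills weights.grad
    optimizer.step()             # weights <- weights - lr * weights.grad
    optimizer.zero_grad()        # clear gradients before the next pass
    print(weights)
```

Each step lowers every weight by 0.01 * 3 = 0.03, so after three epochs the weights are approximately 0.91.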

## Backpropagation
To know: the computation graph, see https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
Usually the gradient of a loss function is computed. For example, for a linear regression:
y_hat = w * x
s = y_hat - y
loss = s**2

In [53]:
x = torch.tensor(1.)
y = torch.tensor(2.)
w = torch.tensor(1., requires_grad=True) # only the weight needs a gradient

# forward pass and loss computation
y_hat = w * x
loss = (y_hat - y)**2
print(loss)

# backward pass
loss.backward()
print(w.grad)

tensor(1., grad_fn=<PowBackward0>)
tensor(-2.)
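
The gradient above agrees with the hand derivation dloss/dw = 2 * (w*x - y) * x = 2 * (1 - 2) * 1 = -2. As a sketch of how this gradient drives gradient descent, the loop below repeats the forward and backward pass and updates w manually. The learning rate and the step count are hypothetical values chosen for illustration.

```python
import torch

x = torch.tensor(1.)
y = torch.tensor(2.)
w = torch.tensor(1., requires_grad=True)
lr = 0.1  # hypothetical learning rate

for step in range(20):
    # forward pass and loss
    loss = (w * x - y) ** 2
    # backward pass: w.grad = 2 * (w*x - y) * x
    loss.backward()
    # the update itself must not be tracked by autograd
    with torch.no_grad():
        w -= lr * w.grad
    # reset the gradient so the next backward() starts from zero
    w.grad.zero_()

print(w)
```

With these values w moves from 1 toward 2, the weight for which y_hat = w * x equals y.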
