
# Basics of PyTorch

**Traditional programming**: input + algorithm --> output

**New paradigm**: input + output --> algorithm (useful when we cannot write the algorithm by hand, or when the rules have too many exceptions to list)



## PyTorch jargon

**Tensor**: A tensor can be a number, a vector, a matrix, or any n-dimensional array, so it is essentially another name for an n-dimensional array (NDA).

Two things to keep in mind when comparing tensors with general n-dimensional arrays (e.g. nested Python lists):
* A tensor has a fixed, regular shape: every row along a given dimension must have the same length.
* Do the computation with tensor operations rather than Python loops. Tensor operations are implemented in C++ and are fast; looping over elements in Python is much slower.
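
As a small illustration of the second point, the sketch below computes the same element-wise product twice: once with a plain Python loop and once with a single tensor operation. The names `values`, `loop_result` and `tensor_result` are made up for this example, and no timings are shown because they depend on the machine.

```python
import torch

values = torch.arange(1000, dtype=torch.float32)

# Slow path: visit every element from Python
loop_result = torch.empty_like(values)
for i in range(values.numel()):
    loop_result[i] = values[i] * 2.0

# Fast path: one tensor operation, executed in PyTorch's compiled backend
tensor_result = values * 2.0

# Both paths give the same numbers; only the speed differs
print(torch.equal(loop_result, tensor_result))
```

Wrapping each version in `time.perf_counter()` calls is one simple way to see the difference on your own machine.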

In [1]:
# import pytorch
import torch

In [9]:
# set a number
a = torch.tensor(4.0)
print(a,'|', a.dtype,'|', a.shape)

# set a vector
b = torch.tensor([1, 2, 3, 4])
print(b,'|',  b.dtype,'|',  b.shape)

# set a matrix
c = torch.tensor([[1, 2], [3, 4], [5, 6]])
print(c,'|',  c.dtype,'|',  c.shape)

tensor(4.) | torch.float32 | torch.Size([])
tensor([1, 2, 3, 4]) | torch.int64 | torch.Size([4])
tensor([[1, 2],
        [3, 4],
        [5, 6]]) | torch.int64 | torch.Size([3, 2])


## Create tensors

`requires_grad` is important to understand. It exists for efficiency: as your code gets more complex, computing gradients takes more time, so we need a way to tell PyTorch which gradients we need and which we don't. Setting `requires_grad=True` on a tensor means we want the derivative of the result with respect to that tensor.

In [10]:
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad=True) # track operations on w so that dy/dw can be computed
b = torch.tensor(5., requires_grad=True)

In [13]:
# Arithmetic operations
y = w * x + b
print('{}, {}'.format(y, y.requires_grad))

17.0, True


In [15]:
# compute derivatives --> for gradient descent
y.backward()

In [17]:
# display the gradients
print('dy/dx:', x.grad) # output None because x doesn't have "requires_grad"
print('dy/dw:', w.grad) # 3 because x = 3
print('dy/db:', b.grad) # 1 because the coefficient of b is 1

dy/dx: None
dy/dw: tensor(3.)
dy/db: tensor(1.)


## Questions

**Problems**
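Now let's try to answer some questions:

1. What if, in $y=w\times x+b$, `w`, `x` and `b` are all 1-dimensional arrays? What about 2-dimensional arrays? What will `w.grad` look like?

  * `backward()` without arguments only works on a scalar. For a 1D or 2D `y`, reduce it to a scalar first (e.g. `y.sum().backward()`) or pass a `gradient` argument with the same shape as `y`. `w.grad` then has the same shape as `w`. If `backward()` is never called, `w.grad` stays `None`.

2. What if we have the **chain rule**, i.e. $y = w\times x + b$ and $w = m\times p + q$? What happens if we want to calculate `p.grad`?

  * If `p` is a leaf tensor created with `requires_grad=True`, `p.grad` is filled in correctly after `backward()`. Gradients of non-leaf (intermediate) tensors such as `w` are not kept by default; call `retain_grad()` on the non-leaf tensor if you need them.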

In [40]:
# Problem 1 - tensor operations are element-wise
x = torch.tensor([1.0, 2.0])
w = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([1.0, 2.0], requires_grad=True)
y = w * x + b
print(y)
# backward() has not been called yet, so w.grad is still None
print('dy/dw:', w.grad)

tensor([2., 6.], grad_fn=<AddBackward0>)
dy/dw: None
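
The cell above prints `None` because `backward()` was never called, not because 1D tensors have no gradient. A minimal sketch of one way to get the gradient, assuming we are happy to reduce `y` to a scalar by summing its elements first:

```python
import torch

x = torch.tensor([1.0, 2.0])
w = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([1.0, 2.0], requires_grad=True)
y = w * x + b

# backward() without arguments needs a scalar, so sum the elements first
y.sum().backward()

print('dy/dw:', w.grad)  # equals x, element by element
print('dy/db:', b.grad)  # all ones
```

Because the operation is element-wise, each entry of `w.grad` is just the matching entry of `x`, and `w.grad` has the same shape as `w`.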


In [37]:
# Problem 2 - chain rule
x = torch.tensor(1.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)

c = w + b 
c.retain_grad() # c is a non-leaf tensor, so retain_grad() is needed to keep c.grad
y = c * w * x + b

# calculate gradient
y.backward()
print('dy/dw: ', w.grad)
print('dy/dc: ', c.grad)

dy/dw:  tensor(9.)
dy/dc:  tensor(2.)
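
Both numbers can be checked by hand. With $c = w + b$ and $y = c\,w\,x + b$, and the values $w=2$, $b=5$, $x=1$ (so $c=7$):

$$\frac{\partial y}{\partial c} = w\,x = 2$$

$$\frac{dy}{dw} = \frac{\partial y}{\partial c}\cdot\frac{\partial c}{\partial w} + c\,x = w\,x + (w+b)\,x = 2 + 7 = 9$$

`w.grad` is the total derivative: `w` reaches `y` both directly and through `c`, and autograd adds the two contributions.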


## Interoperability with NumPy

To convert a NumPy array to a tensor, use `torch.from_numpy(array)`.

To convert a PyTorch tensor to a NumPy array, use `y.numpy()`.

In [18]:
# NumPy's core is implemented in C, which makes array operations efficient
import numpy as np

In [23]:
x = np.array([[1, 2],[3, 4]])
y = torch.from_numpy(x)
print(y, y.dtype)
z = y.numpy()
print(z, z.dtype)

tensor([[1, 2],
        [3, 4]]) torch.int64
[[1 2]
 [3 4]] int64
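
One detail worth knowing: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory. The sketch below contrasts it with `torch.tensor`, which makes an independent copy; the variable names are only for illustration.

```python
import numpy as np
import torch

x = np.array([[1, 2], [3, 4]])
y = torch.from_numpy(x)   # shares memory with x
z = torch.tensor(x)       # makes an independent copy

x[0, 0] = 100             # modify the NumPy array in place

print(y[0, 0].item())  # 100: the change is visible through y
print(z[0, 0].item())  # 1: the copy is unaffected
```

Use `torch.tensor(array)` (or `torch.from_numpy(array).clone()`) when later changes to the array should not leak into the tensor.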


# Linear Regression with PyTorch

Linear regression means that we fit the data with a linear model: $y = wx+b$ (w: weights, b: bias). We know all the $(x,y)$ pairs, and we want to find the optimal coefficients $(w,b)$. Note that `w` can be a matrix and `b` can be a vector when `x` and `y` are vectors.

This is a really simple assumption and usually it is the first step of our modeling process.

Our first problem is to predict the yield of apples and oranges from a given temperature, rainfall and humidity.

## Set inputs and targets

In [154]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

In [155]:
# Targets (yield_apples, yield_oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')

We separate the input variables from the target variables because the program treats them differently. We also create them as NumPy arrays, because this is typically how you work with training data: read some CSV files as NumPy arrays, do some processing, and then convert them to PyTorch tensors for training.

In [156]:
# convert NumPy arrays to torch tensors
inputs_torch = torch.from_numpy(inputs)
targets_torch = torch.from_numpy(targets)

In [157]:
# start the weights and bias from random values
# since we have 2 targets and 3 inputs, our weight should be a 2x3 matrix
# our bias has one entry per target; its shape (1, 2) broadcasts over all rows
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(1, 2, requires_grad=True)
print(w)
print(b)

tensor([[ 1.3066,  0.9426, -1.5872],
        [-1.2522,  1.0184, -1.1143]], requires_grad=True)
tensor([[0.3716, 0.0434]], requires_grad=True)


When our weight is a matrix, then our linear regression can be written as:
$$y = x \cdot w^T + b$$

The weight matrix has shape (number of outputs, number of features), which is 2x3 here.

**Make sure that the dimension of your matrix is correct! Really important!**
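
A quick way to check the dimensions is to run the formula on tensors of the right shape and print the result's shape. This is a minimal sketch with random stand-in values instead of the real inputs:

```python
import torch

x = torch.randn(5, 3)   # 5 samples, 3 features (random stand-in for the inputs)
w = torch.randn(2, 3)   # 2 outputs, 3 features
b = torch.randn(1, 2)   # one bias per output

# (5, 3) @ (3, 2) -> (5, 2); b broadcasts over the 5 rows
y = x @ w.t() + b
print(y.shape)  # torch.Size([5, 2])
```

If the shapes do not line up, for example if `w` is not transposed, `@` raises a `RuntimeError` instead of silently giving a wrong answer.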

## Build our model

In this case it is a linear regression model, but in the future the model can be much more complex (e.g. a neural network).

In [148]:
# define the model
def model(x):
  return x @ w.t() + b # @ means matrix multiplication

In [149]:
# predictions of our model 
preds = model(inputs_torch)
print(preds) # really, really bad ... right now ...

tensor([[100.3761,  17.6190],
        [133.1410,  10.2199],
        [ 80.8693, -55.7864],
        [156.0216, 105.1184],
        [101.3740, -43.1640]])


In [150]:
print(targets_torch)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Calculate the loss function

The **mean squared error** between two tensors $t_1$ and $t_2$ with $N$ elements each can be defined as:

$$\mathrm{mse}(t_1, t_2)=\frac{1}{N}\sum_{i=1}^N\left(t_1^i-t_2^i\right)^2$$

In [141]:
# define the mean-square error
def mse(t1, t2):
  diff = t1 - t2
  # diff*diff: element-wise multiplication
  # diff.numel(): number of elements in diff
  return torch.sum(diff*diff)/diff.numel()
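
As a sanity check, the hand-written function can be compared with PyTorch's built-in `torch.nn.functional.mse_loss`, whose default reduction is also the mean over all elements. The two small tensors below are made up for the example.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

t1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
t2 = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

manual = mse(t1, t2)           # (0 + 4 + 9 + 0) / 4 = 3.25
builtin = F.mse_loss(t1, t2)   # default reduction='mean'
print(manual.item(), builtin.item())
```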

## Calculate the gradient --> Gradient Descent

In [None]:
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(1, 2, requires_grad=True)
threshold = 10 # stop once the loss drops below this value
lr = 1e-5 # learning rate
ind = 0 # epoch counter
preds = model(inputs_torch)
loss = mse(preds, targets_torch)
print(loss.item())
while loss > threshold:
    new_preds = model(inputs_torch)
    loss = mse(new_preds, targets_torch)
    loss.backward() # compute d(loss)/dw and d(loss)/db
    with torch.no_grad(): # the update itself must not be tracked by autograd
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_() # reset the gradients, otherwise they accumulate
        b.grad.zero_()
    ind += 1
    print('Epoch {} ... loss = {}'.format(ind, loss.item()))
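
The `while` loop above runs until the loss falls below the threshold, which can take a long time if the learning rate is poorly chosen. A minimal self-contained sketch of the same update with a fixed number of epochs is shown below. The cap `max_epochs` and the seed are assumptions added for this example, not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.],
                        [22., 37.],
                        [103., 119.]])

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(1, 2, requires_grad=True)

def model(x):
    return x @ w.t() + b

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

lr = 1e-5
max_epochs = 100  # hypothetical cap so the loop always terminates

initial_loss = mse(model(inputs), targets).item()
for epoch in range(max_epochs):
    loss = mse(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()
final_loss = mse(model(inputs), targets).item()

print('loss before: {:.2f}, after: {:.2f}'.format(initial_loss, final_loss))
```

A fixed epoch count trades the guarantee of reaching the threshold for a guarantee of finishing; the two stopping rules can also be combined.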

In [212]:
preds = model(inputs_torch)
print(preds)
print(targets_torch)

tensor([[ 58.2112,  69.5565],
        [ 81.6339, 100.3739],
        [118.3174, 134.8474],
        [ 26.1825,  31.0243],
        [ 98.1191, 122.4509]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


One thing to note: in PyTorch, `a = a - 1` is not the same as `a -= 1`.

* `a = a - 1` evaluates `a - 1` into a new tensor and then rebinds the name `a` to it, so `id(a)` changes.

* `a -= 1` subtracts in place: the existing tensor is modified and `a` still refers to the same object.

In [206]:
a = torch.tensor([1, 2])
print('original id = {}'.format(id(a)))
b = torch.tensor([2, 2])
a -= b
print('After a -= b, id = {}'.format(id(a)))

original id = 139940533485432
After a -= b, id = 139940533485432
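
To see both cases side by side, the sketch below keeps a second name, `original` (introduced only for this example), pointing at the starting tensor and checks object identity with `is` after each kind of update.

```python
import torch

a = torch.tensor([3, 4])
b = torch.tensor([2, 2])

original = a          # second name for the same tensor object

a -= b                # in-place: the existing tensor is modified
same_after_inplace = a is original

a = a - b             # out-of-place: a now names a newly created tensor
same_after_rebind = a is original

print(same_after_inplace, same_after_rebind)  # True False
print(original, a)    # original still holds the in-place result
```

This is why the training loop uses `w -= lr * w.grad`: the update has to modify the existing `w`, which is the tensor that has `requires_grad=True`, rather than rebind the name to a new tensor.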
