# PyTorch
> referenced by CS224N, EECS498

[PyTorch](https://pytorch.org/) is an open source machine learning framework. At its core, PyTorch provides a few key features:

- A multidimensional **Tensor** object, similar to [numpy](https://numpy.org/) but with GPU acceleration.
- An optimized **autograd** engine for automatically computing derivatives
- A clean, modular API for building and deploying **deep learning models**

You can find more information about PyTorch by following one of the [official tutorials](https://pytorch.org/tutorials/) or by [reading the documentation](https://pytorch.org/docs/stable/).



In [1]:
import torch

## Tensor

A `torch` **tensor** is
- a multidimensional grid of values
- all of the same type
- indexed by a tuple of nonnegative integers.

The number of dimensions is the **rank** of the tensor; the **shape** of a tensor is a tuple of integers giving the size of the tensor along each dimension. Accessing an element from a PyTorch tensor returns a PyTorch scalar; we can convert this to a Python scalar using the `.item()` method.
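
The terms above can be checked directly on a small tensor. This is a minimal sketch; the tensor `t` and the variable names are illustrative and not part of the original notebook.

```python
import torch

# A rank-2 tensor with 2 rows and 3 columns
t = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(t.dim())    # rank (number of dimensions)
print(t.shape)    # size along each dimension

# Indexing with a full tuple of integers returns a zero-dimensional tensor
# (a PyTorch scalar)...
elem = t[0, 1]
print(type(elem), elem)

# ...which .item() converts to a plain Python number
value = elem.item()
print(type(value), value)
```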

### Tensor constructors & Datatypes

In [2]:
# From a Python List
python_list = [[0, 1], [2, 3],[4, 5]]

# From a NumPy Array
import numpy as np
ndarray = np.array([[4, 5, 6], [7, 8, 9]])

print(f"Python List to Tensor : \n{torch.Tensor(python_list)}")
print(f"\nNumpy Array to Tensor : \n{torch.from_numpy(ndarray)}")

Python List to Tensor : 
tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

Numpy Array to Tensor : 
tensor([[4, 5, 6],
        [7, 8, 9]])


In [3]:
# Same as 'torch.zeros(3,2)'
shape = (3,2)

# Creates a tensor of all zeros
zeros_tensor = torch.zeros(shape)

# Creates a tensor of all ones
ones_tensor = torch.ones(shape)

# Creates a 3x3 identity matrix
eye_tensor = torch.eye(3)

# Creates a tensor of random values
# sampled from a uniform distribution between 0 and 1
rand_tensor = torch.rand(shape)
# sampled from a normal distribution
randn_tensor = torch.randn(shape)

# Create a tensor with values 0-9
x = torch.arange(10)

# Create a tensor with value 3.14
x = torch.full(shape, 3.14)

In [4]:
data = [[0, 1], [2, 3],[4, 5]]

# 'torch.Tensor' defaults to float32
tensor1 = torch.Tensor(data)
tensor2 = torch.tensor(data, dtype=torch.float32)
tensor3 = torch.tensor(data, dtype=torch.float)
tensor4 = torch.FloatTensor(data)

# Cast a tensor to another datatype using the '.to()' method
# '.float()' and '.long()' : cast to particular datatypes
tensor5 = tensor1.to(torch.float64)
tensor6 = tensor1.long()

print(f"tensor1.dtype : {tensor1.dtype}")
print(f"tensor2.dtype : {tensor2.dtype}")
print(f"tensor3.dtype : {tensor3.dtype}")
print(f"tensor4.dtype : {tensor4.dtype}")
print(f"tensor5.dtype : {tensor5.dtype}")
print(f"tensor6.dtype : {tensor6.dtype}")

tensor1.dtype : torch.float32
tensor2.dtype : torch.float32
tensor3.dtype : torch.float32
tensor4.dtype : torch.float32
tensor5.dtype : torch.float64
tensor6.dtype : torch.int64
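
One detail the cell above does not show is what happens when no `dtype` is given. The sketch below, with illustrative variable names, compares `torch.Tensor` (which always produces the default float type) with `torch.tensor` (which infers the dtype from the data).

```python
import torch

data = [[0, 1], [2, 3], [4, 5]]

# torch.Tensor builds a float32 tensor, so the integers are converted to floats
as_class = torch.Tensor(data)

# torch.tensor infers the dtype from the data: Python ints become int64
inferred_int = torch.tensor(data)

# Python floats become the default floating-point dtype (float32 unless changed)
inferred_float = torch.tensor([0.5, 1.5])

print(as_class.dtype, inferred_int.dtype, inferred_float.dtype)
```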


PyTorch provides several ways to create a tensor with the same datatype as another tensor:

- `torch.zeros_like()` : Create new tensors with the same shape and type as a given tensor
- `.new_zeros()` : Create tensors with the same type but possibly different shapes

In [5]:
x = torch.tensor([[0, 1], [2, 3],[4, 5]], dtype=torch.float64)

# Create new tensors with the same shape and type as a given tensor
zeros_like = torch.zeros_like(x)
ones_like = torch.ones_like(x)
rand_like = torch.rand_like(x)
randn_like = torch.randn_like(x)

# Create tensors the same type but possibly different shapes
new_zeros = x.new_zeros(4, 5)
new_ones = x.new_ones(4,5)

### Tensor Indexing
PyTorch provides many ways of indexing into tensors. Getting comfortable with these options makes it easy to read and modify different parts of a tensor.

#### Slice indexing

PyTorch tensors can be **sliced** using the syntax `start:stop` or `start:stop:step`. The `stop` index is always non-inclusive. Start and stop indices can be negative, in which case they count backward from the end of the tensor.

In [6]:
a = torch.tensor([0, 11, 22, 33, 44, 55, 66])
print(0, a)        # (0) Original tensor
print(1, a[2:5])   # (1) Elements from index 2 up to (not including) index 5
print(2, a[2:])    # (2) Elements from index 2 onward
print(3, a[:5])    # (3) Elements before index 5
print(4, a[:])     # (4) All elements
print(5, a[1:5:2]) # (5) Every second element between indices 1 and 5
print(6, a[:-1])   # (6) All but the last element
print(7, a[-1:])   # (7) Tensor containing only the last element
print(8, a[-4::2]) # (8) Every second element, starting from the fourth-last

0 tensor([ 0, 11, 22, 33, 44, 55, 66])
1 tensor([22, 33, 44])
2 tensor([22, 33, 44, 55, 66])
3 tensor([ 0, 11, 22, 33, 44])
4 tensor([ 0, 11, 22, 33, 44, 55, 66])
5 tensor([11, 33])
6 tensor([ 0, 11, 22, 33, 44, 55])
7 tensor([66])
8 tensor([33, 55])


In [7]:
mat = torch.tensor([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(f'Original tensor: \n{mat}')

# Get a single row
# mat[1, :] is the same as mat[1] and has rank 1; slicing with 1:2 keeps rank 2
print(f'\nmat[1, :] => {mat[1, :]} : {mat[1, :].shape}')
print(f'mat[1:2, :] => {mat[1:2, :]} : {mat[1:2, :].shape}')

# Get a single column
# An integer index drops the dimension; slicing with 1:2 keeps it
print(f'\nmat[:, 1] => {mat[:, 1]} : {mat[:, 1].shape}')
print(f'mat[:, 1:2] => \n{mat[:, 1:2]} : {mat[:, 1:2].shape}')

# Get the first two rows and the last three columns
print(f'\nmat[:2, -3:] : \n{mat[:2, -3:]}')

# Get every other row, and columns at index 1 and 2
print(f'\nmat[::2, 1:3] : \n{mat[::2, 1:3]}')

Original tensor: 
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])

mat[1, :] => tensor([5, 6, 7, 8]) : torch.Size([4])
mat[1:2, :] => tensor([[5, 6, 7, 8]]) : torch.Size([1, 4])

mat[:, 1] => tensor([ 2,  6, 10]) : torch.Size([3])
mat[:, 1:2] => 
tensor([[ 2],
        [ 6],
        [10]]) : torch.Size([3, 1])

mat[:2, -3:] : 
tensor([[2, 3, 4],
        [6, 7, 8]])

mat[::2, 1:3] : 
tensor([[ 2,  3],
        [10, 11]])


Slicing a tensor returns a **view** into the same data, so modifying it will also modify the original tensor. To avoid this, you can use the `clone()` method to make a copy of a tensor.

In [8]:
a = torch.tensor([[1, 2], [3, 4], [5, 6]])
b = a[:, 1]
c = a[:, 1].clone()
print(f"a : \n{a}")
print(f"b : {b}")
print(f"c : {c}")

a[0, 1] = 20  # a[0, 1] and b[0] point to the same element
b[1] = 30     # b[1] and a[1, 1] point to the same element
c[2] = 40     # c is a clone, so it has its own data
print('\nAfter mutating:')
print(f"a : \n{a}")
print(f"b : {b}")
print(f"c : {c}")

a : 
tensor([[1, 2],
        [3, 4],
        [5, 6]])
b : tensor([2, 4, 6])
c : tensor([2, 4, 6])

After mutating:
a : 
tensor([[ 1, 20],
        [ 3, 30],
        [ 5,  6]])
b : tensor([20, 30,  6])
c : tensor([ 2,  4, 40])


We can also use slicing to **modify** subtensors by writing assignment expressions where the left-hand side is a slice expression, and the right-hand side is a constant or a tensor of the correct shape:

In [9]:
a = torch.zeros(2, 4, dtype=torch.int64)
a[:, :2] = 1
a[:, 2:] = torch.tensor([[2, 3], [4, 5]])
print(a)

tensor([[1, 1, 2, 3],
        [1, 1, 4, 5]])


#### Integer tensor indexing

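When you index into a torch tensor using slicing, the resulting tensor view will always be a subarray of the original tensor. Indexing with a list or tensor of integers (an **index array**) is more flexible: it can select elements in any order and repeat them, for example to reorder the rows or columns of a tensor: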

In [10]:
mat = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print('Original tensor:')
print(mat)

# index arrays can be Python lists of integers
idx = [0, 0, 2, 1, 1]
print(f'\nidx => {idx}')
print(f'mat[idx] : \n{mat[idx]}')


# Index arrays can be int64 torch tensors
idx = torch.tensor([3, 2, 1, 0])
print(f'\nidx => {idx}')
print(f'mat[:, idx] : \n{mat[:, idx]}')

Original tensor:
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])

idx => [0, 0, 2, 1, 1]
mat[idx] : 
tensor([[ 1,  2,  3,  4],
        [ 1,  2,  3,  4],
        [ 9, 10, 11, 12],
        [ 5,  6,  7,  8],
        [ 5,  6,  7,  8]])

idx => tensor([3, 2, 1, 0])
mat[:, idx] : 
tensor([[ 4,  3,  2,  1],
        [ 8,  7,  6,  5],
        [12, 11, 10,  9]])


We can for example use this to get or set the diagonal of a tensor:

In [11]:
a = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('Original tensor:')
print(a)

idx = [0, 1, 2]
print(f"\nidx => {idx}")
print('Get the diagonal:')
print(a[idx, idx])

# Modify the diagonal
a[idx, idx] = torch.tensor([11, 22, 33])
print('\nAfter setting the diagonal:')
print(a)

Original tensor:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

idx => [0, 1, 2]
Get the diagonal:
tensor([1, 5, 9])

After setting the diagonal:
tensor([[11,  2,  3],
        [ 4, 22,  6],
        [ 7,  8, 33]])


#### One-hot Vector

A one-hot vector for an integer $n$ is a vector that has a one in its $n$th slot, and zeros in all other slots. One-hot vectors are commonly used to represent categorical variables in machine learning models. The cell below creates a matrix of **one-hot vectors** from a list of Python integers.

In [12]:
idx = [1, 4, 3, 2]
one_hot = torch.zeros((len(idx), max(idx)+1), dtype=torch.float32)
one_hot[range(len(idx)), idx] = 1
print(f"One-hot Vector : \n{one_hot}")

One-hot Vector : 
tensor([[0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 1.],
        [0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0.]])
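
As a cross-check, the same matrix can be built with the library helper `torch.nn.functional.one_hot`. This is a different technique from the index-array assignment used above, shown here only for comparison; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

idx = torch.tensor([1, 4, 3, 2])

# Manual construction, as above: write a 1 at (row i, column idx[i])
manual = torch.zeros(len(idx), int(idx.max()) + 1)
manual[torch.arange(len(idx)), idx] = 1

# Built-in helper; it returns an int64 tensor, so cast before comparing
builtin = F.one_hot(idx, num_classes=5).float()

print(torch.equal(manual, builtin))
```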


#### Masking

Boolean tensor indexing lets you pick out arbitrary elements of a tensor according to a boolean mask. Frequently this type of indexing is used to select or modify the elements of a tensor that satisfy some condition.

In [13]:
mat = torch.tensor([[1,2], [3, 4], [5, 6]])
print('Original tensor:')
print(mat)

# Find the elements of mat that are bigger than 3.
# The mask has the same shape as mat
mask = (mat > 3)
print('\nMask tensor:')
print(mask)

# We can use the mask to construct a rank-1 tensor containing the elements of mat
# that are selected by the mask
print('\nSelecting elements with the mask:')
print(mat[mask])

# Set all elements <= 3 to zero:
mat[mat <= 3] = 0
print('\nAfter modifying with a mask:')
print(mat)

Original tensor:
tensor([[1, 2],
        [3, 4],
        [5, 6]])

Mask tensor:
tensor([[False, False],
        [False,  True],
        [ True,  True]])

Selecting elements with the mask:
tensor([4, 5, 6])

After modifying with a mask:
tensor([[0, 0],
        [0, 4],
        [5, 6]])


## Reshaping operations

PyTorch provides many ways to manipulate the shapes of tensors. The simplest example is [`.view()`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html), which returns a new tensor with the same number of elements as its input, but with a different shape.

We can also use the `torch.reshape()` function for a similar purpose. There is a subtle difference between `reshape()` and `view()`: `view()` requires the data to be stored contiguously in memory. You can refer to [this StackOverflow answer](https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch) for more information. In simple terms, contiguous means that the way our data is laid out in memory is the same as the order in which we would read elements from it. Non-contiguous tensors arise because some methods, such as `transpose()` and `view()`, do not actually change how the data is stored in memory. They only change the meta information about our tensor, so that when we use it we see the elements in the order we expect.

`reshape()` calls `view()` internally if the data is stored contiguously; if not, it returns a copy. The difference isn't too important for basic tensors, but if you perform operations that make the underlying storage of the data non-contiguous (such as taking a transpose), you will have issues using `view()`. If you would like to match the way your tensor is stored in memory to how it is used, you can use the `contiguous()` method.
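
The notion of contiguity described above can be inspected with `is_contiguous()`, `stride()` and `data_ptr()`. The following is a minimal sketch with illustrative variable names, not part of the original notebook.

```python
import torch

x = torch.arange(6).view(2, 3)
print(x.is_contiguous(), x.stride())      # row-major layout

# transpose only swaps the strides; the underlying storage is untouched
xt = x.t()
print(xt.is_contiguous(), xt.stride())

# contiguous() copies the data into a new row-major layout
xc = xt.contiguous()
print(xc.is_contiguous(), xc.stride())

# The transposed view still shares storage with x; the contiguous copy does not
print(xt.data_ptr() == x.data_ptr(), xc.data_ptr() == x.data_ptr())
```

The stride tells how many elements to skip in storage to move one step along each dimension, which is the "meta information" that `transpose()` changes.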

In [14]:
x = torch.tensor([[1, 2], [3, 4], [5, 6]])

# Change the shape of x to be 2x3
x_view = x.view(2, 3)
x_reshaped = torch.reshape(x, (2, 3))

# Change the shape of x to be 3x*
x_view = x_view.view(3, -1)
x_reshaped = torch.reshape(x, (3, -1))

As a convenience, calls to `.view()` may include a single -1 argument; this puts enough elements on that dimension so that the output has the same number of elements as the input. This makes it easy to write some reshape operations in a way that is agnostic to the shape of the tensor:

In [15]:
def flatten(x):
    return x.view(-1)

def make_row_vec(x):
    return x.view(1, -1)

In [16]:
x = torch.tensor([[1, 2], [3, 4], [5, 6]])
print(f'Original tensor : {x.shape}')
print(x)

x_flat = flatten(x)
print(f'\nx_flat : {x_flat.shape}')
print(x_flat)

x_row = make_row_vec(x)
print(f'\nx_row : {x_row.shape}')
print(x_row)

Original tensor : torch.Size([3, 2])
tensor([[1, 2],
        [3, 4],
        [5, 6]])

x_flat : torch.Size([6])
tensor([1, 2, 3, 4, 5, 6])

x_row : torch.Size([1, 6])
tensor([[1, 2, 3, 4, 5, 6]])


As its name implies, a tensor returned by `.view()` shares the same data as the input, so changes to one will affect the other and vice-versa:

In [17]:
x = torch.tensor([[1, 2], [3, 4], [5, 6]])
x_flat = x.view(-1)  # Create a view that shares data with x
x[0, 0] = 10   # x[0, 0] and x_flat[0] point to the same data
x_flat[1] = 20 # x_flat[1] and x[0, 1] point to the same data

print('x after modifying:')
print(x)
print('\nx_flat after modifying:')
print(x_flat)

x after modifying:
tensor([[10, 20],
        [ 3,  4],
        [ 5,  6]])

x_flat after modifying:
tensor([10, 20,  3,  4,  5,  6])


In [18]:
# Add a new dimension of size 1 at the 1st dimension
x_unsqueezed = x.unsqueeze(1)
print(f"x_unsqueezed.shape : {x_unsqueezed.shape}")

# Squeeze the dimensions of x by getting rid of all the dimensions with 1 element
x_squeezed = x_unsqueezed.squeeze()
print(f"x_squeezed.shape : {x_squeezed.shape}")

x_unsqueezed.shape : torch.Size([3, 1, 2])
x_squeezed.shape : torch.Size([3, 2])


### Transposing a Matrix

Another common reshape operation you might want to perform is transposing a matrix. You might be surprised if you try to transpose a matrix with `.view()`: the `view()` function takes elements in row-major order, so you cannot transpose matrices with `.view()`. Use `.t()` (or `torch.t()`) instead.

In [19]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print('Original matrix:')
print(x)
print('\nTransposing with view DOES NOT WORK!')
print(x.view(3, 2))
print('\nTransposed matrix:')

# Same as torch.t(x)
print(x.t())

Original matrix:
tensor([[1, 2, 3],
        [4, 5, 6]])

Transposing with view DOES NOT WORK!
tensor([[1, 2],
        [3, 4],
        [5, 6]])

Transposed matrix:
tensor([[1, 4],
        [2, 5],
        [3, 6]])


In [20]:
# For tensors with more than two dimensions, use transpose() to swap two axes
# or permute() to reorder all axes at once.

# Create a tensor of shape (2, 3, 4)
x = torch.tensor([
     [[1,  2,  3,  4],
      [5,  6,  7,  8],
      [9, 10, 11, 12]],
     [[13, 14, 15, 16],
      [17, 18, 19, 20],
      [21, 22, 23, 24]]])
print(f'Original tensor : {x.shape}')
print(x)

# Swap axes 1 and 2; shape is (2, 4, 3)
x1 = x.transpose(1, 2)
print(f'\nSwap axes 1 and 2 : {x1.shape}')
print(x1)

# Permute axes; the argument (1, 2, 0) means:
# - Make the old dimension 1 appear at dimension 0;
# - Make the old dimension 2 appear at dimension 1;
# - Make the old dimension 0 appear at dimension 2
# This results in a tensor of shape (3, 4, 2)
x2 = x.permute(1, 2, 0)
print(f'\nPermute axes : {x2.shape}')
print(x2)

Original tensor : torch.Size([2, 3, 4])
tensor([[[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12]],

        [[13, 14, 15, 16],
         [17, 18, 19, 20],
         [21, 22, 23, 24]]])

Swap axes 1 and 2 : torch.Size([2, 4, 3])
tensor([[[ 1,  5,  9],
         [ 2,  6, 10],
         [ 3,  7, 11],
         [ 4,  8, 12]],

        [[13, 17, 21],
         [14, 18, 22],
         [15, 19, 23],
         [16, 20, 24]]])

Permute axes : torch.Size([3, 4, 2])
tensor([[[ 1, 13],
         [ 2, 14],
         [ 3, 15],
         [ 4, 16]],

        [[ 5, 17],
         [ 6, 18],
         [ 7, 19],
         [ 8, 20]],

        [[ 9, 21],
         [10, 22],
         [11, 23],
         [12, 24]]])


### Contiguous tensors

Some combinations of reshaping operations will fail with cryptic errors. The exact reasons for this have to do with the way that tensors and views of tensors are implemented, and are beyond the scope of this assignment. However if you're curious, [this blog post by Edward Yang](http://blog.ezyang.com/2019/05/pytorch-internals/) gives a clear explanation of the problem.

What you need to know is that you can typically overcome these sorts of errors either by calling [`.contiguous()`](https://pytorch.org/docs/stable/generated/torch.Tensor.contiguous.html) before `.view()`, or by using [`.reshape()`](https://pytorch.org/docs/stable/generated/torch.reshape.html) instead of `.view()`.

In [21]:
x = torch.randn(2, 3, 4)

try:
  # This sequence of reshape operations will crash
  x1 = x.transpose(1, 2).view(8, 3)
except RuntimeError as e:
  print(type(e), e)

# We can solve the problem using either .contiguous() or .reshape()
x1 = x.transpose(1, 2).contiguous().view(8, 3)
x2 = x.transpose(1, 2).reshape(8, 3)
print('x1 shape: ', x1.shape)
print('x2 shape: ', x2.shape)

<class 'RuntimeError'> view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
x1 shape:  torch.Size([8, 3])
x2 shape:  torch.Size([8, 3])


### Exercise
Given the 1-dimensional input tensor `x` containing the numbers 0 through 23 in order, produce the following output tensor `y` of shape `(3, 8)` using only reshape operations on `x`:


```
y = tensor([[ 0,  1,  2,  3, 12, 13, 14, 15],
            [ 4,  5,  6,  7, 16, 17, 18, 19],
            [ 8,  9, 10, 11, 20, 21, 22, 23]])
```

In [22]:
x = torch.arange(24)
print('Here is x:')
print(x)

y = x.view(-1, 3, 4)
print('\nHere is y:')
print(y)

y = y.permute(1, 0, 2)
print('\nHere is y:')
print(y)

y = y.reshape(3, 8)
print('\nHere is y:')
print(y)

Here is x:
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23])

Here is y:
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

Here is y:
tensor([[[ 0,  1,  2,  3],
         [12, 13, 14, 15]],

        [[ 4,  5,  6,  7],
         [16, 17, 18, 19]],

        [[ 8,  9, 10, 11],
         [20, 21, 22, 23]]])

Here is y:
tensor([[ 0,  1,  2,  3, 12, 13, 14, 15],
        [ 4,  5,  6,  7, 16, 17, 18, 19],
        [ 8,  9, 10, 11, 20, 21, 22, 23]])


## Tensor operations

### Elementwise operations
Basic mathematical functions operate elementwise on tensors, and are available as operator overloads, as functions in the `torch` module, and as instance methods on torch objects; all produce the same results:

In [23]:
x = torch.tensor([[1, 2, 3, 4]], dtype=torch.float32)
y = torch.tensor([[5, 6, 7, 8]], dtype=torch.float32)


# Same as `torch.add(x, y)` and `x.add(y)`
element_sum = x + y

# Same as `torch.sub(x, y)` and `x.sub(y)`
element_diff = x - y

# Same as `torch.mul(x, y)` and `x.mul(y)`
element_mul = x * y

# Same as `torch.div(x, y)` and `x.div(y)`
element_div = x / y

# Same as `torch.pow(x, y)` and `x.pow(y)`
element_pow = x ** y

Torch also provides many standard mathematical functions; these are available both as functions in the `torch` module and as instance methods on tensors.

You can find a full list of all available mathematical functions [in the documentation](https://pytorch.org/docs/stable/torch.html#pointwise-ops); many functions in the `torch` module have corresponding instance methods [on tensor objects](https://pytorch.org/docs/stable/tensors.html).

In [24]:
x = torch.tensor([[1, 2, 3, 4]], dtype=torch.float32)

# Same as `torch.sqrt(x)`
print(x.sqrt())

# Same as `torch.sin(x)`
print(x.sin())

# Same as `torch.cos(x)`
print(x.cos())

tensor([[1.0000, 1.4142, 1.7321, 2.0000]])
tensor([[ 0.8415,  0.9093,  0.1411, -0.7568]])
tensor([[ 0.5403, -0.4161, -0.9900, -0.6536]])


### Reduction operations

We may sometimes want to perform operations that aggregate over part or all of a tensor, such as a summation; these are called reduction operations.

Like the elementwise operations above, most reduction operations are available both as functions in the torch module and as instance methods on tensor objects.

In [25]:
x = torch.tensor([[6, 1, 5],
                  [3, 4, 2]], dtype=torch.float32)
print(f'Original tensor: {x.shape}')
print(x)

# Same as `torch.sum(x)`
print('\nSum over entire tensor:')
print(x.sum())

# Same as `torch.sum(x, dim=0)`
print(f'\nx.sum(dim=0) : \n{x.sum(dim=0)}')

# Same as `torch.sum(x, dim=1)`
print(f'\nx.sum(dim=1) : \n{x.sum(dim=1)}')

Original tensor: torch.Size([2, 3])
tensor([[6., 1., 5.],
        [3., 4., 2.]])

Sum over entire tensor:
tensor(21.)

x.sum(dim=0) : 
tensor([9., 5., 7.])

x.sum(dim=1) : 
tensor([12.,  9.])


In [26]:
print('Original tensor:')
print(x)

# Finding the overall minimum only returns a single value
print('\nOverall minimum: ', x.min())

# Compute the minimum along each column or row
# row_min_vals, row_min_idxs = x.min(dim=1)
col_min_vals, col_min_idxs = x.min(dim=0)
row_argmin_idxs = x.argmin(dim=1)

print('\nMinimum along each column:')
print(f'values: {col_min_vals} \tidxs: {col_min_idxs}')
print(f'argmin_idxs: {row_argmin_idxs}')

Original tensor:
tensor([[6., 1., 5.],
        [3., 4., 2.]])

Overall minimum:  tensor(1.)

Minimum along each column:
values: tensor([3., 1., 2.]) 	idxs: tensor([1, 0, 1])
argmin_idxs: tensor([1, 2])


In [27]:
print('Original tensor:')
print(x)

print("\nOverall Mean: {}".format(x.mean()))
print("Mean in the 0th dimension: {}".format(x.mean(dim=0)))
print("Mean in the 1st dimension: {}".format(x.mean(dim=1)))

Original tensor:
tensor([[6., 1., 5.],
        [3., 4., 2.]])

Overall Mean: 3.5
Mean in the 0th dimension: tensor([4.5000, 2.5000, 3.5000])
Mean in the 1st dimension: tensor([4., 3.])


The `dim` argument in reduction operations is a common source of confusion.

The easiest way to remember it is to think about the shapes of the tensors involved. After reducing with `dim=d`, the dimension at index `d` of the input is **eliminated** from the shape of the output tensor.

In other words, reduction operations *reduce* the rank of tensors. If you pass `keepdim=True` to a reduction operation, the specified dimension will not be removed; the output tensor will instead have size 1 in that dimension.

In [28]:
# Create a tensor of shape (128, 10, 3, 64, 64)
x = torch.randn(128, 10, 3, 64, 64)
print(x.shape)

# Take the mean over dimension 1; shape is now (128, 3, 64, 64)
x = x.mean(dim=1)
print(x.shape)

# Take the sum over dimension 2; shape is now (128, 3, 64)
x = x.sum(dim=2)
print(x.shape)

# Take the mean over dimension 1, but keep the dimension from being eliminated
# by passing keepdim=True; shape is now (128, 1, 64)
x = x.mean(dim=1, keepdim=True)
print(x.shape)

torch.Size([128, 10, 3, 64, 64])
torch.Size([128, 3, 64, 64])
torch.Size([128, 3, 64])
torch.Size([128, 1, 64])
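
One common reason to pass `keepdim=True` is that the result then lines up with the input for elementwise arithmetic. The sketch below normalizes each row of a small matrix so that it sums to 1; the example and variable names are illustrative and rely on the broadcasting behavior described later in this document.

```python
import torch

x = torch.tensor([[6., 1., 5.],
                  [3., 4., 2.]])

# Without keepdim the row sums have shape (2,), which does not line up
# with the rows of a (2, 3) tensor
row_sums_flat = x.sum(dim=1)
print(row_sums_flat.shape)

# With keepdim=True the row sums have shape (2, 1) and stretch across columns
row_sums = x.sum(dim=1, keepdim=True)
normalized = x / row_sums
print(row_sums.shape)
print(normalized.sum(dim=1))  # each row now sums to 1 (up to rounding)
```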


### Matrix operations

PyTorch provides a number of linear algebra functions that compute different types of vector and matrix products. You can find a full list of the available linear algebra operators [in the documentation](https://pytorch.org/docs/stable/torch.html#blas-and-lapack-operations).
All of these functions are also available as Tensor instance methods, e.g. [`Tensor.dot`](https://pytorch.org/docs/stable/generated/torch.Tensor.dot.html) instead of `torch.dot`.

In [29]:
# Shape of vectors : (2,)
vec1 = torch.tensor([9, 10], dtype=torch.float32)
vec2 = torch.tensor([11, 12], dtype=torch.float32)

mat1 = torch.tensor([[1,2],[3,4]], dtype=torch.float32)
mat2 = torch.tensor([[5,6],[7,8]], dtype=torch.float32)

# Inner product of vectors
# `dot` only works for vector-vector products.
# Same as `torch.dot(vec1, vec2)`
print(vec1.dot(vec2))

# Matrix-matrix products:
print('\nMatrix-Matrix product :' )
# Same as `torch.mm(mat1, mat2)`
print(mat1.mm(mat2))

# Matrix-vector multiply
print('\nMatrix-vector product (rank 1 output) :')
# Same as `torch.mv(mat1, vec1)`
print(mat1.mv(vec1))

# We can reshape the vector to have rank 2 and use torch.mm to perform
# matrix-vector products, but the result will have rank 2
print('\nMatrix-vector product (rank 2 output) :')
# Same as `torch.mm(mat1, vec1.view(2, 1))`
print(mat1.mm(vec1.view(2, 1)))

tensor(219.)

Matrix-Matrix product :
tensor([[19., 22.],
        [43., 50.]])

Matrix-vector product (rank 1 output) :
tensor([29., 67.])

Matrix-vector product (rank 2 output) :
tensor([[29.],
        [67.]])


#### Batched matrix multiply

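PyTorch provides two main functions for batched matrix multiplication: `torch.bmm` and `torch.matmul`.

`torch.bmm` is designed specifically for batched matrix multiplication. If the first input tensor has shape `(B, N, M)` and the second has shape `(B, M, P)`, it computes the matrix product for each batch element and returns a tensor of shape `(B, N, P)`.

`torch.matmul` is more flexible and supports matrix multiplication over multiple dimensions as well as broadcasting. Its behavior depends on the number of dimensions of the input tensors; see [the documentation](https://pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul) for details.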
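
To make the dependence on input dimensions concrete, the sketch below calls `torch.matmul` with inputs of different ranks and prints the resulting shapes. The tensor names and sizes are illustrative.

```python
import torch

vec = torch.randn(5)
mat = torch.randn(3, 5)
batch = torch.randn(2, 3, 5)
other = torch.randn(5, 4)

# 1-D @ 1-D: dot product, zero-dimensional result
dot_shape = torch.matmul(vec, vec).shape
# 2-D @ 1-D: matrix-vector product
mv_shape = torch.matmul(mat, vec).shape
# 3-D @ 2-D: the 2-D matrix is broadcast across the batch dimension
batched_shape = torch.matmul(batch, other).shape

print(dot_shape, mv_shape, batched_shape)
```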



In [30]:
# Perform batched matrix multiplication between the tensor x and y
# x : (B, N, M)
# y : (B, M, P)
# z : (B, N, P)
B, N, M, P = 2, 3, 5, 4
x = torch.randn(B, N, M) # 2x3x5
y = torch.randn(B, M, P) # 2x5x4

# We want a tensor z of shape (2, 3, 4)

In [31]:
loop_z = torch.stack([x[i].mm(y[i]) for i in range(B)])
bmm_z = x.bmm(y)

# Same as `torch.matmul(x, y)`
# Same as `x @ y`
matmul_z = x.matmul(y)

In [32]:
print(f"loop_z.shape : {loop_z.shape}")
print(f"bmm_z.shape : {bmm_z.shape}")
print(f"matmul_z.shape : {matmul_z.shape}")

loop_z.shape : torch.Size([2, 3, 4])
bmm_z.shape : torch.Size([2, 3, 4])
matmul_z.shape : torch.Size([2, 3, 4])


## Broadcasting

Broadcasting is a powerful mechanism that allows PyTorch to work with tensors of different shapes when performing arithmetic operations.

For example, suppose that we want to add a constant vector `v` to each row of a tensor `x`. PyTorch broadcasting allows us to perform this computation without actually creating multiple copies of `v`. For the precise rules, try reading the explanation in the [documentation](https://pytorch.org/docs/stable/notes/broadcasting.html).

Broadcasting usually happens implicitly inside many PyTorch operators. However, we can also broadcast explicitly using the function [`torch.broadcast_tensors`](https://pytorch.org/docs/stable/generated/torch.broadcast_tensors.html#torch.broadcast_tensors):

In [33]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = torch.tensor([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = torch.tensor([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)

xx, vv = torch.broadcast_tensors(x, v)
print(f'\nHere is xx (after broadcasting): \n{xx}')
print(f'\nHere is vv (after broadcasting): \n{vv}')

tensor([[ 2,  2,  4],
        [ 5,  5,  7],
        [ 8,  8, 10],
        [11, 11, 13]])

Here is xx (after broadcasting): 
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])

Here is vv (after broadcasting): 
tensor([[1, 0, 1],
        [1, 0, 1],
        [1, 0, 1],
        [1, 0, 1]])


Broadcasting can let us easily implement many different operations. For example we can compute an outer product of vectors:

In [34]:
# Compute outer product of vectors
v = torch.tensor([1, 2, 3])  # v has shape (3,)
w = torch.tensor([4, 5])     # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
print(v.view(3, 1) * w)

tensor([[ 4,  5],
        [ 8, 10],
        [12, 15]])
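
Whether two shapes are compatible can be checked without allocating any data by using `torch.broadcast_shapes` (available in recent PyTorch releases). This is a minimal sketch; the shapes are illustrative and mirror the examples above.

```python
import torch

# Shapes are compared from the last dimension backwards; each pair of sizes
# must be equal, or one of them must be 1 (or missing)
s1 = torch.broadcast_shapes((4, 3), (3,))       # matrix + row vector
s2 = torch.broadcast_shapes((3, 1), (2,))       # outer-product pattern
s3 = torch.broadcast_shapes((5, 1, 4), (3, 1))  # size-1 dims are stretched
print(s1, s2, s3)

# Incompatible shapes raise an error instead of silently misaligning:
# (4, 3) and (4,) disagree in the last dimension (3 vs 4)
try:
    torch.ones(4, 3) + torch.ones(4)
    failed = False
except RuntimeError as e:
    failed = True
    print('not broadcastable:', type(e).__name__)
```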


## GPU

One of the most important features of PyTorch is that it can use graphics processing units (GPUs) to accelerate its tensor operations. Tensors can be moved onto any device using the `.to()` method.

We can easily check whether PyTorch is configured to use GPUs:

In [35]:
import torch

if torch.cuda.is_available():
  print('PyTorch can use GPUs!')
else:
  print('PyTorch cannot use GPUs.')

PyTorch can use GPUs!


All PyTorch tensors also have a `device` attribute that specifies the device where the tensor is stored: either CPU, or CUDA (for NVIDIA GPUs). A tensor on a CUDA device will automatically use that device to accelerate all of its operations.

Just as with datatypes, we can use the [`.to()`](https://pytorch.org/docs/1.1.0/tensors.html#torch.Tensor.to) method to change the device of a tensor. We can also use the convenience methods `.cuda()` and `.cpu()` to move tensors between CPU and GPU.

In [36]:
# Construct a tensor on the CPU
cpu_x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

gpu_x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float64, device='cuda')

# Move it to the GPU
# Same as `cpu_x.to('cuda')`
x1 = cpu_x.cuda()
print('x1 device:', x1.device)

# Move it to the CPU
# Same as `gpu_x.to('cpu')`
x2 = gpu_x.cpu()
print('x2 device:', x2.device)


# Calling cpu_x.to(gpu_x) where gpu_x is a tensor will return a copy of cpu_x with the same
# device and dtype as gpu_x
x3 = cpu_x.to(gpu_x)
print('x3 device / dtype:', x3.device, x3.dtype)

x1 device: cuda:0
x2 device: cpu
x3 device / dtype: cuda:0 torch.float64
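
The cell above assumes that a CUDA device is present. A common way to write code that runs on either kind of machine is to choose the device once and pass it around. The sketch below shows that pattern; the variable names are illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create one tensor directly on the chosen device and move another onto it
a = torch.ones(2, 3, device=device)
b = torch.arange(6, dtype=torch.float32).view(2, 3).to(device)

# Both operands live on the same device, so the operation is valid
c = a + b
print(c.device)

# Bring the result back to the CPU, e.g. before converting to NumPy
c_cpu = c.cpu()
print(c_cpu.device)
```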


## Tensor Properties

In [37]:
x = torch.tensor([[1,2,3], [4,5,6]])

print(f"Here is x : \n{x}")

# Get the number of elements in tensor.
print(f"\nNumber of elements : {x.numel()}")

# Transform to list
print(f"to_list : {x.tolist()}")

# Stack 2 copies of x on top of each other
xx = x.repeat((2, 1))
print(f"\nx.repeat : \n{xx}")

print('\nx is a tensor:', torch.is_tensor(x))
print('x is filled with 0: ', (x == 0).all().item())
print('rank of x: ', x.dim())
print('x[0, 0].item(): ', x[0, 0].item())

Here is x : 
tensor([[1, 2, 3],
        [4, 5, 6]])

Number of elements : 6
to_list : [[1, 2, 3], [4, 5, 6]]

x.repeat : 
tensor([[1, 2, 3],
        [4, 5, 6],
        [1, 2, 3],
        [4, 5, 6]])

x is a tensor: True
x is filled with 0:  False
rank of x:  2
x[0, 0].item():  1


In [38]:
# Concatenate in dimension 0 and 1
x_cat0 = torch.cat([x, x, x], dim=0)
x_cat1 = torch.cat([x, x, x], dim=1)

print("Initial shape: {}".format(x.shape))
print("Shape after concatenation in dimension 0: {}".format(x_cat0.shape))
print("Shape after concatenation in dimension 1: {}".format(x_cat1.shape))

Initial shape: torch.Size([2, 3])
Shape after concatenation in dimension 0: torch.Size([6, 3])
Shape after concatenation in dimension 1: torch.Size([2, 9])


## Autograd

PyTorch's **autograd** engine records the operations applied to tensors created with `requires_grad=True` and computes gradients when `backward()` is called; the result is stored in the tensor's `.grad` attribute. In the cells below, calling `backward()` a second time updates `x.grad` to the sum of the gradients calculated so far. When we run backprop in a neural network, we sum up all the gradients for a particular neuron before making an update, and that is exactly what happens here. It is also the reason why we need to run `zero_grad()` in every training iteration (more on this later). Otherwise our gradients would keep building up from one training iteration to the next, which would cause our updates to be wrong.

In [39]:
# Create an example tensor
# requires_grad parameter tells PyTorch to store gradients
x = torch.tensor([2.], requires_grad=True)

# Print the gradient if it is calculated
# Currently None because backward() has not been called yet
print(f"x.grad : {x.grad}")

x.grad : None


In [40]:
# Calculating the gradient of y with respect to x
y = x * x * 3 # 3x^2
y.backward()
print(f"x.grad : {x.grad}")

x.grad : tensor([12.])


In [41]:
z = x * x * 3 # 3x^2
z.backward()
print(f"x.grad : {x.grad}")

x.grad : tensor([24.])
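
The accumulation shown above can be undone by zeroing the stored gradient before the next backward pass. The following minimal sketch does this by calling `zero_()` on `x.grad` directly; the variable names are illustrative.

```python
import torch

x = torch.tensor([2.], requires_grad=True)

# First backward pass: d(3x^2)/dx = 6x = 12
(3 * x * x).backward()
first = x.grad.clone()

# Second backward pass without resetting: gradients accumulate to 24
(3 * x * x).backward()
accumulated = x.grad.clone()

# Reset the stored gradient in place, then recompute: back to 12
x.grad.zero_()
(3 * x * x).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```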


## Neural Network Module

In [42]:
import torch.nn as nn

### Linear Layer

We can use `nn.Linear(H_in, H_out)` to create a linear layer. This takes a tensor of shape `(N, *, H_in)` and outputs a tensor of shape `(N, *, H_out)`. The `*` denotes that there can be an arbitrary number of dimensions in between. The linear layer performs the operation `Ax+b`, where `A` and `b` are initialized randomly. If we don't want the linear layer to learn the bias parameters, we can initialize our layer with `bias=False`.

In [43]:
# Create the inputs, (N, *, H_in)
input = torch.ones(2,3,4)

# Make a linear layer transforming (N, *, H_in) dimensional inputs to (N, *, H_out)
linear = nn.Linear(4, 2) # (H_in, H_out)
linear_output = linear(input)
linear_output

tensor([[[-0.3587, -0.4704],
         [-0.3587, -0.4704],
         [-0.3587, -0.4704]],

        [[-0.3587, -0.4704],
         [-0.3587, -0.4704],
         [-0.3587, -0.4704]]], grad_fn=<ViewBackward0>)

In [44]:
list(linear.parameters()) # Ax + b

[Parameter containing:
 tensor([[ 0.1739, -0.2881, -0.4729,  0.2340],
         [-0.1213, -0.1068,  0.1627, -0.4071]], requires_grad=True),
 Parameter containing:
 tensor([-0.0056,  0.0021], requires_grad=True)]
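
To connect the printed parameters to the `Ax+b` description, here is a small sketch that recomputes the layer output by hand. It assumes a fresh layer with a fixed seed (so its values differ from the ones printed above); `layer`, `inputs`, and `manual` are illustrative names.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random initialization is repeatable

layer = nn.Linear(4, 2)        # H_in = 4, H_out = 2
inputs = torch.ones(2, 3, 4)   # (N, *, H_in)

# nn.Linear stores the weight with shape (H_out, H_in) and the bias with shape (H_out,),
# so applying it to the last dimension is a matrix product with the transposed weight
manual = inputs @ layer.weight.T + layer.bias

matches = torch.allclose(layer(inputs), manual, atol=1e-6)
print(layer.weight.shape, layer.bias.shape, matches)
```

Only the last dimension changes size, which is why the leading `(N, *)` dimensions pass through untouched.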

### Activation Function Layer
We can also use the `nn` module to apply activation functions to our tensors. Activation functions add non-linearity to our network. Some examples of activation functions are `nn.ReLU()`, `nn.Sigmoid()` and `nn.LeakyReLU()`. Activation functions operate on each element separately, so the output tensor has the same shape as the input tensor.

In [45]:
sigmoid = nn.Sigmoid()
output = sigmoid(linear_output)
output

tensor([[[0.4113, 0.3845],
         [0.4113, 0.3845],
         [0.4113, 0.3845]],

        [[0.4113, 0.3845],
         [0.4113, 0.3845],
         [0.4113, 0.3845]]], grad_fn=<SigmoidBackward0>)
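
As a quick check of the element-wise behaviour, the sketch below compares `nn.Sigmoid()` with the sigmoid formula written out by hand on a small made-up tensor `t` (not a tensor from the cells above).

```python
import torch
import torch.nn as nn

t = torch.tensor([[-2.0, 0.0], [1.0, 3.0]])

sigmoid = nn.Sigmoid()
out = sigmoid(t)

# Sigmoid written out by hand: 1 / (1 + exp(-t)), applied to every element
manual = 1 / (1 + torch.exp(-t))

same_shape = out.shape == t.shape
same_values = torch.allclose(out, manual)
print(same_shape, same_values)
```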

### Putting the Layers Together
So far we have seen that we can create layers and pass the output of one as the input of the next. Instead of creating intermediate tensors and passing them around, we can use `nn.Sequential`, which does exactly that.

In [46]:
block = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Linear(3, 5),
    nn.Sigmoid()
)
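
To make "does exactly that" concrete, the following sketch rebuilds the same kind of block and checks that calling it gives the same result as applying its layers one after another. The random input `sample` and the fixed seed are assumptions made for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

block = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Linear(3, 5),
    nn.Sigmoid()
)
sample = torch.randn(10, 5)

# Passing the input through the container...
out_sequential = block(sample)

# ...is the same as calling each layer in order by hand
out_manual = sample
for layer in block:
    out_manual = layer(out_manual)

same = torch.allclose(out_sequential, out_manual)
print(out_sequential.shape, same)
```

Because the last layer is a sigmoid, every output value lies between 0 and 1.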

### Optimization
We have shown how gradients are calculated with the `backward()` function. Having the gradients isn't enough for our models to learn; we also need to know how to update the parameters of our models. This is where optimizers come in. The `torch.optim` module contains several optimizers that we can use. Some popular examples are `optim.SGD` and `optim.Adam`. When initializing an optimizer, we pass our model parameters, which can be accessed with `model.parameters()`, telling the optimizer which values it will be optimizing. Optimizers also have a learning rate (`lr`) parameter, which determines how big an update is made in every step. Different optimizers have additional hyperparameters of their own.
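
To illustrate what a single optimizer step does, here is a minimal sketch using plain `optim.SGD` (no momentum or weight decay), where the update is simply `p - lr * grad`. The small `model`, the random data, and the use of `nn.MSELoss` are assumptions made for this example and are not part of the walkthrough that follows.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Linear(3, 1)
lr = 0.1
sgd = optim.SGD(model.parameters(), lr=lr)

# One forward/backward pass on made-up data
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
loss = nn.MSELoss()(model(inputs), targets)
loss.backward()

# What plain SGD should produce for every parameter: p - lr * grad
expected = [p.detach().clone() - lr * p.grad for p in model.parameters()]

# Let the optimizer apply the update
sgd.step()

step_matches = all(
    torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)
)
print(step_matches)
```

Optimizers such as `optim.Adam` follow the same `zero_grad()` / `backward()` / `step()` pattern but use a more involved update rule.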

In [47]:
# Create the y data
y = torch.ones(10, 5)

# Add some noise to our goal y to generate our x
# We want our model to predict our original data despite the noise
x = y + torch.randn_like(y)
x

tensor([[ 2.2196,  1.7745,  3.6335, -0.1157,  2.3483],
        [-0.1487, -0.3550,  0.3920,  2.6643,  1.2875],
        [ 1.3407,  3.3309,  0.3421,  0.2369,  0.7445],
        [ 1.3046,  1.1808, -0.7258,  0.1861,  0.5696],
        [ 1.3840,  1.7666,  1.2018,  0.7487,  0.7294],
        [ 0.0048,  2.3275,  0.3536,  3.2583,  0.5383],
        [-0.1754,  0.7848, -0.7914,  0.4923,  0.4555],
        [ 0.9711,  1.2120,  0.0583, -0.7287,  0.4391],
        [ 0.7514,  1.6861,  0.8437,  3.2241,  0.7131],
        [ 0.1686,  0.6933,  2.4573,  0.9088,  0.8204]])

In [48]:
import torch.optim as optim

# Define the optimizer
adam = optim.Adam(block.parameters(), lr=1e-1)

# Define loss using a predefined loss function
loss_function = nn.BCELoss()

# Calculate how our model is doing now
y_pred = block(x)
loss_function(y_pred, y).item()

0.7555209398269653
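
For reference, the value returned by `nn.BCELoss()` is the mean binary cross-entropy over all elements. The sketch below recomputes it by hand on made-up probabilities `probs` (an assumption for illustration, not the model output above).

```python
import torch
import torch.nn as nn

probs = torch.tensor([0.9, 0.6, 0.3])   # predicted probabilities in (0, 1)
targets = torch.ones(3)                 # all-ones targets, as in our y

builtin = nn.BCELoss()(probs, targets)

# Binary cross-entropy written out: -mean(y * log(p) + (1 - y) * log(1 - p))
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()

same_loss = torch.allclose(builtin, manual)
print(builtin.item(), manual.item())
```

With all-ones targets the second term vanishes, so the loss reduces to the mean of `-log(p)` and shrinks as the predictions approach 1.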

Let's see if our model can achieve a smaller loss. Now that we have everything we need, we can set up our training loop.

In [49]:
# Set the number of epochs, which determines the number of training iterations
n_epoch = 10

for epoch in range(n_epoch):
  # Set the gradients to 0
  adam.zero_grad()

  # Get the model predictions
  y_pred = block(x)

  # Get the loss
  loss = loss_function(y_pred, y)

  # Print stats
  print(f"Epoch {epoch}: training loss: {loss}")

  # Compute the gradients
  loss.backward()

  # Take a step to optimize the weights
  adam.step()


Epoch 0: training loss: 0.7555209398269653
Epoch 1: training loss: 0.6445321440696716
Epoch 2: training loss: 0.5937466621398926
Epoch 3: training loss: 0.5451932549476624
Epoch 4: training loss: 0.4899294376373291
Epoch 5: training loss: 0.4206928312778473
Epoch 6: training loss: 0.34689366817474365
Epoch 7: training loss: 0.26941123604774475
Epoch 8: training loss: 0.1810774952173233
Epoch 9: training loss: 0.10739384591579437


You can see that our loss is decreasing. Let's check the predictions of our model now and see if they are close to our original `y`, which was all ones.

In [50]:
# See how our model performs on the training data
y_pred = block(x)
y_pred

tensor([[0.9864, 0.9996, 0.9999, 0.9711, 0.9938],
        [0.9008, 0.9839, 0.9935, 0.8828, 0.9795],
        [0.9240, 0.9895, 0.9957, 0.9248, 0.9548],
        [0.8736, 0.9713, 0.9852, 0.8952, 0.9288],
        [0.9220, 0.9891, 0.9956, 0.9199, 0.9610],
        [0.8861, 0.9783, 0.9903, 0.8809, 0.9692],
        [0.7648, 0.8975, 0.9287, 0.8349, 0.8869],
        [0.8691, 0.9692, 0.9838, 0.8927, 0.9265],
        [0.9198, 0.9891, 0.9958, 0.9049, 0.9775],
        [0.9378, 0.9935, 0.9978, 0.9131, 0.9864]], grad_fn=<SigmoidBackward0>)

In [51]:
# Create test data and check how our model performs on it
x2 = y + torch.randn_like(y)
y_pred = block(x2)
y_pred

tensor([[0.8185, 0.9423, 0.9665, 0.8496, 0.9367],
        [0.9311, 0.9917, 0.9969, 0.9197, 0.9743],
        [0.9351, 0.9926, 0.9973, 0.9225, 0.9758],
        [0.9279, 0.9908, 0.9965, 0.9189, 0.9711],
        [0.8786, 0.9736, 0.9866, 0.8979, 0.9314],
        [0.9633, 0.9976, 0.9993, 0.9414, 0.9892],
        [0.9077, 0.9851, 0.9936, 0.9073, 0.9605],
        [0.8516, 0.9615, 0.9795, 0.8726, 0.9410],
        [0.9543, 0.9962, 0.9988, 0.9357, 0.9845],
        [0.8266, 0.9466, 0.9689, 0.8615, 0.9259]], grad_fn=<SigmoidBackward0>)
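
The outputs above still carry a `grad_fn` because autograd tracks the forward pass. When we only want predictions, a common pattern is to wrap the call in `torch.no_grad()`. The sketch below assumes an untrained stand-in for `block` and freshly generated `x2`, so its numbers will not match the ones printed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Untrained stand-in with the same architecture as the block above
block = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Linear(3, 5),
    nn.Sigmoid()
)
x2 = torch.ones(10, 5) + torch.randn(10, 5)

# Inside no_grad, no computation graph is recorded
with torch.no_grad():
    y_eval = block(x2)

# Outside no_grad, the same call is tracked by autograd
tracked = block(x2)

print(y_eval.requires_grad, tracked.requires_grad)
```

Skipping the graph saves memory and makes it clear that these predictions are not meant to be backpropagated through.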