# Tensors

In deep learning, you will represent your data using tensors. PyTorch (by Meta) and TensorFlow (by Google) are two of the most commonly used libraries for tensor operations and deep learning. In this course, like most courses in academia, we choose PyTorch because of its benefits in dynamic and flexible programming. Now, let's learn what tensors are and what we can do with them.

You should be familiar with vectors and matrices, which are 1D and 2D data structures. A tensor is simply a generalization of these concepts to any number of dimensions (or axes). Each element of a vector is located by one index, and each element of a matrix is located by two indexes. For a tensor, we need one index per dimension (axis) to locate a certain element. Vectors and matrices are simply special cases of tensors with 1 and 2 dimensions, respectively.
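To make the "one index per axis" idea concrete, here is a minimal sketch. The variable names (`vector`, `matrix`, `tensor3d`) are only illustrative, and `torch.arange(...).reshape(...)` is used just to get predictable values.

```python
import torch

# A vector needs one index, a matrix two, and a 3-D tensor three.
vector = torch.tensor([10, 20, 30])
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
tensor3d = torch.arange(24).reshape(2, 3, 4)  # values 0..23 laid out row by row

print(vector[2])          # tensor(30)
print(matrix[1, 0])       # tensor(4)
print(tensor3d[1, 2, 3])  # tensor(23)
print(vector.ndim, matrix.ndim, tensor3d.ndim)  # 1 2 3
```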

If you are familiar with NumPy, the N-dimensional array (`numpy.ndarray`) is essentially the same concept as a tensor. However, tensor libraries like PyTorch also provide automatic differentiation, which is central to deep learning.
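As a small taste of what automatic differentiation means, the sketch below differentiates a simple function of one scalar. The function `y = x^2 + 2x` is an arbitrary choice for illustration.

```python
import torch

# requires_grad=True asks PyTorch to record the operations applied to x
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # y = x^2 + 2x

y.backward()         # computes dy/dx and stores it in x.grad
print(x.grad)        # dy/dx = 2x + 2 = 8 at x = 3, so this prints tensor(8.)
```

A plain NumPy array has no equivalent of `.backward()`, which is the main reason we work with tensors instead.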

You will need to create and modify tensors, apply element-wise operations, and most importantly, apply tensor multiplications. We will go over the necessary concepts you need to get comfortable working with tensors.

# Define a tensor manually
The most common way to define a tensor manually, or to represent existing data as a tensor, is to use `torch.tensor()` or `torch.as_tensor()`. `torch.tensor()` always creates a new tensor by copying the provided data, while `torch.as_tensor()` avoids the copy when it can, which makes it more memory efficient. If the data is a NumPy array or a tensor and you do not change its properties (dtype and device, which are discussed later), `torch.as_tensor()` does not copy the data but returns a tensor that points to the same memory. Therefore, if you modify its elements, you modify the original data as well. You can see an example below.

In [1]:
import torch

# A 0-dimensional float tensor
T0 = torch.tensor(
    1.0,
)

# A 1-dimensional boolean tensor
T1 = torch.tensor(
    [True, False]
)

# A 2-dimensional integer tensor
T2 = torch.tensor([
        [1, 2],
        [3, 4],
        [5, 6],
        ])


In [2]:
"""
torch.as_tensor()
"""
import numpy as np
# A 3-dimensional numpy array. Let's convert it to a tensor 
A3 = np.array([
    [
        [1., 2., 3.,],
        [4., 5., 6.,]
    ],
    [
        [7., 8., 9.,],
        [10., 11., 12.,]
    ],
    [
        [13., 14., 15.,],
        [16., 17., 18.,]
    ]
])

T3 = torch.as_tensor(
    A3
)

# Let's change T3
T3[0,0,0] = 1000.
# We changed T3, but A3 changed as well, because both point to the same data in memory
print(A3[0,0,0])

1000.0
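For comparison, here is a minimal sketch that puts the copying and sharing behaviors side by side. The names `source`, `copied`, `shared`, and `converted` are made up for this illustration.

```python
import numpy as np
import torch

source = np.array([1.0, 2.0, 3.0])         # float64 NumPy array

copied = torch.tensor(source)               # always copies the data
shared = torch.as_tensor(source)            # same dtype and device: shares memory
converted = torch.as_tensor(source, dtype=torch.float32)  # dtype change forces a copy

source[0] = -1.0                            # modify the original array

print(copied[0].item())     # 1.0  (unaffected)
print(shared[0].item())     # -1.0 (sees the change)
print(converted[0].item())  # 1.0  (unaffected)
```

So `torch.as_tensor()` only saves memory when no conversion is needed. As soon as the dtype or device changes, you get an independent copy.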


# Essential attributes of tensors
You always have to pay close attention to the essential attributes of your tensors, which are:
- **shape**: (also referred to as size) the number of elements along each axis, in order, similar to the size of a matrix. The shape contains `ndim` numbers, one per dimension.
- **device**: the hardware where the data represented by the tensor is stored. CPU is `cpu` and GPU is `cuda` or `cuda:0` or `cuda:i` where `i` is the index of the GPU if you are using more than one.
- **dtype**: the data type of the tensor (bool, int32, int64, float32, float64, ...)
- **requires_grad**: whether PyTorch needs to keep track of gradients for this tensor (more accurately, the gradient of some quantity with respect to this tensor)

Below is a helpful function to inspect the information about your tensors. You will find this very helpful when debugging code. When performing calculations with tensors, they need to be on the same device, and their dtypes and shapes should be compatible with each other! Sometimes PyTorch takes care of the data types for you, but you are always responsible for keeping tensors on the intended device. It is also good practice to keep track of the dtypes explicitly rather than relying on PyTorch's automatic type handling, which may lead to unexpected behavior. You will gain more experience with these as you progress through the course. For now, let's inspect some manually defined tensors.

In [3]:
import torch
def print_tensor_info(
        name: str, 
        T: torch.Tensor
        ):
    ndim = T.ndim # or T.dim()
    shape = T.shape # or T.size()
    device = T.device
    dtype = T.dtype
    grad = T.requires_grad
    print(f"Tensor: {name}")
    print(20*"-")
    print(f"ndim: {ndim}")
    print(f"shape: {shape}")
    print(f"device: {device}")
    print(f"dtype: {dtype}")
    print(f"requires_grad: {grad}")
    print(20*"=")

In [4]:
# A 0-dimensional tensor
T0 = torch.tensor(
    1.0,
)
print_tensor_info('T0', T0)

Tensor: T0
--------------------
ndim: 0
shape: torch.Size([])
device: cpu
dtype: torch.float32
requires_grad: False


In [5]:
T1 = torch.tensor(
    [True, False]
)
print_tensor_info('T1', T1)

Tensor: T1
--------------------
ndim: 1
shape: torch.Size([2])
device: cpu
dtype: torch.bool
requires_grad: False


In [6]:
T2 = torch.tensor([
        [1, 2],
        [3, 4],
        [5, 6],
        ])
print_tensor_info('T2', T2)

Tensor: T2
--------------------
ndim: 2
shape: torch.Size([3, 2])
device: cpu
dtype: torch.int64
requires_grad: False


In [7]:
"""
The default float dtype in numpy is float64 which takes up double the memory!
If you create a tensor from a numpy array, the dtype will be the same as the numpy array.

Since PyTorch uses float32 as default, make sure to convert your data to float32
to save memory and computation time.
"""
# A 3-dimensional numpy array. Let's convert it to a tensor 
A3 = np.array([
    [
        [1., 2., 3.,],
        [4., 5., 6.,]
    ],
    [
        [7., 8., 9.,],
        [10., 11., 12.,]
    ],
    [
        [13., 14., 15.,],
        [16., 17., 18.,]
    ]
])

T3 = torch.as_tensor(A3)
print_tensor_info('T3', T3)

Tensor: T3
--------------------
ndim: 3
shape: torch.Size([3, 2, 3])
device: cpu
dtype: torch.float64
requires_grad: False
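Following up on the note about float64, here is a small sketch of converting NumPy data to float32 and of how PyTorch silently promotes mixed dtypes. The array `A` and the other variable names are illustrative only.

```python
import numpy as np
import torch

A = np.array([[1., 2.], [3., 4.]])              # NumPy defaults to float64
T64 = torch.as_tensor(A)                        # keeps float64
T32 = torch.as_tensor(A, dtype=torch.float32)   # converts (and copies) to float32

print(T64.dtype, T64.element_size())  # torch.float64 8  (bytes per element)
print(T32.dtype, T32.element_size())  # torch.float32 4

# Mixed dtypes are promoted silently, which is easy to miss:
ints = torch.tensor([1, 2])                     # int64
print((ints + T32[0]).dtype)                    # torch.float32
print((T32 + T64).dtype)                        # torch.float64
```

The last line shows why it pays to check dtypes yourself: a single float64 operand quietly turns the whole result into float64.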


# Changing the dtype and device of tensors

You will need to modify tensors and their properties. The most useful method for this is `.to()`. You can specify a dtype and/or a device, or pass another tensor whose dtype and device should be copied. Note that `.to()` does not change the shape of a tensor. Changing shapes is trickier, because the total number of elements has to stay the same, and will be discussed later.

**Important**: By default, neural network parameters are defined with `dtype=torch.float32` so it is best to make sure all your data and other tensors are turned to this dtype before being used by your defined models.

**Important**: Virtually all operations between tensors require them to be on the same device. The default device is `cpu`, so if you are not using a GPU, you will probably be fine. Otherwise, remember to move everything to the same device (in most cases the GPU) before running any calculations. The GPU device is referred to as `cuda` in PyTorch.

Probably the most practical way is to pass another tensor, usually one that you are going to do some calculations with. Here are some examples of changing the dtypes and devices of tensors:

In [8]:
import torch
# we will see how we can change the attributes of T1
T1 = torch.tensor([1, 2, 3])

T2 = torch.tensor([4., 5.], device='cuda', requires_grad=True)

In [9]:
# CHANGING DTYPE TO FLOAT
print_tensor_info('T1', T1)
print_tensor_info('T1.float()', T1.float())
print_tensor_info('T1.type(torch.float)', T1.type(torch.float))
print_tensor_info('T1.to(torch.float)', T1.to(torch.float))

print(20*'=',"\nBEST WAY:")
print_tensor_info('T1.to(torch.float32)', T1.to(torch.float32))


Tensor: T1
--------------------
ndim: 1
shape: torch.Size([3])
device: cpu
dtype: torch.int64
requires_grad: False
Tensor: T1.float()
--------------------
ndim: 1
shape: torch.Size([3])
device: cpu
dtype: torch.float32
requires_grad: False
Tensor: T1.type(torch.float)
--------------------
ndim: 1
shape: torch.Size([3])
device: cpu
dtype: torch.float32
requires_grad: False
Tensor: T1.to(torch.float)
--------------------
ndim: 1
shape: torch.Size([3])
device: cpu
dtype: torch.float32
requires_grad: False
BEST WAY:
Tensor: T1.to(torch.float32)
--------------------
ndim: 1
shape: torch.Size([3])
device: cpu
dtype: torch.float32
requires_grad: False


In [10]:
# CHANGING TO CUDA (GPU)

print_tensor_info('T1.cuda()', T1.cuda())

print(20*'=',"\nBEST WAY:")
print_tensor_info('T1.to("cuda")', T1.to("cuda"))

Tensor: T1.cuda()
--------------------
ndim: 1
shape: torch.Size([3])
device: cuda:0
dtype: torch.int64
requires_grad: False
BEST WAY:
Tensor: T1.to("cuda")
--------------------
ndim: 1
shape: torch.Size([3])
device: cuda:0
dtype: torch.int64
requires_grad: False


In [11]:
# CHANGING TO WHATEVER T2 is (THE EASIEST WAY)
print_tensor_info('T1.to(T2)', T1.to(T2))

Tensor: T1.to(T2)
--------------------
ndim: 1
shape: torch.Size([3])
device: cuda:0
dtype: torch.float32
requires_grad: False
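The cells above assume that a GPU is available. If you want the same code to run on machines with and without one, a common pattern is sketched below. The `device` variable is just a convention chosen for this example, not something PyTorch defines for you.

```python
import torch

# Illustrative helper variable: use the GPU when present, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

T1 = torch.tensor([1, 2, 3])                  # int64 on the CPU
T2 = torch.tensor([4., 5.], device=device)    # float32 on the chosen device

# .to() returns a new tensor; T1 itself is left unchanged
T1_moved = T1.to(device=device, dtype=torch.float32)
T1_like_T2 = T1.to(T2)                        # copy dtype and device from T2

print(T1.dtype, T1.device)                    # torch.int64 cpu
print(T1_moved.dtype, T1_moved.device)        # torch.float32 and the chosen device
print(T1_like_T2.dtype == T2.dtype, T1_like_T2.device == T2.device)  # True True
```

Because `.to()` is not an in-place operation, remember to assign its result (for example `T1 = T1.to(device)`) if you want to keep working with the converted tensor.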


# Creating template tensors

There are some predefined tensors like tensors full of zeros, ones, a certain value, or sampled from a certain distribution. You can always find these with a quick Google search or asking ChatGPT :) but you will naturally remember them after using them frequently.

In [12]:
import torch
# Zeros or ones with a desired shape(size)
# providing the shape is mandatory
# The default dtype is float32, and default device is cpu
ones = torch.ones(size=(3,4), dtype=torch.float32, device='cpu')
zeros = torch.zeros(size=(3,4), dtype=torch.float32, device='cpu')

"""
These ones are pretty useful
"""
# creating tensors with the same dtype, device, shape as another tensor
# You can still override dtype and device, e.g. torch.ones_like(T, dtype=torch.float32)
T = torch.tensor([1, 2, 3, 4])
ones_like = torch.ones_like(T) 
zeros_like = torch.zeros_like(T)

# full tensors
full = torch.full(size=(3,4), fill_value=3.14, dtype=torch.float32, device='cpu')
# T is int64, so full_like keeps int64 and 3.14 is truncated to 3
full_like = torch.full_like(T, fill_value=3.14)

# arange
# start, end, step
# default dtype is int64
arange = torch.arange(0, 10, 2) # the end is exclusive
# [0 2 4 6 8]

# linspace
# start, end, number of points

linspace = torch.linspace(0, 10, 5)
# [0., 2.5, 5., 7.5, 10.]

# Try printing their info and see for yourself!
print_tensor_info('T', T)
print(T)
print_tensor_info('ones_like', ones_like)
print(ones_like)
print_tensor_info('zeros_like', zeros_like)
print(zeros_like)
print_tensor_info('full', full)
print(full)
print_tensor_info('full_like', full_like)
print(full_like)
print_tensor_info('arange', arange)
print(arange)
print_tensor_info('linspace', linspace)
print(linspace)

Tensor: T
--------------------
ndim: 1
shape: torch.Size([4])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([1, 2, 3, 4])
Tensor: ones_like
--------------------
ndim: 1
shape: torch.Size([4])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([1, 1, 1, 1])
Tensor: zeros_like
--------------------
ndim: 1
shape: torch.Size([4])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([0, 0, 0, 0])
Tensor: full
--------------------
ndim: 2
shape: torch.Size([3, 4])
device: cpu
dtype: torch.float32
requires_grad: False
tensor([[3.1400, 3.1400, 3.1400, 3.1400],
        [3.1400, 3.1400, 3.1400, 3.1400],
        [3.1400, 3.1400, 3.1400, 3.1400]])
Tensor: full_like
--------------------
ndim: 1
shape: torch.Size([4])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([3, 3, 3, 3])
Tensor: arange
--------------------
ndim: 1
shape: torch.Size([5])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([0, 2, 4, 6, 8])
Tensor: linspace
--------------------
ndim:

In [13]:
""" Random Tensors """

# random numbers from a uniform distribution
# between 0 and 1
rand = torch.rand(size=(3,4), dtype=torch.float32, device='cpu')

# random numbers from a normal distribution
# with mean 0 and variance 1
randn = torch.randn(size=(3,4), dtype=torch.float32, device='cpu')
# you can change mean and std by a simple multiplication and addition

# random integers
randint = torch.randint(low=0, high=10, size=(3,4), dtype=torch.int64, device='cpu')

# random permutation
# random permutation of integers from 0 to 9
randperm = torch.randperm(n=10, dtype=torch.int64, device='cpu')

# Try printing their info and content and see for yourself!
print_tensor_info('rand', rand)
print(rand)
print_tensor_info('randn', randn)
print(randn)
print_tensor_info('randint', randint)
print(randint)
print_tensor_info('randperm', randperm)
print(randperm)

Tensor: rand
--------------------
ndim: 2
shape: torch.Size([3, 4])
device: cpu
dtype: torch.float32
requires_grad: False
tensor([[0.7627, 0.6968, 0.0681, 0.1095],
        [0.1142, 0.0633, 0.1200, 0.8693],
        [0.4275, 0.3825, 0.3402, 0.9558]])
Tensor: randn
--------------------
ndim: 2
shape: torch.Size([3, 4])
device: cpu
dtype: torch.float32
requires_grad: False
tensor([[-1.9317,  0.7736, -0.8705,  0.5171],
        [-0.6615,  1.2437,  0.1956, -0.8707],
        [ 0.1867,  1.7038,  0.6633, -0.7487]])
Tensor: randint
--------------------
ndim: 2
shape: torch.Size([3, 4])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([[6, 3, 0, 6],
        [0, 4, 0, 9],
        [0, 8, 8, 5]])
Tensor: randperm
--------------------
ndim: 1
shape: torch.Size([10])
device: cpu
dtype: torch.int64
requires_grad: False
tensor([4, 7, 5, 9, 2, 6, 1, 0, 8, 3])
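Two practical notes on random tensors, sketched under simple assumptions: fixing the seed makes the draws repeatable, and a standard normal sample can be shifted and scaled to any mean and standard deviation. The target values `mean = 5.0` and `std = 2.0` are arbitrary.

```python
import torch

torch.manual_seed(0)                  # fix the seed so the draws are repeatable
a = torch.randn(3, 4)
torch.manual_seed(0)
b = torch.randn(3, 4)
print(torch.equal(a, b))              # True: same seed, same numbers

# Shift and scale standard-normal samples to mean 5 and standard deviation 2
mean, std = 5.0, 2.0
samples = mean + std * torch.randn(100_000)
print(samples.mean(), samples.std())  # close to 5 and 2, but not exactly equal
```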


# Indexing and slicing tensors

Slicing and indexing tensors is simple and very similar to NumPy, if you are already familiar with it. You can use the `:` notation in the following ways to refer to a slice of the tensor along some dimension:
- `:` means all elements.
- `start:end` specifies the inclusive starting index and the exclusive ending index of the slice. If either of them is omitted, it defaults to the very beginning (for start) or the very end (for end)!
- `start:end:step` begins at `start` and selects every `step`-th element. If not specified, the step size defaults to 1.
- You can also use Python's built-in `slice` objects if you want to define your slices separately, and then index your tensors with them.

Remember that negative indexes count from the end. For example, -1 refers to the last location, -2 to the second to last, and so on. Here are some examples:
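As a warm-up with predictable values, here is a minimal sketch on a small integer tensor, so you can check each result by eye. It also shows one behavior worth knowing: basic slicing returns a view, not a copy.

```python
import torch

T = torch.arange(12).reshape(3, 4)
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

print(T[-1])          # last row: tensor([ 8,  9, 10, 11])
print(T[:, -2:])      # last two columns of every row
print(T[::2, 1:3])    # rows 0 and 2, columns 1 and 2: [[1, 2], [9, 10]]

cols = slice(1, None, 2)   # same as 1::2
print(T[:, cols])     # columns 1 and 3

# Basic slicing returns a view: writing through it changes T as well
row = T[0]
row[0] = 100
print(T[0, 0])        # tensor(100)
```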

In [15]:
import torch

T = torch.randn(5, 6)

print('T:')
print(T)
print(20*'-')

# some examples of slicing:

# get the first row
print('First row:T[0]')
print(T[0])
print(20*'-')

# get the first column
print('First column:T[:,0]')
print(T[:,0])
print(20*'-')

# get the first two rows
print('First two rows:T[:2]')
print(T[:2])
print(20*'-')

# Get even columns of all rows:
print('Even columns:T[:,::2]')
print(T[:,::2])
print(20*'-')

# Get last two columns of the first two rows
print('Last two columns of the first two rows:T[:2,-2:]')
print(T[:2,-2:])
print(20*'-')

# example with slice:

our_slice =  slice(1, None, 2) # equivalent to 1::2
print('the odd rows:')
print(T[our_slice])
print(20*'-')


T:
tensor([[-0.0785, -1.0346, -0.7904, -0.2524, -0.6221,  0.1752],
        [-1.7105,  0.1018, -0.2919,  0.9947, -0.4232, -0.0767],
        [-0.0153, -2.2955, -0.3709, -0.8739,  0.2484,  0.3959],
        [-0.8739,  0.2752,  1.1091, -0.6809,  0.0941, -1.6234],
        [-1.1378, -0.2238, -1.0278, -1.3972,  0.1065, -1.0364]])
--------------------
First row:T[0]
tensor([-0.0785, -1.0346, -0.7904, -0.2524, -0.6221,  0.1752])
--------------------
First column:T[:,0]
tensor([-0.0785, -1.7105, -0.0153, -0.8739, -1.1378])
--------------------
First two rows:T[:2]
tensor([[-0.0785, -1.0346, -0.7904, -0.2524, -0.6221,  0.1752],
        [-1.7105,  0.1018, -0.2919,  0.9947, -0.4232, -0.0767]])
--------------------
Even columns:T[:,::2]
tensor([[-0.0785, -0.7904, -0.6221],
        [-1.7105, -0.2919, -0.4232],
        [-0.0153, -0.3709,  0.2484],
        [-0.8739,  1.1091,  0.0941],
        [-1.1378, -1.0278,  0.1065]])
--------------------
Last two columns of the first two rows:T[:2,-2:]
tensor([[-0.

In [16]:
"""
Indexing with other tensors!
"""

T1 = torch.randn(3, 4)
print('T1:')
print(T1)
print(20*'-')

# A 1-D integer tensor used as an index selects entries along the first dimension.
# Each value must be a valid index for that dimension (here 0, 1, or 2).

# Let's create an index tensor with 3 entries
# and fill it with random integers from 0 to 2 (high is exclusive)
index_tensor = torch.randint(low=0, high=3, size=(3,))
print('index_tensor:')
print(index_tensor)
print(20*'-')

T1_indexed = T1[index_tensor]
print('T1 indexed:')
print(T1_indexed)
print(20*'-')

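# You can also index several dimensions in the middle of the tensor at once.
# The index tensors must have the same (or broadcastable) shapes,
# and the indexed dimensions are replaced by that shape in the result.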

T2 = torch.randn(3, 4, 5, 6)
print('T2:')
print(T2)
print(20*'-')


index_tensor_dim1 = torch.randint(low=0, high=4, size=(7, 8,))
index_tensor_dim2 = torch.randint(low=0, high=5, size=(7, 8,))
print('index_tensor_dim1:')
print(index_tensor_dim1)
print(20*'-')
print('index_tensor_dim2:')
print(index_tensor_dim2)
print(20*'-')

T2_indexed = T2[:, index_tensor_dim1, index_tensor_dim2, :]
print('T2_indexed')
print(T2_indexed)
print(20*'-')

print('T2 shape:')
print(T2.shape)
print(20*'-')

print('T2_indexed shape:')
print(T2_indexed.shape)
print(20*'-')



T1:
tensor([[-1.5326,  0.4593, -0.0143, -0.2424],
        [ 1.8731, -0.0026, -1.0915,  1.0800],
        [ 2.2229, -1.0480,  0.3520,  0.2826]])
--------------------
index_tensor:
tensor([1, 1, 1])
--------------------
T1 indexed:
tensor([[ 1.8731, -0.0026, -1.0915,  1.0800],
        [ 1.8731, -0.0026, -1.0915,  1.0800],
        [ 1.8731, -0.0026, -1.0915,  1.0800]])
--------------------
T2:
tensor([[[[ 1.4762, -0.5973,  0.2201,  0.7101,  0.9679,  0.9225],
          [ 2.0057,  0.4459,  0.4216,  0.5133, -0.0366, -0.1233],
          [-0.2005, -1.0421, -0.2724, -1.4060,  0.0142, -0.8174],
          [ 0.1196, -0.7712,  0.5466, -0.6752, -0.6388, -0.8656],
          [ 0.2086,  0.0208, -0.8938, -0.2721,  1.3330,  0.5093]],

         [[ 0.3714,  1.1129, -0.8092, -0.9926, -1.1405, -0.1932],
          [ 0.6741, -1.1251,  0.2330, -1.2827, -0.5372,  0.2825],
          [-0.6703, -0.5624,  0.2609, -1.9529, -0.2133, -0.3401],
          [ 2.0776,  0.9014,  0.9570,  0.2151,  1.4536,  0.8889],
          [
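The shape rule for indexing with tensors is easier to see on a small case. The sketch below uses made-up index tensors `idx1` and `idx2` of shape (2, 2). The two indexed dimensions are replaced by that shape, and the values are paired up position by position.

```python
import torch

T = torch.arange(3 * 4 * 5).reshape(3, 4, 5)

# Index tensors for dims 1 and 2; both have shape (2, 2)
idx1 = torch.tensor([[0, 1], [2, 3]])
idx2 = torch.tensor([[4, 3], [2, 1]])

picked = T[:, idx1, idx2]
print(picked.shape)   # torch.Size([3, 2, 2])

# Each output element pairs up the two index tensors position by position:
# picked[b, i, j] == T[b, idx1[i, j], idx2[i, j]]
print(picked[1, 0, 1], T[1, 1, 3])   # both tensor(28)
```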

In [17]:
"""
Indexing with a boolean tensor (masking)
"""

T = torch.randn(3, 4)
print('T:')
print(T)
print(20*'-')

# Let's create a boolean tensor with the same shape as T

mask = T <= 0
print('mask:')
print(mask)
print(20*'-')

# Now we can index T with the mask
T_masked = T[mask]

print('T masked:')
print(T_masked)
print(20*'-')

"""
The masked dimensions will be flattened!
However, you can use this index to assign new values to the masked elements. Like this:
"""

T[mask] = 0
print('T after masking:')
print(T)
print(20*'-')


T:
tensor([[-0.9260, -0.3436, -1.4533, -0.6678],
        [-0.5388, -0.0264, -0.5818,  0.5138],
        [-0.9419,  1.3927,  0.5923,  0.8892]])
--------------------
mask:
tensor([[ True,  True,  True,  True],
        [ True,  True,  True, False],
        [ True, False, False, False]])
--------------------
T masked:
tensor([-0.9260, -0.3436, -1.4533, -0.6678, -0.5388, -0.0264, -0.5818, -0.9419])
--------------------
T after masking:
tensor([[0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.5138],
        [0.0000, 1.3927, 0.5923, 0.8892]])
--------------------
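Assigning through a mask modifies `T` in place. If you would rather keep the original tensor, `masked_fill` and `torch.where` give the same result as a new tensor. This is a minimal sketch with hand-picked values.

```python
import torch

T = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])
mask = T <= 0

# Out-of-place alternatives to T[mask] = 0; T itself is not modified
filled = T.masked_fill(mask, 0.0)
chosen = torch.where(mask, torch.zeros_like(T), T)

print(filled)   # tensor([[0., 2.], [3., 0.]])
print(chosen)   # same values as filled
print(T)        # unchanged: tensor([[-1., 2.], [3., -4.]])
```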


# Manipulating the shape of tensors

This is probably the trickiest part of working with tensors. Here we go over the most common ways to modify or rearrange the shape of a tensor.

In [18]:
import torch

"""
Combining and splitting dimensions of a tensor using flatten and reshape:
"""

# flatten merges several adjacent dimensions into one; by default it flattens all of them
T = torch.randn(5, 6, 7, 8, 9, 10)

T_f1 = T.flatten()
print('T.flatten() has shape', T_f1.shape)

# flatten with start_dim and end_dim (both inclusive)
T_f2 = T.flatten(start_dim=1, end_dim=2)
print('T.flatten(start_dim=1, end_dim=2) has shape', T_f2.shape)

# reshape to turn it back
T_r3 = T_f2.reshape(5, 6, 7, 8, 9, 10)
print('T_f2.reshape(5, 6, 7, 8, 9, 10) has shape', T_r3.shape)


T.flatten() has shape torch.Size([151200])
T.flatten(start_dim=1, end_dim=2) has shape torch.Size([5, 42, 8, 9, 10])
T_f2.reshape(5, 6, 7, 8, 9, 10) has shape torch.Size([5, 6, 7, 8, 9, 10])
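One convenience worth knowing with `reshape` is the `-1` placeholder, which asks PyTorch to infer a single dimension from the total number of elements. A minimal sketch, using a smaller tensor than above:

```python
import torch

T = torch.randn(5, 6, 7)

# -1 lets PyTorch infer one dimension from the total number of elements
print(T.reshape(5, -1).shape)     # torch.Size([5, 42])
print(T.reshape(-1, 7).shape)     # torch.Size([30, 7])

# flatten(start_dim=1) is the same regrouping as reshape(5, -1)
print(torch.equal(T.flatten(start_dim=1), T.reshape(5, -1)))  # True

# The element count must match: 5 * 6 * 7 = 210 is not divisible by 4 * 4
try:
    T.reshape(4, 4, -1)
except RuntimeError as err:
    print("reshape failed:", err)
```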


In [19]:
"""
Changing the order of the dimensions
"""

T1 = torch.randn(5, 6, 7, 8, 9, 10)

# torch.movedim and torch.moveaxis (the same)
T2 = T1.movedim(0, 2)
print('T1.movedim(0, 2) has shape', T2.shape)

T3 = T1.moveaxis([1, 2], [-2, -1])
print('T1.moveaxis([1, 2], [-2, -1]) has shape', T3.shape)

# torch.transpose to swap two dimensions
T4 = T1.transpose(1, 2)
print('T1.transpose(1, 2) has shape', T4.shape)

# torch.permute to permute the dimensions
T5 = T1.permute(1, 0, 3, 2, 5, 4)
print('T1.permute(1, 0, 3, 2, 5, 4) has shape', T5.shape)


T1.movedim(0, 2) has shape torch.Size([6, 7, 5, 8, 9, 10])
T1.moveaxis([1, 2], [-2, -1]) has shape torch.Size([5, 8, 9, 10, 6, 7])
T1.transpose(1, 2) has shape torch.Size([5, 7, 6, 8, 9, 10])
T1.permute(1, 0, 3, 2, 5, 4) has shape torch.Size([6, 5, 8, 7, 10, 9])
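Two things are easy to get wrong here, so a small sketch may help: reordering dimensions returns a view on the same memory, and it is not the same operation as reshaping to the swapped shape. The tensors below are tiny on purpose.

```python
import torch

T = torch.zeros(2, 3)
T_t = T.transpose(0, 1)        # shape [3, 2], a view on the same memory

T_t[2, 1] = 9.0                # writing through the view...
print(T[1, 2])                 # ...changes T as well: tensor(9.)

# Reordering dimensions is different from reshaping:
A = torch.arange(6).reshape(2, 3)
print(A.transpose(0, 1))       # tensor([[0, 3], [1, 4], [2, 5]])
print(A.reshape(3, 2))         # tensor([[0, 1], [2, 3], [4, 5]])
```

Both results have shape `[3, 2]`, but only `transpose` keeps each element attached to its original row and column indexes.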


In [20]:
"""
Adding and removing extra dimensions
"""

T = torch.randn(5, 6, 7)

T1 = T.unsqueeze(1)
print('T.unsqueeze(1) has shape', T1.shape)

# My favorite way is indexing with None
T2 = T[:, None, :, :]
print('T[:, None, :, :] has shape', T2.shape)

T3 = T[None, :, None, :, None, :]
print('T[None, :, None, :, None, :] has shape', T3.shape)

# Removing extra dimensions:
T3_squeezed = T3.squeeze()
print('T3.squeeze() has shape', T3_squeezed.shape)

# removing the extra dimension at a specific position
T3_squeezed_0 = T3.squeeze(0)
print('T3.squeeze(0) has shape', T3_squeezed_0.shape)

# or you can simply index it with 0
T3_squeezed_indexed = T3[0, :, 0, :, 0, :]
print('T3[0, :, 0, :, 0, :] has shape', T3_squeezed_indexed.shape)

# you can use ... if there are too many dimensions

# adding a dimension at the beginning and the end
T4 = T3[None, ..., None]
print('T3[None, ..., None] has shape', T4.shape)

T.unsqueeze(1) has shape torch.Size([5, 1, 6, 7])
T[:, None, :, :] has shape torch.Size([5, 1, 6, 7])
T[None, :, None, :, None, :] has shape torch.Size([1, 5, 1, 6, 1, 7])
T3.squeeze() has shape torch.Size([5, 6, 7])
T3.squeeze(0) has shape torch.Size([5, 1, 6, 1, 7])
T3[0, :, 0, :, 0, :] has shape torch.Size([5, 6, 7])
T3[None, ..., None] has shape torch.Size([1, 1, 5, 1, 6, 1, 7, 1])
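A couple of details about `squeeze` with an explicit dimension are easy to trip over, so here is a short sketch. The shape `(1, 5, 1, 6)` is chosen only for illustration.

```python
import torch

T = torch.randn(1, 5, 1, 6)

print(T.squeeze(0).shape)             # torch.Size([5, 1, 6])
print(T.squeeze(1).shape)             # dim 1 has size 5, so nothing changes: [1, 5, 1, 6]
print(T.squeeze(0).squeeze(1).shape)  # torch.Size([5, 6]); indexes shift after each squeeze
print(T.unsqueeze(-1).shape)          # negative dims count from the end: [1, 5, 1, 6, 1]
```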


# Attaching tensors together

You can stack or concatenate a list of tensors using `torch.stack()` and `torch.cat()` along a specific dimension.

In [21]:
import torch

# Concatenate along a specific dimension (other dimensions must match)
T1 = torch.randn(5, 6, 7)
T2 = torch.randn(5, 6, 8)
T12_cat = torch.cat([T1, T2], dim=2)
print('torch.cat([T1, T2], dim=2) has shape', T12_cat.shape)


# Stacking several tensors of the same shape along a new dimension
# dimensions must match
T3 = torch.randn(4, 5, 6)
T4 = torch.randn(4, 5, 6)
T5 = torch.randn(4, 5, 6)
T345_stack = torch.stack([T3, T4, T5], dim=0)
print('torch.stack([T3, T4, T5], dim=0) has shape', T345_stack.shape)


torch.cat([T1, T2], dim=2) has shape torch.Size([5, 6, 15])
torch.stack([T3, T4, T5], dim=0) has shape torch.Size([3, 4, 5, 6])
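One way to remember the difference: `torch.stack` behaves like adding a new dimension to every tensor and then concatenating along it. A minimal sketch with two small tensors:

```python
import torch

A = torch.randn(4, 5)
B = torch.randn(4, 5)

stacked = torch.stack([A, B], dim=0)             # new dim: [2, 4, 5]
via_cat = torch.cat([A[None], B[None]], dim=0)   # add the dim first, then cat

print(stacked.shape)                       # torch.Size([2, 4, 5])
print(torch.equal(stacked, via_cat))       # True
print(torch.cat([A, B], dim=0).shape)      # no new dim: torch.Size([8, 5])
print(torch.stack([A, B], dim=-1).shape)   # new dim at the end: torch.Size([4, 5, 2])
```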


# Tensor calculations

You will perform calculations with tensors, and you have to make sure that shapes, dtypes, and devices are compatible so the calculations are executed as intended. Most operations are element-wise arithmetic operations. Here are some examples:

In [22]:
import torch

T1 = torch.randn(5, 6, 7)
T2 = torch.randn(5, 6, 7)

# element-wise operations:
T_sum = T1 + T2
T_diff = T1 - T2
T_prod = T1 * T2
T_div = T1 / T2
T_pow = T1 ** T2
print('T_sum', T_sum.shape)
print('T_diff', T_diff.shape)
print('T_prod', T_prod.shape)
print('T_div', T_div.shape)
print('T_pow', T_pow.shape)

T_sum torch.Size([5, 6, 7])
T_diff torch.Size([5, 6, 7])
T_prod torch.Size([5, 6, 7])
T_div torch.Size([5, 6, 7])
T_pow torch.Size([5, 6, 7])


### What if the shapes don't match?
On many occasions, one tensor does not have exactly the same shape as the other tensor in the calculation. For the dimensions that are missing or have size 1, we want to repeat the same calculation for all indexes in those dimensions. This is done by a mechanism called **broadcasting**. Here are some examples of how you can modify tensors to make them broadcastable!

Here is a simple explanation of how broadcasting works:
- First, the shapes of the two tensors are aligned with each other, starting from the last dimension.
- If the two sizes in a dimension are the same, there is no problem.
- If one tensor has run out of dimensions, or its size in that dimension is 1, the tensor is broadcast (virtually replicated) so that it matches the size of the other tensor.
- If the sizes differ and neither of them is 1, the shapes are incompatible and PyTorch raises an error.
- This continues until all dimensions are processed.

A few examples may make this clearer:

In [23]:
T1 = torch.randn(5, 6, 7)
T2 = torch.randn(7)

# broadcasting
# T1: [5, 6, 7]
# T2: [      7]
# go over the above mechanism to see how it works
# This is broadcastable so you will not get an error
T1_plus_T2 = T1 + T2


In [24]:
T1 = torch.randn(5, 6, 7)
T2 = torch.randn(6)

# we want to multiply across the dimension with size 6

# T1: [5, 6, 7]
# T2: [      6]

# Broadcasting starts from the last dimension
# There is a mismatch in the last dimension
# So, it will not work

T1_times_T2 = T1 * T2

RuntimeError: The size of tensor a (7) must match the size of tensor b (6) at non-singleton dimension 2

In [25]:
# What if we add a dummy dimension to T2?

T1 = torch.randn(5, 6, 7)
T2 = torch.randn(6)

T2 = T2[..., None] # adding a dummy dimension at the end

# T1: [5, 6, 7]
# T2: [   6, 1]

# Now it is broadcastable

T1_times_T2 = T1 * T2
print(T1_times_T2.shape)

torch.Size([5, 6, 7])


In [26]:
T1 = torch.randn(5, 6, 7)
T2 = torch.randn(5)

# T1: [5, 6, 7]
# T2: [      5]

# There is a mismatch in the last dimension (7 vs 5)
# So, it will not work

T1_times_T2 = T1 * T2

RuntimeError: The size of tensor a (7) must match the size of tensor b (5) at non-singleton dimension 2

In [27]:
# What if we add dummy dimensions to T2?

T1 = torch.randn(5, 6, 7)
T2 = torch.randn(5)

T2 = T2[:, None, None] # adding two dummy dimensions at the end

# T1: [5, 6, 7]
# T2: [5, 1, 1]

# Now it is broadcastable

T1_times_T2 = T1 * T2


In [28]:
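# Another way is to move the matching dimension of T1 to the end,
# multiply, and then move that dimension back in the result:

T1 = torch.randn(5, 6, 7)
T2 = torch.randn(5)

T1_moved = T1.movedim(0, 2)

# T1_moved: [6, 7, 5]
# T2:       [      5]

# Now it is broadcastable

T1_times_T2 = T1_moved * T2   # shape [6, 7, 5]

# Move the dimension back so the result has the original layout [5, 6, 7]
T1_times_T2 = T1_times_T2.movedim(2, 0)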

So to summarize, broadcasting starts from the last dimension. For broadcasting to succeed, one of these situations must hold for every dimension:
- Both tensors have the same size in this dimension, so the calculation proceeds normally.
- One of the tensors has size 1 in this dimension, so it is broadcast.
- One of the tensors has run out of dimensions, so it is broadcast along the remaining dimensions.
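If you are unsure whether two shapes are compatible, you can check them without creating any data. The sketch below uses `torch.broadcast_shapes`, which applies these rules to shapes only. The variable names are illustrative, and the exact exception type for incompatible shapes may depend on your PyTorch version, so both common candidates are caught.

```python
import torch

same_or_missing = torch.broadcast_shapes((5, 6, 7), (7,))
size_one = torch.broadcast_shapes((5, 6, 7), (6, 1))
both_expand = torch.broadcast_shapes((5, 1, 7), (6, 1))
print(same_or_missing, size_one, both_expand)  # each is torch.Size([5, 6, 7])

try:
    torch.broadcast_shapes((5, 6, 7), (6,))    # 7 vs 6 in the last dimension
    compatible = True
except (RuntimeError, ValueError) as err:
    compatible = False
    print("not broadcastable:", err)
```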

# Tensor and Matrix Multiplication
Finally, at the heart of neural networks, we have tensor multiplication, which is simply a generalization of matrix multiplication. Since tensors can have more than 2 dimensions, tensor multiplication may seem more complicated, but it is not as difficult as it looks. Just like in matrix multiplication, one dimension from each of the two tensors is used for an inner product, and this is repeated across all the remaining dimensions of both tensors. Here is an example:

In [29]:
import torch
"""
Matrix multiplication
"""

T1 = torch.randn(5, 6)
T2 = torch.randn(6, 7)

# matrix multiplication with 2D tensors (matrices) using @ or torch.matmul
T_matmul = T1 @ T2

print('T1 @ T2 has shape', T_matmul.shape)

T1 @ T2 has shape torch.Size([5, 7])
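To connect `@` with the inner product description, here is a minimal sketch with small hand-picked matrices. It checks one entry by hand and then rebuilds the whole product with broadcasting and a sum over the shared dimension.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])            # shape [2, 2]
B = torch.tensor([[5., 6., 7.], [8., 9., 10.]])   # shape [2, 3]

C = A @ B                                         # shape [2, 3]
print(C)   # tensor([[21., 24., 27.], [47., 54., 61.]])

# Entry (i, j) is the inner product of row i of A and column j of B
print(C[0, 1], (A[0, :] * B[:, 1]).sum())         # both 1*6 + 2*9 = 24

# The same product via broadcasting and a sum over the shared dimension
C_manual = (A[:, :, None] * B[None, :, :]).sum(dim=1)
print(torch.allclose(C, C_manual))                # True
```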


In [30]:
"""
grouped or batched matrix multiplication:
"""

# torch.matmul and @ consider the last two dimensions as the matrices
# The remaining dimensions should be broadcastable!
T1 = torch.randn(5, 6, 7, 8) # (..., D1, D2)

T2 = torch.randn(5, 1, 8, 9) # (..., D2, D3)

# batch matrix multiplication (torch.matmul or @)
T_matmul = torch.matmul(T1, T2) # (..., D1, D3)
T_at = T1 @ T2

# Both give the same result; the batch dimensions (5, 6) and (5, 1) are broadcast to (5, 6)
print('torch.matmul(T1, T2) has shape', T_matmul.shape)
print('T_at has shape', T_at.shape)

torch.matmul(T1, T2) has shape torch.Size([5, 6, 7, 9])
T_at has shape torch.Size([5, 6, 7, 9])
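To see what the batched product computes, the sketch below compares one batch entry against an ordinary matrix product of the matching slices. The indexes `i, j = 2, 4` are arbitrary. It also shows `torch.bmm`, a stricter relative of `torch.matmul` that expects exactly 3-D inputs and does not broadcast.

```python
import torch

T1 = torch.randn(5, 6, 7, 8)
T2 = torch.randn(5, 1, 8, 9)   # the size-1 batch dim is broadcast against 6

out = T1 @ T2                  # shape [5, 6, 7, 9]

# Every batch entry is an ordinary matrix product of the matching slices
i, j = 2, 4
print(torch.allclose(out[i, j], T1[i, j] @ T2[i, 0], atol=1e-4))  # True

# torch.bmm is stricter: exactly 3-D inputs and no broadcasting
X = torch.randn(10, 3, 4)
Y = torch.randn(10, 4, 5)
print(torch.bmm(X, Y).shape)   # torch.Size([10, 3, 5])
```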


In [31]:
"""
Tensor Multiplication
"""

# For more complex multiplications, we can use torch.einsum

T1 = torch.randn(5, 6, 7)
T2 = torch.randn(6, 7, 8, 9)
# We want to take the dot product over the dimensions with sizes 6 and 7 (treating them as one long vector)

T_dot = torch.einsum('ijk, jklm -> ilm', T1, T2)

print('torch.einsum("ijk, jklm -> ilm", T1, T2) has shape', T_dot.shape)

torch.einsum("ijk, jklm -> ilm", T1, T2) has shape torch.Size([5, 8, 9])
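If the einsum string feels opaque, it may help to see that this particular contraction can also be written with tools from earlier sections. The sketch below merges the two shared dimensions into one axis of length 42, uses a plain matrix product, and compares the result with `torch.tensordot` as a second cross-check. The tolerances are loose on purpose because float32 results can differ slightly between methods.

```python
import torch

T1 = torch.randn(5, 6, 7)
T2 = torch.randn(6, 7, 8, 9)

via_einsum = torch.einsum('ijk,jklm->ilm', T1, T2)

# Merge (6, 7) into one axis of length 42, multiply, then split (8, 9) again
via_matmul = (T1.reshape(5, 42) @ T2.reshape(42, 72)).reshape(5, 8, 9)

# tensordot with dims=2 contracts the last two dims of T1 with the first two of T2
via_tensordot = torch.tensordot(T1, T2, dims=2)

print(torch.allclose(via_einsum, via_matmul, atol=1e-3))     # True
print(torch.allclose(via_einsum, via_tensordot, atol=1e-3))  # True
```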


# Reduction operations

You can take the sum, mean, standard deviation (std), product, and so on of a tensor along certain dimensions. These are called reduction operations because they reduce `ndim` by default. You can keep the reduced dimension (with size 1) by passing `keepdim=True`. All of these are available both as torch functions (`torch.mean(T, dim=..., keepdim=...)`) and as methods of a tensor object (`T.mean(dim=..., keepdim=...)`). Here are some examples:

In [32]:
import torch

T = torch.randn(5, 6, 7)
print('T has shape', T.shape)

T_sum2 = T.sum(dim=2)
print('T.sum(dim=2) has shape', T_sum2.shape)

# Sum across multiple dimensions:
T_sum01 = T.sum(dim=(0, 1))
print('T.sum(dim=(0, 1)) has shape', T_sum01.shape)

# Keep the dimensions that you reduce along. The size of those dimensions will be 1
T_sum_keepdim = T.sum(dim=2, keepdim=True)
print('T.sum(dim=2, keepdim=True) has shape', T_sum_keepdim.shape)

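# The same dim/keepdim rules apply to the following
# (when dim is given, max() and min() also return the indices of the selected elements):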

# mean()
# std()
# var()
# max()
# min()
# argmax()
# argmin()
# prod()

# Try them out!

T has shape torch.Size([5, 6, 7])
T.sum(dim=2) has shape torch.Size([5, 6])
T.sum(dim=(0, 1)) has shape torch.Size([7])
T.sum(dim=2, keepdim=True) has shape torch.Size([5, 6, 1])
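A common reason to use `keepdim=True` is that the result then broadcasts cleanly against the original tensor. The sketch below centers each row of a small hand-written tensor, and also shows that `max` with a `dim` argument returns both values and indexes.

```python
import torch

T = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

# keepdim=True leaves a size-1 dim, so the result broadcasts against T
row_mean = T.mean(dim=1, keepdim=True)      # shape [2, 1]
centered = T - row_mean                     # shape [2, 3]
print(centered)                             # tensor([[-1., 0., 1.], [-1., 0., 1.]])

# With dim given, max() returns both the values and their indexes
values, indices = T.max(dim=1)
print(values, indices)                      # tensor([3., 6.]) tensor([2, 2])
print(T.argmax(dim=1))                      # tensor([2, 2])
```

Without `keepdim=True`, `row_mean` would have shape `[2]`, and `T - row_mean` would fail because the last dimensions (3 and 2) do not match.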
