# Tensor Math

Tensor operations show up all over deep learning code. For example, the Residual Block from "[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)" (He et al., 2016) uses an element-wise addition to implement a skip connection.

![Resnet block](https://upload.wikimedia.org/wikipedia/commons/5/59/Resnet.png)

Example:
```python
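import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Simplified residual block (no batch normalization)."""

    def __init__(self, channels, kernel_size=3, padding=1):
        super().__init__()
        # define your layers here; keeping the number of channels and using
        # padding = kernel_size // 2 (odd kernel, stride 1) preserves the input shape
        self.conv1 = nn.Conv2d(channels, channels, kernel_size, padding=padding)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size, padding=padding)

    def forward(self, x):
        # save the input for the skip connection
        residual = x

        # apply a bunch of transformations
        out = F.relu(self.conv1(x))
        out = self.conv2(out)

        # skip/residual connection:
        # element-wise addition of two tensors with the same shape
        out += residual  # in-place version of out = out + residual

        # apply activation function
        out = F.relu(out)
        return out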
```
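
A standalone sanity check of the skip connection. The single `nn.Conv2d` layer (3 x 3 kernel, `padding=1`) and the input sizes are assumptions made for illustration; the point is that `out + residual` only works because the transformation keeps the input shape unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical layer: a 3x3 convolution with padding=1 keeps height and width unchanged
conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, padding=1)

x = torch.randn(2, 8, 16, 16)   # (batch, channels, height, width)
residual = x                    # saved input
out = conv(x)                   # same shape as x

# The skip connection is a plain element-wise addition
y = out + residual
print(out.shape)                # torch.Size([2, 8, 16, 16])
print(y.shape == x.shape)       # True
```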

---
# Acknowledgements

Most of the figures were taken from [Linear Algebra Done Right](https://linear.axler.net/) (Axler, 2024), which can be found [here](https://linear.axler.net/) or [here](https://link.springer.com/book/10.1007/978-3-031-41026-0). We thank the author for releasing the book in open access format, under the CC BY-NC-ND 4.0 license.

Some of the figures were also taken from [College Algebra 2e](https://openstax.org/books/college-algebra-2e/pages/7-5-matrices-and-matrix-operations) (OpenStax et al., 2024). We thank OpenStax for releasing the book under the CC BY 4.0 license.

In [1]:
# Ensures versions are correct
! pip install torch==2.3.0 numpy==1.25.2 pillow==9.4.0

import torch
import numpy as np
import PIL

print(f"Torch version: {torch.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"PIL version: {PIL.__version__}")
print(f"GPU enabled: {torch.cuda.is_available()}")

Torch version: 2.3.0+cu121
Numpy version: 1.25.2
PIL version: 9.4.0
GPU enabled: True


# Basic Matrix Operations

### Matrix addition

Matrix addition, or element-wise addition, can be defined by:

![Matrix addition](https://raw.githubusercontent.com/matheusfvesco/pytorch-studies/dev/assets/LADR4e/Matrix%20addition.png)

$$
\left(
\begin{array}{ccc}
A_{1,1} & \cdots & A_{1,n} \\
\vdots &  & \vdots \\
A_{m,1} & \cdots & A_{m,n}
\end{array}
\right) + \left(
\begin{array}{ccc}
C_{1,1} & \cdots & C_{1,n} \\
\vdots &  & \vdots \\
C_{m,1} & \cdots & C_{m,n}
\end{array}
\right) = \left(
\begin{array}{ccc}
A_{1,1}+C_{1,1} & \cdots & A_{1,n}+C_{1,n} \\
\vdots &  & \vdots \\
A_{m,1}+C_{m,1} & \cdots & A_{m,n}+C_{m,n}
\end{array}
\right)
$$
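
To make the definition concrete, here is a minimal sketch that applies it entry by entry with explicit loops (the helper name `add_matrices` is hypothetical) and compares the result with PyTorch's built-in `+`:

```python
import torch

def add_matrices(A, C):
    """Entry-by-entry matrix addition: (A + C)[j, k] = A[j, k] + C[j, k]."""
    m, n = A.shape
    assert C.shape == (m, n), "both matrices must have the same shape"
    result = torch.empty_like(A)
    for j in range(m):
        for k in range(n):
            result[j, k] = A[j, k] + C[j, k]
    return result

A = torch.tensor([[1, 2], [3, 4]])
C = torch.tensor([[5, 6], [7, 8]])
print(add_matrices(A, C))                      # values: [[6, 8], [10, 12]]
print(torch.equal(add_matrices(A, C), A + C))  # True
```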

In [2]:
import torch

a = torch.tensor([[1, 2],
                  [3, 4]])
c = torch.tensor([[5, 6],
                  [7, 8]])
addition = a + c

print("Matrix a: ")
print(a.shape)
print(a)
print()
print("Matrix c")
print(c.shape)
print(c)
print()
print("Result of a + c:")
print(addition.shape)
print(addition)

Matrix a: 
torch.Size([2, 2])
tensor([[1, 2],
        [3, 4]])

Matrix c
torch.Size([2, 2])
tensor([[5, 6],
        [7, 8]])

Result of a + c:
torch.Size([2, 2])
tensor([[ 6,  8],
        [10, 12]])


As we can see, matrix addition sums the entries of both matrices that sit in the same position.
In the example above, we can see that:

$$
a + c = \begin{bmatrix}
      1_{1, 1} & 2_{1, 2} \\[0.3em]
      3_{2, 1} & 4_{2, 2}
     \end{bmatrix} + \begin{bmatrix}
       5_{1, 1} & 6_{1, 2} \\[0.3em]
       7_{2, 1} & 8_{2, 2}
      \end{bmatrix} = \begin{bmatrix}
      6_{1, 1} & 8_{1, 2} \\[0.3em]
      10_{2, 1} & 12_{2, 2}
     \end{bmatrix}
$$

Or to help you visualize it:

$$
1_{1, 1} + 5_{1, 1} = 6_{1, 1}
$$
$$
2_{1, 2} + 6_{1, 2} = 8_{1, 2}
$$
$$
3_{2, 1} + 7_{2, 1} = 10_{2, 1}
$$
$$
4_{2, 2} +  8_{2, 2} = 12_{2, 2}
$$

But what happens when the tensors have different shapes, for example, when one of them is a vector?

In [3]:
a = torch.tensor([1, 2])
c = torch.tensor([[3, 4],
                  [5, 6]])
addition = a+c

print("Matrix a: ")
print(a.shape)
print(a)
print()
print("Matrix c")
print(c.shape)
print(c)
print()
print("Result of a + c:")
print(addition.shape)
print(addition)

Matrix a: 
torch.Size([2])
tensor([1, 2])

Matrix c
torch.Size([2, 2])
tensor([[3, 4],
        [5, 6]])

Result of a + c:
torch.Size([2, 2])
tensor([[4, 6],
        [6, 8]])


In the example above, we can see that:

$$
a + c = \begin{bmatrix}
      1_{1, 1} & 2_{1, 2} \\[0.3em]
     \end{bmatrix} + \begin{bmatrix}
       3_{1, 1} & 4_{1, 2} \\[0.3em]
       5_{2, 1} & 6_{2, 2}
      \end{bmatrix} = \begin{bmatrix}
      4_{1, 1} & 6_{1, 2} \\[0.3em]
      6_{2, 1} & 8_{2, 2}
     \end{bmatrix}
$$

Or to help you visualize it:

$$
1_{1, 1} + 3_{1, 1} = 4_{1, 1}
$$
$$
2_{1, 2} + 4_{1, 2} = 6_{1, 2}
$$
$$
1_{1, 1} + 5_{2, 1} = 6_{2, 1}
$$
$$
2_{1, 2} +  6_{2, 2} = 8_{2, 2}
$$

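In this case, vector $a$ is added to each row of matrix $c$. This behavior is called **broadcasting**: PyTorch treats $a$, of shape $(2,)$, as if it had been repeated along a new first dimension to match the $(2, 2)$ shape of $c$.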
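
Broadcasting behaves as if the smaller tensor were copied along the missing dimension. The sketch below makes that copy explicit with `Tensor.expand` (which returns a view without copying data) and checks that the result matches `a + c`:

```python
import torch

a = torch.tensor([1, 2])            # shape (2,)
c = torch.tensor([[3, 4],
                  [5, 6]])          # shape (2, 2)

# Explicitly repeat `a` once per row of `c`: shape (2,) -> (2, 2)
a_expanded = a.expand(2, 2)         # values: [[1, 2], [1, 2]]
print(a_expanded)
print(torch.equal(a + c, a_expanded + c))  # True
```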

What if the shapes are truly incompatible?

In [4]:
a = torch.tensor([1, 2, 3])
c = torch.tensor([[4, 5],
                  [6, 7]])
addition = a + c

print("Matrix a: ")
print(a.shape)
print(a)
print()
print("Matrix c")
print(c.shape)
print(c)
print()
print("Result of a + c:")
print(addition.shape)
print(addition)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

#### Scalar addition

What if, instead, we want to add a scalar to a tensor?

In [5]:
a = torch.tensor([[1, 2],
                  [3, 4]])
addition = a + 10

print("Matrix a: ")
print(a.shape)
print(a)
print()
print("Result of a + 10:")
print(addition.shape)
print(addition)

Matrix a: 
torch.Size([2, 2])
tensor([[1, 2],
        [3, 4]])

Result of a + 10:
torch.Size([2, 2])
tensor([[11, 12],
        [13, 14]])


In the example above, we can see that:

$$
a + 10 = \begin{bmatrix}
      1_{1, 1} & 2_{1, 2} \\[0.3em]
      3_{2, 1} & 4_{2, 2}
     \end{bmatrix} + 10 = \begin{bmatrix}
      1_{1, 1} + 10 & 2_{1, 2} + 10 \\[0.3em]
      3_{2, 1} + 10 & 4_{2, 2} + 10
     \end{bmatrix} = \begin{bmatrix}
      11_{1, 1} & 12_{1, 2} \\[0.3em]
      13_{2, 1} & 14_{2, 2}
     \end{bmatrix}
$$
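
A scalar is broadcast to every entry. One way to see this (a sketch, not a description of how PyTorch implements it internally) is to build a tensor filled with the scalar using `torch.full_like` and add it explicitly:

```python
import torch

a = torch.tensor([[1, 2],
                  [3, 4]])

# A tensor with the same shape (and dtype) as `a`, filled with 10
filled = torch.full_like(a, 10)       # values: [[10, 10], [10, 10]]

print(torch.equal(a + 10, a + filled))  # True
```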

#### Vector addition

Adding vectors to matrices is another common operation in deep learning, for example when adding a bias vector. We already saw what happens when one of the tensors is a vector; let's look at a more practical example.

We will try to add a column vector of shape $(m, 1)$ to a matrix of shape $(m, n)$:

In [33]:
import torch
torch.manual_seed(42)
a = torch.randint(
    low=1,
    high=5,
    size=(5,3)
)
b = torch.randint(
    low=1,
    high=3,
    size=(5, 1)
)

In [34]:
print(a)
b

tensor([[3, 4, 1],
        [3, 3, 4],
        [1, 1, 3],
        [2, 3, 3],
        [3, 3, 4]])


tensor([[1],
        [2],
        [2],
        [2],
        [1]])

In [35]:
a + b

tensor([[4, 5, 2],
        [5, 5, 6],
        [3, 3, 5],
        [4, 5, 5],
        [4, 4, 5]])

In the example above, we can see that:

$$
a + b = \begin{bmatrix}
      a_{1, 1} & a_{1, 2} \\[0.3em]
      a_{2, 1} & a_{2, 2}
     \end{bmatrix} + \begin{bmatrix}
      b_{1, 1} \\[0.3em]
      b_{2, 1}
     \end{bmatrix} = \begin{bmatrix}
      a_{1, 1}+b_{1, 1} & a_{1, 2} + b_{1, 1} \\[0.3em]
      a_{2, 1}+b_{2, 1} & a_{2, 2} + b_{2, 1}
     \end{bmatrix}
$$

In other words, the column vector $b$ is added to each column of $a$.
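
The same result can be obtained by explicitly copying the column vector once per column of the matrix. The small tensors below are made up for illustration (they reuse the first two rows of the example above); `Tensor.expand` turns the $(m, 1)$ vector into an $(m, n)$ matrix:

```python
import torch

a = torch.tensor([[3, 4, 1],
                  [3, 3, 4]])       # shape (2, 3)
b = torch.tensor([[1],
                  [2]])             # shape (2, 1)

# Repeat the single column of b three times: (2, 1) -> (2, 3)
b_expanded = b.expand(2, 3)         # values: [[1, 1, 1], [2, 2, 2]]
print(a + b)                        # values: [[4, 5, 2], [5, 5, 6]]
print(torch.equal(a + b, a + b_expanded))  # True
```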

Now let's try adding a row vector of shape $(1, n)$:

In [36]:
c = torch.randint(
    low=1,
    high=3,
    size=(1, 3)
)

In [37]:
print(a)
c

tensor([[3, 4, 1],
        [3, 3, 4],
        [1, 1, 3],
        [2, 3, 3],
        [3, 3, 4]])


tensor([[2, 1, 2]])

In [38]:
a + c

tensor([[5, 5, 3],
        [5, 4, 6],
        [3, 2, 5],
        [4, 4, 5],
        [5, 4, 6]])

In the example above, we can see that:

$$
a + c = \begin{bmatrix}
      a_{1, 1} & a_{1, 2} \\[0.3em]
      a_{2, 1} & a_{2, 2}
     \end{bmatrix} + \begin{bmatrix}
      c_{1, 1} & c_{1, 2} \\[0.3em]
     \end{bmatrix} = \begin{bmatrix}
      a_{1, 1}+c_{1, 1} & a_{1, 2} + c_{1, 2} \\[0.3em]
      a_{2, 1}+c_{1, 1} & a_{2, 2} + c_{1, 2}
     \end{bmatrix}
$$

In other words, the row vector $c$ is added to each row of $a$.
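
Here the explicit copy is made with `Tensor.repeat`, which, unlike `expand`, actually allocates the repeated data. Again, the small tensors are made up for illustration and reuse the first two rows of the example above:

```python
import torch

a = torch.tensor([[3, 4, 1],
                  [3, 3, 4]])       # shape (2, 3)
c = torch.tensor([[2, 1, 2]])       # shape (1, 3)

# Copy the single row of c once per row of a: (1, 3) -> (2, 3)
c_repeated = c.repeat(2, 1)         # values: [[2, 1, 2], [2, 1, 2]]
print(a + c)                        # values: [[5, 5, 3], [5, 4, 6]]
print(torch.equal(a + c, a + c_repeated))  # True
```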

What happens if the dimensions are "incompatible", for example, when adding a column vector of shape $(p, 1)$, with $p \neq m$, to a matrix of shape $(m, n)$?

In [40]:
import torch
torch.manual_seed(42)
a = torch.randint(
    low=1,
    high=5,
    size=(5,3)
)
b = torch.randint(
    low=1,
    high=3,
    size=(6, 1)
)

In [41]:
print(a)
b

tensor([[3, 4, 1],
        [3, 3, 4],
        [1, 1, 3],
        [2, 3, 3],
        [3, 3, 4]])


tensor([[1],
        [2],
        [2],
        [2],
        [1],
        [2]])

In [42]:
a + b

RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 0

Or if we add a row vector of shape $(1, p)$, with $p \neq n$, to a matrix of shape $(m, n)$:

In [43]:
c = torch.randint(
    low=1,
    high=3,
    size=(1, 6)
)

In [44]:
print(a)
c

tensor([[3, 4, 1],
        [3, 3, 4],
        [1, 1, 3],
        [2, 3, 3],
        [3, 3, 4]])


tensor([[1, 2, 2, 2, 2, 2]])

In [45]:
a + c

RuntimeError: The size of tensor a (3) must match the size of tensor b (6) at non-singleton dimension 1

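As we can see, the shapes must be compatible: compared dimension by dimension, starting from the last one, the sizes must either be equal or one of them must be 1 (in which case that dimension is broadcast).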
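
The rule PyTorch applies (the same one NumPy uses) compares shapes from the last dimension backwards: two sizes are compatible if they are equal or if one of them is 1, and a missing leading dimension counts as 1. Below is a minimal sketch of that rule; the helper `broadcast_shape` is hypothetical, and its answers are printed next to PyTorch's own `torch.broadcast_shapes` for the shapes used in this section:

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape as a tuple, or None if the shapes are incompatible."""
    result = []
    # Walk both shapes from the last dimension backwards
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # missing dims count as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            return None  # sizes differ and neither is 1
    return tuple(reversed(result))

examples = [((2,), (2, 2)), ((5, 3), (5, 1)), ((5, 3), (1, 3)),
            ((3,), (2, 2)), ((5, 3), (6, 1)), ((5, 3), (1, 6))]
for sa, sb in examples:
    try:
        expected = tuple(torch.broadcast_shapes(sa, sb))
    except RuntimeError:
        expected = None
    print(sa, sb, "->", broadcast_shape(sa, sb), expected)
```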

### Matrix Multiplication

There are multiple different ways to multiply tensors or matrices. A few of them are:
 - Element-wise (Hadamard) multiplication
 - Matrix multiplication

#### Matrix multiplication definition

Matrix multiplication is given by:

![Matrix multiplication](https://raw.githubusercontent.com/matheusfvesco/pytorch-studies/dev/assets/LADR4e/Matrix%20multiplication.png)
![Matrix multiplication 2](https://raw.githubusercontent.com/matheusfvesco/pytorch-studies/dev/assets/LADR4e/Matrix%20multiplication%202.png)
![Matrix multiplication 3](https://raw.githubusercontent.com/matheusfvesco/pytorch-studies/dev/assets/LADR4e/Matrix%20multiplication%203.png)

You might remember this as the row $\times$ column multiplication.

A few properties:
 1. **Resulting tensor shape**: Considering tensors $A$ and $B$ of shape ($m \times n$) and ($n \times p$), the product $AB$ is a tensor of shape ($m \times p$). In other words, the result has as many rows as the first tensor and as many columns as the second tensor.
 
 $$
 A_{m,n} \times B_{n,p} = AB_{m,p}
 $$
 
 2. **Internal dimensions "rule"**: Matrix multiplication can only be done if the number of columns of the first tensor is equal to the number of rows of the second tensor. In the example above, $m$ and $p$ can take any value, as long as $n$ is both the number of columns of the first tensor and the number of rows of the second tensor.

 $$
 A_{m,n_A} \times B_{n_B,p} \text{ is defined only if } n_A = n_B
 $$
 
 3. **Non-commutative**: Matrix multiplication is not commutative. This means that $AB$ is, in general, not equal to $BA$, and $BA$ may not even be defined. In other words, the order of the tensors matters.

 4. **Associative**: Matrix multiplication is [associative](https://www.youtube.com/watch?v=8Ryfe82DTcM). This means that: 

 $$
 (A B) C = A (B C)
 $$

 5. **Distributive**: Matrix multiplication is [distributive](https://www.varsitytutors.com/hotmath/hotmath_help/topics/distributive-property-of-matrices). This means that:
 
 $$
 A (B + C) = AB + AC
 $$
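
The properties above can be checked numerically. A minimal sketch, assuming small random integer matrices (so the arithmetic is exact) and a pair of hand-picked $2 \times 2$ matrices `P` and `Q` for non-commutativity:

```python
import torch

torch.manual_seed(0)
# Small integer matrices keep the arithmetic exact
A = torch.randint(0, 5, (2, 3))
B = torch.randint(0, 5, (3, 4))
C = torch.randint(0, 5, (4, 2))
D = torch.randint(0, 5, (3, 4))   # same shape as B, for distributivity

# 1. Resulting shape: (2 x 3) @ (3 x 4) -> (2 x 4)
print((A @ B).shape)                                # torch.Size([2, 4])

# 2./3. B @ A is not even defined here: B has 4 columns, A has 2 rows
try:
    B @ A
except RuntimeError as err:
    print("B @ A failed:", err)

# 3. Even for square matrices, the order matters
P = torch.tensor([[0, 1], [0, 0]])
Q = torch.tensor([[0, 0], [1, 0]])
print(torch.equal(P @ Q, Q @ P))                    # False

# 4. Associative: (AB)C == A(BC)
print(torch.equal((A @ B) @ C, A @ (B @ C)))        # True

# 5. Distributive: A(B + D) == AB + AD
print(torch.equal(A @ (B + D), A @ B + A @ D))      # True
```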



For two $2 \times 2$ matrices, matrix multiplication can be visualized as follows:
$$
A = \begin{bmatrix}
a & b \\[0.3em]
c & d
\end{bmatrix},
B = \begin{bmatrix}
e & f \\[0.3em]
g & h
\end{bmatrix}
$$
The matrix multiplication (AB) of A and B is defined as:
$$
AB = \begin{bmatrix}
ae + bg & af + bh \\[0.3em]
ce + dg & cf + dh
\end{bmatrix}
$$
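
The row $\times$ column rule translates directly into three nested loops. This is a minimal sketch for illustration only (the helper `matmul_naive` is hypothetical and far slower than `@`), using $a=1, b=2, c=3, d=4$ and $e=5, f=6, g=7, h=8$:

```python
import torch

def matmul_naive(A, B):
    """Row-by-column product: (AB)[i, j] = sum over k of A[i, k] * B[k, j]."""
    m, n = A.shape
    n_b, p = B.shape
    assert n == n_b, "columns of A must equal rows of B"
    result = torch.zeros(m, p, dtype=A.dtype)
    for i in range(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]
    return result

A = torch.tensor([[1, 2], [3, 4]])   # a=1, b=2, c=3, d=4
B = torch.tensor([[5, 6], [7, 8]])   # e=5, f=6, g=7, h=8
print(matmul_naive(A, B))            # values: [[19, 22], [43, 50]]
print(torch.equal(matmul_naive(A, B), A @ B))  # True
```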

As a numerical example, consider the following matrices:
$$
A = \begin{bmatrix}
1 & 2 \\[0.3em]
3 & 4 \\[0.3em]
5 & 6
\end{bmatrix},
B = \begin{bmatrix}
7 & 8 & 9 & 10 \\[0.3em]
11 & 12 & 13 & 14
\end{bmatrix}
$$

Their product is:
$$
A \times B = \begin{bmatrix}
      \textcolor{orange}{1} & \textcolor{orange}{2} \\[0.3em]
      \textcolor{orange}{3} & \textcolor{orange}{4} \\[0.3em]
      \textcolor{orange}{5} & \textcolor{orange}{6}
      \end{bmatrix} \times \begin{bmatrix}
      \textcolor{blue}{7} & \textcolor{blue}{8} & \textcolor{blue}{9} & \textcolor{blue}{10} \\[0.3em]
      \textcolor{blue}{11} & \textcolor{blue}{12} & \textcolor{blue}{13} & \textcolor{blue}{14}
      \end{bmatrix} = \begin{bmatrix}
      \textcolor{orange}{1}*\textcolor{blue}{7} + \textcolor{orange}{2}*\textcolor{blue}{11} & \textcolor{orange}{1}*\textcolor{blue}{8} + \textcolor{orange}{2}*\textcolor{blue}{12} & \textcolor{orange}{1}*\textcolor{blue}{9} + \textcolor{orange}{2}*\textcolor{blue}{13} & \textcolor{orange}{1}*\textcolor{blue}{10} + \textcolor{orange}{2}*\textcolor{blue}{14} \\[0.3em]
      \textcolor{orange}{3}*\textcolor{blue}{7} + \textcolor{orange}{4}*\textcolor{blue}{11} & \textcolor{orange}{3}*\textcolor{blue}{8} + \textcolor{orange}{4}*\textcolor{blue}{12} & \textcolor{orange}{3}*\textcolor{blue}{9} + \textcolor{orange}{4}*\textcolor{blue}{13} & \textcolor{orange}{3}*\textcolor{blue}{10} + \textcolor{orange}{4}*\textcolor{blue}{14} \\[0.3em]
      \textcolor{orange}{5}*\textcolor{blue}{7} + \textcolor{orange}{6}*\textcolor{blue}{11} & \textcolor{orange}{5}*\textcolor{blue}{8} + \textcolor{orange}{6}*\textcolor{blue}{12} & \textcolor{orange}{5}*\textcolor{blue}{9} + \textcolor{orange}{6}*\textcolor{blue}{13} & \textcolor{orange}{5}*\textcolor{blue}{10} + \textcolor{orange}{6}*\textcolor{blue}{14}
      \end{bmatrix} \\[0.8em]
      = \begin{bmatrix}
      29 & 32 & 35 & 38 \\
      65 & 72 & 79 & 86 \\
      101 & 112 & 123 & 134
      \end{bmatrix}
$$
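
The hand computation above can be double-checked with PyTorch, using the same matrices:

```python
import torch

A = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])               # 3 x 2
B = torch.tensor([[7, 8, 9, 10],
                  [11, 12, 13, 14]])     # 2 x 4

expected = torch.tensor([[29, 32, 35, 38],
                         [65, 72, 79, 86],
                         [101, 112, 123, 134]])
print((A @ B).shape)                  # torch.Size([3, 4])
print(torch.equal(A @ B, expected))   # True
```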

You can check this [interactive visualizer](http://matrixmultiplication.xyz/) or see this [blog post](https://pytorch.org/blog/inside-the-matrix/) to help you visualize how matrix multiplication works.

#### Matrix and Tensor Multiplication in PyTorch

In PyTorch, matrix multiplication can be achieved by:

```python
import torch

# a and b have shapes (m x n) and (n x p), respectively; here m=3, n=2, p=4
a = torch.rand(3, 2)
b = torch.rand(2, 4)

torch.matmul(a, b)  # supports broadcasting (and 1-D or batched inputs)
a @ b               # same as torch.matmul(a, b)

torch.mm(a, b)      # 2-D matrices only, no broadcasting
```
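
A short sketch of where the two functions differ: `torch.mm` only accepts 2-D inputs, while `torch.matmul` also handles 1-D vectors and batches of matrices (3-D and higher). The shapes below are arbitrary:

```python
import torch

a = torch.ones(3, 2)
b = torch.ones(2, 4)
v = torch.ones(2)            # 1-D vector
batch = torch.ones(5, 3, 2)  # a batch of five 3 x 2 matrices

print(torch.mm(a, b).shape)          # torch.Size([3, 4])
print(torch.matmul(a, v).shape)      # torch.Size([3]): v is treated as a column
print(torch.matmul(batch, b).shape)  # torch.Size([5, 3, 4]): b is reused for each matrix

try:
    torch.mm(batch, b)               # torch.mm rejects 3-D inputs
except RuntimeError as err:
    print("torch.mm failed:", err)
```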

In [6]:
import torch

# 3x2
a = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])
# 2x1
b = torch.tensor([[5],
                  [1]])
print(a.shape)
print(b.shape)
multiplication = a @ b
print()

# 3x1, as expected
print(multiplication.shape)
print(multiplication)

# checks that @ gives the same result as torch.matmul (and, for 2-D inputs, torch.mm)
print(torch.all(torch.matmul(a, b) == a @ b))
print(torch.all(torch.mm(a, b) == a @ b))

torch.Size([3, 2])
torch.Size([2, 1])

torch.Size([3, 1])
tensor([[ 7],
        [19],
        [31]])
tensor(True)
tensor(True)


In [7]:
import torch

a = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])
b = torch.tensor([[5],
                  [1]])
c = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])
d = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]])

print(a.shape)
print(b.shape)
print(c.shape)
print(d.shape)

print()

# 3x2 and 2x1
print(torch.mm(a, b))
# same as above (in this case)
print(torch.matmul(a, b))

print()

# 3x2 and 3x2: element-wise (Hadamard) product
print(torch.mul(a, c))
# shorthand for above
print(a*c)

print()

# 3x2 and 2x3
print(torch.matmul(a, d))
print(torch.mm(a, d))

torch.Size([3, 2])
torch.Size([2, 1])
torch.Size([3, 2])
torch.Size([2, 3])

tensor([[ 7],
        [19],
        [31]])
tensor([[ 7],
        [19],
        [31]])

tensor([[ 1,  4],
        [ 9, 16],
        [25, 36]])
tensor([[ 1,  4],
        [ 9, 16],
        [25, 36]])

tensor([[ 9, 12, 15],
        [19, 26, 33],
        [29, 40, 51]])
tensor([[ 9, 12, 15],
        [19, 26, 33],
        [29, 40, 51]])


Great, we verified all of these work as expected. You should play around and test with more tensors.

#### When Matrix Multiplication Fails

Let's now check what happens when we multiply tensors with:
 ##### 1. Different internal dimensions

 If we multiply tensors $A$ $(m_A \times n_A)$ and $C$ $(n_C \times p_C)$ whose internal dimensions are not equal $(n_A \neq n_C)$:

In [8]:
# 3x2 and 3x2
print(torch.matmul(a, c))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [3]:
# 3x2 and 3x2
print(torch.mm(a, c))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

##### 2. Elementwise (Hadamard) multiplication of Tensors with different dimensions

If we use element-wise multiplication $A \odot B$ on tensors $A_{m,n}$ and $B_{p,q}$ whose shapes differ $(m \neq p \ \text{or} \ n \neq q)$ and cannot be broadcast to a common shape:

In [4]:
# 3x2 and 2x1
print(a * b)

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0

In [5]:
# 3x2 and 2x1
print(torch.mul(a, b))

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0
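
Note that the shapes do not have to be identical, only broadcast-compatible. In the sketch below, `b` is transposed with `.T` (a choice made only for illustration) so that it becomes a row vector of shape $(1, 2)$, which is then broadcast across the three rows of `a`:

```python
import torch

a = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])          # 3 x 2
b = torch.tensor([[5],
                  [1]])             # 2 x 1

row = b.T                           # 1 x 2, values: [[5, 1]]
print(a * row)                      # values: [[5, 2], [15, 4], [25, 6]]
```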

# Tensor Basic Operations

Until now we have worked with *2-dimensional matrices*. But in this section we are going to see that things get a little trickier as we add more dimensions and move to math operations using **n-dimensional matrices**, also known as **tensors**.

## Tensor Addition/Subtraction and Hadamard Product

Let's start with the basics.

### Tensor Addition/Subtraction and Multiplication with a Scalar

In [104]:
import torch
torch.manual_seed(42)
a = torch.randint(
    low=1,
    high=3,
    size=(2, 3, 4)
)
print(a)
a + 5

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])


tensor([[[6, 7, 6, 6],
         [6, 7, 6, 6],
         [6, 7, 6, 6]],

        [[6, 6, 7, 6],
         [7, 7, 7, 6],
         [7, 6, 7, 7]]])

In [105]:
print(a)
a - 5

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])


tensor([[[-4, -3, -4, -4],
         [-4, -3, -4, -4],
         [-4, -3, -4, -4]],

        [[-4, -4, -3, -4],
         [-3, -3, -3, -4],
         [-3, -4, -3, -3]]])

In [106]:
print(a)
a * 2

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])


tensor([[[2, 4, 2, 2],
         [2, 4, 2, 2],
         [2, 4, 2, 2]],

        [[2, 2, 4, 2],
         [4, 4, 4, 2],
         [4, 2, 4, 4]]])

As we can see, the scalar is applied to every element, just like in the 2-dimensional case. We encourage you to experiment with other tensor shapes and scalars:

In [107]:
# Test out some operations of n-dimensional tensors with scalars
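
For instance, one possible experiment (the shape and scalars below are arbitrary) is to check that division, exponentiation, and chained operations are also applied to each element of a 3-D tensor:

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)   # values 0..23 in a 2 x 3 x 4 tensor

halved = t / 2          # true division returns a float tensor
squared = t ** 2
combined = 3 * t - 1

# Every result keeps the original shape
print(halved.shape, squared.shape, combined.shape)
# Spot-check one element: t[1, 2, 3] == 23
print(halved[1, 2, 3].item(), squared[1, 2, 3].item(), combined[1, 2, 3].item())  # 11.5 529 68
```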

### Tensor Element-wise addition/subtraction and Hadamard product

Now let's try simple tensor-to-tensor operations between two tensors of the same shape:

In [115]:
torch.manual_seed(42)
a = torch.randint(
    low=1,
    high=3,
    size=(2, 3, 4)
)
b = torch.randint(
    low=4,
    high=6,
    size=(2, 3, 4)
)
print(a)
print(b)

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])
tensor([[[5, 5, 5, 5],
         [5, 5, 4, 4],
         [5, 5, 5, 4]],

        [[5, 4, 4, 4],
         [4, 4, 5, 5],
         [5, 5, 5, 4]]])


In [116]:
print(a)
print(b)
a + b

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])
tensor([[[5, 5, 5, 5],
         [5, 5, 4, 4],
         [5, 5, 5, 4]],

        [[5, 4, 4, 4],
         [4, 4, 5, 5],
         [5, 5, 5, 4]]])


tensor([[[6, 7, 6, 6],
         [6, 7, 5, 5],
         [6, 7, 6, 5]],

        [[6, 5, 6, 5],
         [6, 6, 7, 6],
         [7, 6, 7, 6]]])

In [117]:
print(a)
print(b)

a - b

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])
tensor([[[5, 5, 5, 5],
         [5, 5, 4, 4],
         [5, 5, 5, 4]],

        [[5, 4, 4, 4],
         [4, 4, 5, 5],
         [5, 5, 5, 4]]])


tensor([[[-4, -3, -4, -4],
         [-4, -3, -3, -3],
         [-4, -3, -4, -3]],

        [[-4, -3, -2, -3],
         [-2, -2, -3, -4],
         [-3, -4, -3, -2]]])

In [118]:
print(a)
print(b)

a * b

tensor([[[1, 2, 1, 1],
         [1, 2, 1, 1],
         [1, 2, 1, 1]],

        [[1, 1, 2, 1],
         [2, 2, 2, 1],
         [2, 1, 2, 2]]])
tensor([[[5, 5, 5, 5],
         [5, 5, 4, 4],
         [5, 5, 5, 4]],

        [[5, 4, 4, 4],
         [4, 4, 5, 5],
         [5, 5, 5, 4]]])


tensor([[[ 5, 10,  5,  5],
         [ 5, 10,  4,  4],
         [ 5, 10,  5,  4]],

        [[ 5,  4,  8,  4],
         [ 8,  8, 10,  5],
         [10,  5, 10,  8]]])

As expected, everything works just like it did with 2-dimensional matrices. From here on, things get a little trickier. Thankfully, the operations we have already seen make up the bulk of the math you will find when reading deep learning code.
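
One way to see why nothing new happens: an element-wise operation on a 3-D tensor gives the same result as applying it to each 2-D slice separately and stacking the results. A minimal sketch, with shapes chosen to match the example above and random values:

```python
import torch

torch.manual_seed(0)
a = torch.randint(1, 3, (2, 3, 4))
b = torch.randint(4, 6, (2, 3, 4))

# Apply the 2-D operation to each slice along the first dimension, then stack
per_slice = torch.stack([a[i] * b[i] for i in range(a.shape[0])])

print(torch.equal(a * b, per_slice))   # True
```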

You will encounter the tensor dot product (batched matrix multiplication) in a lot of situations, especially when dealing with attention. We will cover it briefly in the next section, and we encourage you to experiment and try to figure out the patterns by yourself.

## Tensor Dot Product

Until now, the only multiplication that was not simply element-wise was the product of 2-dimensional matrices. What if we want to multiply n-dimensional matrices (a.k.a. tensors)? In this section we will explore tensor products a little.

In [9]:
torch.manual_seed(42)
a = torch.randint(
    low=1,
    high=3,
    size=(2, 2, 3)
)
a

tensor([[[1, 2, 1],
         [1, 1, 2]],

        [[1, 1, 1],
         [2, 1, 1]]])

In [120]:
torch.manual_seed(42)
b = torch.randint(
    low=5,
    high=10,
    size=(2, 2, 3)
)
b

tensor([[[7, 7, 6],
         [9, 6, 5]],

        [[5, 9, 5],
         [8, 8, 9]]])

In [122]:
print(a.shape)
print(b.shape)
c = a @ b
c

torch.Size([2, 2, 3])
torch.Size([2, 2, 3])


RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [2, 3] but got: [2, 2].

Let's create a new `b` with shape $(2, 3, 2)$, so that its second-to-last dimension matches the last dimension of `a` (3), and try again:

In [39]:
b = torch.randint(
    low=5,
    high=10,
    size=(2, 3, 2)
)
b

tensor([[[9, 8],
         [6, 9],
         [7, 9]],

        [[7, 5],
         [5, 9],
         [8, 9]]])

In [40]:
print(a.shape)
print(b.shape)
c = a @ b
c

torch.Size([2, 2, 3])
torch.Size([2, 3, 2])


tensor([[[28, 35],
         [29, 35]],

        [[20, 23],
         [27, 28]]])

In [41]:
c.shape

torch.Size([2, 2, 2])

In [42]:
b = torch.randint(
    low=5,
    high=10,
    size=(2, 3, 3)
)
b

tensor([[[9, 6, 7],
         [5, 6, 7],
         [7, 9, 7]],

        [[8, 8, 9],
         [8, 7, 5],
         [9, 5, 9]]])

In [43]:
print(a.shape)
print(b.shape)
c = a @ b
c

torch.Size([2, 2, 3])
torch.Size([2, 3, 3])


tensor([[[26, 27, 28],
         [28, 30, 28]],

        [[25, 20, 23],
         [33, 28, 32]]])

In [44]:
c.shape

torch.Size([2, 2, 3])
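
For 3-D inputs, `a @ b` performs one ordinary matrix multiplication per index of the first (batch) dimension. The sketch below checks this by looping over the batch; `torch.bmm`, PyTorch's batch matrix multiply for 3-D tensors, gives the same result. The shapes match the last example, and the values are random:

```python
import torch

torch.manual_seed(0)
a = torch.randint(1, 3, (2, 2, 3))
b = torch.randint(5, 10, (2, 3, 3))

# One 2-D matrix product per batch index, stacked back together
looped = torch.stack([a[i] @ b[i] for i in range(a.shape[0])])

print(torch.equal(a @ b, looped))            # True
print(torch.equal(torch.bmm(a, b), looped))  # True
```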

### Exploring the resulting shape

In [54]:
a = torch.randint(
    low=5,
    high=10,
    size=(6, 7, 9, 8)
)
b = torch.randint(
    low=5,
    high=10,
    size=(6, 7, 8, 9)
)
c = a @ b
print(c.shape)
c

torch.Size([6, 7, 9, 9])


tensor([[[[362, 364, 365,  ..., 432, 350, 394],
          [404, 407, 391,  ..., 474, 390, 432],
          [365, 374, 361,  ..., 431, 356, 399],
          ...,
          [350, 363, 351,  ..., 413, 346, 374],
          [324, 326, 327,  ..., 384, 313, 355],
          [327, 333, 331,  ..., 378, 318, 343]],

         [[442, 419, 441,  ..., 491, 449, 422],
          [362, 346, 368,  ..., 409, 362, 349],
          [401, 386, 383,  ..., 447, 390, 390],
          ...,
          [373, 339, 371,  ..., 410, 365, 353],
          [457, 428, 459,  ..., 514, 453, 438],
          [419, 391, 408,  ..., 459, 420, 395]],

         [[453, 426, 402,  ..., 425, 424, 469],
          [422, 388, 372,  ..., 396, 388, 431],
          [422, 385, 365,  ..., 392, 382, 424],
          ...,
          [384, 362, 348,  ..., 359, 369, 398],
          [427, 391, 372,  ..., 399, 394, 438],
          [413, 382, 362,  ..., 380, 382, 432]],

         ...,

         [[308, 353, 402,  ..., 364, 401, 388],
          [325, 372, 4

In [72]:
a = torch.randint(
    low=5,
    high=10,
    size=(6, 7, 8, 8, 10)
)
b = torch.randint(
    low=5,
    high=10,
    size=(6, 7, 8, 10, 11)
)
c = a @ b
print(c.shape)
c

torch.Size([6, 7, 8, 8, 11])


tensor([[[[[474, 503, 483,  ..., 443, 517, 476],
           [497, 548, 519,  ..., 471, 554, 516],
           [503, 553, 534,  ..., 489, 554, 509],
           ...,
           [480, 528, 496,  ..., 454, 532, 492],
           [493, 522, 503,  ..., 474, 542, 503],
           [511, 566, 531,  ..., 486, 553, 511]],

          [[474, 506, 519,  ..., 448, 463, 525],
           [457, 504, 502,  ..., 433, 472, 507],
           [440, 486, 490,  ..., 416, 440, 492],
           ...,
           [450, 494, 491,  ..., 431, 452, 501],
           [486, 533, 517,  ..., 458, 494, 539],
           [462, 490, 483,  ..., 435, 469, 502]],

          [[482, 503, 514,  ..., 537, 460, 520],
           [464, 491, 508,  ..., 541, 464, 506],
           [446, 456, 469,  ..., 495, 422, 473],
           ...,
           [458, 477, 496,  ..., 513, 434, 501],
           [430, 441, 457,  ..., 487, 414, 476],
           [442, 472, 480,  ..., 495, 426, 476]],

          ...,

          [[504, 523, 468,  ..., 519, 453, 474],

In [74]:
a = torch.randint(
    low=5,
    high=10,
    size=(5, 3, 4)
)
b = torch.randint(
    low=5,
    high=10,
    size=(5, 4, 7)
)
c = a @ b
print(c.shape)
c

torch.Size([5, 3, 7])


tensor([[[165, 151, 177, 170, 182, 177, 150],
         [167, 151, 176, 171, 181, 175, 151],
         [216, 193, 220, 215, 235, 223, 192]],

        [[224, 228, 200, 168, 237, 184, 208],
         [184, 182, 169, 135, 200, 149, 162],
         [244, 244, 217, 184, 262, 201, 229]],

        [[182, 168, 162, 174, 193, 180, 173],
         [245, 228, 219, 238, 264, 248, 240],
         [229, 216, 207, 227, 250, 235, 228]],

        [[229, 225, 174, 171, 193, 188, 222],
         [205, 197, 177, 165, 174, 188, 201],
         [241, 237, 196, 190, 209, 206, 240]],

        [[219, 160, 223, 174, 253, 212, 184],
         [205, 155, 214, 171, 253, 213, 182],
         [165, 125, 171, 137, 203, 170, 146]]])

Considering tensors of shapes:

$$A: (m_1, m_2, ..., m_{n-1}, m_n),$$
$$\ B: (p_1, p_2, ...,p_{n-1}, p_n)$$

For the tensor multiplication to be valid, both of the following criteria must be satisfied:
1. *All dimensions preceding $m_{n-1}$ (the batch dimensions) must match all the dimensions preceding $p_{n-1}$*:

$$[m_1, m_2, ..., m_{n-2}] = [p_1, p_2, ..., p_{n-2}]$$

Strictly speaking, `torch.matmul` broadcasts these batch dimensions, so a batch dimension of size 1 is also accepted.

2. The last dimension of the first tensor ($m_n$) must match the second-to-last dimension of the second tensor ($p_{n-1}$):
$$m_n = p_{n-1}$$

The resulting tensor dimensions will be:
$$C: (m_1, m_2, ..., m_{n-1}, p_{n})$$
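
A quick check of these rules using only shapes (the specific sizes are arbitrary). The second case shows that `torch.matmul` also broadcasts the batch dimensions, so a size-1 or missing batch dimension is accepted:

```python
import torch

a = torch.ones(6, 7, 3, 8)
b = torch.ones(6, 7, 8, 9)
print((a @ b).shape)       # torch.Size([6, 7, 3, 9]), i.e. (m_1, m_2, m_3, p_4)

# Batch dimensions (6, 1) and (7,) broadcast to (6, 7)
a2 = torch.ones(6, 1, 3, 8)
b2 = torch.ones(7, 8, 9)
print((a2 @ b2).shape)     # torch.Size([6, 7, 3, 9])

# Rule 2 violated: m_n = 8 but p_{n-1} = 5
try:
    torch.ones(6, 7, 3, 8) @ torch.ones(6, 7, 5, 9)
except RuntimeError as err:
    print("failed:", err)
```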

# Quick summary

## Basic elementwise operations

Considering $a$ a tensor of dimensions $(m, n)$, and $b$ a scalar or a tensor with the same dimensions as $a$ (or, more generally, a shape that can be broadcast to it), all of the following operations are possible:

- a + b
- a - b
- a * b $\rightarrow$ Hadamard product
- a / b

## Matrix and Tensor dot product

### For bi-dimensional tensors (matrices):

Considering $a$ a **bi-dimensional** tensor of dimensions $(m, n)$ and $b$ a tensor of dimensions $(n, p)$, the following criterion must hold for the dot product to be valid:

- The number of columns in $a$ must be equal to the number of rows in $b$, meaning $\rightarrow$ $n_a = n_b$

The following are expected:

- The resulting tensor $c$ will have the dimensions $(m, p)$
- $a \times b$ might not be equal to $b \times a$