## 2.3. Linear Algebra

*Studying and coding along with the printed book __„Dive into Deep Learning“__ by Aston Zhang, Zachary C. Lipton, Mu Li & Alexander J. Smola. The accompanying website for the chapter Preliminaries > Linear Algebra can be found at [d2l.ai](https://d2l.ai/chapter_preliminaries/linear-algebra.html).*

__In order to build sophisticated models with tensors we will need some knowledge of linear algebra. *There's no way around it :)*__

In [2]:
import torch

### 2.3.1. Scalars

- The single numerical values that we manipulate in everyday mathematical operations are called __scalars__
- Known values (like 5 or 9 in an equation) are __constant scalars__. Unknown variables (like c or f in an equation) represent __unknown scalars__
- Scalars are denoted by lower case letters like x, y or z
- The space of all (continuous) real-valued scalars is ℝ
- x ∈ ℝ is a formal way to say that x is a real-valued scalar
- The symbol ∈ (pronounced “in”) denotes membership in a set
- x, y ∈ {0, 1}, for example, indicates that x and y are variables that can only take the values 0 or 1

In [3]:
# scalars are implemented as tensors that contain only one element
x = torch.tensor(3.0)
y = torch.tensor(2.0)

In [4]:
# performing the addition, subtraction, multiplication, division, and exponentiation operations
x + y

tensor(5.)

In [5]:
x - y

tensor(1.)

In [6]:
x * x

tensor(9.)

In [7]:
x / y

tensor(1.5000)

In [8]:
x ** y

tensor(9.)

### 2.3.2. Vectors

- A vector is like a fixed-length array of scalars
- These scalars are the elements of the vector (synonyms: entries or components)
- As an example, studying the risk of heart attack: Each patient might get a vector assigned with the elements "most recent vital signs", "cholesterol levels" or "minutes of exercise per day"
- In the book vectors are denoted by bold lowercase letters like **x**, **y** or **z**
- Vectors are implemented as 1<sup>st</sup>-order tensors which can have arbitrary length
- Python vector indices start at 0 (zero-based indexing)
- In linear algebra subscripts begin at 1 (one-based indexing)
- By default vectors are visualized as column vectors, i.e. by stacking their elements __vertically__
- In general there are column vectors, whose elements are stacked vertically, and row vectors, whose elements are stacked horizontally

In [9]:
x = torch.arange(3)
x

tensor([0, 1, 2])

The elements of a vector can be denoted by using a subscript.

<img src="../assets/images/0231_vector.png" style="width:150px;vertical-align:middle" />

x<sub>2</sub>, a scalar, denotes the second element of vector **x**. (In Python, however, we would access it with x[1].)

The vector contains n elements (n is the dimensionality of the vector): **x** ∈ ℝ<sup>n</sup>

In [10]:
# accessing a tensor's element via indexing
x[2]

tensor(2)

In [11]:
# a tensor’s length is accessible via Python’s built-in len function
len(x)

3

In [12]:
# accessing the length via the shape attribute
# it returns a tuple that indicates a tensor’s length along each axis
# tensors with just one axis have shapes with just one element
x.shape

torch.Size([3])

__Clarifying the use of the word “dimension”:__

- “dimension” is often used to mean both the number of axes and the length along a particular axis
- in this book (or tutorial), <span style="color:red">
  - ***“order” is used to refer to the number of axes***
  - ***dimensionality exclusively is used to refer to the number of components***</span>
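
The distinction between order and dimensionality can be checked directly on tensors. The following is a minimal sketch, assuming that `ndim` is read as the order and the entries of `shape` as the lengths along each axis; the tensor names `v`, `M` and `T` are only illustrative.

```python
import torch

v = torch.arange(4)                     # 1st-order tensor (vector) with 4 components
M = torch.arange(6).reshape(2, 3)       # 2nd-order tensor (matrix)
T = torch.arange(24).reshape(2, 3, 4)   # 3rd-order tensor

for name, t in [("v", v), ("M", M), ("T", T)]:
    # t.ndim is the number of axes ("order")
    # t.shape lists the length along each axis
    print(name, "order:", t.ndim, "shape:", tuple(t.shape))
```

In this reading, `v` has order 1 and dimensionality 4, while `T` has order 3.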

### 2.3.3. Matrices

- Scalars are 0<sup>th</sup>-order tensors
- Vectors are 1<sup>st</sup>-order tensors
- Matrices are 2<sup>nd</sup>-order tensors

- Matrices are denoted by bold capital letters (e.g., **X**, **Y** or **Z**)
- In code they are represented by tensors with two axes
- Matrices are often used for representing datasets with rows corresponding to individual records and columns corresponding to attributes

- A matrix **A** containing m × n real-valued scalars is expressed as **A** ∈ ℝ<sup>m × n</sup>
- The scalars are arranged as m rows and n columns
- A matrix is *square* when m = n

Visual representation of a matrix as a table:

<img src="../assets/images/0232_matrix.png" style="width:300px;vertical-align:middle" />

Referring to an individual element: a<sub>ij</sub> is the value in **A**'s i<sup>th</sup> row and j<sup>th</sup> column.

In code a matrix **A** ∈ ℝ<sup>m × n</sup> is represented by a 2<sup>nd</sup>-order tensor with shape (m, n).
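
Because the mathematical subscripts are one-based and Python indices are zero-based, a<sub>ij</sub> corresponds to `A[i - 1, j - 1]` in code. A small sketch of element, row and column access follows; the variable names are only illustrative.

```python
import torch

A = torch.arange(6).reshape(3, 2)   # m = 3 rows, n = 2 columns
m, n = A.shape

# mathematical a_ij (one-based) corresponds to A[i - 1, j - 1] (zero-based)
i, j = 3, 2
a_ij = A[i - 1, j - 1]

first_row = A[0]           # row 1 of the matrix, a vector with n elements
second_column = A[:, 1]    # column 2 of the matrix, a vector with m elements

print(a_ij, first_row, second_column)
```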

In [13]:
A = torch.arange(6).reshape(3,2)
A

tensor([[0, 1],
        [2, 3],
        [4, 5]])

#### __Transpose__

In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix **A** by producing another matrix, often denoted by **A<sup>T</sup>** (among other notations) (Source: https://en.wikipedia.org/wiki/Transpose).

The transpose of an *m × n* matrix is an *n × m* matrix.

The transpose of a matrix **A** is denoted by **A<sup>T</sup>**. If **B** = **A<sup>T</sup>**, then b<sub>ij</sub> = a<sub>ji</sub> for all i and j.

<img src="../assets/images/0233_matrix_transpose.png" style="width:400px;vertical-align:middle" />
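
The relation b<sub>ij</sub> = a<sub>ji</sub> can be verified index by index. This is only a sketch for a small matrix; the explicit Python loop is there for illustration and would be too slow for large tensors.

```python
import torch

A = torch.arange(6).reshape(3, 2)
B = A.T

# B has the shape of A with the two axes swapped
print(tuple(A.shape), tuple(B.shape))

# check b_ij == a_ji for every index pair (zero-based in code)
all_match = all(
    (B[i, j] == A[j, i]).item()
    for i in range(B.shape[0])
    for j in range(B.shape[1])
)
print(all_match)

# transposing twice gives back the original matrix
print(torch.equal(B.T, A))
```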


In [14]:
# accessing any matrix’s transpose
A.T

tensor([[0, 2, 4],
        [1, 3, 5]])

Symmetric matrices are the subset of square matrices that are equal to their own transposes: **A** = **A<sup>T</sup>**. 

In [15]:
# example of a symmetric matrix
A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A

tensor([[1, 2, 3],
        [2, 0, 4],
        [3, 4, 5]])

In [16]:
A == A.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
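
The elementwise comparison above returns one boolean per entry. If a single yes/no answer is preferred, `torch.equal` can be used instead. A short sketch, where `N` is a hypothetical non-symmetric matrix added for contrast:

```python
import torch

A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])   # symmetric
N = torch.arange(9).reshape(3, 3)                      # not symmetric

# torch.equal returns a plain Python bool: same shape and same elements
is_A_symmetric = torch.equal(A, A.T)
is_N_symmetric = torch.equal(N, N.T)

print(is_A_symmetric, is_N_symmetric)
```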

### 2.3.4. Tensors

- Tensors give us a generic way to describe extensions to n<sup>th</sup>-order arrays
- Software objects of the tensor class can have an arbitrary number of axes
- The word tensor is used for both the mathematical object and its realization in code (which might be a bit confusing for the novice learner)
- Tensors will become important when working with images
- Each image arrives as a 3<sup>rd</sup>-order tensor with axes corresponding to the height, width, and channel
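
To make the image case concrete, here is a minimal sketch of a made-up image stored as a 3<sup>rd</sup>-order tensor. The sizes are hypothetical and the axis order (height, width, channel) simply follows the description above; real image loaders may order the axes differently.

```python
import torch

height, width, channels = 4, 6, 3              # hypothetical tiny RGB image
image = torch.zeros(height, width, channels)   # axes: height, width, channel

print(image.ndim)           # number of axes
print(tuple(image.shape))

# stacking several images adds one more axis in front (a 4th-order tensor)
batch = torch.stack([image, image])
print(tuple(batch.shape))
```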

In [17]:
torch.arange(24).reshape(2, 3, 4)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

### 2.3.5. Basic Properties of Tensor Arithmetic

Elementwise operations with scalars, vectors, matrices and higher-order tensors produce outputs that have the same shape as their operands.

In [18]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3) # 2x3 matrix
A

tensor([[0., 1., 2.],
        [3., 4., 5.]])

In [19]:
B = A.clone() # cloning A and receiving another 2x3 matrix
B

tensor([[0., 1., 2.],
        [3., 4., 5.]])

In [20]:
# addition of two matrices
A + B

tensor([[ 0.,  2.,  4.],
        [ 6.,  8., 10.]])

The elementwise product of two matrices is called their __Hadamard product__ (⊙ symbol).

In [21]:
A * B

tensor([[ 0.,  1.,  4.],
        [ 9., 16., 25.]])

When adding a scalar to a tensor, the scalar is added to each element of the tensor. The resulting tensor has the same shape as the original tensor.

In [31]:
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X

tensor([[[ 2,  3,  4,  5],
         [ 6,  7,  8,  9],
         [10, 11, 12, 13]],

        [[14, 15, 16, 17],
         [18, 19, 20, 21],
         [22, 23, 24, 25]]])

When multiplying a scalar and a tensor, each element of the tensor is multiplied by the scalar. The resulting tensor has the same shape as the original tensor.

In [26]:
b = 66
Y = torch.arange(66).reshape(2, 3, 11)
b * Y

tensor([[[   0,   66,  132,  198,  264,  330,  396,  462,  528,  594,  660],
         [ 726,  792,  858,  924,  990, 1056, 1122, 1188, 1254, 1320, 1386],
         [1452, 1518, 1584, 1650, 1716, 1782, 1848, 1914, 1980, 2046, 2112]],

        [[2178, 2244, 2310, 2376, 2442, 2508, 2574, 2640, 2706, 2772, 2838],
         [2904, 2970, 3036, 3102, 3168, 3234, 3300, 3366, 3432, 3498, 3564],
         [3630, 3696, 3762, 3828, 3894, 3960, 4026, 4092, 4158, 4224, 4290]]])

### 2.3.6. Reduction

Expressing the sum of the elements in a vector **x** of length *n*:

<img src="../assets/images/sum_vec_x_n.png" style="width:100px;vertical-align:middle" />

In [36]:
x = torch.arange(10, dtype=torch.float32)
x

tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [38]:
# sum of the elements in a vector x
x.sum()

tensor(45.)

Sums over the elements of tensors of arbitrary shape are expressed by summing over all of their axes.
The sum of an *m × n* matrix **A**:

<img src="../assets/images/sum_over_sum.png" style="width:200px;vertical-align:middle" />

In [42]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3) # example from above
A

tensor([[0., 1., 2.],
        [3., 4., 5.]])

In [43]:
A.shape

torch.Size([2, 3])

In [44]:
A.sum()

tensor(15.)

In [48]:
print(A[0])
print(A[1])
A[0].sum() + A[1].sum()

tensor([0., 1., 2.])
tensor([3., 4., 5.])


tensor(15.)

By default, invoking the `sum` function ***reduces a tensor along all of its axes and produces a scalar***. The `sum` function also accepts an `axis` argument that specifies the axes along which the tensor should be reduced.

If we specify axis=0 in sum, the input matrix ***reduces along axis 0*** (the row axis) to generate the output vector: the rows are added together, which yields one sum per column. Therefore axis 0 is missing from the shape of the output vector.

In [49]:
A.shape

torch.Size([2, 3])

In [50]:
A.sum(axis=0)

tensor([3., 5., 7.])

In [51]:
A.sum(axis=0).shape

torch.Size([3])

In [54]:
# passing axis=1 reduces the column dimension (axis 1)
# the elements within each row are summed up, yielding one sum per row
A.sum(axis=1)

tensor([ 3., 12.])

In [53]:
A.sum(axis=1).shape

torch.Size([2])

In [55]:
# reducing a matrix along both rows and columns via summation
# equivalent to summing up all the elements of the matrix
A.sum(axis=[0,1]) == A.sum()

tensor(True)
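
The same rule carries over to higher-order tensors: whichever axis is reduced disappears from the shape of the result. A small sketch with a 3<sup>rd</sup>-order tensor, where `X` is just an example tensor:

```python
import torch

X = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

for axis in range(X.ndim):
    # the reduced axis is missing from the shape of the result
    print(f"axis={axis}:", tuple(X.sum(axis=axis).shape))

# reducing over two axes at once leaves only the remaining axis
print(tuple(X.sum(axis=[0, 1]).shape))
```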

__Calculating the mean of a tensor:__

The mean (aka the average) is calculated by dividing the sum by the total number of elements.

In [58]:
mean = A.sum() / A.numel()
mean

tensor(2.5000)

In [59]:
# dedicated library function that works analogously to sum
A.mean() == mean

tensor(True)

In [62]:
# calculating the mean by reducing a tensor along specific axes
mean_axis_zero = A.sum(axis=0) / A.shape[0]
mean_axis_zero

tensor([1.5000, 2.5000, 3.5000])

In [63]:
A.mean(axis=0) == mean_axis_zero

tensor([True, True, True])

### 2.3.7. Non-Reduction Sum

Sometimes it is useful to keep the number of axes unchanged when invoking the function for calculating the sum or mean, for example when we want to use the broadcast mechanism.

In [66]:
A

tensor([[0., 1., 2.],
        [3., 4., 5.]])

In [67]:
A.shape

torch.Size([2, 3])

In [73]:
sum_A = A.sum(axis=1) # sum over axis 1 regular style, doesn't keep shape
sum_A

tensor([ 3., 12.])

In [74]:
sum_A = A.sum(axis=1, keepdims=True) # stays in shape with keepdims=True 
sum_A

tensor([[ 3.],
        [12.]])

In [69]:
sum_A.shape

torch.Size([2, 1])

In [70]:
# sum_A keeps its two axes after summing each row
# now let's divide A by sum_A with broadcasting to create a matrix where each row sums up to 1
A / sum_A

tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167]])
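
As a quick sanity check, the rows of the normalized matrix should each sum to 1, up to floating-point rounding. A minimal sketch that rebuilds the tensors from above so it can run on its own:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
sum_A = A.sum(axis=1, keepdims=True)   # shape (2, 1)
normalized = A / sum_A                 # broadcasting: (2, 3) / (2, 1)

row_sums = normalized.sum(axis=1)
print(row_sums)

# compare with a tolerance instead of ==, because of floating-point rounding
print(torch.allclose(row_sums, torch.ones(2)))
```

Without `keepdims=True` the divisor would have shape (2,), which does not broadcast against shape (2, 3).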

In [75]:
# calculating the cumulative sum of the elements of A along axis 0 (row by row)
# by calling the cumsum function
# the cumsum function doesn't reduce the input tensor along any axis
A.cumsum(axis=0)

tensor([[0., 1., 2.],
        [3., 5., 7.]])

In [76]:
# once again A for comparison's sake
A

tensor([[0., 1., 2.],
        [3., 4., 5.]])

In [79]:
B = torch.arange(12).reshape(3,4)
B

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [85]:
B.cumsum(axis=0) # each row holds the cumulative sum of all rows up to and including it

tensor([[ 0,  1,  2,  3],
        [ 4,  6,  8, 10],
        [12, 15, 18, 21]])

In [82]:
C = torch.arange(12).reshape(3,4)
C

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [84]:
C.cumsum(axis=1) # each column holds the cumulative sum of all columns up to and including it

tensor([[ 0,  1,  3,  6],
        [ 4,  9, 15, 22],
        [ 8, 17, 27, 38]])
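
One way to connect `cumsum` with `sum`: the last slice of the cumulative sum along an axis should equal the reduced sum along that axis. A short sketch of that check:

```python
import torch

B = torch.arange(12).reshape(3, 4)

running = B.cumsum(axis=0)          # same shape as B, no axis is reduced
print(running.shape == B.shape)

# the last row of the running total equals the sum over axis 0
print(torch.equal(running[-1], B.sum(axis=0)))

# likewise, the last column of cumsum(axis=1) equals the sum over axis 1
print(torch.equal(B.cumsum(axis=1)[:, -1], B.sum(axis=1)))
```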

### 2.3.8. Dot Products

- The dot product is one of the most fundamental operations in linear algebra
- Given two vectors **x**, **y** ∈ ℝ<sup>d</sup>, their dot product **x**<sup>T</sup>**y** is a sum over the products of the elements at the same position:

<img src="../assets/images/0238_dot_product.png" style="width:200px;vertical-align:middle" />

- The dot product is also known as the inner product, ⟨**x**, **y**⟩


In [89]:
x = torch.arange(3, dtype=torch.float32)
x

tensor([0., 1., 2.])

In [87]:
y = torch.ones(3, dtype = torch.float32)
y

tensor([1., 1., 1.])

In [90]:
torch.dot(x, y)

tensor(3.)

In [92]:
z = x * y
z

tensor([0., 1., 2.])

In [93]:
z.sum() # is this how it's calculated?
# first the product, then the sum over the result?

tensor(3.)

In [94]:
# calculating the dot product of two vectors by performing an elementwise multiplication followed by a sum
torch.sum(x * y)

tensor(3.)

__Now for some very abstract explanation of what can be done with the dot product:__

- Let's say we have a vector **x** ∈ ℝ<sup>n</sup> and a set of weights denoted by **w** ∈ ℝ<sup>n</sup>
- The weighted sum of the values in **x** according to the weights **w** can then be expressed as the dot product **x**<sup>T</sup>**w**
- When the weights are nonnegative and sum to 1, the dot product expresses a weighted average
- If we normalize two vectors to have unit length, their dot product expresses the cosine of the angle between them
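
Both interpretations can be tried out with small made-up vectors. In this sketch the values of `x`, `w`, `u` and `v` are arbitrary examples, and the length of a vector is computed as the square root of its dot product with itself.

```python
import torch

# weighted average: nonnegative weights that sum to 1
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.2, 0.3, 0.5])
weighted_avg = torch.dot(x, w)      # 0.2*1 + 0.3*2 + 0.5*3 = 2.3
print(weighted_avg)

# cosine of the angle: dot product of the unit-length versions of two vectors
u = torch.tensor([1.0, 0.0])
v = torch.tensor([1.0, 1.0])
u_hat = u / torch.sqrt(torch.dot(u, u))   # rescale u to length 1
v_hat = v / torch.sqrt(torch.dot(v, v))   # rescale v to length 1
cos_angle = torch.dot(u_hat, v_hat)       # angle is 45 degrees, cosine about 0.7071
print(cos_angle)
```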