### Linear Algebra

To build sophisticated models, we need tools from linear algebra.

A scalar is a single number, written in lower case: $x$.

A vector is a fixed-length array of scalars; the scalars are the elements of the vector.

Example: predicting loan default.
- Response: Loan Default
- Features: [Income, Length of employment, Previous Defaults], which together form a feature vector

In [1]:
import torch

In [5]:
# Scalars
x = torch.tensor(1.0)
y = torch.tensor(2.0)
z = x + y
print(z)  # Output: tensor(3.)

# Vectors
vec = torch.tensor([1.0, 2.0, 3.0])
vec

tensor(3.)


tensor([1., 2., 3.])

In [None]:
# Matrices: a matrix in R^(m x n) holds m*n scalars arranged as m rows and n columns (here 2 x 5)
mat = torch.arange(10).reshape(2,5)
print(mat)

# Transpose
print(mat.T) # Columns <-> Rows

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
tensor([[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]])


A symmetric matrix equals its own transpose: $A = A^T$

In [11]:
a = torch.ones((2,2))
b = a.T
a == b # All True, so a is symmetric

tensor([[True, True],
        [True, True]])
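
An all-ones matrix is symmetric in a trivial way. As a small sketch (the matrices `s` and `n` are made up for illustration and do not come from the cells above), `torch.equal` compares a matrix with its transpose and returns a single Python bool instead of an elementwise tensor:

```python
import torch

# Hypothetical example matrices, chosen only for illustration
s = torch.tensor([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 5.0],
                  [3.0, 5.0, 6.0]])   # entries mirrored across the diagonal
n = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])        # off-diagonal entries differ (2 vs 3)

is_s_symmetric = torch.equal(s, s.T)  # True only if shapes and all entries match
is_n_symmetric = torch.equal(n, n.T)
print(is_s_symmetric, is_n_symmetric)
```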

In [16]:
# Tensors can have an arbitrary number of axes
# An image is a 3rd-order tensor (height, width, channel); a collection of images is a 4th-order tensor

# Tensor Arithmetic
a = torch.tensor([[1, 2], [3, 4]])
b = a.clone() # Copy of a stored in newly allocated memory
a, a + b

# Hadamard Product
a * b # Elementwise product; both tensors have the same shape

tensor([[ 1,  4],
        [ 9, 16]])

The elementwise product of two tensors of the same shape is called the Hadamard product.

\begin{split}\mathbf{A} \odot \mathbf{B} =
\begin{bmatrix}
    a_{11}  b_{11} & a_{12}  b_{12} & \dots  & a_{1n}  b_{1n} \\
    a_{21}  b_{21} & a_{22}  b_{22} & \dots  & a_{2n}  b_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    a_{m1}  b_{m1} & a_{m2}  b_{m2} & \dots  & a_{mn}  b_{mn}
\end{bmatrix}.\end{split}

In [31]:
# Reduction
a.sum() , a.sum(axis = 0) # Sum of all elements; axis = 0 collapses the rows (one sum per column)
a.sum(axis = [0,1]) # Reduce over both axes, same as a.sum()
b = torch.arange(4, dtype = torch.float32).reshape(2,2)
b.mean(axis = 0) # Mean of each column

# Non-reduction: useful to keep the number of axes
a.sum(axis = 1, keepdims = True) # Row sums with shape (2, 1); the reduced axis is kept with size 1, which helps broadcasting

tensor([[3],
        [7]])
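
To see why the kept axis helps with broadcasting, here is a minimal sketch that divides each row by its own sum. It assumes a float version of the matrix above; the names `row_sums` and `normalized` are introduced here for illustration only.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# keepdim=True leaves the reduced axis in place with size 1: shape (2, 1)
row_sums = a.sum(dim=1, keepdim=True)

# (2, 2) / (2, 1): the single column of row_sums is broadcast across both columns of a
normalized = a / row_sums

print(row_sums.shape)
print(normalized.sum(dim=1))  # each row now sums to 1, up to floating-point rounding
```

Without `keepdim=True` the row sums would have shape `(2,)`, which broadcasts along the last axis and would divide by column position rather than by row.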

#### Dot product
$\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i$
Corresponding elements are multiplied, then the products are summed.

In [38]:
x = torch.ones(3, dtype = torch.float32)
y = torch.ones(3, dtype = torch.float32)

x, y, torch.dot(x, y) , torch.sum(x * y) # Equivalently

# Uses
# A weighted sum of values can be written as a dot product
# If the weights are non-negative and sum to 1, the dot product expresses a weighted average
# After normalizing both vectors to unit length, the dot product is the cosine of the angle between them

(tensor([1., 1., 1.]), tensor([1., 1., 1.]), tensor(3.), tensor(3.))
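
The two uses listed in the comments can be sketched as follows. The vectors `values`, `weights`, `u`, and `v` are invented for illustration; only `torch.dot` and `torch.linalg.norm` are library calls.

```python
import torch

# Weighted average: the weights are non-negative and sum to 1
values = torch.tensor([2.0, 4.0, 6.0])
weights = torch.tensor([0.5, 0.25, 0.25])
weighted_avg = torch.dot(weights, values)   # 0.5*2 + 0.25*4 + 0.25*6

# Cosine of the angle: divide the dot product by both lengths
u = torch.tensor([1.0, 0.0])
v = torch.tensor([1.0, 1.0])                # 45 degrees away from u
cos_uv = torch.dot(u, v) / (torch.linalg.norm(u) * torch.linalg.norm(v))

print(weighted_avg, cos_uv)
```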

#### Matrix-vector product

Each row of $\mathbf{A}$ is multiplied with the column vector $\mathbf{x}$ as a dot product, which gives one entry of the result.

\begin{split}\mathbf{A}\mathbf{x}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_m \\
\end{bmatrix}\mathbf{x}
= \begin{bmatrix}
 \mathbf{a}^\top_{1} \mathbf{x}  \\
 \mathbf{a}^\top_{2} \mathbf{x} \\
\vdots\\
 \mathbf{a}^\top_{m} \mathbf{x}\\
\end{bmatrix}.\end{split}


#### Matrix Multiplication

\begin{split}\mathbf{C} = \mathbf{AB} = \begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_n \\
\end{bmatrix}
\begin{bmatrix}
 \mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{m} \\
\end{bmatrix}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \mathbf{b}_1 & \mathbf{a}^\top_{1}\mathbf{b}_2& \cdots & \mathbf{a}^\top_{1} \mathbf{b}_m \\
 \mathbf{a}^\top_{2}\mathbf{b}_1 & \mathbf{a}^\top_{2} \mathbf{b}_2 & \cdots & \mathbf{a}^\top_{2} \mathbf{b}_m \\
 \vdots & \vdots & \ddots &\vdots\\
\mathbf{a}^\top_{n} \mathbf{b}_1 & \mathbf{a}^\top_{n}\mathbf{b}_2& \cdots& \mathbf{a}^\top_{n} \mathbf{b}_m
\end{bmatrix}.\end{split}



In [None]:
x = torch.arange(6, dtype = torch.float32).reshape(2,3)
y = torch.tensor([2.0, 3.0, 4.0], dtype = torch.float32)

x, y, torch.mv(x,y), x@y # Each row of x is multiplied by the column vector y

# @ is the matrix multiplication convenience operator; it handles both matrix-vector and matrix-matrix products

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([2., 3., 4.]),
 tensor([11., 38.]),
 tensor([11., 38.]))

In [54]:
x = torch.arange(6, dtype = torch.float32).reshape(2,3)
y = torch.arange(6, dtype = torch.float32).reshape(3,2)

y,x,y@x

(tensor([[0., 1.],
         [2., 3.],
         [4., 5.]]),
 tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[ 3.,  4.,  5.],
         [ 9., 14., 19.],
         [15., 24., 33.]]))
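
To connect the output above to the formula, here is a plain-Python sketch that builds each entry as the dot product of a row of the left matrix with a column of the right one. The helper `matmul` is written here for illustration and is not a PyTorch function.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows, straight from the definition."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    # Entry (i, j) is the dot product of row i of A with column j of B
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

Y = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # same values as y above, shape (3, 2)
X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]     # same values as x above, shape (2, 3)

C = matmul(Y, X)
print(C)  # [[3.0, 4.0, 5.0], [9.0, 14.0, 19.0], [15.0, 24.0, 33.0]]
```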

#### Norms - most useful operators
- A norm tells us how big a vector is; the Euclidean norm, for example, measures its length.

Definition: a norm is a function that maps a vector to a scalar and satisfies
- $\|\alpha \mathbf{x}\| = |\alpha| \|\mathbf{x}\|.$ Scaling
- $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|.$ Triangle inequality
- $\|\mathbf{x}\| > 0 \textrm{ for all } \mathbf{x} \neq 0.$ Positivity

Different norms encode different notions of size.

- Euclidean norm: $\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}.$
- Manhattan norm: $\|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|.$
- $l_p$ norm: $\|\mathbf{x}\|_p = \left(\sum_{i=1}^n \left|x_i \right|^p \right)^{1/p}.$

Matrix norms are more complicated, because a matrix can be viewed as a collection of vectors.
- Spectral Norm
- Frobenius Norm : $\|\mathbf{X}\|_\textrm{F} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}.$

The Frobenius norm behaves like the $l_2$ norm of the matrix flattened into a single vector.
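
The formulas above can be checked by hand on small inputs. The following plain-Python sketch uses two helper functions, `lp_norm` and `frobenius_norm`, that are written here for illustration (they are not library functions), and example vectors that are made up.

```python
import math

def lp_norm(x, p):
    """l_p norm of a vector given as a list of numbers, for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def frobenius_norm(X):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(xij ** 2 for row in X for xij in row))

x = [3.0, -4.0]
y = [1.0, 2.0]
l1 = lp_norm(x, 1)   # |3| + |-4|
l2 = lp_norm(x, 2)   # sqrt(9 + 16)

# Triangle inequality for the l2 norm on this particular pair
x_plus_y = [a + b for a, b in zip(x, y)]
triangle_ok = lp_norm(x_plus_y, 2) <= lp_norm(x, 2) + lp_norm(y, 2)

# Frobenius norm of a matrix vs. l2 norm of the same entries flattened
M = [[1.0, 2.0], [3.0, 4.0]]
flat = [v for row in M for v in row]
same = math.isclose(frobenius_norm(M), lp_norm(flat, 2))

print(l1, l2, triangle_ok, same)
```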


#### Optimization problems
Optimization problems involve maximizing or minimizing some quantity. Often that quantity is a distance, and distances are represented by norms.

In [67]:
x = torch.arange(6, dtype = torch.float32)
torch.norm(x), torch.abs(x).sum() # L2 norm / L1 norm of the vector

y = torch.arange(12, dtype = torch.float32).reshape(3,4)
torch.norm(y) # Frobenius norm of the whole matrix (a single scalar)

tensor(22.4944)

1. $(A^T)^T = A$
2. $A^T + B^T = (A+B)^T$
5. The distance I need to cover is the sum of the distances along all streets and avenues, which is the Manhattan ($l_1$) distance.
10. $(AB)C$ or $A(BC)$: no difference in the memory needed for the final result
    - (AB)C : 2^10 * 2^14
    - A(BC) : 2^10 * 2^14
    - Speed: yes, there is a difference, A(BC) < (AB)C, depending on the size of the intermediate matrix
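
A rough way to compare the two orders is to count scalar multiplications. The sketch below assumes the textbook algorithm, where an (n x k) times (k x m) product costs n*k*m multiplications. The shapes are hypothetical: they are chosen only so that the final result is 2^10 x 2^14 as in the notes above, and `matmul_cost` is a helper written for this illustration.

```python
def matmul_cost(n, k, m):
    """Scalar multiplications for an (n x k) @ (k x m) product, textbook algorithm."""
    return n * k * m

# Hypothetical shapes (an assumption, not stated in the notes above):
# A is (p x q), B is (q x r), C is (r x s)
p, q, r, s = 2**10, 2**16, 2**5, 2**14

cost_ab_then_c = matmul_cost(p, q, r) + matmul_cost(p, r, s)   # (AB)C
cost_a_then_bc = matmul_cost(q, r, s) + matmul_cost(p, q, s)   # A(BC)

intermediate_ab = p * r   # number of entries in AB
intermediate_bc = q * s   # number of entries in BC

print(cost_ab_then_c, cost_a_then_bc)
print(intermediate_ab, intermediate_bc)
```

Under these assumed shapes, $(AB)C$ needs far fewer multiplications, and its intermediate matrix is also much smaller, even though the final result has the same shape in both orders.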

In [None]:
# Question 1
a = torch.arange(6, dtype = torch.float32).reshape(2,3)
b = a.T
c = b.T
a==c

# Question 2
a = torch.arange(6, dtype = torch.float32).reshape(2,3)
b = torch.arange(6, dtype = torch.float32).reshape(2,3)
(a.T + b.T) == (a + b).T # Sum of transpose is equal to transpose of sum

# Question 3 

len(torch.zeros((2,3,4))) # len returns the size of axis 0: 2 blocks of 3*4
torch.numel(torch.zeros((2,3,4))) # Total number of elements: 2*3*4 = 24

# Question 4
a = torch.arange(6, dtype = torch.float32).reshape(2,3)
# a / a.sum(axis = 1) # Fails: the sum has shape (2,), which does not broadcast against (2, 3); keepdims = True would give (2, 1)

# Question 8
a = torch.arange(24, dtype = torch.float32).reshape(2,3,4)
a.sum(axis = 0), a.sum(axis = 1), a.sum(axis = 2) # Basically sum across that axis - axis gone.

# Question 12
a = torch.ones((100,200))
b = torch.ones((100,200))
c = torch.ones((100,200))

torch.cat([a,b,c]).shape # 300 * 200
torch.cat([a,b,c], axis = 1).shape # 100 * 600, stacked along the columns

torch.Size([100, 600])