In [4]:
import numpy as np
import torch

## Linear Algebra Basics

Linear algebra deals with solving for unknowns in a system of linear equations, where there can be many equations (multiple data points) and many unknowns in an equation (multiple parameters).

Current Uses: 

- Solving for unknowns in ML/DL algorithms
- Reducing dimensionality of data while preserving information (PCA)
- Eigenvector scoring of webpages
- Recommender systems (SVD) 
- NLP tasks such as topic modelling or semantic analysis (SVD, Matrix Factorization)


A system of linear equations can have different numbers of solutions:

- One solution (intersecting graphs)
- No solution (parallel graphs)
- Infinite solutions (overlapping graphs)
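
The three cases above can be told apart by comparing the rank of the coefficient matrix with the rank of the augmented matrix. The following is a minimal sketch, assuming a system written as `A x = b`; the helper name `classify_system` and the example systems are hypothetical and chosen only for illustration.

```python
import numpy as np

def classify_system(A, b):
    """Classify the linear system A x = b by comparing matrix ranks."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))  # augmented matrix [A | b]
    if rank_A < rank_Ab:
        return "no solution"          # b adds an independent direction
    if rank_A == A.shape[1]:
        return "one solution"         # every unknown is pinned down
    return "infinite solutions"       # consistent, but some unknowns are free

print(classify_system([[1, 1], [1, -1]], [2, 0]))  # intersecting lines
print(classify_system([[1, 1], [1, 1]], [2, 3]))   # parallel lines
print(classify_system([[1, 1], [2, 2]], [2, 4]))   # overlapping lines
```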

## Common Data Structures 

The most important structure for linear algebra is the tensor, which is an array of numbers. Tensors are the ML generalization of vectors and matrices to any number of dimensions.

- 0 dim tensor: Scalar which has a magnitude only
- 1 dim tensor: Vector which is an array or a list of numbers
- 2 dim tensor: Matrix which is a flat table of numbers
- 3 dim tensor: Tensor which is a 3D table of numbers

Libraries: PyTorch and TensorFlow are the most popular automatic differentiation libraries. PyTorch is often preferred for its Pythonic tensors, which behave like NumPy arrays but are better suited to parallel computation on GPUs.

### Scalars

A scalar is a single number with no dimensions, denoted by a lowercase letter such as $x$. Scalars are typically typed, for example as an integer or a 16-bit float.

In PyTorch, scalars are created directly with `torch.tensor`, while in TensorFlow they need a wrapper like `tf.Variable` or `tf.constant`.

In [23]:
# scalars in pytorch
x_pt=torch.tensor(25,dtype=torch.float16)
print(x_pt)
print(type(x_pt))
print(x_pt.shape) #no dimensionality

y_pt=torch.tensor(20,dtype=torch.float16)
print(y_pt)
print(type(y_pt))
print(y_pt.shape) #no dimensionality

# adding tensors
z_pt= x_pt+ y_pt
print(z_pt)
print(type(z_pt))
print(z_pt.shape)

tensor(25., dtype=torch.float16)
<class 'torch.Tensor'>
torch.Size([])
tensor(20., dtype=torch.float16)
<class 'torch.Tensor'>
torch.Size([])
tensor(45., dtype=torch.float16)
<class 'torch.Tensor'>
torch.Size([])


### Vectors

A vector is a one-dimensional array of numbers arranged in order, which can be considered to represent a point in n-dimensional space.

In [24]:
# Vectors in Numpy 
# one dim vector
x= np.array([25,2,5])
print(x)
print(len(x))
print(x.shape)
print(type(x))

# matrix style vector
# each inner bracket is a row and number of elements within the inner bracket is number of columns
y=np.array([[25,2,5]])
print(y)
print(len(y))
print(y.shape)
print(type(y))

# zero vector
z=np.zeros(3)
print(z)

[25  2  5]
3
(3,)
<class 'numpy.ndarray'>
[[25  2  5]]
1
(1, 3)
<class 'numpy.ndarray'>
[0. 0. 0.]


#### Vector Transpose

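Transposing a vector swaps its row and column orientation: a row vector of shape (1, n) becomes a column vector of shape (n, 1) and vice versa. A one-dimensional NumPy array has no separate row and column axes, so its transpose has the same shape.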

#### Vector Normalization 

Normalization divides the elements of a vector by its norm, which represents the length of the vector from the origin.

- Distance Calculation: The norm of the difference between two vectors gives the distance between them.
- Unit Vectorization: Dividing a non-zero vector by its norm produces a unit vector, i.e. a vector of length 1 pointing in the same direction.
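
Both uses of the norm can be sketched with NumPy as follows. The vectors `x` and `y` are arbitrary example values, and the L2 norm is assumed as the measure of length.

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([6.0, 8.0])

norm_x = np.linalg.norm(x)         # L2 norm: sqrt(3^2 + 4^2)
unit_x = x / norm_x                # unit vector in the direction of x
distance = np.linalg.norm(x - y)   # distance between x and y

print(f"Norm of x is {norm_x}")
print(f"Unit vector is {unit_x}")
print(f"Norm of unit vector is {np.linalg.norm(unit_x)}")
print(f"Distance between x and y is {distance}")
```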


The general Lp Norm Formula is

$$ \lVert x \rVert_p = \left( \sum_i |x_i|^p \right)^{1/p} $$
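
The formula translates almost directly into code. This is a minimal sketch: the helper name `lp_norm` is hypothetical, and it assumes a finite $p \geq 1$. The result is compared against `np.linalg.norm`, which accepts the same `ord` values for vectors.

```python
import numpy as np

def lp_norm(x, p):
    """Compute (sum_i |x_i|^p)^(1/p) for a finite p >= 1."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([25, 2, 5])
for p in (1, 2, 3):
    # Print the hand-written result next to NumPy's built-in norm
    print(p, lp_norm(x, p), np.linalg.norm(x, ord=p))
```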

The common special cases (L1, L2, squared L2 and max norm) are computed in the vector norms code cell further below.

#### Vector Regularization

Regularization (or cost function regularization) refers to adding a norm-based penalty term to the cost function to control the size of the model parameters during training. The type of norm used in the penalty determines the type of regularization.


**L1/ Lasso Regression**: When $||\beta||_1$ is added as the penalty term, it is called lasso regression. Since lasso uses the absolute values of the coefficients, it can shrink some coefficients to exactly zero, effectively performing feature selection. This makes lasso regression useful when you want a sparse model that keeps only the most important features.

**L2/ Ridge Regression**: When the squared L2 norm $||\beta||_2^2$ is added as the penalty, it is called ridge regression. Since the penalty is based on the squared values of the coefficients, ridge tends to shrink the coefficients but does not drive any of them exactly to zero. As a result, all features generally remain in the model, but their effect is reduced.
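
The difference in behaviour can be seen in the simplest possible setting. The sketch below assumes a single coefficient whose unpenalized estimate is $z$, and minimizes $\frac{1}{2}(\beta - z)^2$ plus either $\lambda|\beta|$ (lasso) or $\frac{\lambda}{2}\beta^2$ (ridge). Under that assumption the minimizers have the closed forms used here; the function names and the value of `lam` are hypothetical.

```python
import numpy as np

def lasso_shrink(z, lam):
    """Minimizer of 0.5*(b - z)^2 + lam*|b|: soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """Minimizer of 0.5*(b - z)^2 + 0.5*lam*b^2: proportional shrinkage."""
    return z / (1.0 + lam)

z = np.array([3.0, 0.5, -0.2, -2.0])  # unpenalized coefficient estimates
lam = 1.0                             # penalty strength

print(f"Lasso coefficients: {lasso_shrink(z, lam)}")  # small ones become exactly zero
print(f"Ridge coefficients: {ridge_shrink(z, lam)}")  # all shrink, none become zero
```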

#### Orthogonal Vectors

Two vectors are orthogonal if they are at 90 degrees to each other. In terms of vector operations, their dot product is zero, i.e. $x^T y = 0$.

- In an n-dimensional vector space, there are at most n mutually orthogonal vectors (assuming non-zero norms).
- Orthonormal vectors are orthogonal and each has a unit L2 norm.
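
A quick way to check these properties numerically is sketched below. The vectors are arbitrary examples; stacking the normalized vectors as the columns of a matrix `Q` and testing $Q^T Q = I$ is one common way to verify orthonormality.

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 0.0])

# Orthogonal: the dot product is zero
print(f"Dot product is {np.dot(x, y)}")

# Orthonormal: normalize each vector, then Q^T Q should be the identity
Q = np.column_stack([x / np.linalg.norm(x), y / np.linalg.norm(y)])
print(f"Columns are orthonormal: {np.allclose(Q.T @ Q, np.eye(2))}")
```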

#### Basis Vectors

The basis vectors of a vector space are a set of vectors such that any other vector in that space can be uniquely represented as a linear combination of them, i.e. by scaling and adding.

Some features of basis vectors: 

- All basis vectors must be linearly independent
- Together, the basis vectors must span the whole vector space
  
Typically, the basis vectors are the n orthonormal vectors along the n axes of an n-dimensional space (the standard basis), though other bases are possible too.
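
As an illustration of a non-standard basis, the sketch below finds the unique coefficients that express a vector `v` in terms of two basis vectors stored as the columns of `B`. The particular basis and vector are hypothetical examples.

```python
import numpy as np

# Basis vectors b1 = (1, 0) and b2 = (1, 1), stored as the columns of B
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
v = np.array([3.0, 2.0])

# Full rank means the columns are linearly independent and span the plane
print(f"Rank of B is {np.linalg.matrix_rank(B)}")

# Solve B @ coeffs = v for the unique coordinates of v in this basis
coeffs = np.linalg.solve(B, v)
print(f"Coefficients are {coeffs}")

# Rebuild v by scaling and adding the basis vectors
rebuilt = coeffs[0] * B[:, 0] + coeffs[1] * B[:, 1]
print(f"Rebuilt vector is {rebuilt}")
```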

In [1]:
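# Vector Transpose
# x and y are redefined here so that the cell runs on its own
x = np.array([25, 2, 5])     # 1-D array: transposing leaves the shape unchanged
x_t = x.T
print(x_t)
print(x_t.shape)

y = np.array([[25, 2, 5]])   # row vector of shape (1, 3)
y_t = y.T                    # column vector of shape (3, 1)
print(y_t)
print(y_t.shape)

[25  2  5]
(3,)
[[25]
 [ 2]
 [ 5]]
(3, 1)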

In [26]:
# Vector Norms
x = np.array([25, 2, 5])

# L1 norm / Absolute norm / Taxicab norm / Manhattan norm
# It grows linearly everywhere in space, so it is useful when the difference between zero and non-zero values is key
l1 = np.sum(np.abs(x))
print(f"L1 norm is {l1}")

# L2 norm / Euclidean norm (most popular)
# It is the Euclidean distance of the vector from the origin
l2 = np.sqrt(np.sum(x**2))
print(f"L2 norm is {l2}")
l2 = np.linalg.norm(x)
print(f"L2 norm is {l2}")

# Squared L2 norm
# It equals the dot product of x with itself, i.e. xT.x
# It is computationally cheaper since it avoids the square root
# Its derivative with respect to an element depends only on that element, not on the whole vector
sl2 = np.sum(x**2)
print(f"Squared L2 norm is {sl2}")
sl2 = np.dot(x, x)
print(f"Squared L2 norm is {sl2}")

# Max norm / L-infinity norm
# It is the largest absolute value among the elements
lmax = np.max(np.abs(x))
print(f"LMax norm is {lmax}")

L1 norm is 32
L2 norm is 25.573423705088842
L2 norm is 25.573423705088842
Squared L2 norm is 654
Squared L2 norm is 654
LMax norm is 25


### Matrices

A matrix is a two-dimensional array of numbers, denoted by an uppercase letter such as $X$.

- The shape of a matrix is written as (rows, columns)
- The nth row can be accessed with $X_{n,:}$
- The nth column can be accessed with $X_{:,n}$

In [7]:
X= np.array ([[25,2],[5,26],[3,7]])
print(f"The size of X is {X.size}")
print(f"The shape of X is {X.shape}")
print(f"The 1st row of X is {X[0,:]}")
print(f"The 1st column of X is {X[:,0]}")

The size of X is 6
The shape of X is (3, 2)
The 1st row of X is [25  2]
The 1st column of X is [25  5  3]


#### Matrix Rank

The rank of a matrix is the number of linearly independent rows in the matrix, which always equals the number of linearly independent columns. It indicates the amount of information that the matrix contains.
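
The rank can be computed with `np.linalg.matrix_rank`. In the example matrix below (chosen for illustration), the second row is twice the first, so only two rows are linearly independent.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],   # 2 * first row, adds no new information
              [1, 0, 1]])

print(f"Rank of A is {np.linalg.matrix_rank(A)}")
print(f"Rank of A transpose is {np.linalg.matrix_rank(A.T)}")  # row rank equals column rank
```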

### Matrix Spaces

There are two important matrix spaces that we usually talk about: 

#### Row Space

The row space of a matrix is the span of its row vectors, i.e. all possible linear combinations of the row vectors.

- It gives an insight into the relations between equations in a system.
  
#### Column Space

The column space of a matrix indicates the span of its column vectors i.e all possible linear combinations of the column vectors. 

- It represents all the vectors that can be reached by the linear transformation defined by the matrix. For a matrix A and any vector x, the product $Ax$ lies in the column space of A.
- Because of this, the column space is also called the range or image of the matrix, since it shows where the matrix can send any input from its domain.
- For $Ax=b$ to have a solution, b must lie in the column space of A; otherwise the system has no solutions.
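
One way to test whether a vector lies in the column space is to find the closest reachable vector with least squares and check whether it matches exactly. This is a minimal sketch; the helper name `in_column_space`, the tolerance and the example matrix are assumptions made for illustration.

```python
import numpy as np

def in_column_space(A, b, tol=1e-10):
    """Return True if b can be written as A @ x for some x."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)   # best x in the least-squares sense
    return bool(np.linalg.norm(A @ x - b) < tol)

# The columns of A span the plane where the third coordinate is zero
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

print(in_column_space(A, [2.0, 3.0, 0.0]))  # reachable, so a solution exists
print(in_column_space(A, [2.0, 3.0, 1.0]))  # not reachable, so no solution
```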

Both row space and column space are helpful in understanding different aspects of a matrix. 

#### Matrix Kernel

The kernel or null space of a matrix A consists of all vectors x such that $Ax=0$. It always contains the zero vector; if it contains anything else, the homogeneous equation has non-trivial solutions. It is related to rank by the rank-nullity theorem: the rank plus the dimension of the null space equals the number of columns of A.
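
A basis for the null space can be read off the singular value decomposition: the right singular vectors that correspond to zero singular values span it. The sketch below assumes a small example matrix and a hypothetical tolerance `tol` for deciding which singular values count as zero.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # second row is 2 * first row

_, s, Vt = np.linalg.svd(A)       # Vt has shape (3, 3)
tol = 1e-10
rank = int(np.sum(s > tol))       # number of non-zero singular values
null_basis = Vt[rank:].T          # columns span the null space
nullity = null_basis.shape[1]

print(f"Rank is {rank}, nullity is {nullity}")
print(f"A maps the null space to zero: {np.allclose(A @ null_basis, 0.0)}")
print(f"Rank + nullity equals number of columns: {rank + nullity == A.shape[1]}")
```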

#### Matrix Inverse

The inverse of a square matrix is the matrix whose matrix product with the original gives the identity matrix. If A is a matrix and $A^{-1}$ is its inverse, then

$$ AA^{-1} = A^{-1}A = I $$

The inverse of a matrix is important because it helps in determining solutions of matrix equations of the form $AX=B$, where X is given by $A^{-1}B$ if the inverse exists.
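
A short sketch of both ideas, using arbitrary example values for `A` and `B`. In practice `np.linalg.solve` is usually preferred over forming the inverse explicitly, since it avoids an extra step and tends to be more numerically stable.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[3.0],
              [5.0]])

A_inv = np.linalg.inv(A)
print(f"A times its inverse is the identity: {np.allclose(A @ A_inv, np.eye(2))}")

X = A_inv @ B                     # solve AX = B through the inverse
X_solve = np.linalg.solve(A, B)   # solve AX = B directly
print(X)
print(f"Both approaches agree: {np.allclose(X, X_solve)}")
```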

#### Matrix Determinant

The determinant of a square matrix is a scalar value that helps determine whether the matrix is invertible.

- Zero Determinant: It means the matrix is singular, i.e. not invertible
- Non-zero Determinant: It means the matrix is invertible and that its rows/columns are linearly independent

Determinants are also useful for solving linear equations with Cramer's rule and for understanding geometric transformations, since the absolute value of the determinant is the factor by which the matrix scales areas or volumes.
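
The two cases can be checked with `np.linalg.det`. The matrices below are hypothetical examples: `A` stretches the plane by 2 and 3 along the axes, while `S` has a second row that is a multiple of the first.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # scales x by 2 and y by 3
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is 2 * first row

det_A = np.linalg.det(A)
det_S = np.linalg.det(S)

# Non-zero determinant: invertible, and unit areas are scaled by |det|
print(f"det(A) is {det_A}")
# Zero determinant (up to floating point error): singular, rank below 2
print(f"det(S) is close to zero: {np.isclose(det_S, 0.0)}")
print(f"Rank of S is {np.linalg.matrix_rank(S)}")
```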

### Tensors

Tensors are higher dimensional arrays which are commonly used to represent real world data. 

Example: A batch of images in a training set for a model is a 4-dimensional tensor:

- Dim 1: Number of images in a training batch eg 32
- Dim 2: Image height in pixels eg 28
- Dim 3: Image width in pixels eg 28
- Dim 4: Number of color channels eg 3

In [9]:
images= torch.zeros([32,28,28,3])
print(images.shape)

torch.Size([32, 28, 28, 3])

## Common Tensor Operations