### Import PyTorch and numpy libraries

In [2]:
import torch
import numpy as np
_ = torch.manual_seed(0)

Let's create a weight matrix W as the product of two matrices of shapes (d, 2) and (2, k). The rank of a product can never exceed its inner dimension, so the rank of W will be 2.

[More about Rank of a matrix](https://en.wikipedia.org/wiki/Rank_(linear_algebra))

In [7]:
d, k = 10, 10

rank = 2
W = torch.randn(d, rank) @ torch.randn(rank, k)
print('Shape of matrix W:', W.shape)
W_rank = np.linalg.matrix_rank(W)
print(f'Rank of W: {W_rank}')

Shape of matrix W: torch.Size([10, 10])
Rank of W: 2
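
As a quick sanity check on the rank claim, here is a minimal sketch that compares an unconstrained random matrix with a product of two thin factors. The helper `numerical_rank` and its threshold `rel_tol` are illustrative assumptions, not part of the original notebook; it simply counts singular values that are not negligible relative to the largest one.

```python
import torch

torch.manual_seed(0)
d, k, r = 10, 10, 2

def numerical_rank(M, rel_tol=1e-4):
    """Count singular values above rel_tol times the largest one.

    rel_tol is an illustrative threshold chosen for float32 inputs.
    """
    s = torch.linalg.svdvals(M)  # singular values in descending order
    return int((s > rel_tol * s[0]).sum())

full = torch.randn(d, k)                      # unconstrained random matrix
low = torch.randn(d, r) @ torch.randn(r, k)   # product of two thin factors

rank_full = numerical_rank(full)
rank_low = numerical_rank(low)
print("rank of unconstrained matrix:", rank_full)
print("rank of (d, r) @ (r, k) product:", rank_low)
```

The product's rank is capped by the inner dimension `r`, no matter how large `d` and `k` are.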


To understand how we can decompose the matrix W into lower-rank matrices, and therefore represent W with fewer parameters, let's apply SVD to W. More about SVD here: [SVD](https://en.wikipedia.org/wiki/Singular_value_decomposition)

In [10]:
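# Perform SVD on W (W = U @ diag(S) @ V^T)
# Note: torch.svd returns V (not V^T). It is deprecated in favor of
# torch.linalg.svd, which returns V^T instead.
U, S, V = torch.svd(W)

print(U.shape, S.shape, V.shape)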

torch.Size([10, 10]) torch.Size([10]) torch.Size([10, 10])
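
To confirm that the three factors really reproduce W, the following sketch rebuilds the matrix from its SVD and prints the singular values. It uses `torch.linalg.svd` rather than `torch.svd`, which is a substitution on my part; the name `Vh` (for V^T) and the variable `max_err` are illustrative.

```python
import torch

torch.manual_seed(0)
W = torch.randn(10, 2) @ torch.randn(2, 10)   # rank-2 matrix, as above

# torch.linalg.svd returns V^T (called Vh here), unlike torch.svd which returns V
U, S, Vh = torch.linalg.svd(W)

# Rebuild W from its factors and measure the largest elementwise error
W_rebuilt = U @ torch.diag(S) @ Vh
max_err = (W - W_rebuilt).abs().max().item()

print("max reconstruction error:", max_err)
print("singular values:", S)
```

Only the first two singular values should be meaningfully larger than zero, which is another way of seeing that W has rank 2.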


### For rank-r factorization, keep only the first r singular values (and corresponding columns of U and V)

In [12]:
U_r = U[:, :W_rank]
S_r = torch.diag(S[:W_rank])
V_r = V[:, :W_rank].t()  # Transpose V_r to get the right dimensions

In [13]:
# Compute B = U_r @ S_r (matrix product) and A = V_r
B = U_r @ S_r
A = V_r
print(f'Shape of B: {B.shape}')
print(f'Shape of A: {A.shape}')

Shape of B: torch.Size([10, 2])
Shape of A: torch.Size([2, 10])
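
The slicing steps above can be wrapped into a small reusable function. This is a sketch under stated assumptions: the function name `low_rank_factors` is hypothetical, it uses `torch.linalg.svd` (which returns V^T directly), and the non-square example matrix is only there to show that the shapes work out when d and k differ.

```python
import torch

def low_rank_factors(W, r):
    """Return (B, A) such that B @ A is the rank-r truncated SVD of W."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    B = U[:, :r] * S[:r]   # scale each kept column of U by its singular value
    A = Vh[:r, :]          # first r rows of V^T
    return B, A

torch.manual_seed(0)
W = torch.randn(6, 2) @ torch.randn(2, 8)   # non-square rank-2 example
B, A = low_rank_factors(W, r=2)

print(B.shape, A.shape)
print(torch.allclose(B @ A, W, atol=1e-3))
```

Multiplying the columns of `U` by `S[:r]` gives the same result as `U_r @ torch.diag(S[:r])` without building the diagonal matrix.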


Now we have B and A, obtained through the singular value decomposition of matrix W.

Let's compute a simple linear transformation, y = Wx + b, with a randomly generated input (x) and bias (b), first using the weight matrix W and then using the factors B and A.

In [17]:
# Generate random bias and input
bias = torch.randn(d)
x = torch.randn(k)  # input needs k entries to match W's shape (d, k); here d == k

In [16]:
# Compute y = Wx + bias
y = W @ x + bias
# Compute y' = (B @ A)x + bias
y_prime = (B @ A) @ x + bias

print("Original y using W:\n", y)
print("y' computed using BA:\n", y_prime)

Original y using W:
 tensor([ 0.9137, -1.9848,  1.8892,  2.4349, -2.2102, -2.1251,  1.1640,  2.1034,
         0.2520, -2.4652])
y' computed using BA:
 tensor([ 0.9137, -1.9848,  1.8892,  2.4349, -2.2102, -2.1251,  1.1640,  2.1034,
         0.2520, -2.4652])
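
One practical note on the computation above: `(B @ A) @ x` first rebuilds the full d x k matrix, which gives up the savings of the factorization. The sketch below, with freshly generated stand-in factors rather than the notebook's own B and A, shows that applying A and then B gives the same result while only ever multiplying by the thin matrices.

```python
import torch

torch.manual_seed(0)
d, k, r = 10, 10, 2

# Stand-in factors and inputs (illustrative, not the notebook's SVD factors)
B, A = torch.randn(d, r), torch.randn(r, k)
x, bias = torch.randn(k), torch.randn(d)

y_dense = (B @ A) @ x + bias      # materializes the d x k product first
y_factored = B @ (A @ x) + bias   # two thin matrix-vector products

print(torch.allclose(y_dense, y_factored, atol=1e-4))
```

The factored order costs roughly r * (d + k) multiplications per input instead of d * k.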


In [15]:
print("Total parameters of W: ", W.nelement())
print("Total parameters of B and A: ", B.nelement() + A.nelement())

Total parameters of W:  100
Total parameters of B and A:  40
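
The counts above follow a simple formula: a dense d x k matrix has d * k parameters, while the factors B (d x r) and A (r x k) together have r * (d + k). A small sketch, where the helper name `param_counts` and the larger 4096 x 4096 example are illustrative choices of mine:

```python
def param_counts(d, k, r):
    """Return (dense, factored) parameter counts for a d x k matrix and its rank-r factors."""
    dense = d * k
    factored = r * (d + k)   # B has d * r entries, A has r * k entries
    return dense, factored

print(param_counts(10, 10, 2))        # (100, 40)
print(param_counts(4096, 4096, 8))    # (16777216, 65536)
```

The savings grow quickly with matrix size as long as r stays small compared to d and k.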


Instead of the 100 parameters in W, we can represent W as the product BA using only 40 parameters. The matching outputs (y == y' up to floating-point precision) confirm that nothing is lost.
Now that we have an idea of matrix decomposition, let's proceed to LoRA, a trainable way to introduce rank decomposition matrices into each layer of a neural network architecture.

[LoRA paper](https://arxiv.org/pdf/2106.09685.pdf)
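
To connect the decomposition above to LoRA, here is a minimal sketch of a LoRA-style linear layer. The class name `LoRALinear` and the random stand-in for pretrained weights are assumptions for illustration; the zero initialization of B, the random initialization of A, and the alpha / r scaling follow the description in the paper. This is not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen weight W plus a trainable low-rank update B @ A (illustrative sketch)."""

    def __init__(self, d, k, r=2, alpha=1.0):
        super().__init__()
        # Random stand-in for pretrained weights; frozen during fine-tuning
        self.W = nn.Parameter(torch.randn(d, k), requires_grad=False)
        self.B = nn.Parameter(torch.zeros(d, r))          # zero init: update starts at 0
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)   # small random init
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B (A x)
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

torch.manual_seed(0)
layer = LoRALinear(d=10, k=10, r=2)
x = torch.randn(10)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("trainable parameters:", trainable)
print(torch.allclose(layer(x), layer.W @ x))
```

Because B starts at zero, the layer initially behaves exactly like the frozen W, and only the r * (d + k) entries of B and A are updated during training.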