# Linear Algebra

It is a widely accepted fact that linear algebra is fun. This notebook will guide you through some of the linear algebra fun you can have with Heat.

In [None]:
from ipyparallel import Client

# Connect to the running ipyparallel cluster; rc.ids lists the available engines.
rc = Client(profile="default")

if len(rc.ids) == 0:
    print("No engines found")
else:
    print(f"{len(rc.ids)} engines found")

In [None]:
%%px
import heat as ht

## Matrix-Matrix Multiplication

The most basic operation in linear algebra is matrix-matrix multiplication ("matmul"). Doing it by hand for a small matrix is neither difficult nor particularly spectacular. In the distributed setting (e.g., on 4 GPUs), however, even such a simple operation is no longer trivial: imagine you work together with 3 other people, and each of you knows only one fourth of the columns of a matrix $A$ and one fourth of the rows of a matrix $B$. Together, you have to compute the product $AB$ such that in the end each of you holds only one fourth of the columns of $AB$...

In [None]:
%%px
# split=0 distributes A along its rows, split=1 distributes B along its columns
split_A = 0
split_B = 1
M = 10000
N = 10000
K = 10000
A = ht.random.randn(M, N, split=split_A, device="gpu")
B = ht.random.randn(N, K, split=split_B, device="gpu")
C = ht.matmul(A, B)
C
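
The thought experiment above can be mimicked on a single machine. The following is a minimal NumPy sketch, not Heat's actual algorithm: the matrix sizes, the number of "workers", and all variable names are illustrative assumptions. Each worker holds a column block of $A$ and the matching row block of $B$, so it can only form a partial product; the column blocks of $AB$ appear once the partial products of all workers are summed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers = 4            # hypothetical number of processes
m, n, k = 6, 8, 12       # n and k are divisible by n_workers

A = rng.standard_normal((m, n))
B = rng.standard_normal((n, k))

# Worker p knows one fourth of the columns of A and one fourth of the rows of B.
A_cols = np.split(A, n_workers, axis=1)   # each block has shape (m, n // n_workers)
B_rows = np.split(B, n_workers, axis=0)   # each block has shape (n // n_workers, k)

# Locally, each worker can only compute a full-size partial product.
partials = [A_p @ B_p for A_p, B_p in zip(A_cols, B_rows)]

# AB is the sum of all partial products. Worker p ends up with its column
# block of AB by summing the p-th column block of every partial product,
# which in a real distributed setting requires communication.
C_cols = [
    sum(np.split(P, n_workers, axis=1)[p] for P in partials)
    for p in range(n_workers)
]

# Gluing the blocks back together reproduces the ordinary product.
C = np.concatenate(C_cols, axis=1)
assert np.allclose(C, A @ B)
```

The summation step is where the data exchange between workers would happen, which is one reason why different split combinations can differ in run time.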

## QR Decomposition and Triangular Solve

Given a matrix $A$, its QR decomposition is given by $A=QR$, where $Q$ has pairwise orthonormal columns (so $Q$ is an orthogonal matrix whenever it is square) and $R$ is an upper triangular matrix.

Further information: [QR on Wikipedia](https://en.wikipedia.org/wiki/QR_decomposition)

In [None]:
%%px
A = ht.random.randn(100000, 1000, split=0, device="gpu")
Q, R = ht.linalg.qr(A)

With a little bit of linear algebra fun, you find out that a linear least squares problem of type $\min_x \lVert Ax - b \rVert_2$ boils down to computing the QR decomposition $A=QR$ and then solving $Rx = Q^T b$ for $x$. (Of course, for $A \in \mathbb{R}^{m \times n}$ we need to assume that $m \geq n$ and that $R$ is invertible...)

In [None]:
%%px
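# right-hand side, replicated on all processes (split=None)
b = ht.random.randn(100000, split=None, device="gpu")
Qtb = Q.T @ b
# R is upper triangular, so R x = Q^T b can be solved by a triangular solve
x = ht.linalg.solve_triangular(R, Qtb)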
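
Why does this work? If $Q$ has orthonormal columns, then $\lVert Ax - b \rVert_2^2 = \lVert Rx - Q^T b \rVert_2^2 + \lVert (I - QQ^T) b \rVert_2^2$, and only the first term depends on $x$. The following small NumPy sketch (single process, illustrative sizes and names, not Heat code) checks the resulting solution against NumPy's own least squares solver; the explicit back substitution stands in for a triangular solve.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 5                      # tall problem: m >= n
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Reduced QR: Q is (m, n) with orthonormal columns, R is (n, n) upper triangular.
Q, R = np.linalg.qr(A, mode="reduced")

# Solve R x = Q^T b by back substitution, starting from the last row.
Qtb = Q.T @ b
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (Qtb[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]

# Reference solution from NumPy's least squares routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)
```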

If you want to solve a LASSO-regularized version of this linear regression problem, try out `heat.regression.Lasso`!

## Singular Value Decomposition

Given a matrix $X$, its singular value decomposition is defined to be $X = U \Sigma V^T$, where $U$ and $V$ have orthonormal columns and $\Sigma$ is a diagonal matrix with nonnegative diagonal entries (the "singular values" of $X$). Further information: [SVD on Wikipedia](https://en.wikipedia.org/wiki/Singular_value_decomposition)
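
The defining properties are easy to check on a small example. The sketch below uses plain NumPy on a single process with an illustrative $8 \times 5$ matrix; note that `np.linalg.svd` returns $V^T$ rather than $V$, which is a NumPy convention and not a statement about Heat's return values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))

# Reduced SVD: U is (8, 5), s holds the 5 singular values, Vt is V^T with shape (5, 5).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Singular values are nonnegative and sorted in descending order.
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)

# U and V have orthonormal columns.
assert np.allclose(U.T @ U, np.eye(5))
assert np.allclose(Vt @ Vt.T, np.eye(5))

# The three factors reproduce X.
assert np.allclose(U @ np.diag(s) @ Vt, X)
```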

Computing the **full** SVD in a distributed environment can be quite **expensive**; nevertheless, we have implemented it: 

In [None]:
%%px
A = ht.random.rand(2000, 2000, split=1, device="gpu")
U, S, V = ht.linalg.svd(A)
U, S, V

If the number of rows is much higher than the number of columns (we call such matrices "tall-skinny"), a more efficient SVD implementation is available than for the general case:

In [None]:
%%px 
X = ht.random.rand(100000, 2000, split=0, device="gpu")
U, S, V = ht.linalg.svd(X)
U, S, V

If you don't have a tall-skinny matrix, but only require an approximation of the largest singular values (and vectors), which suffices in many situations, you can use, e.g., randomized SVD instead. In the following we use randomized SVD with 10 oversamples and one power iteration in order to compute an approximation to the leading 10 singular values and vectors of the same matrix $X$ as above.

In [None]:
%%px 
Ur, Sr, Vr = ht.linalg.svd_randomized(X, 10, n_oversamples=10, power_iter=1)
Ur, Sr, Vr
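
To get a feeling for what oversampling and power iterations do, here is a minimal single-process NumPy sketch of the textbook randomized range-finder approach. It is an assumption-laden illustration, not Heat's implementation: the function `randomized_svd` is hypothetical, its parameter names merely mirror the call above for readability, and the test matrix with singular values $2^{-j}$ is made up so that the approximation quality can be checked.

```python
import numpy as np

def randomized_svd(X, rank, n_oversamples=10, power_iter=1, seed=0):
    """Hypothetical helper: approximate the leading `rank` singular triplets of X."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    ell = min(rank + n_oversamples, m, n)   # size of the random sketch

    # Sample the range of X with a random Gaussian test matrix.
    Q = np.linalg.qr(X @ rng.standard_normal((n, ell)))[0]

    # Power iterations sharpen the sketch towards the dominant singular vectors.
    for _ in range(power_iter):
        Z = np.linalg.qr(X.T @ Q)[0]
        Q = np.linalg.qr(X @ Z)[0]

    # Project X onto the sketched range and compute a small, cheap SVD.
    U_small, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank, :].T

# Made-up test matrix with known, rapidly decaying singular values 2^-j.
rng = np.random.default_rng(1)
U0 = np.linalg.qr(rng.standard_normal((300, 100)))[0]
V0 = np.linalg.qr(rng.standard_normal((100, 100)))[0]
sigma = 0.5 ** np.arange(100)
X = (U0 * sigma) @ V0.T

Ur, Sr, Vr = randomized_svd(X, 10, n_oversamples=10, power_iter=1)
S_exact = np.linalg.svd(X, compute_uv=False)[:10]
rel_err = np.max(np.abs(Sr - S_exact) / S_exact)
print(f"largest relative error of the 10 leading singular values: {rel_err:.2e}")
```

For matrices whose singular values decay slowly, the approximation is typically worse; increasing `n_oversamples` or `power_iter` usually helps, at additional cost.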

## Exercises 

1. Try out different split combinations `split_A=0,1,None`, `split_B=0,1,None` for matrix-matrix multiplication. What do you observe for the split of the outcome? Does every combination take the same computing time? (What's the likely cause for this?)

2. Compare the approximate singular values computed by randomized SVD with the reference ones computed by full SVD. What do you observe for different choices of the number of oversamples and power iterations? 