## Problem 1

In MATLAB or Python, implement a version of TSQR that divides an input matrix into four blocks of rows (using row sub-indexing) and computes the QR factorisation in the way shown in the lectures on communication-avoiding factorisations.

TSQR defines a family of algorithms, in which the QR factorization of A is obtained by performing a sequence of QR factorizations until the lower trapezoidal part of A is annihilated and the final R factor is obtained. The QR factorizations are performed on block rows of A and on previously obtained R factors, stacked atop one another. We call the pattern followed during this sequence of QR factorizations a reduction tree.
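To make the idea of a reduction tree concrete, here is a minimal sketch that computes only the $R$ factor with two different trees: a flat chain and a binary tree. The function names `tsqr_flat` and `tsqr_binary` are hypothetical, the binary tree is only assumed to match the variant shown in the lectures, and the sketch assumes every block has at least as many rows as columns and that the matrix has full column rank.

```python
import numpy as np


def tsqr_flat(blocks):
    """Flat tree: fold the row blocks into R one at a time."""
    R = np.linalg.qr(blocks[0], mode="r")
    for block in blocks[1:]:
        R = np.linalg.qr(np.vstack((R, block)), mode="r")
    return R


def tsqr_binary(blocks):
    """Binary tree: factor every block, then merge R factors pairwise."""
    level = [np.linalg.qr(block, mode="r") for block in blocks]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            pair = np.vstack((level[i], level[i + 1]))
            merged.append(np.linalg.qr(pair, mode="r"))
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd one out moves up unchanged
        level = merged
    return level[0]


rng = np.random.default_rng(0)   # seed chosen only for reproducibility
A = rng.random((32, 5))
blocks = np.split(A, 4, axis=0)  # four 8 x 5 row blocks

R_flat = tsqr_flat(blocks)
R_tree = tsqr_binary(blocks)
R_ref = np.linalg.qr(A, mode="r")

# R is unique only up to the sign of each row, so compare absolute values
print(np.allclose(np.abs(R_flat), np.abs(R_ref)))
print(np.allclose(np.abs(R_tree), np.abs(R_ref)))
```

Both trees perform the same kind of local QR factorizations; only the order in which the intermediate $R$ factors are stacked differs.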

### Sequential TSQR

The first set of algorithms, “Tall Skinny QR” (TSQR), are for matrices for which the number of rows is much larger than the number of columns, and which have their rows distributed over processors in a one-dimensional (1-D) block row layout.

Sequential TSQR uses a factorization process similar to that of parallel TSQR, but with a “flat tree” (a linear chain). We start with the same block row decomposition as with parallel TSQR, but begin with a QR factorization of $A_0$ alone, rather than of all the block rows:

$$
A =
\begin{pmatrix}
A_0 \\
A_1 \\
A_2 \\
A_3
\end{pmatrix}
=
\begin{pmatrix}
Q_{00} R_{00} \\
A_1 \\
A_2 \\
A_3
\end{pmatrix}
$$

This is “stage 0” of the computation, hence the second subscript 0 of the $Q$ and $R$ factors. We then combine $R_{00}$ and $A_1$ using a QR factorization:

$$
\begin{pmatrix}
R_{00} \\
A_1 \\
A_2 \\
A_3
\end{pmatrix}
=
\begin{pmatrix}
R_{00} \\
A_1 \\
\hline
A_2 \\
A_3
\end{pmatrix}
=
\begin{pmatrix}
Q_{01} R_{01} \\
\hline
A_2 \\
A_3
\end{pmatrix}
$$

We continue this process until we run out of $A_i$ blocks. The matrix $A$ is $m \times n$ and is split into $P$ block rows ($P = 4$ above), so the $A_i$ blocks are $m/P \times n$. If we were to compute all the above $Q$ factors explicitly as square matrices, which we do not, then $Q_{00}$ would be $m/P \times m/P$ and $Q_{0j}$ for $j > 0$ would be $2m/P \times 2m/P$. The final $R$ factor, as in the parallel case, would be $m \times n$ upper triangular (or $n \times n$ upper triangular in a “thin QR”).
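For the four-block case considered here, carrying the same pattern through the two remaining blocks gives stages 2 and 3. This is a worked continuation in the notation used above, not a formula quoted from the source:

$$
\begin{pmatrix}
R_{01} \\
A_2 \\
\hline
A_3
\end{pmatrix}
=
\begin{pmatrix}
Q_{02} R_{02} \\
\hline
A_3
\end{pmatrix},
\qquad
\begin{pmatrix}
R_{02} \\
A_3
\end{pmatrix}
=
Q_{03} R_{03}
$$

After stage 3 no blocks remain, so $R_{03}$ is the $R$ factor of $A$.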

In [17]:
import numpy as np


# Sequential implementation of TSQR
p = 4                                       # 4 block rows
m = 4 * p                                   # So each block has 4 rows
n = 5                                       # So each block has 5 columns
scaling_factor = 100                        # Scales elements, otherwise [0, 1)
A = scaling_factor * np.random.rand(m, n)   # Matrix
is_thin = True                              # Outputs a Thin-QR (n x n) matrix
print(f"A ({m} x {n}):\n{A}\n")

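# Partitioning A into p blocks of block_rows rows each
block_rows = A.shape[0] // p                # Separate name, so m keeps the total row count
A_partitioned = [None] * p
for i in range(p):
  A_partitioned[i] = A[block_rows * i: block_rows * (i + 1), :]
  print(f"A_{i} ({block_rows} x {n}):")
  print(f"{A_partitioned[i]}\n")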

qr_mode = "reduced" if is_thin else "complete"
R = None
for i in range(p):
  # Stage 0 factors A_0 alone; stage i > 0 stacks R_0(i-1) on top of A_i
  block = A_partitioned[i] if i == 0 else np.vstack((R, A_partitioned[i]))
  _, R = np.linalg.qr(block, mode=qr_mode)
  print(f"R_0{i} ({R.shape[0]} x {R.shape[1]}):\n{R}\n")

A (16 x 5):
[[58.53349999 88.42089695 12.82532575 40.61157464 22.40800689]
 [24.51354352 16.53395331 96.39468142 13.71181688  3.50752132]
 [33.86832371 25.73668137 80.44033633 18.08295237 92.24711753]
 [20.64349387 69.69353981  2.916204   58.665973   93.01198732]
 [40.31163915 11.93834246 72.24052454 93.64964444 81.36784117]
 [ 5.06878922 44.90567116 34.32396244 77.78521312 38.66211803]
 [19.89933233 76.31266345 29.97953635 35.53252812 59.48891768]
 [34.28306129 34.34437439 89.78893969 26.70895617 26.17772715]
 [ 1.5479483  56.00301572  2.1451451  94.6896275  73.1563357 ]
 [96.95471517  7.94907908 66.78736314 13.3974737  78.25693216]
 [60.21906999 21.39828089 94.12489267 94.51912604 33.93554974]
 [61.68342475 75.06972763  2.14119113 41.80249993 51.96563237]
 [52.46807673 17.62938226 20.74956398 44.28847085 52.4703498 ]
 [19.0809091  25.28585502 49.1675797  96.02725829 12.31429051]
 [55.30151763 39.3868871  79.93471707  0.45232731 46.53816409]
 [52.44749872 20.22446177 48.75975109 79.63
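One detail of the run above is worth checking: each block has 4 rows but $n = 5$ columns, so the stage-0 factor cannot yet be $n \times n$. The sketch below reuses the same sizes with a seeded generator (an assumption, made only for reproducibility) and records the shape of $R$ at each stage in reduced mode. The recorded shapes are `[(4, 5), (5, 5), (5, 5), (5, 5)]`: the factor becomes square only once the stacked matrix has at least $n$ rows.

```python
import numpy as np

p, n = 4, 5
rng = np.random.default_rng(0)            # seeded generator (assumption)
A = 100 * rng.random((4 * p, n))
blocks = np.split(A, p, axis=0)           # four 4 x 5 row blocks

shapes = []
R = None
for block in blocks:
    stacked = block if R is None else np.vstack((R, block))
    R = np.linalg.qr(stacked, mode="r")   # R only: min(rows, n) x n
    shapes.append(R.shape)

print(shapes)

# Sanity check: R^T R equals A^T A because every Q factor is orthonormal
print(np.allclose(R.T @ R, A.T @ A))
```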

In [18]:
# Sequential implementation of TSQR
p = 4                                       # 4 block rows
m = 4 * p                                   # So each block has 4 rows
n = 5                                       # So each block has 5 columns
scaling_factor = 100                        # Scales elements, otherwise [0, 1)
A = scaling_factor * np.random.rand(m, n)   # Matrix
is_thin = False                             # Outputs a Full-QR (m x n) matrix
print(f"A ({m} x {n}):\n{A}\n")

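# Partitioning A into p blocks of block_rows rows each
block_rows = A.shape[0] // p                # Separate name, so m keeps the total row count
A_partitioned = [None] * p
for i in range(p):
  A_partitioned[i] = A[block_rows * i: block_rows * (i + 1), :]
  print(f"A_{i} ({block_rows} x {n}):")
  print(f"{A_partitioned[i]}\n")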

qr_mode = "reduced" if is_thin else "complete"
R = None
for i in range(p):
  # Stage 0 factors A_0 alone; stage i > 0 stacks R_0(i-1) on top of A_i
  block = A_partitioned[i] if i == 0 else np.vstack((R, A_partitioned[i]))
  _, R = np.linalg.qr(block, mode=qr_mode)
  print(f"R_0{i} ({R.shape[0]} x {R.shape[1]}):\n{R}\n")

A (16 x 5):
[[21.23174828 76.04702134 11.85218571 64.90812265 95.47436312]
 [59.60568762 68.24819413 97.77735861 26.82616224 89.12871102]
 [96.4329713  23.04488324 83.75266762 95.18804405  2.40524975]
 [10.24285472  0.55982992 39.53503927 20.60335248 31.15579189]
 [50.48722892 15.93282684 58.18978182 91.43790113 25.17005484]
 [63.8051006  37.1902777  85.0024035  93.65295314  1.64699034]
 [23.7217667  97.53014673 90.0279199  97.48250487 51.43918441]
 [83.2493308  35.62143565 97.27957507 61.37533496  3.79768252]
 [ 1.22800186 80.31995598 44.7729469  91.11571356 24.10982419]
 [35.11723308 22.53474916 24.38285548 60.62742648 83.89361636]
 [87.62622085 42.02921515 73.38342817 16.70500542 85.95225935]
 [63.59083231 86.03202152 47.16903553  3.51176619 59.99676095]
 [36.51423056 18.98038157 64.11500389 25.3078171  87.4257058 ]
 [84.37137557 37.75816316 35.30713594 74.75826349 14.26655214]
 [52.02310234 98.10528845 14.5224535  73.79725474 63.7235424 ]
 [36.91796856 12.51193721  8.10179467 69.58
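The two runs above draw different random matrices, so their $R$ factors cannot be compared directly. As a hedged sketch of how the thin and full variants relate, the snippet below runs both modes on one seeded matrix and compares the leading $n \times n$ block of the full result with the thin result. The helper name `flat_tsqr_r` and the seed are assumptions made for illustration, and the comparison assumes the matrix has full column rank.

```python
import numpy as np


def flat_tsqr_r(A, p, mode):
    """Flat-tree TSQR returning only R; mode is 'reduced' or 'complete'."""
    R = None
    for block in np.split(A, p, axis=0):
        stacked = block if R is None else np.vstack((R, block))
        _, R = np.linalg.qr(stacked, mode=mode)
    return R


p, n = 4, 5
rng = np.random.default_rng(1)            # seeded generator (assumption)
A = 100 * rng.random((4 * p, n))

R_thin = flat_tsqr_r(A, p, "reduced")     # n x n
R_full = flat_tsqr_r(A, p, "complete")    # m x n

print(R_thin.shape, R_full.shape)
# Rows below the first n of the full factor carry no information
print(np.allclose(R_full[n:], 0.0))
# The leading n x n blocks agree up to the sign of each row
print(np.allclose(np.abs(R_full[:n]), np.abs(R_thin)))
```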

References:
- [1] James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou. Implementing Communication-Optimal Parallel and Sequential QR Factorizations. https://arxiv.org/abs/0809.2407, 2008. arXiv:0809.2407 [math.NA]