---
title: 4.3 Orthogonal Matrices and the QR Factorization
subject:  Orthogonality
subtitle: A new matrix factorization
short_title: 4.3 Orthogonal Matrices and the QR Factorization
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Orthogonal Matrix, QR Factorization
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/03_Ch_4_Orthogonality/053-orthogonal_matrices.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 07 - Orthogonality, Gram-Schmidt, Orthogonal Matrices, and QR-Factorization.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA 4.3.

## Learning Objectives

By the end of this page, you should know:
- what an orthogonal matrix is
- what the QR factorization of a square matrix is
- how to use the QR factorization to solve systems of equations of the form $A\vv{x} = \vv b$, with $A$ square

## Orthogonal Matrices

Rotations and reflections play key roles in geometry, physics, robotics, quantum mechanics, airplanes, computer graphics, data science, and more. These transformations are encoded via *orthogonal matrices*, that is, matrices whose columns form an orthonormal basis for $\mathbb{R}^n$. They also play a central role in one of the most important methods of linear algebra, the *QR factorization*.

We start with a definition.

:::{prf:definition} Orthogonal Matrix
:label: orthogonal-matrix-defn

A square matrix $Q$ is called *orthogonal* if it satisfies 

\begin{align*}
    QQ^{\top} = Q^{\top}Q = I.
\end{align*}

This means that $Q^{-1} = Q^{\top}$ (in fact, we could define orthogonal matrices this way instead), and that solving linear systems of the form $Q\vv x = \vv b$ is very easy: simply set $\vv x = Q^\top \vv b$!

Notice that $Q^\top Q = I$ implies that the columns of $Q$ are orthonormal. If $Q = [\vv{q_1}, ..., \vv{q_n}]$, then 

\begin{align*}
    (Q^\top Q)_{ij} = \vv{q_i}^\top \vv{q_j} = I_{ij} = \begin{cases} 1 \quad\text{if $i = j$}\\ 0\quad\text{if $i \neq j$}\end{cases}
\end{align*}

which is exactly the definition of an orthonormal collection of vectors. Further, since there are $n$ such vectors, they must form an [orthonormal basis](./051-orthogonal_orthonormal_bases.ipynb#orthonormal-basis-defn) for $\mathbb{R}^n$. 
:::
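
As a quick numerical sanity check of this definition, here is a minimal NumPy sketch. The matrix `Q` and the right-hand side `b` below are illustrative values of our own choosing (they do not come from the text); the point is only that $Q^\top Q = QQ^\top = I$ and that $\vv x = Q^\top \vv b$ solves $Q \vv x = \vv b$ without any elimination.

```python
import numpy as np

# An illustrative orthogonal matrix (our own choice): its columns are
# mutually perpendicular unit vectors.
Q = np.array([[2.0, -2.0, 1.0],
              [1.0, 2.0, 2.0],
              [2.0, 1.0, -2.0]]) / 3.0

# Both products should equal the identity, up to rounding error.
print(np.allclose(Q.T @ Q, np.eye(3)))
print(np.allclose(Q @ Q.T, np.eye(3)))

# Solving Q x = b only requires a matrix-vector product with Q^T.
b = np.array([3.0, 0.0, -3.0])  # arbitrary right-hand side
x = Q.T @ b
print(np.allclose(Q @ x, b))
```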

Now, let's explore some of the consequences of this definition.

:::{prf:example} $2 \times 2$ orthogonal matrices
:label: orthogonal-matrices-ex1

A $2\times 2$ matrix $Q = \bm a&b\\c&d\em$ is orthogonal if and only if

\begin{align*}
 Q^\top Q = \bm a^2 + c^2 & ab + cd \\ ab + cd & b^2 + d^2\em = \bm 1& 0\\ 0& 1\em 
\end{align*}

or equivalently

\begin{align*}
    a^2 + c^2 = 1, \quad ab + cd = 0, \quad b^2 + d^2 = 1
\end{align*}

The first and last equations say that $\bm a\\ c \em$ and $\bm b\\ d \em$ lie on the unit circle in $\mathbb{R}^2$: a convenient and revealing way of writing this is by setting

\begin{align*}
    a = \cos \theta, \quad c= \sin \theta, \quad b = \cos \phi, \quad d = \sin \phi
\end{align*}

since $\cos^2 \theta + \sin^2\theta = 1$ for all $\theta \in \mathbb{R}$.

Our last condition is $0 = ab + cd = \cos\theta \cos \phi +\sin\theta \sin\phi = \cos(\theta - \phi)$. Now, for some integer $n$,
\begin{align*}
    \cos (\theta - \phi) = 0 &\iff \theta - \phi = \frac{\pi}{2} + 2 n \pi\quad \text{or} \quad \theta - \phi = -\frac{\pi}{2} + 2 n \pi \\
    &\iff \phi = \theta \pm \frac{\pi}{2} + 2 n \pi
\end{align*}

This means either:

* $b = -\sin\theta$ and $d = \cos\theta$ 

* or $b = \sin\theta$ and $d = -\cos \theta$

As a result, every $2\times 2$ orthogonal matrix has one of two possible forms:

\begin{align*}
    \bm \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \em \quad\text{or}\quad \bm \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \em
\end{align*}

where by convention, we restrict $\theta \in [0, 2\pi)$.

The columns of both matrices form an orthonormal basis for $\mathbb{R}^2$. The first is obtained by rotating the [standard basis](../01_Ch_2_Vector_Spaces_and_Bases/034-basis_dim.ipynb#basis_eg) $\vv{e_1}, \vv{e_2}$ through angle $\theta$, the second by first reflecting about the x-axis and then rotating.

![Orthogonal matrices in $\mathbb{R}^2$](../figures/04-orthogonal_matrix.png)

:::
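
The two forms from this example can also be checked numerically. The sketch below is only an illustration: the helper names `rotation` and `reflection` and the angle `theta` are our own choices. It confirms that both forms are orthogonal, and that the second form is a reflection about the x-axis followed by a rotation through $\theta$.

```python
import numpy as np

def rotation(theta):
    """First form: rotate the plane counterclockwise through theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def reflection(theta):
    """Second form: reflect about the x-axis, then rotate through theta."""
    return np.array([[np.cos(theta),  np.sin(theta)],
                     [np.sin(theta), -np.cos(theta)]])

theta = np.pi / 3  # arbitrary illustrative angle

# Both forms satisfy Q^T Q = I.
for Q in (rotation(theta), reflection(theta)):
    print(np.allclose(Q.T @ Q, np.eye(2)))

# The second form factors as (rotation) times (reflection about the x-axis).
flip_x = np.diag([1.0, -1.0])
print(np.allclose(reflection(theta), rotation(theta) @ flip_x))
```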

If we think about the map $\vv x \mapsto Q \vv x$ defined by multiplication with an orthogonal matrix as rotating and/or reflecting the vector $\vv x$, then the following property should not be surprising:

:::{important}
The product of two orthogonal matrices is also orthogonal!
:::

Before grinding through some algebra, let's think about this through the lens of rotations and reflections. Multiplying $\vv x$ by a product of orthogonal matrices $Q_2Q_1$ is the same as first rotating/reflecting $\vv x$ by $Q_1$ to obtain $Q_1 \vv x$, and then rotating/reflecting $Q_1 \vv x$ by $Q_2$ to get $Q_2 Q_1 \vv x$. Now a sequence of rotations and reflections is still ultimately a rotation and/or reflection, so we must have $Q_2 Q_1 \vv x = Q \vv x$ for some orthogonal $Q = Q_2 Q_1$.

Let's check that this intuition carries over in the math. Since $Q_1$ and $Q_2$ are orthogonal, we have that

\begin{align*}
    Q^\top_1 Q_1 = I = Q_2^\top Q_2.
\end{align*}

Let's check that $(Q_1Q_2)^\top (Q_1Q_2) = I$:

\begin{align*}
    (Q_1Q_2)^\top (Q_1Q_2) = Q_2^\top \underbrace{Q_1^\top Q_1}_{I}Q_2 = \underbrace{Q_2^\top Q_2}_I = I
\end{align*}

Therefore $(Q_1 Q_2)^{-1} = (Q_1 Q_2)^\top$, and we indeed have $Q_1Q_2$ is orthogonal.
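
The same computation can be spot-checked numerically. In the sketch below, `Q1` and `Q2` are two illustrative orthogonal matrices of our own choosing (a fixed $3 \times 3$ example and a rotation about the third axis through an arbitrary angle `phi`); we confirm that their product `P` satisfies $P^\top P = I$ and that $P^{-1} = P^\top$.

```python
import numpy as np

# Two illustrative orthogonal matrices (not taken from the text).
Q1 = np.array([[2.0, -2.0, 1.0],
               [1.0, 2.0, 2.0],
               [2.0, 1.0, -2.0]]) / 3.0
phi = 1.2  # arbitrary angle
Q2 = np.array([[np.cos(phi), -np.sin(phi), 0.0],
               [np.sin(phi),  np.cos(phi), 0.0],
               [0.0,          0.0,         1.0]])

P = Q1 @ Q2

# The product is again orthogonal ...
print(np.allclose(P.T @ P, np.eye(3)))
# ... so its inverse is simply its transpose.
print(np.allclose(np.linalg.inv(P), P.T))
```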

:::{important}

This multiplicative property combined with the fact that the inverse of an orthogonal matrix is orthogonal (why?) says that the set of all orthogonal matrices (of dimension $n$) forms a *group* (under matrix multiplication). 

Group theory underlies much of modern physics and quantum mechanics and plays a central role in robotics. Although we will not spend too much time on groups in this class, you are sure to see them again in the future. 

The aforementioned *orthogonal group* in particular is central to rigid body mechanics, atomic structure and chemistry, and computer graphics, among many other applications.

:::

## The QR Factorization

The [Gram-Schmidt process](./052-gram_schmidt.ipynb#gram-schmidt-alg) (GSP), when applied to orthonormalize a basis of $\mathbb{R}^n$, in fact gives us the famous, incredibly useful *QR factorization* of a matrix:

:::{prf:definition} The QR Factorization of an Invertible Square Matrix
:label: square-qr-factorization-defn

Any real invertible square matrix $A$ may be written in the form

\begin{align*}
    A = QR,
\end{align*}

where $Q$ is [orthogonal](#orthogonal-matrix-defn) and $R$ is upper triangular and invertible. This is known as a *QR* factorization of $A$.

:::

Let us start with a basis $\vv{b_1}, ..., \vv{b_n}$ for $\mathbb{R}^n$, and let $\vv{u_1}, ..., \vv{u_n}$ be the result of applying the GSP to it. Define the matrices:

\begin{align*}
    A = \bm \vv{b_1} & \vv{b_2} & ... & \vv{b_n} \em, \quad Q = \bm \vv{u_1} & \vv{u_2} & ... & \vv{u_n} \em.
\end{align*}

$Q$ is an [orthogonal matrix](#orthogonal-matrix-defn) because the $\vv{u_i}$ form an orthonormal basis.

Now, let's revisit the GSP equations:

\begin{align*}
    \vv{v_1} &= \vv{b_1} \\
    \vv{v_2} &= \vv{b_2} - \frac{\langle \vv{b_2}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1}\\
    \vv{v_3} &= \vv{b_3} - \frac{\langle \vv{b_3}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} - \frac{\langle \vv{b_3}, \vv{v_2} \rangle}{\langle \vv{v_2}, \vv{v_2} \rangle}\vv{v_2}\\
    \vdots\\
    \vv{v_n} &= \vv{b_n} - \frac{\langle \vv{b_n}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} - ... - \frac{\langle \vv{b_n}, \vv{v_{n-1}} \rangle}{\langle \vv{v_{n-1}}, \vv{v_{n-1}} \rangle}\vv{v_{n-1}}\\
\end{align*}

We start by replacing each element $\vv{v_i}$ with its normalized form, $\vv{u_i} = \frac{\vv{v_i}}{\| \vv{v_i} \|}$. Rearranging the above, we can write the original basis elements $\vv{b_i}$ in terms of the orthonormal basis $\vv{u_i}$ via the triangular system

\begin{align*}\label{expr:gram-schmidt-system}
    \vv{b_1} &= r_{11}\vv{u_1}\\
    \vv{b_2} &= r_{12}\vv{u_1} + r_{22}\vv{u_2}\\
    \vv{b_3} &= r_{13}\vv{u_1} + r_{23}\vv{u_2} + r_{33}\vv{u_3}\\
    \vdots\\
    \vv{b_n} &= r_{1n}\vv{u_1} + r_{2n}\vv{u_2} + ... + r_{nn}\vv{u_n}
\end{align*}

Using our usual trick of taking inner products with both sides we see that

\begin{align*}
    \langle \vv{b_j}, \vv{u_i}\rangle &= \langle r_{1j}\vv{u_1} + ... + r_{jj}\vv{u_j}, \vv{u_i}\rangle \\
    &= r_{1j} \langle \vv{u_1}, \vv{u_i} \rangle + ... + r_{ij} \langle \vv{u_i}, \vv{u_i} \rangle + ... + r_{jj} \langle \vv{u_j}, \vv{u_i} \rangle \\
    &= r_{ij}
\end{align*}

So we conclude that $r_{ij} = \langle \vv{b_j}, \vv{u_i} \rangle$.

Now, returning to [](#expr:gram-schmidt-system), we observe that if we define the upper triangular matrix

\begin{align*}
    R = \bm
        r_{11} & r_{12} & \dots & r_{1n}\\
        0 & r_{22} & \dots & r_{2n}\\
        \vdots & \vdots & \ddots & \vdots\\
        0 & 0 & \dots & r_{nn}
    \em
\end{align*}

we can write $A = QR$. Since the GSP works on any basis, the only requirement for $A$ to have a QR factorization is that its columns form a basis for $\mathbb{R}^n$, i.e., that $A$ be nonsingular.
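
To make this construction concrete, here is a minimal sketch that follows the derivation directly: orthonormalize the columns of $A$, then read off $r_{ij} = \langle \vv{b_j}, \vv{u_i} \rangle$, which in matrix form is $R = Q^\top A$. It assumes the dot product as the inner product and an invertible input (there is no check for linearly dependent columns); the function name `gram_schmidt_qr` and the test matrix are our own choices for illustration.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Sketch: orthonormalize the columns of A, then form R = Q^T A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    Q = np.zeros_like(A)
    for j in range(n):
        # Start from b_j and subtract its components along the earlier u_i.
        v = A[:, j].copy()
        for i in range(j):
            v -= np.dot(A[:, j], Q[:, i]) * Q[:, i]
        # Normalize: u_j = v_j / ||v_j|| (assumes v_j is nonzero).
        Q[:, j] = v / np.linalg.norm(v)
    # r_ij = <b_j, u_i>. Entries below the diagonal vanish in exact
    # arithmetic, so np.triu only removes rounding noise.
    R = np.triu(Q.T @ A)
    return Q, R

A = np.array([[2.0, 1.0, 3.0],
              [-1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q is orthogonal
print(np.allclose(Q @ R, A))            # A = QR
```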

## Pseudocode for the QR Factorization Algorithm

We can condense the above process into an algorithm, for which we give pseudocode below. Note that this algorithm assumes that the underlying inner product is the dot product, but it can easily be adapted to any inner product.

:::{prf:algorithm} QR Factorization 
:label: qr-alg

**Inputs** An invertible $n \times n$ matrix $A$ (with entries $a_{ij}$) 

**Output** Invertible $n \times n$ matrices $Q$ (with entries $q_{ij}$) and $R$ (with entries $r_{ij}$) such that $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular

$Q \gets A$ \
$R \gets $ zero $n \times n$ matrix\
**for** $j=1$ to $n$:\
$\quad$ $r_{jj} \gets \sqrt{q_{1j}^2 + \dots + q_{nj}^2}$\
$\quad$ **if** $r_{jj} = 0$, **stop**; **print** "A has linearly dependent columns"\
$\quad$ $\quad$ **else for** $i = 1$ to $n$\
$\quad$ $\quad$ $\quad$ $q_{ij} \gets q_{ij} / r_{jj}$\
$\quad$ **for** $k = j + 1$ to $n$\
$\quad$ $\quad$ $r_{jk} \gets q_{1j}q_{1k} + \dots + q_{nj}q_{nk}$\
$\quad$ $\quad$ **for** $i = 1$ to $n$\
$\quad$ $\quad$ $\quad$ $q_{ik} \gets q_{ik} - q_{ij}r_{jk}$\
**return** $Q$, $R$
:::

At first glance, this algorithm might look a little different from the process we just outlined. However, it is really the same idea; only the order in which components are subtracted differs. Instead of visiting each basis vector and subtracting off the components parallel to vectors **before** it (which we did in the [Gram-Schmidt Process](#gram-schmidt-alg)), we visit each basis vector and subtract the components parallel to it from every vector **after** it.

#### Python break!

We give an implementation of [](#qr-alg) in NumPy below, and run it on some test cases. 

```python
import numpy as np

def qr_factorization(A):
    # QR factorization of a square matrix, following the pseudocode above.
    if A.shape[0] != A.shape[1]:
        print('A is not square')
        return None, None

    n = A.shape[0]
    Q = A.astype(float)      # float copy, so integer inputs are not truncated
    R = np.zeros((n, n))

    for j in range(n):
        # The norm of the current column j is the diagonal entry r_jj.
        R[j, j] = np.linalg.norm(Q[:, j])
        if R[j, j] < 1e-8:   # numerically zero norm
            print('A has linearly dependent columns')
            return None, None
        else:
            # Normalize column j.
            for i in range(n):
                Q[i, j] = Q[i, j] / R[j, j]
        # Subtract the component along column j from every later column.
        for k in range(j + 1, n):
            R[j, k] = np.dot(Q[:, j], Q[:, k])
            for i in range(n):
                Q[i, k] = Q[i, k] - Q[i, j] * R[j, k]

    return Q, R

print('Test case with invertible matrix:')

A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
print('A:')
print(A)

Q, R = qr_factorization(A)
print('Q:')
print(Q)
print('R:')
print(R)
print('QR:')
print(np.round(Q @ R, 2))
print('(Q^T)Q:')
print(np.round(Q.T @ Q, 2))

print('\nTest case with noninvertible matrix:')

A = np.array([[2.0, 1.0, 3.0], [-1.0, 0.0, 1.0], [1.0, 1.0, 4.0]])
print('A:')
print(A)

Q, R = qr_factorization(A)
print('Q:')
print(Q)
print('R:')
print(R)
```

```
Test case with invertible matrix:
A:
[[ 2.  1.  3.]
 [-1.  0.  1.]
 [ 0.  2. -1.]]
Q:
[[ 0.89442719  0.09759001  0.43643578]
 [-0.4472136   0.19518001  0.87287156]
 [ 0.          0.97590007 -0.21821789]]
R:
[[ 2.23606798  0.89442719  2.23606798]
 [ 0.          2.04939015 -0.48795004]
 [ 0.          0.          2.40039679]]
QR:
[[ 2.  1.  3.]
 [-1. -0.  1.]
 [ 0.  2. -1.]]
(Q^T)Q:
[[ 1.  0. -0.]
 [ 0.  1. -0.]
 [-0. -0.  1.]]

Test case with noninvertible matrix:
A:
[[ 2.  1.  3.]
 [-1.  0.  1.]
 [ 1.  1.  4.]]
A has linearly dependent columns
Q:
None
R:
None
```


## Solving linear systems with a QR factorization

Solving linear systems using a QR factorization is easy. If our goal is to solve $A\vv x = \vv b$ and a QR factorization $A = QR$ is available, we first notice that

\begin{align*}
 QR \vv x = \vv b \iff R \vv x = Q^\top \vv b = \vv{\tilde b}
\end{align*}

since $Q^\top Q = I$. Now, solving $R\vv x = \vv{\tilde b}$ can be easily accomplished via backsubstitution since $R$ is an upper triangular matrix! 
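
The two steps (multiply by $Q^\top$, then back substitute) can be sketched as follows. This is a minimal illustration rather than a production solver: the helper names `back_substitution` and `solve_with_qr`, the test matrix, and the right-hand side are our own choices, and we use NumPy's built-in `np.linalg.qr` to obtain the factors (its $Q$ and $R$ may differ in sign from the ones produced by Gram-Schmidt, but they still satisfy $A = QR$).

```python
import numpy as np

def back_substitution(R, c):
    """Solve R x = c for an invertible upper triangular matrix R."""
    n = R.shape[0]
    x = np.zeros(n)
    # Work upward from the last equation; entries x[i+1:] are already known.
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def solve_with_qr(A, b):
    """Solve A x = b for square invertible A via A = QR."""
    Q, R = np.linalg.qr(A)   # NumPy's built-in QR factorization
    b_tilde = Q.T @ b        # R x = Q^T b
    return back_substitution(R, b_tilde)

A = np.array([[2.0, 1.0, 3.0],
              [-1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])
b = np.array([1.0, 2.0, 3.0])  # arbitrary right-hand side
x = solve_with_qr(A, b)
print(np.allclose(A @ x, b))
```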

::::{exercise}  Solving a linear system via QR factorization
:label: orthogonal-matrices-ex2

Using a [QR factorization](#square-qr-factorization-defn), factor the matrix

\begin{align*}
    A = \bm 1 & 3 \\ 2 & 1\em 
\end{align*}

as $A = QR$. Then, use the QR factorization of $A$ to solve the linear system

\begin{align*}
    A \vv x = \bm 8 \\ 1 \em
\end{align*}

:::{solution} orthogonal-matrices-ex2
:class: dropdown

Denote the columns of $A$ as

\begin{align*}
    \vv{a_1} = \bm 1\\ 2\em, \quad \vv{a_2} = \bm 3 \\ 1 \em
\end{align*}

First, we apply the [Gram-Schmidt process](#gram-schmidt-alg) to find an orthogonal basis for $\text{span}(\vv{a_1}, \vv{a_2})$ (the column space of $A$). We find that one such orthogonal basis is given by

\begin{align*}
    \vv{v_1} &= \vv{a_1} = \bm 1\\2 \em\\
     \vv{v_2} &= \vv{a_2} - \frac{\langle \vv{a_2}, \vv{v_1} \rangle}{\langle \vv{v_1}, \vv{v_1} \rangle}\vv{v_1} = \bm 2\\ -1 \em
\end{align*}

And an orthonormal basis is given by 

\begin{align*}
    \vv{u_1} = \bm \frac{1}{\sqrt 5}\\\frac{2}{\sqrt 5} \em, \quad\vv{u_2} = \bm \frac{2}{\sqrt 5}\\-\frac{1}{\sqrt 5} \em
\end{align*}

So we have that our $Q$ matrix is given by

\begin{align*}
    Q = \boxed{\bm \frac{1}{\sqrt 5} & \frac{2}{\sqrt 5} \\ \frac{2}{\sqrt 5} & -\frac{1}{\sqrt 5} \em }
\end{align*}

Now we compute the $R$ matrix. Recall that

\begin{align*}
    R = \bm \langle \vv{u_1}, \vv{a_1} \rangle &\langle \vv{u_1}, \vv{a_2} \rangle \\ 0 &\langle \vv{u_2}, \vv{a_2}\rangle \em 
\end{align*}

Substituting in values for $\vv{u_i}$ and $\vv{a_j}$, we have that our $R$ matrix is given by

\begin{align*}
    R = \boxed{\bm  
        \sqrt 5 & \sqrt 5 \\ 0 & \sqrt 5
    \em }
\end{align*}

You can check that $Q$ is orthogonal ($Q^\top Q = I$), $R$ is invertible upper triangular, and $A = QR$.

Next, we solve $A\vv x = \bm 8 \\ 1\em$:

\begin{align*}
    A\vv x = \bm 8 \\ 1\em &\iff QR \vv x = \bm 8 \\ 1 \em\\
    &\iff \underbrace{(Q^\top Q)}_I R \vv x = Q^\top \bm 8 \\ 1 \em\\
    &\iff \bm \sqrt 5 & \sqrt 5 \\ 0 & \sqrt 5\em \bm x_1 \\ x_2\em = \bm \frac{1}{\sqrt 5} & \frac{2}{\sqrt 5} \\ \frac{2}{\sqrt 5} & -\frac{1}{\sqrt 5} \em\bm 8\\ 1\em = \bm 2\sqrt 5 \\ 3\sqrt 5 \em 
\end{align*}

Solving this with backsubstitution, we find that $(x_1, x_2) = \boxed{(-1, 3)}$.
:::
::::

## Optional: Generalized QR factorizations

In an [earlier section](#square-qr-factorization-defn), we stated that any real invertible square matrix $A$ has a QR decomposition $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular and invertible.

While this special case (where the $R$ matrix is invertible) is incredibly useful for solving systems of linear equations, it turns out that *every* $m\times n$ matrix $A$ has a decomposition of the form $A = QR$, where $Q$ is an orthogonal $m\times m$ matrix and $R$ is an upper triangular $m\times n$ matrix. 

:::{prf:definition} The QR Factorization
:label: qr-factorization-defn

Any real $m \times n$ matrix $A$ may be written in the form

\begin{align*}
    A = QR
\end{align*}

where $Q$ is an [orthogonal](#orthogonal-matrix-defn) $m \times m$ matrix and $R$ is an upper triangular $m \times n$ matrix.

Furthermore, if $A$ is a real, invertible, square matrix then $R$ is invertible.

:::
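
For a rectangular matrix, this general factorization can be explored with NumPy. The sketch below assumes `np.linalg.qr` with `mode='complete'`, which returns a square $m \times m$ factor $Q$ and an $m \times n$ factor $R$; the $3 \times 2$ matrix is an arbitrary example of our own.

```python
import numpy as np

# An arbitrary 3 x 2 example (m = 3, n = 2).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# mode='complete' asks for the full m x m orthogonal factor.
Q, R = np.linalg.qr(A, mode='complete')

print(Q.shape, R.shape)                 # Q is 3 x 3, R is 3 x 2
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q is orthogonal
print(np.allclose(R, np.triu(R)))       # R is upper triangular
print(np.allclose(Q @ R, A))            # A = QR
```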
