Very often a matrix can be represented as a small block matrix, perhaps even as a $2\times2$ one, and the blocks themselves can possess additional desirable properties (such as being symmetric positive definite). Another common case is that of a matrix that can be represented as a low-rank perturbation of another, typically simpler one. The latter case has interesting spectral properties, and such matrices can be inverted efficiently.

Let us first consider the spectral properties thereof. We will require some preliminaries.

Firstly, consider the Rayleigh quotient $R_A(x) = \frac{\langle Ax, x\rangle}{\langle x, x\rangle}$ (also written $\frac{\|x\|_A^2}{\|x\|^2}$) of a Hermitian matrix $A$, defined for $x\ne0$. You have seen in the context of the power method that maximising this value over $x$ yields the largest eigenvalue. To see this, expand $x = \sum_j c_jv_j$ in an orthonormal eigenbasis of $A$, which exists because $A$ is Hermitian. Then $R_A(x) = \sum_j\lambda_j|c_j|^2/\sum_j|c_j|^2$ is a weighted average of the eigenvalues, and it is maximised when $x$ is aligned with an eigenvector of the largest eigenvalue. By the same token, the quotient is minimised by an eigenvector of the smallest eigenvalue. A natural question to ask is whether one can provide such a variational characterisation of every eigenvalue.
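As a quick numerical sanity check of this extremal property, the sketch below (NumPy, with a random Hermitian test matrix chosen purely for illustration; `rayleigh` is a hypothetical helper name) evaluates $R_A$ at many random vectors and at the extreme eigenvectors returned by `numpy.linalg.eigh`, which lists eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Hermitian test matrix (illustrative assumption).
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2

def rayleigh(A, x):
    """Rayleigh quotient <Ax, x> / <x, x>; real-valued for Hermitian A."""
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

# eigh: ascending eigenvalues, orthonormal eigenvectors in the columns of V.
lam, V = np.linalg.eigh(A)

# Quotients at random complex vectors stay within [lam_min, lam_max] ...
samples = [rayleigh(A, rng.standard_normal(n) + 1j * rng.standard_normal(n))
           for _ in range(1000)]

# ... and the extreme eigenvectors attain the bounds.
r_min = rayleigh(A, V[:, 0])
r_max = rayleigh(A, V[:, -1])
print(lam[0], min(samples), max(samples), lam[-1])
print(r_min, r_max)
```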

**Theorem** (Courant, Fischer, Weyl). Suppose $A\in\mathbb{C}^{n\times n}$ is Hermitian with eigenvalues $\lambda_1\le\lambda_2\le\dots\le\lambda_n$. Then $\forall i \in \{1,2, \dots, n\}$ the following characterisations hold, where $V$ ranges over subspaces of $\mathbb{C}^n$:
$$\lambda_i = \min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0} R_A(x) \tag{1}$$
$$\lambda_i = \max_{\dim(V)=n-i+1}\ \min_{x\in V,\,x\ne 0} R_A(x) \tag{2}$$
In other words, $\lambda_i$ is the smallest value that the maximum of the Rayleigh quotient over an $i$-dimensional subspace can take, and likewise the largest value that the minimum over an $(n-i+1)$-dimensional subspace can take.

Though the above formulation makes the statement rather intuitive, let us show that it holds. Choose an orthonormal eigenbasis $\{v_j\}$ with $Av_j = \lambda_jv_j$. Consider any subspace $V$ of dimension $i$. Since $\dim V + \dim\operatorname{span}\{v_i, v_{i+1}, \dots, v_n\} = i + (n-i+1) > n$, $V$ necessarily intersects $\operatorname{span}\{v_i, v_{i+1}, \dots, v_n\}$ nontrivially. Therefore, $V$ contains a nonzero vector $v$ expandable in $\{v_i, v_{i+1}, \dots, v_n\}$. Its Rayleigh quotient is a weighted average of $\lambda_i,\dots,\lambda_n$ and hence bounded from below by $\lambda_i$, so the maximum over $V$ has the same lower bound. So $\max_{x\in V} R_A(x) \ge \lambda_i$ for all $V$ with $\dim(V) = i$, and the claim is recovered by minimising over $V$: the bound is attained at $V = \operatorname{span}\{v_1, v_2, \dots, v_i\}$, where $R_A$ is a weighted average of $\lambda_1,\dots,\lambda_i$ and thus at most $\lambda_i$. The same line of argumentation goes for the second variational formulation (hint: consider $-A$).
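The min-max formula can be probed numerically. For an orthonormal basis $Q$ of a subspace $V$, writing $x = Qy$ shows that $\max_{x\in V} R_A(x)$ is the largest eigenvalue of $Q^\dagger AQ$. The sketch below (NumPy; the random test matrix and the helper `max_rayleigh_on_subspace` are illustrative assumptions) checks that random $i$-dimensional subspaces never go below $\lambda_i$, while the span of the first $i$ eigenvectors attains it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                       # real symmetric, hence Hermitian
lam, V = np.linalg.eigh(A)              # ascending: lam[0] <= ... <= lam[n-1]

def max_rayleigh_on_subspace(A, Q):
    """Max of R_A over span(Q) for Q with orthonormal columns:
    the largest eigenvalue of the compression Q^H A Q."""
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q)[-1]

i = 3                                   # 1-based index, so lambda_i is lam[i - 1]

# Random i-dimensional subspaces: the maximum never drops below lambda_i ...
random_maxima = []
for _ in range(500):
    Q, _r = np.linalg.qr(rng.standard_normal((n, i)))
    random_maxima.append(max_rayleigh_on_subspace(A, Q))

# ... and the span of the first i eigenvectors attains it.
best = max_rayleigh_on_subspace(A, V[:, :i])
print(lam[i - 1], min(random_maxima), best)
```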

Since Gram matrices $B^\dagger B$ are Hermitian (indeed positive semidefinite) and their eigenvalues are the squared singular values of $B$, applying the CFW theorem to them yields an analogous theorem for singular values.

Suppose $A = B^\dagger B$, and order both the eigenvalues $\lambda_i$ of $A$ and the singular values $\sigma_i$ of $B$ ascendingly, so that $\lambda_i = \sigma_i^2$. Then
$$\lambda_i = \min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0} R_A(x)$$
$$\sigma_i^2 = \min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0}\frac{\langle Ax, x\rangle}{\langle x,x\rangle}$$
$$\sigma_i^2 = \min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0}\frac{\langle B^\dagger Bx, x\rangle}{\langle x,x\rangle}$$
$$\sigma_i = \sqrt{\min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0}\frac{\langle Bx, Bx\rangle}{\langle x,x\rangle}}$$
and, since the square root is monotone, it can be moved inside the min-max:
$$\sigma_i = \min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0}\frac{\|Bx\|}{\|x\|}$$
A similar expression holds for the max-min formulation.
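A minimal check of the singular-value version, assuming NumPy and a random rectangular test matrix: the square roots of the eigenvalues of $B^\dagger B$ match the singular values from `numpy.linalg.svd` (which returns them in descending order, so they are flipped here to match the ascending convention above), and the span of the right singular vectors belonging to the $i$ smallest singular values realises the min-max value $\sigma_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((7, 5))         # a general rectangular test matrix

# Squared singular values of B are the eigenvalues of the Gram matrix B^T B.
lam = np.linalg.eigvalsh(B.T @ B)       # ascending
sigma_from_gram = np.sqrt(np.clip(lam, 0.0, None))

# numpy.linalg.svd returns singular values in descending order; flip them.
_, s_desc, Vh = np.linalg.svd(B)
sigma_svd = s_desc[::-1]

# Min-max for i = 2: W spans the right singular vectors of the i smallest
# singular values; max ||Bx||/||x|| over span(W) equals ||B W||_2.
i = 2
W = Vh[::-1][:i].T                      # rows of Vh are right singular vectors
sigma_i_minmax = np.linalg.norm(B @ W, 2)
print(sigma_from_gram, sigma_svd, sigma_i_minmax)
```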

The variational formulation of the eigenvalues allows us to make meaningful statements about how certain transformations change the spectrum of a matrix. Consider, for instance, this result about spectra of additively perturbed matrices.

**Theorem** (Weyl). Suppose $A,B\in \mathbb{C}^{n\times n}$ are Hermitian. Let $C = A+B$. Let $\{\alpha_i\}, \{\beta_i\}, \{\gamma_i\}$ be their respective eigenvalues in ascending order. Then:
$$\alpha_i+\beta_1\le\gamma_i\le\alpha_i+\beta_n$$
and, more generally,
$$\alpha_j+\beta_k\le\gamma_i \text{ whenever } j+k-1\le i, \qquad \gamma_i\le\alpha_m+\beta_p \text{ whenever } i\le m+p-n$$

We show this only for the first formulation; for the second one, the approach is similar, but the algebra is a bit more involved.

The first formulation is a direct consequence of the CFW principle:
$$\gamma_i = \min_{\dim(V)=i}\ \max_{x\in V,\,\|x\| = 1} \langle (A+B)x, x\rangle$$
$$\gamma_i = \min_{\dim(V)=i}\ \max_{x\in V,\,\|x\| = 1} (\langle Ax, x\rangle+\langle Bx, x\rangle)$$
$$\gamma_i \ge \min_{\dim(V)=i}\ \max_{x\in V,\,\|x\| = 1} (\langle Ax, x\rangle+\beta_1)$$
because $\langle Bx, x\rangle = R_B(x)\ge\beta_1$ for every unit vector $x$, the smallest eigenvalue being the minimum of the Rayleigh quotient. Applying the CFW principle to $A$ then gives
$$\gamma_i \ge \alpha_i+\beta_1$$
Since likewise $\langle Bx, x\rangle\le\beta_n$ for every unit vector $x$, the second bound follows by the same logic.

An immediate corollary (the monotonicity theorem), by the way, is that a positive semidefinite perturbation $B$ does not decrease any eigenvalue: then $\beta_1\ge0$, so $\gamma_i\ge\alpha_i+\beta_1\ge\alpha_i$.
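Weyl's inequalities (both formulations) and the monotonicity corollary are easy to test numerically. The sketch below is an illustration on random real symmetric matrices (an assumption; any Hermitian pair would do), using 1-based indices in the conditions to mirror the text and a small tolerance for rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
tol = 1e-10                              # slack for rounding errors

def random_symmetric(n):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

A, B = random_symmetric(n), random_symmetric(n)
alpha = np.linalg.eigvalsh(A)            # ascending, as in the text
beta = np.linalg.eigvalsh(B)
gamma = np.linalg.eigvalsh(A + B)

# First formulation: alpha_i + beta_1 <= gamma_i <= alpha_i + beta_n.
simple_ok = bool(np.all(alpha + beta[0] <= gamma + tol)
                 and np.all(gamma <= alpha + beta[-1] + tol))

# Second formulation, 1-based indices:
#   j + k - 1 <= i  implies  alpha_j + beta_k <= gamma_i
#   i <= m + p - n  implies  gamma_i <= alpha_m + beta_p
general_ok = True
for i in range(1, n + 1):
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            if j + k - 1 <= i:
                general_ok &= bool(alpha[j - 1] + beta[k - 1] <= gamma[i - 1] + tol)
            if i <= j + k - n:           # (j, k) in the role of (m, p)
                general_ok &= bool(gamma[i - 1] <= alpha[j - 1] + beta[k - 1] + tol)

# Monotonicity: the PSD perturbation Y Y^T does not decrease any eigenvalue.
Y = rng.standard_normal((n, 2))
monotone_ok = bool(np.all(np.linalg.eigvalsh(A + Y @ Y.T) >= alpha - tol))
print(simple_ok, general_ok, monotone_ok)
```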

We can milk Weyl's inequality for a few more results on singular value spectra. 

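By using the variational characterisation of the singular values of a matrix $M$,
$$\sigma_i(M) = \min_{\dim(V)=i}\ \max_{x\in V,\,x\ne 0}\frac{\|Mx\|}{\|x\|}$$
and letting $\alpha,\beta,\gamma$ now denote the singular values of arbitrary (not necessarily Hermitian) matrices $A,B,C=A+B\in\mathbb{C}^{n\times n}$, still in ascending order, we argue that
$$\gamma_i\le\alpha_i+\beta_n$$
with a proof similar to what we had for eigenvalues, since $\|(A+B)x\|\le\|Ax\|+\|Bx\|\le\|Ax\|+\beta_n\|x\|$.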
Moreover, switching to descending order ($\alpha_1\ge\alpha_2\ge\dots\ge\alpha_n$, and likewise for $\beta$ and $\gamma$), we can claim that for $i+j<n$
$$\gamma_{i+j+1}\le\alpha_{i+1}+\beta_{j+1}$$
To see this, note that in descending order the min-max characterisation reads
$$\gamma_{i+j+1} = \min_{\dim(V)=n-i-j}\ \max_{x\in V,\,x\ne 0}\frac{\|(A+B)x\|}{\|x\|}$$
$$\gamma_{i+j+1} = \min_{\dim(V)=n-i-j}\ \max_{x\in V,\,\|x\| = 1}\|(A+B)x\|$$
Consider now the subspaces $V_i$ and $V_j$ spanned by the right singular vectors of $A$ with indices $i+1,\dots,n$ and of $B$ with indices $j+1,\dots,n$, respectively, so that $\|Ax\|\le\alpha_{i+1}\|x\|$ on $V_i$ and $\|Bx\|\le\beta_{j+1}\|x\|$ on $V_j$. They are of dimensionalities $n-i$ and $n-j$, respectively. Consider their intersection $V_{ij}$. It is of dimensionality at least $n-i-j$, since $\dim V_{ij}\ge\dim V_i+\dim V_j-n$.

By the triangle inequality, for every unit vector $x\in V_{ij}$
$$\|(A+B)x\| \le \|Ax\|+\|Bx\| \le \alpha_{i+1}+\beta_{j+1}$$
where the last inequality holds by construction of $V_{ij}$.

Since the dimensionality of $V_{ij}$ is at least $n-i-j$, it contains a subspace of dimension exactly $n-i-j$ on which the maximum above is at most $\alpha_{i+1}+\beta_{j+1}$, wherefrom the claim follows immediately.

Note that no matching lower bound holds for singular values: for $B=-A$ we get $C=0$, so every $\gamma_k$ vanishes. Restated once more in descending order:
$$\gamma_{i+j+1}\le\alpha_{i+1}+\beta_{j+1}$$
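A numerical check of the descending-order inequality $\gamma_{i+j+1}\le\alpha_{i+1}+\beta_{j+1}$, assuming NumPy and random square test matrices (`sv` is a hypothetical helper). The case $B=-A$, where $C=0$, illustrates that only this upper-bound direction can hold in general.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

def sv(M):
    """Singular values in descending order (numpy's convention)."""
    return np.linalg.svd(M, compute_uv=False)

alpha, beta, gamma = sv(A), sv(B), sv(A + B)

# 1-based gamma_{i+j+1} <= alpha_{i+1} + beta_{j+1} with i + j < n becomes,
# with 0-based arrays, gamma[i + j] <= alpha[i] + beta[j].
weyl_ok = all(gamma[i + j] <= alpha[i] + beta[j] + 1e-10
              for i in range(n) for j in range(n) if i + j < n)

# No matching lower bound: for B = -A the sum is the zero matrix.
gamma_zero = sv(A - A)
print(weyl_ok, gamma_zero.max(), 2 * alpha[0])
```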

**Note**. An interesting immediate consequence is that each singular value, viewed as a function of the matrix, is Lipschitz continuous with constant $1$ with respect to the spectral norm (here take the singular values as ordered in descending order). Let $\sigma_i^A$ be the $i$-th singular value of $A$. Taking $j=0$ in the descending-order inequality above, applied to $A = B + (A-B)$ and to $B = A + (B-A)$,
$$\sigma^A_i\le \sigma^B_i+\sigma_1^{A-B}$$
$$\sigma^B_i\le \sigma^A_i+\sigma_1^{B-A}$$
Since $\sigma_1$ is the greatest singular value here, $\sigma_1^{A-B} = \sigma_1^{B-A} = \|A-B\|$, the spectral norm.
$$\sigma_i^A-\sigma_i^B \le \|A-B\|$$
$$\sigma_i^A-\sigma_i^B \ge -\|A-B\|$$
Therefore 
$$|\sigma_i^A-\sigma_i^B|\le\|A-B\|$$
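A sketch of this perturbation bound in action, assuming NumPy; the matrix sizes and the perturbation scale $10^{-3}$ are arbitrary choices for illustration. The largest shift of any singular value does not exceed $\|A-B\|$, computed as the matrix 2-norm.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
E = 1e-3 * rng.standard_normal((8, 5))    # a small perturbation
B = A + E

sA = np.linalg.svd(A, compute_uv=False)    # descending
sB = np.linalg.svd(B, compute_uv=False)
shift = np.abs(sA - sB).max()              # max_i |sigma_i^A - sigma_i^B|
bound = np.linalg.norm(A - B, 2)           # ||A - B|| = sigma_1(A - B)
print(shift, bound)
```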

We see that singular values are very stable under perturbations: a perturbation of spectral norm $\varepsilon$ moves every singular value by at most $\varepsilon$. We also know (recall the bit about pseudospectra from the lecture) that eigenvalues are not as stable (except for normal matrices), and small perturbations can induce large shifts of spectra. Singular values are stable due to their relation to eigenvalues of Hermitian matrices (all Hermitian matrices are normal).
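To contrast this with eigenvalues, a standard example (my choice of illustration) is a nilpotent Jordan block $J$ with a single entry $\varepsilon$ added in the bottom-left corner. The perturbed matrix satisfies $(J+E)^n = \varepsilon I$, so its eigenvalues have modulus $\varepsilon^{1/n}$, while its singular values move by only $\varepsilon$. The NumPy sketch below uses $n=10$ and $\varepsilon=10^{-10}$, so the eigenvalues jump from $0$ to modulus about $0.1$.

```python
import numpy as np

n, eps = 10, 1e-10
J = np.diag(np.ones(n - 1), k=1)       # nilpotent Jordan block: all eigenvalues 0
E = np.zeros((n, n))
E[-1, 0] = eps                         # perturbation of spectral norm eps

# Eigenvalues of J + E solve lambda^n = eps: modulus eps**(1/n) = 0.1.
eig_shift = np.abs(np.linalg.eigvals(J + E)).max()

# Singular values: J has (1, ..., 1, 0), J + E has (1, ..., 1, eps).
sv_shift = np.abs(np.linalg.svd(J + E, compute_uv=False)
                  - np.linalg.svd(J, compute_uv=False)).max()
print(eig_shift, sv_shift)
```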

Finally, we can glean some more results from special cases of Weyl's inequality. Consider again the singular values to be in descending order, and let there be matrices $A,B$ such that $A-B$ has rank $i$, i.e. its last nonzero singular value has index $i$ and $\sigma^{A-B}_{i+1} = \sigma^{B-A}_{i+1} = 0$. Then, for $j+i\le n$:
$$\sigma^A_{j+i} \le \sigma^B_{j}+\sigma^{A-B}_{i+1} = \sigma^B_{j}$$
And, for $j>i$:
$$\sigma^B_{j} \le \sigma^A_{j-i}+\sigma^{B-A}_{i+1} = \sigma^A_{j-i}$$
So
$$\sigma^A_{j+i}\le\sigma_j^B\le\sigma^A_{j-i}$$
Results of this kind are called interlacing theorems, since they show that the singular values of two matrices related by an additive perturbation alternate, in the sense of the two-sided inequality above. While the results on stability of singular values described their behaviour under a low-norm perturbation, interlacing theorems describe their behaviour under a low-*rank* perturbation (the assumption being that $A-B$ is of low rank). In particular, any clustering of singular (and, for Hermitian matrices, eigen-) values will be preserved except possibly for the $r$ leading or trailing ones, where $r$ is the rank of the perturbation matrix.
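The interlacing bounds can be checked on a random example, assuming NumPy. Below, $B$ differs from $A$ by a rank-$r$ matrix (the rank plays the role of $i$ in the text and is renamed `r` to free the loop index), and the loop tests $\sigma^A_{j+r}\le\sigma^B_j\le\sigma^A_{j-r}$ wherever the indices make sense.

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 9, 2
A = rng.standard_normal((n, n))
# B differs from A by a rank-r matrix, so sigma_{r+1}(A - B) = 0.
B = A + rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

sA = np.linalg.svd(A, compute_uv=False)   # descending
sB = np.linalg.svd(B, compute_uv=False)

tol = 1e-10
# 1-based: sigma^A_{j+r} <= sigma^B_j <= sigma^A_{j-r}, whenever the indices exist.
ok = True
for j in range(1, n + 1):
    if j + r <= n:
        ok &= bool(sA[j + r - 1] <= sB[j - 1] + tol)
    if j - r >= 1:
        ok &= bool(sB[j - 1] <= sA[j - r - 1] + tol)
print(ok)
```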

Of this general interlacing theorem, one special case is of particular interest: that of a principal submatrix. Let $A$ be Hermitian and $B$ its principal submatrix, of dimensionalities $n$ and $m$, respectively. The interlacing in this case is called Cauchy interlacing. Its singular-value form (which does not require $A$ to be Hermitian) can be obtained from the general interlacing above rather easily: the $k$-th singular value of the compression is bounded from below as in the general interlacing result, while the upper bound can be tightened by writing $\|Bx\|\le\|Ax\|$ for all $x$ (with $x$ padded by zeros on the omitted indices) and applying the variational characterisation of singular values to both sides, which yields the $k$-th singular value of $A$ as the upper bound.

I wish, however, to also demonstrate the Cauchy interlacing for the eigenvalues, since you'll definitely see it proven another way in some future lecture =)

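Let now $\alpha, \beta$ be the eigenvalues of $A$ and $B$, this time in descending order. We claim that
$$\alpha_i\ge\beta_i\ge\alpha_{n-m+i} \quad\forall i\le m$$

Suppose wlog that $B$ is formed by omitting rows and columns indexed from $1$ to $n-m$, and let $E$ be the span of the standard basis vectors indexed from $n-m+1$ to $n$. In descending order, the max-min formulation of the CFW theorem reads
$$\alpha_{i} = \max_{\dim(V)=i}\ \min_{x\in V,\,x\ne0} R_A(x)$$
This value is not increased if we restrict the maximisation to subspaces $V\subseteq E$, i.e. demand that the projection of $x$ onto the standard basis vectors indexed from $1$ to $n-m$ is zero; however, since $R_A(x) = R_B(y)$ for $x = (0, y)\in E$, the resulting value is precisely $\beta_i$.
Therefore, $\alpha_i\ge\beta_i$.
The lower bound then follows in the same way from the other variational formulation (min of max). I suggest you show it if you're curious.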

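More generally, the Cauchy theorem holds for any compression $Q^\dagger AQ$ with $Q\in\mathbb{C}^{n\times m}$ having orthonormal columns (the Poincaré separation theorem); a principal submatrix is the special case in which the columns of $Q$ are standard basis vectors.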
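A numerical illustration of Cauchy interlacing and of its Poincaré generalisation, assuming NumPy and a random symmetric test matrix. Eigenvalues are sorted in descending order, matching the claim $\alpha_i\ge\beta_i\ge\alpha_{n-m+i}$, and `interlaces` is a hypothetical helper that checks exactly that inequality.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 8, 5
X = rng.standard_normal((n, n))
A = (X + X.T) / 2

def eig_desc(M):
    """Eigenvalues of a symmetric matrix in descending order."""
    return np.linalg.eigvalsh(M)[::-1]

alpha = eig_desc(A)

# Principal submatrix: drop the first n - m rows and columns.
beta_sub = eig_desc(A[n - m:, n - m:])

# Compression by a matrix Q with orthonormal columns (Poincare separation).
Q, _r = np.linalg.qr(rng.standard_normal((n, m)))
beta_proj = eig_desc(Q.T @ A @ Q)

def interlaces(alpha, beta, n, m, tol=1e-10):
    # alpha_i >= beta_i >= alpha_{n-m+i} for 1-based i = 1..m (0-based below)
    return all(alpha[i] + tol >= beta[i] >= alpha[n - m + i] - tol for i in range(m))

print(interlaces(alpha, beta_sub, n, m), interlaces(alpha, beta_proj, n, m))
```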

## A little bit on structured matrices

A block matrix commonly seen in the numerical solution of PDEs (for instance, mixed discretisations of the Stokes equations) is the saddle-point problem matrix, which reads
$$\begin{bmatrix}
A&B^T\\
B& 0
\end{bmatrix}$$
Or in a generalised form
$$\begin{bmatrix}
A&B^T\\
B& C
\end{bmatrix}$$
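With $A$ symmetric positive definite (SPD). Here $B$ typically plays the role of a discrete divergence and $B^T$ that of a discrete gradient.

Evidently, for $C=0$ this is only uniquely solvable if the discrete gradient $B^T$ is injective, i.e. $B$ has full row rank (technically, in more general settings $A$ must also be coercive on the kernel of the discrete divergence, but we'll omit that as being out of scope).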

We would like to solve this more efficiently than just throwing GMRES at it. Can we do it?

We can.

Let us conduct formal block Gaussian elimination, subtracting $BA^{-1}$ times the first block row from the second:
$$\begin{bmatrix}
A&B^T\\
B& C
\end{bmatrix}$$
$$\begin{bmatrix}
A&B^T\\
0& C-BA^{-1}B^T
\end{bmatrix}$$
You may recognise the expression $C-BA^{-1}B^T$. It is precisely the Schur complement of the block $A$! Now we can indeed solve the system efficiently.
Consider
$$\begin{bmatrix}
A&B^T\\
B& C
\end{bmatrix}\begin{bmatrix}
u\\
p
\end{bmatrix} = \begin{bmatrix}
f\\
g
\end{bmatrix}$$
Applying the same elimination step to the right-hand side gives
$$\begin{bmatrix}
A&B^T\\
0& C-BA^{-1}B^T
\end{bmatrix}\begin{bmatrix}
u\\
p
\end{bmatrix} = \begin{bmatrix}
f\\
g-BA^{-1}f
\end{bmatrix}$$

We can first solve 
$$(C-BA^{-1}B^T)p = g-BA^{-1}f$$
And then
$$Au = f-B^Tp$$

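In particular, if $C$ is symmetric negative semidefinite (for instance $C=0$) and $B$ has full row rank, then the Schur complement is symmetric negative definite, so both $A$ and $-(C-BA^{-1}B^T)$ are SPD and both problems can be solved by efficient methods for symmetric definite matrices (you'll learn of them soon enough). This wonderful trick (Schur complement reduction; the closely related Uzawa method applies a simple iteration to the Schur complement system, so that only solves with $A$ are ever needed) is well worth learning.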
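A small dense sketch of the block-elimination solve, assuming NumPy, random test data, and $C=0$. Dense `numpy.linalg.solve` calls stand in for the efficient SPD solvers mentioned above, and $A^{-1}B^T$ is formed explicitly only because the example is tiny; a practical code would avoid both. Note that the elimination also acts on the right-hand side, so the reduced equation is $(C-BA^{-1}B^T)p = g-BA^{-1}f$. The result is compared against a direct solve of the full block system.

```python
import numpy as np

rng = np.random.default_rng(8)
n_u, n_p = 12, 4                        # sizes of the u- and p-blocks (illustrative)

X = rng.standard_normal((n_u, n_u))
A = X @ X.T + n_u * np.eye(n_u)         # SPD (1,1) block
B = rng.standard_normal((n_p, n_u))     # full row rank with probability 1
C = np.zeros((n_p, n_p))                # C = 0: the classical saddle point
f = rng.standard_normal(n_u)
g = rng.standard_normal(n_p)

# Block elimination, applied to the matrix and the right-hand side alike.
AinvBT = np.linalg.solve(A, B.T)
Ainvf = np.linalg.solve(A, f)
S = C - B @ AinvBT                      # Schur complement of A
p = np.linalg.solve(S, g - B @ Ainvf)   # (C - B A^{-1} B^T) p = g - B A^{-1} f
u = np.linalg.solve(A, f - B.T @ p)     # A u = f - B^T p

# Reference: solve the full block system directly.
K = np.block([[A, B.T], [B, C]])
ref = np.linalg.solve(K, np.concatenate([f, g]))
print(np.allclose(np.concatenate([u, p]), ref))
```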

Moreover, the saddle point system
$$\mathcal{A} = \begin{bmatrix}
M&B^T\\
B& 0
\end{bmatrix}$$
is spectrally equivalent to the block-diagonal matrix that retains only the diagonal blocks of its Uzawa (block-eliminated) form, with the Schur complement taken with the opposite sign:
$$\mathcal{P} = 
\begin{bmatrix}
M & 0\\
0 & S
\end{bmatrix}, \qquad S = BM^{-1}B^T.$$
In fact, $\mathcal{P}^{-1}\mathcal{A}$ has only three distinct eigenvalues, and its minimal polynomial has degree at most three, so a Krylov method such as GMRES or MINRES applied to the preconditioned system converges in at most three iterations (in exact arithmetic). Indeed,
$$\mathcal{P}^{-1}\mathcal{A} = 
\begin{bmatrix}
I & M^{-1}B^T\\
S^{-1}B & 0
\end{bmatrix}
$$
For an arbitrary eigenvector $v = (v_1, v_2)$ with eigenvalue $\lambda$,
$$\mathcal{P}^{-1}\mathcal{A}v = 
\begin{bmatrix}
I & M^{-1}B^T\\
S^{-1}B & 0
\end{bmatrix}v = \begin{bmatrix}
v_1+M^{-1}B^Tv_2\\
S^{-1}Bv_1
\end{bmatrix}$$
Ergo,
$$v_1+M^{-1}B^Tv_2 = \lambda v_1$$
$$S^{-1}Bv_1 = \lambda v_2$$
Let first $\lambda = 1$. Then the first equation gives $M^{-1}B^Tv_2 = 0$, hence $B^Tv_2 = 0$ by invertibility of $M$, and $v_2 = 0$ because $B^T$ is injective (the solvability assumption from above). The second equation then reduces to $S^{-1}Bv_1 = 0$, i.e. $Bv_1 = 0$. Assume that $B\in \mathbb{R}^{n\times m}$ is a fat matrix ($n<m$), as is the case when $B$ is a discrete divergence. By rank-nullity, $\dim\ker B\ge m-n>0$, so we can pick a nonzero $v_1\in\ker B$. Then $(v_1, 0)$ is a nontrivial eigenvector for $\lambda = 1$. Let now $\lambda \ne 1$.

Whence:
$$v_1 = \frac{1}{\lambda - 1}M^{-1}B^Tv_2$$
Substituting into the second equation and using $S^{-1}BM^{-1}B^T = S^{-1}S = I$:
$$\frac{1}{\lambda - 1}S^{-1}BM^{-1}B^Tv_2 = \frac{1}{\lambda - 1}v_2 = \lambda v_2$$
$$v_2 = (\lambda^2-\lambda)v_2$$
Here $v_2\ne0$, since otherwise $v_1 = 0$ as well. For nontrivial eigenvectors:
$$\lambda^2-\lambda - 1 = 0$$
Whence
$$\lambda = \frac{1}{2}(1\pm\sqrt 5)$$
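The three-eigenvalue result can be confirmed numerically. The NumPy sketch below (random SPD $M$ and random fat $B\in\mathbb{R}^{n\times m}$, both assumptions for illustration) takes the block-diagonal preconditioner with $S = BM^{-1}B^T$, the choice for which the derivation above goes through, forms the preconditioned matrix `T`, checks that every computed eigenvalue lies near $1$ or $\frac12(1\pm\sqrt5)$, and verifies that $(T-I)(T^2-T-I)$ vanishes, which is why a Krylov method needs at most three iterations.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 10, 4                            # B is n x m with n < m (fat), as in the text
X = rng.standard_normal((m, m))
M = X @ X.T + m * np.eye(m)             # SPD (1,1) block
B = rng.standard_normal((n, m))         # full row rank with probability 1

K = np.block([[M, B.T], [B, np.zeros((n, n))]])      # saddle point matrix
S = B @ np.linalg.solve(M, B.T)                      # S = B M^{-1} B^T
P = np.block([[M, np.zeros((m, n))], [np.zeros((n, m)), S]])

T = np.linalg.solve(P, K)               # the preconditioned matrix P^{-1} K
eigs = np.linalg.eigvals(T)

phi = (1 + np.sqrt(5)) / 2
targets = np.array([1.0, phi, 1 - phi])              # 1 and (1 +- sqrt 5)/2
dist = np.abs(eigs[:, None] - targets[None, :]).min(axis=1)

# Minimal polynomial check: (T - I)(T^2 - T - I) = 0.
I = np.eye(m + n)
residual = np.linalg.norm((T - I) @ (T @ T - T - I))
print(dist.max(), residual)
```

The eigenvalue $1$ appears with multiplicity $m-n$ (the dimension of $\ker B$ for full-rank $B$), and each of $\frac12(1\pm\sqrt5)$ appears $n$ times.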

But what is the property of spectral equivalence?

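We call two SPD matrices $A,B$ spectrally equivalent iff $\exists c_1,c_2>0$ such that $\forall x$
$$c_1\|x\|_B^2\le\|x\|_A^2\le c_2\|x\|_B^2$$
In practice $A$ and $B$ are families indexed by the discretisation parameter $h$, and the point is that $c_1, c_2$ do not depend on $h$.

It is immediately seen that, for $x\ne0$,
$$c_1\le\frac{\|x\|_A^2}{\|x\|_B^2} = \frac{\langle Ax,x\rangle}{\langle Bx,x\rangle}\le c_2$$
Whence by consideration of eigenvectors (if $B^{-1}Ax = \lambda x$, then $\langle Ax,x\rangle = \lambda\langle Bx,x\rangle$)
$$c_1\le\lambda(B^{-1}A) \le c_2$$
$$\kappa(B^{-1}A)\le\frac{c_2}{c_1}$$
where $\kappa$ is the ratio of the extreme eigenvalues (the condition number in the $B$-inner product, in which $B^{-1}A$ is self-adjoint). This is an extremely desirable property for something called preconditioning, so do keep it in mind; spectrally equivalent preconditioners are so precious precisely because this bound allows one to decouple iteration counts of iterative methods from the spatial discretisation parameters.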
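To see why $h$-independent constants matter, the sketch below (NumPy; the 1D piecewise-linear finite-element matrices on a uniform mesh are my illustrative choice, and `tridiag`, `eig_ratio` are hypothetical helpers) compares two pairs. The consistent mass matrix $\frac h6\,\mathrm{tridiag}(1,4,1)$ and its lumped version $hI$ are spectrally equivalent with $c_1=\frac13$, $c_2=1$ for every $h$, so the eigenvalue ratio stays below $3$. The stiffness matrix $\frac1h\,\mathrm{tridiag}(-1,2,-1)$ is not spectrally equivalent to the identity uniformly in $h$, and its ratio grows like $h^{-2}$.

```python
import numpy as np

def tridiag(n, lo, mid, hi):
    """Dense n x n tridiagonal matrix with constant diagonals."""
    return (np.diag(np.full(n - 1, lo), -1) + np.diag(np.full(n, mid))
            + np.diag(np.full(n - 1, hi), 1))

def eig_ratio(A, B):
    """lambda_max / lambda_min of B^{-1} A (real, positive for SPD A, B)."""
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return lam.max() / lam.min()

results = {}
for n in (10, 40, 160):
    h = 1.0 / (n + 1)                              # uniform mesh on (0, 1)
    mass = h / 6 * tridiag(n, 1.0, 4.0, 1.0)       # P1 consistent mass matrix
    lumped = h * np.eye(n)                         # lumped (row-sum) mass matrix
    stiff = tridiag(n, -1.0, 2.0, -1.0) / h        # P1 stiffness matrix
    results[n] = (eig_ratio(mass, lumped), eig_ratio(stiff, np.eye(n)))
    print(n, results[n])
```

The first ratio stays bounded as the mesh is refined, while the second grows roughly fourfold with each halving of $h$; an iterative method preconditioned with a spectrally equivalent matrix behaves like the first case.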