# Symmetric Eigenvalue Decomposition - Jacobi Method and High Relative Accuracy

---

The Jacobi method is the oldest method for EVD computations, dating back to 1846.
The method does not require tridiagonalization.
Instead, the method computes a sequence of orthogonally similar 
matrices which converge to a diagonal matrix of eigenvalues. In each step a simple plane rotation
which sets one off-diagonal element to zero is performed. 

For positive definite matrices, the method computes eigenvalues with high relative accuracy.

For more details, see 
[I. Slapničar, Symmetric Matrix Eigenvalue Techniques][Hog14] and
[Z. Drmač, Computing Eigenvalues and Singular Values to High Relative Accuracy][Hog14a]
and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 55.1-55.25, CRC Press, Boca Raton, 2014."

[Hog14a]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 59.1-59.21, CRC Press, Boca Raton, 2014."


## Prerequisites

The reader should be familiar with the concepts of eigenvalues and eigenvectors, the related perturbation theory, and algorithms for computing them.

 
## Competences 

The reader should be able to recognise matrices which warrant high relative accuracy and to apply 
the Jacobi method to them.

---

## Jacobi method

$A$ is a real symmetric matrix of order $n$ and $A= U \Lambda  U^T$ is its EVD.

### Definitions

The __Jacobi method__ forms a sequence of matrices,
$$
A_0=A, \qquad A_{k+1}=G(i_k,j_k,c,s) A_k G(i_k,j_k,c,s)^T, \qquad
k=0,1,2,\ldots,
$$
where $G(i_k,j_k,c,s)$ is the orthogonal __plane rotation matrix__.
The parameters $c$ and $s$ are chosen such that 
$[A_{k+1}]_{i_k j_k}=[A_{k+1}]_{j_k i_k}=0$.

The plane rotation is also called the __Jacobi rotation__. 

The __off-norm__ of $A$ is 
$$
off(A)=\big(\sum_{i}\sum_{j\neq i} a_{ij}^2\big)^{1/2},
$$
that is, off-norm is the Frobenius norm of the
matrix consisting of all off-diagonal elements of $A$.
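The definition translates directly into a few lines of code. The following is a minimal sketch that uses only basic loops, so it should not depend on a particular Julia version; the helper name `offnorm` and the test matrix are assumptions made for illustration and are not part of the original notebook.

```julia
# Off-norm: Frobenius norm of the off-diagonal part of a square matrix.
# `offnorm` is a hypothetical helper name used only for illustration.
function offnorm(A)
    n = size(A, 1)
    s = zero(eltype(A))
    for j = 1:n, i = 1:n
        if i != j
            s += A[i, j]^2
        end
    end
    return sqrt(s)
end

A = [4.0 1.0 2.0;
     1.0 3.0 0.0;
     2.0 0.0 1.0]
println(offnorm(A))   # sqrt(2*(1^2 + 2^2 + 0^2)) = sqrt(10)
```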

The choice of __pivot elements__ $[A_k]_{i_kj_k}$ is called the 
__pivoting strategy__.

The __optimal pivoting strategy__, originally used by Jacobi, chooses pivot
elements such that $|[A_k]_{i_k j_k}|=\max_{i<j} |[A_k]_{ij}|$.
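A sketch of how the optimal pivot could be located is given below. The helper name `maxpivot` is hypothetical, and the matrix is assumed to be symmetric of order at least 2, so only the strict upper triangle is searched.

```julia
# Position (i, j), i < j, of the off-diagonal element of largest magnitude.
# `maxpivot` is a hypothetical helper name; A is assumed symmetric, n >= 2.
function maxpivot(A)
    n = size(A, 1)
    ibest, jbest = 1, 2
    for i = 1:n-1, j = i+1:n
        if abs(A[i, j]) > abs(A[ibest, jbest])
            ibest, jbest = i, j
        end
    end
    return ibest, jbest
end

A = [4.0 1.0 2.0;
     1.0 3.0 0.5;
     2.0 0.5 1.0]
println(maxpivot(A))   # the largest off-diagonal magnitude is |A[1,3]| = 2
```

The search itself costs $O(n^2)$ comparisons per rotation, which is one reason why cyclic strategies are usually preferred in practice.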

The __row-cyclic__ pivoting strategy chooses pivot elements
  in the systematic row-wise order,
$$
(1,2), (1,3), \ldots,(1,n),(2,3),
(2,4),\ldots,(2,n),(3,4),\ldots,(n-1,n).
$$
Similarly, the column-cyclic strategy chooses pivot elements column-wise.
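The row-cyclic order corresponds to two nested loops. The sketch below lists the pivot positions of one sweep; the helper name `rowcyclic` is an assumption made for illustration.

```julia
# Pivot positions of one row-cyclic sweep for a matrix of order n.
# `rowcyclic` is a hypothetical helper name used only for illustration.
function rowcyclic(n)
    order = Tuple{Int,Int}[]
    for i = 1:n-1, j = i+1:n
        push!(order, (i, j))
    end
    return order
end

println(rowcyclic(4))   # (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
```

Swapping the roles of the two loops (outer loop over `j`, inner loop over `i < j`) would give the column-cyclic order. In both cases one sweep visits $n(n-1)/2$ positions.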

One pass through all off-diagonal positions $(i,j)$, $i<j$, is called a __cycle__ or __sweep__.


### Facts

1. The Jacobi rotation parameters $c$ and $s$ are computed as follows:
 if $[A_k]_{i_kj_k}=0$, then $c=1$ and $s=0$; otherwise, with the convention $\mathop{\mathrm{sign}}(0)=1$,
$$
\tau=\frac{[A_k]_{i_ki_k}-[A_k]_{j_kj_k} }{2[A_k]_{i_kj_k} },\qquad
t=\frac{\mathop{\mathrm{sign}}(\tau)}{|\tau|+\sqrt{1+\tau^2}},\qquad
c=\frac{1}{\sqrt{1+t^2}},\qquad s=c\cdot t.
$$ 

2. After each rotation, the off-norm decreases,
$$
off^2(A_{k+1})=off^2(A_k)-2[A_k]_{i_kj_k}^2.
$$
With the appropriate pivoting strategy, the method converges in the sense that
$$
off(A_k)\to 0,\qquad A_k\to\Lambda, \qquad 
\prod_{k=0}^{\infty} G(i_k,j_k,c,s)^T \to U.
$$

3. For the optimal pivoting strategy,
the square of the pivot element is at least the average squared
off-diagonal element, $[A_k]_{i_kj_k}^2\geq
off^2(A_k) \frac{1}{n(n-1)}$. Thus,
$$
off^2(A_{k+1})\leq\left(1-\frac{2}{n(n-1)}\right)off^2(A_k)
$$
and the method converges.

4. For the row cyclic and the column cyclic pivoting strategies, the method
converges. The convergence is ultimately __quadratic__ in the sense that 
$off(A_{k+n(n-1)/2})\leq\ const\cdot  off^2(A_k)$, 
provided $off(A_k)$ is sufficiently small.

5. The EVD computed by the Jacobi method satisfies the standard error bounds.

6. The Jacobi method is suitable for parallel computation. There exist convergent parallel
strategies which enable simultaneous execution of several rotations.
  
7. The Jacobi method is simple, but it is slower than the methods based on tridiagonalization. It is
conjectured that standard implementations require $O(n^3\log n)$ operations. More precisely, each cycle clearly requires $O(n^3)$ operations and it is conjectured that $\log n$ cycles are needed until convergence.
 
8. If $A$ is positive definite, the method can be modified such that it reaches
the speed of the methods based on tridiagonalization and at the same time
computes the EVD with high relative accuracy.
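Facts 1 and 2 can be checked by hand on a small matrix. The sketch below applies a single rotation with explicit loops over the two affected rows and columns, so it needs no linear algebra package; the helper names `jacobi_rotation`, `off2` and `rotate`, and the test matrix, are assumptions made for illustration.

```julia
# Rotation parameters from Fact 1. copysign treats tau = 0 as positive,
# which matches the convention sign(0) = 1.
# `jacobi_rotation`, `off2` and `rotate` are hypothetical helper names.
function jacobi_rotation(aii, ajj, aij)
    if aij == 0
        return 1.0, 0.0
    end
    tau = (aii - ajj) / (2 * aij)
    t = copysign(1.0, tau) / (abs(tau) + sqrt(1 + tau^2))
    c = 1 / sqrt(1 + t^2)
    return c, c * t
end

# Squared off-norm.
function off2(A)
    n = size(A, 1)
    s = 0.0
    for j = 1:n, i = 1:n
        if i != j
            s += A[i, j]^2
        end
    end
    return s
end

# Return G*A*G', where G is the identity except for
# G[i,i] = G[j,j] = c, G[i,j] = s, G[j,i] = -s.
function rotate(A, i, j, c, s)
    B = copy(A)
    n = size(A, 1)
    for k = 1:n            # update rows i and j: B = G*A
        x, y = B[i, k], B[j, k]
        B[i, k] = c * x + s * y
        B[j, k] = -s * x + c * y
    end
    for k = 1:n            # update columns i and j: B = B*G'
        x, y = B[k, i], B[k, j]
        B[k, i] = c * x + s * y
        B[k, j] = -s * x + c * y
    end
    return B
end

A = [4.0 1.0 2.0;
     1.0 3.0 0.5;
     2.0 0.5 1.0]
i, j = 1, 3
c, s = jacobi_rotation(A[i, i], A[j, j], A[i, j])
B = rotate(A, i, j, c, s)

println(B[i, j])                             # zero up to rounding
println(off2(A) - off2(B) - 2 * A[i, j]^2)   # zero up to rounding
```

For this matrix $\tau=3/4$ and $t=1/2$, so the rotated diagonal entries are $4+t\cdot 2=5$ and $1-t\cdot 2=0$, and the squared off-norm drops from $10.5$ to $2.5$. Only two rows and two columns change, which is why one rotation costs $O(n)$ operations and one sweep costs $O(n^3)$.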

### Examples

In [1]:
function myJacobi{T}(A::Array{T})
    n,m=size(A)
    U=eye(T,n)
    # Tolerance for rotation
    tol=sqrt(n)*eps(T)
    # Counters
    p=n*(n-1)/2
    sweep=0
    pcurrent=0
    # First criterion is for standard accuracy, second one is for relative accuracy
    # while sweep<30 && vecnorm(A-diagm(diag(A)))>tol
    while sweep<30 && pcurrent<p
        sweep+=1
        # Row-cyclic strategy
        for i = 1 : n-1 
            for j = i+1 : n
                # Check the tolerance - the first criterion is standard,
                # the second one is for relative accuracy for PD matrices               
                # if A[i,j]!=zero(T)
                if abs(A[i,j])>tol*sqrt(abs(A[i,i]*A[j,j]))
                    # Compute c and s
                    τ=(A[i,i]-A[j,j])/(2*A[i,j])
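                    # copysign treats τ=0 as positive; sign(0)=0 would give t=0 and
                    # no rotation when A[i,i]==A[j,j], yet A[i,j] is zeroed below
                    t=copysign(one(T),τ)/(abs(τ)+sqrt(1+τ^2))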
                    c=1/sqrt(1+t^2)
                    s=c*t
                    G=LinAlg.Givens(i,j,c,s)
                    A=G*A
                    # @show
                    A*=G'
                    A[i,j]=zero(T)
                    A[j,i]=zero(T)
                    U*=G'
                    pcurrent=0
                else
                    pcurrent+=1
                end
            end
        end
    end
    # λ, U
    # @show A
    diag(A), U
end

myJacobi (generic function with 1 method)

In [2]:
n=4
A=full(Symmetric(rand(n,n)))

4x4 Array{Float64,2}:
 0.859825  0.625873  0.245435  0.561921
 0.625873  0.97313   0.854558  0.413555
 0.245435  0.854558  0.377692  0.696606
 0.561921  0.413555  0.696606  0.386355

In [3]:
λ,U=myJacobi(A)

([-0.5110259907281365,2.3931528143541243,0.5038727589045998,0.2110026480898031],
4x4 Array{Float64,2}:
  0.264145  0.479409  -0.821547   -0.159544
 -0.366291  0.613623   0.357112   -0.601472
  0.706437  0.464375   0.43904     0.304213
 -0.544981  0.421888  -0.0691035   0.72127 )

In [4]:
U'*U

4x4 Array{Float64,2}:
  1.0           0.0          -5.55112e-17  -3.33067e-16
  0.0           1.0           1.2837e-16   -1.11022e-16
 -5.55112e-17   1.2837e-16    1.0           1.04083e-16
 -3.33067e-16  -1.11022e-16   1.04083e-16   1.0        

In [5]:
A*U-U*diagm(λ)

4x4 Array{Float64,2}:
  1.94289e-16  -4.44089e-16   5.55112e-16   4.16334e-17
  8.32667e-17  -6.66134e-16   5.55112e-17   0.0        
  1.11022e-16  -4.44089e-16  -5.55112e-17   8.32667e-17
 -2.77556e-16  -4.44089e-16   1.66533e-16  -5.55112e-17

In [6]:
# Positive definite matrix
n=100
A=rand(n,n)
A=full(Symmetric(A'*A));

In [7]:
λ,U=myJacobi(A)
norm(U'*U-I),norm(A*U-U*diagm(λ))

(2.918223906221685e-14,3.9587120290006577e-11)

In [8]:
λ

100-element Array{Float64,1}:
 2532.18      
    0.0131241 
   31.1896    
    0.00189852
   29.8623    
    0.147798  
   28.4812    
    0.00333202
   27.7636    
    0.0919647 
    0.341706  
   24.3592    
    3.83947   
    ⋮         
    2.13628   
    2.06048   
    5.04817   
    4.37197   
    5.36254   
    5.7869    
    4.02991   
    3.21938   
    4.51509   
    4.29691   
    3.33341   
    2.84398   

In [9]:
cond(A)

1.3337682453386835e6

In [10]:
# Now the standard tridiagonalization-based solver (LAPACK, called via eig)
λ,U=eig(A);

In [11]:
norm(U'*U-I),norm(A*U-U*diagm(λ))

(3.050836346283519e-13,3.0408891376813924e-12)

## Relative perturbation theory

$A$  is a real symmetric PD matrix of order $n$  and $A=U\Lambda U^T$ is its EVD.

### Definition

The __scaled matrix__ of the matrix $A$ is the matrix
$$
A_S=D^{-1} A D^{-1}, \quad D=\mathop{\mathrm{diag}}(\sqrt{A_{11}},\sqrt{A_{22}},\ldots,\sqrt{A_{nn}}).
$$

### Facts

1. The above diagonal scaling is nearly optimal: 
$\kappa_2(A_S)\leq  n \min\limits_{D=\mathrm{diag}} \kappa_2(DAD) \leq n\kappa_2(A)$.

2. Let $A$ and $\tilde A=A+\Delta A$ both be positive definite, and let 
their eigenvalues have the same ordering. Then
$$
\frac{|\lambda_i-\tilde\lambda_i|}{\lambda_i}\leq 
\frac{\| D^{-1} (\Delta A) D^{-1}\|_2}{\lambda_{\min} (A_S)}\equiv
\|A_S^{-1}\|_2 \| \Delta A_S\|_2.
$$
Here $\Delta A_S=D^{-1} (\Delta A) D^{-1}$. If $\lambda_i$ and $\tilde\lambda_i$ are simple, then
$$
\|U_{:,i}-\tilde U_{:,i}\|_2 \leq \frac{\| A_S^{-1}\|_2 \|\Delta A_S\|_2}
{\displaystyle\min_{j\neq i}\frac{|\lambda_i-\lambda_j|}{\sqrt{\lambda_i\lambda_j}}}.
$$
These bounds are much sharper than the standard bounds for matrices for which $\kappa_2(A_S)\ll \kappa_2(A)$.

3. The Jacobi method with the relative stopping criterion $|A_{ij}|\leq tol \sqrt{A_{ii}A_{jj}}$ for all $i\neq j$ and some user-defined tolerance $tol$ (usually $tol=n\varepsilon$) computes the EVD with small scaled backward error $\|\Delta A_S\|\leq \varepsilon\, O(\|A_S\|_2)\leq O(n)\varepsilon$, _provided_ that $\kappa_2([A_k]_S)$ does not grow much during the iterations. There is overwhelming numerical evidence that the scaled condition number does not grow much, and the growth can also be monitored.
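The gap between $\kappa_2(A)$ and $\kappa_2(A_S)$ can already be seen on a $2\times 2$ matrix, where the eigenvalues are available in closed form. The following is a minimal sketch under that assumption; the helper names `cond2x2` and `scaled` and the test matrix are hypothetical and chosen only for illustration.

```julia
# Condition numbers of a 2x2 symmetric positive definite matrix and of its
# scaled version, using closed-form eigenvalues (no linear algebra package).
# `cond2x2` and `scaled` are hypothetical helper names.

# kappa_2 of [a b; b d]: lambda_max comes from the quadratic formula and
# lambda_min = det/lambda_max, which avoids subtracting nearly equal numbers.
function cond2x2(A)
    a, b, d = A[1, 1], A[1, 2], A[2, 2]
    lmax = (a + d) / 2 + sqrt(((a - d) / 2)^2 + b^2)
    lmin = (a * d - b^2) / lmax
    return lmax / lmin
end

# Scaled matrix A_S = D^{-1} A D^{-1} with D = diag(sqrt(A_ii)).
function scaled(A)
    n = size(A, 1)
    return [A[i, j] / sqrt(A[i, i] * A[j, j]) for i = 1:n, j = 1:n]
end

A = [1.0     0.5e-5;
     0.5e-5  1.0e-10]
println(cond2x2(A))           # of order 1e10
println(cond2x2(scaled(A)))   # close to 3
```

Here $A_S$ has unit diagonal and off-diagonal entries $0.5$, so its eigenvalues are $1.5$ and $0.5$. For such a matrix the relative bounds of Fact 2 involve a factor of about $3$, whereas the standard bounds involve $\kappa_2(A)$, which is of order $10^{10}$.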

### Example - Scaled matrix


In [12]:
n=10
A=rand(n,n)
A=full(Symmetric(A'*A));
AS=map(Float64,[A[i,j]/sqrt(A[i,i]*A[j,j]) for i=1:n, j=1:n])

10x10 Array{Float64,2}:
 1.0       0.775925  0.78287   0.860785  …  0.795853  0.88939   0.789678
 0.775925  1.0       0.809157  0.862699     0.879892  0.84235   0.713707
 0.78287   0.809157  1.0       0.735156     0.810281  0.766874  0.706796
 0.860785  0.862699  0.735156  1.0          0.721396  0.839212  0.800589
 0.83921   0.860412  0.869796  0.885945     0.853105  0.837846  0.774648
 0.673876  0.807851  0.499119  0.710431  …  0.754399  0.628388  0.463822
 0.5802    0.864702  0.641829  0.700768     0.739358  0.752179  0.731881
 0.795853  0.879892  0.810281  0.721396     1.0       0.895524  0.636262
 0.88939   0.84235   0.766874  0.839212     0.895524  1.0       0.863853
 0.789678  0.713707  0.706796  0.800589     0.636262  0.863853  1.0     

In [13]:
cond(AS)

6438.469701303686

In [14]:
# Strong scaling
D=exp(50*(rand(n)-0.5))

10-element Array{Float64,1}:
    1.3056e10 
  187.247     
    0.519645  
 7480.92      
    8.47713e6 
    4.76788e6 
    0.0105304 
    9.31052e-9
    0.0179848 
    4.95575e-5

In [15]:
H=diagm(D)*AS*diagm(D)

10x10 Array{Float64,2}:
  1.7046e20       1.89691e12  …      2.08837e8      5.10941e5  
  1.89691e12  35061.4                2.83669        0.00662284 
  5.31139e9      78.7326             0.00716696     1.82016e-5 
  8.40739e13      1.20845e6        112.91           0.296807   
  9.28818e16      1.36575e9          1.27737e5    325.434      
  4.19486e16      7.21226e8   …  53883.8          109.594      
  7.9769e7        1.70501            0.000142453    3.8194e-7  
 96.7427          1.53397e-6         1.49953e-10    2.93575e-13
  2.08837e8       2.83669            0.000323451    7.69934e-7 
  5.10941e5       0.00662284         7.69934e-7     2.45594e-9 

In [16]:
cond(H)

5.6811326331675126e38

In [17]:
λ,U=myJacobi(H)

([1.7046028869315386e20,5758.217122299141,0.019613240938357607,9.292167610009085e6,2.5166592786427484e13,8.49444247304353e12,1.551280568030537e-5,3.000449942831261e-19,2.4164452031416344e-5,3.212073835317919e-10],
10x10 Array{Float64,2}:
 1.0           5.2848e-10   -1.478e-11   …  -1.27742e-12  -4.84392e-16
 1.11282e-8    0.999947     -0.00211354     -0.000122616   2.52239e-7 
 3.11591e-11   0.00211793    0.99933         0.0318588    -3.62716e-5 
 4.93217e-7   -0.0100759     3.99904e-5      1.43881e-6   -4.29729e-9 
 0.000544889  -6.35136e-6   -5.16255e-8     -1.92148e-9    1.84727e-12
 0.00024609   -1.31079e-5    6.34499e-8  …   2.954e-9      3.21631e-13
 4.67963e-13   5.58004e-5   -0.0205358       0.084339     -0.003132   
 5.67538e-19   3.68398e-11  -5.2104e-9       4.35468e-7   -7.23864e-5 
 1.22514e-12   5.06493e-5   -0.0302288       0.995926     -0.0013797  
 2.99742e-15   6.91997e-8   -6.97774e-5      0.0016394     0.999994   )

In [18]:
λ1,U1=eig(H)

([1.7046028869315386e20,2.516659278642747e13,8.494442473043522e12,9.292167610008907e6,5758.217151995548,0.01961224046949336,2.4162605482998504e-5,1.5512050894008412e-5,3.2120306213952206e-10,3.000573406232901e-19],
10x10 Array{Float64,2}:
 -1.0           0.000595889  -4.8785e-5    …  -4.84256e-16   1.53703e-19
 -1.11282e-8   -1.64435e-5   -7.25104e-6       2.52265e-7   -4.98721e-12
 -3.11591e-11  -3.12262e-8    6.07341e-8      -3.6278e-5    -5.51285e-9 
 -4.93217e-7   -0.000450076   0.000112905     -4.2976e-9     4.85689e-13
 -0.000544889  -0.874739      0.484594         1.84764e-12  -2.60639e-16
 -0.00024609   -0.484594     -0.874739     …   3.21098e-13  -6.02047e-16
 -4.67963e-13  -1.09884e-9   -9.63041e-11     -0.0031321     5.80933e-8 
 -5.67538e-19  -6.94531e-16  -1.63013e-16     -7.23861e-5    1.0        
 -1.22514e-12  -5.32631e-10   5.38972e-10     -0.00137978   -5.61164e-7 
 -2.99742e-15  -1.32373e-12   4.34529e-12      0.999994      7.23883e-5 )

In [19]:
[sort(λ) sort(λ1)]

10x2 Array{Float64,2}:
    3.00045e-19     3.00057e-19
    3.21207e-10     3.21203e-10
    1.55128e-5      1.55121e-5 
    2.41645e-5      2.41626e-5 
    0.0196132       0.0196122  
 5758.22         5758.22       
    9.29217e6       9.29217e6  
    8.49444e12      8.49444e12 
    2.51666e13      2.51666e13 
    1.7046e20       1.7046e20  

In [20]:
# Check with BigFloat
λ2,U2=myJacobi(map(BigFloat,H))

(BigFloat[1.704602886931538122856351185871238430256551219373736947824283237015803669631453e+20,5.758217122299126406461226239468433881216305268918285140353105574835239413630613e+03,1.961324093835752931604017185551867272983501811770942283461549914904304227267755e-02,9.292167610009090260662261143293723391679526727643864998618661439106457337905984e+06,2.516659278642745819680560300405046437683720299177722417291664324223142363007188e+13,8.494442473043527303655540002213016223940291479218051333055607432909364980257059e+12,1.551280568030533419322434899294658188958496114035252442188782028723514485502047e-05,3.000449942831141075570423457682782461166823182810430828100624400149373840306707e-19,2.416445203141586139525969140066794316781201716732667162342314187346897265019025e-05,3.212073835317911577788315892719004978421767334634959484617223468628611988765146e-10],
10x10 Array{BigFloat,2}:
 9.999998212678885557743832713966007372373812163851926291492249369904541548866298e-01  …  -4.84392286610666584860

In [21]:
# Relative errors are expected to be at most of order eps()*cond(AS)
(sort(λ2)-sort(λ))./sort(λ2)

10-element Array{BigFloat,1}:
 -3.999179931145441112702375974089419511298710550913244690217470891489332018129192e-14
 -2.306100765461286095081901666002979972306439586915259834290159435133537534470686e-15
 -2.253284861014899976751857483263722176432996190091471233673661381397945937285884e-15
 -1.998970492048416430045027175766613016378993370181437895279101368694365541505427e-14
 -3.982669614836893846312651647180995759458308334260618798595021499950504334863851e-15
 -2.536198724229273532807567562620040349893096007761613519969131798997053420308456e-15
  5.244923428887690677137345100226672103928146045758583318276622090228122075604643e-16
 -3.496147003669947861156990725333293991158798653517648610859410537849789851807331e-16
 -1.040196208487708199373214190631126598117557991739115376316957831008931507892269e-15
 -2.654598629881955855775874571362378843782143927382395304004922771867755034629747e-16

## Indefinite matrices

### Definition

The __spectral absolute value__ of the matrix $A$ is the matrix $|A|_{spr}=(A^2)^{1/2}$ (the symmetric positive semidefinite factor of the polar decomposition of $A$, which is positive definite when $A$ is nonsingular).
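As a small worked illustration (the matrix is chosen here for convenience and does not come from the references), consider
$$
A=\begin{bmatrix} 1 & 2\\ 2 & 1\end{bmatrix}, \qquad \lambda_1=3,\quad \lambda_2=-1,\qquad
U=\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}.
$$
Replacing the eigenvalues by their absolute values gives
$$
|A|_{spr}=U\,\mathop{\mathrm{diag}}(3,1)\,U^T=\begin{bmatrix} 2 & 1\\ 1 & 2\end{bmatrix},
$$
and indeed
$$
|A|_{spr}^2=\begin{bmatrix} 5 & 4\\ 4 & 5\end{bmatrix}=A^2.
$$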

### Facts

1. The above perturbation bounds for positive definite matrices essentially hold with $A_S$ replaced by $[|A|_{spr}]_S$.

2. The Jacobi method can be modified to compute the EVD with small backward error 
$\| \Delta [|A|_{spr}]_S\|_2$.

The details of the indefinite case are beyond the scope of this course, and the reader should consult the references.