# Algorithme du gradient conjugué

In [1]:
using LinearAlgebra
using Optim

Résolvez
$$
\min_x f(x) = \frac{1}{2} x^T A x + b^T x + a
$$
où $A \succ 0$. En posant $\nabla f(x) = 0$, c'est équivalent à résoudre le système linéaire $Ax = -b$.

Construisons la fonction quadratique associée au programme précédent.

In [2]:
f = x -> 0.5*dot(x,A*x)+dot(b,x)

#1 (generic function with 1 method)

## Un exemple simple

Adapté de https://www.rose-hulman.edu/~bryan/lottamath/congrad.pdf

Soit
$$
A =
\begin{pmatrix}
3 & 1 & 0 \\
1 & 2 & 2 \\
0 & 2 & 4
\end{pmatrix}
$$
Considérons la fonction à minimiser
$$
f(x) = \frac{1}{2} x^TAx,
$$
et supposons que nous avons déjà calculer
\begin{align*}
d_0 &= (1, 0, 0)\\
d_1 & = (1, −3, 0)\\
d_2 &= (−2, 6, −5).
\end{align*}

Vérifions que $d_0$, $d_1$ et $d_2$ sont $A$-conjugés.

In [3]:
A = [ 3.0 1 0 ; 1 2 2 ; 0 2 4]
d0 = [ 1.0 0 0 ]'
d1 = [ 1.0 -3.0 0.0 ]'
d2 = [ -2.0 6.0 -5.0]'

println("$(dot(d0, A*d1)) $(dot(d0, A*d2)) $(dot(d1, A*d2))")

0.0 0.0 0.0


In [4]:
eigen(A)

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
3-element Array{Float64,1}:
 0.47108204270564347
 3.167449191108536
 5.361468766185826
vectors:
3×3 Array{Float64,2}:
 -0.325306   0.916757  0.231804
  0.822673   0.15351   0.547398
 -0.466246  -0.368771  0.804128

Prenons comme solution initiale $x_0 = (1, 2, 3)$. Calculons $x_1$, $x_2$ et $x_3$ en utilisant l'algorithme du gradient conjugué. $x_3$ est-il optimal?

$$
\nabla f(x) = Ax
$$

In [5]:
x0 = [1; 2; 3.0]
-A*x0

3-element Array{Float64,1}:
  -5.0
 -11.0
 -16.0

In [6]:
f = x -> dot(x,A*x)

#3 (generic function with 1 method)

Nous devons calculer $\alpha_k$, $k = 0,1,2$, en résolvant
$$
\min_{\alpha} f(x_k + \alpha d_k)
$$

Afin d'obtenir $\alpha_0$, nous devons minimiser
\begin{align*}
f(x_0 + \alpha d_0) &= \frac{1}{2}
\left(\begin{pmatrix} 1 & 2 & 3\end{pmatrix} + \alpha \begin{pmatrix} 1 & 0 & 0\end{pmatrix} \right)
\begin{pmatrix}
3 & 1 & 0 \\
1 & 2 & 2 \\
0 & 2 & 4
\end{pmatrix}
\left(\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \alpha \begin{pmatrix} 1 \\0 \\0 \end{pmatrix} \right)
\\
& = \frac{1}{2}\begin{pmatrix} 1 + \alpha & 2 & 3 \end{pmatrix}
\begin{pmatrix}
3 & 1 & 0 \\
1 & 2 & 2 \\
0 & 2 & 4
\end{pmatrix}
\begin{pmatrix} 1 + \alpha \\ 2 \\ 3 \end{pmatrix}\\
& = \frac{1}{2}\begin{pmatrix} 1 + \alpha & 2 & 3 \end{pmatrix}
\begin{pmatrix} 5+3\alpha \\ 11+\alpha \\ 16 \end{pmatrix}\\
& = \frac{1}{2}
((1 + \alpha)(5+3\alpha) + 22+2\alpha + 48 ) \\
& = \frac{1}{2}
( 3\alpha^2 + 8\alpha + 5 + 70 + 2\alpha ) \\
& = \frac{3}{2}\alpha^2 + 5\alpha+\frac{75}{2}
\end{align*}
par rapport à $\alpha$.

Nous pouvons l'obtenir en cherchant le zéro de la dérivée par rapport à $\alpha$, c'est-à-dire
$$
\frac{d}{d\alpha} f(x+\alpha d) = 0,
$$
ou

$$
d^T \nabla f(x+\alpha d) = 0
$$

Dès lors, nous devons avoir

$$
3\alpha + 5 = 0
$$
Ainsi,
$$
\alpha_{0} = -\frac{5}{3}
$$
$$
x_1 = x_0 - \frac{5}{3} d_0 = \begin{pmatrix} -\frac{2}{3} \\ 2 \\ 3  \end{pmatrix}
$$

Nous pouvons aussi directement calculer $\alpha_0$ comme
$$
\alpha_0 = - \frac{d_0^T\nabla f(x_0)}{d_0^TAd_0}
$$

In [7]:
x0 = [1 ; 2 ; 3.0]
∇f = x -> A*x

#5 (generic function with 1 method)

In [8]:
d0 = [1 ; 0 ; 0]
α0 = -dot(d0,∇f(x0))/dot(d0,A*d0)

-1.6666666666666667

In [9]:
x1 = x0+α0*d0

3-element Array{Float64,1}:
 -0.6666666666666667
  2.0
  3.0

Une recherche linéaire à partir de $x_1$ dans la direction $d_1$ exige de minimiser
\begin{align*}
f(x_1 + \alpha d_1) & = \left(\begin{pmatrix} -\frac{2}{3} & 2 & 3 \end{pmatrix} + \alpha_1\begin{pmatrix} 1 & -3 & 0 \end{pmatrix} \right)\begin{pmatrix} 3 & 1 & 0 \\
1 & 2 & 2 \\
0 & 2 & 4 \end{pmatrix}\left(\begin{pmatrix}  -\frac{2}{3} \\ 2 \\ 3 \end{pmatrix} +  \alpha_1\begin{pmatrix} 1 \\ -3 \\ 0 \end{pmatrix} \right) \\
& =\frac{15}{2}\alpha^2 - 28\alpha + \frac{100}{3},
\end{align*}
ce qui a lieu en
$$
\alpha_1 = \frac{28}{15},
$$
donnant
$$
x_2 = x_1 + \frac{28}{15}d_1 =
    \begin{pmatrix}
     \frac{6}{5} \\ \frac{-18}{5} \\ 3
    \end{pmatrix}.
$$

In [11]:
norm([-2/3; 2 ;3]), norm([6/5; -18/5 ; 3])

(3.6666666666666665, 4.8373546489791295)

In [12]:
α1 = -dot(d1,A*x1)/dot(d1,A*d1)

1.8666666666666665

In [13]:
28/15

1.8666666666666667

In [14]:
x2 = x1+α1*d1

3×1 Array{Float64,2}:
  1.1999999999999997
 -3.5999999999999996
  3.0

In [15]:
norm(x1), norm(x2)

(3.6666666666666665, 4.8373546489791295)

La recherche linéaire finale à partir de $x_2$ dans la direction $d_2$ requiert de minimiser
$$
f(x_2 + \alpha d_2) = 20 \alpha^2 - 24\alpha + \frac{36}{5},
$$
ce qui a lieu en
$$
\alpha_2 = \frac{3}{5},
$$
donnant
$$
x_3 = x_2 + \frac{3}{5}d_2 =
    \begin{pmatrix}
     0 \\ 0 \\ 0
    \end{pmatrix},
$$
ce qui est bien entendu correct.

Similairement, nous pouvons calculer le nouveau point comme

In [16]:
α2 = -dot(d2,A*x2)/dot(d2,A*d2)
x3 = x2+α2*d2

3×1 Array{Float64,2}:
 -4.440892098500626e-16
  8.881784197001252e-16
 -4.440892098500626e-16

## Une implémentation naïve

Une première version de l'algorithme du gradient conjugué suit.

In [17]:
function cg_quadratic(A:: Matrix, b:: Vector, x0:: Vector, trace:: Bool = false)
    n = length(x0)
    x = x0
    g = b+A*x
    d = -g
    if (trace)
        iter = [ x ]
        iterg = [ norm(g) ]
        iterd = [ norm(d) ]
    end
    k = 0
    
    for k = 1:n-1
        Ad = A*d
        normd = dot(d,Ad)
        α = -dot(d,g)/normd
        x += α*d
        if (trace)
            iter = [ iter; [x] ]
            iterg = [ iterg; norm(g)]
            iterd = [ iterd; norm(d) ]
        end
        g = b+A*x
        β = dot(g,Ad)/normd
        d = -g+β*d
    end

    normd = dot(d,A*d)
    α = -dot(d,g)/normd
    x += α*d
    if (trace)
        g = b+A*x # g must be equal to 0
        iter = [ iter; [x] ]
        iterg = [ iterg; norm(g)]
        iterd = [ iterd; norm(d) ]
        return x, iter, iterg, iterd
    end
    
    return x
end

cg_quadratic (generic function with 2 methods)

Considérons l'exemple simple

In [18]:
A = [2 1; 1 2]
b = [1, 0]
A\(-b)

2-element Array{Float64,1}:
 -0.6666666666666666
  0.3333333333333333

Nous voulons résoudre
$$
    \min_{\alpha} f(x) = \frac{1}{2}x^TAx+b^Tx+c
$$

Ou, de manière équivalente, nous résolvons
$$
    c+\min_{\alpha} f(x) = \frac{1}{2}x^TAx+b^Tx
$$

In [19]:
cg_quadratic(A, b, [0, 0], true)

([-0.6666666666666666, 0.3333333333333333], [[0.0, 0.0], [-0.5, 0.0], [-0.6666666666666666, 0.3333333333333333]], [1.0, 1.0, 0.0], [1.0, 1.0, 0.5590169943749475])

Que se passe-t-il si $A$ n'est pas définie positive?

In [20]:
A = [ 1 2 ; 2 1]
A\(-b)

2-element Array{Float64,1}:
  0.3333333333333333
 -0.6666666666666666

In [21]:
cg_quadratic(A, b, [0, 0], true)

([0.33333333333333326, -0.6666666666666666], [[0.0, 0.0], [-1.0, 0.0], [0.33333333333333326, -0.6666666666666666]], [1.0, 1.0, 1.1102230246251565e-16], [1.0, 1.0, 4.47213595499958])

In [22]:
det(A)

-3.0

In [23]:
eigen(A)

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
 -1.0
  3.0
vectors:
2×2 Array{Float64,2}:
 -0.707107  0.707107
  0.707107  0.707107

In [24]:
cg_quadratic(A, b, [1, 1], true)

([0.3333333333333335, -0.6666666666666667], [[1.0, 1.0], [-0.36986301369863006, -0.02739726027397249], [0.3333333333333335, -0.6666666666666667]], [5.0, 5.0, 2.220446049250313e-16], [5.0, 5.0, 0.9763790695367754])

In [25]:
f([1/3,-2/3])

-0.3333333333333333

In [26]:
f([0,0])

0

Le gradient conjugué trouve la solution du système linéaire, laquelle correspond à un point critique au premier ordre de la fonction.

In [27]:
∇f = x -> A*x+b

#7 (generic function with 1 method)

In [28]:
x = [1.0/3; -2.0/3]
∇f(x)

2-element Array{Float64,1}:
 0.0
 0.0

In [29]:
x = [1; 1]
∇f(x)

2-element Array{Int64,1}:
 4
 3

In [30]:
step= x -> x-α*∇f(x)

#9 (generic function with 1 method)

In [31]:
α = 10
dot(step(x),A*step(x))

6886

In [32]:
λ, u = eigen(A)

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
 -1.0
  3.0
vectors:
2×2 Array{Float64,2}:
 -0.707107  0.707107
  0.707107  0.707107

In [33]:
u

2×2 Array{Float64,2}:
 -0.707107  0.707107
  0.707107  0.707107

In [36]:
x = u[:,1] # premier vecteur propre associé à λ = -1
A*x

2-element Array{Float64,1}:
  0.7071067811865475
 -0.7071067811865475

In [38]:
1.0-norm(x)

1.1102230246251565e-16

In [39]:
α = 10
f = x -> 0.5*dot(x,A*x)+dot(b,x)
f(step(x))

-106.05992052357223

In [40]:
α = 1000
dot(step(x),A*step(x))+dot(b,x)

-1.417629483042249e6

In [41]:
f(x)

-1.2071067811865475

In [43]:
x = [1/3.0; -2/3]
f(x)

0.16666666666666666

In [45]:
cg_quadratic(A, b, x, true)

([NaN, NaN], [[0.3333333333333333, -0.6666666666666666], [NaN, NaN], [NaN, NaN]], [0.0, 0.0, NaN], [0.0, 0.0, NaN])

Nous devons incorporer un test sur $\nabla f(x_k)$!

In [46]:
A = [ 1 2 ; 0 4 ]
eigen(A)

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
 1.0
 4.0
vectors:
2×2 Array{Float64,2}:
 1.0  0.5547
 0.0  0.83205

In [47]:
eigen(A*A')

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
  0.7917560805262003
 20.2082439194738
vectors:
2×2 Array{Float64,2}:
 -0.885022  0.465549
  0.465549  0.885022

In [48]:
A = [ 3 1; 1 2 ]
eigen(A)

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
 1.381966011250105
 3.618033988749895
vectors:
2×2 Array{Float64,2}:
  0.525731  -0.850651
 -0.850651  -0.525731

In [49]:
eigen(A*A')

Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
2-element Array{Float64,1}:
  1.9098300562505257
 13.090169943749475
vectors:
2×2 Array{Float64,2}:
  0.525731  -0.850651
 -0.850651  -0.525731

Un exemple plus complexe.

In [50]:
n = 500;
m = 600;
A = randn(n,m);
A = A * A';  # A is now a positive semi-definite matrix
A = A+I # A is positive definite
b = zeros(n)
for i = 1:n
  b[i] = randn()
end
x0 = zeros(n)

500-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 ⋮
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [51]:
b1 = A\(-b)

500-element Array{Float64,1}:
  0.013689587789839782
 -0.01885315005133297
  0.0318160978874517
  0.02018141425238513
  0.01577563597684514
 -0.0017730089313002181
 -0.06801379207495274
 -0.0016406184215073923
  0.0017004913653938
  0.00452904556616134
  0.0008935469903653916
 -0.029372800635081242
  0.00017504150813374856
  ⋮
  0.044209737900243636
 -0.048861614988629135
 -0.010666867889729315
 -0.029847313763195597
  0.014563686988772574
  0.023240255799628032
 -0.0045103112775465555
  0.015997107771635787
  0.004398135807853763
  0.033231357923669734
 -0.002515508470793067
 -0.014162210400825045

In [52]:
b2, iter, iterg, iterd = cg_quadratic(A, b, x0, true);

In [53]:
norm(b1-b2)

1.734523990726017e-15

In [54]:
iterg

501-element Array{Float64,1}:
 22.831483117055008
 22.831483117055008
 21.686547381952103
 18.47405790416008
 17.679856509828976
 15.479270961191991
 15.054833502002992
 13.814610447800193
 12.63503173040512
 11.52554330836719
  9.575193528243885
  8.789891097025146
  8.034678645994301
  ⋮
  1.4908635599264138e-13
  1.5371720924180976e-13
  1.4106127271118373e-13
  1.3383977941484951e-13
  1.379272568519458e-13
  1.4489099844039407e-13
  1.5035721853910728e-13
  1.5847713661217427e-13
  1.498716828183743e-13
  1.5549970131117184e-13
  1.4955984269052359e-13
  1.5018105908603662e-13

In [55]:
iterd

501-element Array{Float64,1}:
 22.831483117055008
 22.831483117055008
 29.910303425502924
 28.502776208269697
 31.52834999401924
 28.700365586087482
 31.04293159827266
 29.564980127202784
 27.772256332956076
 25.823717874841616
 20.232614304274787
 19.182387728850454
 17.928887193547382
  ⋮
  1.4708663821848665e-13
  1.5927685062878838e-13
  1.4176822762880018e-13
  1.3711902855630244e-13
  1.4273471264141864e-13
  1.4652378455710525e-13
  1.5095170394201812e-13
  1.6600005925472444e-13
  1.5576346396201643e-13
  1.558812612998778e-13
  1.541227411837348e-13
  1.509204050578899e-13

Cela fonctionne, mais devons-nous vraiment faire 500 itérations? Nous serions satisfaits si nous sommes proches de la solution. Nous pouvons mesurer le résidu du système linéaire residual of the linear system
$$
r = b+Ax,
$$
ce qui n'est rien d'autre que le gradient de la fonction objectif du problème de minimisation quadratique.

In [56]:
iter

501-element Array{Array{Float64,1},1}:
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [-0.00017027618072256812, 0.0002731925457880143, 0.0010684115984472631, -0.0004017684535668579, -0.0010245312224726674, -0.0033353595759318296, -0.0011646300744951321, -0.002333894483727998, -0.0007008411809605268, -0.0016820699483587867  …  -0.00033178888615875105, -0.0012883893166555226, -0.0007292784898442888, -0.0004473856499263748, 0.00011689192235378118, 0.003198233491125432, -0.00021722337122564498, 0.0018325294753949153, -0.0006225107923252299, 0.00033662030259510396]
 [-3.825733040628996e-5, -0.0003223953427695385, 0.004278562464667181, -0.0011602153475444349, 0.00037342277557283945, -0.0033164358576595614, -0.00658636068216885, 1.0174343228482909e-5, -0.0019774710069156262, -0.003995304270415299  …  -0.0021765754771554506, -0.0037361475689225527, -0.0007448318703374449, -0.000944699994161559, -0.0013794303177228529, 0.004659504036842

Nous devons inclure un test de convergence dans la fonction.

In [57]:
function cg_quadratic_tol(A:: Matrix, b:: Vector, x0:: Vector, trace:: Bool = false, tol = 1e-8)
    n = length(x0)
    x = x0
    if (trace)
        iter = [ x ]
    end
    g = b+A*x
    d = -g
    k = 0
    
    tol2 = tol*tol

    β = 0.0

    while ((dot(g,g) > tol2) && (k <= n))
        Ad = A*d
        normd = dot(d,Ad)
        α = dot(g,g)/normd
#        α = -dot(d,g)/normd
        x += α*d
        if (trace)
            iter = [ iter; x ]
        end
        g = b+A*x
        β = dot(g,Ad)/normd
        d = -g+β*d
        k += 1
    end

    if (trace)
        iter = [ iter; x ]
        return x, iter, k
    end

    return x, k
end

cg_quadratic_tol (generic function with 3 methods)

In [58]:
x, iter, k = cg_quadratic_tol(A, b, x0, true)

([0.013689587798574444, -0.018853150047148094, 0.03181609788253195, 0.020181414253286397, 0.015775635994805418, -0.0017730089268754073, -0.06801379208995433, -0.0016406184374460314, 0.0017004913597049953, 0.00452904557032978  …  -0.010666867898507406, -0.029847313749673355, 0.014563686983419375, 0.023240255786818563, -0.004510311269642599, 0.015997107774918345, 0.004398135823719092, 0.033231357919214256, -0.0025155084716944457, -0.014162210399088179], Any[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], -0.00017027618072256812, 0.0002731925457880143, 0.0010684115984472631, -0.0004017684535668579, -0.0010245312224726674, -0.0033353595759318296, -0.0011646300744951321, -0.002333894483727998, -0.0007008411809605268  …  -0.010666867898507406, -0.029847313749673355, 0.014563686983419375, 0.023240255786818563, -0.004510311269642599, 0.015997107774918345, 0.004398135823719092, 0.033231357919214256, -0.0025155084716944457, -0.014162210399

Le nombre d'itérations est

In [59]:
k

192

Sommes-nous proche de la solution?

In [60]:
norm(b1-x)

1.9379487536581979e-10

In [62]:
size(A)

(500, 500)

ce qui est nettement moindre que la dimension du problème.

## Gradient conjugué préconditionné

Si le nombre de conditionnement est égal à 1, nous convergeons en une itération.

Rappelons que le nombre de conditionnement d'une matrice $A$ définie positive est donné par
$$
\kappa(A) = \frac{\lambda_{\max}}{\lambda_{\min}}.
$$
$\kappa(A) = 1$ ssi $A = \gamma I$. Dans ce cas
$$
A = \begin{pmatrix} \gamma & 0 & \cdots & 0 \\ 0 & \gamma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \gamma \end{pmatrix}.
$$
Observons que $\lambda_{\max} = \lambda_{\min} = \gamma$.

Le problème quadratique devient alors
$$
f(x) = \frac{1}{2}\gamma x^Tx + b^Tx.
$$

Son gradient est
$$
\nabla f(x) = \gamma x + b.
$$
Il s'annule si
$$
x = -\frac{b}{\gamma}.
$$

Soit $x_0$. L'algorithme du gradient conjugué donne comme première de recherche $d_0 = -\nabla f(x_0) = -\gamma x_0 - b$.

Nous avons aussi
$$
\alpha_0 = - \frac{d_0^T\nabla f(x_0)}{d_0^TAd_0} = \frac{\| d_0 \|^2}{\gamma \| d_0 \|^2} = \frac{1}{\gamma}.
$$

Le premier itéré donne
$$
x_1 = x_0 + \alpha_0 d_0 = x_0 + \frac{1}{\gamma} (-\gamma x_0 - b) = -\frac{b}{\gamma}.
$$
ce qui correspond bien à la solution!

Si la matrice $A$ est diagonale et tous les éléments de la diagonale sont identiques, la direction de plus forte pente donne le minimum global.

Une implémentation basique d'un algorithme de gradient préconditionné suit, où $M$ est l'inverse du préconditioneur à appliquer.

In [1]:
function pcg_quadratic_tol(A:: Matrix, b:: Vector, x0:: Vector, M:: Matrix,
                           trace:: Bool = false, tol = 1e-8)
    n = length(x0)
    x = x0
    if (trace)
        iter = [ x ]
    end
    g = b+A*x
    v = M*g
    d = -v
    k = 0
    
    tol2 = tol*tol

    β = 0.0

    gv = dot(g,v)
    while ((gv > tol2) && (k <= n))
#    while ((dot(g,g) > tol2) && (k <= n))
        Ad = A*d
        normd = dot(d,Ad)
        #gv = dot(g,v)
        α = gv/normd
        x += α*d
        if (trace)
            iter = [ iter; x ]
        end
        g += α*Ad
        v = M*g
        gvold = gv
        gv = dot(g,v)
        β = gv/gvold
        d = -v+β*d
        k += 1
    end

    if (trace)
        iter = [ iter; x ]
        return x, iter, k
    end

    return x, k
end

pcg_quadratic_tol (generic function with 3 methods)

Let's check first that when there is no preconditioning, we obtain the same iterates.
Set

In [None]:
M = zeros(n,n)+I
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

In [None]:
k, norm(x-b1)

We can compute the eigenvalues and condition number of $A$.

In [None]:
eigen(A)

In [None]:
cond(A)

Try to compute a simple precontionner using the inverse of the diagonal of matrix $A$.

In [None]:
D = 1 ./diag(A)
M = Diagonal(D)

Unfortunately, in this case, it does not help as the condition number is not improving.

In [None]:
B = M*A
cond(B)

Consider another situation when $A$ is diagonal.

In [None]:
n = 1000;
A = zeros(n,n);
for i = 1:n
    A[i,i] = 10*rand()
end
b = zeros(n)
for i = 1:n
  b[i] = rand()
end
x0 = zeros(n)
cond(A)

The solution we are looking for is

In [None]:
A\b

Without preconditionning, with have the iterates sequence

In [None]:
M = zeros(n,n)+I
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

This is equivalent to the unpreconditioned version.

In [None]:
x, iter, k = cg_quadratic_tol(A, b, x0, true)

However, since $A$ is diagonal, an obvious diagonal preconditionner is $A^{-1}$ itself.

In [None]:
M = zeros(n,n)
for i = 1:n
    M[i,i] = 1/A[i,i]
end

The condition number of the preconditioned matrix is of course equal to 1.

In [None]:
cond(M*A)

The theory then predicts that we converge in one iteration with the precionditionned conjugate gradient.

In [None]:
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

Consider now another example.

In [None]:
A = zeros(n,n)+3*I
for i = 1:n-1
    A[i,i+1] = 1.4
    A[i+1,i] = 1.4
end
A

In [None]:
eigen(A)

In [None]:
A\(-b)

In [None]:
x, iter, k = cg_quadratic_tol(A, b, x0, true)

In [None]:
M = zeros(n,n)
for i = 1:n
    M[i,i] = 1/A[i,i]
end

In [None]:
cond(A)

In [None]:
cond(M*A)

In [None]:
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

There is no advantage.

In [None]:
M = A^(-1)

In [None]:
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

Consider now the following example.

In [None]:
n = 1000
A = zeros(n,n)+Diagonal([2+i*i for i=1:n])

In [None]:
for i = 1:n-1
    A[i,i+1] = 1
    A[i+1,i] = 1
end
A[n,1] = 1
A[1,n] = 1
cond(A)

In [None]:
κ = cond(A)
(sqrt(κ)-1)/(sqrt(κ)+1)

In [None]:
A

In [None]:
A^(-1)

In [None]:
M = zeros(n,n)+I
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

In [None]:
M = zeros(n,n)
for i = 1:n
    M[i,i] = 1/A[i,i]
end
cond(A*M), cond(A)

In [None]:
M

In [None]:
A*M

In [None]:
x, iter, k = pcg_quadratic_tol(A, b, x0, M, true)

In [None]:
function pcg_quadratic(A:: Matrix, b:: Vector, x0:: Vector, M:: Matrix,
                       trace:: Bool = false, tol = 1e-8)
    n = length(x0)
    x = x0
    if (trace)
        iter = [ x ]
    end
    g = b+A*x
    v = M\g
    d = -v
    k = 0
    
    tol2 = tol*tol

    β = 0.0

    gv = dot(g,v)
    while ((gv > tol2) && (k <= n))
#    while ((dot(g,g) > tol2) && (k <= n))
        Ad = A*d
        normd = dot(d,Ad)
        #gv = dot(g,v)
        α = gv/normd
        x += α*d
        if (trace)
            iter = [ iter; x ]
        end
        g += α*Ad
        v = M\g
        gvold = gv
        gv = dot(g,v)
        β = gv/gvold
        d = -v+β*d
        k += 1
    end

    if (trace)
        iter = [ iter; x ]
        return x, iter, k
    end

    return x, k
end

In [None]:
function ichol(A:: Matrix)

    n = size(A,1)
    C = zeros(n,n)+I
    
    for k=1:n
        C[k,k] = sqrt(A[k,k])
        for i=(k+1):n
            if (A[i,k] != 0)
                C[i,k] = A[i,k]/A[k,k]    
            end
        end
        for j=(k+1):n
            for i=j:n
                if (A[i,j] != 0)
                    C[i,j] = A[i,j]-A[i,k]*A[j,k]
                end
            end
        end
    end

    return C
end

In [None]:
C = cholesky(A)
C.L

In [None]:
M = C.L*C.U

In [None]:
x, iter, k = pcg_quadratic(A, b, x0, M, true)

In [None]:
C = ichol(A)

In [None]:
M=C*C'

In [None]:
x, iter, k = pcg_quadratic(A, b, x0, M, true)

An efficient implementation would make use of sparse matrices and specific functions to compute v.