<a href="https://colab.research.google.com/github/stephenbeckr/convex-optimization-class/blob/main/Demos/ConjugateGradientDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Conjugate Gradient
... and related Krylov subspace methods

For solving $Ax=b$ and related problems (e.g., least-squares).  Use Krylov subspace methods if all of the following criteria are met:
1. $A$ is very large (let $A$ be $n\times n$)
2. The multiply $Ax$ can be done faster than $O(n^2)$, e.g.
  - $A$ is very sparse
  - $A$ is from a fast operator, like a FFT
3. $A$ is somewhat well-conditioned, and/or you don't need too much accuracy

Exactly how large, how sparse, or how well-conditioned $A$ must be depends on the problem, and there's no simple rule (other than to just try it).

APPM 5630 Advanced Convex Optimization, Spring 2025, Becker

Note: in scipy, see [`scipy.sparse.linalg`](https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html) to access Krylov solvers, even if your matrix isn't sparse (in that case, use the `LinearOperator` class)
- in practice, a main issue is finding good **preconditioners**
- if you want to solve a least-squares (quadratic) problem $\min_x \|Ax-b\|_2^2$, do **not** apply CG to the normal equations $A^TAx = A^Tb$. Mathematically this is fine, since $A^TA$ is symmetric positive definite whenever $A$ has full column rank, but the condition number of $A^TA$ is the square of that of $A$. Use something like `lsqr` or `lsmr` or [`cgls`](https://web.stanford.edu/group/SOL/software/cgls/), which are designed for this problem and are mathematically equivalent to CG on the normal equations (but use numerical tricks that make them more stable for ill-conditioned matrices)
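
The two remarks about `LinearOperator` and preconditioners can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not part of the demo: the operator $A = D + C$ (a diagonal plus a periodic 1-D Laplacian applied with FFTs), the names `A_op`, `M_op` and `solve`, and the choice of a Jacobi (diagonal) preconditioner. The operator is symmetric positive definite, so plain `cg` applies.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 1024
rng = np.random.default_rng(0)

# Hypothetical SPD operator A = D + C that is never formed as a matrix:
#   D = diag(d) with widely varying positive entries,
#   C = periodic 1-D Laplacian (a circulant matrix), applied with FFTs.
d = np.logspace(0, 4, n)
c = np.zeros(n)
c[0], c[1], c[-1] = 2.0, -1.0, -1.0   # first column of C
c_hat = np.fft.fft(c).real            # eigenvalues of C, all in [0, 4]

def matvec(x):
    """Apply A to a vector in O(n log n) time."""
    x = np.ravel(x)
    return d * x + np.fft.ifft(c_hat * np.fft.fft(x)).real

A_op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)

# Jacobi preconditioner: M approximates inv(A) using diag(A) = d + c[0]
diag_A = d + c[0]
M_op = LinearOperator((n, n), matvec=lambda r: np.ravel(r) / diag_A,
                      dtype=np.float64)

b = rng.normal(size=n)

def solve(M=None):
    """Run CG with default tolerances and count iterations via the callback."""
    iters = 0
    def count(xk):
        nonlocal iters
        iters += 1
    x, info = cg(A_op, b, M=M, callback=count)
    return x, info, iters

x_plain, info_plain, iters_plain = solve()
x_pc, info_pc, iters_pc = solve(M=M_op)

rel_res_plain = np.linalg.norm(matvec(x_plain) - b) / np.linalg.norm(b)
rel_res_pc = np.linalg.norm(matvec(x_pc) - b) / np.linalg.norm(b)
print(f"no preconditioner:     {iters_plain} iterations, relative residual {rel_res_plain:.1e}")
print(f"Jacobi preconditioner: {iters_pc} iterations, relative residual {rel_res_pc:.1e}")
```

For this particular operator the diagonal dominates, so the Jacobi preconditioner brings the condition number down from a few thousand to at most about 5 and the iteration count drops accordingly. For other problems a diagonal preconditioner may help much less.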

In [1]:
import numpy as np
import scipy.sparse as sps
import scipy.sparse.linalg
import scipy.linalg as sla
from numpy.linalg import norm
import time

Create a sparse $10\times 10$ matrix

In [2]:
n   = int(1e1)

rng   = np.random.default_rng(1)
# Make sure A is invertible by adding a multiple of the identity to it
A   = sps.random(n,n,density=0.1,format='csr',random_state=rng) + .1*sps.eye(n)
# A   = sps.random(n,n,density=0.1,format='csr') + .01*sps.eye(n) # worse conditioned, LSQR/CG struggles
b   = rng.normal(size=(n,1))

print(f'condition number is {np.linalg.cond( A.toarray() ):.1f}' )


condition number is 211.6


In [3]:
A

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 20 stored elements and shape (10, 10)>

In [4]:
with np.printoptions(precision=3, suppress=True):
  print( A.toarray() )

[[0.1   0.    0.    0.    0.    0.    0.    0.403 0.    0.   ]
 [0.    0.1   0.    0.754 0.    0.    0.    0.    0.    0.   ]
 [0.    0.    0.1   0.    0.    0.    0.    0.    0.    0.303]
 [0.788 0.134 0.    0.1   0.538 0.    0.    0.    0.    0.   ]
 [0.    0.    0.262 0.    0.1   0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.    0.1   0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.    0.    0.1   0.    0.    0.   ]
 [0.    0.    0.    0.    0.453 0.    0.    0.1   0.    0.   ]
 [0.    0.    0.    0.    0.    0.    0.    0.    0.1   0.   ]
 [0.    0.    0.    0.    0.    0.    0.    0.33  0.203 0.1  ]]


### Get a reference solution using "direct" methods
Either explicitly convert $A$ to a dense (standard) matrix type and use standard linear algebra (e.g., LU factorization / Gaussian elimination), or use the sparse solvers (which do a sparse version of Gaussian elimination, trying to minimize "fill-in").  Both should be about as accurate as floating point arithmetic allows for a matrix with this condition number.

In [7]:
x  = sla.solve(A.toarray(),b) # Dense solve
x2 = sps.linalg.spsolve(A,b)   # Direct solve, taking advantage of sparsity
print(f"Discrepancy in the two solutions is {norm(x.ravel()-x2.ravel()):.2e}")
print(type(x), type(x2))       # both solvers return plain NumPy arrays

Discrepancy in the two solutions is 2.52e-15
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
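
To make the "fill-in" remark concrete, here is a minimal sketch using `scipy.sparse.linalg.splu`, which exposes the sparse LU factors. The test matrix, its size, and the variable names are assumptions chosen for illustration. The sketch compares the number of stored entries in the factors with the number in $A$, and reuses one factorization for several right-hand sides.

```python
import numpy as np
import scipy.sparse as sps
from scipy.sparse.linalg import splu

n = 500
rng = np.random.default_rng(0)
# Illustrative sparse matrix (an assumption, not the matrix from the demo)
A = (sps.random(n, n, density=0.01, format='csc', random_state=rng)
     + 10 * sps.eye(n)).tocsc()
B = rng.normal(size=(n, 3))   # three right-hand sides

lu = splu(A)       # sparse LU factorization (SuperLU), done once
X = lu.solve(B)    # cheap triangular solves, one per column of B

# Fill-in: the factors usually store more nonzeros than A itself
fill_ratio = (lu.L.nnz + lu.U.nnz) / A.nnz
rel_res = np.linalg.norm(A @ X - B) / np.linalg.norm(B)
print(f"nnz(A) = {A.nnz}, nnz(L) + nnz(U) = {lu.L.nnz + lu.U.nnz}, ratio = {fill_ratio:.1f}")
print(f"Relative residual over all right-hand sides: {rel_res:.1e}")
```

When the ratio is large, the sparse direct solver loses most of its advantage over a dense solve, which is one situation where Krylov methods become attractive.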


### Now compare a reference solution to a Krylov subspace method
We don't use "CG" (conjugate gradients) itself, since that requires a symmetric positive definite matrix and our $A$ is not symmetric. Instead we use LSQR, a related method (all of these are **Krylov subspace** methods), to illustrate the same point

In [10]:
x = sla.solve(A.toarray(),b) # Dense solve, reference solution
# xKrylov, info = sps.linalg.minres(A,b,maxiter=1000) # only if A is symmetric
xKrylov = sps.linalg.lsqr(A,b)[0]
print(f"Discrepancy in the two solutions is {norm(x.ravel()-xKrylov.ravel()):.2e}")

# If it's ill-conditioned, the two "x" may not be similar, but check
#  that the residual is small:
print(f"Residual ||Ax-b|| for dense solve is {norm(A@x-b):.2e}, and is {norm(A@xKrylov-b.ravel()):.2e} for LSQR/CG")

# Let's redo the Krylov solver, asking for smaller residual
xKrylov = sps.linalg.lsqr(A,b,atol=1e-12,btol=1e-12,iter_lim=int(1e5))[0]
print("... and now re-solving with Krylov solver, for a tighter tolerance and more iterations ... ")
print(f"Residual ||Ax-b|| for dense solve is {norm(A@x-b):.2e}, and is {norm(A@xKrylov-b.ravel()):.2e} for LSQR/CG")

Discrepancy in the two solutions is 3.36e-05
Residual ||Ax-b|| for dense solve is 1.24e-15, and is 3.43e-05 for LSQR/CG
... and now re-solving with Krylov solver, for a tighter tolerance and more iterations ... 
Residual ||Ax-b|| for dense solve is 1.24e-15, and is 1.18e-11 for LSQR/CG
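
As a rough sketch of why LSQR is preferred over CG on the normal equations: the 2-norm condition number of $A^TA$ is the square of that of $A$. The matrix below is an assumption built for illustration (a dense matrix with prescribed singular values, so its condition number is known to be 100), and `x_cgne` is just a name for the result of CG applied to $A^TAx = A^Tb$.

```python
import numpy as np
from numpy.linalg import norm
from scipy.sparse.linalg import cg, lsqr

n = 10
rng = np.random.default_rng(0)

# Build A = U diag(s) V^T with known singular values, so cond(A) = 100
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
s = np.logspace(0, -2, n)
A = U @ np.diag(s) @ V.T
b = rng.normal(size=n)

kappa = np.linalg.cond(A)
kappa_normal = np.linalg.cond(A.T @ A)
print(f"cond(A) = {kappa:.1f}, cond(A^T A) = {kappa_normal:.1f}")

x_ref = np.linalg.solve(A, b)

# LSQR works with A and A^T directly
x_lsqr = lsqr(A, b, atol=1e-12, btol=1e-12, iter_lim=1000)[0]

# CG on the normal equations works with the squared condition number
x_cgne, info = cg(A.T @ A, A.T @ b, maxiter=1000)

print(f"LSQR error:                {norm(x_lsqr - x_ref):.2e}")
print(f"CG on normal eqs. error:   {norm(x_cgne - x_ref):.2e} (info = {info})")
```

The two solvers are run with different stopping rules here, so the printed errors are only indicative. The squared condition number is the part that holds in general.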


In [11]:
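# Be careful with shape bugs! b has shape (n,1) but b.ravel() has shape (n,),
#  so broadcasting turns b - b.ravel() into an n x n matrix, not a zero vector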
print( norm( b - b) )
print( norm( b - b.ravel() ) )

0.0
16.27126842092771
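
The nonzero value above comes from NumPy broadcasting. A short sketch of what happens, and one way to guard against it, follows. The helper `residual_norm` is a hypothetical name introduced here and is not part of the demo.

```python
import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(1)
b = rng.normal(size=(10, 1))

# (10,1) minus (10,) broadcasts to a (10,10) matrix with entries b[i] - b[j]
diff = b - b.ravel()
print(b.shape, b.ravel().shape, diff.shape)

# Flattening both operands gives the intended vector difference
safe = norm(b.ravel() - b.ravel())
print(safe)

def residual_norm(A, x, b):
    """Hypothetical helper: ||Ax - b|| with both sides flattened first."""
    return norm(np.ravel(A @ x) - np.ravel(b))

A = np.eye(10)
# Same answer whether x and b are column vectors or 1-D arrays
print(residual_norm(A, b, b), residual_norm(A, b.ravel(), b))
```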


# Larger example

In [14]:
n   = int(5e3)
rng   = np.random.default_rng(1)
A   = sps.random(n,n,density=0.01,format='csr',random_state=rng) + 10*sps.eye(n)
b   = rng.normal(size=(n,1))

print("Doing dense direct version")
tic = time.perf_counter()
x = sla.solve(A.toarray(),b)
toc_dense = time.perf_counter() - tic

print("Doing sparse direct version")
tic = time.perf_counter()
x = sps.linalg.spsolve(A,b)
toc_direct = time.perf_counter() - tic

print('Now doing sparse Krylov version')
tic = time.perf_counter()
xCG = sps.linalg.lsqr(A,b)[0] # nice and fast
toc_sparse = time.perf_counter() - tic

e = norm(x.ravel()-xCG.ravel())
print(f"n x n matrix with n={n:d}")
print(f"Took {toc_dense:.2f} sec for direct dense version (Gaussian elimination...)")
print(f"Took {toc_direct:.2f} sec for direct sparse version (sparse Gaussian elimination...)")
print(f"Took {toc_sparse:.2f} sec for sparse version (CG, LSQR, ...)")
print(f"Difference between versions {e:.1e}")

Doing dense direct version
Doing sparse direct version
Now doing sparse Krylov version
n x n matrix with n=5000
Took 6.56 sec for direct dense version (Gaussian elimination...)
Took 15.97 sec for direct sparse version (sparse Gaussian elimination...)
Took 0.02 sec for sparse version (CG, LSQR, ...)
Difference between versions 8.1e-05
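
The `[0]` in `lsqr(A,b)[0]` discards the solver's diagnostics. As a sketch, the next few returned values (stopping reason `istop`, iteration count `itn`, and residual norm estimate `r1norm`) can be inspected to see what the speed costs in accuracy. The smaller problem size and the tolerance values are assumptions chosen so that the example runs quickly.

```python
import numpy as np
import scipy.sparse as sps
from numpy.linalg import norm
from scipy.sparse.linalg import lsqr

n = 2000
rng = np.random.default_rng(1)
A = (sps.random(n, n, density=0.01, format='csr', random_state=rng)
     + 10 * sps.eye(n)).tocsr()
b = rng.normal(size=n)

results = {}
for label, tol in [("default", None), ("tight", 1e-12)]:
    kwargs = {} if tol is None else dict(atol=tol, btol=tol)
    # lsqr returns a tuple; the first four entries are x, istop, itn, r1norm
    x, istop, itn, r1norm = lsqr(A, b, **kwargs)[:4]
    rel_res = norm(A @ x - b) / norm(b)
    results[label] = (itn, rel_res)
    print(f"{label:>7}: istop={istop}, {itn} iterations, "
          f"relative residual {rel_res:.1e} (solver estimate {r1norm / norm(b):.1e})")
```

Because this matrix is well-conditioned, tightening the tolerances costs only a modest number of extra iterations. For an ill-conditioned matrix the trade-off can be much less favorable.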
