l1C: Compressed Sensing in c

Introduction

The goal of this project is to provide a c library that is useful for the application of compressed sensing and related L1-regularized problems. To date, a large body of research has developed methods to efficiently solve various formulations of the compressed sensing problem. To their great credit, many of those researchers have released code along with their publications. See, for example, l1-magic, NESTA, SparseLab, and Split Bregman. So, a good question is: why re-do it in c?

  • Much of the published code is written in Matlab. This means that if you want to use the code in a Python, R, Julia, or c/c++ project, you must translate the Matlab code to your language. By contrast, c is something like the lingua franca of computing: it is relatively straightforward to interface c code with most languages, including Matlab.

  • CS optimizations are computationally intensive. Thus, for those who wish to apply CS to real-world problems, it is beneficial to have optimized implementations of the underlying algorithms, and c fits that requirement (Fortran might be better, but I have forgotten it).

  • c is fun (probably the main reason).

Features

Present capabilities and features include

  • Solving the basis pursuit denoising problem

    \(\min_{x} ||W^Tx||_{1} \quad \text{s.t.} \quad ||Ax -b||_{2} < \epsilon,\)

    using Nesterov’s algorithm, based on the paper NESTA: A Fast and Accurate First-order Method for Sparse Recovery, by Stephen Becker, Jerome Bobin, and Emmanuel J. Candes. The NESTA algorithm can solve problems in either synthesis mode (\(W^T=I\)) or analysis mode, where \(W\) may be an overcomplete dictionary. This allows the inclusion of several regularizers. NESTA is much faster than the log-barrier algorithm.

  • Solving the basis pursuit denoising problem

    \(\min_{x} ||x||_{1} \quad \text{s.t.} \quad ||Ax -b||_{2} < \epsilon,\)

    using a log-barrier algorithm, which is based on the implementation in l1-magic, with a few modifications.

  • Solving the Anisotropic Total Variation (TV) denoising problem using Bregman Splitting. Given a noisy \(n\) by \(m\) image \(f\), this solves the problem

    \(\min_{u} ||\nabla_{x}u||_{1} + ||\nabla_{y}u||_{1} + \mu ||u-f||_{2}. \)

  • 1D and 2D discrete cosine transforms (using FFTW); see the sketch after this list.

  • Python and Matlab bindings for the high-level optimization routines.
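
As a concrete illustration of the FFTW-backed transforms, here is a minimal, self-contained sketch of a 1D DCT-II using FFTW's real-to-real interface. This is plain FFTW usage rather than l1C's own wrapper (whose function names may differ), and the output uses FFTW's unnormalized convention.

```c
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    const int n = 8;
    double *in = fftw_alloc_real(n);
    double *out = fftw_alloc_real(n);

    /* FFTW_REDFT10 is FFTW's name for the (unnormalized) DCT-II. */
    fftw_plan plan = fftw_plan_r2r_1d(n, in, out, FFTW_REDFT10, FFTW_ESTIMATE);

    for (int i = 0; i < n; i++)
        in[i] = (double)i;

    fftw_execute(plan);

    for (int i = 0; i < n; i++)
        printf("out[%d] = %f\n", i, out[i]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```

Compile with something like gcc dct_demo.c -lfftw3. The 2D case is analogous via fftw_plan_r2r_2d.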

Building

Performance

So far, using l1C gives me a speedup of between 2 and 7 times compared to the original Matlab code, depending on the problem and the computer I run it on.

If you compile with FFTW and OpenBLAS, it is important that both libraries are compiled with OpenMP support. I don’t quite understand what happens, but if this is not the case, I see only a single processor being used and performance suffers dramatically.

If you have a CPU with hyperthreading (HT), it is important to export the environment variable

export OMP_NUM_THREADS=N

where N is the number of physical cores. Essentially, if you have HT, this is half the number of processors you see in a resource monitor, which shows the number of logical cores. The code currently cannot detect this, and for number-crunching applications like this one, HT is detrimental.
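
There is no portable OpenMP query for physical cores; omp_get_num_procs() reports logical processors. The small diagnostic below (plain OpenMP, independent of l1C) makes this visible: on a 4-core machine with HT it prints 8, so you would export OMP_NUM_THREADS=4.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* omp_get_num_procs() counts *logical* processors, so on a
       hyperthreaded CPU it is twice the physical core count.
       omp_get_max_threads() reflects OMP_NUM_THREADS, if set. */
    printf("logical processors: %d\n", omp_get_num_procs());
    printf("max threads:        %d\n", omp_get_max_threads());
    return 0;
}
```

(Compile with gcc -fopenmp.)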

Setting OMP_PROC_BIND=true seems to cost us about 1 second.

Usage

Please see either examples/c or, when building with the bindings, interfaces/python/examples or interfaces/mex/examples.

Modifications from the original algorithms

I have made a few changes (improvements?) to the original l1-magic algorithm, both pertaining to the line search. These changes address numerical, rather than mathematical, problems. As the l1-magic authors note, in the later stages of the optimization, numerical difficulties arise and the line search can fail. These modifications push that failure point further out, enabling more iterations.

  1. In the original code, I noticed that at some point the data become complex when they should have been purely real. One of the places where this occurs is in the code that computes the maximum step-size which still satisfies the constraints (i.e., lines XX in the original code). In particular, the code computes the largest number \(s\) such that, for \(x_{k+1}= x_{k} + sd_{x_k}\), \(||Ax_{k+1}-b||<\epsilon\) still holds. To do this, we expand into a scalar equation quadratic in \(s\):

    \( \begin{aligned} ||A(x+sd_{x})-b||^{2} - \epsilon^{2} &=0 \\ s^{2}(d_x^{T}A^{T}Ad_x) + 2s(r^{T}Ad_x) + r^{T}r - \epsilon^{2} &= 0 \end{aligned}\)

    where \(r = Ax - b\). Although the roots should always be real, due to computing \(d_{x}\) with insufficient accuracy (it is obtained via conjugate gradients) or other round-off, the roots become complex in the later stages. In Matlab, the promotion to a complex type happens silently and we start optimizing complex data, which is undesirable. In c, the sqrt operation simply returns NaN, which is also undesirable. When this happens, the modification is to set \(s=1\) and let the line search deal with it (see the sketch after this list). This works fine in c because taking the log of a negative number results in NaN. In Matlab, we need something like log(max(0, x)).

  2. The goal of the line-search is to find (approximately) the largest step-size \(s\) such that

    \( f(z + sd_{z}) < f(z) + \alpha s \nabla f\cdot d_{z} \)

    In the original code, the functional \(f(z)\) is only evaluated explicitly at the start of each log-barrier iteration, and the value of \(f(z_{i})\) is thereafter updated from derived quantities, e.g., \(r_{k+1}= r_{k} + sAd_{x}\). Mathematically, this is fine. Numerically, it is problematic because after enough iterations the explicit value of \(f(z_{k})\) becomes infinite (due to the barrier functions) even though the incrementally updated value remains finite. Thus, although it is less efficient, this code evaluates the functional explicitly at each iteration of the line-search, and that value is then passed to the next Newton iteration (see the second sketch below).
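
To make the first modification concrete, here is a sketch of the safeguarded step-size computation. The function name and signature are illustrative rather than l1C's actual API; it assumes the products \(Ad_x\) and \(r = Ax - b\) have already been formed.

```c
#include <math.h>

/* Illustrative: find the largest s with ||A(x + s*d) - b|| <= eps,
 * given Ad = A*d and r = A*x - b (both of length m).  The roots of
 *   s^2 (Ad'Ad) + 2s (r'Ad) + (r'r - eps^2) = 0
 * should be real, but an inexact CG solve for d can push the
 * discriminant negative; in that case fall back to s = 1 and let
 * the line search shrink the step. */
static double max_feasible_step(const double *Ad, const double *r,
                                double eps, int m) {
    double a = 0.0, b = 0.0, c = -eps * eps;
    for (int i = 0; i < m; i++) {
        a += Ad[i] * Ad[i];
        b += 2.0 * r[i] * Ad[i];
        c += r[i] * r[i];
    }
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0)         /* roots went complex: numerical trouble */
        return 1.0;
    /* Starting feasible means c < 0, so there is exactly one
     * positive root; that is where the constraint becomes active. */
    return (-b + sqrt(disc)) / (2.0 * a);
}
```

A sketch of the second modification follows: a backtracking search that re-evaluates the functional explicitly at every trial point. Again, the names and the function-pointer interface are illustrative only.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative backtracking line search.  f is evaluated explicitly
 * at each trial step instead of being updated from cached residuals.
 * In c, a barrier term log(x) with x < 0 yields NaN, and any
 * comparison with NaN is false, so infeasible steps are rejected
 * automatically; in Matlab one needs log(max(0, x)). */
static double line_search(double (*f)(const double *z), double *z,
                          const double *dz, double fz, double gdot,
                          int n, double alpha, double beta) {
    double s = 1.0;
    double *ztrial = malloc(n * sizeof(*ztrial));
    for (int iter = 0; iter < 32; iter++) {
        for (int i = 0; i < n; i++)
            ztrial[i] = z[i] + s * dz[i];
        double ftrial = f(ztrial);              /* explicit evaluation */
        if (ftrial < fz + alpha * s * gdot) {   /* sufficient decrease */
            memcpy(z, ztrial, n * sizeof(*z));
            free(ztrial);
            return ftrial;  /* pass the explicit value to the next
                               Newton iteration */
        }
        s *= beta;          /* shrink the step and try again */
    }
    free(ztrial);
    return fz;              /* search failed; caller should stop */
}
```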

To-Dos

  1. Enable detection of hyperthreading, and set OMP_NUM_THREADS to half the number of reported cores.

  2. Documentation!

  3. Examples via the bindings.

  4. Other optimization routines. On the list are

    • The isotropic TV-denoising problem using Bregman iteration.

    • FISTA

  5. Other transforms, such as

    • Wavelets. I think we could pull in the wavelet implementation from GSL.

    • Fourier Transforms. I believe the algorithms should still work if we reshape complex data of size n into a double* of size 2n (see the sketch below).
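
On the last point, the reshape is essentially free: c99 guarantees that a complex array has the same representation as an array of twice as many doubles, so a cast suffices. A minimal sketch:

```c
#include <complex.h>
#include <stdio.h>

int main(void) {
    double complex z[3] = {1.0 + 2.0 * I, 3.0 + 4.0 * I, 5.0 + 6.0 * I};

    /* C99 guarantees double complex has the layout of double[2],
     * so n complex values can be viewed in place as 2n doubles. */
    double *as_real = (double *)z;
    for (int i = 0; i < 6; i++)
        printf("%g ", as_real[i]);  /* prints: 1 2 3 4 5 6 */
    printf("\n");
    return 0;
}
```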