# Comparing performance of active-set and interior-point methods for solving the quadratic subproblem inside SQP

Here we explore the use of active-set and interior-point methods (the latter implemented by the commercial software MOSEK) for solving the quadratic subproblem inside SQP.

## Analysis setup

*Before running this Julia code, make sure your computer is properly set up by following the setup instructions in the README of the [git repository](https://github.com/stephenslab/mixsqp-paper).*

We begin by loading the Distributions, Mosek and JuMP Julia packages, as well as some function definitions used in the code chunks below.

In [1]:
using Distributions
using Mosek
using JuMP
include("../code/datasim.jl");
include("../code/likelihood.jl");
include("../code/mixSQP.jl");



Next, initialize the sequence of pseudorandom numbers.

In [2]:
srand(1);

## Generate a small data set

Let's begin with a smaller example with 50,000 samples.

In [3]:
n = round(Int,5e4);
z = normtmixdatasim(n);

## Compute the likelihood matrix

Compute the $n \times k$ likelihood matrix for a mixture of zero-centered normals, with $k = 20$. Note that the rows of the likelihood matrix are normalized by default.

In [4]:
sd = autoselectmixsd(z,nv = 20);
L  = normlikmatrix(z,sd = sd);
size(L)

(50000, 20)
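
To make the structure of this matrix concrete, here is a minimal sketch of how such a matrix could be built by hand. The names `normpdf` and `normlik_sketch` are hypothetical and are not part of the repository. The sketch assumes that entry $(i,j)$ is the density of a zero-centered normal with standard deviation `sd[j]` evaluated at `z[i]`, and that "normalized" means each row is divided by its largest entry; whether `normlikmatrix` uses exactly this density and this normalization is an assumption.

```julia
# Density of a zero-centered normal with standard deviation s, evaluated at z.
normpdf(z, s) = exp(-0.5 * (z / s)^2) / (s * sqrt(2 * pi))

# Hypothetical helper (not the repository's normlikmatrix): returns the
# n x k matrix whose (i, j) entry is the N(0, sd[j]^2) density at z[i],
# with each row scaled so that its largest entry equals 1 (an assumed
# normalization).
function normlik_sketch(z, sd)
    L = [normpdf(zi, s) for zi in z, s in sd]
    rowmax = [maximum(L[i, :]) for i in 1:size(L, 1)]
    return L ./ rowmax
end

# Small illustration with three samples and a grid of three standard deviations.
z_demo  = [0.0, 1.0, -2.5]
sd_demo = [0.5, 1.0, 2.0]
L_demo  = normlik_sketch(z_demo, sd_demo)
size(L_demo)
```

Scaling the rows this way would leave the fitted mixture weights unchanged, since each row's factor only shifts the log-likelihood by a constant.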

## Fit mixture model using SQP algorithm

First we run the mix-SQP algorithm once with each quadratic programming solver to precompile the relevant functions.

In [5]:
out = mixSQP(L,qpsubprob = "activeset",lowrank = "none",verbose = false);
out = mixSQP(L,qpsubprob = "mosek",lowrank = "none",verbose = false);

Fit the model using the SQP algorithm, with an active-set method to find the solution to the quadratic program at each SQP iteration.

In [6]:
out1 = mixSQP(L,qpsubprob = "activeset",lowrank = "none");

Running SQP algorithm with the following settings:
- 50000 x 20 data matrix
- convergence tolerance = 1.00e-08
- zero threshold        = 1.00e-03
- Exact derivative computation (partial QR not used).
iter      objective -min(g+1) #nnz #qp #ls
   1 3.03733620e+04 +6.30e-01   20   0   0
   2 2.09533189e+04 +5.80e+04    1   0   1
   3 1.28079423e+04 +2.01e+04    3   0   1
   4 1.11142170e+04 +8.72e+03    3   0   1
   5 1.09365390e+04 +4.16e+03    3   0   1
   6 1.07220696e+04 +2.01e+03    3   0   1
   7 1.05949242e+04 +1.03e+03    3   0   1
   8 1.05173539e+04 +5.08e+02    3   0   1
   9 1.03017484e+04 +2.50e+02    2   0   1
  10 1.01824445e+04 +1.28e+02    3   0   1
  11 1.01286239e+04 +6.46e+01    3   0   1
  12 1.00404507e+04 +3.20e+01    3   0   1
  13 9.89744142e+03 +1.61e+01    3   0   1
  14 9.85084743e+03 +8.00e+00    3   0   1
  15 9.81505659e+03 +3.85e+00    3   0   1
  16 9.77438543e+03 +1.81e+00    3   0   1
  17 9.75247900e+03 +8.28e-01    4   0   1
  18 9.74083776e+03 +3.51e

Next fit the model again using the same SQP algorithm, with the active-set method replaced by MOSEK.

In [7]:
out2 = mixSQP(L,qpsubprob = "mosek",lowrank = "none");

Running SQP algorithm with the following settings:
- 50000 x 20 data matrix
- convergence tolerance = 1.00e-08
- zero threshold        = 1.00e-03
- Exact derivative computation (partial QR not used).
iter      objective -min(g+1) #nnz #qp #ls
   1 1.18584295e+04 +7.79e+04    2   0   0
   2 1.18019962e+04 +2.39e+04    7   0   1
   3 1.15826110e+04 +9.45e+03    8   0   1
   4 1.12252365e+04 +4.33e+03    8   0   1
   5 1.09642877e+04 +2.04e+03    8   0   1
   6 1.07884947e+04 +1.01e+03    6   0   1
   7 1.06007499e+04 +5.08e+02    7   0   1
   8 1.05098000e+04 +2.55e+02    7   0   1
   9 1.03011708e+04 +1.26e+02    4   0   1
  10 1.01721090e+04 +6.41e+01    3   0   1
  11 1.01096088e+04 +3.23e+01    3   0   1
  12 1.00125909e+04 +1.59e+01    4   0   1
  13 9.87791041e+03 +8.07e+00    3   0   1
  14 9.83461847e+03 +3.97e+00    3   0   1
  15 9.79385100e+03 +1.86e+00    3   0   1
  16 9.75930608e+03 +8.52e-01    4   0   1
  17 9.74409206e+03 +3.61e-01    4   0   1
  18 9.73365669e+03 +1.12e

Both runs converged to a solution in a small number of iterations. The solutions are very similar:

In [8]:
maximum(abs.(out1["x"] - out2["x"]))

1.40932608673483e-7

We also observe that solving the quadratic programs accounts for only a small fraction of the total effort. Nonetheless, the active-set implementation spends about 3 times less time on the quadratic programs than MOSEK does.

In [9]:
@printf "Total runtime of active set method:     %0.3f s.\n" sum(out1["qptiming"])
@printf "Total runtime of interior point method: %0.3f s.\n" sum(out2["qptiming"])

Total runtime of active set method:     0.011 s.
Total runtime of interior point method: 0.033 s.


## Comparison with a larger data set

Let's now explore the accuracy and runtime of the active-set and MOSEK solvers in a larger data set.

In [10]:
z = normtmixdatasim(round(Int,1e5));

As before, we compute the $n \times k$ conditional likelihood matrix for a mixture of zero-centered normals. This time, we use a finer grid of $k = 40$ normal densities to compute this matrix.

In [11]:
k  = 40;
sd = autoselectmixsd(z,nv = k);
L  = normlikmatrix(z,sd = sd);
size(L)

(100000, 40)

Now we fit the model using the two variants of the SQP algorithm.

In [12]:
@time out1 = mixSQP(L,qpsubprob = "activeset",lowrank = "none",verbose = false);
@time out2 = mixSQP(L,qpsubprob = "mosek",lowrank = "none",verbose = false);

  4.081609 seconds (71.63 k allocations: 2.956 GiB, 66.13% gc time)
  5.948761 seconds (18.61 k allocations: 3.885 GiB, 60.98% gc time)


The first run, which uses the active-set method, is somewhat faster (about 4.1 s versus 5.9 s). As before, the solutions are very similar:

In [13]:
maximum(abs.(out1["x"] - out2["x"]))

7.392572117970175e-5

The amount of time spent solving the quadratic programs is again only a small proportion of the total:

In [14]:
@printf "Total runtime of active set method:     %0.3f s.\n" sum(out1["qptiming"])
@printf "Total runtime of interior point method: %0.3f s.\n" sum(out2["qptiming"])

Total runtime of active set method:     0.018 s.
Total runtime of interior point method: 0.062 s.


Therefore, although the active-set method is faster than MOSEK at solving the quadratic programs (roughly a 3-fold improvement in runtime), the overall impact on performance is relatively small.
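
The ratios behind this conclusion can be checked directly from the timings printed above. The snippet below only redoes the arithmetic on those reported numbers; the variable names are ours, and the totals are the wall-clock times reported by `@time`, which include garbage collection.

```julia
# Timings in seconds, copied from the outputs above (100000 x 40 data set).
qp_activeset    = 0.018      # time spent in QP solves, active-set method
qp_mosek        = 0.062      # time spent in QP solves, MOSEK
total_activeset = 4.081609   # total time reported by @time, active-set run
total_mosek     = 5.948761   # total time reported by @time, MOSEK run

# Speedup of the active-set QP solver over MOSEK (about 3.4).
speedup = qp_mosek / qp_activeset

# Share of each run spent solving quadratic programs (both around 1% or less).
frac_activeset = qp_activeset / total_activeset
frac_mosek     = qp_mosek / total_mosek
```

Even if the quadratic programs were solved instantly, each run would finish only a few hundredths of a second sooner.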

## SQP with MOSEK sometimes fails to converge to the correct solution

Perhaps a more important advantage of the active-set method is that it converges more reliably to the correct solution; in practice, we have found that the MOSEK solver does not provide the correct solution when the initial iterate is not sparse. (To safeguard against this issue, the default initial estimate is set to a vector with only two nonzero entries whenever the MOSEK solver is used.)

To illustrate the convergence issue, we set the initial estimate to a vector in which all the entries are the same:

In [15]:
out3 = mixSQP(L,x = ones(k)/k,qpsubprob = "mosek",lowrank = "none");

Running SQP algorithm with the following settings:
- 100000 x 40 data matrix
- convergence tolerance = 1.00e-08
- zero threshold        = 1.00e-03
- Exact derivative computation (partial QR not used).
iter      objective -min(g+1) #nnz #qp #ls
   1 6.69043907e+04 +7.43e-01   40   0   0
   2 3.43556702e+04 +2.17e-01   39   0   1
   3 1.08872324e+04 -3.94e-02   39   0   1
Optimization took 3 iterations and 0.5615 seconds.


The optimization algorithm stops after only 3 iterations, and its solution is far from the one obtained with the active-set method:

In [16]:
maximum(abs.(out1["x"] - out3["x"]))

0.868145175450365

Indeed, the largest entrywise difference is about 0.87, so the solution provided by this run is very far from the solution obtained earlier.
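
One plausible reading of why the run stops so early can be sketched from the printed log. We assume here that the algorithm stops at the first iteration where the quantity in the `-min(g+1)` column is no greater than the convergence tolerance; this rule is inferred from the column header and the reported settings, and is not confirmed by the notebook itself.

```julia
# Values of the "-min(g+1)" column from the MOSEK run started at ones(k)/k.
negmin  = [7.43e-01, 2.17e-01, -3.94e-02]
convtol = 1.00e-08   # convergence tolerance reported in the settings

# Assumed stopping rule (an inference, not taken from the source code):
# stop at the first iteration where -min(g+1) <= convtol.
stop_iter = findfirst(v -> v <= convtol, negmin)
```

Under this assumed rule the test is first satisfied at iteration 3, which matches the "Optimization took 3 iterations" line in the log, even though the iterate at that point is far from the active-set solution.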

## Session information

This section gives information about the computing environment used to generate the results contained in this notebook, including the version of Julia and the versions of the Julia packages used here.

In [17]:
Pkg.status("Distributions");
Pkg.status("Mosek");
Pkg.status("JuMP");
versioninfo()

 - Distributions                 0.15.0
 - Mosek                         0.8.3
 - JuMP                          0.18.0
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
