
User configurable linear solvers #65

Merged
merged 1 commit on May 14, 2019
Conversation

goulart-paul
Member

This adds support for configurable linear solvers for the KKT systems.

Implements initial support for QDLDL (default), CHOLMOD, and Pardiso (both direct and indirect).

Pardiso requires an extra license, so it may be best to switch this one to the MKL version instead, or to support both.

@goulart-paul
Member Author

Fixes #22 (implements Pardiso indirect solver support)

@migarstka migarstka changed the base branch from devel to master April 5, 2019 09:45
@migarstka
Member

migarstka commented Apr 5, 2019

I rebased the branch onto the latest commit on master (with faster projections).
However, the changes in this PR seem to introduce a performance bottleneck.

Testing against the latest commit on master, on the Closest Correlation Matrix problem with N = 600 from the COSMO benchmark repo, I get:

Old Code (using the CHOLMOD solver):

Solver time:	8.5145s (8514.52ms)
Setup time:	0.5991s (599.14ms)
Proj time:	6.1347s (6134.66ms)
Iter time:	7.9s (7900.04ms)
Graph time:	0.0s (0.0ms)
Factor time:	0.424s (424.03ms)
Post time:	0.0089s (8.91ms)

PR Code with kkt_solver_type = COSMO.CholmodKKTSolver:

Solver time:	105.0273s (105027.33ms)
Setup time:	97.5934s (97593.45ms)
Proj time:	5.7426s (5742.57ms)
Iter time:	7.4111s (7411.13ms)
Graph time:	0.0s (0.0ms)
Factor time:	97.4908s (97490.77ms)
Post time:	0.0102s (10.2ms)

PR Code with kkt_solver_type = COSMO.MKLPardisoKKTSolver:

Solver time:	50.5717s (50571.67ms)
Setup time:	41.7932s (41793.19ms)
Proj time:	5.6905s (5690.5ms)
Iter time:	8.7525s (8752.52ms)
Graph time:	0.0s (0.0ms)
Factor time:	41.5476s (41547.58ms)
Post time:	0.0134s (13.4ms)

Update: It seems like the problem is _kktutils_make_kkt, since this is where 99% of the time is spent.

@goulart-paul
Member Author

Yes, I agree that _kktutils_make_kkt is the problem. The KKT matrix is no longer being assembled in the same way as in the old code. There are two differences, one or both of which might be significant:

  1. The A matrix is not being explicitly converted to CSC format in the newer version prior to assembly.

  2. The two diagonal matrices that use rho and sigma are treated differently. Specifically, the rho-scaled matrix is converted into a Diagonal matrix before assembling the KKT. In the old version, we instead put in an identity placeholder and then populated its values directly.

I tried making the diagonal matrix into a SparseMatrixCSC before assembly. That helps a little, but it doesn't eliminate the problem. It could be that there is some super-efficient code behind horizontal concatenation that we miss out on if we assemble the matrix in the present way.

TL;DR: I think we should go back to assembling the KKT matrix the old way.
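
For reference, a rough sketch of what the old assembly approach looks like, assuming COSMO's standard KKT structure; the function and variable names below are illustrative, not the exact code from master:

    using SparseArrays, LinearAlgebra

    # "old way": convert A to CSC explicitly, use an identity placeholder for the
    # rho block, and overwrite its nonzeros in place after concatenation
    function assemble_kkt_old_way(P::SparseMatrixCSC, A::AbstractMatrix,
                                  sigma::Real, rho::AbstractVector)
        n = size(P, 1)
        m = size(A, 1)
        A_csc = sparse(A)                 # explicit CSC conversion before assembly
        S = sparse(sigma * I, n, n)
        R = sparse(1.0 * I, m, m)         # identity placeholder for the rho block
        K = [(P + S) A_csc'; A_csc (-R)]
        # populate the placeholder entries directly; the real code would track their
        # positions in K.nzval so that rho updates do not require re-assembly
        for i in 1:m
            K[n + i, n + i] = -1.0 / rho[i]
        end
        return K
    end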

@migarstka
Member

migarstka commented Apr 8, 2019

Assembling the KKT matrix as before helps with the CholmodKKTSolver, which is now as fast as it used to be.
However, I get another problem with the PardisoDirect and MKLPardiso solvers. They spend about 60 s in line 251:

COSMO.jl/src/kktsolver.jl

Lines 244 to 260 in f299102

function _kktutils_direct_solver_init(ps, K, work, m, n)
    # common initialisation steps for MKL Pardiso and the Pardiso 5.0 direct solver
    pardisoinit(ps)
    # Analyze the matrix and compute a symbolic factorization.
    set_phase!(ps, Pardiso.ANALYSIS)
    pardiso(ps, K, work)
    # Compute the numeric factorization.
    set_phase!(ps, Pardiso.NUM_FACT)
    pardiso(ps, K, work)
    _pardiso_check_inertia(ps, m, n)
    # set phase to solving for iterations
    set_phase!(ps, Pardiso.SOLVE_ITERATIVE_REFINE)
end

which should be really quick.

@goulart-paul
Member Author

Two ideas, though admittedly I don't think either will make a difference in the time:

  1. It is possible to set the phase to ANALYSIS_NUM_FACT at line 250, which would make lines 254/255 redundant (see the sketch after this list).

  2. Perhaps calls to pardiso are slow because the final argument (work) is an Array{Float64,1} instead of an Array{Float64,2}? The signature in the Pardiso package wants a VecOrMat{Float64}, but it could be that one way is faster than the other. This is probably unlikely though since the eventual ccall to Pardiso just converts it either way to a Ptr{Float64}.
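
A rough sketch of idea 1, assuming Pardiso.jl's Pardiso.ANALYSIS_NUM_FACT phase constant and otherwise mirroring the snippet quoted above (untested; same context and helper functions assumed):

    function _kktutils_direct_solver_init(ps, K, work, m, n)
        # common initialisation steps for MKL Pardiso and the Pardiso 5.0 direct solver
        pardisoinit(ps)
        # symbolic analysis and numeric factorization combined into a single phase
        set_phase!(ps, Pardiso.ANALYSIS_NUM_FACT)
        pardiso(ps, K, work)
        _pardiso_check_inertia(ps, m, n)
        # set phase to solving for iterations
        set_phase!(ps, Pardiso.SOLVE_ITERATIVE_REFINE)
    end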

@migarstka
Member

migarstka commented Apr 9, 2019

99% of the time was spent on reordering, which was caused by the fact that we provide Julia sparse matrices in CSC format while Pardiso expects CSR. This can be overcome by telling Pardiso to solve the transposed problem. I therefore added:

    # set to allow non-default iparams
    set_iparm!(ps, 1, 1)
    # set the transpose flag (Pardiso: CSR, Julia CSC)
    set_iparm!(ps, 12, 1)
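
For context (not part of the PR): the CSC arrays of a matrix are by definition the CSR arrays of its transpose, and the KKT matrix is symmetric, so asking Pardiso to solve the transposed problem lets it consume Julia's CSC data directly instead of reordering it. A tiny self-contained check of the underlying equivalence:

    using SparseArrays, LinearAlgebra

    K = sparse([4.0 1.0 0.0;
                1.0 3.0 2.0;
                0.0 2.0 5.0])      # symmetric, like the KKT matrix
    b = [1.0, 2.0, 3.0]

    # for a symmetric K the transposed system has the same solution, so treating
    # the CSC arrays as CSR data of the transpose changes nothing
    @assert sparse(transpose(K)) \ b ≈ K \ b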

For the problem mentioned above I now get the following solve times:

Code with kkt_solver_type = COSMO.MKLPardisoKKTSolver: (single thread)

Solver time:	8.6302s (8630.23ms)
Setup time:	0.4604s (460.43ms)
Proj time:	5.9341s (5934.11ms)
Iter time:	8.1501s (8150.11ms)
Factor time:	0.2321s (232.12ms)

Code with kkt_solver_type = COSMO.PardisoKKTSolver: (single thread)

Solver time:	8.1652s (8165.22ms)
Setup time:	0.4636s (463.6ms)
Proj time:	5.8964s (5896.39ms)
Iter time:	7.682s (7682.02ms)
Factor time:	0.2203s (220.28ms)

Code with kkt_solver_type = COSMO.QdldlKKTSolver

Solver time:	7.3964s (7396.4ms)
Setup time:	0.5103s (510.27ms)
Proj time:	5.9133s (5913.26ms)
Iter time:	6.8702s (6870.17ms)
Factor time:	0.3314s (331.44ms)

Code with kkt_solver_type = COSMO.CholmodKKTSolver

Solver time:	8.1281s (8128.12ms)
Setup time:	0.5148s (514.84ms)
Proj time:	5.9053s (5905.27ms)
Iter time:	7.5951s (7595.11ms)
Factor time:	0.3851s (385.12ms)

It will be interesting to see how this changes when the pardiso solvers use multiple threads.

Edit: Fix typo

@goulart-paul
Member Author

Did you mean for the last two sets of numerical results to be the same solver, or is one of them Cholmod?

I agree it will be good to see the results for multi-threaded Pardiso. For the non-MKL version it can (must?) be done via some environment variable, e.g. ENV["OMP_NUM_THREADS"] = 4. For the MKL version it seems to be done on a per-instance basis. This is problematic because we don't currently have any means for the user to pass solver-specific options to the linear solvers.
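
A hedged sketch of the two mechanisms (the thread counts are just examples, and exact behaviour depends on the Pardiso build):

    # Pardiso 5.0: thread count comes from an environment variable,
    # which has to be set before the library is initialised
    ENV["OMP_NUM_THREADS"] = "4"

    # MKL Pardiso: per-instance setting via Pardiso.jl
    using Pardiso
    ps = MKLPardisoSolver()
    set_nprocs!(ps, 4)   # the option we currently have no way to expose to users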

@codecov-io

codecov-io commented Apr 10, 2019

Codecov Report

Merging #65 into master will decrease coverage by 3.75%.
The diff coverage is 52.99%.


@@            Coverage Diff             @@
##           master      #65      +/-   ##
==========================================
- Coverage   93.93%   90.17%   -3.76%     
==========================================
  Files          20       22       +2     
  Lines        1500     1608     +108     
==========================================
+ Hits         1409     1450      +41     
- Misses         91      158      +67
Impacted Files Coverage Δ
src/COSMO.jl 100% <ø> (ø) ⬆️
src/kktsolver_pardiso.jl 0% <0%> (ø)
src/solver.jl 98.64% <100%> (+0.05%) ⬆️
src/setup.jl 100% <100%> (ø) ⬆️
src/parameters.jl 100% <100%> (ø) ⬆️
src/types.jl 82.69% <100%> (ø) ⬆️
src/settings.jl 100% <100%> (ø) ⬆️
src/interface.jl 98.23% <100%> (-0.02%) ⬇️
src/printing.jl 93.1% <100%> (+0.51%) ⬆️
src/kktsolver.jl 86.36% <86.36%> (ø)
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update af348b5...863168a.

@nrontsis
Member

nrontsis commented May 1, 2019

@migarstka regarding a discussion we had about this branch: I think that custom arguments to the linear solvers can be passed in the same way as JuMP's Model(with_optimizer(solver, args...; kwargs...)) approach (coded here). I would be happy to code such an approach, if you think this is a good way to go (by pushing into this branch?).

Apart from that, what other things do you have in mind as necessary for this to be merged?

@migarstka
Member

migarstka commented May 3, 2019

@nrontsis Yes, it would be great if you would take a look at this. In my opinion, the implementation should fulfil the following requirements:

  • It should be transferable to all customizable modules of the code, i.e. the user may want to set up the solver with an Accelerator object (or a Decomposition object) and set options specific to that object, e.g. the iterate history length or verbose printing.
  • Right now, if you want to use MKL Pardiso as your linear system solver, you have to specify the number of CPU cores to use like this:

    ps = MKLPardisoSolver()
    set_nprocs!(ps, i)

    So we probably have to make sure that the set_nprocs! function is called behind the scenes when the user writes

    COSMO.Model(with_KKT_solver(MKLPardisoSolver, num_threads = 3))

  • The syntax should still work the same way in JuMP, so we should check whether we can do something like:

    JuMP.Model(with_optimizer(COSMO.Optimizer(with_KKT_solver(Pardiso, num_threads = 3), with_accelerator(Anderson, memory = 5), with_many_more_modules(...))))

Maybe @blegat can say something about how JuMP intends the solver packages to handle this?

@nrontsis
Member

nrontsis commented May 3, 2019

@migarstka thanks for the detailed comment.

I have pushed an initial implementation that I think satisfies the above requirements. It works as follows:

The user creates a settings object like this:

settings = COSMO.Settings(...,
    kkt_constructor=with_options(QdldlKKTSolver, args...; kwargs...)
)

and then creates/optimises the model as usual (via COSMO.Model() and COSMO.set!(..., settings)).

For convenience, when no custom options need to be passed by the user, the settings object can also be created by simply doing:

settings = COSMO.Settings(..., kkt_constructor=QdldlKKTSolver)

I believe that the implementation is general and could be used e.g. for Accelerators by doing:

settings = COSMO.Settings(...,
    kkt_constructor=with_options(QdldlKKTSolver, args...; kwargs...),
    accelerator_constructor=with_options(MyAccelerator, args...; kwargs...)
)

@migarstka
Member

migarstka commented May 6, 2019

@nrontsis I think this is a good way to do it. I fixed a few things to make it work with MOI / JuMP as well. I also made OptionsFactory parametric to prevent users from doing something like:

COSMO.Settings(kkt_solver = with_options(MyAccelerator, mem = 5))
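
A minimal sketch of that idea (not the actual COSMO implementation; the abstract types, stand-in structs, and Settings definition below are placeholders): parametrising the factory on the abstract family of the wrapped type lets the typed settings field reject factories built from the wrong kind of object.

    abstract type AbstractKKTSolver end
    abstract type AbstractAccelerator end

    struct QdldlKKTSolver <: AbstractKKTSolver end      # stand-ins for the real types
    struct MyAccelerator  <: AbstractAccelerator end

    # the factory stores the constructor plus user options and remembers the family T
    struct OptionsFactory{T}
        constructor::Type
        args::Tuple
        kwargs::NamedTuple
    end

    with_options(T::Type{<:AbstractKKTSolver}, args...; kwargs...) =
        OptionsFactory{AbstractKKTSolver}(T, args, (; kwargs...))
    with_options(T::Type{<:AbstractAccelerator}, args...; kwargs...) =
        OptionsFactory{AbstractAccelerator}(T, args, (; kwargs...))

    # building the wrapped object later: splat the stored options after runtime arguments
    (f::OptionsFactory)(runtime_args...) = f.constructor(runtime_args..., f.args...; f.kwargs...)

    # a settings field typed on the family only accepts KKT-solver factories
    Base.@kwdef struct Settings
        kkt_solver::OptionsFactory{AbstractKKTSolver} = with_options(QdldlKKTSolver)
    end

    Settings(kkt_solver = with_options(QdldlKKTSolver))           # accepted
    # Settings(kkt_solver = with_options(MyAccelerator, mem = 5)) # rejected at construction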

@nrontsis
Member

nrontsis commented May 6, 2019

Great, thanks.

On a different note, I think we should make some types in this PR more concrete via parametrisation. For example, in the PardisoIndirectKKTSolver, SparseMatrixCSC should become SparseMatrixCSC{Tv, Ti}, work::Vector should become work::Vector{Tv}, etc.

Should I proceed with changing this?
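
For illustration, the kind of change proposed here; the field list is indicative only, not copied from the PR struct:

    using SparseArrays

    # before: fields like `work::Vector` leave the element type abstract;
    # after: tying everything to Tv/Ti makes all fields concretely typed
    struct PardisoIndirectKKTSolver{Tv <: AbstractFloat, Ti <: Integer}
        K::SparseMatrixCSC{Tv, Ti}
        work::Vector{Tv}
        m::Int
        n::Int
    end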

@migarstka
Member

Yeah. Go ahead!

@nrontsis
Member

@migarstka I fixed the "ambiguities" I found in the structs.

Do you think this is ready to be merged?

migarstka pushed a commit that referenced this pull request May 13, 2019
- Allow user-defined linear system solvers
- Add support for CHOLMOD, QDLDL, Pardiso, MKLPardiso
- Make all Pardiso-related code conditional
- Add a memory-freeing step for the Pardiso solvers at the end of the main routine
- Print information about which solver is used if verbose = true
- Add the OptionsFactory pattern to allow the user to provide custom options for each solver
- Add unit tests (Pardiso-related ones are optional)
@migarstka
Member

I added some documentation, squashed all the commits, and rebased them onto the latest master commit. Will merge tomorrow.

migarstka pushed a commit that referenced this pull request May 13, 2019

migarstka pushed a commit that referenced this pull request May 13, 2019

migarstka pushed a commit that referenced this pull request May 14, 2019
@migarstka migarstka deleted the pg/lin_solvers branch May 14, 2019 09:13
@migarstka migarstka restored the pg/lin_solvers branch May 14, 2019 09:19
@migarstka migarstka deleted the pg/lin_solvers branch February 20, 2022 17:35