@kalmarek
Collaborator

This is a proof of concept; it depends on JuliaPackaging/Yggdrasil#4773 but works locally ;)

Besides MKLDirectSolver, I've added a runtime scs_version for each solver library.
@odow technically it's a breaking change (there is no argumentless version anymore), but it was just an internal function that we didn't even test. So maybe we should ask: do we actually need to query the version at runtime?

@odow
Member

odow commented Apr 18, 2022

I'm not a huge fan of adding this to SCS.jl. It seems like quite a heavy dependency. Can we use Requires similar to the GPU?

@kalmarek
Collaborator Author

I was thinking about it as well, but it pulls in just two additional JLLs: MKL_jll and IntelOpenMP_jll.
On disk, though, it weighs ~700MB, whereas Julia is ~500MB.

But yeah, you're probably right; I didn't take into account the downloaded libs, only the load time ;)

That's julia with JULIA_DEPOT_PATH=/tmp:

(@v1.7) pkg> add SCS
  Installing known registries into `/tmp/julia_tmp`
    Updating registry at `/tmp/julia_tmp/registries/General.toml`
   Resolving package versions...
   Installed Bzip2_jll ────────── v1.0.8+0
   Installed Preferences ──────── v1.2.5
   Installed SCS ──────────────── v1.1.1
   Installed JSON ─────────────── v0.21.3
   Installed CodecBzip2 ───────── v0.7.2
   Installed Parsers ──────────── v2.2.4
   Installed MutableArithmetics ─ v1.0.0
   Installed BenchmarkTools ───── v1.3.1
   Installed SCS_GPU_jll ──────── v3.2.0+0
   Installed SCS_jll ──────────── v3.2.0+0
   Installed OpenBLAS32_jll ───── v0.3.17+0
   Installed CodecZlib ────────── v0.7.0
   Installed Requires ─────────── v1.3.0
   Installed OrderedCollections ─ v1.4.1
   Installed TranscodingStreams ─ v0.9.6
   Installed JLLWrappers ──────── v1.4.1
   Installed MathOptInterface ─── v1.1.2
  Downloaded artifact: Bzip2
  Downloaded artifact: SCS_GPU
  Downloaded artifact: OpenBLAS32
  Downloaded artifact: SCS
    Updating `/tmp/julia_tmp/environments/v1.7/Project.toml`
  [c946c3f1] + SCS v1.1.1
    Updating `/tmp/julia_tmp/environments/v1.7/Manifest.toml`
  [6e4b80f9] + BenchmarkTools v1.3.1
  [523fee87] + CodecBzip2 v0.7.2
  [944b1d66] + CodecZlib v0.7.0
  [692b3bcd] + JLLWrappers v1.4.1
  [682c06a0] + JSON v0.21.3
  [b8f27783] + MathOptInterface v1.1.2
  [d8a4904e] + MutableArithmetics v1.0.0
  [bac558e1] + OrderedCollections v1.4.1
  [69de0a69] + Parsers v2.2.4
  [21216c6a] + Preferences v1.2.5
  [ae029012] + Requires v1.3.0
  [c946c3f1] + SCS v1.1.1
  [3bb67fe8] + TranscodingStreams v0.9.6
  [6e34b625] + Bzip2_jll v1.0.8+0
  [656ef2d0] + OpenBLAS32_jll v0.3.17+0
  [af6e375f] + SCS_GPU_jll v3.2.0+0
  [f4f2fc5b] + SCS_jll v3.2.0+0
  [0dad84c5] + ArgTools
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads
  [b77e0a4c] + InteractiveUtils
  [b27032c2] + LibCURL
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions
  [44cfe95a] + Pkg
  [de0858da] + Printf
  [9abbd945] + Profile
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [fa267f1f] + TOML
  [a4e569a6] + Tar
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [e66e0078] + CompilerSupportLibraries_jll
  [deac9b47] + LibCURL_jll
  [29816b5a] + LibSSH2_jll
  [c8ffd9c3] + MbedTLS_jll
  [14a3606d] + MozillaCACerts_jll
  [4536629a] + OpenBLAS_jll
  [83775a58] + Zlib_jll
  [8e850b90] + libblastrampoline_jll
  [8e850ede] + nghttp2_jll
  [3f19e933] + p7zip_jll
Precompiling project...
  23 dependencies successfully precompiled in 36 seconds

(@v1.7) pkg> add MKL_jll
   Resolving package versions...
   Installed MKL_jll ───────── v2022.0.0+0
   Installed IntelOpenMP_jll ─ v2018.0.3+2
  Downloaded artifact: IntelOpenMP
    Updating `/tmp/julia_tmp/environments/v1.7/Project.toml`
  [856f044c] + MKL_jll v2022.0.0+0
    Updating `/tmp/julia_tmp/environments/v1.7/Manifest.toml`
  [1d5cc7b8] + IntelOpenMP_jll v2018.0.3+2
  [856f044c] + MKL_jll v2022.0.0+0
  [4af54fe1] + LazyArtifacts
Precompiling project...
  2 dependencies successfully precompiled in 1 seconds (23 already precompiled)

@odow
Member

odow commented Apr 18, 2022

> Disk-size though it weights ~ 700MB

Yes, this is what I meant by heavy.

SCS_MKL_jll is built only for Linux platforms, but MKL_jll is also available on Windows/macOS, causing warnings on these platforms.
@kalmarek
Collaborator Author

That's probably a problem with the 32/64-bit interface on x86 Linux:

------------------------------------------------------------------
	       SCS v3.2.1 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1, constraints m: 1
cones: 	  z: primal zero / dual free vars: 1
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-mkl-pardiso
	  nnz(A): 1, nnz(P): 0
Error during symbolic factorization: -12Error during MKL Pardiso cleanup: -12ERROR: init_lin_sys_work failure
Test Failed at /home/runner/work/SCS.jl/SCS.jl/test/test_problems.jl:714
  Expression: solution.ret_val == 1
   Evaluated: -4 == 1
ERROR: LoadError: There was an error during testing
in expression starting at /home/runner/work/SCS.jl/SCS.jl/test/runtests.jl:22

ERROR: missing ScsWork, ScsSolution or ScsInfo input
ERROR: Package SCS errored during testing

Member

@odow odow left a comment

Seems okay. What are the performance differences?

@@ -0,0 +1,61 @@
struct MKLDirectSolver <: LinearSolver end
Member

Needs copyright header

Collaborator Author

Added.

Out of curiosity: what is the rationale for adding those everywhere?

Member

To avoid people copy-pasting parts of the wrappers into other projects and ignoring the license implications 😄

Collaborator Author

But to fool Copilot we'd need to scatter this notice in comments throughout the whole codebase 🤣

@kalmarek
Collaborator Author

kalmarek commented Nov 1, 2022

Numerically DirectSolver and MKLDirectSolver behave the same, i.e. the times below are for the same number (20_000) of iterations.

On a small problem I get

problem:  variables n: 2640, constraints m: 5499
cones:    z: primal zero / dual free vars: 2860
          s: psd vars: 2639, ssize: 10
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
          alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 20000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 50, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
          nnz(A): 66208, nnz(P): 0
  • 43.913933 seconds (179.25 k allocations: 15.281 MiB) with DirectSolver vs
  • 20.055237 seconds (179.25 k allocations: 15.281 MiB) with MKLDirectSolver,
    i.e. a 2.19× speed-up.
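
As a sanity check on the ratio quoted above, here is a minimal sketch that recomputes it from the two wall-clock times. The variable names are illustrative only and are not part of SCS.jl:

```julia
# Wall-clock times in seconds, copied from the two runs above.
t_slow = 43.913933
t_fast = 20.055237

# The speed-up is the ratio of the slower time to the faster time.
speedup = t_slow / t_fast
println("speed-up: ", round(speedup; digits = 2), "x")
```

Both runs report identical allocation counts, so the difference is plausibly spent inside the linear-system solver rather than in the Julia wrapper.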

On a larger one

problem:  variables n: 5708, constraints m: 11938
cones:    z: primal zero / dual free vars: 6231
          s: psd vars: 5707, ssize: 20
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
          alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 20000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 50, acceleration_interval: 10
lin-sys:  sparse-direct-mkl-pardiso
          nnz(A): 275706, nnz(P): 0

the speed-up is comparable (MKLDirectSolver is 2-3 times faster here).


These might be atypical examples for showing off MKLDirectSolver: these problems have been through symmetry reduction, so they have a rather small number of small psd constraints together with a bunch of dense linear constraints.

The original version (with large psd constraint and lots of sparse linear constraints) of the first (small) problem is:

problem:  variables n: 93962, constraints m: 169158
cones:    z: primal zero / dual free vars: 75197
          s: psd vars: 93961, ssize: 1
settings: eps_abs: 1.0e-10, eps_rel: 1.0e-10, eps_infeas: 1.0e-07
          alpha: 1.90, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 20000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 50, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
          nnz(A): 746426, nnz(P): 0

DirectSolver runs in

  • 1296 seconds (1 thread),
  • 1452 seconds (4 threads),

vs MKLDirectSolver in

  • 1578 seconds (1 thread),
  • 1012 seconds (4 threads),

so MKLDirectSolver benefits from multiple threads (OMP_NUM_THREADS) while DirectSolver doesn't. Simply by looking at htop the DirectSolver seems to waste most of the resources (the occupied cores are predominantly in sys: wait state -- maybe some problem with synchronization/communication?). MKLDirectSolver fares much better here fully utilizing all available resources.
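
To make the thread-scaling comparison concrete, here is a small sketch that derives relative ratios from the four timings listed above. The dictionary layout and the helper name `scaling` are hypothetical, chosen only for illustration; nothing here calls SCS itself:

```julia
# Wall-clock times in seconds, copied from the timings above,
# keyed by (solver name, number of threads).
timings = Dict(
    ("DirectSolver", 1) => 1296,
    ("DirectSolver", 4) => 1452,
    ("MKLDirectSolver", 1) => 1578,
    ("MKLDirectSolver", 4) => 1012,
)

# Hypothetical helper: ratio of the 1-thread time to the 4-thread time.
# Values above 1 mean the extra threads helped; below 1 means they hurt.
scaling(solver) = timings[(solver, 1)] / timings[(solver, 4)]

for solver in ("DirectSolver", "MKLDirectSolver")
    println(rpad(solver, 16), " 1 -> 4 threads: ",
            round(scaling(solver); digits = 2), "x")
end

# Best DirectSolver run compared with the best MKLDirectSolver run.
best_ratio = timings[("DirectSolver", 1)] / timings[("MKLDirectSolver", 4)]
println("best DirectSolver / best MKLDirectSolver: ",
        round(best_ratio; digits = 2), "x")
```

On these numbers, four threads make DirectSolver take about 12% longer than one thread, while MKLDirectSolver gets roughly 1.56× faster, and its best run beats the best DirectSolver run by about 1.28×.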

Tbh I hoped for something much better ;) But maybe those timings/findings are useful for @bodono as well.

julia> versioninfo(verbose=true)
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      "Arch Linux"
  uname: Linux 6.0.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 29 Oct 2022 14:08:39 +0000 x86_64 unknown
  CPU: AMD Ryzen 7 PRO 4750U with Radeon Graphics: 
                 speed         user         nice          sys         idle          irq
       #1-16  1387 MHz     311723 s       3636 s     175687 s    1813700 s          1 s
  Memory: 30.586448669433594 GB (15411.67578125 MB free)
  Uptime: 117173.85 sec
  Load Avg:  4.83  4.28  3.24
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 8 on 16 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
[...]

Member

@odow odow left a comment

I guess one other thing: something should be added to the README?

Otherwise, squash+merge if you're happy.

@kalmarek kalmarek merged commit 85d87ea into master Nov 2, 2022
@kalmarek kalmarek deleted the mk/mkl_direct branch November 2, 2022 12:14