Optimizing multiple models with Ipopt and JuMP in parallel results in crashes #257

Closed
ForceBru opened this issue Jan 19, 2021 · 4 comments

ForceBru commented Jan 19, 2021

Since Julia positions itself as a language with simple parallelism, I attempted to run multiple optimisations in parallel. However, I got a double-free error.

This Julia code (bug.jl) builds and optimises one model per loop iteration, with the ten iterations spread across the available threads:

import JuMP, Ipopt

N = 10
Time = 50
C = rand(Float64, (N, Time))

Threads.@threads for i in 1:10
	model = JuMP.Model(Ipopt.Optimizer)

	JuMP.@variable(model, 0 <= p[1:N] <= 1)
	JuMP.@constraint(model, probability_constr, sum(p) == 1)

	JuMP.@NLobjective(
	    model, Max,
	    sum(log(sum(p[i] * C[i, j] for i in 1:N)) for j in 1:Time)
	)

	JuMP.optimize!(model)
end

Error messages

First run:

forcebru ~/test> julia --threads=4 --project=bug bug.jl
julia(1940,0x70000ebcd000) malloc: *** error for object 0x7f9997055c40: pointer being freed was not allocated
julia(1940,0x70000dbc7000) malloc: *** error for object 0x7f9997055c40: pointer being freed was not allocated
julia(1940,0x70000ebcd000) malloc: *** set a breakpoint in malloc_error_break to debug
julia(1940,0x70000dbc7000) malloc: *** set a breakpoint in malloc_error_break to debug

signal (6): Abort trap: 6
in expression starting at /Users/forcebru/test/bug.jl:7

signal (6): Abort trap: 6
in expression starting at /Users/forcebru/test/bug.jl:7
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 111092902 (Pool: 111082205; Big: 10697); GC: 126
fish: 'julia --threads=4 --project=bug…' terminated by signal SIGABRT (Abort)
forcebru ~/test [SIGABRT]> 

Second run (the prompt shows [SIGABRT] because that was the exit status of the previous run):

forcebru ~/test [SIGABRT]> julia --threads=4 --project=bug bug.jl

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
This is Ipopt version 3.13.2, running with linear solver mumps.

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).


NOTE: Other linear solvers might be more efficient (see Ipopt documentation).


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

julia(1946,0x7000093e8000) malloc: Double free of object 0x7facd2430da0
julia(1946,0x7000093e8000) malloc: *** set a breakpoint in malloc_error_break to debug

signal (6): Abort trap: 6
in expression starting at /Users/forcebru/test/bug.jl:7
 PB allocation in DMUMPS_LOAD_INIT
MUMPS returned INFO(1) =-13 - out of memory when trying to allocate 202 bytes.
In some cases it helps to decrease the value of the option "mumps_mem_percent".
Total number of variables............................:       10
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       10
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0


signal (11): Segmentation fault: 11
in expression starting at /Users/forcebru/test/bug.jl:7
__dmumps_load_MOD_dmumps_upper_predict at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 115168144 (Pool: 115156054; Big: 12090); GC: 128
__dmumps_fac_par_m_MOD_dmumps_fac_par at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
fish: 'julia --threads=4 --project=bug…' terminated by signal SIGABRT (Abort)
forcebru@iMac-ForceBru ~/test [SIGABRT]> 

Third run:

forcebru ~/test [SIGABRT]> julia --threads=4 --project=bug bug.jl

******************************************************************************
This program contains Ipopt, and such...

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

  Instance Error 2 in DMUMPS_F77           2
 ** MPI_ABORT called
  Instance Error 2 in DMUMPS_F77           3
 ** MPI_ABORT called
Total number of variables............................:       10
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       10
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

Total number of variables............................:       10
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       10
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

forcebru ~/test>

Sometimes it simply says fish: 'julia --threads=4 --project=bug…' terminated by signal SIGSEGV (Address boundary error), without any additional errors.

It looks like MUMPS is to blame. Another error, which I can no longer reproduce, complained about dmumps_load.F (that GitHub project seems kind of dead...).

I have 3.5 GB of free memory, yet the julia process peaked at only 492 MB (!!) before crashing.

Versions

  • macOS 10.14.6
  • Julia 1.6.0-beta1 (also reproduced on 1.5.3 and 1.7)

Project.toml:

forcebru ~/test> cat bug/Project.toml 
[deps]
Ipopt = "b6b21f68-93f8-5de0-b562-5493be1d77c9"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"

Installed JuMP and Ipopt like this:

(@v1.6) pkg> activate bug
  Activating new environment at `~/test/bug/Project.toml`

(bug) pkg> add Ipopt#master JuMP#master

The regular released versions produce the same errors.


It would've been nice not to crash, I guess?

One of the errors says:

MUMPS returned INFO(1) =-13 - out of memory when trying to allocate 202 bytes.
In some cases it helps to decrease the value of the option "mumps_mem_percent".

Would it be possible to catch this and propagate the error to Julia code?

BTW, everything works fine when I remove Threads.@threads.
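Since the crash disappears without Threads.@threads, one possible stopgap is to keep the threaded loop but let only one thread at a time call optimize!. This is a sketch that assumes the crashes come solely from concurrent MUMPS calls (not verified here); IPOPT_LOCK and statuses are illustrative names, and the outer loop variable is renamed to k so it no longer shadows the generator's i. It gives up parallelism in the solve itself, so it only pays off when model construction or post-processing dominates the runtime.

```julia
import JuMP, Ipopt

const N = 10
const Time = 50
const C = rand(Float64, (N, Time))

# Assumption: one process-wide lock is enough to keep MUMPS from being
# entered by two threads at once.
const IPOPT_LOCK = ReentrantLock()

# One slot per model; each iteration writes only to its own index.
const statuses = Vector{JuMP.MOI.TerminationStatusCode}(undef, 10)

Threads.@threads for k in 1:10
    # Model construction is plain Julia and can still run on any thread.
    model = JuMP.Model(Ipopt.Optimizer)
    JuMP.set_silent(model)  # suppress Ipopt's per-iteration output

    JuMP.@variable(model, 0 <= p[1:N] <= 1)
    JuMP.@constraint(model, sum(p) == 1)
    JuMP.@NLobjective(
        model, Max,
        sum(log(sum(p[i] * C[i, j] for i in 1:N)) for j in 1:Time)
    )

    # Only one optimize! (and hence one MUMPS factorization) runs at a time.
    lock(IPOPT_LOCK) do
        JuMP.optimize!(model)
    end
    statuses[k] = JuMP.termination_status(model)
end
```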

ForceBru (Author) commented:

I got another, more descriptive error message:

forcebru ~/test> julia --threads=4 --project=bug bug.jl

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

This is Ipopt version 3.13.2, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       55

 PB allocation in DMUMPS_LOAD_INIT
MUMPS returned INFO(1) =-13 - out of memory when trying to allocate 1 bytes.
In some cases it helps to decrease the value of the option "mumps_mem_percent".

signal (11): Segmentation fault: 11
Total number of variables............................:       10
in expression starting at /Users/forcebru/test/bug.jl:7
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       10
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

__dmumps_load_MOD_dmumps_upper_predict at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
__dmumps_fac_par_m_MOD_dmumps_fac_par at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
dmumps_fac_b_ at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
dmumps_fac_driver_ at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
dmumps_ at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
dmumps_f77_ at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
dmumps_c at /Users/forcebru/.julia/artifacts/0549224fe7802c5721fda41cea7489ef52d8f71d/lib/libdmumps.dylib (unknown line)
_ZN5Ipopt20MumpsSolverInterface13FactorizationEbi at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt20MumpsSolverInterface10MultiSolveEbPKiS2_iPdbi at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt16TSymLinearSolver10MultiSolveERKNS_9SymMatrixERNSt3__16vectorINS_8SmartPtrIKNS_6VectorEEENS4_9allocatorIS9_EEEERNS5_INS6_IS7_EENSA_ISE_EEEEbi at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt18StdAugSystemSolver10MultiSolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRNSt3__16vectorINS_8SmartPtrIS5_EENSA_9allocatorISD_EEEESH_SH_SH_RNSB_INSC_IS4_EENSE_ISI_EEEESL_SL_SL_bi at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt15AugSystemSolver5SolveEPKNS_9SymMatrixEdPKNS_6VectorEdS6_dPKNS_6MatrixES6_dS9_S6_dRS5_SA_SA_SA_RS4_SB_SB_SB_bi at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt22LeastSquareMultipliers20CalculateMultipliersERNS_6VectorES2_ at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt25DefaultIterateInitializer18least_square_multsERKNS_10JournalistERNS_8IpoptNLPERNS_9IpoptDataERNS_25IpoptCalculatedQuantitiesERKNS_8SmartPtrINS_22EqMultiplierCalculatorEEEd at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt25DefaultIterateInitializer18SetInitialIteratesEv at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt14IpoptAlgorithm18InitializeIteratesEv at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt14IpoptAlgorithm8OptimizeEb at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt16IpoptApplication13call_optimizeEv at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEERNS1_INS_16AlgorithmBuilderEEE at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt16IpoptApplication11OptimizeNLPERKNS_8SmartPtrINS_3NLPEEE at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
_ZN5Ipopt16IpoptApplication12OptimizeTNLPERKNS_8SmartPtrINS_4TNLPEEE at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
IpoptSolve at /Users/forcebru/.julia/artifacts/cc4899b157cb93bf02bfa1814e0e2ce6f20dfc66/lib/libipopt.3.dylib (unknown line)
solveProblem at /Users/forcebru/.julia/packages/Ipopt/1pNAf/src/Ipopt.jl:513
optimize! at /Users/forcebru/.julia/packages/Ipopt/1pNAf/src/MOI_wrapper.jl:1441
optimize! at /Users/forcebru/.julia/packages/MathOptInterface/ZJFKw/src/Bridges/bridge_optimizer.jl:264
jl_apply_generic at /Users/forcebru/Desktop/Julia/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
optimize! at /Users/forcebru/.julia/packages/MathOptInterface/ZJFKw/src/Utilities/cachingoptimizer.jl:215
jl_apply_generic at /Users/forcebru/Desktop/Julia/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
#optimize!#104 at /Users/forcebru/.julia/packages/JuMP/uzpo8/src/optimizer_interface.jl:139
optimize! at /Users/forcebru/.julia/packages/JuMP/uzpo8/src/optimizer_interface.jl:115 [inlined]
optimize! at /Users/forcebru/.julia/packages/JuMP/uzpo8/src/optimizer_interface.jl:115 [inlined]
macro expansion at /Users/forcebru/test/bug.jl:18 [inlined]
#2#threadsfor_fun at ./threadingconstructs.jl:81
#2#threadsfor_fun at ./threadingconstructs.jl:48
unknown function (ip: 0x14b74aa3c)
jl_apply_generic at /Users/forcebru/Desktop/Julia/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
start_task at /Users/forcebru/Desktop/Julia/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
Allocations: 115380820 (Pool: 115368530; Big: 12290); GC: 127

signal (11): Segmentation fault: 11
in expression starting at /Users/forcebru/test/bug.jl:7
fish: 'julia --threads=4 --project=bug…' terminated by signal SIGSEGV (Address boundary error)
forcebru ~/test [SIGSEGV]> 

frapac commented Jan 19, 2021

MUMPS is known not to be thread-safe. See e.g. the discussion in this issue: #190
Depending on your use case, I would recommend:

  • compile the Ipopt C library with another linear solver (HSL MA27 or HSL MA57, for instance)
  • or parallelize your code across processes using MPI or Julia's standard library Distributed
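A minimal sketch of the Distributed approach, assuming JuMP and Ipopt are installed in the active project and that four worker processes are acceptable. Each worker is a separate OS process with its own copy of Ipopt and MUMPS, so no MUMPS state is shared between solves; solve_one and problems are hypothetical names, not part of JuMP or Ipopt.jl.

```julia
using Distributed

# Start 4 worker processes (the count is an assumption) that load the same
# project environment as the main process.
addprocs(4; exeflags = "--project=$(dirname(Base.active_project()))")

@everywhere begin
    import JuMP, Ipopt

    # Hypothetical helper: build and solve one model for the data matrix C,
    # returning its termination status and optimal objective value.
    function solve_one(C::Matrix{Float64})
        N, Time = size(C)
        model = JuMP.Model(Ipopt.Optimizer)
        JuMP.set_silent(model)
        JuMP.@variable(model, 0 <= p[1:N] <= 1)
        JuMP.@constraint(model, sum(p) == 1)
        JuMP.@NLobjective(
            model, Max,
            sum(log(sum(p[i] * C[i, j] for i in 1:N)) for j in 1:Time)
        )
        JuMP.optimize!(model)
        return JuMP.termination_status(model), JuMP.objective_value(model)
    end
end

N, Time = 10, 50
# One data matrix per problem, generated on the main process.
problems = [rand(Float64, N, Time) for _ in 1:10]

# pmap sends each matrix to a free worker and collects the results in order.
results = pmap(solve_one, problems)
```

pmap returns the results in the same order as problems. The exeflags argument plays the role of --project=bug in the original command line, so the workers can find JuMP and Ipopt.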

ForceBru (Author) commented:

Oh, that's a pity... Will try to use Distributed, thanks!

Anyway, a straight-up segmentation fault is highly confusing. Maybe it's possible to detect that the code is running in "multithreaded mode" and warn the user?

frapac commented Jan 19, 2021

I agree that facing a segfault in Julia is always puzzling...
In my opinion, it would be difficult to handle this properly: I am not aware of any method for detecting whether code is running in a multithreaded context. Maybe we could add a warning about this MUMPS issue to the README.
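Detecting "multithreaded mode" in general is hard, but overlapping solves could in principle be detected directly. The sketch below is a hypothetical guard (ACTIVE_SOLVES and guarded_solve are illustrative names, not part of Ipopt.jl) that counts in-flight solves with an atomic counter and warns when a second solve starts before the first one has finished. It only sees calls routed through the wrapper, and a nested call also counts as overlapping.

```julia
# Hypothetical counter of solves currently in progress (shared by all threads).
const ACTIVE_SOLVES = Threads.Atomic{Int}(0)

# Run `f` (a zero-argument function that performs the solve) and warn if
# another guarded solve is already running when this one starts.
function guarded_solve(f)
    previous = Threads.atomic_add!(ACTIVE_SOLVES, 1)  # returns the value before the add
    try
        if previous > 0
            @warn "Concurrent Ipopt solve detected; MUMPS is not thread-safe" thread = Threads.threadid()
        end
        return f()
    finally
        Threads.atomic_sub!(ACTIVE_SOLVES, 1)  # always decrement, even if `f` throws
    end
end
```

In the original loop, the call would become guarded_solve(() -> JuMP.optimize!(model)). Note that this only warns; it does not prevent the crash.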
