SCIP gets stuck after upgrade to 9.2.4 #172

@lhoegh

Background

  • We build models programmatically and solve them on a job server that uses SCIP through the JSCIP interface.
  • Our models are MIPs with many binary variables. They generally take between 0 and 5 minutes to solve and have a time limit of at most 10 minutes (it varies per model).
  • We continually feed the solver new problems, so it runs around the clock.
  • We use concurrent solving with 48 threads. After some experimenting, I have set the following parameters:
        scip.setRealParam("heuristics/completesol/maxunknownrate", 1)
        scip.setIntParam("heuristics/rens/freq", 20)
        scip.setRealParam("heuristics/rens/minfixingrate", 0.2)
        scip.setIntParam("estimation/restarts/restartlimit", -1)
        scip.setRealParam("estimation/restarts/restartfactor", 5.0)
        scip.setIntParam("parallel/mode", 0)
        scip.setIntParam("parallel/maxnthreads", 48)
  • We had been running 9.2.3 for a while. It crashed about once a day, forcing us to restart the server. The problem was not reproducible: the same LP file would solve fine the second time around (which is why I never got around to opening an issue).
  • On Friday we bumped to 9.2.4, hopeful based on the changelog that the issue might be resolved. It hasn't crashed since.

Issue
When looking at the server today, we noticed that it had stalled: no CPU activity for the past day. The thread in question looks like this:

[166] prio=5 os_prio=0 cpu=11668786.72ms elapsed=248491.56s tid=0x00007fa3e6ef8310 nid=166 runnable  [0x00007f991c4fc000]
java.lang.Thread.State: RUNNABLE
at jscip.SCIPJNIJNI.SCIPsolveConcurrent(Native Method)
at jscip.SCIPJNI.SCIPsolveConcurrent(SCIPJNI.java:250)
at jscip.Scip.solveConcurrent(Scip.java:45)

I know that's probably not super helpful. Unfortunately I don't have much else to give you, other than the background above.

Could it perhaps be the case that the fix mentioned in the changelog of 9.2.4:
"fixed bug with concurrent solve w.r.t. variable indices that led to segmentation faults and fix termination test"
which is what we hoped would fix the crashing, has introduced a regression?

Please tell me if there is anything I can provide to help with debugging. I can perhaps get an LP file, but given that the same problem has since been solved without issue on the server, I fear it won't help much.
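
In case it is useful for catching the next stall early, here is a minimal sketch of a JVM-side watchdog I could wrap around the solve call. It is an assumption on my side, not something we run today: `SolveWatchdog` and `Result` are hypothetical names, and the sketch uses only the JDK. It can only detect a stall and record the Java stack; it cannot stop a thread that is stuck inside native code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a blocking call on its own thread and reports whether it returned in time. */
final class SolveWatchdog {

    /** Outcome of one supervised call. */
    static final class Result {
        final boolean finished;          // true if the call returned before the deadline
        final StackTraceElement[] stack; // worker's Java stack at the deadline (empty if finished)

        Result(boolean finished, StackTraceElement[] stack) {
            this.finished = finished;
            this.stack = stack;
        }
    }

    static Result run(Runnable blockingCall, long timeout, TimeUnit unit) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                blockingCall.run();
            } finally {
                done.countDown(); // also counts down if the call throws
            }
        }, "solve-worker");
        worker.setDaemon(true); // a stuck native call must not block JVM shutdown
        worker.start();

        try {
            if (done.await(timeout, unit)) {
                return new Result(true, new StackTraceElement[0]);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        // A thread inside native code cannot be interrupted from Java, so the
        // only option here is to record where it is and let the caller decide
        // (for example, log the stack and restart the process).
        return new Result(false, worker.getStackTrace());
    }
}
```

In our setup the call would look something like `SolveWatchdog.run(() -> scip.solveConcurrent(), 15, TimeUnit.MINUTES)`, with the deadline set somewhat above the model's own time limit. If `finished` is false, the job server could log `stack` and schedule a restart instead of sitting idle for a day.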

Technical details

I built scip with:
RUN cmake .. -DTPI=tny -DAUTOBUILD=ON -DGCG=OFF -DUG=OFF -DCMAKE_BUILD_TYPE=Release
If it's any help, the server has an AMD EPYC 9454P CPU and runs Linux in a container.

Thanks for the work on the library :)

- Lukas
