
Memory leaks with LP Solvers are back #17320

Closed
ypfmde opened this issue Nov 12, 2014 · 32 comments

ypfmde commented Nov 12, 2014

Three years ago, a memory leak in the LP solvers was fixed in ticket #11917. It seems the problem is back in Sage 6.3. The code

while True:
    P = MixedIntegerLinearProgram(solver="Coin")

consumes several GB of memory within a few seconds! The same happens with solver="gurobi", but not with solver="glpk".

Simon King confirmed the leak at sage-support, including some analysis and details.

CC: @simon-king-jena @nathanncohen

Component: linear programming

Author: Nils Bruin, Nathann Cohen

Branch/Commit: 5480f4e

Reviewer: Nils Bruin, Nathann Cohen

Issue created by migration from https://trac.sagemath.org/ticket/17320

ypfmde added this to the sage-6.4 milestone Nov 12, 2014
nbruin commented Nov 12, 2014

comment:3

When I try to replicate Simon's memory object analysis, I do not find a significant leak:

sage: import gc
sage: from collections import Counter
sage: gc.collect()
0
sage: gc.set_debug(gc.DEBUG_STATS)
sage: pre={id(a) for a in gc.get_objects()}
sage: for i in [1..100000]:
....:     P = MixedIntegerLinearProgram(solver="Coin")
....:     del P
....:
sage: gc.collect()
gc: collecting generation 2...
gc: objects in each generation: 117 0 89955
gc: done, 0.0464s elapsed.
0
sage: gc.collect()
gc: collecting generation 2...
gc: objects in each generation: 48 0 90013
gc: done, 0.0397s elapsed.
0
sage: post=Counter(str(type(a)) for a in gc.get_objects() if id(a) not in pre)
sage: post
Counter({"<type 'method_descriptor'>": 32, "<type 'list'>": 23, "<type 'dict'>": 13, "<type 'tuple'>": 9, "<class '_ast.Name'>": 7, "<type 'weakref'>": 5, "<class '_ast.Call'>": 4, "<type 'frame'>": 2, "<type 'builtin_function_or_method'>": 2, "<type 'listiterator'>": 1, "<type 'set'>": 1, "<type 'instancemethod'>": 1, "<class '_ast.comprehension'>": 1, "<class '_ast.Compare'>": 1, "<class '_ast.Attribute'>": 1, "<class '_ast.Interactive'>": 1, "<type 'module'>": 1})

As you can see, the garbage collector is not triggered during the loop (indicating that reference counting alone keeps the number of Python objects bounded during the loop). Also, after the loop there are no alarming objects in "post".

I can confirm that memory usage explodes, but I think this indicates the leak is outside of Python-managed memory.

[The problem with Simon's code might be that he doesn't update the set S when he counts a new object, so there's a possibility for objects that get created on the first pass to get counted every time.]

nbruin commented Nov 12, 2014

comment:4

The following is fishy in coin_backend.pyx:

    def __cinit__(self, maximization = True):
...
        self.si = new OsiClpSolverInterface()
        self.model =  new CbcModel(self.si[0])
...
    def __dealloc__(self):
        del self.si

If deleting self.si is required in __dealloc__, shouldn't self.model be deleted as well? (This indicates to me that Cython has grown enough understanding of C++ to know how to translate a "del" on a C++ object.)

Indeed, I've tried and simply inserting

        del self.model

makes memory consumption constant with the given test. I haven't tested if it affects operations otherwise. Responsible ticket seems to be #12220, where new CbcModel was added. In that same file we have

    cpdef int solve(self) except -1:
...
        model = new CbcModel(si[0])
...(*)
        del self.model
        self.model = model

which seems to suggest that unceremoniously deleting a model is just fine.

Incidentally, that routine can leak too, because there are a whole bunch of "raise" statements in (*) that can lead to self.model not being replaced by model, but model itself not being deleted.

This suggests that the whole statement here should be guarded by some try/except/finally to ensure that del model is executed if self.model=model is not.
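In plain Python terms, the guard being proposed looks roughly like this. It is a toy sketch: `FakeModel` and the `state` dict stand in for the C++ `CbcModel` and the backend object, and none of these names appear in coin_backend.pyx:

```python
class FakeModel:
    """Stand-in for a heap-allocated CbcModel that must be freed exactly once."""
    freed = 0
    def free(self):
        FakeModel.freed += 1

def solve(state, infeasible=False):
    model = FakeModel()            # analogue of `model = new CbcModel(si[0])`
    try:
        if infeasible:
            raise ValueError("problem is infeasible")  # one of the raises in (*)
        old = state.get("model")
        state["model"] = model     # commit: ownership moves into the backend
        model = None               # the local no longer owns anything
        if old is not None:
            old.free()             # analogue of `del self.model` on the old model
    finally:
        if model is not None:      # reached only on an early exit
            model.free()           # free the orphan instead of leaking it

state = {}
solve(state)                       # success path: model is now owned by state
try:
    solve(state, infeasible=True)  # failure path: new model is freed in finally
except ValueError:
    pass
```

The point of the `finally` clause is exactly the one made above: whichever raise fires, the freshly allocated model either ends up owned by the backend or is deleted, never silently dropped.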

nbruin commented Nov 13, 2014

Commit: bc20667

nbruin commented Nov 13, 2014

comment:6

Something like this should do the trick. Feel free to amend as needed.


New commits:

bc20667: trac #17320: fix some memory management for coin_backend

nathanncohen commented Nov 13, 2014

comment:7

Hello !

I do not understand why you put all this in a "try" block. Wouldn't it be easier to just set self.si and self.model to NULL at the beginning, and add "if self.model != NULL" in the dealloc ?

Nathann
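For comparison, the NULL-initialization pattern Nathann suggests can be sketched as a pure-Python analogue. The `allocate`/`free` registry and the `CoinBackendSketch` class are illustrative stand-ins for C++ `new`/`del` and the real backend; `__init__` plays the role of `__cinit__` and `close()` the role of `__dealloc__`:

```python
# Registry standing in for the C++ heap: the handles currently allocated.
live = set()

def allocate(name):
    live.add(name)
    return name

def free(handle):
    live.discard(handle)

class CoinBackendSketch:
    def __init__(self):
        # Set both handles to None (NULL) first, so cleanup can always
        # tell what was actually allocated, even after a partial init.
        self.si = None
        self.model = None
        self.si = allocate("si")
        self.model = allocate("model")

    def close(self):  # analogue of __dealloc__
        if self.model is not None:
            free(self.model)
            self.model = None
        if self.si is not None:
            free(self.si)
            self.si = None

backend = CoinBackendSketch()
backend.close()
backend.close()  # safe to call twice: the None guards make it idempotent
```

In Cython this matters because `__dealloc__` runs even when `__cinit__` raised partway through, so deleting an unconditionally assumed pointer would be unsafe without the NULL guards.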

nbruin commented Nov 13, 2014

comment:8

Replying to @nathanncohen:

I do not understand why you put all this in a "try" block. Wouldn't it be easier to just set self.si and self.model to NULL at the beginning, and add "if self.model != NULL" in the dealloc ?

Quite possibly, but that's not how that code was written and I wasn't planning on delving into the program logic. There's a "new ..." there and there are uncontrolled exits via explicit raises and quite possibly by other code that can generate errors. If the new model gets inserted into self.model there's no leak because the old value of self.model gets properly deleted, but all the error condition exits would lead to an allocated "model" that simply falls out of scope, i.e., a memory leak (unless cython generates an implicit try/finally to cleanly delete any local variables that point to C++ objects--it doesn't, otherwise the original code wouldn't have worked properly).

I just expressed it as a try/finally to ensure we always have a chance to see if "model" needs to be deleted. I'm sure there are other solutions.

nathanncohen commented Nov 16, 2014

comment:9

Hello !

It seems that only moving the "del" above was sufficient. I created a branch at public/17320 that does that, tell me what you think !

Nathann

ypfmde commented Nov 16, 2014

comment:10

Dear Nathann,

ok, seems to fix the leak issue. Is there a similar easy fix for the gurobi backend?

Best wishes, Peter

nathanncohen commented Nov 16, 2014

comment:11

Helloooooooo !

ok, seems to fix the leak issue. Is there a similar easy fix for the gurobi backend?

No idea, I don't have a gurobi license anymore. The fact that they make licenses machine-dependent complicates things a lot.

Nathann

ypfmde commented Nov 16, 2014

comment:12

Nathann,

what do you mean? On whichever machine I use gurobi, it is a matter of a few seconds to receive the free academic license.

-- Peter

nathanncohen commented Nov 16, 2014

comment:13

Hello !

what do you mean? On whichever machine I use gurobi, it is a matter of a few seconds to receive the free academic license.

Oh? Well, last time I tried I had to set up a weird proxy and start the authentication software in that proxy jail, because my lab was not recognised as a 'research institution' and stuff. And right now I am in India for two months.

Well, I will give it a try tomorrow but I am sure that I will lose at least one hour on that :-P

Nathann

nbruin commented Nov 16, 2014

comment:14

Replying to @nathanncohen:

It seems that only moving the "del" above was sufficient. I created a branch at public/17320 that does that, tell me what you think !

It seems to me that only committing to the new model by assigning it to self.model at the very end is a design feature: it means that if the computation fails for any reason, the object self is unaffected and hence still in a consistent state. It's also better for future development, because as long as model is private, there is no lock required to modify it. Once it's exported to self.model, you'd need to lock self before self.model is changed. Python currently doesn't support such multiprocessing programming anyway, but having changes in global state as atomic as possible is generally a good feature, because it reduces the possibility for unexpected interactions between multiple threads (the GIL does get released every now and again ...)

With your change, the model that just experienced an error is now stored in self. It could be convenient to have it available for further analysis, so a priori it's not clear which design is more useful. It depends on the details of what the code does and how it does it. Changing the design just to avoid a try/finally is probably not a good idea, however.

Furthermore, there is still this code between model allocation and committing to self.model=model.

        model.setLogLevel(self.model.logLevel())

        # multithreading
        import multiprocessing
        model.setNumberThreads(multiprocessing.cpu_count())

        model.branchAndBound()

which looks like it's quite nontrivial and hence could raise errors by itself. You'd still have a leak then. So either guard that with a try/finally, in which case you might as well stick with the guard all the way, or move the self.model=model even higher.

nathanncohen commented Nov 17, 2014

comment:15

Hello,

It seems to me that only committing to the new model by assigning it to self.model at the very end is a design feature:

I wrote this file myself, and I do not think that this had the slightest importance to me at that time.

it means that if the computation fails for any reason,

When an exception is raised in that code, it is almost always because the problem is infeasible. This is not a "failure", it is the actual answer the user is looking for.

the object self is unaffected and hence still in a consistent state. It's also better for future development, because as long as model is private, there is no lock required to modify it. Once it's exported to self.model, you'd need to lock self before self.model is changed. Python currently doesn't support such multiprocessing programming anyway, but having changes in global state as atomic as possible is generally a good feature, because it reduces the possibility for unexpected interactions between multiple threads (the GIL does get released every now and again ...)

If your worry is that in a multithreaded Python program with a MixedIntegerLinearProgram instance p, the function p.solve() may be called simultaneously from different processors, I believe that you can forget it. Even if that made the slightest sense, you would have many other places in the code to re-read and adapt.

With your change, the model that just experienced an error is now stored in self. It could be convenient to have it available for further analysis, so a priori it's not clear which design is more useful. It depends on the details of what the code does and how it does it.

Well, again, what you call an 'error' is just a certificate of infeasibility, and this is definitely good to have around. But the fact that all doctests pass (even in the graph/ folder) indicates that this change brings no real problem.

Furthermore, there is still this code between model allocation and committing to self.model=model.

        model.setLogLevel(self.model.logLevel())

        # multithreading
        import multiprocessing
        model.setNumberThreads(multiprocessing.cpu_count())

        model.branchAndBound()

which looks like it's quite nontrivial and hence could raise errors by itself. You'd still have a leak then. So either guard that with a try/finally, in which case you might as well stick with the guard all the way, or move the self.model=model even higher.

Please move that line higher if you care. I believe that you are being worried unnecessarily.

Nathann

nathanncohen commented Nov 17, 2014

comment:16

Well, I will give it a try tomorrow but I am sure that I will lose at least one hour on that :-P

I just tried it, but here we have no direct connection to the internet: everything has to go through an HTTP proxy that obviously will not handle their grbgetkey utility.

I will try again tonight from another connection, but then I will not be on a university network anymore.... >_<

Nathann

nathanncohen commented Nov 17, 2014

Changed branch from u/nbruin/memory_leaks_with_lp_solvers_are_back to public/17320

nathanncohen commented Nov 17, 2014

Changed commit from bc20667 to 351f3e1

nathanncohen commented Nov 17, 2014

comment:17

Hello again !

Turns out that their license thing was greatly simplified since (or that I was lucky). Either way it went smoothly after I got a real internet connection.

I did the fix for Gurobi, though for some reason I don't quite get why it works... I actually removed a call to GRBfreeenv and the memory leak seems to have disappeared. While debugging I noticed that GRBmodel returned an error code.

Well, here it is :-)

Nathann


New commits:

4da9aa3: trac #17320: Broken doctests
4fcbc89: trac #17320: Memory leak
520c6f1: trac #17320: Merged with 6.5.beta0
351f3e1: trac #17320: Same fix for Gurobi

nathanncohen added the s: needs review label Nov 17, 2014
sagetrac-git commented Nov 17, 2014

Changed commit from 351f3e1 to 5480f4e

sagetrac-git commented Nov 17, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

5480f4e: trac #17320: tighten window for memory leak in CoinBackend.solve and ensure that function declared "int" actually has a return statement

nbruin commented Nov 17, 2014

comment:19

Replying to @nathanncohen:

Please move that line higher if you care. I believe that you are being worried unnecessarily.

OK, done. If it's possible to avoid obvious memory leaks at virtually no cost, then we should, particularly because Sage library code is often used as a template for new contributions.

Also, I ensured that CoinBackend.solve has a return statement: in C, falling off the end of a function with a declared return type without executing an explicit return causes undefined behaviour, which could include returning -1 by accident (the one value that indicates erroneous behaviour). It seems Cython does set the return value to 0 in the absence of a return statement, but I wasn't able to find that stated explicitly in Cython's documentation. Being explicit about it is safer. (Certainly, if the function signature says that returning -1 means an exception has occurred, then it is instructive to explicitly not do that when no exception has occurred.)
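In Cython terms, the convention being described amounts to something like the following sketch (not the actual coin_backend.pyx code; `feasible` is a hypothetical placeholder for the real status check):

```cython
cpdef int solve(self) except -1:
    ...
    if not feasible:
        # Raising maps to a C return value of -1, per the `except -1` clause;
        # the generated wrapper then propagates the Python exception.
        raise MIPSolverException("problem is infeasible")
    ...
    # Explicit success value: never fall off the end of a function with a
    # declared C return type, and never return -1 without an exception set.
    return 0
```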

nathanncohen commented Nov 18, 2014

comment:20

Hellooooooo !!

OK, done. If it's possible to avoid obvious memory leaks at virtually no cost, then we should, particularly because Sage library code is often used as a template for new contributions.

Okayokay. Well, there is obviously nothing wrong with what you did !

Right now I am on a very bad internet connection. It goes through a weird proxy; I can't use git or ssh, and I keep getting disconnected from trac, so I have to re-send every comment 10 times. I will be able to test and review your commit tonight. Do you have anything against the previous changes? If not, I will set this ticket to positive_review later, once I am able to test your commit.

Thanks,

Nathann

nathanncohen commented Nov 18, 2014

comment:21

Hello ! I was finally able to test your latest commit and Sage reported nothing wrong. You can set this ticket to positive_review if you agree with my commits.

Thanks,

Nathann

nathanncohen commented Nov 18, 2014

Author: Nils Bruin, Nathann Cohen

nathanncohen commented Nov 18, 2014

Reviewer: Nils Bruin, Nathann Cohen

nbruin commented Nov 18, 2014

comment:22

Replying to @nathanncohen:

You can set this ticket to positive_review if you agree with my commits.

No objections here. However, someone who uses gurobi should probably check and sign off on that change: especially because it's optional, the buildbot won't have anything useful to say about it either.

nathanncohen commented Nov 18, 2014

comment:23

No objections here. However, someone who uses gurobi should probably check and sign off on that change: especially because it's optional, the buildbot won't have anything useful to say about it either.

Okay... Perhaps Peter then, since he raised the problem ? :-P

Nathann

ypfmde commented Nov 18, 2014

comment:24

Replying to @nathanncohen:

Okay... Perhaps Peter then, since he raised the problem ? :-P

Well, I tried the patch in version 6.4.rc2. My original program now runs fine. There, however, all MIPs are feasible, so I also tried a silly test where every other MIP is infeasible. Again, memory usage is constant.

So the Coin and gurobi backends now seem to work fine.

(By the way, MIPSolverException cannot be used in a try/except environment. Not sure if this is a bug. Anyway, this was the same in the unpatched version.)

simon-king-jena commented

comment:26

Replying to @ypfmde:

(By the way, MIPSolverException cannot be used in a try/except environment. Not sure if this is a bug. Anyway, this was the same in the unpatched version.)

Why not? Did you perhaps not import the error first?

sage: from sage.numerical.mip import MIPSolverException
sage: try:
....:     raise MIPSolverException("Problem")
....: except MIPSolverException:
....:     print "problem caught"
....: 
problem caught

ypfmde commented Nov 18, 2014

comment:27

Replying to @simon-king-jena:

Replying to @ypfmde:

(By the way, MIPSolverException cannot be used in a try/except environment. Not sure if this is a bug. Anyway, this was the same in the unpatched version.)

Why not? Did you perhaps not import the error first?

Indeed, I didn't know that one has to import this error. Hmm, still doesn't look user-friendly to me that the class MixedIntegerLinearProgram is known to Sage without an import, while the exception belonging to it is not. -- Peter Mueller

simon-king-jena commented

comment:28

Replying to @ypfmde:

Why not? Did you perhaps not import the error first?

Indeed, I didn't know that one has to import this error. Hmm, still doesn't look user-friendly to me that the class MixedIntegerLinearProgram is known to Sage without an import, while the exception belonging to it is not. -- Peter Mueller

I'd say that having MixedIntegerLinearProgram in the global namespace is not clearly a good thing, since most Sage users wouldn't use it. So it is already a courtesy to some users that it can be used without an import.

And MIPSolverException is a technicality that is only relevant to developers or expert users, and thus clearly shouldn't pollute the global name space.

vbraun commented Nov 20, 2014

Changed branch from public/17320 to 5480f4e
