Memory leaks during Model Checking? #103
That's interesting! I'll look at it as soon as I can.
Ah... after an absurdly long time spent fetching the absurdly large problem file, I think I see the problem. Currently, Dolmen does the following for model checking:
I think the large memory consumption comes from the list of stored assertions. My current idea for a fix is an option to type the entire model when we reach the first assertion, which would then allow treating each assert right after typing it (and thus skip the need to store them all in a list). However, that requires the input problem to have all its declarations and definitions before the first assert (which I think should be the case for most problems).
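A minimal sketch of the streaming idea described above (the names `statement` and `check_streaming` are illustrative, not Dolmen's actual API): once the first `assert` is reached, the model is considered complete, and each assertion is checked and dropped immediately instead of being accumulated in a list.

```ocaml
(* Hypothetical sketch: process assertions one at a time instead of
   storing them all; declarations/definitions must precede the first
   assert for this to work. *)
type statement = Decl of string | Def of string | Assert of string

let check_streaming stmts =
  let model_typed = ref false in
  List.iter
    (fun stmt ->
      match stmt with
      | Decl _ | Def _ ->
          if !model_typed then
            failwith "declaration/definition after the first assert"
          (* otherwise: accumulate into the model being typed *)
      | Assert a ->
          if not !model_typed then model_typed := true;
          (* type-check and evaluate `a` right here, then forget it:
             only one assertion is live at a time *)
          ignore a)
    stmts
```

With this shape, peak memory is governed by the model plus a single assertion, not by the full list of typed assertions.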
I'm quite surprised that just keeping the assertions takes so much space, but it is indeed the case. It is worrying that the typed AST takes so much space.
Sorry for the delay. I'll try and fix that ASAP. About the memory consumption: I'd expect at least a 10x (maybe closer to 20x?) factor in size between the text representation and the typed Dolmen AST. For instance, consider the following term:
That sums up to at least a dozen memory words, and I've not counted some of the fields of the typed representation (e.g. the memoized hash and the custom tag map, each of which adds 2 memory words per term). So I'm not surprised that storing the whole AST in memory requires quite a lot of RAM for big problems. The textual representation is actually insanely compact, and the AST takes a lot of space because in OCaml the base unit of representation is the memory word rather than the byte, and we need to store metadata as well as pointers (whereas text just concatenates everything, the AST has to pay for its tree structure).
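One way to see the blowup concretely is `Obj.reachable_words` (OCaml >= 4.04), which counts all heap words (headers included) reachable from a value. The `term` type below is a toy stand-in for Dolmen's typed AST, so the exact numbers differ, but the word-vs-byte gap it shows is the point being made above.

```ocaml
(* Toy AST: even a tiny term costs many 8-byte words on a 64-bit
   system, while its text form is a handful of bytes. *)
type term =
  | Var of string
  | App of term * term list

let t = App (Var "f", [Var "x"; Var "y"])

let () =
  let words = Obj.reachable_words (Obj.repr t) in
  let text_bytes = String.length "f x y" in
  Printf.printf "AST: %d words (%d bytes); text: %d bytes\n"
    words (words * 8) text_bytes
```

Each constructor block, cons cell, and string carries its own header word, which is where the ~10-20x factor comes from before any typing metadata is even added.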
To be clear, I have some plans and ideas to reduce the memory used by the ASTs in Dolmen, but I don't think we can ever realistically go below a ~10x factor between text size and AST size, mainly because we have to pay for the metadata associated with typed terms, as well as for the convenience of having a tree instead of a linear term: the ability to go directly to the n-th argument of a function application requires some space to store pointers (and/or offsets, but that has the same result).
A small nitpick, but if you store offsets in an int32 bigarray, you use half the space you'd use with pointers :-).
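For reference, a minimal sketch of the int32-offset idea: child links become indices into a flat `Bigarray` instead of OCaml pointers, so each slot costs 4 bytes rather than a full machine word (8 bytes on 64-bit). The layout here is purely illustrative.

```ocaml
(* Flat offset table: slot i holds the index of some related node.
   An int32 Bigarray is unboxed, so 1024 slots cost ~4 KB of payload. *)
let offsets =
  Bigarray.Array1.create Bigarray.int32 Bigarray.c_layout 1024

let () =
  (* e.g. node 0's first child lives at index 42 *)
  Bigarray.Array1.set offsets 0 42l;
  Printf.printf "slot 0 -> %ld (4 bytes/slot vs 8 bytes/pointer)\n"
    (Bigarray.Array1.get offsets 0)
```

The trade-off, as the next comment notes, is that indices into a manually managed arena are invisible to the OCaml GC, so reclaiming dead nodes becomes the program's job.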
Sure, but then we'd need to re-implement a GC on top of this bigarray (to correctly handle the cases where we can treat each assert one at a time and then forget it before treating the next one, which currently works quite well), ^^
For sure, it's difficult. I think hashconsing with a weak table is the reasonable compromise. Compared to the text, it's possible that not all terms are in memory at once, and sharing is super important. How many distinct terms does a problem contain?
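A minimal sketch of the weak-table hashconsing being suggested, using the standard library's `Weak.Make` (the toy `term` type is illustrative, not Dolmen's): structurally equal terms share one physical representation, and because the table holds its elements weakly, terms that become unreachable can still be collected by the GC.

```ocaml
(* Toy hashconsing via a weak hash set: `merge` returns an existing
   structurally-equal element if present, otherwise inserts the new
   one. Subterms would also need hashconsing in a real implementation. *)
type term = Var of string | App of term * term

module H = Weak.Make (struct
  type t = term
  let equal = (=)            (* structural equality, fine for a toy type *)
  let hash = Hashtbl.hash
end)

let table = H.create 1024
let hashcons t = H.merge table t

let a = hashcons (App (Var "f", Var "x"))
let b = hashcons (App (Var "f", Var "x"))
let () = assert (a == b)     (* physically shared *)
```

With sharing in place, duplicated subterms across assertions cost one allocation instead of one per occurrence, which is exactly the bet being discussed here.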
I'll try and experiment with hashconsing when I have the time, but I'm not really convinced there are that many duplicated terms.
We're talking about assertions, right? I think they often have a ton of duplicated terms, especially if you rely on hashconsing to expand let-bindings.
Dolmen doesn't expand let-bindings, ^^
My point is that maybe, for checking models, it should.
Hm... if the size of the env during typechecking becomes a problem, maybe we'll need to consider that, but I don't think it's a problem for now.
@bobot I have a fix for the memory consumption on your example. I'll explain a bit more in a forthcoming PR, but basically, with the fix, Dolmen manages to check the model while staying at ~500 MB of RAM used. Unfortunately, I didn't have the time to include that fix in the 0.8.1 release, but it will be part of the next release (i.e. 0.9). By the way, the model file you provided actually has the model duplicated, i.e. the whole model definition is present twice in the file; I don't know whether that's intended.
When checking the model of this big file from SMT-LIB, QF_BV/2019-Mann/ridecore-qf_bv-bug.smt2, with this model:
output.zip
Dolmen takes more than 10 GB of memory, and increasing. Typing takes only a fraction of that. Are we keeping intermediate values or environments unnecessarily?