Optimize KCFG memory usage #872
Conversation
Please add a description that roughly explains how the optimization works, and what its impacts are on performance (including, of course, savings in memory consumption). Thanks!
Done, I hope it's reasonable. If you have a standardized way of measuring performance, I can also run that.
@virgil-serbanuta I'm curious what is taking up so much memory in the KCFG in general. Can you guys store fewer nodes as well (increase …)? This seems like a good move overall; I'm worried about the added complexity, though. Also worried about thread safety, since we are working on a parallel prover implementation.
@ehildenb Well, most of the kcfg memory is taken by the parsed wasm code, and the largest part of that is a table of functions with their types and code. This changes rarely, i.e. when there are contract calls, and for the large proof+kcfg I mentioned above there are ~10 calls between contracts. Also, the pair contract, for which I'm doing property verification right now, is rather large. FWIW, there are 252 nodes in that large kcfg (~200M/node), most of which have 1000 steps between them. On average, the contract takes ~7 minutes between nodes, but it can take significantly longer. I thought about using more steps between nodes, and I mentioned this in my initial discussion with @tothtamas28. That is doable, but it would make debugging harder. The problem is that when there is a backend error (or when I stop execution, say, because I want to figure out why a split happens), I have to restart from the last valid node. With 1000 steps between nodes, I usually lose up to 7 minutes to reach the error point (plus the time needed for loading the configuration from disk). With 10000 steps, I would lose up to 70 minutes.
@ehildenb I forgot: the unindexed contracts are also stored in the kcfg; they take a lot of space and they do not change, but I think they take less memory than their indexed version I mentioned above (the "parsed wasm code"). In case it wasn't clear, only the contract currently being executed is in its indexed state. Also, I didn't actually measure the memory usage for the various parts of the configuration; all of the above are educated guesses.
Thanks for the detailed explanation. I wonder if we can somehow make the storage of the KCFG more compact via something like CSE, by executing chunks of the code and storing each chunk separately as a KCFG (with smaller code attached to it, just that of the function), and then re-using that KCFG for the larger executions. It may be worth investigating. The biggest concern I have here, then, is thread safety. Can we make it so that this caching mechanism is thread-safe? We'd like to have multiple worker threads extending the KCFG in various places at the same time.
@ehildenb I'm assuming that the kcfg object itself is/will be thread safe. I'm not fully sure how this will be used. The simplest way of solving the problem is to add a simple lock around the … However, there are other solutions with higher performance, and I implemented one here: https://github.com/runtimeverification/pyk/blob/optimize-kcfg-fancy/src/pyk/kast/optimizer.py#L17-L76 . It is not fully finished, but I could try to move it to the current PR if you think that kcfgs need to handle non-trivial amounts of concurrent updates.
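The "simple lock" option could look roughly like this minimal sketch. This is an illustration only, not pyk's actual `OptimizedNodeStore` interface; the class and method names here are hypothetical:

```python
import threading


class LockedNodeStore:
    """Hypothetical wrapper that serializes all reads and writes to an
    underlying node store with a single lock. Coarse-grained, so it is
    simple and safe, but every access contends on the same lock."""

    def __init__(self) -> None:
        self._store: dict[int, object] = {}
        self._lock = threading.Lock()

    def __getitem__(self, key: int) -> object:
        with self._lock:
            return self._store[key]

    def __setitem__(self, key: int, node: object) -> None:
        # Any optimization/interning of the node would also happen
        # inside the critical section, so the shared cache stays consistent.
        with self._lock:
            self._store[key] = node


store = LockedNodeStore()
store[1] = 'node-1'
assert store[1] == 'node-1'
```

A coarse lock like this trades throughput for simplicity; the linked `optimizer.py` branch presumably reduces that contention, at the cost of extra complexity.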
tothtamas28 left a comment:
I added a few more nitpicks, but apart from those, it looks good to me.
As far as I'm concerned, the only thing left before merging is testing this in Kontrol. Once you consider the PR ready, let me know, and I'll run the Kontrol integration tests with this version.
Co-authored-by: Tamás Tóth <tothtamas28@users.noreply.github.com>
Needed to implement [Issue 389](runtimeverification/kontrol#389) in Kontrol.
- `KCFGExplore.extend_kcfg()` replaced by `KCFG.extend()`
- Logging of `ExtendResult`s moved earlier, to when they are created (before, they were logged when applied, but this is no longer possible with the application logic (`extend()`) living in `KCFG`)
- `KCFGExplore.extend()` removed and its logic moved up into `advance_pending_node()`
```python
def __getitem__(self, key: int) -> KCFG.Node:
    return self._nodes[key]

def __setitem__(self, key: int, node: KCFG.Node) -> None:
    old_cterm = node.cterm
    new_config = self._optimize(old_cterm.config)
    new_constraints = tuple(self._optimize(c) for c in old_cterm.constraints)
    new_node = KCFG.Node(node.id, CTerm(new_config, new_constraints))
    self._nodes[key] = new_node
```
I don't fully understand how this is working....
When I select a given node with something like `kcfg.node(node_id)`, it appears to me that what I'll get back is the optimized version. Where does it actually reconstruct the original `CTerm`, so that I can pass it to the rest of the pyk library for manipulation?
Hmm... I'm not sure I understand the question, I hope this answers it:
"Optimized" has two meanings here.
First, objects derived from `_OptInner` are "optimized". An `_OptInner` object, as used here, is equivalent to a `KInner` in the context of an `OptimizedNodeStore` object. However, `_OptInner` is supposed to be private to this file, so it is not directly related to `CTerm`s (it is just a helper to make the optimization efficient).
Second, `new_node` contains a `CTerm` that is optimized (in the sense that it shares memory with other `CTerm`s), but which is equal (i.e. the `==` operator will evaluate to `True`) to the `CTerm` inside the node passed to this function. This means that the original `CTerm` was reconstructed two lines above, in the `CTerm(new_config, new_constraints)` call.
This optimization works like this (I'm ignoring a few details):
Before optimization, a KInner term is decomposed in a bottom-up way into _OptInner objects. These objects are stored in a _Cache, which ensures that there is a single copy of each _OptInner object. Additionally, for each _OptInner object, the OptimizedNodeStore holds a KInner object created from that _OptInner. The subterms of this KInner also correspond to _OptInner objects in the cache.
As an example, let's say that we have a term that looks like this: f(t1, g(t2, t1')), where t1 == t1'. This is decomposed bottom-up: there will be a single _OptInner object (o1) corresponding to both t1 and t1', and the OptimizedNodeStore will hold a KInner object T1 corresponding to o1 such that T1 == t1 == t1' (and similarly a T2 for t2). Then, when reconstructing the original term, it will look like this: f(T1, g(T2, T1)). Note that this uses the same object (T1) for both occurrences (i.e. it shares memory), while the original term used two identical objects (t1 and t1').
In other words, if the same KInner (K1) occurs in different parts of a KInner object (K2) created from an _OptInner, then there is a single object for K1 which is shared for all occurrences in K2. This optimization is less important, but we get it for free.
Next, when another KInner term (e.g., from a different CTerm) is optimized, it is also decomposed and reconstructed as above. This means that this KInner shares the objects for the common subterms with the previous KInner terms that were optimized. This is the optimization that matters.
The end result is that an optimized KCFG is the same as an unoptimized one (i.e., '==' returns True), but the optimized version shares a single object for all occurrences of a (sub)term, while the unoptimized one has a copy for each occurrence.
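The decompose-and-reconstruct scheme described above can be sketched in a few lines of self-contained Python. The `Term` and `Interner` names here are illustrative stand-ins, not pyk's actual `_OptInner`/`_Cache` classes:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A tiny stand-in for an immutable KInner-like term."""
    label: str
    args: tuple[Term, ...] = ()


class Interner:
    """Keeps one canonical copy of every structurally equal subterm."""

    def __init__(self) -> None:
        self._cache: dict[Term, Term] = {}

    def intern(self, term: Term) -> Term:
        # Bottom-up: intern the children first, then the rebuilt term itself.
        new_args = tuple(self.intern(arg) for arg in term.args)
        rebuilt = Term(term.label, new_args)
        # setdefault returns the cached copy if an equal term was seen before.
        return self._cache.setdefault(rebuilt, rebuilt)


interner = Interner()
t1 = Term('f', (Term('a'), Term('g', (Term('a'),))))  # f(a, g(a))
t2 = Term('h', (Term('a'),))                          # h(a)
u1 = interner.intern(t1)
u2 = interner.intern(t2)

assert u1 == t1 and u2 == t2              # optimized terms are '==' to the originals
assert u1.args[0] is u1.args[1].args[0]   # 'a' is one shared object within u1
assert u1.args[0] is u2.args[0]           # ... and across terms, which is the big win
```

The last assertion is the part that matters for KCFG memory: every node added later shares the objects for subterms it has in common with earlier nodes.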
Ah I see. So it's really about using Python's object references in a smart way, by basically making sure that for each immutable KInner we ever see, we only store one copy of it, and any new ones we see that are the same are adjusted to point at the same copy as the first one we saw, instead of being their own copy?
This looks good to me. @tothtamas28, if you can do the final sign-off, then I think we're good. Awesome idea, @virgil-serbanuta!
tothtamas28 left a comment:
Tested in kontrol, LGTM.
This optimization works by making sure that a single object is used for all groups of equal subterms in a kcfg's nodes. This is implemented by keeping a map from (optimized) subterms to IDs, and another map from subterm IDs to the actual subterms. Whenever a new node is added to the kcfg, this PR does a bottom-up traversal of its subterms, replacing them with their cached version if one exists, or adding them to the cache otherwise.

Memory usage
------------
For the add liquidity proof, when running with Kasmer, the kcfg object uses 50-60 GB of memory. With this PR, it uses only 1 GB.

Educated guess: in general, for proofs whose configurations do not change much between consecutive nodes (usually configurations of medium or high complexity), I expect memory usage to be higher while the kcfg holds 1-2 nodes, and lower afterwards. For proofs whose configurations change significantly between nodes (usually simple configurations, e.g. with only the `<k>` cell), this PR may increase memory usage.

Runtime
-------
* Running a simple proof (i.e. most of the time is spent in the Booster backend): 6 min 59 sec with the PR, 6 min 56 sec before the PR
* Loading a small kcfg (29M on disk) from JSON: 5.5 sec with the PR, 4.9 sec before the PR
* Loading a large kcfg (6.1G on disk) from JSON: 1606 sec with the PR, 1425 sec before the PR

Co-authored-by: devops <devops@runtimeverification.com>
Co-authored-by: Tamás Tóth <tothtamas28@users.noreply.github.com>
Co-authored-by: Lisandra <lisandrasilva18@hotmail.com>
Co-authored-by: Noah Watson <107630091+nwatson22@users.noreply.github.com>
Co-authored-by: rv-jenkins <admin@runtimeverification.com>
Co-authored-by: Everett Hildenbrandt <everett.hildenbrandt@gmail.com>
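The two maps described in the paragraph above (subterms to IDs, and IDs back to subterms) can be sketched as follows, assuming subterms are represented as plain `(label, arg_ids)` tuples. All names here are illustrative, not pyk's implementation:

```python
# Hypothetical two-map cache for interned subterms.
term_to_id: dict[tuple, int] = {}
id_to_term: dict[int, tuple] = {}


def cache(label: str, arg_ids: tuple[int, ...]) -> int:
    """Return the ID of the subterm, adding it to both maps if it is new.

    Because the children are passed as IDs, callers must intern bottom-up:
    a subterm's arguments are already cached before the subterm itself.
    """
    key = (label, arg_ids)
    if key not in term_to_id:
        new_id = len(id_to_term)
        term_to_id[key] = new_id
        id_to_term[new_id] = key
    return term_to_id[key]


# Intern f(a, g(a)) bottom-up: 'a' gets a single ID, reused in both places.
a_id = cache('a', ())
g_id = cache('g', (a_id,))
f_id = cache('f', (a_id, g_id))
assert cache('a', ()) == a_id  # an equal subterm maps back to the existing ID
```

Since every distinct subterm is stored exactly once and referenced by ID, nodes whose configurations barely differ cost only the few subterms that actually changed.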