This repository was archived by the owner on Apr 25, 2024. It is now read-only.

Optimize KCFG memory usage#872

Merged
rv-jenkins merged 36 commits into master from optimize-kcfg
Feb 27, 2024

Conversation

@virgil-serbanuta
Contributor

@virgil-serbanuta virgil-serbanuta commented Feb 12, 2024

This optimization works by making sure that a single object is used for all groups of equal subterms in a kcfg's nodes.

This is implemented by keeping a map from (optimized) subterms to IDs, and another map from subterm IDs to actual subterms. Whenever a new node is added to the kcfg, this PR will do a bottom-up traversal of its subterms, replacing them with their cached version, if that exists, or adding them to the cache otherwise.
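The two-map scheme described above can be sketched as follows. This is a minimal illustration of the idea, not pyk's actual code; the `Term` and `Cache` names are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A subterm whose children are referenced by their cached IDs."""

    label: str
    child_ids: tuple[int, ...] = ()


class Cache:
    """One map from subterms to IDs, one from IDs back to subterms,
    so each distinct subterm is stored exactly once."""

    def __init__(self) -> None:
        self._term_to_id: dict[Term, int] = {}
        self._id_to_term: dict[int, Term] = {}

    def intern(self, label: str, child_ids: tuple[int, ...] = ()) -> int:
        """Bottom-up step: return the unique ID for this subterm,
        adding it to both maps if it has not been seen before."""
        term = Term(label, child_ids)
        existing = self._term_to_id.get(term)
        if existing is not None:
            return existing
        new_id = len(self._id_to_term)
        self._term_to_id[term] = new_id
        self._id_to_term[new_id] = term
        return new_id


# Interning f(t1, g(t2, t1)) bottom-up: the second occurrence of t1
# resolves to the ID produced by the first, so it is stored only once.
cache = Cache()
t1 = cache.intern('t1')
t2 = cache.intern('t2')
g = cache.intern('g', (t2, t1))
f = cache.intern('f', (t1, g))
assert cache.intern('t1') == t1
```

A second node whose configuration shares subterms with the first resolves those subterms to the same IDs, which is where the memory savings come from.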

Memory usage

For the add liquidity proof, when running with Kasmer, the kcfg object uses 50-60GB of memory. With this PR, it uses only 1 GB.

Educated guess: In general, for proofs whose configurations do not change much between consecutive nodes (usually configurations of medium or high complexity), I expect memory usage to be higher while the kcfg holds only 1-2 nodes, and lower afterwards. For proofs whose configurations change significantly between nodes (usually simple configurations, e.g. with only the <k> cell), this PR may increase memory usage.

Runtime

  • Running a simple proof (i.e. most of the time is spent in the Booster backend): 6 min 59 sec with the PR, 6 min 56 sec before PR
  • Loading a small kcfg (29M on disk) from json: 5.5 sec with the PR, 4.9 sec before PR
  • Loading a large kcfg (6.1G on disk) from json: 1606 sec with the PR, 1425 sec before PR

@virgil-serbanuta virgil-serbanuta force-pushed the optimize-kcfg branch 3 times, most recently from 57f7ea3 to 4b9a6f9 Compare February 12, 2024 12:16
@tothtamas28
Collaborator

Please add a description that roughly explains how the optimization works, and what its impacts are on performance (including of course savings in memory consumption). Thanks!

@virgil-serbanuta virgil-serbanuta marked this pull request as ready for review February 13, 2024 17:57
@virgil-serbanuta
Contributor Author

Please add a description that roughly explains how the optimization works, and what its impacts are on performance (including of course savings in memory consumption). Thanks!

Done, I hope it's reasonable. If you have a standardized way of measuring performance, I can also run that.

@ehildenb
Member

@virgil-serbanuta I'm curious what is taking up so much memory in the KCFG in general. Could you also store fewer nodes (e.g., increase execute_depth on your requests from the 1k default to 10k)?

This seems like a good move overall, but I'm worried about the added complexity. I'm also worried about thread safety, since we are working on a parallel prover implementation.

@virgil-serbanuta
Contributor Author

@ehildenb Well, most of the kcfg memory is taken by the parsed wasm code, and the largest part of that is a table of functions with their type and code. This changes rarely, i.e. when there are contract calls, and for the large proof+kcfg I mentioned above there are ~10 calls between contracts. Also, the pair contract, for which I'm doing property verification right now, is rather large.

FWIW, there are 252 nodes in that large kcfg (~200M/node), most of which have 1000 steps between them. On average, the contract takes ~7 minutes between nodes, but it can take significantly longer.

I thought about using more steps between nodes, and I mentioned this in my initial discussion with @tothtamas28 . That is doable, but that would make debugging harder. The problem is that when there is a backend error (or when I stop execution, say, because I want to figure out why a split happens), I have to restart from the last valid node. With 1000 steps between nodes, I usually lose up to 7 minutes to reach the error point (plus the time needed for loading the configuration from the disk). With 10000 steps, I would lose up to 70 minutes.

@virgil-serbanuta
Contributor Author

@ehildenb I forgot: the unindexed contracts are also stored in the kcfg, they take a lot of space, and they do not change, but I think that they take less memory than their indexed version I mentioned above (the "parsed wasm code"). In case it wasn't clear, only the contract currently being executed is in its indexed state.

Also, I didn't actually measure the memory usage of the various parts of the configuration; all of the above are educated guesses.

@ehildenb
Member

Thanks for the detailed explanation. I wonder if we can somehow make the storage of the KCFG more compact via something like CSE: executing chunks of the code and storing each chunk separately as a KCFG (with smaller code attached, just that function's), and then re-using that KCFG in the larger executions. It may be worth investigating.

The biggest concern I have here then is thread-safety. Can we make it so that this caching mechanism is thread-safe? We'd like to have multiple worker threads extending the KCFG in various places at the same time.

@virgil-serbanuta
Contributor Author

The biggest concern I have here then is thread-safety. Can we make it so that this caching mechanism is thread-safe? We'd like to have multiple worker threads extending the KCFG in various places at the same time.

@ehildenb I'm assuming that the kcfg object itself is/will be thread safe.

I'm not fully sure how this will be used. The simplest way of solving the problem is to add a simple lock around the bottom_up_with_summary call in optimize(), and I did exactly that.

However, there are other solutions with higher performance, and I implemented one here: https://github.com/runtimeverification/pyk/blob/optimize-kcfg-fancy/src/pyk/kast/optimizer.py#L17-L76 . It is not fully finished, but I could try to move it to the current PR if you think that kcfgs need to handle non-trivial amounts of concurrent updates.
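The simple-lock option mentioned above might look roughly like this. This is an illustrative sketch, not the PR's actual OptimizedNodeStore; the class and method names are assumptions:

```python
import threading


class LockedTermCache:
    """Hypothetical sketch of the 'simple lock around the
    bottom_up_with_summary call' approach: all interning is
    serialized through a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache: dict[object, object] = {}

    def optimize(self, term: object) -> object:
        # Only one thread interns at a time: correctness over throughput.
        # setdefault registers the term on first sight and returns the
        # canonical (shared) copy on every later call.
        with self._lock:
            return self._cache.setdefault(term, term)
```

Higher-throughput variants (e.g. optimistic reads with locked inserts) trade this simplicity for performance, which is presumably what the optimize-kcfg-fancy branch linked above explores.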

Comment thread src/pyk/kcfg/kcfg.py Outdated
Collaborator

@tothtamas28 tothtamas28 left a comment


I added a few more nitpicks, but apart from those looks good to me.

As far as I'm concerned, the only thing left before merging is testing this in Kontrol. Once you consider the PR ready, let me know, and I'll run the Kontrol integration tests with this version.

Comment thread src/pyk/kcfg/store.py Outdated
Comment thread src/tests/unit/kcfg/test_store.py Outdated
virgil-serbanuta and others added 6 commits February 23, 2024 12:58
Co-authored-by: Tamás Tóth <tothtamas28@users.noreply.github.com>
Needed to implement [Issue
389](runtimeverification/kontrol#389) in
Kontrol.

---------

Co-authored-by: devops <devops@runtimeverification.com>
- `KCFGExplore.extend_kcfg()` replaced by `KCFG.extend()`
- Logging of `ExtendResult`s moved earlier, to when they are created
(before, they were logged when applied, but this is no longer possible
with the application logic (`extend()`) living in `KCFG`)
- `KCFGExplore.extend()` removed and its logic moved up into
`advance_pending_node()`

---------

Co-authored-by: devops <devops@runtimeverification.com>
Co-authored-by: devops <devops@runtimeverification.com>
Co-authored-by: Tamás Tóth <tothtamas28@users.noreply.github.com>
Co-authored-by: Everett Hildenbrandt <everett.hildenbrandt@gmail.com>
Comment thread src/pyk/kcfg/store.py
Comment on lines +38 to +46
def __getitem__(self, key: int) -> KCFG.Node:
    return self._nodes[key]

def __setitem__(self, key: int, node: KCFG.Node) -> None:
    # Rebuild the node's CTerm from cached subterms so that equal
    # subterms share a single object across all stored nodes.
    old_cterm = node.cterm
    new_config = self._optimize(old_cterm.config)
    new_constraints = tuple(self._optimize(c) for c in old_cterm.constraints)
    new_node = KCFG.Node(node.id, CTerm(new_config, new_constraints))
    self._nodes[key] = new_node
Member


I don't fully understand how this is working....

When I select a given node with something like kcfg.node(node_id), it appears to me here that what I'll get back is the optimized version. Where does it actually reconstruct the original CTerm, so that I can pass it to the rest of the pyk library for manipulation?

Contributor Author


Hmm... I'm not sure I understand the question, I hope this answers it:

"Optimized" has two meanings here.

First, objects derived from "_OptInner" are "optimized". An _OptInner object, as used here, is equivalent to a KInner in the context of an OptimizedNodeStore object. However, _OptInner is supposed to be private to this file, so it is not directly related to CTerms (it is just a helper that makes the optimization efficient).

Second, new_node contains a CTerm that is optimized (in the sense that it shares memory with other CTerms), but which is equal (i.e. the == operator will evaluate to True) to the CTerm inside the node passed to this function. This means that the original CTerm was reconstructed two lines above, in the CTerm(new_config, new_constraints) call.

This optimization works like this (I'm ignoring a few details):

Before optimization, a KInner term is decomposed in a bottom-up way into _OptInner objects. These objects are stored in a _Cache, which ensures that there is a single copy of each _OptInner object. Additionally, for each _OptInner object, the OptimizedNodeStore holds a KInner object created from that _OptInner. The subterms of this KInner also correspond to _OptInner objects in the cache.

As an example, say we have a term that looks like this: f(t1, g(t2, t1')), where t1 == t1'. This is decomposed bottom-up; there will be a single _OptInner object (o1) corresponding to both t1 and t1', and the OptimizedNodeStore will hold a KInner object T1 corresponding to o1 such that T1 == t1 == t1'. Then, when reconstructing the original term, it will look like this: f(T1, g(T2, T1)). Note that this uses the same object (T1) for both occurrences (i.e. it shares memory), while the original term used two identical but distinct objects (t1 and t1').

In other words, if the same KInner (K1) occurs in different parts of a KInner object (K2) created from an _OptInner, then there is a single object for K1 which is shared for all occurrences in K2. This optimization is less important, but we get it for free.

Next, when another KInner term (e.g., from a different CTerm) is optimized, it is also decomposed and reconstructed as above. This means that this KInner shares the objects for the common subterms with the previous KInner terms that were optimized. This is the optimization that matters.

The end result is that an optimized KCFG is the same as an unoptimized one (i.e., '==' returns True), but the optimized version shares a single object for all occurrences of a (sub)term, while the unoptimized one has a copy for each occurrence.
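The distinction between "equal" and "shared" can be demonstrated in plain Python (illustrative only; a dict stands in for the _Cache):

```python
# Two equal but distinct tuples: what an unoptimized kcfg stores.
# (tuple([...]) forces fresh objects, avoiding constant folding.)
t1 = tuple(['t1'])
t1_prime = tuple(['t1'])
assert t1 == t1_prime and t1 is not t1_prime  # two copies in memory

# Interning: what the optimized kcfg stores.
interned: dict = {}


def share(term):
    # Return the canonical copy, registering the term on first sight.
    return interned.setdefault(term, term)


a = share(t1)
b = share(t1_prime)
assert a == b and a is b  # one shared object for all occurrences
```

Because KInner terms are immutable, substituting the shared copy for any equal copy cannot change the meaning of a node, only its memory footprint.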

Member


Ah I see. So it's really about using Python's object references in a smart way, by basically making sure that for each immutable KInner we ever see, we only store one copy of it, and any new ones we see that are the same are adjusted to point at the same copy as the first one we saw, instead of being their own copy?

@ehildenb
Member

This looks good to me, @tothtamas28 if you can do final sign-off then I think we're good. Awesome idea @virgil-serbanuta !

Collaborator

@tothtamas28 tothtamas28 left a comment


Tested in kontrol, LGTM.

@rv-jenkins rv-jenkins merged commit a8a749d into master Feb 27, 2024
@rv-jenkins rv-jenkins deleted the optimize-kcfg branch February 27, 2024 01:35
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 9, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 9, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 10, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 10, 2024