This repository was archived by the owner on Apr 25, 2024. It is now read-only.

Optimize KCFG memory usage#872

Merged
rv-jenkins merged 36 commits into master from optimize-kcfg
Feb 27, 2024

Conversation

@virgil-serbanuta
Contributor

@virgil-serbanuta virgil-serbanuta commented Feb 12, 2024

This optimization works by making sure that a single object is used for all groups of equal subterms in a kcfg's nodes.

This is implemented by keeping a map from (optimized) subterms to IDs, and another map from subterm IDs to actual subterms. Whenever a new node is added to the kcfg, this PR will do a bottom-up traversal of its subterms, replacing them with their cached version, if that exists, or adding them to the cache otherwise.
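The two-map scheme described above can be sketched as follows. This is a minimal illustration of the idea, not pyk's actual code; the `Term` and `Cache` names are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A subterm whose children are referenced by their cached IDs."""

    label: str
    child_ids: tuple[int, ...] = ()


class Cache:
    """One map from subterms to IDs, one from IDs back to subterms,
    so each distinct subterm is stored exactly once."""

    def __init__(self) -> None:
        self._term_to_id: dict[Term, int] = {}
        self._id_to_term: dict[int, Term] = {}

    def intern(self, label: str, child_ids: tuple[int, ...] = ()) -> int:
        """Bottom-up step: return the unique ID for this subterm,
        adding it to both maps if it has not been seen before."""
        term = Term(label, child_ids)
        existing = self._term_to_id.get(term)
        if existing is not None:
            return existing
        new_id = len(self._id_to_term)
        self._term_to_id[term] = new_id
        self._id_to_term[new_id] = term
        return new_id


# Interning f(t1, g(t2, t1)) bottom-up: the second occurrence of t1
# resolves to the ID produced by the first, so it is stored only once.
cache = Cache()
t1 = cache.intern('t1')
t2 = cache.intern('t2')
g = cache.intern('g', (t2, t1))
f = cache.intern('f', (t1, g))
assert cache.intern('t1') == t1
```

A second node whose configuration shares subterms with the first resolves those subterms to the same IDs, which is where the memory savings come from.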

Memory usage

For the add liquidity proof, when running with Kasmer, the kcfg object uses 50-60GB of memory. With this PR, it uses only 1 GB.

Educated guess: In general, for proofs whose configurations do not change much between consecutive nodes (usually configurations of medium or high complexity), I expect memory usage to be higher while the kcfg holds only 1-2 nodes, and lower afterwards. For proofs whose configurations change significantly between nodes (usually simple configurations, e.g. with only the <k> cell), this PR may increase memory usage.

Runtime

  • Running a simple proof (i.e. most of the time is spent in the Booster backend): 6 min 59 sec with the PR, 6 min 56 sec before PR
  • Loading a small kcfg (29M on disk) from json: 5.5 sec with the PR, 4.9 sec before PR
  • Loading a large kcfg (6.1G on disk) from json: 1606 sec with the PR, 1425 sec before PR

@virgil-serbanuta virgil-serbanuta force-pushed the optimize-kcfg branch 3 times, most recently from 57f7ea3 to 4b9a6f9 Compare February 12, 2024 12:16
@tothtamas28
Collaborator

Please add a description that roughly explains how the optimization works, and what its impacts are on performance (including of course savings in memory consumption). Thanks!

@virgil-serbanuta virgil-serbanuta marked this pull request as ready for review February 13, 2024 17:57
@virgil-serbanuta
Contributor Author

Please add a description that roughly explains how the optimization works, and what its impacts are on performance (including of course savings in memory consumption). Thanks!

Done, I hope it's reasonable. If you have a standardized way of measuring performance, I can also run that.

@ehildenb
Member

@virgil-serbanuta I'm curious what is taking up so much memory in the KCFG in general. Could you also store fewer nodes (e.g., increase execute_depth on your requests from the 1k default to 10k)?

This seems like a good move overall, but I'm worried about the added complexity. I'm also worried about thread safety, since we are working on a parallel prover implementation.

@virgil-serbanuta
Contributor Author

@ehildenb Well, most of the kcfg memory is taken by the parsed wasm code, and the largest part of that is a table of functions with their type and code. This changes rarely, i.e. when there are contract calls, and for the large proof+kcfg I mentioned above there are ~10 calls between contracts. Also, the pair contract, for which I'm doing property verification right now, is rather large.

FWIW, there are 252 nodes in that large kcfg (~200M/node), most of which have 1000 steps between them. On average, the contract takes ~7 minutes between nodes, but it can take significantly longer.

I thought about using more steps between nodes, and I mentioned this in my initial discussion with @tothtamas28 . That is doable, but that would make debugging harder. The problem is that when there is a backend error (or when I stop execution, say, because I want to figure out why a split happens), I have to restart from the last valid node. With 1000 steps between nodes, I usually lose up to 7 minutes to reach the error point (plus the time needed for loading the configuration from the disk). With 10000 steps, I would lose up to 70 minutes.

@virgil-serbanuta
Contributor Author

@ehildenb I forgot: the unindexed contracts are also stored in the kcfg, they take a lot of space, and they do not change, but I think that they take less memory than their indexed version I mentioned above (the "parsed wasm code"). In case it wasn't clear, only the contract currently being executed is in its indexed state.

Also, I didn't actually measure the memory usage of the various parts of the configuration; all of the above are educated guesses.

@ehildenb
Member

Thanks for the detailed explanation. I wonder if we can somehow make the storage of the KCFG more compact via something like CSE: executing chunks of the code and storing each chunk separately as a KCFG (with smaller code attached, just that function's), and then re-using that KCFG in the larger executions. It may be worth investigating.

The biggest concern I have here then is thread-safety. Can we make it so that this caching mechanism is thread-safe? We'd like to have multiple worker threads extending the KCFG in various places at the same time.

@virgil-serbanuta
Contributor Author

The biggest concern I have here then is thread-safety. Can we make it so that this caching mechanism is thread-safe? We'd like to have multiple worker threads extending the KCFG in various places at the same time.

@ehildenb I'm assuming that the kcfg object itself is/will be thread safe.

I'm not fully sure how this will be used. The simplest way of solving the problem is to add a simple lock around the bottom_up_with_summary call in optimize(), and I did exactly that.

However, there are other solutions with higher performance, and I implemented one here: https://github.com/runtimeverification/pyk/blob/optimize-kcfg-fancy/src/pyk/kast/optimizer.py#L17-L76 . It is not fully finished, but I could try to move it to the current PR if you think that kcfgs need to handle non-trivial amounts of concurrent updates.
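The simple-lock option mentioned above might look roughly like this. This is an illustrative sketch, not the PR's actual OptimizedNodeStore; the class and method names are assumptions:

```python
import threading


class LockedTermCache:
    """Hypothetical sketch of the 'simple lock around the
    bottom_up_with_summary call' approach: all interning is
    serialized through a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache: dict[object, object] = {}

    def optimize(self, term: object) -> object:
        # Only one thread interns at a time: correctness over throughput.
        # setdefault registers the term on first sight and returns the
        # canonical (shared) copy on every later call.
        with self._lock:
            return self._cache.setdefault(term, term)
```

Higher-throughput variants (e.g. optimistic reads with locked inserts) trade this simplicity for performance, which is presumably what the optimize-kcfg-fancy branch linked above explores.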

Comment thread src/pyk/kcfg/kcfg.py Outdated
Collaborator

@tothtamas28 tothtamas28 left a comment


I added a few more nitpicks, but apart from those looks good to me.

As far as I'm concerned, the only thing left before merging is testing this in Kontrol. Once you consider the PR ready, let me know, and I'll run the Kontrol integration tests with this version.

Comment thread src/pyk/kcfg/store.py Outdated
Comment thread src/tests/unit/kcfg/test_store.py Outdated
virgil-serbanuta and others added 6 commits February 23, 2024 12:58
Co-authored-by: Tamás Tóth <tothtamas28@users.noreply.github.com>
Needed to implement [Issue
389](runtimeverification/kontrol#389) in
Kontrol.

---------

Co-authored-by: devops <devops@runtimeverification.com>
- `KCFGExplore.extend_kcfg()` replaced by `KCFG.extend()`
- Logging of `ExtendResult`s moved earlier, to when they are created
(before, they were logged when applied, but this is no longer possible
with the application logic (`extend()`) living in `KCFG`)
- `KCFGExplore.extend()` removed and its logic moved up into
`advance_pending_node()`

---------

Co-authored-by: devops <devops@runtimeverification.com>
Co-authored-by: devops <devops@runtimeverification.com>
Co-authored-by: Tamás Tóth <tothtamas28@users.noreply.github.com>
Co-authored-by: Everett Hildenbrandt <everett.hildenbrandt@gmail.com>
Comment thread src/pyk/kcfg/store.py
Comment on lines +38 to +46
def __getitem__(self, key: int) -> KCFG.Node:
    return self._nodes[key]

def __setitem__(self, key: int, node: KCFG.Node) -> None:
    # Rebuild the node's CTerm from cached subterms so that equal
    # subterms share a single object across all stored nodes.
    old_cterm = node.cterm
    new_config = self._optimize(old_cterm.config)
    new_constraints = tuple(self._optimize(c) for c in old_cterm.constraints)
    new_node = KCFG.Node(node.id, CTerm(new_config, new_constraints))
    self._nodes[key] = new_node
Member


I don't fully understand how this is working....

When I select a given node with something like kcfg.node(node_id), it appears to me here that what I'll get back is the optimized version. Where does it actually reconstruct the original CTerm, so that I can pass it to the rest of the pyk library for manipulation?

Contributor Author


Hmm... I'm not sure I understand the question, I hope this answers it:

"Optimized" has two meanings here.

First, objects derived from "_OptInner" are "optimized". An _OptInner object, as used here, is equivalent to a KInner in the context of an OptimizedNodeStore object. However, _OptInner is supposed to be private to this file, so it is not directly related to CTerms (it is just a helper that makes the optimization efficient).

Second, new_node contains a CTerm that is optimized (in the sense that it shares memory with other CTerms), but which is equal (i.e. the == operator will evaluate to True) to the CTerm inside the node passed to this function. This means that the original CTerm was reconstructed two lines above, in the CTerm(new_config, new_constraints) call.

This optimization works like this (I'm ignoring a few details):

Before optimization, a KInner term is decomposed in a bottom-up way into _OptInner objects. These objects are stored in a _Cache, which ensures that there is a single copy of each _OptInner object. Additionally, for each _OptInner object, the OptimizedNodeStore holds a KInner object created from that _OptInner. The subterms of this KInner also correspond to _OptInner objects in the cache.

As an example, say we have a term that looks like this: f(t1, g(t2, t1')), where t1 == t1'. This is decomposed bottom-up; there will be a single _OptInner object (o1) corresponding to both t1 and t1', and the OptimizedNodeStore will hold a KInner object T1 corresponding to o1 such that T1 == t1 == t1'. Then, when reconstructing the original term, it will look like this: f(T1, g(T2, T1)). Note that this uses the same object (T1) for both occurrences (i.e. it shares memory), while the original term used two identical but distinct objects (t1 and t1').

In other words, if the same KInner (K1) occurs in different parts of a KInner object (K2) created from an _OptInner, then there is a single object for K1 which is shared for all occurrences in K2. This optimization is less important, but we get it for free.

Next, when another KInner term (e.g., from a different CTerm) is optimized, it is also decomposed and reconstructed as above. This means that this KInner shares the objects for the common subterms with the previous KInner terms that were optimized. This is the optimization that matters.

The end result is that an optimized KCFG is the same as an unoptimized one (i.e., '==' returns True), but the optimized version shares a single object for all occurrences of a (sub)term, while the unoptimized one has a copy for each occurrence.
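The distinction between "equal" and "shared" can be demonstrated in plain Python (illustrative only; a dict stands in for the _Cache):

```python
# Two equal but distinct tuples: what an unoptimized kcfg stores.
# (tuple([...]) forces fresh objects, avoiding constant folding.)
t1 = tuple(['t1'])
t1_prime = tuple(['t1'])
assert t1 == t1_prime and t1 is not t1_prime  # two copies in memory

# Interning: what the optimized kcfg stores.
interned: dict = {}


def share(term):
    # Return the canonical copy, registering the term on first sight.
    return interned.setdefault(term, term)


a = share(t1)
b = share(t1_prime)
assert a == b and a is b  # one shared object for all occurrences
```

Because KInner terms are immutable, substituting the shared copy for any equal copy cannot change the meaning of a node, only its memory footprint.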

Member


Ah I see. So it's really about using Python's object references in a smart way, by basically making sure that for each immutable KInner we ever see, we only store one copy of it, and any new ones we see that are the same are adjusted to point at the same copy as the first one we saw, instead of being their own copy?

@ehildenb
Member

This looks good to me, @tothtamas28 if you can do final sign-off then I think we're good. Awesome idea @virgil-serbanuta !

Collaborator

@tothtamas28 tothtamas28 left a comment


Tested in kontrol, LGTM.

@rv-jenkins rv-jenkins merged commit a8a749d into master Feb 27, 2024
@rv-jenkins rv-jenkins deleted the optimize-kcfg branch February 27, 2024 01:35
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 9, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 9, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 10, 2024
Baltoli pushed a commit to runtimeverification/k that referenced this pull request Apr 10, 2024