Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse VMs when possible #898

Closed
wants to merge 2 commits into from
Closed

Conversation

julienduchesne
Copy link
Member

Whenever ext code and imports are unchanged, VMs have a cache that can be useful for future evaluations. This PR adds a dumb pool that keeps the latest VM that was used for that same eval path.

I also added a shuffling of envs, so that the likelihood of reusing a VM is higher. When envs are done evaluating, their VM is put back in the pool (with an enriched cache from their eval).

An important thing to note is that VMs cannot be used concurrently, so we need to remove them from the pool when they are used.

@julienduchesne julienduchesne requested a review from a team August 4, 2023 17:10
@github-actions
Copy link

github-actions bot commented Aug 4, 2023

PR Preview Action v1.4.4
Preview removed because the pull request was closed.
2023-08-09 15:46 UTC

@julienduchesne julienduchesne force-pushed the julienduchesne/tanka-reuse-vms branch 2 times, most recently from 3ecdfc3 to b2cd809 Compare August 4, 2023 17:42
Copy link
Member

@iainlane iainlane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this was fun to review, cheers. the benchmark results look promising 👍

pkg/jsonnet/eval.go Show resolved Hide resolved
pkg/jsonnet/eval.go Show resolved Hide resolved
pkg/jsonnet/eval.go Outdated Show resolved Hide resolved
pkg/jsonnet/eval.go Show resolved Hide resolved
Comment on lines +82 to +85
type vmPool struct {
mutex sync.Mutex
available map[string][]*jsonnet.VM
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if a more fine-grained locking structure would provide benefit, something like this:

  • In the vmPool, have a sync.RWMutex
  • Make available's value type a struct { []*jsonnet.VM, sync.Mutex }
  • Lock the RWMutex for reading unless creating a slice for the first time

That way, you would only have to hold the global lock for one call (to create a struct instance and put it in the map); otherwise, you would only lock your own slice.

wdyt?


More deep / hard stuff follows, might not be worth it

Alternatively, would it be possible to precompute all the hashes so that the empty pool is initialised in advance and eliminate the global RWMutex?

Then building on that (assuming it can be done reasonably), could go one step further still and have an unbuffered channel per optionset. The VMs would be directly handed between environments then - if you finish and you know someone is going to want your VM, block until they get started. Otherwise discard it.

return vm
}

// Release returns a Jsonnet VM to the pool
func (p *vmPool) Release(vm *jsonnet.VM, opts Opts) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like VMs are never going to be garbage collected, is that right? How heavy are these objects? Could that be a problem with a huge set of environments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely a leak. I could make it an option (easy, as we pass the opts object). What do you think of that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS the interesting part is figuring out when to do the release, so sounds good but hard to tell without seeing what it looks like 😁

Alternatively sync.Pool might be a pragmatic way to do this, trading off being able to guarantee reuse if possible for GCability? One pool per hash.

Whenever ext code and imports are unchanged, VMs have a cache that can be useful for future evaluations.
This PR adds a dumb pool that keeps the latest VM that was used for that same eval path.

I also added a shuffling of envs, so that the likelihood of reusing a VM is higher. When envs are done evaluating, their VM is put back in the pool (with an enriched cache from their eval).
An important thing to note is that VMs cannot be used concurrently, so we need to remove them from the pool when they are used.
@julienduchesne
Copy link
Member Author

I won't move forward with this one. The gain is too small to warrant the memory leak

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants