Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix interquery cache memory leak #5488

Merged
merged 1 commit into from
Dec 21, 2022

Conversation

asleire
Copy link
Contributor

@asleire asleire commented Dec 15, 2022

I thought we could fix the leak in #5381 by keeping track of the elements in c.l and remove them from the list whenever the corresponding cache values are removed. Since c.l is already a doubly linked list it's possible to add this with very little
changes and with O(1) complexity

I can add some tests if this looks otherwise OK

@netlify
Copy link

netlify bot commented Dec 15, 2022

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 7596b60277da222601f9db101fd4309d3070c0e2
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/63a17e5aa2ed90000a5f62d3
😎 Deploy Preview https://deploy-preview-5488--openpolicyagent.netlify.app/docs/edge/configuration
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.I guess we won't need to do this https://sourcegraph.com/github.com/open-policy-agent/opa@7c052053899090af0d71007cfbd8ef1745289647/-/blob/topdown/cache/cache.go?L136 if we make the change in unsafeDelete.

@philipaconrad
Copy link
Contributor

@asleire This looks great so far! 😄

I'd say it's probably worth making some tests for this, and definitely worth doing some kind of load testing / benchmarking on your end to see if this addresses #5381 at all.

@asleire
Copy link
Contributor Author

asleire commented Dec 19, 2022

Thanks @philipaconrad, I've just done some simple tests to verify my hypothesis.

I used a similar procedure to the one I wrote in "Steps to reproduce in this issue: #5359, using the following OPA policy:

package loadtest

req := http.send({
	"url": "http://localhost:3000",
	"headers": {
		"key": input.key,
		"authorization": "Bearer <800b long JWT here>",
	},
	"method": "GET",
	"raise_error": false,
	"force_cache": true,
	"force_cache_duration_seconds": 1,
})

result := "OK" {
	not req.error
} else := "ERROR" {
	true
}

I used a locust script to load test the policy using up to 1000 unique users (and input.key values). I used a hardcoded JWT instead of renewing the JWT every 5 minutes as I would normally do. I used the not very precise method of observing the memory usage of the OPA process in Task Manager and writing down my observations.

I tested the following scenarios:

  1. Using the current main-branch OPA code, with a 1MB cache limit
  2. Using the current main-branch OPA code, with a 10MB cache limit
  3. Using my fix, with a 1MB cache limit
  4. Using my fix, with a 10MB cache limit

In scenario 1, OPA's memory usage would rise with the number of users until the c.usage value hit 1MB, after which the memory usage would bounce consistently between 100-140MB.

In scenario 2, OPA's memory usage would rise continuously. The c.usage value peaked at 2.2MB, well below the cache limit, meaning elements were never removed from c.l. I stopped the test after a few minutes when the memory usage had reached 500MB. If the test had rotated JWTs as it would in a real scenario then the cache limit would eventually be reached and the memory should stop growing

In scenario 3, OPA behaved just like scenario 1.

In scenario 4, OPA behaved just like scenario 1, except c.usage peaked at 2.2MB.

I'm pretty confident this PR will fix the leak described in #5381 😄

I will follow up with some unit tests

@asleire asleire changed the title WIP: Fix interquery cache memory leak Fix interquery cache memory leak Dec 19, 2022
@asleire
Copy link
Contributor Author

asleire commented Dec 19, 2022

I updated the insert/update/delete tests to include check to ensure that the number of elements in c.l equals the number of elements in c.items. This should be sufficient to guarantee that we do not have a leak, since the number of elements in c.items is limited by c.usage

Let me know if something else is required :)

Removed WIP prefix from title. Pinging @ashutosh-narkar for review 🙂

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asleire thanks for the fix. Changes lgtm.

Signed-off-by: Aleksander <Alekken@live.no>
@ashutosh-narkar ashutosh-narkar merged commit 600899a into open-policy-agent:main Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants