
topdown: Update cache interface with Clone method #5997

Merged

Conversation

@ashutosh-narkar (Member) commented Jun 9, 2023

Concurrent evaluation of the http.send built-in for the same request object can result in the HTTP headers map being accessed concurrently. This can happen, for example, when a key already present in the inter-query cache needs to be revalidated and multiple routines access the HTTP headers at the same time, resulting in a data race.

This change adds a new Clone method to the cache interface. The idea is to give each routine its own copy of the cached object, which means it also has its own copy of the headers map and should therefore avoid any sync issues.

Signed-off-by: Ashutosh Narkar anarkar4387@gmail.com
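
For illustration, a minimal sketch of the idea, using hypothetical names (cachedResponse and its fields are not the actual types touched by this PR): the cached value's Clone deep-copies the headers map so that each evaluation routine works on an independent copy.

```go
package cache

import "net/http"

// cachedResponse is a hypothetical stand-in for the value http.send keeps
// in the inter-query cache.
type cachedResponse struct {
	StatusCode int
	Headers    http.Header // the map that was being accessed concurrently
	Body       []byte
}

// Clone returns a copy whose Headers map (and Body) is independent of the
// original, so concurrent routines never share the underlying map.
func (c *cachedResponse) Clone() *cachedResponse {
	headers := make(http.Header, len(c.Headers))
	for k, v := range c.Headers {
		headers[k] = append([]string(nil), v...)
	}
	return &cachedResponse{
		StatusCode: c.StatusCode,
		Headers:    headers,
		Body:       append([]byte(nil), c.Body...),
	}
}
```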

@srenatus (Contributor)

Thanks for tackling this problem! I'm not sure that this is the best way to approach it, though. A minor thing is that the mutex on the builtin context is quite unspecific -- what is it protecting? "Mtx" could be anything.

But more generally, I wonder if we could afford to store a copy instead of the map that's concurrently accessed. Taking a step back, I wonder what the flow of events is here that leads to this, and whether we should perhaps address that. If we

  1. convert the response into a map,
  2. store that map in the cache,
  3. retrieve that map from the cache for a concurrent request,
  4. read from that map in a concurrent request

Is the problem then really the concurrent map access, or is it that we write something "unfinished" in (2.) that we're still working on, which is retrieved in (3.) and read from? In other words, is the concurrent access to the header map unavoidable, or could we fix that?


@ashutosh-narkar (Member Author)

What if we cloned the value http.send gets from the cache and performed all operations on the cloned value? We can synchronize access to the clone operation itself. I've tried this out in this change. WDYT?

if !found {
	return nil, nil
}

value, cerr := requestCache.Clone(cachedValue)
Contributor:

What's the benefit of this over cachedValues.Clone() here?

Member Author:

requestCache.Clone(cachedValue) uses the cache's mutex.

My thinking here is that each routine will have its own copy of the cached item. Inside the clone implementation we make a clone of the header as well, so multiple routines can perform operations on the header map without any race issues. What do you think of this approach?
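
A minimal sketch of that split, assuming the cache guards its internal state with a sync.Mutex (the field name and the trimmed-down interface below are illustrative; the real interface has more to it): the cache-level Clone serializes the copy behind the cache's mutex and delegates to the value's own Clone.

```go
package cache

import "sync"

// InterQueryCacheValue is sketched here with only the method relevant to
// this discussion.
type InterQueryCacheValue interface {
	Clone() (InterQueryCacheValue, error)
}

type cache struct {
	mtx sync.Mutex
	// ...other fields elided
}

// Clone copies a cached value while holding the cache's mutex, so the copy
// operation itself cannot race with other cache operations.
func (c *cache) Clone(value InterQueryCacheValue) (InterQueryCacheValue, error) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return value.Clone()
}
```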

Contributor:

> requestCache.Clone(cachedValue) uses the cache's mutex.

Ah, of course. I had missed that. 👍

@srenatus (Contributor)

I'm still not sure I understand the concrete interleaving that leads to this... do you? 😃

@ashutosh-narkar (Member Author)

> I'm still not sure I understand the concrete interleaving that leads to this... do you? 😃

You have two concurrent http.send requests (likely part of two concurrent policy evaluations) for the same request object. The object is in the cache but expired, so it needs to be revalidated. During revalidation, one routine is reading the headers required to create the revalidation request, while the other routine is a bit further ahead and is preparing the response to return to its caller, which also requires processing the headers. So both routines end up accessing the header map at the same time.

Hence the proposed fix is to give each routine its own copy of the cached object, which means it has its own copy of the headers map, so we should be able to avoid any sync issues. Hope this makes sense 😄
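
To make the interleaving concrete, here is a small standalone illustration (not the actual OPA code paths): two routines share the header map of one cached entry, one reading a header while building the revalidation request, the other mutating the map while preparing its response. Running this with `go run -race` flags the shared map.

```go
package main

import (
	"net/http"
	"sync"
)

func main() {
	// Stand-in for the headers of a cached entry that two evaluations
	// retrieved from the inter-query cache at roughly the same time.
	headers := http.Header{"Date": []string{"Tue, 13 Jun 2023 00:00:00 GMT"}}

	var wg sync.WaitGroup
	wg.Add(2)

	// Routine A: reads a header while building the revalidation request.
	go func() {
		defer wg.Done()
		_ = headers.Get("Date")
	}()

	// Routine B: mutates the same map while preparing the response for
	// its caller.
	go func() {
		defer wg.Done()
		headers.Set("Age", "0")
	}()

	wg.Wait()
}
```

With the Clone approach, each routine would take its own copy before doing either of these steps, so the two never touch the same underlying map.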

@ashutosh-narkar changed the title from "WIP: Pass mutex to built-in functions via BuiltinContext" to "topdown: Update cache interface with Clone method" on Jun 13, 2023
@ashutosh-narkar marked this pull request as ready for review on June 13, 2023 at 22:02
@srenatus (Contributor)

> During revalidation, one routine is reading the headers required to create the revalidation request, while the other routine is a bit further ahead and is preparing the response to return to its caller, which also requires processing the headers.

I think what I don't understand is the write part of it.

Is it because we're sending the date from the RESPONSE as part of the subsequent REQUEST? So the read is the retrieval of the date header from the response headers? And the write is the storing of the response headers after a successful request?

@ashutosh-narkar (Member Author)

> So the read is the retrieval of the date header from the response headers?

Yes

> I think what I don't understand is the write part of it.

There are a few places where we could have potential conflicts: there could be a read initiated here and a write here, although that is not what is reported in the log attached here.

Per the above log, there is a write here. As you indicated here, the map creation itself could be the write operation.

@ashutosh-narkar merged commit 2d6ad9d into open-policy-agent:main on Jun 15, 2023
27 checks passed
@porwalameet commented Jun 26, 2023

Hi @ashutosh-narkar, can we have an OPA release with this fix so we can test it?

@anderseknert (Member)

@porwalameet you can download the artifacts from any build. This is currently the latest build on main: https://github.com/open-policy-agent/opa/actions/runs/5378997883#artifacts

If you'd like to try it using a container, you can use the -edge suffix to the image name.

@porwalameet commented Jun 26, 2023

Thanks for the response, @anderseknert. We have an internal pipeline which pulls Docker images, tags them accordingly, and pushes them to an internal repo. We want to keep that pipeline as-is for future upgrades, so we'd like an official release build.

@anderseknert (Member)

Official OPA releases are published following a monthly cadence. Given that it's almost July, I believe a new stable release should be published in about a week or so. But since the purpose was for you to test this, I would suggest you do that using an edge release first.

@porwalameet

Thanks for the heads-up, @anderseknert.
