Question: OPA as service sidecar and data/policy acquisition #718

jasonmacdonald · 2018-04-25T21:07:21Z

I'm just starting to read up on OPA as a way to resolve Authz in a microservice architecture. So far it looks promising but it's not clear to me how you handle data and policies in a microservice (sidecar or daemon set) implementation where there is no single API endpoint to call - there would be many-many instances running across the cluster. I just watched the talk on Netflix and their set-up is very similar to what I'd like to achieve, but it seems the "aggregator" and "distributor" they use to manage the issue described above are homegrown solutions. From this I assume that OPA does not provide any mechanisms for handling this out-of-box... is that correct? If so, do you have any recommendations or third party plugins that can assist in gathering and distributing this data (policy and data) to all instances of OPA currently running across a cluster?

For additional context, we are rebuilding our platform using a microservice architecture utilizing Docker w/ Kubernetes and are trying to find a solution to our Authz problem. We are looking to solve this with minimal latency and no single-service bottlenecks (no single Authz service everything calls). OPA seems to be a fit here, except for the distribution of state data and policies to all active instances.

tsandall · 2018-04-25T21:49:51Z

In v0.8 you can configure OPA to pull bundles of policy and data from remote HTTP servers. Using this feature, you could implement a small service that serves up bundles for your agents.
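As a rough idea of what such a bundle-serving service could look like, here is a minimal sketch in Go using only the standard library. It packs a policy file and a data file into a gzipped tarball and serves it over HTTP. The file names, the `/bundles/authz` path, and the helper names (`buildBundle`, `listBundle`, `bundleHandler`) are assumptions for illustration; check the bundle documentation for your OPA version for the exact URL your agents will request.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
)

// buildBundle packs files (path -> contents) into a gzipped tarball.
func buildBundle(files map[string]string) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic entry order

	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, name := range names {
		body := []byte(files[name])
		hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listBundle returns the entry names of a bundle, in archive order.
func listBundle(bundle []byte) ([]string, error) {
	gz, err := gzip.NewReader(bytes.NewReader(bundle))
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

// bundleHandler serves a prebuilt bundle to any agent that asks for it.
func bundleHandler(bundle []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/gzip")
		w.Write(bundle)
	})
}

func main() {
	bundle, err := buildBundle(map[string]string{
		"photos/authz.rego": "package photos.authz\n", // hypothetical policy file
		"data.json":         `{"roles": {}}`,          // hypothetical data file
	})
	if err != nil {
		panic(err)
	}
	mux := http.NewServeMux()
	mux.Handle("/bundles/authz", bundleHandler(bundle)) // assumed path
	srv := httptest.NewServer(mux)                      // local test server for the demo
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/bundles/authz")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	names, err := listBundle(body)
	fmt.Println(names, err)
}
```

In a real deployment you would replace the test server with `http.ListenAndServe` and rebuild the bundle whenever policies or data change.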

If you don't want to implement this service, there are a couple options:

- github.com/open-policy-agent/kube-mgmt provides a sidecar for managing OPA on top of Kubernetes. With the sidecar, you store policies in Kubernetes ConfigMaps. The sidecar watches for ConfigMaps containing policies and automatically loads them into OPA. In the future, we plan to pull some of this functionality into OPA proper.
- styra.com provides a SaaS offering that helps you manage OPAs. This includes policy distribution but also data collection from sources like LDAP, audit logging, and more. The service is currently in beta. You can sign up for a self-service account on the website.

If you can share more details on your use case(s), I can follow up with more details.

Also, if you want, feel free to join slack.openpolicyagent.org. We can chat on there.

jasonmacdonald · 2018-04-25T22:16:11Z

Thanks for the quick response, @tsandall. Give me a bit to consume your post and I'll give an update tomorrow with a better example of what we are trying to accomplish (or at least the thoughts we have on how it might work). Cheers!

jasonmacdonald · 2018-04-26T14:58:34Z

Let's assume the following diagram is a simplified MSA application comprised of an API Gateway, an AuthN Service, a User Service, a Photos Service and a Kafka cluster.

[Diagram: MSA]

Each service contains its own data store and talks to the API Gateway over gRPC. In addition, each service fires events about any data changes that have happened within that service (CRUD events).

We need to implement authorization on the Photos service so that only photos uploaded by the user can be edited. In order to do this, the photo service needs to know the policy, and have access to the user data to apply a policy. It can gather that info from the events fired by the User service and cache them.

I do realize this is a very simple example and could be done without OPA by comparing the user's ID against one stored in Photos, but imagine that this scenario was more complex and spread across many additional services with more complex policy checks. Just keeping this simple for demonstration purposes. :)

All of these services are deployed using Docker into a Kubernetes cluster. Let's assume this is a 4 node worker cluster with a pod distribution as follows.

[Diagram: Nodes]

My initial thought is to run an OPA agent as a DaemonSet on each node so that all pods on that node can share the OPA instance, something like this.

[Diagram: Daemonset]

The thing I'm trying to understand is how to get the policies and necessary data out, and in sync, across all the OPA Agents.

Assuming this is a very busy application, the data changes constantly, at a volume where creating a zip file and storing it somewhere doesn't really make sense. Calling the API would be much better, but having to keep track of all the agents in order to call all of them means writing a server that can interact with the Kube API to get all the IPs. I'd like to avoid that if possible.

Option 1

One thought is to use OPA as a library and write a Go wrapper application that listens to the events coming out of Kafka to keep each node in sync with the data, something like the following diagram...

This could work, but each agent would have to subscribe to multiple partitions and topics, manage offsets, etc. And if a new daemon comes up on a new node, or one goes down and restarts, it would need to start reading from offset 0 in Kafka to rebuild the full current state. Compaction would need to be turned on as well to ensure all keys had a current state value.
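The rebuild-from-offset-0 step can be sketched without any Kafka client. Below is a minimal illustration in Go of replaying a log of CRUD events into an in-memory state, where the latest event per key wins and an event with no value acts as a delete (the same semantics a compacted topic preserves). The `Event` and `User` types are hypothetical; a real agent would read the events from Kafka rather than from a slice.

```go
package main

import "fmt"

// User is a hypothetical record published by the User service.
type User struct {
	ID   string
	Role string
}

// Event is a hypothetical CRUD event. A nil Value marks a delete.
type Event struct {
	Key   string
	Value *User
}

// replay rebuilds the current state by applying events in log order.
func replay(events []Event) map[string]User {
	state := make(map[string]User)
	for _, ev := range events {
		if ev.Value == nil {
			delete(state, ev.Key)
			continue
		}
		state[ev.Key] = *ev.Value
	}
	return state
}

func main() {
	log := []Event{
		{Key: "u1", Value: &User{ID: "u1", Role: "viewer"}},
		{Key: "u2", Value: &User{ID: "u2", Role: "viewer"}},
		{Key: "u1", Value: &User{ID: "u1", Role: "editor"}}, // update
		{Key: "u2", Value: nil},                             // delete
	}
	state := replay(log)
	fmt.Println(len(state), state["u1"].Role) // prints "1 editor"
}
```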

Option 2

Another alternative we might try is to create a new service, the Policy Service, that consumes the data, manages the policies, and keeps the state in a document store. Agents can then connect to the Policy Service using a streaming gRPC connection, so that as new data comes in it can be pushed to the agents in real time, and when new agents come up and connect, they can get the current state. Something like this.

This seems like a better approach but still requires using OPA as a library and writing all of the code necessary to communicate with the Policy service. Again, completely doable.
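The core of the Policy Service idea (a new agent gets the current state, then a stream of later updates) can be sketched in a few lines of Go. In this illustration, Go channels stand in for the streaming gRPC connection and a map stands in for the document store; the `Hub` type and its methods are hypothetical names, not part of OPA.

```go
package main

import (
	"fmt"
	"sync"
)

// Update is a hypothetical change to one document.
type Update struct {
	Key   string
	Value string
}

// Hub keeps the current state and fans updates out to connected agents.
type Hub struct {
	mu    sync.Mutex
	state map[string]string
	subs  []chan Update
}

func NewHub() *Hub {
	return &Hub{state: make(map[string]string)}
}

// Publish records an update and forwards it to every subscriber.
// It blocks if a subscriber's buffer is full; a real service would
// need a policy for slow or disconnected agents.
func (h *Hub) Publish(u Update) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.state[u.Key] = u.Value
	for _, ch := range h.subs {
		ch <- u
	}
}

// Subscribe returns a snapshot of the current state plus a channel of
// later updates. Both are set up under one lock, so no update is
// missed or delivered twice.
func (h *Hub) Subscribe(buffer int) (map[string]string, <-chan Update) {
	h.mu.Lock()
	defer h.mu.Unlock()
	snapshot := make(map[string]string, len(h.state))
	for k, v := range h.state {
		snapshot[k] = v
	}
	ch := make(chan Update, buffer)
	h.subs = append(h.subs, ch)
	return snapshot, ch
}

func main() {
	hub := NewHub()
	hub.Publish(Update{Key: "users/u1", Value: "viewer"})

	// A new agent connects: it sees the current state first...
	snapshot, updates := hub.Subscribe(8)
	fmt.Println(snapshot["users/u1"]) // prints "viewer"

	// ...and then receives changes as they happen.
	hub.Publish(Update{Key: "users/u1", Value: "editor"})
	fmt.Println((<-updates).Value) // prints "editor"
}
```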

My question is, given an environment like this, is there a better way? Or, is there something around OPA plugins that might be worth exploring? The docs aren't very clear about how plugins work.

I hope this better explains the situation we are in and some of the things I'm thinking. Just hoping to get a little bit of guidance around OPA and if we are totally off target here.

Thanks for your time!

tsandall · 2018-04-26T16:37:12Z

This is a great writeup. Thanks for sharing the details.

Another approach that comes to mind is to have the service (e.g., Photos) provide all of the information required by the policy as input when it executes the policy query. This avoids the need to replicate and maintain a copy of the user/auth/service state inside OPA (and all of the burden that comes along with that.) If you follow this approach, the service would:

1. Authenticate the request, gathering the necessary user and auth information.
2. Optionally, gather additional service-specific data from its private store.
3. Provide all of this information as input when it executes the policy query.

With this approach, the policy could make the authorization decision purely from the input document. Of course, if you have additional global state that you need to make the decision, you could still replicate that to all of your OPAs and use it to inform the decision. Perhaps the bundle feature would be a good fit there.
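To illustrate the input-only approach, here is a minimal sketch in Go of how the Photos service might query a local OPA over the REST Data API (`POST /v1/data/<path>` with an `input` document). The policy path `photos/authz/allow`, the shape of the input, and the helper names are assumptions for illustration. A matching policy might look like the one in the leading comment; the exact Rego syntax depends on your OPA version.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Hypothetical policy this sketch assumes is loaded into OPA:
//
//   package photos.authz
//   default allow = false
//   allow { input.action == "edit"; input.user == input.photo.owner }

// Photo and Input describe the (assumed) input document.
type Photo struct {
	ID    string `json:"id"`
	Owner string `json:"owner"`
}

type Input struct {
	User   string `json:"user"`
	Action string `json:"action"`
	Photo  Photo  `json:"photo"`
}

// queryBody wraps the input the way the Data API expects: {"input": ...}.
func queryBody(in Input) ([]byte, error) {
	return json.Marshal(map[string]Input{"input": in})
}

// parseDecision reads {"result": <bool>}. A missing result (the
// decision is undefined) is treated as a deny.
func parseDecision(body []byte) (bool, error) {
	var out struct {
		Result *bool `json:"result"`
	}
	if err := json.Unmarshal(body, &out); err != nil {
		return false, err
	}
	return out.Result != nil && *out.Result, nil
}

// allowed asks the OPA at opaURL whether the request should be allowed.
func allowed(client *http.Client, opaURL string, in Input) (bool, error) {
	body, err := queryBody(in)
	if err != nil {
		return false, err
	}
	resp, err := client.Post(opaURL+"/v1/data/photos/authz/allow", "application/json", bytes.NewReader(body))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %s", resp.Status)
	}
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return parseDecision(raw)
}

func main() {
	// Show the request body the service would send for one request.
	in := Input{User: "alice", Action: "edit", Photo: Photo{ID: "p1", Owner: "alice"}}
	body, err := queryBody(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	// Show how a response from OPA is interpreted.
	ok, err := parseDecision([]byte(`{"result": true}`))
	fmt.Println(ok, err)
}
```

A service would call something like `allowed(http.DefaultClient, "http://localhost:8181", in)` after authenticating the request and loading the photo record from its own store.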

This is the approach that I would recommend. My guess is that you can hide the details of authenticating requests and gathering user/auth data from your services. The additional service-specific step could also be hidden (to some extent) depending on how opinionated your service frameworks are.

OTOH, if you think this is too much burden then replication is the way to go. Here are a couple thoughts on replication.

- What kind of consistency requirements do you have? E.g., if a user is updated and then shortly afterwards they perform an action on a Photo, what do you expect to happen? Depending on the instance they're routed to they may receive different answers since the user update has to propagate.
- Deploying as a DaemonSet makes the most sense (compared to a sidecar model) as it'll reduce the overhead of replicating user/auth/service state to OPAs.
- The plugin interface is quite basic. You register plugins with a factory function that gets called with plugin configuration. When OPA starts, it'll call the factory function to create your plugin and then invoke the Start function on your plugin. In your case, your plugin could start a goroutine that consumes policy/data updates from Kafka and reloads the compiler/store in OPA. Here's an example of how to register a plugin.
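To show the shape of that flow (register a factory, create the plugin from config, call Start, consume updates in a goroutine), here is a self-contained sketch in Go. Note that `Plugin`, `Factory`, `RegisterPlugin`, and the in-memory store are local stand-ins written for this example and are not OPA's actual types or signatures; likewise, a channel stands in for the Kafka consumer. Refer to the OPA source for the real registration API.

```go
package main

import (
	"fmt"
	"sync"
)

// Plugin and Factory are hypothetical stand-ins mirroring the flow
// described above. They are NOT OPA's real interfaces.
type Plugin interface {
	Start() error
	Stop()
}

type Factory func(config []byte) (Plugin, error)

var factories = map[string]Factory{}

// RegisterPlugin records a factory under a name (stand-in registry).
func RegisterPlugin(name string, f Factory) {
	factories[name] = f
}

// Update is a hypothetical data change read from the event stream.
type Update struct {
	Path  string
	Value string
}

// syncPlugin copies updates from a source into an in-memory store.
type syncPlugin struct {
	updates <-chan Update
	mu      sync.Mutex
	store   map[string]string
	done    chan struct{}
}

// newSyncFactory returns a factory bound to an update source.
func newSyncFactory(updates <-chan Update) Factory {
	return func(config []byte) (Plugin, error) {
		return &syncPlugin{
			updates: updates,
			store:   make(map[string]string),
			done:    make(chan struct{}),
		}, nil
	}
}

// Start launches the goroutine that applies updates until the source closes.
func (p *syncPlugin) Start() error {
	go func() {
		defer close(p.done)
		for u := range p.updates {
			p.mu.Lock()
			p.store[u.Path] = u.Value
			p.mu.Unlock()
		}
	}()
	return nil
}

// Stop blocks until the update source has been closed and drained.
func (p *syncPlugin) Stop() {
	<-p.done
}

// Get reads one value from the store.
func (p *syncPlugin) Get(path string) (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	v, ok := p.store[path]
	return v, ok
}

func main() {
	updates := make(chan Update, 4)
	RegisterPlugin("event_sync", newSyncFactory(updates))

	// What the host process would do at startup.
	plugin, err := factories["event_sync"](nil)
	if err != nil {
		panic(err)
	}
	if err := plugin.Start(); err != nil {
		panic(err)
	}

	updates <- Update{Path: "users/u1/role", Value: "editor"}
	close(updates)
	plugin.Stop()

	v, ok := plugin.(*syncPlugin).Get("users/u1/role")
	fmt.Println(v, ok) // prints "editor true"
}
```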

Hope this helps!

jasonmacdonald · 2018-04-26T18:30:00Z

Thanks for the update, @tsandall! I'm definitely leaning towards option number 2 I mentioned. It would give us a way to centralize policies and data, and manage them, but not slow down the policy checks.

As for consistency, our entire system is being built on the premise of eventual consistency (our business is not transactionally heavy), so we can withstand a slight delay, and would design to allow for updates to happen as quickly as possible. But, it's a fair point we'll need to consider.

Are plugins external or do you need to recompile OPA with the plugin? Sorry, I'm not a Go dev so just wrapping my head around how plugins are loaded. It seems the optimal approach is to just write a plugin that can handle the gRPC data/policy streaming into OPA, without having to modify OPA or use it as a lib.

tsandall · 2018-04-26T18:35:49Z

Yes, today you have to recompile OPA. Go does have some basic shared library support now so we could add support for loading plugins dynamically. If we had this in place then you would only have to rebuild the OPA docker image to include your shared lib.

jasonmacdonald · 2018-04-26T18:40:25Z

Thanks! It's not the end of the world by any means, just most of our devs are Java developers so trying to keep the learning curve low. But still, it's worth the investment since OPA really does help solve a problem I've been mulling over for months (Authz in MSA). Thanks for all your hard work and a great OS project!

Cheers!

jasonmacdonald closed this as completed Apr 26, 2018
