Proposal: Policy Extension Point #18647

dave-tucker · 2015-12-14T20:48:31Z

I've been working with @jainvipin on the following proposal for the last week or so...

Introduction

In an enterprise environment, strict control over how the infrastructure is running is required to ensure compliance with either business policies or with regulations such as those imposed by HIPAA, SOX or PCI.

While the work on AuthN #13697 and AuthZ #14674 will allow us to provide Role-Based Access Control (RBAC), this deals with who can execute commands and what they can execute. Policy is complementary to this, as it is focussed on how the container is being run.

A policy consists of a selector and one or more constraints
A selector is a means of identifying containers that are to be targeted by this policy
Constraints are a set of conditions that are to be enforced by the policy
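
To make the selector/constraint split concrete, here is a minimal Go sketch. It is not part of the proposal: the `Container`, `Selector`, `Constraint` and `Policy` types, and the `HasLabel` and `RequireNetwork` helpers, are all hypothetical names chosen for illustration, and a real Policy Engine is free to model these concepts differently.

```go
package main

import "fmt"

// Container is a hypothetical, simplified view of what a policy engine inspects.
type Container struct {
	Image   string
	Labels  map[string]string
	Network string
}

// Selector identifies the containers that a policy targets.
type Selector func(c Container) bool

// Constraint is a condition the policy enforces; it returns an error on violation.
type Constraint func(c Container) error

// Policy pairs one selector with one or more constraints.
type Policy struct {
	Name        string
	Selector    Selector
	Constraints []Constraint
}

// Check evaluates the constraints only when the selector matches the container.
func (p Policy) Check(c Container) error {
	if !p.Selector(c) {
		return nil // policy does not target this container
	}
	for _, constraint := range p.Constraints {
		if err := constraint(c); err != nil {
			return fmt.Errorf("policy %q: %w", p.Name, err)
		}
	}
	return nil
}

// HasLabel builds a selector that matches containers carrying key=value.
func HasLabel(key, value string) Selector {
	return func(c Container) bool { return c.Labels[key] == value }
}

// RequireNetwork is an example constraint: the container must name a network.
func RequireNetwork(c Container) error {
	if c.Network == "" {
		return fmt.Errorf("no network specified")
	}
	return nil
}

func main() {
	p := Policy{
		Name:        "prod-needs-network",
		Selector:    HasLabel("environment", "production"),
		Constraints: []Constraint{RequireNetwork},
	}
	prod := map[string]string{"environment": "production"}
	fmt.Println(p.Check(Container{Labels: prod}))                      // violation
	fmt.Println(p.Check(Container{Labels: prod, Network: "prod-foo"})) // compliant
}
```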

Neither of these concepts is native to the Docker Engine and it is expected that a Policy Engine is responsible for implementing them.

The following would be examples of policy:

“Containers that have the label ‘environment=production’ may only be deployed on networks named ‘prod-*’”
“Containers that have the label ‘environment=production’ are to receive preferred access to memory, cpu and network”
“Containers that are using the ‘postgres’ image are to be started with at least 2GB RAM”
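
As an illustration, the first example policy could be evaluated roughly as follows. This is only a sketch: the function name `checkProductionNetwork` is hypothetical, and using `path.Match` for the ‘prod-*’ glob is an assumption, since the proposal does not say how network-name patterns would be matched.

```go
package main

import (
	"fmt"
	"path"
)

// checkProductionNetwork sketches the first example policy: containers labelled
// environment=production may only be deployed on networks named prod-*.
func checkProductionNetwork(labels map[string]string, network string) error {
	if labels["environment"] != "production" {
		return nil // selector does not match, so the policy does not apply
	}
	// Assumption: the network name pattern is treated as a shell-style glob.
	ok, err := path.Match("prod-*", network)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("containers labelled production must be deployed on a network called prod-*")
	}
	return nil
}

func main() {
	prod := map[string]string{"environment": "production"}
	fmt.Println(checkProductionNetwork(prod, "prod-foo"))
	fmt.Println(checkProductionNetwork(prod, "stage-foo"))
}
```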

A Policy Engine is responsible for determining:

What policies are defined
Which policies should be applied to a given container
In cases where more than one policy applies, how any conflicts should be resolved

It should also determine:
- If a container can be run as intended
- If a container will not be permitted to be run
- If a policy changes, which containers are affected
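
The proposal leaves conflict resolution to the Policy Engine. One possible strategy, shown here purely as an assumption, is "deny overrides": if any applicable policy denies the container, the overall result is deny. The `Verdict` type and `Resolve` function below are hypothetical.

```go
package main

import "fmt"

// Action mirrors the permit/deny values used elsewhere in this proposal.
type Action string

const (
	Permit Action = "permit"
	Deny   Action = "deny"
)

// Verdict is the outcome of one policy for one container (hypothetical type).
type Verdict struct {
	Policy string
	Action Action
	Reason string
}

// Resolve combines the verdicts of every policy that applies to a container.
// Assumption: the first deny wins; with no denies, the container is permitted.
func Resolve(verdicts []Verdict) Verdict {
	for _, v := range verdicts {
		if v.Action == Deny {
			return v
		}
	}
	return Verdict{Action: Permit}
}

func main() {
	result := Resolve([]Verdict{
		{Policy: "qos", Action: Permit},
		{Policy: "prod-networks", Action: Deny, Reason: "network must match prod-*"},
	})
	fmt.Println(result.Action, result.Reason)
}
```

Other strategies (priorities, most-specific selector wins) are equally valid; the point is only that the engine, not Docker, makes this choice.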

In order for these decisions to be made, I'm proposing that we add a Policy Extension Point to the Docker Engine.

Requirements

In order for these decisions to be made, the policy engine should be passed the ContainerConfig and HostConfig. It may use any of the passed parameters, such as 'labels', 'network', 'exposed ports', 'domain name' or 'ENV', as inputs to whatever language it uses to implement a selector.
In future, we may also pass information about the authenticated user and the roles applied, once the AuthN and AuthZ work has concluded.

The policy engine should also have the opportunity to make changes to the configuration.
As there are cases where the Docker Engine is not responsible for enforcing the result of policy, for example, when the implementation is handled by a Docker Plugin, it is recommended that the result of the policy engine’s computations should be written to the HostConfig in such a way that it can be interpreted by the necessary drivers.

The fundamental reason for being able to override behaviour is that constraints are specified by someone who has override authority above whoever is doing 'docker run'.

In general, there are multiple roles in the infra:

Developer
SRE or Ops person in a DevOps team
Infra Owner (this person is responsible for keeping the infra up and running: bare-metal servers or VMs, physical/virtual network, physical/virtual storage)

The override rules are specified by the SRE/Ops person who is responsible for security and for well-maintained application infrastructure. The infrastructure operator who wants to offer a set of profiles may also create policies for the application SRE/Ops person to use when running those applications.

It’s the responsibility of the user to choose a compatible Policy Engine, Network Driver, IP Addressing Driver, Volume Driver combination. The communication between these components is expected to take place out-of-band and is outside the scope of this proposal.

Proposed Implementation

Changes to HostConfig

A new field should be added to HostConfig in support of this change

type HostConfig struct {
    // ... existing fields ...

    // Policy holds the results returned by the policy engine
    Policy map[string]string
}

Changes to existing plugin protocols

The Policy field from HostConfig should also be passed as part of the Plugin protocol for Networks, IPAM and Volumes.
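
As a rough sketch of what "passed as part of the Plugin protocol" could look like, the snippet below copies HostConfig.Policy into a driver request body. The `driverRequest` type and its field names are hypothetical and do not reflect the actual Network, IPAM or Volume plugin messages; only the `Policy` field and the example key come from this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HostConfig shows only the field proposed above.
type HostConfig struct {
	Policy map[string]string
}

// driverRequest is a hypothetical plugin request body, NOT the real
// Network/IPAM/Volume protocol. It only illustrates forwarding Policy.
type driverRequest struct {
	ContainerID string
	Policy      map[string]string `json:",omitempty"`
}

// encodeDriverRequest builds the JSON that would be sent to a driver.
func encodeDriverRequest(containerID string, hc HostConfig) (string, error) {
	b, err := json.Marshal(driverRequest{ContainerID: containerID, Policy: hc.Policy})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	hc := HostConfig{Policy: map[string]string{"com.example.network_policy": "aa12345b"}}
	body, err := encodeDriverRequest("abc", hc)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```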

Policy Extension Point

The implementation centers around a new Policy Engine extension point.
The suggested protocol is as follows:

PolicyEngine.CreateContainer

This method should be called before the container is started

Request:

{
    "ContainerConfig": {},
    "HostConfig": {}
}

The request contains the ContainerConfig and HostConfig so the policy engine can derive which policies should apply in this instance. In future, this could be expanded to include additional context, for example, the authenticated user that is making the request.

Response:

{
    "Err": "",
    "Action": "permit|deny",
    "Policy": {
        "com.example.network_policy" : "aa12345b"
    },
    "Reason": ""
}

The response contains a facility to return a human-readable error message in the case of a problem with the container configuration.

The Action field determines whether a container can be run:

permit - run the container without modification
deny - do not run the container

In every case other than permit, the Reason field should be completed to provide feedback to the user.

In all cases other than deny, the HostConfig.Policy map should contain the results from the policy engine, with information about any policies that were applied.
The keys in the Policy field should conform to the same reverse DNS notation used by labels.
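
A policy engine implementing this protocol might look something like the sketch below. It assumes the usual Docker plugin convention of JSON over HTTP POST at a path named after the method (here `/PolicyEngine.CreateContainer`); the Go type names, the `decide` function and the hard-coded rule are hypothetical. Only the JSON field names (`Err`, `Action`, `Policy`, `Reason`) and the example policy key and value are taken from the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// CreateContainerRequest decodes only the fields this sketch needs.
type CreateContainerRequest struct {
	ContainerConfig struct {
		Labels map[string]string
	}
	HostConfig struct {
		NetworkMode string
	}
}

// CreateContainerResponse mirrors the response body shown above.
type CreateContainerResponse struct {
	Err    string
	Action string
	Policy map[string]string `json:",omitempty"`
	Reason string
}

// decide applies a single hard-coded example rule (an assumption for illustration).
func decide(req CreateContainerRequest) CreateContainerResponse {
	if req.ContainerConfig.Labels["environment"] == "production" &&
		!strings.HasPrefix(req.HostConfig.NetworkMode, "prod-") {
		return CreateContainerResponse{
			Action: "deny",
			Reason: "Containers labelled production must be deployed on a network called prod-*",
		}
	}
	// On permit, report which policies were applied using reverse DNS keys.
	return CreateContainerResponse{
		Action: "permit",
		Policy: map[string]string{"com.example.network_policy": "aa12345b"},
	}
}

// handleCreateContainer serves the hypothetical /PolicyEngine.CreateContainer endpoint.
func handleCreateContainer(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	var req CreateContainerRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		json.NewEncoder(w).Encode(CreateContainerResponse{Err: err.Error()})
		return
	}
	json.NewEncoder(w).Encode(decide(req))
}

func main() {
	body := strings.NewReader(`{"ContainerConfig":{"Labels":{"environment":"production"}},"HostConfig":{"NetworkMode":"stage-foo"}}`)
	rec := httptest.NewRecorder()
	handleCreateContainer(rec, httptest.NewRequest("POST", "/PolicyEngine.CreateContainer", body))
	fmt.Print(rec.Body.String())
}
```

Note that what the daemon should do when `Err` is set but `Action` is empty is not specified above; this sketch simply returns the error and leaves that decision to the caller.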

UX Considerations

Select a policy engine

# Policy Engine Plugin to use
$ docker daemon --policy-engine=mypolicyengine

Run in-policy container

$ docker run -itd --net=prod-foo -l environment=production nginx
959a66911d2ea582b1f9b07cabc6010ca1eb8114d10201041887b783939b241

Run out-of-policy container

$ docker run -itd --net=stage-foo -l environment=production nginx
[Error] Policy Violation: Containers labelled production must be deployed on a network called prod-*

Swarm Considerations

This proposal should work seamlessly with Swarm today, provided that all Engines within a swarm are configured with identical --policy-engine flags.

It is clearly desirable for Swarm to be able to apply policy before the request is dispatched to a Docker Engine. Our recommendation would be to extend Swarm to allow it to do this using the same plugin protocol. This way, policy could be enforced at both the cluster and the host level.


dave-tucker · 2015-12-14T20:53:29Z

/cc @lxpollitt - I know we had some discussions on policy a while back so this may be of interest

dave-tucker · 2015-12-14T21:47:12Z

Clarification Points:

How is this different from AuthZ

1) Policy Engines should (eventually) have the ability to change the configuration of a container

It's possible to "guess" the user intent from the configuration and change the config appropriately - as opposed to failing and asking them to re-run with the right configuration.

This was omitted from the original proposal as to do so we'd need to put containers in a state that requires user intervention, effectively making the remote API asynchronous.

In this case, it would be advantageous to give a user a Yes/No prompt to see if they agree with the policy changes, or you may want to defer that decision to somebody with override permissions.

Additionally, changing the configuration of a running container is also necessary (perhaps via #15078) if the policy changes, which brings us to point 2.

2) Policies are dynamic

Where AuthZ is reacting to specific user input (e.g. can I run this?), a Policy Engine is proactive.
If a policy changes, a policy engine must ensure that all the running containers are still in compliance. If not, it needs to handle this; how it does so is an implementation detail of the policy engine itself.
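
A minimal sketch of that proactive check, assuming the policy engine keeps its own record of running containers. The `Container` and `Rule` types and the `NonCompliant` function are hypothetical, and what happens to the containers it returns is deliberately left open.

```go
package main

import (
	"fmt"
	"strings"
)

// Container is the policy engine's own record of a running container (hypothetical).
type Container struct {
	ID      string
	Labels  map[string]string
	Network string
}

// Rule reports whether a container complies with a policy (hypothetical signature).
type Rule func(c Container) bool

// NonCompliant returns the IDs of running containers that violate the updated rule.
func NonCompliant(running []Container, updated Rule) []string {
	var ids []string
	for _, c := range running {
		if !updated(c) {
			ids = append(ids, c.ID)
		}
	}
	return ids
}

func main() {
	running := []Container{
		{ID: "c1", Labels: map[string]string{"environment": "production"}, Network: "prod-foo"},
		{ID: "c2", Labels: map[string]string{"environment": "production"}, Network: "stage-foo"},
	}
	// Updated policy: production containers must be on a prod-* network.
	rule := func(c Container) bool {
		return c.Labels["environment"] != "production" || strings.HasPrefix(c.Network, "prod-")
	}
	fmt.Println(NonCompliant(running, rule))
}
```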

3) There are parts of the container config that are outside the control of the Docker Engine

We pass a reference to the computed policies to drivers (Network/IPAM/Volume etc.) so they can apply the policy, assuming they know how to contact the policy engine.

E.g. a network driver could write different QoS settings based on the policies applied
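
On the driver side, this might look roughly like the following. The `QoS` type, its fields and the lookup table are hypothetical; in practice the driver would resolve the policy reference by contacting the policy engine out-of-band rather than from a static map. The key and value are the ones used in the example response in the proposal.

```go
package main

import "fmt"

// QoS holds hypothetical settings a network driver could program.
type QoS struct {
	BandwidthMbps int
	CoS           int
}

// qosByPolicy stands in for a lookup against the policy engine (assumption).
var qosByPolicy = map[string]QoS{
	"aa12345b": {BandwidthMbps: 1000, CoS: 5},
}

// qosFor resolves the policy reference passed in HostConfig.Policy to QoS settings.
// The boolean is false when no known network policy is attached.
func qosFor(policy map[string]string) (QoS, bool) {
	ref, ok := policy["com.example.network_policy"]
	if !ok {
		return QoS{}, false
	}
	q, ok := qosByPolicy[ref]
	return q, ok
}

func main() {
	q, ok := qosFor(map[string]string{"com.example.network_policy": "aa12345b"})
	fmt.Println(q.BandwidthMbps, q.CoS, ok)
}
```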

Why not just use Labels?

Labels can appear from within a Dockerfile using LABEL or at build/runtime. If we were to apply special meaning to labels (e.g. this container should have this specific behaviour) it could have very surprising results.

E.g. if a postgres image has LABEL policy=highperformance in its Dockerfile, it could be granted more resources than the user had intended.

Labels are also not being passed to Network drivers and as such they would not be in a position to enforce a policy.

lxpollitt · 2015-12-17T21:20:49Z

Feels like this is heading in a good and sensible direction, but lots of questions we would like to ask to understand more fully. I've just been discussing with @tomdee. He'll comment shortly in a bit more depth.

tomdee · 2015-12-17T21:43:42Z

I've taken a look over this and tried to understand it all. It seems like a useful feature but there are some areas that I think would be really useful to spec out in a little more detail.

Policy - It looks like the policy language is provided out of band to the policy engine and you're not specifying what it needs to look like.
- A selector can operate on any existing data in the ContainerConfig (so a user could pass in network policy using a label?)
- A constraint may guide an allow/deny decision but it can also define "policy" key/values in the returned HostConfig (which MUST then be understood by drivers to enforce that policy)

Based on what I've understood (as outlined above), I'm concerned about this:

It’s the responsibility of the user to choose a compatible Policy Engine, Network Driver, IP Addressing Driver, Volume Driver combination. The communication between these components is expected to take place out-of-band and is outside the scope of this proposal.

The policy engine is going to pass data to plugins that the plugins need to act on, but you're not defining what this data is. You're simply saying that the user must select ones that work together, which kind of implies that they all need to be written by the same person. This is fine for Docker, since you'll provide a complete suite of batteries-included plugins, but isn't very good for people that only have a single type of plugin (e.g. a network plugin). Does it mean that they need to write a compatible policy engine? Will that policy engine have to enforce unrelated concerns too (e.g. things relevant to volumes, even though it's written by a network driver writer) or will policy engines be composable in some way?

If the default Docker policy engine (I'm presuming there will be one?) defines a policy definition language that allows constraints to include arbitrary key/values that can be consumed by the drivers, then I think that alleviates most of my concerns. E.g. an infrastructure provider could specify a constraint that ensures that all users with a certain level of service get an arbitrary key/value that will be passed down to the network driver for it to consume.

dave-tucker · 2015-12-18T15:18:36Z

@tomdee

so a user could pass in network policy using a label?

Preferably not. Rather, the engine should derive the correct network policy from the combination of abstract labels, e.g. com.example.environment=prod or com.example.application=foo. Labels shouldn't be used to pass the name of a desired policy, as I've tried to articulate in "Why not just use Labels?"

If the default Docker policy engine (I'm presuming there will be one?)

Not at this time. As a follow up to this, I'd welcome a discussion on creating a standard set of key/value pairs that drivers can expect to act upon + an interface to allow them to talk to the policy engine. Also, a discussion on creating a policy description language (something like Docker Compose) and a policy engine. As those discussions are likely to take much longer IMO, I thought it would be prudent to lead with the plumbing first to unblock people that want to provide a full stack of plugins with support for policy.

calavera · 2015-12-21T17:49:16Z

Can we have a clear explanation about why this must be implemented at the engine level and not by something else on top of the engine? At first glance, this looks very specific to @jainvipin's use case.

As @tomdee points out, other plugin providers are not going to be able to use this unless we provide a set of standard and understandable rules that they can use. As you mention, we're not going to provide this for now, which makes this feature less useful.

I'm personally not very sold on this proposal, as it seems to be very focused on solving a very specific use case. I'm also very open to reconsidering my position if we can agree that this is absolutely necessary inside the engine and we can define something that everyone can take advantage of.

jainvipin · 2016-01-05T21:08:24Z

@calavera Thanks for sharing your thoughts on this. Please allow me to take advantage of "very open to reconsider my position" part :-)

Three parts to the response: the first discusses the use cases and whether they are generic, the second why this belongs in docker-engine, and the third compatibility with a disjoint set of plugins.

Some very simple use cases that I believe are generic:

Security Zones for networked containers: say two services (tiers) within a tier need a specific security policy between them (only allow port-80, or port-443). This level of security policy is critical to allow for security (and compliance, like PCI).
Prioritize network/storage traffic: say two containers belong to the 'prod' and 'test' environments and are indicated so using labels; the policies for 'prod' vs 'test' need to be different to ensure one doesn't affect the other. This may include network prioritization, network resource allocation, etc.

I don't want to burden this thread with more elaborate use cases, but there are many that will require drivers to know the network/storage policy for the containers. I will be happy to talk about them in a hangout session if that works best.

Why does it belong in docker-engine:

Drivers (default drivers or remote drivers) need to know the policy because they become the point of implementation; however, drivers cannot get this information out of band, because the 'container <-> policy' association cannot be delivered to the drivers using the current plugin APIs that drivers (volume/network) provide
Providing this in docker can also give the end user a consistent way to find out what policies are in place and are enacted upon

Compatibility with disjoint set of plugins:

We could start off by allowing the above use cases to be done with a compatible set of plugins/drivers,
Work on defining a policy language for simple use cases that is native to docker and can provide a consistent policy UX to the end user, while leaving exotic (one-off) use cases to plugins

Even if you believe these are specific use cases, and that for that reason the 'default policy plugin' need not be there, it is still a big enabler for these use cases in the Docker ecosystem if (remote or local) drivers can get the policy attributes via a policy plugin.

I will also let @tomdee, @lxpollitt, @bboreham and others comment on their perspective...

dave-tucker · 2016-01-06T10:08:29Z

Update: This proposal was discussed at the last maintainers meeting and we discussed that Policy (from a plumbing perspective) could be an AuthZ plugin, or an extension of the AuthZ plugin protocol. I'm looking into the details of that today.

Secondly, it was agreed that adding Policy isn't really useful if we don't define a standard, e.g. some high-level policy objects that can be supported by all drivers. I suggested we start with Network Policy as there are enough interested parties to start fleshing something out.

bboreham · 2016-01-06T10:24:12Z

@jainvipin it seems you make a good case for docker-engine to know the policy, and to supply it as needed to plugins, end-users, etc., but your examples give no rationale for docker-engine to implement policy.

More broadly, I would like clarification around the proposed mechanism - the example of "label ‘environment=production’" seems like it makes sense for container labels but not for image labels - you'd want to run an image pre-production for test purposes, then bless the same image as "production" once it passed tests.

"receive preferred access to memory" does not seem to be compatible with the "permit or deny" response - how is this intended to work?

Are there some specific use-cases that drive the requirement?

Note for clarity: I work for Weaveworks. We do get people asking how they can prevent things happening, generally developers connecting random things into production without Ops knowing about it.

jainvipin · 2016-01-06T18:55:48Z

@bboreham - some follow-ups...

@jainvipin it seems you make a good case for docker-engine to know the policy, and to supply it as needed to plugins, end-users, etc., but your examples give no rationale for docker-engine to implement policy.

Your observation is correct: my examples rationalize the docker engine knowing the policy rather than implementing it. I was thinking of three things being implemented in stages, covering the know, define, then implement aspects of doing this in the docker engine. The rationale for defining the policy in docker would be to ensure that the UX is consistent and that all compliant drivers can support it one way or another. On the other hand, implementing this in the docker engine would be rationalized as a batteries-included model.

More broadly, I would like clarification around the proposed mechanism - the example of "label ‘environment=production’" seems like it makes sense for container labels but not for image labels - you'd want to run an image pre-production for test purposes, then bless the same image as "production" once it passed tests.

Some of the labels on which policy acts are really run-time labels, not image labels. However, I can have an image label that describes an application as 'background', which could hint to the drivers to apply a different policy than for the rest of the applications.

"receive preferred access to memory" does not seem to be compatible with the "permit or deny" response - how is this intended to work?

Are there some specific use-cases that drive the requirement?

Permit with a policy is application classification for infrastructure resource usage, and thus I see it fitting well as part of the permit action. This is a basic form of prioritization: for example, an application can automatically be put in a class of applications that cannot be evicted (scheduler policy); in terms of network, however, it could simply mean providing it with specific Mbps and CoS values for the traffic within the network, which the network then uses for buffer allocation and scheduling.

Disclaimer: I work for Cisco

dave-tucker · 2016-01-11T10:26:11Z

Having done some research, it would be possible to implement this as an AuthZ plugin.

Docs:
https://github.com/docker/docker/blob/master/docs/extend/authorization.md

The Plugin Helper
https://github.com/docker/go-plugins-helpers/blob/master/authz/api.go

In order to do so, we would still need:

The addition of Policy to HostConfig
The ability for AuthZ plugins to change a request (this could be limited to Policy only)
For Policy to be passed to drivers
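
For reference, the allow/deny half of this can be sketched against the existing AuthZ plugin protocol without any engine changes. The sketch below is stdlib-only rather than using the plugin helper linked above. The request and response field names follow my reading of the authorization docs and should be checked against them; the `authorize` function, the `createBody` type and the hard-coded rule are hypothetical. It also shows the gap listed above: the response can only allow or deny, with no way to attach Policy or modify the request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// authzRequest holds a subset of the AuthZ plugin request fields (per the docs above).
type authzRequest struct {
	RequestMethod string
	RequestURI    string
	RequestBody   []byte // raw body of the API call; base64-encoded on the wire
}

// authzResponse is the AuthZ plugin response: allow or deny, plus messages.
type authzResponse struct {
	Allow bool
	Msg   string
	Err   string
}

// createBody is the part of a container-create body this sketch inspects.
type createBody struct {
	Labels     map[string]string
	HostConfig struct {
		NetworkMode string
	}
}

// authorize applies one example policy to container-create requests only.
func authorize(req authzRequest) authzResponse {
	if req.RequestMethod != "POST" || !strings.Contains(req.RequestURI, "/containers/create") {
		return authzResponse{Allow: true}
	}
	var body createBody
	if err := json.Unmarshal(req.RequestBody, &body); err != nil {
		return authzResponse{Err: err.Error()}
	}
	if body.Labels["environment"] == "production" &&
		!strings.HasPrefix(body.HostConfig.NetworkMode, "prod-") {
		return authzResponse{Msg: "Policy Violation: Containers labelled production must be deployed on a network called prod-*"}
	}
	return authzResponse{Allow: true}
}

func main() {
	req := authzRequest{
		RequestMethod: "POST",
		RequestURI:    "/v1.21/containers/create",
		RequestBody:   []byte(`{"Labels":{"environment":"production"},"HostConfig":{"NetworkMode":"stage-foo"}}`),
	}
	resp := authorize(req)
	fmt.Println(resp.Allow, resp.Msg)
}
```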

For this to progress, I suggest that we:

Implement a simple AuthZ plugin and make the necessary changes to docker for this to work (action @jainvipin)
Agree on contents of the Policy struct - let's keep this focussed on Networking today
Come back to docker/docker with a PR

With that in mind, I'm going to close this issue with a view to moving the discussion elsewhere (action @jainvipin to post a link)

bfirsh added the kind/feature label Dec 15, 2015

thaJeztah mentioned this issue Dec 31, 2015

Proposal: daemon defaults for runtime constraints #19023

Closed

dave-tucker closed this as completed Jan 11, 2016

mapuri mentioned this issue Jan 16, 2016

enhance authz interface for policies contiv-experimental/docker#1

Open

thaJeztah mentioned this issue Feb 24, 2023

Container ulimits are inherited from containerd by default #45060

Closed
