kata agent runtime configuration #1837

Closed
sameo opened this issue May 12, 2021 · 29 comments · Fixed by #2517
Labels
area/confidential-containers Issues related to confidential containers (see also CCv0 branch) feature New functionality

Comments

@sameo
Contributor

sameo commented May 12, 2021

Problem Statement

The kata-agent communicates with the host through a gRPC API, via a vsock channel. The host typically sends commands and requests to the kata-agent through this RPC mechanism.

In a confidential computing context, where the host software stack is outside the guest TCB, parts of that host <-> guest interface cannot be supported without breaking the expected confidential computing threat model.

Proposal

We propose that the kata-agent optionally use a configuration file, allowing for more dynamically defined agent behaviour. The configuration file would be passed to the kata-agent as a command line option: kata-agent -c agent-configuration.toml

We expect this file to be added to the guest primary root filesystem (initramfs or virtio-block device). As a consequence, confidential computing technologies would include that configuration file as part of their verified TCB measurements. In other words, an attested guest could only start its kata-agent through a verified configuration file.

Goals

Although a kata-agent runtime configuration file could be used to extensively customize the agent's behaviour, the primary and initial goal of this proposal is to use it as a runtime definition of the supported agent gRPC API, in order to meet the confidential computing threat model.

In other words, the goals of this proposal are:

  • Allow for the kata-agent binary to use an optional configuration file
  • Define an API restriction section for this configuration file
  • Add the existing kata-agent command line parameters to the configuration file. The command line parameters would overwrite the configuration file ones.
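The precedence described in the last goal (built-in defaults, then the configuration file, then the command line) can be sketched as a layered merge. This is only an illustration: the type names `PartialConfig` and `resolve` are hypothetical, and only two fields from the sample are shown.

```rust
/// Effective agent settings (two fields only, for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct AgentConfig {
    pub log_level: String,
    pub hotplug_timeout: u64,
}

/// A partial set of settings: `None` means "not specified by this source".
/// Hypothetical type, used here for both the file and the command line.
#[derive(Debug, Clone, Default)]
pub struct PartialConfig {
    pub log_level: Option<String>,
    pub hotplug_timeout: Option<u64>,
}

impl AgentConfig {
    /// Built-in defaults, matching the values in the sample file.
    pub fn defaults() -> Self {
        AgentConfig {
            log_level: "default".to_string(),
            hotplug_timeout: 3,
        }
    }

    /// Apply every value that `layer` specifies on top of `self`.
    pub fn apply(mut self, layer: PartialConfig) -> Self {
        if let Some(v) = layer.log_level {
            self.log_level = v;
        }
        if let Some(v) = layer.hotplug_timeout {
            self.hotplug_timeout = v;
        }
        self
    }
}

/// Resolve the effective settings: defaults, then file, then command line.
pub fn resolve(file: PartialConfig, cli: PartialConfig) -> AgentConfig {
    AgentConfig::defaults().apply(file).apply(cli)
}
```

With this ordering, a value given on the command line always wins over the same value in the file, which is exactly the property questioned later in the thread for the confidential computing case.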

Configuration file format

For consistency with the other kata-containers components, we propose that the agent configuration file use the TOML format.

API section

The kata-agent configuration file would contain an [api] section. The section would then contain a blocked_endpoints entry describing the blocked gRPC endpoints as a string list.

Each entry in the blocked_endpoints list should map exactly to one of the agent gRPC service methods defined in the AgentService proto file.

The agent should fail to start if one or more entries in blocked_endpoints do not map to an AgentService method.
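A minimal sketch of that start-up check, assuming the set of valid method names is available as a static list. The `KNOWN_ENDPOINTS` constant below is an illustrative subset chosen for this example, not the full AgentService definition, and `validate_blocked` is a hypothetical helper name.

```rust
use std::collections::HashSet;

/// Illustrative subset of AgentService method names (assumption: the real
/// list would be derived from the proto file).
const KNOWN_ENDPOINTS: &[&str] = &[
    "CreateContainer",
    "ExecProcess",
    "PauseContainer",
    "ResumeContainer",
    "ReseedRandomDev",
];

/// Validate the `blocked_endpoints` entries. Returns the set of blocked
/// endpoints, or an error naming every entry that matches no known method,
/// so that the caller can refuse to start.
pub fn validate_blocked(entries: &[&str]) -> Result<HashSet<String>, String> {
    let known: HashSet<&str> = KNOWN_ENDPOINTS.iter().copied().collect();
    let unknown: Vec<&str> = entries
        .iter()
        .copied()
        .filter(|e| !known.contains(e))
        .collect();
    if !unknown.is_empty() {
        return Err(format!(
            "unknown endpoint(s) in blocked_endpoints: {}",
            unknown.join(", ")
        ));
    }
    Ok(entries.iter().map(|e| e.to_string()).collect())
}
```

Failing on unknown names catches typos in the list, which would otherwise silently leave an endpoint unblocked.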

Sample

# Agent runtime settings.
# A configured value is overridden when its corresponding command line option is passed.
[agent]
debug_console = false
debug_console_vport = 0
log_level = "default"
log_vport = 0
hotplug_timeout = 3
container_pipe_size = 0
server_addr = "vsock://-1:1024"
unified_cgroup_hierarchy = false

# Agent gRPC API settings.
[api]
# Blocked API endpoints.
# A list of blocked gRPC endpoints. For example:
# blocked_endpoints = ["ExecProcess", "ReseedRandomDev"]
#
# Any client request to a blocked endpoint will return an "UNIMPLEMENTED" gRPC status code.
blocked_endpoints = []
@sameo sameo added feature New functionality needs-review Needs to be assessed by the team. labels May 12, 2021
@sameo sameo added this to To do in Confidential containers via automation May 12, 2021
@sameo sameo changed the title Kata agent runtime configuration kata agent runtime configuration May 12, 2021
@bergwolf
Member

@sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

@sameo
Contributor Author

sameo commented May 13, 2021

@fitzthum
Contributor

It seems like the easiest way to bring the config file into the TCB would be to bake it into the initrd, which most platforms seem to measure or encrypt. This might increase the burden on the guest-owner, however, who would have to keep track of a new initrd measurement for each variation of the configuration file. I'm not sure how much variation there would be in practice, but I can imagine a guest owner wanting to use the same initrd for most deployments.

@jimcadden
Contributor

jimcadden commented May 13, 2021

@fitzthum seems an easier way would be to pass it in via kernel parameters. Something like agent.config.api={'exec':disabled, ... } This way it could be specified per-instance and likewise attested/validated along with the initrd.

@fitzthum
Contributor

@jimcadden yeah, good point: the kernel command line should be a measured interface as well, and we will probably be passing some important config information there no matter what. I am a bit wary of passing the entire config through the kernel command line, in part because I think there is a length limitation.

@magowan

magowan commented May 13, 2021

I am onboard with the idea.
Small typo in the sample :-)
blocked_enpoints = [] -> blocked_endpoints = []

@magowan

magowan commented May 13, 2021

Capturing our discussion that the delivery of the configuration file might potentially be a "plugin" point. That is, how the file gets there is a separate consideration from what might be in it and the resulting behaviours, so we can start with flexibility on delivery.

I am however fully on board with the concern that the creation/provision of such a file must be part of the flow before any container actions are taken: essentially part of the image, provided at sandbox creation, or provided as part of initial boot (attestation etc).

@magowan

magowan commented May 13, 2021

Are there permutations of blocked/allowed APIs that make no sense, for example if you can pause a container but not resume it?
It might be a large challenge to consider all permutations and how we check them; possibly consider grouping them to simplify the problem space?

@magowan

magowan commented May 13, 2021

Should we use allowed_endpoints rather than blocked_endpoints? It might be more secure but of course the tradeoff might be a little poorer user experience when creating such a file.
Perhaps a comment or link to api endpoints that can be specified helps.
And we talked about providing a sample.

Leads to backwards compatibility question perhaps? How do we decide to enable or disable the checks or do we just provide a config file which enables everything for backwards compatibility?

@sameo
Contributor Author

sameo commented May 20, 2021

I am onboard with the idea.
Small typo in the sample :-)
blocked_enpoints = [] -> blocked_endpoints = []

Fixed, thanks.

@sameo
Contributor Author

sameo commented May 20, 2021

@sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

@bergwolf Sorry for the late reply
So first of all, this issue is not about defining which APIs should be blocked/allowed for confidential computing, but rather about providing a way to define those restrictions.

Now to answer your questions:

  • list: I think this is harmless and the host has many ways to know the list of containers running on a pod already. There are no reasons to block this.
  • create: We obviously can't block that one and the idea for dynamically adding new containers to a pod is for the agent to get decryption or verification credentials from a relying party, after being attested. In other words, the host can request for a new container to be added, but the agent (which is part of the attested stack) would be able to a) verify that the remote attestation and key brokering services allow for that container image to run and b) be responsible for pulling and decrypting the container image.
  • delete: The host can already delete the whole pod if it wants to. Confidential computing is not about availability but rather about preventing the host from seeing or tampering with the tenant's data. So I think it's ok to allow for the host to delete containers, but with that configuration, it becomes a tenant's decision.

@sameo sameo self-assigned this May 20, 2021
@sameo
Contributor Author

sameo commented May 20, 2021

Should we use allowed_endpoints rather than blocked_endpoints? It might be more secure but of course the tradeoff might be a little poorer user experience when creating such a file.

I thought about blocked endpoints because I believe this list answers the "What must we not support for confidential computing?" question.
I agree an allowed_endpoints list may be safer, in particular because it would prevent new endpoints from accidentally being supported with older config files. The cost would be on the user experience side, but otoh users who are explicit about the endpoints they want to support may not be the ones that care the most about the smoothest user experience. As long as we have a very easy way to specify all endpoints (e.g. by not providing the api section at all or by setting allowed_endpoints = [*]), I'm fine switching to allowed.

Perhaps a comment or link to api endpoints that can be specified helps.
And we talked about providing a sample.

I'm opening a separate issue to discuss that (edit: This is issue #1891), as I think it's a separate discussion from allowing the agent API to be configured.

Leads to backwards compatibility question perhaps? How do we decide to enable or disable the checks or do we just provide a config file which enables everything for backwards compatibility?

I think the configuration should be optional. Without a config file the agent would run with a full API. With a config file, the API restrictions specified in that file would apply.

@bpradipt
Contributor

I think the configuration should be optional. Without a config file the agent would run with a full API. With a config file, the API restrictions specified in that file would apply.

This approach of allowing full API access when agent runs without the config file seems reasonable. Ensures the current user experience doesn't break with confidential computing changes.

@ariel-adam ariel-adam added area/confidential-containers Issues related to confidential containers (see also CCv0 branch) and removed needs-review Needs to be assessed by the team. labels May 25, 2021
@c3d
Member

c3d commented May 26, 2021

@bergwolf @sameo

How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

This aspect of things is discussed to a large extent in #1834, notably with respect to who owns this or that API.

In the case of the create API, there is an additional complication. Today, this is where the stdio channels are set up, but in a split host/tenant model, this needs to be revisited. First, the host cannot access the I/O channels. Second, those might not be created before attestation, which happens after the pod/VM has started.

@c3d
Member

c3d commented May 26, 2021

Add the existing kata-agent command line parameters to the configuration file. The command line parameters would overwrite the configuration file ones.

@sameo I don't think this is correct. In a confidential containers context, the tenant would supply the configuration file and rely on the measurements of the boot image to know that this is applied. If you allow agent command line options to override that, then the guarantee seems to be lost. Unless we find a way to add the agent command line to the measurements?

@c3d
Member

c3d commented May 26, 2021

From configuration file:

server_addr = "vsock://-1:1024"

In the context of split host/tenant trust realms like #1834 (I'm now forcing myself to no longer say "trust domain" because in the case of TDX, that's defined as the VM and not whatever else is in the same security "domain"), the agent would presumably no longer get that over vsock.

@sameo Would you agree that this could later be extended to accept https://my-realm-secure-API-server? How complicated would it be in the agent to accept commands from such other source?

@jiangliu
Contributor

We have two ways to disable some API services/code:

  1. statically disabled at compilation time
  2. dynamically disabled at runtime

And I think these two methods should be used together.

Another question, is an allowed/denied list of API services enough? Or eventually we need something like seccomp_bpf to also filter request parameters?

@jiangliu
Contributor

@sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

We can't disable all API services for confidential computing, otherwise we never need Kata for confidential computing. The only solution will be running kubelet/containerd/runc within a confidential VM. So the answer may be auditability. We should audit/log every sensitive API service request.

@sameo
Contributor Author

sameo commented May 26, 2021

@sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

We can't disable all API services for confidential computing, otherwise we never need Kata for confidential computing.

I fully agree, and the plan is certainly not to disable all API for CC. There is no need for it, and deciding which APIs to disable is also a business decision.

The only solution will be running kubelet/containerd/runc within a confidential VM.

That already exists and it's a fairly different use case.

So the answer may be auditability. We should audit/log every sensitive API service request.

Are you saying we should keep all APIs open and audit log the sensitive ones from the agent?

@sameo
Contributor Author

sameo commented May 26, 2021

We have two ways to disable some API services/code:

1. statically disabled at compilation time

2. dynamically disabled at runtime
   And I think these two methods should be used together.

We discussed it in our weekly use case meeting, and agreed that having it configurable at runtime is more flexible.
One could argue that disabling endpoints at build time is safer, but aside from the security aspects, do you see any advantage of doing it at build time over runtime?

   Another question, is an allowed/denied list of API services enough? Or eventually we need something like seccomp_bpf to also filter request parameters?

How would you map gRPC request parameters to seccomp rules?

@fitzthum
Contributor

Add the existing kata-agent command line parameters to the configuration file. The command line parameters would overwrite the configuration file ones.

@sameo I don't think this is correct. In a confidential containers context, the tenant would supply the configuration file and rely on the measurements of the boot image to know that this is applied. If you allow agent command line options to override that, then the guarantee seems to be lost. Unless we find a way to add the agent command line to the measurements?

@c3d We have some patches for OVMF and maybe QEMU that extend the SEV launch measurement to the kernel command line, kernel, and initrd. See this discussion

sameo added a commit to sameo/kata-containers that referenced this issue Sep 13, 2021
When the kernel command line includes an agent.config_file=<path> entry,
then we will try to override the default configuration values with the
ones we parse from a TOML file at <path>.

As the configuration file overrides the default values, we need to go
through a simplified builder that converts a set of Option<> fields into
the actual AgentConfig structure.

Fixes: kata-containers#1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
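To illustrate the first step of that commit message, here is a hedged sketch of extracting the agent.config_file=<path> entry from a kernel command line string. The function name is hypothetical, and letting the last occurrence win is an assumption of this example, not something the commit message states.

```rust
/// Kernel command line key named in the commit message.
const CONFIG_FILE_KEY: &str = "agent.config_file=";

/// Return the configuration file path from a kernel command line, if any.
/// Assumption: when the key appears several times, the last non-empty
/// value is used.
pub fn config_file_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline
        .split_ascii_whitespace()
        .filter_map(|param| param.strip_prefix(CONFIG_FILE_KEY))
        .filter(|path| !path.is_empty())
        .last()
}
```

In the guest, the input string would typically be the contents of /proc/cmdline; the returned path is then handed to the TOML parser.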
sameo added a commit to sameo/kata-containers that referenced this issue Sep 13, 2021
From the endpoints string described through the configuration file, we
build a hash set of blocked endpoints.
Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is, then we return ttrpc::UNIMPLEMENTED.

Fixes: kata-containers#1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
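The per-request check described in that commit message can be sketched as follows. This is a simplified stand-in, not the agent's code: `Status` is a local enum that mimics the UNIMPLEMENTED status code rather than the ttrpc crate's type, and `BlockList` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Local stand-in for the RPC status returned to the client (assumption:
/// the real agent would use the status type of its RPC library).
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    Ok,
    Unimplemented,
}

/// Hash set of blocked endpoint names, built once from the configuration.
pub struct BlockList {
    blocked: HashSet<String>,
}

impl BlockList {
    pub fn new(endpoints: &[&str]) -> Self {
        BlockList {
            blocked: endpoints.iter().map(|e| e.to_string()).collect(),
        }
    }

    /// Called at the top of every request handler with the endpoint name.
    pub fn check(&self, endpoint: &str) -> Status {
        if self.blocked.contains(endpoint) {
            Status::Unimplemented
        } else {
            Status::Ok
        }
    }
}
```

An empty block list lets every request through, which matches the `blocked_endpoints = []` default in the sample file.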
@sameo sameo moved this from To do to In progress in Issue backlog Sep 16, 2021
@dcmiddle
Copy link
Contributor

Generally an allow-list is more secure than a deny-list. To offset the usability challenge an example file could be provided with the minimum required APIs allowed, and perhaps some guidance on what other APIs should be enabled if the user experiences a failure.

This issue and an associated PR are pretty far along, though. An alternative would be to stay with the deny-list, but provide an example where all but the minimum APIs are denied (blocked). That approach is less future-proof though, i.e., if a new API is added users would need to know to go back and update their agent configurations that are in production.

I'm not clear where the agent-configuration.toml will live. As noted above it needs to be part of the measurement.

sameo added a commit to sameo/kata-containers that referenced this issue Oct 1, 2021
From the endpoints string described through the configuration file, we
build a hash set of allowed endpoints. If a configuration file does not
include an endpoints section, we assume all endpoints are allowed.

Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is not, then we return ttrpc::UNIMPLEMENTED.

Fixes: kata-containers#1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
sameo added a commit to sameo/kata-containers that referenced this issue Oct 4, 2021
From the endpoints string described through the configuration file, we
build a hash set of allowed endpoints. If a configuration file does not
include an endpoints section, we assume all endpoints are not allowed.

Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is not, then we return ttrpc::UNIMPLEMENTED.

Fixes: kata-containers#1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
sameo added a commit to sameo/kata-containers that referenced this issue Oct 4, 2021
From the endpoints string described through the configuration file, we
build a hash set of allowed endpoints. If a configuration file does not
include an endpoints section, we assume all endpoints are not allowed.
If there is no configuration file, then all endpoints are allowed.

Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is not, then we return ttrpc::UNIMPLEMENTED.

Fixes: kata-containers#1837

Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
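The three cases in this final version of the commit message (no configuration file, a file without an endpoints section, a file with an explicit list) can be sketched as below. The names `FileConfig`, `allowed_endpoints`, and `EndpointPolicy` are hypothetical, chosen for this example only.

```rust
use std::collections::HashSet;

/// Hypothetical view of a parsed configuration file. `None` means the
/// file has no endpoints section.
#[derive(Debug, Default)]
pub struct FileConfig {
    pub allowed_endpoints: Option<Vec<String>>,
}

#[derive(Debug)]
pub enum EndpointPolicy {
    /// No configuration file: the full API stays available.
    AllowAll,
    /// A configuration file exists: only listed endpoints are served.
    AllowOnly(HashSet<String>),
}

impl EndpointPolicy {
    pub fn from_file(file: Option<&FileConfig>) -> Self {
        match file {
            None => EndpointPolicy::AllowAll,
            Some(cfg) => {
                // A missing endpoints section yields an empty allow list,
                // so nothing is allowed.
                let allowed = cfg.allowed_endpoints.clone().unwrap_or_default();
                EndpointPolicy::AllowOnly(allowed.into_iter().collect())
            }
        }
    }

    /// `false` corresponds to answering the request with UNIMPLEMENTED.
    pub fn is_allowed(&self, endpoint: &str) -> bool {
        match self {
            EndpointPolicy::AllowAll => true,
            EndpointPolicy::AllowOnly(set) => set.contains(endpoint),
        }
    }
}
```

Keeping "no file" and "file without a list" as distinct cases preserves backwards compatibility for existing deployments while making any measured configuration file deny by default.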
Issue backlog automation moved this from In progress to Done Oct 7, 2021
Confidential containers automation moved this from In progress to Done Oct 7, 2021
stevenhorsman added a commit to stevenhorsman/kata-containers that referenced this issue Dec 9, 2021
- Use clap to do command line parsing
- Add option to pass in config with -c/--config

Fixes: kata-containers#1837

Co-authored-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>

Signed-off-by: stevenhorsman <steven@uk.ibm.com>