kata agent runtime configuration #1837

sameo · 2021-05-12T14:42:00Z

Problem Statement

The kata-agent communicates with the host through a gRPC API over a vsock channel. The host typically sends commands and requests to the kata-agent through this RPC mechanism.

In a confidential computing context, where the host software stack is outside the guest TCB, parts of that host <-> guest interface cannot be supported without breaking the expected confidential computing threat model.

Proposal

We propose that the kata-agent optionally use a configuration file, allowing agent behaviour to be defined more dynamically. The configuration file would be passed to the kata-agent as a command line option: kata-agent -c agent-configuration.toml

We expect this file to be added to the guest primary root filesystem (initramfs or virtio-block device). As a consequence, confidential computing technologies would include that configuration file as part of their verified TCB measurements. In other words, an attested guest could only start its kata-agent through a verified configuration file.

Goals

Although a kata-agent runtime configuration file could be used to extensively customize the agent's behaviour, the primary and initial goal of this proposal is to use it as a runtime definition of the supported agent gRPC API, so that the agent can fit the confidential computing threat model.

In other words, the goals of this proposal are:

- Allow the kata-agent binary to use an optional configuration file
- Define an API restriction section for this configuration file
- Add the existing kata-agent command line parameters to the configuration file. The command line parameters would overwrite the configuration file ones.

Configuration file format

For consistency with the other kata-containers components, we propose that the agent configuration file use the TOML format.

API section

The kata-agent configuration file would contain an [api] section. The section would then contain a blocked_endpoints entry describing the blocked gRPC endpoints as a string list.

Each entry in the blocked_endpoints list should map exactly to one of the agent gRPC service methods defined in the AgentService proto file.

The agent should fail to start if one or more entries in blocked_endpoints do not map to an AgentService method.
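The start-up check described above could be sketched roughly as follows. This is only an illustration: KNOWN_ENDPOINTS is a hypothetical, deliberately incomplete list (the authoritative one is the AgentService proto file), and validate_blocked_endpoints is a made-up helper name, not an existing agent function.

```rust
use std::collections::HashSet;

// Illustrative subset only (assumption): the authoritative list is the set
// of methods defined in the AgentService proto file.
const KNOWN_ENDPOINTS: &[&str] = &[
    "CreateContainer",
    "ExecProcess",
    "PauseContainer",
    "ResumeContainer",
    "ReseedRandomDev",
];

/// Hypothetical helper: returns the set of blocked endpoints, or an error
/// naming every entry that does not match a known method exactly.
pub fn validate_blocked_endpoints(entries: &[&str]) -> Result<HashSet<String>, String> {
    let unknown: Vec<&str> = entries
        .iter()
        .copied()
        .filter(|e| !KNOWN_ENDPOINTS.contains(e))
        .collect();

    if !unknown.is_empty() {
        // The agent would refuse to start on this error.
        return Err(format!(
            "unknown blocked_endpoints entries: {}",
            unknown.join(", ")
        ));
    }

    Ok(entries.iter().map(|e| e.to_string()).collect())
}
```

Matching is exact and case-sensitive here, so a misspelled entry is rejected at start-up rather than silently ignored.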

Sample

# Agent runtime settings.
# A configured value is overwritten when the corresponding command line option is passed.
[agent]
debug_console = false
debug_console_vport = 0
log_level = "default"
log_vport = 0
hotplug_timeout = 3
container_pipe_size = 0
server_addr = "vsock://-1:1024"
unified_cgroup_hierarchy = false

# Agent gRPC API settings.
[api]
# Blocked API endpoints.
# A list of blocked gRPC endpoints. For example:
# blocked_endpoints = ["ExecProcess", "ReseedRandomDev"]
#
# Any client request to a blocked endpoint will return an "UNIMPLEMENTED" gRPC status code.
blocked_endpoints = []

bergwolf · 2021-05-13T04:08:15Z

@sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

sameo · 2021-05-13T11:42:05Z

cc @bpradipt @Jakob-Naucke @egernst @jimcadden @c3d @devimc

fitzthum · 2021-05-13T14:04:03Z

It seems like the easiest way to bring the config file into the TCB would be to bake it into the initrd, which most platforms seem to measure or encrypt. This might increase the burden on the guest-owner, however, who would have to keep track of a new initrd measurement for each variation of the configuration file. I'm not sure how much variation there would be in practice, but I can imagine a guest owner wanting to use the same initrd for most deployments.

jimcadden · 2021-05-13T14:34:41Z

@fitzthum seems an easier way would be to pass it in via kernel parameters. Something like agent.config.api={'exec':disabled, ... } This way it could be specified per-instance and likewise attested/validated along with the initrd.

fitzthum · 2021-05-13T14:44:19Z

@jimcadden yeah, good point: the kernel command line should be a measured interface as well, and we will probably be passing some important config information there no matter what. I am a bit wary of passing the entire config through the kernel command line, in part because I think there is a length limitation.

magowan · 2021-05-13T18:43:29Z

I am onboard with the idea.
Small typo in the sample :-)
blocked_enpoints = [] -> blocked_endpoints = []

magowan · 2021-05-13T18:43:41Z

Capturing our discussion that the delivery of the configuration file might potentially be a "plugin" point. That is, how the file gets there is a separate consideration from what it contains and the resulting behaviours, so we can start with flexibility on delivery.

I am, however, fully on board with the concern that the creation/provisioning of such a file must happen before any container actions are taken: essentially as part of the image, provided at sandbox creation, or provided as part of initial boot (attestation etc.).

magowan · 2021-05-13T18:47:06Z

Are there permutations of blocked/allowed APIs that make no sense? For example, if you can pause a container but not resume it?
It might be a large challenge to consider all permutations and how we check them; possibly consider grouping them to simplify the problem space?

magowan · 2021-05-13T18:47:51Z

Should we use allowed_endpoints rather than blocked_endpoints? It might be more secure but of course the tradeoff might be a little poorer user experience when creating such a file.
Perhaps a comment or link to api endpoints that can be specified helps.
And we talked about providing a sample.

Leads to a backwards compatibility question perhaps? How do we decide to enable or disable the checks, or do we just provide a config file which enables everything for backwards compatibility?

sameo · 2021-05-20T08:47:39Z

> I am onboard with the idea.
> Small typo in the sample :-)
> blocked_enpoints = [] -> blocked_endpoints = []

Fixed, thanks.

sameo · 2021-05-20T09:15:35Z

> @sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

@bergwolf Sorry for the late reply
So first of all, this issue is not about defining which APIs should be blocked/allowed for confidential computing, but rather about providing a way to define those restrictions.

Now to answer your questions:

- list: I think this is harmless and the host has many ways to know the list of containers running on a pod already. There is no reason to block this.
- create: We obviously can't block that one, and the idea for dynamically adding new containers to a pod is for the agent to get decryption or verification credentials from a relying party, after being attested. In other words, the host can request for a new container to be added, but the agent (which is part of the attested stack) would be able to a) verify that the remote attestation and key brokering services allow for that container image to run and b) be responsible for pulling and decrypting the container image.
- delete: The host can already delete the whole pod if it wants to. Confidential computing is not about availability but rather about preventing the host from seeing or tampering with the tenant's data. So I think it's ok to allow the host to delete containers, but with that configuration, it becomes the tenant's decision.

sameo · 2021-05-20T12:38:10Z

> Should we use allowed_endpoints rather than blocked_endpoints? It might be more secure but of course the tradeoff might be a little poorer user experience when creating such a file.

I thought about blocked endpoints because I believe this list answers the "What must we not support for confidential computing?" question.
I agree that allowed_endpoints may be safer; in particular, it would prevent new endpoints from accidentally being supported with older config files. The cost would be on the user experience side but, on the other hand, users who are explicit about the endpoints they want to support may not be the ones who care most about the smoothest user experience. As long as we have a very easy way to specify all endpoints (e.g. by not providing the api section at all, or by setting allowed_endpoints = [*]), I'm fine switching to allowed.

> Perhaps a comment or link to api endpoints that can be specified helps.
> And we talked about providing a sample.

I'm opening a separate issue to discuss that (edit: This is issue #1891), as I think it's a separate discussion from allowing the agent API to be configured.

> Leads to a backwards compatibility question perhaps? How do we decide to enable or disable the checks, or do we just provide a config file which enables everything for backwards compatibility?

I think the configuration should be optional. Without a config file the agent would run with a full API. With a config file, the API restrictions specified in that file would apply.

bpradipt · 2021-05-20T13:01:22Z

> I think the configuration should be optional. Without a config file the agent would run with a full API. With a config file, the API restrictions specified in that file would apply.

This approach of allowing full API access when the agent runs without the config file seems reasonable. It ensures the current user experience doesn't break with confidential computing changes.

c3d · 2021-05-26T10:14:09Z

@bergwolf @sameo

> How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

This aspect of things is discussed to a large extent in #1834, notably with respect to who owns this or that API.

In the case of the create API, there is an additional complication. Today, this is where the stdio channels are set up, but in a split host/tenant model, this needs to be revisited. First, the host cannot access the I/O channels. Second, those might not be created before attestation, which happens after the pod/VM has started.

c3d · 2021-05-26T10:17:05Z

> Add the existing kata-agent command line parameters to the configuration file. The command line parameters would overwrite the configuration file ones.

@sameo I don't think this is correct. In a confidential containers context, the tenant would supply the configuration file and rely on the measurements of the boot image to know that this is applied. If you allow agent command line options to override that, then the guarantee seems to be lost. Unless we find a way to add the agent command line to the measurements?

c3d · 2021-05-26T10:20:32Z

From configuration file:

server_addr = "vsock://-1:1024"

In the context of split host/tenant trust realms like #1834 (I'm now forcing myself to no longer say "trust domain" because in the case of TDX, that's defined as the VM and not whatever else is in the same security "domain"), the agent would presumably no longer get that over vsock.

@sameo Would you agree that this could later be extended to accept https://my-realm-secure-API-server? How complicated would it be in the agent to accept commands from such other source?

jiangliu · 2021-05-26T12:11:17Z

We have two ways to disable some API services/code:

1. statically disabled at compilation time
2. dynamically disabled at runtime

And I think these two methods should be used together.
Another question, is an allowed/denied list of API services enough? Or eventually we need something like seccomp_bpf to also filter request parameters?

jiangliu · 2021-05-26T12:14:41Z

> @sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.

We can't disable all API services for confidential computing, otherwise we never need Kata for confidential computing. The only solution will be running kubelet/containerd/runc within a confidential VM. So the answer may be auditability. We should audit/log every sensitive API service request.

sameo · 2021-05-26T12:38:46Z

>> @sameo How do you plan to handle container list/creation/deletion APIs? They cannot be blocked otherwise confidential containers cannot be created. But if they are not blocked, the host would be able to manipulate the guest at will.
>
> We can't disable all API services for confidential computing, otherwise we never need Kata for confidential computing.

I fully agree, and the plan is certainly not to disable all API for CC. There is no need for it, and deciding which APIs to disable is also a business decision.

> The only solution will be running kubelet/containerd/runc within a confidential VM.

That already exists and it's a fairly different use case.

> So the answer may be auditability. We should audit/log every sensitive API service request.

Are you saying we should keep all APIs open and audit log the sensitive ones from the agent?

sameo · 2021-05-26T12:48:21Z

> We have two ways to disable some API services/code:
>
> 1. statically disabled at compilation time
> 2. dynamically disabled at runtime
>
> And I think these two methods should be used together.

We discussed it in our weekly use case meeting, and agreed that having it configurable at runtime is more flexible.
One could argue that disabling endpoints at build time is safer, but aside from the security aspects, do you see any advantage of doing it at build time over runtime?

> Another question, is an allowed/denied list of API services enough? Or eventually we need something like seccomp_bpf to also filter request parameters?

How would you map gRPC request parameters to seccomp rules?

fitzthum · 2021-05-26T14:14:29Z

>> Add the existing kata-agent command line parameters to the configuration file. The command line parameters would overwrite the configuration file ones.
>
> @sameo I don't think this is correct. In a confidential containers context, the tenant would supply the configuration file and rely on the measurements of the boot image to know that this is applied. If you allow agent command line options to override that, then the guarantee seems to be lost. Unless we find a way to add the agent command line to the measurements?

@c3d We have some patches for OVMF and maybe QEMU that extend the SEV launch measurement to the kernel command line, kernel, and initrd. See this discussion

When the kernel command line includes an agent.config_file=<path> entry, we try to override the default configuration values with the ones parsed from the TOML file at <path>. Because the configuration file overrides the default values, we go through a simplified builder that converts a set of Option<> fields into the actual AgentConfig structure.
Fixes: kata-containers#1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
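As a rough illustration of that commit description (not the code from the actual change), the sketch below pulls the path out of an agent.config_file=<path> kernel command line entry and folds a set of Option<> fields over defaults. The AgentConfigBuilder name, the three fields shown, and the config_file_from_cmdline helper are assumptions made for the example; the default values are borrowed from the sample file in the proposal, and TOML parsing itself is left out.

```rust
/// Final configuration. Hypothetical subset of fields, for illustration.
#[derive(Debug, PartialEq)]
pub struct AgentConfig {
    pub log_level: String,
    pub hotplug_timeout: u64,
    pub debug_console: bool,
}

impl Default for AgentConfig {
    fn default() -> Self {
        // Defaults taken from the sample configuration in the proposal.
        AgentConfig {
            log_level: "default".to_string(),
            hotplug_timeout: 3,
            debug_console: false,
        }
    }
}

/// Values read from the file; `None` means "not set in the file".
#[derive(Debug, Default)]
pub struct AgentConfigBuilder {
    pub log_level: Option<String>,
    pub hotplug_timeout: Option<u64>,
    pub debug_console: Option<bool>,
}

impl AgentConfigBuilder {
    /// Every field set in the file wins; everything else keeps its default.
    pub fn build(self) -> AgentConfig {
        let defaults = AgentConfig::default();
        AgentConfig {
            log_level: self.log_level.unwrap_or(defaults.log_level),
            hotplug_timeout: self.hotplug_timeout.unwrap_or(defaults.hotplug_timeout),
            debug_console: self.debug_console.unwrap_or(defaults.debug_console),
        }
    }
}

/// Hypothetical helper: returns the `<path>` of the first
/// `agent.config_file=<path>` entry on the kernel command line, if any.
pub fn config_file_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|param| param.strip_prefix("agent.config_file="))
}
```

Keeping the file-side values as Option<> makes "absent from the file" distinguishable from "explicitly set to the default value", which is what lets the builder apply overrides field by field.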

From the endpoints string described in the configuration file, we build a hash set of blocked endpoints. Then, for every ttrpc request, we check whether the endpoint name is in the hash set. If it is, we return the ttrpc UNIMPLEMENTED status code.
Fixes: kata-containers#1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
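The per-request check in this deny-list revision might look something like the sketch below. ApiFilter and RpcError are stand-in names invented for the example; a real agent would return the ttrpc crate's own status type rather than this local enum.

```rust
use std::collections::HashSet;

/// Stand-in for the RPC status returned to the client (assumption: the real
/// agent would use the status type from the ttrpc crate instead).
#[derive(Debug, PartialEq)]
pub enum RpcError {
    Unimplemented(String),
}

/// Hypothetical filter built once from the configuration file.
pub struct ApiFilter {
    blocked: HashSet<String>,
}

impl ApiFilter {
    pub fn new(blocked_endpoints: &[&str]) -> Self {
        ApiFilter {
            blocked: blocked_endpoints.iter().map(|e| e.to_string()).collect(),
        }
    }

    /// Called at the top of every request handler with the endpoint name.
    pub fn check(&self, endpoint: &str) -> Result<(), RpcError> {
        if self.blocked.contains(endpoint) {
            Err(RpcError::Unimplemented(endpoint.to_string()))
        } else {
            Ok(())
        }
    }
}
```

Building the set once at start-up keeps the per-request cost to a single hash lookup.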

dcmiddle · 2021-09-23T13:39:54Z

Generally an allow-list is more secure than a deny-list. To offset the usability challenge an example file could be provided with the minimum required APIs allowed, and perhaps some guidance on what other APIs should be enabled if the user experiences a failure.

This issue and an associated PR are pretty far along, though. An alternative would be to stay with the deny-list, but provide an example where all but the minimum APIs are denied (blocked). That approach is less future-proof though, i.e., if a new API is added users would need to know to go back and update their agent configurations that are in production.

I'm not clear where the agent-configuration.toml will live. As noted above it needs to be part of the measurement.

From the endpoints string described in the configuration file, we build a hash set of allowed endpoints. If a configuration file does not include an endpoints section, we assume all endpoints are allowed. Then, for every ttrpc request, we check whether the endpoint name is in the hash set. If it is not, we return the ttrpc UNIMPLEMENTED status code.
Fixes: kata-containers#1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>

From the endpoints string described in the configuration file, we build a hash set of allowed endpoints. If a configuration file does not include an endpoints section, we assume no endpoints are allowed. Then, for every ttrpc request, we check whether the endpoint name is in the hash set. If it is not, we return the ttrpc UNIMPLEMENTED status code.
Fixes: kata-containers#1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>

From the endpoints string described in the configuration file, we build a hash set of allowed endpoints. If a configuration file does not include an endpoints section, we assume no endpoints are allowed. If there is no configuration file, then all endpoints are allowed. Then, for every ttrpc request, we check whether the endpoint name is in the hash set. If it is not, we return the ttrpc UNIMPLEMENTED status code.
Fixes: kata-containers#1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
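The three cases in this last revision (no file, file without an endpoints section, file with an explicit list) are easy to mix up, so here is a small sketch of one way to model them. FileConfig, its allowed_endpoints field, EndpointPolicy and policy_from are all hypothetical names for the example and are not taken from the agent source.

```rust
use std::collections::HashSet;

/// Hypothetical view of a parsed configuration file. `None` means the file
/// has no endpoints section at all.
pub struct FileConfig {
    pub allowed_endpoints: Option<Vec<String>>,
}

pub enum EndpointPolicy {
    /// No configuration file: the full API is available.
    AllowAll,
    /// Configuration file present: only the listed endpoints are served.
    AllowOnly(HashSet<String>),
}

/// `config` is `None` when the agent was started without a configuration file.
pub fn policy_from(config: Option<&FileConfig>) -> EndpointPolicy {
    match config {
        None => EndpointPolicy::AllowAll,
        // A file without an endpoints section yields an empty allow-list,
        // i.e. nothing is allowed.
        Some(file) => EndpointPolicy::AllowOnly(
            file.allowed_endpoints
                .clone()
                .unwrap_or_default()
                .into_iter()
                .collect(),
        ),
    }
}

impl EndpointPolicy {
    pub fn is_allowed(&self, endpoint: &str) -> bool {
        match self {
            EndpointPolicy::AllowAll => true,
            EndpointPolicy::AllowOnly(allowed) => allowed.contains(endpoint),
        }
    }
}
```

Under this model, supplying a configuration file always opts in to the allow-list, and only the complete absence of a file preserves the unrestricted behaviour.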

- Use clap to do command line parsing
- Add option to pass in config with -c/--config
Fixes: kata-containers#1837
Co-authored-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>

sameo added feature New functionality needs-review Needs to be assessed by the team. labels May 12, 2021

katacontainersbot added this to To do in Issue backlog May 12, 2021

sameo added this to To do in Confidential containers via automation May 12, 2021

sameo changed the title ~~Kata agent runtime configuration~~ kata agent runtime configuration May 12, 2021

sameo self-assigned this May 20, 2021

sameo mentioned this issue May 20, 2021

[WIP] agent: API restrictions reference for confidential computing #1891

Open

ariel-adam added area/confidential-containers Issues related to confidential containers (see also CCv0 branch) and removed needs-review Needs to be assessed by the team. labels May 25, 2021

sameo moved this from To do to In progress in Issue backlog Sep 16, 2021

egernst closed this as completed in #2517 Oct 7, 2021

Issue backlog automation moved this from In progress to Done Oct 7, 2021

Confidential containers automation moved this from In progress to Done Oct 7, 2021

stevenhorsman mentioned this issue Dec 10, 2021

agent: Add command line option for configuration to kata-agent #3252

Closed

stevenhorsman mentioned this issue Mar 31, 2022

Identify and detail more non-TEE tests confidential-containers/confidential-containers#3

Closed

fitzthum mentioned this issue Nov 7, 2022

CC: Agent endpoint config logic inconsistent #5590

Closed

danmihai1 mentioned this issue Jul 20, 2023

[RFC] Proposal for Container Metadata Validation confidential-containers/confidential-containers#126

Open

18 tasks

danmihai1 mentioned this issue Aug 17, 2023

consider replacing the Agent Config feature with the Agent Policy #7678

Open

danmihai1 mentioned this issue Oct 13, 2023

remove endpoint blocking ability from agent-config.toml #8228

Closed
