-
Notifications
You must be signed in to change notification settings - Fork 14k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #45821 from saschagrunert/blog-cri-streaming-expla…
…ined Add blog post about: CRI streaming explained
- Loading branch information
Showing
1 changed file
with
338 additions
and
0 deletions.
There are no files selected for viewing
338 changes: 338 additions & 0 deletions
338
content/en/blog/_posts/2024-05-01-cri-streaming-explained.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,338 @@ | ||
--- | ||
layout: blog | ||
title: "Container Runtime Interface streaming explained" | ||
date: 2024-05-01 | ||
slug: cri-streaming-explained | ||
author: Sascha Grunert | ||
--- | ||
|
||
The Kubernetes [Container Runtime Interface (CRI)](/docs/concepts/architecture/cri) | ||
acts as the main connection between the [kubelet](/docs/reference/command-line-tools-reference/kubelet) | ||
and the [Container Runtime](/docs/setup/production-environment/container-runtimes). | ||
Those runtimes have to provide a [gRPC](https://grpc.io) server which has to | ||
fulfill a Kubernetes defined [Protocol Buffer](https://protobuf.dev) interface. | ||
[This API definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto) | ||
evolves over time, for example when contributors add new features or fields are | ||
going to become deprecated. | ||
|
||
In this blog post, I'd like to dive into the functionality and history of three | ||
extraordinary Remote Procedure Calls (RPCs), which are truly outstanding in | ||
terms of how they work: `Exec`, `Attach` and `PortForward`. | ||
|
||
**Exec** can be used to run dedicated commands within the container and stream | ||
the output to a client like [kubectl](/docs/reference/kubectl) or | ||
[crictl](/docs/tasks/debug/debug-cluster/crictl). It also allows interaction with | ||
that process using standard input (stdin), for example if users want to run a | ||
new shell instance within an existing workload. | ||
|
||
**Attach** streams the output of the currently running process via [standard I/O](https://en.wikipedia.org/wiki/Standard_streams) | ||
from the container to the client and also allows interaction with them. This is | ||
particularly useful if users want to see what is going on in the container and | ||
be able to interact with the process. | ||
|
||
**PortForward** can be utilized to forward a port from the host to the container | ||
to be able to interact with it using third party network tools. This allows it | ||
to bypass [Kubernetes services](/docs/concepts/services-networking/service) | ||
for a certain workload and interact with its network interface. | ||
|
||
## What is so special about them? | ||
|
||
All RPCs of the CRI either use the [gRPC unary calls](https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc) | ||
for communication or the [server side streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc) | ||
feature (only `GetContainerEvents` right now). This means that mainly all RPCs | ||
retrieve a single client request and have to return a single server response. | ||
The same applies to `Exec`, `Attach`, and `PortForward`, where their [protocol definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto#L94-L99) | ||
looks like this: | ||
|
||
```protobuf | ||
// Exec prepares a streaming endpoint to execute a command in the container. | ||
rpc Exec(ExecRequest) returns (ExecResponse) {} | ||
``` | ||
|
||
```protobuf | ||
// Attach prepares a streaming endpoint to attach to a running container. | ||
rpc Attach(AttachRequest) returns (AttachResponse) {} | ||
``` | ||
|
||
```protobuf | ||
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox. | ||
rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {} | ||
``` | ||
|
||
The requests carry everything required to allow the server to do the work, | ||
for example, the `ContainerId` or command (`Cmd`) to be run in case of `Exec`. | ||
More interestingly, all of their responses only contain a `url`: | ||
|
||
```protobuf | ||
message ExecResponse { | ||
// Fully qualified URL of the exec streaming server. | ||
string url = 1; | ||
} | ||
``` | ||
|
||
```protobuf | ||
message AttachResponse { | ||
// Fully qualified URL of the attach streaming server. | ||
string url = 1; | ||
} | ||
``` | ||
|
||
```protobuf | ||
message PortForwardResponse { | ||
// Fully qualified URL of the port-forward streaming server. | ||
string url = 1; | ||
} | ||
``` | ||
|
||
Why is it implemented like that? Well, [the original design document](https://docs.google.com/document/d/1MreuHzNvkBW6q7o_zehm1CBOBof3shbtMTGtUpjpRmY) | ||
for those RPCs even predates [Kubernetes Enhancements Proposals (KEPs)](https://github.com/kubernetes/enhancements) | ||
and was originally outlined back in 2016. The kubelet had a native | ||
implementation for `Exec`, `Attach`, and `PortForward` before the | ||
initiative to bring the functionality to the CRI started. Before that, | ||
everything was bound to [Docker](https://www.docker.com) or the later abandoned | ||
container runtime [rkt](https://github.com/rkt/rkt). | ||
|
||
The CRI related design document also elaborates on the option to use native RPC | ||
streaming for exec, attach, and port forward. The downsides outweighed this | ||
approach: the kubelet would still create a network bottleneck and future | ||
runtimes would not be free in choosing the server implementation details. Also, | ||
another option that the Kubelet implements a portable, runtime-agnostic solution | ||
has been abandoned over the final one, because this would mean another project | ||
to maintain which nevertheless would be runtime dependent. | ||
|
||
This means, that the basic flow for `Exec`, `Attach` and `PortForward` | ||
was proposed to look like this: | ||
|
||
{{< mermaid >}} | ||
sequenceDiagram | ||
participant crictl | ||
participant kubectl | ||
participant API as API Server | ||
participant kubelet | ||
participant runtime as Container Runtime | ||
participant streaming as Streaming Server | ||
alt Client alternatives | ||
Note over kubelet,runtime: Container Runtime Interface (CRI) | ||
kubectl->>API: exec, attach, port-forward | ||
API->>kubelet: | ||
kubelet->>runtime: Exec, Attach, PortForward | ||
else | ||
Note over crictl,runtime: Container Runtime Interface (CRI) | ||
crictl->>runtime: Exec, Attach, PortForward | ||
end | ||
runtime->>streaming: New Session | ||
streaming->>runtime: HTTP endpoint (URL) | ||
alt Client alternatives | ||
runtime->>kubelet: Response URL | ||
kubelet->>API: | ||
API-->>streaming: Connection upgrade (SPDY or WebSocket) | ||
streaming-)API: Stream data | ||
API-)kubectl: Stream data | ||
else | ||
runtime->>crictl: Response URL | ||
crictl-->>streaming: Connection upgrade (SPDY or WebSocket) | ||
streaming-)crictl: Stream data | ||
end | ||
{{< /mermaid >}} | ||
|
||
Clients like crictl or the kubelet (via kubectl) request a new exec, attach or | ||
port forward session from the runtime using the gRPC interface. The runtime | ||
implements a streaming server that also manages the active sessions. This | ||
streaming server provides an HTTP endpoint for the client to connect to. The | ||
client upgrades the connection to use the [SPDY](https://en.wikipedia.org/wiki/SPDY) | ||
streaming protocol or (in the future) to a [WebSocket](https://en.wikipedia.org/wiki/WebSocket) | ||
connection and starts to stream the data back and forth. | ||
|
||
This implementation allows runtimes to have the flexibility to implement | ||
`Exec`, `Attach` and `PortForward` the way they want, and also allows a | ||
simple test path. Runtimes can change the underlying implementation to support | ||
any kind of feature without having a need to modify the CRI at all. | ||
|
||
Many smaller enhancements to this overall approach have been merged into | ||
Kubernetes in the past years, but the general pattern has always stayed the | ||
same. The kubelet source code transformed into [a reusable library](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming), | ||
which is nowadays usable from container runtimes to implement the basic | ||
streaming capability. | ||
|
||
## How does the streaming actually work? | ||
|
||
At a first glance, it looks like all three RPCs work the same way, but that's | ||
not the case. It's possible to group the functionality of **Exec** and | ||
**Attach**, while **PortForward** follows a distinct internal protocol | ||
definition. | ||
|
||
### Exec and Attach | ||
|
||
Kubernetes defines **Exec** and **Attach** as _remote commands_, where its | ||
protocol definition exists in [five different versions](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L28-L52): | ||
|
||
| # | Version | Note | | ||
| --- | ------------------- | ---------------------------------------------------------------------------------------------------------------------- | | ||
| 1 | `channel.k8s.io` | Initial (unversioned) SPDY sub protocol ([#13394](https://issues.k8s.io/13394), [#13395](https://issues.k8s.io/13395)) | | ||
| 2 | `v2.channel.k8s.io` | Resolves the issues present in the first version ([#15961](https://github.com/kubernetes/kubernetes/pull/15961)) | | ||
| 3 | `v3.channel.k8s.io` | Adds support for resizing container terminals ([#25273](https://github.com/kubernetes/kubernetes/pull/25273)) | | ||
| 4 | `v4.channel.k8s.io` | Adds support for exit codes using JSON errors ([#26541](https://github.com/kubernetes/kubernetes/pull/26541)) | | ||
| 5 | `v5.channel.k8s.io` | Adds support for a CLOSE signal ([#119157](https://github.com/kubernetes/kubernetes/pull/119157)) | | ||
|
||
On top of that, there is an overall effort to replace the SPDY transport | ||
protocol using WebSockets as part [KEP #4006](https://github.com/kubernetes/enhancements/issues/4006). | ||
Runtimes have to satisfy those protocols over their life cycle to stay up to | ||
date with the Kubernetes implementation. | ||
|
||
Let's assume that a client uses the latest (`v5`) version of the protocol as | ||
well as communicating over WebSockets. In that case, the general flow would be: | ||
|
||
1. The client requests an URL endpoint for **Exec** or **Attach** using the CRI. | ||
|
||
- The server (runtime) validates the request, inserts it into a connection | ||
tracking cache, and provides the HTTP endpoint URL for that request. | ||
|
||
1. The client connects to that URL, upgrades the connection to establish | ||
a WebSocket, and starts to stream data. | ||
|
||
- In the case of **Attach**, the server has to stream the main container process | ||
data to the client. | ||
- In the case of **Exec**, the server has to create the subprocess command within | ||
the container and then streams the output to the client. | ||
|
||
If stdin is required, then the server needs to listen for that as well and | ||
redirect it to the corresponding process. | ||
|
||
Interpreting data for the defined protocol is fairly simple: The first | ||
byte of every input and output packet [defines](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L57-L64) | ||
the actual stream: | ||
|
||
| First Byte | Type | Description | | ||
| ---------- | --------------- | ---------------------------------------- | | ||
| `0` | standard input | Data streamed from stdin | | ||
| `1` | standard output | Data streamed to stdout | | ||
| `2` | standard error | Data streamed to stderr | | ||
| `3` | stream error | A streaming error occurred | | ||
| `4` | stream resize | A terminal resize event | | ||
| `255` | stream close | Stream should be closed (for WebSockets) | | ||
|
||
How should runtimes now implement the streaming server methods for **Exec** and | ||
**Attach** by using the provided kubelet library? The key is that the streaming | ||
server implementation in the kubelet [outlines an interface](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming/server.go#L63-L68) | ||
called `Runtime` which has to be fulfilled by the actual container runtime if it | ||
wants to use that library: | ||
|
||
```go | ||
// Runtime is the interface to execute the commands and provide the streams. | ||
type Runtime interface { | ||
Exec(ctx context.Context, containerID string, cmd []string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error | ||
Attach(ctx context.Context, containerID string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error | ||
PortForward(ctx context.Context, podSandboxID string, port int32, stream io.ReadWriteCloser) error | ||
} | ||
``` | ||
|
||
Everything related to the protocol interpretation is | ||
already in place and runtimes only have to implement the actual `Exec` and | ||
`Attach` logic. For example, the container runtime [CRI-O](https://github.com/cri-o/cri-o) | ||
does it [like this pseudo code](https://github.com/cri-o/cri-o/blob/2a0867/server/container_exec.go#L27-L46): | ||
|
||
```go | ||
func (s StreamService) Exec( | ||
ctx context.Context, | ||
containerID string, | ||
cmd []string, | ||
stdin io.Reader, stdout, stderr io.WriteCloser, | ||
tty bool, | ||
resizeChan <-chan remotecommand.TerminalSize, | ||
) error { | ||
// Retrieve the container by the provided containerID | ||
// … | ||
|
||
// Update the container status and verify that the workload is running | ||
// … | ||
|
||
// Execute the command and stream the data | ||
return s.runtimeServer.Runtime().ExecContainer( | ||
s.ctx, c, cmd, stdin, stdout, stderr, tty, resizeChan, | ||
) | ||
} | ||
``` | ||
|
||
### PortForward | ||
|
||
Forwarding ports to a container works a bit differently when comparing it to | ||
streaming IO data from a workload. The server still has to provide a URL | ||
endpoint for the client to connect to, but then the container runtime has to | ||
enter the network namespace of the container, allocate the port as well as | ||
stream the data back and forth. There is no simple protocol definition available | ||
like for **Exec** or **Attach**. This means that the client will stream the | ||
plain SPDY frames (with or without an additional WebSocket connection) which can | ||
be interpreted using libraries like [moby/spdystream](https://github.com/moby/spdystream). | ||
|
||
Luckily, the kubelet library already provides the `PortForward` interface method | ||
which has to be implemented by the runtime. [CRI-O does that]() by (simplified): | ||
|
||
```go | ||
func (s StreamService) PortForward( | ||
ctx context.Context, | ||
podSandboxID string, | ||
port int32, | ||
stream io.ReadWriteCloser, | ||
) error { | ||
// Retrieve the pod sandbox by the provided podSandboxID | ||
sandboxID, err := s.runtimeServer.PodIDIndex().Get(podSandboxID) | ||
sb := s.runtimeServer.GetSandbox(sandboxID) | ||
// … | ||
|
||
// Get the network namespace path on disk for that sandbox | ||
netNsPath := sb.NetNsPath() | ||
// … | ||
|
||
// Enter the network namespace and stream the data | ||
return s.runtimeServer.Runtime().PortForwardContainer( | ||
ctx, sb.InfraContainer(), netNsPath, port, stream, | ||
) | ||
} | ||
``` | ||
|
||
## Future work | ||
|
||
The flexibility Kubernetes provides for the RPCs `Exec`, `Attach` and | ||
`PortForward` is truly outstanding compared to other methods. Nevertheless, | ||
container runtimes have to keep up with the latest and greatest implementations | ||
to support those features in a meaningful way. The general effort to support | ||
WebSockets is not only a plain Kubernetes thing, it also has to be supported by | ||
container runtimes as well as clients like `crictl`. | ||
|
||
For example, `crictl` v1.30 features a new `--transport` flag for the | ||
subcommands `exec`, `attach` and `port-forward` | ||
([#1383](https://github.com/kubernetes-sigs/cri-tools/pull/1383), | ||
[#1385](https://github.com/kubernetes-sigs/cri-tools/pull/1385)) | ||
to allow choosing between `websocket` and `spdy`. | ||
|
||
CRI-O is going an experimental path by moving the streaming server | ||
implementation into [conmon-rs](https://github.com/containers/conmon-rs) | ||
(a substitute for the container monitor [conmon](https://github.com/containers/conmon)). conmon-rs is | ||
a [Rust](https://www.rust-lang.org) implementation of the original container | ||
monitor and allows streaming WebSockets directly using supported libraries | ||
([#2070](https://github.com/containers/conmon-rs/pull/2070)). The major benefit | ||
of this approach is that CRI-O does not even have to be running while conmon-rs | ||
can keep active **Exec**, **Attach** and **PortForward** sessions open. The | ||
simplified flow when using crictl directly will then look like this: | ||
|
||
{{< mermaid >}} | ||
sequenceDiagram | ||
autonumber | ||
participant crictl | ||
participant runtime as Container Runtime | ||
participant conmon-rs | ||
Note over crictl,runtime: Container Runtime Interface (CRI) | ||
crictl->>runtime: Exec, Attach, PortForward | ||
Note over runtime,conmon-rs: Cap’n Proto | ||
runtime->>conmon-rs: Serve Exec, Attach, PortForward | ||
conmon-rs->>runtime: HTTP endpoint (URL) | ||
runtime->>crictl: Response URL | ||
crictl-->>conmon-rs: Connection upgrade to WebSocket | ||
conmon-rs-)crictl: Stream data | ||
{{< /mermaid >}} | ||
|
||
All of those enhancements require iterative design decisions, while the original | ||
well-conceived implementation acts as the foundation for those. I really hope | ||
you've enjoyed this compact journey through the history of CRI RPCs. Feel free | ||
to reach out to me anytime for suggestions or feedback using the | ||
[official Kubernetes Slack](https://kubernetes.slack.com/team/U53SUDBD4). |