[RFC] Direct Attachable CNIs For Kata Containers #1922
Comments
Stefano Brivio is working on the following user space solution that could also be relevant for kata: https://passt.top/passt/about/

A few details about this: the general idea (not ready for consumption or demo at this stage) is to add vhost-user ring support to it and provide full network capabilities (except for custom IP protocols) without requiring special privileges, bypassing qemu on the data path and making rootless containers possible (at least as far as network capabilities are concerned). This is relatively independent of whether a further network namespace is present. Keeping it allows for easier configuration and for operating rootless containers much as slirp4netns does, but dropping the separate network namespace on the host should indeed have significant potential to improve packet transfer performance.
Some comments/questions:
There are two motivations in the original proposal:
When security is a concern, we can just create a fake netns when CNI doesn't create one, as you suggested in the comments. I don't see a need for equivalent security solutions. Do you have other concern/usecase in mind?
It is CNI agnostic in the sense that the API extension does not bind to any specific CNI implementation, and any CNI can support it if it wants to. ALL existing CNIs create a netns for pods at the moment, so I would not rush to implement a fake netns in kata at the current stage.
This is actually why I was asking: should -- but do you have some data? Note that the network stack isn't traversed multiple times because of that -- indeed there must be overhead, I just wonder if we already have some indications of it.
Wait, sorry, my comment was probably not clear then. By "faking a network namespace" I mean pretending there is one, when in fact there's none. In any case, if it's fake, it serves no purpose, so it shouldn't affect security negatively nor positively.
Say ARP spoofing. With the current solution implementing a separate network namespace, we have a rather convenient place (tc filter interface between tap and veth) to drop or forward frames based on their Ethernet addresses. If the tap interface moves outside of a separate network namespace, this possibility is gone.
If the CNI needs to know about this, then I would still argue that it's not CNI agnostic. :) But yes, I understand your point, it can be implemented by multiple CNIs, it doesn't require a specific CNI to be implemented for it.
I meant almost the opposite: given that CNIs need a separate network namespace, we keep it, but we might optionally not use it (that's what makes it "fake"), and the CNI doesn't even know about it (that's what makes it agnostic). |
|
Here is the performance test result between
Note: the extra TC mirror overhead is not tested |
|
AFAIK, containerd instead of CNI created the netns https://github.com/containerd/cri/blob/master/pkg/server/sandbox_run.go#L122 |
|
After some tests with a tap device: once the tap device is moved into a netns, OVS can no longer find it and loses its link to it.
Upgrade from v0.20.0 to v1.0.0-RC3.
Fixes kata-containers#2591
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
@oilbeater Then how about we don't move it to the netns, but instead add a new |
|
By the way, some months ago, I worked on an alternative approach using [passt](https://passt.top). This diagram (only visible on a bright background, open image separately if needed) outlines the approach, and I prepared a PoC at https://passt.top/passt/tree/contrib/kata-containers. Mere days after I finished that patch, it no longer applied because of a rework in kata-runtime. This approach has the advantage of being agnostic to the underlying architecture and of not needing root, and it should be comparable to a veth pair performance-wise. I'm not active on this at the moment, I just left this behind: https://bugs.passt.top/show_bug.cgi?id=26. If somebody happens to play with this, I'll be more than happy to support them.
|
@sbrivio-rh Thanks for the nice diagrams. Here my main focus is to let Kata integrate with different network devices (tap, vfio, vhost-user etc.) that may be provided by some VM-native CNIs (such as kubeovn). The main change of the proposal is about telling CNI and containerd to provide these network devices (and related nic setup information) when configured properly. passt looks to be a sound solution to unify vmm network interfaces. It can be complementary to the above direct attachable NIC proposal if it is able to handle tap/vfio/vhost-user devices. Can you elaborate a bit on that part? |
|
@bergwolf with passt you don't need a tap device, that's why I'm mentioning it here. You can have whatever interface you want inside the guest, as long as it can connect to a qemu socket back-end. That won't work for vfio, indeed. About vhost-user: we're working on adding it as a back-end, for performance reasons. But this is not so relevant in this sense -- the idea is just to abstract away the network interface between guest and host by making it pretty much transparent. The CNI can then do whatever it needs to do while ignoring the fact that there's a virtual machine. |
|
@sbrivio-rh We constantly get user requests to have tap/vfio/vhost-user/physical-nic etc. support in Kata Containers (for performance reasons obviously). While we can support it with some additional control path (such as the kata-runtime network subcommand), it is always our goal to solve it within the context of CNI. That's why I'm looking at the issue from time to time and would like to push it forward constantly. As for passt, IIUC, it replaces the TC+tap solution. But TC+tap also works for all CNI implementations. The CNI does not need to know if it is runc or kata running there. It seems that the advantage of passt is to allow rootless containers(including rootless crio/containerd/kata-runtime), am I understanding it correctly? It seems that passt is solving a different problem than the direct attachable CNI proposal here. |
Correct, that was the primary, failed (for the moment) goal. However,
...I was thinking that it might also help with this, because it transforms the Kata Containers case into something that looks like a host container, and for that (I guess) you already have full CNI support for the other cases requested by users. But indeed, it wouldn't work with VFIO, so, if that's the current focus, my suggestion doesn't really apply. In terms of performance, passt might also become useful from this perspective the day it gains native vhost-user support, but that's not available at the moment.
It sounds like the host network mode. As for now, kubelet will not call CNI if the pod runs with hostnetwork set to true. If we skip kubelet and set it from containerd side, the containerd can pass the host netns to CNI, and the CNI can just follow the usual process but at this time it will not move the tap to other ns. |
|
Hi community, I would like to propose a feasible approach for this issue. As discussed in this thread and #4914, tap devices in the host netns and vhost-user sockets should be attachable to the hypervisor as well. Setting up dummy devices in the netns, as Kata Containers 2.0 does, is not a good idea. We use a file instead of dummy devices to exchange the network information, for flexibility and extensibility.

The first question: where are those network configs stored? The name of the netns identifies the network configs of a pod. Assumed that we have a netns, named

The second question: what is the format of the network configs? At least two types are to be supported: vhost-user sockets and tap devices in the host netns. The content of the file is a JSON array, as shown in the following.

[{
"name": "example",
// the possible values are "vhost-user" or "host-tap"
"type": "vhost-user",
// !!todo: to be determined
"dev_conf": {
"path": "/var/run/openvswitch/vhost_sockets/sock"
},
"network_info": {
"interface": {
"ip_addrs": [{
"family": "v4",
"address": "192.168.1.1",
"mask": "16"
}],
"device": "device_name",
"hardware_addr": "xx:xx:xx:xx:xx",
"mtu": 1500,
"raw_flags": "0x11"
},
"routers": [{
"dest": "172.18.0.0/16",
"src": "172.18.0.1",
"gw": "172.18.31.1",
"scope": 1
}],
"neighbors": [{
"to_ip_addr": {
"family": "v4",
"address": "192.168.1.1",
"mask": "16"
},
"state": 0,
"flags": 0,
"hardware_addr": "xx:xx:xx:xx:xx"
}]
}
}]

The last question: how does Kata Containers collaborate with the CNI plugins to set up the network? The CNI plugins should set up the network devices and save the network configs to the path mentioned above. Before scanning the network devices in the netns, Kata Containers will load the JSON config, if the file exists, and set up the corresponding endpoints. The process of setting up the network looks like this:
|
We may use RuntimeConf.ContainerID instead of the netns, so that we can support networks without creating a new netns.

// A RuntimeConf holds the arguments to one invocation of a CNI plugin
// excepting the network configuration, with the nested exception that
// the `runtimeConfig` from the network configuration is included
// here.
type RuntimeConf struct {
ContainerID string
NetNS string
IfName string |
Sounds great to me. One of the important things is to define the fields of the For "vhost-user", it might look like this. where the For "host-tap", it might look like this. where the |
As previously discussed at kata-containers#1922, Kata Containers, being VM-based containers, are allowed to run in the host netns. That is, the network can be isolated at L2. Network performance benefits from this architecture, which eliminates as many hops as possible. We call it a Directly Attachable Network (DAN for short).

The network devices are placed in the host netns by the CNI plugins. The configs are saved at `{dan_conf}/{sandbox_id}.json` in JSON format, including device name, type, and network info. At this very early stage, DAN only supports host tap devices. More devices, such as DPDK, will be supported in later versions.

In addition, a CNI plugin named "dantap" can set up a bridge and a tap device in the host netns. Please refer to [dantap](https://github.com/justxuewei/cni-plugins/tree/feat/dan/plugins/main/dantap).

Fixes: kata-containers#1922
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
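The lookup described in this commit message can be sketched roughly as follows. This is only an illustration in Python, not the runtime's actual code: the helper names `dan_config_path` and `load_dan_config` are hypothetical, and returning `None` for a missing file (so the caller can fall back to its regular netns scan) is an assumption based on the earlier comment that the JSON config is loaded only "if the file exists".

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def dan_config_path(dan_conf: str, sandbox_id: str) -> Path:
    """Build {dan_conf}/{sandbox_id}.json (hypothetical helper)."""
    return Path(dan_conf) / f"{sandbox_id}.json"


def load_dan_config(dan_conf: str, sandbox_id: str) -> Optional[list]:
    """Return the parsed device list, or None when no DAN file exists.

    None is an assumed signal for "fall back to scanning the netns".
    """
    path = dan_config_path(dan_conf, sandbox_id)
    if not path.is_file():
        return None
    with path.open() as f:
        devices = json.load(f)
    if not isinstance(devices, list):
        raise ValueError(f"{path}: expected a JSON array of devices")
    return devices


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dan_conf:
        # No file yet: the caller would use its regular netns scan.
        print(load_dan_config(dan_conf, "sandbox-1"))

        # A CNI plugin would write the file before the sandbox starts.
        dan_config_path(dan_conf, "sandbox-1").write_text(
            json.dumps([{"name": "eth0",
                         "device": {"type": "host-tap", "name": "dantap0"}}])
        )
        print(load_dan_config(dan_conf, "sandbox-1"))
```

Keying the file on the sandbox ID rather than on a netns name means the lookup still works when no new netns is created at all.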
Removed useless methods from the Network trait. Moved checks of disabling netns from the sandbox to the runtime handler manager. Fixes: kata-containers#1922 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Added the dan_config field to the USER_VAR of the Makefile. Fixes: kata-containers#1922 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Hi, it seems that when using the 'vhost-user' network type, the 'dev_conf' section in the configuration JSON file should also include 'queue_num' and 'queue_size' fields. Additionally, a 'mode' field is also expected if we want to configure the frontend as either a client or server. Could you please add the above support? |
@mzweilz Thanks for the information! Here are two JSON templates, for vhost-user and host-tap respectively. Please feel free to let me know if you have any questions.

[{
"device": {
"type": "vhost-user",
"path": "/var/run/openvswitch/vhost_sockets/sock",
"queue_num": 1,
"queue_size": 1,
// the possible values are "server" and "client", the default is "server"
"mode": "server"
},
"network_info": {
"interface": {
"ip_addresses": ["192.168.0.1/16", "192.168.0.3/16"],
"mtu": 1500,
"flags": 0,
"ntype": "tuntap"
},
"routers": [{
"dest": "172.18.0.0/16",
"source": "172.18.0.1",
"gateway": "172.18.31.1",
"scope": 0,
"flags": 0
}],
"neighbors": [{
"ip_address": "192.168.0.3/16",
"state": 0,
"flags": 0,
"hardware_addr": "xx:xx:xx:xx:xx"
}]
}
},
{
"device": {
"type": "host-tap",
"name": "dantap0"
},
"network_info": {
// snipped
}
}]
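To make the two templates concrete, here is a minimal Python sketch of how a consumer might check a `device` object and apply the stated default for `mode`. The function name `normalize_device` is hypothetical, and treating `path`, `queue_num`, `queue_size` and `name` as required is an assumption; the thread only lists these fields and gives `"server"` as the default mode.

```python
from typing import Any, Dict

VHOST_USER_MODES = ("server", "client")


def normalize_device(device: Dict[str, Any]) -> Dict[str, Any]:
    """Check one "device" object against the two templates and fill defaults."""
    dev_type = device.get("type")
    if dev_type == "vhost-user":
        # Assumed to be required; the templates do not say so explicitly.
        for key in ("path", "queue_num", "queue_size"):
            if key not in device:
                raise ValueError(f"vhost-user device is missing {key!r}")
        mode = device.get("mode", "server")  # default given in the template
        if mode not in VHOST_USER_MODES:
            raise ValueError(f"unknown vhost-user mode {mode!r}")
        return {**device, "mode": mode}
    if dev_type == "host-tap":
        if "name" not in device:
            raise ValueError("host-tap device is missing 'name'")
        return dict(device)
    raise ValueError(f"unsupported device type {dev_type!r}")


if __name__ == "__main__":
    print(normalize_device({
        "type": "vhost-user",
        "path": "/var/run/openvswitch/vhost_sockets/sock",
        "queue_num": 1,
        "queue_size": 1,
    }))
    print(normalize_device({"type": "host-tap", "name": "dantap0"}))
```

Rejecting unknown types early keeps a typo in `type` from silently producing a sandbox without a network device.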
Kata Containers, being VM-based containers, are allowed to run in the host
netns. That is, the network can be isolated at L2. Network performance
benefits from this architecture, which eliminates as many hops as
possible. We call it a Directly Attachable Network (DAN for short).
The network devices are placed in the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in JSON format,
including device name, type, and network info. At this very early stage,
DAN only supports host tap devices. More devices, such as DPDK, will
be supported in later versions.
The file format looks like this:
```json
[{
"name": "eth0",
"guest_mac": "xx:xx:xx:xx:xx",
"device": {
"type": "vhost-user",
"path": "/tmp/test",
"queue_num": 1,
"queue_size": 1
},
"network_info": {
"interface": {
"ip_addresses": ["192.168.0.1/24"],
"mtu": 1500,
"ntype": "tuntap",
"flags": 0
},
"routes": [{
"dest": "172.18.0.0/16",
"source": "172.18.0.1",
"gateway": "172.18.31.1",
"scope": 0,
"flags": 0
}],
"neighbors": [{
"ip_address": "192.168.0.3/16",
"device": "",
"state": 0,
"flags": 0,
"hardware_addr": "xx:xx:xx:xx:xx"
}]
}
}]
```
Fixes: kata-containers#1922
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
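As a rough illustration of how the address fields in this format can be read, the following Python sketch parses the CIDR strings with the standard `ipaddress` module. The helper names `interface_addresses` and `route_destinations` are hypothetical, and the sample entry is trimmed to the fields the sketch uses. `ip_interface` is used for `ip_addresses` because those strings carry host bits (for example `192.168.0.1/24`), which `ip_network` would reject in its default strict mode.

```python
import ipaddress
from typing import Any, Dict, List, Tuple


def interface_addresses(entry: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (address, prefix length) pairs for one DAN entry."""
    iface = entry["network_info"]["interface"]
    parsed = [ipaddress.ip_interface(a) for a in iface["ip_addresses"]]
    return [(str(p.ip), p.network.prefixlen) for p in parsed]


def route_destinations(entry: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (network address, prefix length) pairs for the entry's routes."""
    nets = [ipaddress.ip_network(r["dest"]) for r in entry["network_info"]["routes"]]
    return [(str(n.network_address), n.prefixlen) for n in nets]


# Trimmed copy of the example above; only the fields used here are kept.
entry = {
    "name": "eth0",
    "network_info": {
        "interface": {"ip_addresses": ["192.168.0.1/24"], "mtu": 1500},
        "routes": [{"dest": "172.18.0.0/16", "gateway": "172.18.31.1"}],
    },
}

if __name__ == "__main__":
    print(interface_addresses(entry))
    print(route_destinations(entry))
```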
Kata Containers, as VM-based containers, are allowed to run in the host
netns. That is, the network can be isolated at L2. Network
performance benefits from this architecture, which eliminates as many
hops as possible. We call it a Directly Attachable Network (DAN for
short).
The network devices are placed in the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in JSON format,
including device name, type, and network info. At this initial stage,
DAN only supports host tap devices. More devices, such as DPDK, will
be supported in later versions.
The file format looks like this:
```json
{
"netns": "/path/to/netns",
"devices": [{
"name": "eth0",
"guest_mac": "xx:xx:xx:xx:xx",
"device": {
"type": "vhost-user",
"path": "/tmp/test",
"queue_num": 1,
"queue_size": 1
},
"network_info": {
"interface": {
"ip_addresses": ["192.168.0.1/24"],
"mtu": 1500,
"ntype": "tuntap",
"flags": 0
},
"routes": [{
"dest": "172.18.0.0/16",
"source": "172.18.0.1",
"gateway": "172.18.31.1",
"scope": 0,
"flags": 0
}],
"neighbors": [{
"ip_address": "192.168.0.3/16",
"device": "",
"state": 0,
"flags": 0,
"hardware_addr": "xx:xx:xx:xx:xx"
}]
}
}]
}
```
Fixes: kata-containers#1922
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
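For the object-style config above (a `netns` path plus a `devices` array), the lookup by sandbox ID could be sketched as follows. This is a minimal sketch under stated assumptions: `dan_config_path`, `parse_dan_config`, and `load_dan_config` are hypothetical helpers, and the directory passed as `dan_conf` is whatever the deployment configures; none of these names come from Kata itself.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def dan_config_path(dan_conf: str, sandbox_id: str) -> Path:
    """Build {dan_conf}/{sandbox_id}.json as described in the commit message."""
    return Path(dan_conf) / f"{sandbox_id}.json"


def parse_dan_config(text: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (netns, devices) from the object-style DAN config."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object with 'netns' and 'devices'")
    devices = config["devices"]
    if not isinstance(devices, list):
        raise ValueError("'devices' must be a JSON array")
    return config["netns"], devices


def load_dan_config(dan_conf: str, sandbox_id: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Read and parse the config file for one sandbox."""
    path = dan_config_path(dan_conf, sandbox_id)
    return parse_dan_config(path.read_text(encoding="utf-8"))


# Trimmed copy of the sample from the commit message.
EXAMPLE = """
{
  "netns": "/path/to/netns",
  "devices": [{
    "name": "eth0",
    "guest_mac": "xx:xx:xx:xx:xx",
    "device": {"type": "vhost-user", "path": "/tmp/test",
               "queue_num": 1, "queue_size": 1},
    "network_info": {"interface": {"ip_addresses": ["192.168.0.1/24"],
                                   "mtu": 1500, "ntype": "tuntap", "flags": 0},
                     "routes": [], "neighbors": []}
  }]
}
"""

if __name__ == "__main__":
    netns, devices = parse_dan_config(EXAMPLE)
    print(netns, [d["name"] for d in devices])
```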



Background
Kata Containers is an open source container runtime, building lightweight virtual machines that seamlessly plug into the containers ecosystem. It aims to bring the speed of a container and the security of a virtual machine to its users.
As Kata Containers matures, how it interacts with Kubernetes CNI and connects to the outside network has become increasingly important. This issue covers the current status of the Kata Containers networking model, its pros and cons, and a proposal to improve it further.
This effectively revives kata-containers/runtime#592, with a better explanation of why we need it and how it can be implemented.
Status
A classic CNI deployment would result in a networking model like the one below:

Here, a pod sits inside a network namespace and connects to the outside world via a veth pair. To work with this networking model, Kata Containers has implemented a TC-based networking model.


Inside the pod network namespace, a tap device tap0_kata is created, and Kata sets up TC mirror rules to copy packets between eth0 and tap0_kata. The eth0 device is a veth pair endpoint, and its peer is a veth device attached to the host bridge. So the data flow looks like:
As we can see, there are as many as five jumps on the host before a packet can reach the guest. These network stack jumps are costly, and the architecture needs to be simplified.
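To make the TC step above more concrete, the following Python sketch builds (but does not run) the kind of `tc` commands that forward traffic in both directions between `eth0` and `tap0_kata`. It is an assumption-laden illustration: it uses an ingress qdisc plus a catch-all `u32` filter with the `mirred egress redirect` action, which is standard `tc` syntax, but the exact rules Kata installs are not shown in this issue. `tc_forward_commands` is a hypothetical helper name.

```python
from typing import List


def tc_forward_commands(veth: str = "eth0", tap: str = "tap0_kata") -> List[List[str]]:
    """Return argv lists for tc rules forwarding traffic veth <-> tap.

    Assumption: one ingress qdisc and one catch-all filter per direction,
    using the mirred action to redirect packets to the other device.
    """
    commands = []
    for src, dst in ((veth, tap), (tap, veth)):
        # Attach an ingress qdisc so filters can see packets arriving on src.
        commands.append(
            ["tc", "qdisc", "add", "dev", src, "handle", "ffff:", "ingress"]
        )
        # Match every packet and hand it to dst's egress path.
        commands.append(
            ["tc", "filter", "add", "dev", src, "parent", "ffff:",
             "protocol", "all", "u32", "match", "u32", "0", "0",
             "action", "mirred", "egress", "redirect", "dev", dst]
        )
    return commands


if __name__ == "__main__":
    for argv in tc_forward_commands():
        print(" ".join(argv))
```

Each of these rules adds processing on the host path, which is the overhead the proposal below tries to avoid.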
Proposal
We can see that all Kata needs is a tap device on the host, and it doesn't care how it is created (be it a tuntap, an OVS tap, an ipvtap, or a macvtap). So we can create a simpler architecture and use tap devices (or similar devices) as the pod network setup entry point rather than veth pairs. Something like:

With this architecture, we can remove the need for a host network namespace and the veth pair that connects through it. And since we don't care how the tap device is created, CNI plugins can still keep their implementation details hidden from us.
A possible control flow for the direct attachable CNIs:

To make it work, there are a few changes to CNI, containerd and Kata:
disable_cni_netns and the direct attachable network capability of the underlying runtime
net_tap_device_capable = true, net_vhost_user_device_capable, net_vfio_device_capable, and net_vfio_user_device_capable as well).
"tapDevice": true, (and "vhostUserDevice": true etc.) in the capabilities field.
ADD command adds options like "tapDevice": true (and "vhostUserDevice": true etc.) in the runtimeConfig field.
ADD command result adds a plugin-specific result field like (for vhost-user/vfio/vfio-user only):
ADD command result) to kata in the very first task Create API of a pod.
Example Workflow for CNI+Containerd+Kata in the Tap Device Case
runtimeConfig to include
ADD command with empty NetNS argument and with the following PluginArgs argument: (FIXME: how do we handle IP allocation with kubeovn?)
ADD result to Kata in the task CreateTaskRequest struct.
tap10 to the vmm and sets it up in the guest via provided IP/route/mac/dns information
Related Projects
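The capability handshake described above follows the usual CNI convention: a plugin declares keys under `capabilities`, and the runtime fills `runtimeConfig` only for the keys the plugin declared. A minimal Python sketch of that step is shown here. The plugin name, network name, and the `build_add_config` helper are hypothetical; only the `tapDevice` and `vhostUserDevice` keys come from the proposal text.

```python
import copy
import json
from typing import Any, Dict


def build_add_config(plugin_conf: Dict[str, Any],
                     runtime_supported: Dict[str, Any]) -> Dict[str, Any]:
    """Return the config passed to ADD, with runtimeConfig filled in.

    Only capabilities that the plugin declares as true and that the runtime
    supports are copied into runtimeConfig. The input is not modified.
    """
    conf = copy.deepcopy(plugin_conf)
    declared = conf.get("capabilities", {})
    runtime_config = {
        key: value
        for key, value in runtime_supported.items()
        if declared.get(key)
    }
    if runtime_config:
        conf["runtimeConfig"] = runtime_config
    return conf


if __name__ == "__main__":
    # Hypothetical plugin config declaring that it can create tap devices.
    plugin = {
        "cniVersion": "0.4.0",
        "name": "example-dan-net",
        "type": "example-tap-plugin",
        "capabilities": {"tapDevice": True},
    }
    # What the runtime (for example Kata via containerd) says it can attach.
    supported = {"tapDevice": True, "vhostUserDevice": True}
    print(json.dumps(build_add_config(plugin, supported), indent=2))
```

In this sketch `vhostUserDevice` is dropped because the plugin did not declare it, so a plugin never receives options it does not understand.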
There are a few CNI projects that are capable of creating tap devices, and thus can benefit from the direct attachable CNIs. Most notably those built upon OVS:
/cc @egernst @amshinde @fidencio