
runtime-rs: Introduce directly attachable network #6091

Closed
wants to merge 3 commits

Conversation

justxuewei
Member

@justxuewei justxuewei commented Jan 17, 2023

As previously discussed in #1922, Kata Containers, being VM-based containers, are allowed to run in the host netns; the network can instead be isolated at L2. Network performance benefits from this architecture, which eliminates as many hops as possible. We call it a Directly Attachable Network (DAN for short).

The network devices are placed in the host netns by the CNI plugins. The configs are saved as JSON at {dan_conf}/{sandbox_id}.json, including the device name, type, and network info. At this early stage, DAN only supports host tap devices. More devices, such as DPDK, will be supported in later versions. A CNI plugin named "dantap" can set up a bridge and a tap device in the host netns; please refer to dantap.

Fixes: #1922

Signed-off-by: Xuewei Niu niuxuewei.nxw@antgroup.com

@katacontainersbot katacontainersbot added the size/huge (Largest and most complex task, probably needs breaking into small pieces) label Jan 17, 2023
@katacontainersbot
Contributor

Can one of the admins verify this patch?

@justxuewei justxuewei force-pushed the feat/host-network branch 3 times, most recently from 1faa599 to 29b81f2 on January 17, 2023 05:18
}

#[derive(Clone, Debug, Deserialize)]
pub struct IPAddress {

Member Author

I used the types at first, but then I changed my mind, since they are different things: the types defined in the agent crate ("agent types" for short) are used to communicate with the agent, while the types defined in this pull request ("DAN types" for short) are used to deserialize the DAN config. Although the two kinds of types look similar at this time, the DAN types' structure might change if the CNI plugins require it. Besides, the DAN types have different behaviors from the agent types, such as different default values. In summary, they should stay independent, but the DAN types can be converted to the agent types.
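
For illustration, a minimal sketch of what keeping the two kinds of types independent but convertible could look like (the type and field names here are simplified assumptions, not the actual definitions in this PR):

```rust
use serde::Deserialize;

// Hypothetical DAN-side type, deserialized from the DAN config file.
#[derive(Clone, Debug, Deserialize)]
pub struct DanIpAddress {
    pub address: String,
    pub prefix: u32,
}

// Hypothetical agent-side type used when talking to the agent.
#[derive(Clone, Debug)]
pub struct AgentIpAddress {
    pub address: String,
    pub mask: String,
}

// The DAN type can evolve with the CNI plugins and still be converted
// into the agent type when the device is finally passed to the agent.
impl From<DanIpAddress> for AgentIpAddress {
    fn from(ip: DanIpAddress) -> Self {
        AgentIpAddress {
            address: ip.address,
            mask: ip.prefix.to_string(),
        }
    }
}
```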

/// plugins, is used to tell the Kata containers what devices are attached
/// to the hypervisor.
#[serde(default = "default_dan_conf")]
pub dan_conf: String,

Member

Is this path configurable? If not, I think it's better to use a const.

Member Author

Yes, dan_conf is the base directory of the configs. It should be consistent with the CNI plugins: for example, if the CNI plugins save the configs under /tmp/dans, dan_conf should be set to that path as well.
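
For example, a minimal sketch of how the runtime could compose the per-sandbox config path from this base directory (the helper name is hypothetical):

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: the DAN config for a sandbox lives at
// {dan_conf}/{sandbox_id}.json, so `dan_conf` must point to the same
// directory the CNI plugins write to (e.g. /tmp/dans).
fn dan_config_path(dan_conf: &str, sandbox_id: &str) -> PathBuf {
    Path::new(dan_conf).join(format!("{}.json", sandbox_id))
}
```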

Member

So I think you could add this field to configuration.toml.in and the Makefile, or expose it as an annotation. Then it will be configurable.

Member Author

Fixed.

src/runtime-rs/crates/resource/src/manager.rs (outdated review comments, resolved)
@justxuewei
Member Author

Thanks for your code review, @Tim-0731-Hzt! I've pushed a new commit to address the issues mentioned above, PTAL!

@justxuewei justxuewei force-pushed the feat/host-network branch 3 times, most recently from e9fcf6a to 9f6c111 on January 18, 2023 07:25
@justxuewei
Member Author

@liubin @bergwolf Could you help me run the integrated tests? Thanks!

@Tim-0731-Hzt
Member

/test

@studychao
Member

Hi @justxuewei, could you help rebase this PR? Then I could help to review.

@justxuewei
Member Author

> Hi @justxuewei, could you help rebase this PR? Then I could help to review.

Of course, I'll rebase this ASAP.

@YushuoEdge
Contributor

Hi @justxuewei, maybe I don't understand well enough, so please forgive me if I'm not asking a proper question. AFAIK, full support of this feature requires the cooperation of containerd or CNI. Can you provide information about whether the current containerd or any CNI plugins have related support, and what support they would need to add?

@justxuewei
Member Author

justxuewei commented Mar 24, 2023

> Hi @justxuewei, maybe I don't understand well enough, so please forgive me if I'm not asking a proper question. AFAIK, full support of this feature requires the cooperation of containerd or CNI. Can you provide information about whether the current containerd or any CNI plugins have related support, and what support they would need to add?

Yes, the CNI plugins need to be adapted to use this feature. Containerd is completely compatible so far, but it would leave an unused netns behind. One of the goals of DAN is to let Kata containers connect to networks through devices in the host, instead of in a netns, to eliminate data copy costs. The current CNI plugins place devices into a netns designed for runC containers; that is why the CNI plugins must be adapted. I have provided an example CNI plugin called "dantap" that creates a tap device in the host netns.

Next, I'll explain how the CNI plugins work. The standard CNI plugins place the devices into a netns, and the Kata containers then scan and attach all devices in that netns. This is the typical way of passing networking information between the CNI and the containers. What if we don't have a netns? In that case, the plugins have to write the network device information into a JSON file on the filesystem, like the one below. Please refer to the DirectlyAttachableNetworkDevice structure for more details.

[
  {
    "name": "dantap0",
    "type": "tap",
    "dev_conf": {...},
    "network_info": {...}
  }
]

The Kata containers read the file and know which devices need to be attached.
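
As a rough sketch of how such a file could be deserialized (the struct below only mirrors the JSON above; it is not the actual DirectlyAttachableNetworkDevice definition, and the opaque dev_conf/network_info payloads are kept as raw JSON values):

```rust
use serde::Deserialize;

// Illustrative mirror of the JSON example above, not the real definition.
#[derive(Debug, Deserialize)]
struct DanDevice {
    name: String,
    #[serde(rename = "type")]
    device_type: String,
    dev_conf: serde_json::Value,
    network_info: serde_json::Value,
}

// Read {dan_conf}/{sandbox_id}.json and list the devices to attach.
fn load_dan_devices(path: &std::path::Path) -> anyhow::Result<Vec<DanDevice>> {
    let content = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&content)?)
}
```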

Please feel free to let me know if you have any questions about this.

@YushuoEdge
Contributor

> Yes, the CNI plugins need to be adapted to use this feature. […]

I would like to know whether there are already some CNI plugins that use this feature. Or do you plan to raise PRs to the mainstream CNI plugins to make them compatible with it?

BTW, I have looked through the discussion on the issue page, and it seems what @bergwolf proposed is easier to integrate and get accepted by the CNI plugins (for me 🤣)

@justxuewei
Member Author

justxuewei commented Mar 25, 2023

> I would like to know whether there are already some CNI plugins that use this feature. Or do you plan to raise PRs to the mainstream CNI plugins to make them compatible with it?
>
> BTW, I have looked through the discussion on the issue page, and it seems what @bergwolf proposed is easier to integrate and get accepted by the CNI plugins (for me 🤣)

The answer is no, since all existing CNI plugins are built for runC. A new type of plugin is needed to set up networks for VM-based containers. I think we should establish a standard first, and then push forward with the CNI plugins that are interested in this.

This pull request completely follows the architecture proposed by @bergwolf. However, one key point that the architecture doesn't cover is how to tell the containers which devices need to be attached.

Let me give you a more detailed example. Assume we have two containers named "container0" and "container1". Each container has an exclusive tap device, named "tap0" and "tap1" respectively. Those tap devices are placed in the host netns; in other words, the host netns contains both "tap0" and "tap1".

// The ideal situation
container0 <--> tap0 <--> Internet
container1 <--> tap1 <--> Internet

Without any other information, how could the containers know which device belongs to them? They would just scan the host netns, and each container would attach to all devices. This breaks the "exclusive" constraint and would be a total disaster.

// The reality if the constraints are not applied 
container0 <--> tap0 <--> Internet
         ^----> tap1 <--> Internet
         ^----> yet-another-tap-device-in-the-host
container1 <--> tap0 <--> Internet
         ^----> tap1 <--> Internet
         ^----> yet-another-tap-device-in-the-host

So we must tell "container0": you should only connect to "tap0", and tell "container1": you should only connect to "tap1". That's why we need those JSON files. The architecture is unchanged; we are just giving the containers some extra networking information.

I admit that CNI plugins targeting VM-based containers need to be adapted, but it is worth doing. With this architecture, Kata containers could support DPDK in the future, not just host tap devices:

  • New feature (devices in the host) --> New CNI plugins
  • Classical ways (devices in their netns) --> Compatible with existing CNI plugins
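
To make that split concrete, a hypothetical check (assuming the runtime falls back to the classical netns scan whenever no DAN config exists for the sandbox; the function name is made up):

```rust
use std::path::Path;

// Hypothetical: if the CNI plugins wrote {dan_conf}/{sandbox_id}.json,
// attach the devices listed there (DAN mode); otherwise scan the
// sandbox's netns as with existing CNI plugins.
fn use_dan(dan_conf: &str, sandbox_id: &str) -> bool {
    Path::new(dan_conf)
        .join(format!("{}.json", sandbox_id))
        .exists()
}
```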

@YushuoEdge
Contributor

> The answer is no, since all existing CNI plugins are built for runC. […]

Fair enough! Thank you very much for your detailed explanation!

As previously discussed in
kata-containers#1922, Kata Containers,
being VM-based containers, are allowed to run in the host netns; the network
can instead be isolated at L2. Network performance benefits from this
architecture, which eliminates as many hops as possible. We call it a
Directly Attachable Network (DAN for short).

The network devices are placed in the host netns by the CNI plugins. The
configs are saved as JSON at `{dan_conf}/{sandbox_id}.json`, including the
device name, type, and network info. At this early stage, DAN only supports
host tap devices. More devices, such as DPDK, will be supported in later
versions. A CNI plugin named "dantap" can set up a bridge and a tap device
in the host netns; please refer to
[dantap](https://github.com/justxuewei/cni-plugins/tree/feat/dan/plugins/main/dantap).

Fixes: kata-containers#1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>

Removed useless methods from the Network trait. Moved checks of disabling
netns from the sandbox to the runtime handler manager.

Fixes: kata-containers#1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>

Added the dan_config field to the USER_VAR of the Makefile.

Fixes: kata-containers#1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
@justxuewei
Member Author

Hi @studychao @YushuoEdge, I've just rebased the branch onto the latest main branch. Could you help me run the CI testing?

@YushuoEdge
Contributor

/test

@justxuewei justxuewei closed this Jul 19, 2023
Labels
runtime-rs, size/huge (Largest and most complex task, probably needs breaking into small pieces)
Development

Successfully merging this pull request may close these issues.

[RFC] Direct Attachable CNIs For Kata Containers
5 participants