[WIP] Node network Part1 #499

Closed · rmohr wants to merge 24 commits from rmohr:node-network

Conversation

@rmohr (Member) commented Oct 9, 2017

Create a macvtap-based "nodeNetwork". This special nodeNetwork is the same network the kubelet uses to register its node. It allows a nested VM to potentially also register itself as a Kubernetes node.

Partially implements #367

@rmohr rmohr requested review from davidvossel and vladikr Oct 9, 2017
@rmohr rmohr force-pushed the rmohr:node-network branch 2 times, most recently from b2476c2 to 199bfb3 Oct 10, 2017
@davidvossel (Member) left a comment

Great work @rmohr

I made an initial pass over the changes and left a few comments.


func handleErr(err error) {
	if err != nil {
		fmt.Println(err)

@davidvossel (Member), Oct 10, 2017:

print to stderr here instead?
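
For reference, a minimal sketch of what printing to stderr could look like (just an illustration, not necessarily the final shape of handleErr):

package main

import (
	"errors"
	"fmt"
	"os"
)

// handleErr, sketched so that errors go to stderr instead of stdout.
func handleErr(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}

func main() {
	handleErr(errors.New("example error")) // written to stderr
}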

	Device: link.Name,
	Mode:   "bridge",
}
vm.Spec.Domain.Devices.Interfaces[i] = *newIf

@davidvossel (Member), Oct 10, 2017:

This change to the VM spec seems like it could conflict with migrations. If the VM is migrated, the interface won't get re-detected because it has been converted to type "direct".

	toolDir string
}

func NewIntrospector(toolDir string) IntrospectorInterface {

@davidvossel (Member), Oct 10, 2017:

maybe fail fast with a fatal error if we detect the network helper binary isn't present?
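
A rough sketch of the fail-fast idea, assuming the helper binaries live under toolDir; the guard function and the "network-helper" binary name are hypothetical, not part of this PR:

package networking

import (
	"log"
	"os"
	"path/filepath"
)

// ensureHelperPresent could be called from NewIntrospector to abort early
// when the expected helper binary is missing from toolDir.
// "network-helper" is only a placeholder name.
func ensureHelperPresent(toolDir string) {
	helper := filepath.Join(toolDir, "network-helper")
	if _, err := os.Stat(helper); err != nil {
		log.Fatalf("network helper binary not found at %s: %v", helper, err)
	}
}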


Context("VirtualMachine With nodeNetwork definition given", func() {

It("should connect via DHCP to the node network", func() {

@davidvossel (Member), Oct 10, 2017:

Since this test is primarily exercising egress connectivity, I think it would pass even with the previous default bridge network.

I'd like to see an ingress test that validates the VM is accessible within the node network.

@davidvossel (Member) commented Oct 10, 2017

@rmohr You marked this as "part1". Can you outline what you'd like to do in this iteration and what you'd like to follow up on with "part2"?

@rmohr rmohr force-pushed the rmohr:node-network branch from 199bfb3 to a4a49b3 Oct 16, 2017
@rmohr rmohr changed the title Node network Part1 [WIP] Node network Part1 Oct 16, 2017
@rmohr (Member, Author) commented Oct 17, 2017

@vladikr moving the whole network code out of virt-handler is now done. Next thing is code cleanup and adding more tests.

@rmohr rmohr changed the title [WIP] Node network Part1 [WIP] Node network Part1 [skip ci] Oct 18, 2017
@rmohr rmohr force-pushed the rmohr:node-network branch 2 times, most recently from a7e7947 to 2037dbd Oct 18, 2017
@davidvossel (Member) left a comment

From a high level, I really like what I'm seeing here. I need to give this some more thought. The things I'm still contemplating revolve around the use of the libvirt domain metadata and the local persistent file store used by CNI.

Have you given any thought about how we would migrate a VM's IPs during live migration?

hostPID: true
initContainers:
- name: containernetworking-cni-plugins
  image: rmohr/cni-plugins

@davidvossel (Member), Oct 18, 2017:

can we get this container into the kubevirt repo? I was super confused about where the dhcp plugin came from until I finally realized you had a special image you were using.

- name: dhcp
  image: {{ docker_prefix }}/virt-handler:{{ docker_tag }}
  command: ["/bin/sh", "-c"]
  args: ["rm -rf /run/cni/dhcp.sock && nsenter -t 1 -n /tools/plugins/dhcp daemon"]

@davidvossel (Member), Oct 18, 2017:

ha, I can't believe that usage of nsenter works. Basically you want the dhcp cni plugin daemon to run in the host network namespace, but there are other containers in the pod you don't want in the host network namespace.

I think we'll want to do this differently in the end.

@davidvossel (Member), Oct 18, 2017:

also, this daemon is smart enough to continue to renew dhcp leases if it gets restarted, right?

@rmohr (Member, Author), Oct 19, 2017:

Regarding nsenter: we can just run the whole pod in the node network namespace; this is an artifact from a previous stage where I had the dhcp daemon attached to the virt-handler pod. Will change that.

Regarding the smartness of the dhcp daemon: right now it does not. I started to prepare patches to do that.

"github.com/containernetworking/plugins/pkg/ipam"
)

const socketPath = "/run/cni/dhcp.sock"

@davidvossel (Member), Oct 18, 2017:

fyi, I think this is left over from some skeleton code you pulled in from container networking.

@rmohr (Member, Author) commented Oct 19, 2017

> Have you given any thought about how we would migrate a VM's IPs during live migration?

I see a few possibilities there, which depend on a few factors:

  • Do we want to introduce something like ADD_MIGRATE/DEL_MIGRATE?
  • Is the networking solution synchronized across node boundaries?
  • If the networking solution is not synchronized, do we want to move some logic into virt-handler to allow handovers as a KubeVirt core functionality?
  • Whether the migration mode is post-copy or pre-copy

It is not yet clear to me what I would want to pursue.

@rmohr rmohr changed the title [WIP] Node network Part1 [skip ci] [WIP] Node network Part1 Oct 19, 2017
@rmohr rmohr force-pushed the rmohr:node-network branch 3 times, most recently from 91763df to a2d0198 Oct 19, 2017
@rmohr (Member, Author) commented Oct 19, 2017

retest this please

	types.NetConf
	IPAM struct {
		Type string `json:"type,omitempty"`
		Via  string `json:"via,omitempoty"`

@stu-gott (Member), Oct 19, 2017:

typo: omitempty?

	cacheDir := pflag.String("cache-dir", "/var/lib/cni/networks", "Location where the CNI plugins store their state")
	pflag.StringVar(&hostname, "hostname-override", "", "Kubernetes Pod to monitor for changes")

	if hostname == "" {

@stu-gott (Member), Oct 19, 2017:

Parse() hasn't been called yet.
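
To illustrate: the flag values only become visible once pflag.Parse() has run, so the check needs to come after it. A minimal sketch (flag names copied from the snippet above; the hostname fallback is an assumption for this sketch):

package main

import (
	"os"

	"github.com/spf13/pflag"
)

func main() {
	var hostname string
	cacheDir := pflag.String("cache-dir", "/var/lib/cni/networks", "Location where the CNI plugins store their state")
	pflag.StringVar(&hostname, "hostname-override", "", "Kubernetes Pod to monitor for changes")

	// Parse before reading any flag values.
	pflag.Parse()

	if hostname == "" {
		// Falling back to the node hostname is an assumption, not the PR's behaviour.
		hostname, _ = os.Hostname()
	}
	_ = cacheDir
}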

	cnitool := networking.NewCNITool(*toolsDir, *cniDir, *cniConfigDir)

	// Let's check if we need to adjust the node network
	iface, err := networkIntrospector.GetLinkByName("kubevirt0", 1)

@stu-gott (Member), Oct 19, 2017:

Can "kubevirt0" be a constant?

// No device means that we either deal with a node restart or the first deployment.
// Delete the CNI plugin cache. It is impossible that any VM can run at this point and
// we are not allowed to reuse anything
err := os.RemoveAll(*cacheDir)

@stu-gott (Member), Oct 19, 2017:

This also deletes the cache directory itself. By the looks of it, that's a system-level path like "/var/lib/cni/networks"...
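
One way to address this, sketched under the assumption that only the directory's contents should be cleared while the system-level directory itself is kept; the helper name is hypothetical:

package networking

import (
	"io/ioutil"
	"os"
	"path/filepath"
)

// clearCacheDir removes everything inside cacheDir but leaves the
// directory itself in place.
func clearCacheDir(cacheDir string) error {
	entries, err := ioutil.ReadDir(cacheDir)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing to clear
		}
		return err
	}
	for _, entry := range entries {
		if err := os.RemoveAll(filepath.Join(cacheDir, entry.Name())); err != nil {
			return err
		}
	}
	return nil
}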

CmdDel = "del"
)

func parseArgs(args string) ([][2]string, error) {

@stu-gott (Member), Oct 19, 2017:

Is this code copy/pasted? Could it be a common utility function?

func SetNetConfMaster(cniConfigDir, name, master, via string) error {
	b, err := ioutil.ReadFile(cniConfigDir + "/" + name)
	if err != nil {
		return fmt.Errorf("error reading config file %s", name)

@stu-gott (Member), Oct 19, 2017:

Is it helpful to include the original err object in this message?
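
For illustration, the underlying error could be appended to the message. Since this predates Go 1.13 error wrapping, %v is used; the reduced helper below is hypothetical and only mirrors the read step of SetNetConfMaster:

package networking

import (
	"fmt"
	"io/ioutil"
)

// readNetConf shows how the original read error can be surfaced alongside
// the file name instead of being dropped.
func readNetConf(cniConfigDir, name string) ([]byte, error) {
	b, err := ioutil.ReadFile(cniConfigDir + "/" + name)
	if err != nil {
		return nil, fmt.Errorf("error reading config file %s: %v", name, err)
	}
	return b, nil
}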

func GetDockerTag() string {
	dockerTag := os.Getenv("docker_test_tag")
	if dockerTag == "" {
		dockerTag = "devel"

@stu-gott (Member), Oct 19, 2017:

Is this always going to be true?

return &job
return tests.RenderJob("vm-killer", tests.GetDockerTag(),
[]string{
"pkill",

@stu-gott (Member), Oct 19, 2017:

In what PID namespace is this pkill command run? Can this kill processes outside the test suite?

Mode: "bridge",
}
newIf.MAC = &api.MAC{MAC: result.Interfaces[0].Mac}
domain.Metadata.Interfaces.Interfaces[idx].IP = result.IPs[0].Address.IP.String()

@davidvossel (Member), Oct 20, 2017:

@rmohr It took me a while to understand exactly what is happening here, but I think I get it all now. There are two DHCP IP address requests going on in this current design.

  1. The local host's dhcp client daemon which attempts to reserve an IP for the mac address used by a VM.

  2. The VM guest itself will request an IP over DHCP using the same MAC during initialization.

From our discussion earlier, the concern we have is that we aren't 100% guaranteed that the IP addresses returned by requests 1 and 2 will be identical. The two requests/responses are identical in our Vagrant setups, but that's not a guarantee we can make across all networks.

The solution you and @vladikr are working on would involve a local DHCP server that would respond to guest VM DHCP requests. Essentially, the KubeVirt runtime would be asking the network for an IP address for a specific MAC, then we'd use a local DHCP server to pass that IP to the guest VM in order to ensure the right IP is used for the guest.

Is my understanding correct here?

rmohr added 7 commits Oct 6, 2017
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Accompany virt-handler with the CNI dhcp lease daemon. This allows
reusing the macvlan CNI plugin with the dhcp IPAM module.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
rmohr added 17 commits Oct 11, 2017
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Save metadata before we call the CNI plugins, do delete calls based on
the metadata.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
If no MAC is specified on the device, generate a MAC in the plugin and
pass it to the IPAM plugin.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Make sure that the kernel does not add a default route for the whole
subnet to the interface.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Use the store to ensure route clean up and to re-use already acquired
macs, if no specific macs are requested.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Delete dhcp client socket on every pod start.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
CNI plugins and configs are initiated and provided by an external
network provider to virt-handler. The external provider also performs
the initial setup of the host. virt-handler now only invokes CNI add/del
operations on the provided binaries and configs.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
* Fix volume mount error
* Instead of using nsenter, run everything in the node network namespace

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
@rmohr rmohr force-pushed the rmohr:node-network branch from 4093a8c to 9510466 Nov 7, 2017
@rmohr rmohr closed this Jan 29, 2018