
Configure pod network for launcher from virt-handler #2837

Merged
58 commits merged into kubevirt:master on Feb 22, 2020

Conversation

booxter
Contributor

@booxter booxter commented Oct 24, 2019

What this PR does / why we need it: this PR moves the networking configuration phase from virt-launcher to virt-handler. This should eventually allow removing NET_ADMIN from launcher pods, once libvirtd is capable of operating in unprivileged mode.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer:

Release note:

The launcher pod network is now configured by virt-handler. Only the DHCP server is still managed by virt-launcher.

TODO:
1. Make bridge, masquerade, slirp, and sriov interfaces boot with all network configuration (except DHCP) happening in virt-handler.
2. Fix unit test failures.
3. Fix functional test failures.

Not in scope of this PR: moving the DHCP server to the handler (it has its own complexities that are better considered separately: cleanup, management of the thread, etc.) and removing NET_ADMIN from virt-launcher (this requires an updated libvirt but should be easy overall).

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. size/XXL kind/build-change Categorizes PRs as related to changing build files of virt-* components labels Oct 24, 2019
@booxter
Contributor Author

booxter commented Oct 24, 2019

/hold

This is NOT ready. This is merely a start, but it allows booting a VMI using masquerade pod networking and getting it up (even if without a DHCP-allocated address yet). More patches are coming.

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2019
Contributor

@alonSadan alonSadan left a comment


one small thing

Review comment on pkg/virt-handler/vm.go (outdated, resolved)
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 26, 2019
@SchSeba SchSeba self-requested a review November 3, 2019 15:31
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 11, 2019
@kubevirt-bot kubevirt-bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 14, 2019
@booxter
Contributor Author

booxter commented Nov 14, 2019

The latest upload fixes the remaining issues with masquerade mode, where the DHCP server was advertising incorrect default gateway routes that broke external connectivity. (The issue was that the router's IP address was interpreted not as an IPv4 string of bytes but as an IPv6 string of bytes, which resulted in gateways set to values like "0.0.0.0" and "0.0.255.255" in addition to the correct IPv4 address.) This upload also fixes a segmentation fault in bridge mode.

Overall, this version is now at a point where VMIs can be booted and can reach outside.

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Nov 14, 2019
@booxter booxter force-pushed the net-admin-in-handler branch 2 times, most recently from 661a9ce to dd79a52 Compare November 19, 2019 21:42
@booxter
Contributor Author

booxter commented Nov 19, 2019

@phoracek @qinqon this PR got close enough for me to not be completely embarrassed by its shape. If you could find some time to check it, I would appreciate it. Let me know what you think about where this is going. Note this is not a complete solution for removing NET_ADMIN from launcher pods, just one piece. The other pieces are: move the DHCP server out of the launcher; validate that the latest RHEL libvirt with backports allows preconfiguring a tap device for qemu; and finally, remove NET_ADMIN from the launcher.

This is just a first step that moves most network operations, but not everything, into handler.

@booxter
Contributor Author

booxter commented Nov 19, 2019

/hold

Need to fix the generated files (apparently my make generate produced something different from what's in CI), and unit test coverage is probably also affected because phase2 is not triggered from the relevant test files.

@rmohr
Member

rmohr commented Nov 20, 2019

@booxter first thanks for tackling this difficult topic.

Other pieces are: move DHCP server out of launcher

Regarding DHCP, this is an interesting sub-problem. I am thinking here about situations where virt-handler is not present, e.g. because it is being updated. We would then probably not have a DHCP server available for any of the VMIs at that time.

@booxter
Contributor Author

booxter commented Nov 20, 2019

One other thing to consider re: this PR is that right now all launchers will share storage with the VIF / interface cache files, which is an information leak. Perhaps the launcher template should have a separate, specific mount point for a directory shared with the handler that contains only the files belonging to that particular launcher.

But if we eventually merge phase2 into phase1 and leave the whole network configuration to the handler, then there is no need for a shared directory at all, and this becomes a moot point.

@booxter
Contributor Author

booxter commented Feb 20, 2020

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2020
Member

@davidvossel davidvossel left a comment


I'm good with this once we get the OS thread locking thing fixed. Like I said in a previous comment, I'm okay with merging the unoptimized solution and following up with an optimization afterwards. This PR is already so complex that I don't see any reason to block it any further.

@booxter
Contributor Author

booxter commented Feb 20, 2020

Is there some new issue with Travis? It failed on both platforms; one failure seems completely unrelated, and the other is in unit tests, though I couldn't reproduce it. Is there a way to re-trigger the job? I don't have credentials to do it on Travis myself.

@vladikr
Member

vladikr commented Feb 21, 2020

Is there some new issue with Travis? It failed on both platforms; one failure seems completely unrelated, and the other is in unit tests, though I couldn't reproduce it. Is there a way to re-trigger the job? I don't have credentials to do it on Travis myself.

Not sure why it failed. I've restarted the job.

@openshift-ci-robot
Collaborator

@booxter: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e 4196d7a link /test e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@davidvossel
Member

Is there some new issue with Travis? It failed on both platforms; one failure seems completely unrelated, and the other is in unit tests, though I couldn't reproduce it. Is there a way to re-trigger the job? I don't have credentials to do it on Travis myself.

I re-triggered it.

@rmohr
Member

rmohr commented Feb 21, 2020

/lgtm
/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 21, 2020
@rmohr
Member

rmohr commented Feb 21, 2020

/override ci/prow/e2e

that one should not be reported.

@kubevirt-bot
Contributor

@rmohr: Overrode contexts on behalf of rmohr: ci/prow/e2e

In response to this:

/override ci/prow/e2e

that one should not be reported.


@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rmohr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2020
@rmohr
Member

rmohr commented Feb 21, 2020

/retest pull-kubevirt-e2e-k8s-1.15.1-ceph

@rmohr
Member

rmohr commented Feb 21, 2020

/retest

@vladikr
Member

vladikr commented Feb 21, 2020

/retest

@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot
Contributor

kubevirt-bot commented Feb 21, 2020

@booxter: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.17.0 4b5c610 link /test pull-kubevirt-e2e-k8s-1.17.0
pull-kubevirt-e2e-k8s-1.14 4196d7a link /test pull-kubevirt-e2e-k8s-1.14
pull-kubevirt-e2e-k8s-1.15-ceph 4196d7a link /test pull-kubevirt-e2e-k8s-1.15-ceph
pull-kubevirt-e2e-k8s-1.16 4196d7a link /test pull-kubevirt-e2e-k8s-1.16


Member

@davidvossel davidvossel left a comment


/lgtm

Super happy to see this going in. Great work @booxter!

@davidvossel
Member

/test pull-kubevirt-e2e-k8s-1.15.1-ceph

@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit 58faac5 into kubevirt:master Feb 22, 2020
@rmohr
Member

rmohr commented Feb 22, 2020

Yay! 👏

@booxter
Contributor Author

booxter commented Feb 24, 2020

Thanks to all the reviewers who helped push this through. I appreciate your patience with the multiple revisions of the PR. With your help, it's a lot less rough than what I originally proposed.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/build-change Categorizes PRs as related to changing build files of virt-* components lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL

9 participants