
network: Extract setup and status update logic from the virt-handler package #6781

Merged: 12 commits merged into kubevirt:main on Dec 23, 2021

Conversation

@EdDev (Member) commented Nov 11, 2021

What this PR does / why we need it:

Decouple the network setup and status logic from virt-handler.
The network setup is now handled by the NetConf object and the status updates by NetStat.

Production code and tests have moved under the network package, leaving only a slim check that both setup and status update are actually invoked in the overall flow.
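As a rough sketch of the resulting split (a minimal illustration with toy types; only the NetConf/NetStat names come from this PR, and the signatures are simplified):

package netsplit

// VMI is a toy stand-in for *v1.VirtualMachineInstance.
type VMI struct{ UID string }

// NetConf owns the per-VMI network setup and remembers which VMIs completed it.
type NetConf struct {
    setupCompleted map[string]struct{}
}

// Setup configures pod networking once per VMI; doNetNS is expected to
// run its callback inside the pod's network namespace.
func (c *NetConf) Setup(vmi *VMI, doNetNS func(func() error) error) error {
    if _, done := c.setupCompleted[vmi.UID]; done {
        return nil
    }
    if err := doNetNS(func() error { return nil /* configure interfaces */ }); err != nil {
        return err
    }
    c.setupCompleted[vmi.UID] = struct{}{}
    return nil
}

// NetStat reads the interface caches and reflects them into the VMI status.
type NetStat struct{}

func (s *NetStat) UpdateStatus(vmi *VMI) error { return nil /* fill vmi status */ }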

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
Follow-up PR(s) should continue and:

  • Simplify the NetConf/Setup signature by embedding DoNetNS into the setup itself.
  • Remove some other network-related logic from vm.go (e.g. checkNetworkInterfacesForMigration).
  • Refactor and simplify the network code further.

Release note:

NONE

@kubevirt-bot kubevirt-bot added the release-note-none Denotes a PR that doesn't merit a release note. label Nov 11, 2021
@EdDev EdDev marked this pull request as draft November 11, 2021 09:33
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L labels Nov 11, 2021
@EdDev (Member Author) commented Nov 11, 2021

/sig network

@EdDev (Member Author) commented Nov 11, 2021

/cc @AlonaKaplan @qinqon @maiqueb

@EdDev (Member Author) commented Nov 11, 2021

/retest

@EdDev (Member Author) commented Nov 14, 2021

/test all

@EdDev EdDev marked this pull request as ready for review November 14, 2021 12:20
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2021
@kubevirt-bot kubevirt-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL and removed size/XL labels Nov 24, 2021
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2021
@maiqueb (Contributor) commented Nov 24, 2021

/assign @maiqueb

Internal logic of the network setup is decoupled from the virt-handler
unit tests by using a stub.

The existing tests can now check that the network setup is invoked as
before, but without needing to know the internals of how the network
setup works.

Signed-off-by: Edward Haas <edwardh@redhat.com>
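A minimal sketch of the stubbing approach described in this commit (the interface and stub names are illustrative, not the PR's exact ones; VMI is the toy type from the sketch above):

// The controller depends on a narrow interface instead of the concrete
// network setup, so unit tests can inject a stub.
type netConf interface {
    Setup(vmi *VMI, doNetNS func(func() error) error) error
}

// netConfStub records that Setup was invoked, without touching any
// network internals.
type netConfStub struct {
    SetupCalled bool
    SetupError  error
}

func (s *netConfStub) Setup(_ *VMI, _ func(func() error) error) error {
    s.SetupCalled = true
    return s.SetupError
}

A virt-handler unit test can then assert that stub.SetupCalled is true after driving the sync flow, without knowing anything about the setup internals.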
Keep a test that checks the scenario where the network status update is
called from the main flow.

Signed-off-by: Edward Haas <edwardh@redhat.com>
For consistency, the (semi) reverse operation of the setup is now called
teardown.

The method rename is introduced with documentation to clarify that the
teardown involves only the cache cleanup, and not the removal of the
network entities (these are removed as part of the pod/container
destruction).

Signed-off-by: Edward Haas <edwardh@redhat.com>
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2021
@EdDev
Copy link
Member Author

EdDev commented Dec 14, 2021

change: Answered comment.

@maiqueb
Copy link
Contributor

maiqueb commented Dec 14, 2021

change: Answered comment.

/lgtm

thanks!

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2021
@AlonaKaplan (Member) left a comment

Commit 1 review

}

func (c *Controller) Teardown(vmi *v1.VirtualMachineInstance, do func() error) error {
    c.setupCompleted.Delete(vmi.UID)
Member:
It seems safer to move the Delete to after the do call. Is there a special reason for the current order?

Member Author:
Once teardown is called, the cache should be cleared even if the operation partially fails. Otherwise, we will have leftovers.
But again, this is following the current logic.
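In code, the ordering under discussion looks roughly like this (a sketch; only Teardown, setupCompleted, and the Delete-before-do order come from the diff, and VMI is again the toy type from above):

import (
    "fmt"
    "sync"
)

type Controller struct {
    setupCompleted sync.Map // VMI UID -> struct{}
}

// Teardown drops the per-VMI cache entry before running the callback,
// so the cache holds no stale entry even if do() partially fails;
// moving the Delete after do() would instead keep the entry around on
// failure.
func (c *Controller) Teardown(vmi *VMI, do func() error) error {
    c.setupCompleted.Delete(vmi.UID)
    if err := do(); err != nil {
        return fmt.Errorf("teardown failed, err: %w", err)
    }
    return nil
}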

        return fmt.Errorf("setup failed, err: %w", err)
    }

    c.setupCompleted.Store(id, struct{}{})
Member:
Since sync.Map is used, I understand the code here should be thread-safe. The suggested code is not thread-safe: multiple threads can perform the setup simultaneously.

Member Author:
This is mimicking the existing code; nothing logically changed.

The thread-safety is for the cache itself, as multiple threads can access that "database". The setup itself is thread-safe, as only one VMI can be treated at a time, so there is nothing special to protect here.
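For reference, the pattern in question is a non-atomic check-then-store over the sync.Map, roughly (setupOnce is a hypothetical wrapper name):

// Load and Store are individually thread-safe, but the pair is not
// atomic: two goroutines could both miss the Load and both run setup.
// That is acceptable here because a given VMI is handled by one worker
// at a time; sync.Map only guards the map shared across VMIs.
func (c *Controller) setupOnce(id string, setup func() error) error {
    if _, done := c.setupCompleted.Load(id); done {
        return nil
    }
    if err := setup(); err != nil {
        return fmt.Errorf("setup failed, err: %w", err)
    }
    c.setupCompleted.Store(id, struct{}{})
    return nil
}

An atomic alternative like sync.Map.LoadOrStore would not help here, since it would mark the setup as completed before it actually ran.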

return d.networkController.Setup(vmi, isolationRes.Pid(), isolationRes.DoNetNS, func() error {
    if requiresDeviceClaim {
        if err := d.claimDeviceOwnership(rootMount, "vhost-net"); err != nil {
            return neterrors.CreateCriticalNetworkError(fmt.Errorf("failed to set up vhost-net device, %s", err))
Member:
Why did you move rootMount := isolationRes.MountRoot() outside the if?

Member Author:
In order not to bind (close over) the whole isolation object in the function, just the needed data. I wanted to capture only the minimum needed.
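A sketch of the capture distinction (the types and the claim parameter are stand-ins for the real isolation and ownership code):

type isolationResult struct{ mountRoot string }

func (r *isolationResult) MountRoot() string { return r.mountRoot }

// Extracting rootMount first means the closure binds only the string it
// needs; closing over isolationRes directly would keep the whole object
// reachable from the callback for its entire lifetime.
func claimClosure(isolationRes *isolationResult, claim func(root, dev string) error) func() error {
    rootMount := isolationRes.MountRoot()
    return func() error { return claim(rootMount, "vhost-net") }
}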

@@ -591,7 +597,6 @@ var _ = Describe("VirtualMachineInstance", func() {
 Expect(mockQueue.GetRateLimitedEnqueueCount()).To(Equal(0))
 _, err := os.Stat(mockWatchdog.File(vmi))
 Expect(os.IsNotExist(err)).To(BeFalse())
-Expect(controller.phase1NetworkSetupCache.Size()).To(Equal(1))
Member:
Why didn't you substitute it with Expect(controller.networkController.SetupCompleted(vmi)).To(BeTrue())?

Member Author:
I saw no reason why we need to check the state of the network cache in this test.

Do you see any reason for it?

@@ -664,7 +669,6 @@ var _ = Describe("VirtualMachineInstance", func() {
 mockHotplugVolumeMounter.EXPECT().Mount(gomock.Any()).Return(nil)
 controller.Execute()
 testutils.ExpectEvent(recorder, VMIDefined)
-Expect(controller.phase1NetworkSetupCache.Size()).To(Equal(1))
Member:
Same.

Member Author:
Same answer, I do not see why we need to check this in this test context. The creation of the domain should be checked, not whether the network setup was invoked.
There are dedicated tests for networking.

@@ -2582,8 +2585,7 @@ var _ = Describe("VirtualMachineInstance", func() {
 PodIP: podIPs[0],
 PodIPs: podIPs,
 }
-err = controller.networkCacheStoreFactory.CacheForVMI(vmi).Write(interfaceName, podCacheInterface)
-Expect(err).ToNot(HaveOccurred())
+controller.networkController.CachePodInterfaceVolatileData(vmi, interfaceName, podCacheInterface)
Member:
The original test used the non-volatile cache. Do we have unit test coverage for it?

Member Author:
The non-volatile cache was used to inject data, to see if the status update reports the right thing. I do not think the intention was to check this cache, but to check that the status update does its thing.

So replacing it with the volatile data is both faster and closer to the intention here.

Per what I saw, how the non-volatile cache is used is already covered by other tests, like the ones under https://github.com/kubevirt/kubevirt/blob/main/pkg/network/setup/network_test.go .

Looking at the reporting/status part, at the end of the refactoring some scenarios are checked against the volatile cache and some against the non-volatile one.
But I think this is a small gap, as the interaction between the non-volatile and volatile caches is simple. Maybe it is worth adding a specific test that checks that the non-volatile cache is populated into the volatile cache on first usage.

Member Author:
I have added a test to cover the last point here.
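A sketch of the read-through behavior such a test would pin down (field and method names are assumptions; CacheForVMI and Write appear in the diff above, and Read is assumed as Write's counterpart):

type NetStat struct {
    volatile     sync.Map // "uid/iface" -> cached pod interface data
    cacheFactory InterfaceCacheFactory
}

// podInterface serves reads from the in-memory (volatile) cache and, on
// a miss, falls back to the file-backed (non-volatile) cache, populating
// the volatile one so later status updates skip the filesystem.
func (s *NetStat) podInterface(vmiUID, ifaceName string) (*PodCacheInterface, error) {
    key := vmiUID + "/" + ifaceName
    if data, ok := s.volatile.Load(key); ok {
        return data.(*PodCacheInterface), nil
    }
    data, err := s.cacheFactory.CacheForVMI(vmiUID).Read(ifaceName)
    if err != nil {
        return nil, err
    }
    s.volatile.Store(key, data)
    return data, nil
}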


-c.networkController = netsetup.NewController(c.networkCacheStoreFactory)
+c.networkController = netsetup.NewController(netcache.NewInterfaceCacheFactory())
Member:
Why do you pass the cache factory as a parameter to NewController instead of initializing it inside the controller?

Member Author:
Same reason it was done until now... it needs to be stubbed/mocked in the tests with one that does not use the filesystem. See here.
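The injection pattern referenced here, sketched (the stub factory name is hypothetical):

// Passing the factory into the constructor keeps the controller
// testable: production wires in the filesystem-backed factory, while
// unit tests inject an in-memory stub.
func NewController(cacheFactory InterfaceCacheFactory) *Controller {
    return &Controller{cacheFactory: cacheFactory} // other fields omitted
}

// Production wiring:
c.networkController = netsetup.NewController(netcache.NewInterfaceCacheFactory())

// Test wiring (hypothetical stub that never touches the filesystem):
controller.networkController = netsetup.NewController(newFakeInterfaceCacheFactory())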

@AlonaKaplan (Member) commented:

/approve

@kubevirt-bot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlonaKaplan

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 21, 2021
@EdDev (Member Author) commented Dec 21, 2021

/retest

@EdDev (Member Author) commented Dec 21, 2021

/retest

@EdDev (Member Author) commented Dec 21, 2021

/retest

@kubevirt-commenter-bot commented:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@EdDev (Member Author) commented Dec 22, 2021

/retest

@EdDev (Member Author) commented Dec 22, 2021

/retest

@EdDev (Member Author) commented Dec 22, 2021

/retest

@kubevirt-commenter-bot commented:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@dhiller (Contributor) commented Dec 23, 2021

/retest

@kubevirt-bot kubevirt-bot merged commit d01219d into kubevirt:main Dec 23, 2021
@EdDev EdDev deleted the refactor-network-setup branch December 28, 2021 10:11
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/network size/XXL

7 participants