DO NOT MERGE: Windows unit tests #110981

Closed
wants to merge 13 commits into from

Conversation

claudiubelu
Contributor

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

@claudiubelu: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/dependency Issues or PRs related to dependency changes area/ipvs area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Jul 6, 2022
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jul 6, 2022
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 6, 2022
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@claudiubelu claudiubelu force-pushed the wip-unittests-3 branch 3 times, most recently from 7f7b3d0 to bf89d22 Compare July 7, 2022 14:44
@leilajal
Contributor

leilajal commented Jul 7, 2022

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 7, 2022
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 7, 2022
@claudiubelu
Contributor Author

/test pull-ci-kubernetes-unit-windows

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2023
@claudiubelu
Contributor Author

/test pull-ci-kubernetes-unit-windows

Currently, there are some unit tests that are failing on Windows due to
various reasons:

- getHostDNSConfig is reading a resolv.conf file. However, we don't have
  that on Windows. Instead, we can get the DNS server list and the DNS
  suffix list from Windows itself.

Based on the code from kubernetes/test/images/agnhost/dns/dns_windows.go

The module pkg/kubelet/winstats has almost no coverage for Windows. This
commit adds unit tests to cover the mentioned module.

The Windows file permissions / ACLs are a bit more complex and fine-grained
than the Linux file permissions. However, the Linux file permissions
could be translated to some extent to Windows ACLs.

We could translate the Linux user / group / other to the Windows SIDs:
Creator Owner ID / Creator Group ID / World, or more precisely,
S-1-3-0 / S-1-3-1 / S-1-1-0.

As for the permissions, we can use the Generic Access Rights:
GENERIC_READ / GENERIC_WRITE / GENERIC_EXECUTE.

Adds a Windows implementation of Chmod which takes into consideration
the details mentioned above. Adds a unit test which verifies that the
permissions are set and behave as expected.
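
The translation described above can be sketched in plain Go. This is only an illustration of the mapping, not the Chmod implementation from this PR: the `aceSpec` type and the `accessMask` and `modeToACEs` helpers are hypothetical names, and the sketch stops short of calling any Windows API. The SIDs and the Generic Access Right values are the well-known ones from the Windows SDK.

```go
package main

import (
	"fmt"
	"os"
)

// Generic Access Rights as defined in the Windows SDK (winnt.h).
const (
	genericRead    uint32 = 0x80000000
	genericWrite   uint32 = 0x40000000
	genericExecute uint32 = 0x20000000
)

// Well-known SIDs named in the commit message.
const (
	sidCreatorOwner = "S-1-3-0"
	sidCreatorGroup = "S-1-3-1"
	sidWorld        = "S-1-1-0"
)

// aceSpec is a hypothetical, simplified stand-in for an access control entry.
type aceSpec struct {
	SID  string
	Mask uint32
}

// accessMask converts one rwx triplet (0-7) into Generic Access Rights.
func accessMask(bits os.FileMode) uint32 {
	var mask uint32
	if bits&4 != 0 {
		mask |= genericRead
	}
	if bits&2 != 0 {
		mask |= genericWrite
	}
	if bits&1 != 0 {
		mask |= genericExecute
	}
	return mask
}

// modeToACEs maps the user / group / other triplets of a Linux mode
// to Creator Owner / Creator Group / World entries.
func modeToACEs(mode os.FileMode) []aceSpec {
	perm := mode.Perm()
	return []aceSpec{
		{SID: sidCreatorOwner, Mask: accessMask((perm >> 6) & 7)},
		{SID: sidCreatorGroup, Mask: accessMask((perm >> 3) & 7)},
		{SID: sidWorld, Mask: accessMask(perm & 7)},
	}
}

func main() {
	for _, ace := range modeToACEs(0o640) {
		fmt.Printf("%s -> 0x%08x\n", ace.SID, ace.Mask)
	}
}
```

A real implementation would still have to turn these entries into an ACL and apply it to the file, which this sketch leaves out.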

Ports the atomic_writer unit tests to Windows.

Fixes a few other unit tests for Windows.

The path module has a few different functions:
Clean, Split, Join, Ext, Dir, Base, IsAbs. These functions do not
take into account the OS-specific path separator, meaning that they
won't behave as intended on Windows.

For example, Dir is supposed to return all but the last element of the
path. For the path "C:\some\dir\somewhere", it is supposed to return
"C:\some\dir", but it returns "." instead.

These functions should be replaced with their filepath equivalents.
On Windows, filepath.IsAbs does not consider paths starting with "/" or "\"
to be absolute, even though files can be addressed as such. [1][2]

Currently, there are some unit tests that are failing on Windows due to
this reason.

[1] https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#traditional-dos-paths
[2] https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#fully-qualified-vs-relative-paths
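
One way to handle the IsAbs case above is a small wrapper that also accepts paths starting with a separator. This is a minimal sketch, not the fix from this PR: `isAbsOrRooted` is a hypothetical helper name, and in real code such a check would likely sit in a Windows-only file, since a leading backslash is an ordinary filename character on Linux.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isAbsOrRooted is a hypothetical helper. It reports whether p is absolute
// according to filepath.IsAbs, or starts with "/" or "\", which Windows
// resolves relative to the root of the current drive.
func isAbsOrRooted(p string) bool {
	return filepath.IsAbs(p) ||
		strings.HasPrefix(p, "/") ||
		strings.HasPrefix(p, `\`)
}

func main() {
	for _, p := range []string{`\some\dir`, "/some/dir", `some\dir`, "some/dir"} {
		fmt.Printf("%-12q rooted=%v\n", p, isAbsOrRooted(p))
	}
}
```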
The path module has a few different functions:
Clean, Split, Join, Ext, Dir, Base, IsAbs. These functions do not
take into account the OS-specific path separator, meaning that they
won't behave as intended on Windows.

For example, Dir is supposed to return all but the last element of the
path. For the path "C:\some\dir\somewhere", it is supposed to return
"C:\some\dir", but it returns "." instead.

These functions should be replaced with their filepath equivalents.
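
The difference between the two packages can be reproduced with a short program. This is a sketch for illustration only; `slashSemantics` and `osDir` are hypothetical wrapper names. The path package only treats "/" as a separator, so its results for a backslash-separated path are the same on every OS, while the filepath result depends on the OS the program runs on.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// slashSemantics returns what the path package makes of p.
// path only splits on "/", so backslashes are treated as ordinary characters.
func slashSemantics(p string) (dir, base string) {
	return path.Dir(p), path.Base(p)
}

// osDir returns filepath.Dir(p), which uses the separator of the OS
// the program runs on.
func osDir(p string) string {
	return filepath.Dir(p)
}

func main() {
	p := `C:\some\dir\somewhere`
	dir, base := slashSemantics(p)
	fmt.Printf("path.Dir      = %q\n", dir)  // "." on every OS
	fmt.Printf("path.Base     = %q\n", base) // the whole input string
	fmt.Printf("filepath.Dir  = %q\n", osDir(p))
}
```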
Currently, when a Kubelet Plugin is added to the DesiredStateOfWorld,
a timestamp is saved in the PluginInfo. This timestamp is then updated on
subsequent plugin reregistrations.

When the Reconciler detects different timestamps for a Plugin in its
DesiredStateOfWorld and ActualStateOfWorld, it triggers a Plugin unregistration
followed by a new Plugin registration.

In other words, the timestamp is used to detect whether a Plugin needs to
be reregistered. However, this can be an issue on Windows, where time
measurements are not as fine-grained: time.Now() calls within the same ~1-15ms
window will return the same timestamp. This means that Plugin Reregistration events
can be missed on Windows [1]. Because of this, some of the Plugin registration unit
tests fail on Windows.

This commit updates the behaviour: instead of relying on different timestamps,
the Reconciler checks the PluginInfo UUID to detect a Plugin Reregistration.
With this change, the unit tests mentioned above also pass on Windows.

[1] golang/go#8687
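
The idea of comparing identifiers rather than timestamps can be sketched as follows. This is not the kubelet code: `pluginInfo`, `register`, `newUUID` and `needsReregistration` are hypothetical names, the socket path is made up, and the identifier is simply 16 random bytes rendered as hex rather than a formatted UUID.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// pluginInfo is a simplified, hypothetical stand-in for the kubelet's PluginInfo.
type pluginInfo struct {
	SocketPath string
	UUID       string
}

// newUUID returns a random identifier (16 random bytes, hex encoded).
func newUUID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", b)
}

// register simulates adding a plugin to the desired state: every
// (re)registration gets a fresh identifier, independent of the clock.
func register(socketPath string) pluginInfo {
	return pluginInfo{SocketPath: socketPath, UUID: newUUID()}
}

// needsReregistration reports whether the desired and actual entries differ,
// in which case the plugin would be unregistered and registered again.
func needsReregistration(desired, actual pluginInfo) bool {
	return desired.UUID != actual.UUID
}

func main() {
	actual := register("example.sock")
	// A reregistration that may happen within the same clock tick.
	desired := register(actual.SocketPath)
	fmt.Println(needsReregistration(desired, actual))
	fmt.Println(needsReregistration(actual, actual))
}
```

Because the identifier does not depend on time.Now(), two registrations in the same clock tick are still distinguishable.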
GetFileType is meant to return the type of the given file by using os.Stat.
However, os.Stat doesn't work on Windows for Unix Sockets, causing an error to occur:

[2-Socket Test] unexpected error :
CreateFile C:\Users\Administrator\AppData\Local\Temp\test-get-filetype-2776877299\mt.sock:
The file cannot be accessed by the system.

This is a known issue and we're already using a workaround for this in
pkg/kubelet/util/util_windows.go.

This commit fixes this issue for GetFileType on Windows.

Currently, there are some unit tests that are failing on Windows due to
various reasons:

- On Windows, spaces at the end of file names are automatically trimmed when opening files.
  Thus, "Continent/Zone " will actually refer to "Continent/Zone".
- Unix sockets on Windows may take a few seconds to become active. This may cause the test
  TestDevicePluginReRegistrationProbeMode to fail due to its exponential backoff times.
- tests in kuberuntime_container_windows_test.go fail on Nodes that have fewer than 3 CPUs,
  expecting the CPU max set to be more than 100% of available CPUs, which is not possible.
- calls in summary_windows_test.go are missing context.
- filterTerminatedContainerInfoAndAssembleByPodCgroupKey will filter and group container
  information by the Pod cgroup key, if it exists. However, we don't have cgroups on Windows,
  thus we can't make the same assertions.
- if a powershell command that could return an array (e.g.: Get-Disk) yields only one
  element, powershell returns that object directly, and **not** an array containing that
  element. In a few cases, the output of these commands is converted to json and then
  unmarshalled in golang, with the expectation that the unmarshalled data is an array.
  If it's not an array, we get an error.
- flexvolume converts its paths to absolute paths, which means that on Windows the C:\ prefix
  will be added. This becomes an issue when prober.fs.Walk is called, which will join 2 absolute
  paths, both containing the C:\ prefix, resulting in an incorrect path.
- when mounting Block Devices, Windows expects the given source to be a Disk Number, not a path.
- for rbd_windows_test.go, we should start with Disk Number 0, which exists on all hosts.
- if a Disk has multiple volumes, Get-Volume doesn't return the volumes in the same order. This
  can result in various assertions failing.
- the pkg/volume/rbd/rbd_test.TestPlugin test expects that mounter.MountSensitive is called when
  attacher.MountDevice is called. The Windows attacher doesn't currently make that call.
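
The powershell item in the list above can be handled on the Go side by accepting both shapes. This is a minimal sketch under stated assumptions, not the change made in this PR: the `disk` type with its single `Number` field and the `parseDisks` helper are hypothetical, and the JSON inputs are hand-written examples rather than captured command output.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// disk is a hypothetical subset of the fields a disk listing might contain.
type disk struct {
	Number int `json:"Number"`
}

// parseDisks accepts either a JSON array of disks or a single JSON object
// and always returns a slice.
func parseDisks(out []byte) ([]disk, error) {
	out = bytes.TrimSpace(out)
	if len(out) == 0 {
		return nil, nil
	}
	if out[0] == '[' {
		var disks []disk
		if err := json.Unmarshal(out, &disks); err != nil {
			return nil, err
		}
		return disks, nil
	}
	// Single element: powershell emitted the object itself, not an array.
	var d disk
	if err := json.Unmarshal(out, &d); err != nil {
		return nil, err
	}
	return []disk{d}, nil
}

func main() {
	one, _ := parseDisks([]byte(`{"Number": 0}`))
	many, _ := parseDisks([]byte(`[{"Number": 0}, {"Number": 1}]`))
	fmt.Println(len(one), len(many))
}
```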
Some of the pkg/controller/nodelifecycle unit tests call
doNoExecuteTaintingPass and expect certain taints to be set appropriately.

However, the tainter is rate limited, meaning that it may not
process all the nodes in the queue if not enough time elapses, resulting
in potentially flaky tests.

This issue is even more obvious on Windows nodes where time keeping is
less precise (2 consecutive time.Now() calls may return the same timestamp
if called within a ~1-15ms window), meaning that the rate limiter will have
fewer tokens than necessary to process all the queued nodes.

This commit adds the possibility to disable the rate limiter and disables
the rate limiter for tests in which doNoExecuteTaintingPass is called.
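
The effect of disabling the rate limiter in tests can be illustrated with a toy example. This is a sketch only, not the nodelifecycle controller code: the `rateLimiter` interface, `intervalLimiter`, `alwaysAccept` and `drain` are hypothetical names chosen for the illustration.

```go
package main

import (
	"fmt"
	"time"
)

// rateLimiter is a hypothetical minimal interface for this sketch.
type rateLimiter interface {
	TryAccept() bool
}

// intervalLimiter allows one operation per interval.
type intervalLimiter struct {
	interval time.Duration
	last     time.Time
}

func (l *intervalLimiter) TryAccept() bool {
	now := time.Now()
	if !l.last.IsZero() && now.Sub(l.last) < l.interval {
		return false
	}
	l.last = now
	return true
}

// alwaysAccept behaves like a disabled rate limiter.
type alwaysAccept struct{}

func (alwaysAccept) TryAccept() bool { return true }

// drain processes queued nodes until the limiter refuses and
// returns how many nodes were processed in this pass.
func drain(nodes []string, l rateLimiter) int {
	processed := 0
	for range nodes {
		if !l.TryAccept() {
			break
		}
		processed++
	}
	return processed
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	fmt.Println(drain(nodes, &intervalLimiter{interval: time.Hour}))
	fmt.Println(drain(nodes, alwaysAccept{}))
}
```

With the limiter in place, a single pass may leave nodes in the queue, so assertions made right after the pass depend on timing. With the limiter disabled, one pass handles every queued node.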
@claudiubelu
Contributor Author

/test pull-ci-kubernetes-unit-windows

enforceRequirements will run preflight checks, including whether the user
is privileged or not. Because of this, the test will make different assertions
based on the user's UID. However, we don't have UIDs on Windows, so we're asserting
the wrong thing.

This fix addresses the issue.
@claudiubelu
Contributor Author

/test pull-ci-kubernetes-unit-windows

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: claudiubelu
Once this PR has been reviewed and has the lgtm label, please assign liggitt for approval by writing /assign @liggitt in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@claudiubelu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-files-remake | 7f7b3d01a46d925400cf6cd489ca47b775d721e6 | link | true | /test pull-kubernetes-files-remake |
| pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2 | 2f53651befd8bbe1f73308732a459c8aee9700b5 | link | false | /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2 |
| pull-kubernetes-e2e-capz-azure-file-vmss | b7206f6 | link | false | /test pull-kubernetes-e2e-capz-azure-file-vmss |
| pull-kubernetes-verify | b7206f6 | link | true | /test pull-kubernetes-verify |
| pull-kubernetes-e2e-gce-csi-serial | b7206f6 | link | false | /test pull-kubernetes-e2e-gce-csi-serial |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2023
@k8s-ci-robot
Contributor

@claudiubelu: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@claudiubelu
Contributor Author

Most unit tests on Windows are now green or skipped, so we don't need this anymore.

/close

@k8s-ci-robot
Contributor

@claudiubelu: Closed this PR.

In response to this:

Most unit tests on Windows are now green or skipped, so we don't need this anymore.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SIG Node PR Triage automation moved this from Waiting on Author to Done Feb 17, 2023
Labels
area/cloudprovider area/ipvs area/kubeadm area/kubectl area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
SIG Auth Old
Needs Triage
Development

Successfully merging this pull request may close these issues.

None yet

7 participants