pause image: Disable DiagTrack service on Windows image #95950

Merged
merged 1 commit into from Nov 5, 2020

Conversation

claudiubelu
Contributor

What type of PR is this?

/kind bug

/sig windows
/sig node

What this PR does / why we need it:

It has been observed that the DiagTrack service in the pause image is consuming a non-trivial amount of CPU. We don't need this service in the pause image, so we should disable it.

We can disable the service by running chntpw in a docker buildx Linux stage and then copy the SYSTEM file back to the final Windows image.
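
For readers unfamiliar with the registry side of this: a Windows service's startup type is stored as the DWORD `Start` value under its `Services` key in the SYSTEM hive, and setting it to 4 marks the service as disabled. The following is a minimal Python sketch of those standard values and of the hive-relative path involved. It is illustrative only and not part of this PR; `ServiceStart` and `start_value_path` are hypothetical names, and `ControlSet001` is assumed as the default because that is the control set used in the Dockerfile snippet under review.

```python
from enum import IntEnum


class ServiceStart(IntEnum):
    """Standard values of the DWORD `Start` under a Windows service key."""
    BOOT = 0       # driver loaded by the boot loader
    SYSTEM = 1     # driver loaded during kernel initialization
    AUTOMATIC = 2  # started by the Service Control Manager at boot
    MANUAL = 3     # started on demand
    DISABLED = 4   # never started


def start_value_path(service: str, control_set: str = "ControlSet001") -> str:
    """Return the hive-relative path of a service's Start value.

    Hypothetical helper: it only joins the path components with backslashes.
    """
    return "\\".join([control_set, "Services", service, "Start"])


if __name__ == "__main__":
    # Show which value would have to be written to disable DiagTrack.
    print(start_value_path("DiagTrack"), "->", int(ServiceStart.DISABLED))
```

Whatever tool edits the offline hive, the change it has to make is the same: write `ServiceStart.DISABLED` to that path.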

Co-Authored-By: Mark Rossetti marosset@microsoft.com

Which issue(s) this PR fixes:

Partially Fixes #95735

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/windows Categorizes an issue or PR as relevant to SIG Windows. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 28, 2020
@k8s-ci-robot
Contributor

@claudiubelu: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Oct 28, 2020
@claudiubelu
Contributor Author

/cc @marosset @dims

@marosset
Contributor

We also have #95840 for tracking this issue specifically.

@marosset
Contributor

/lgtm
/milestone v1.20

@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Oct 28, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2020
@dims
Member

dims commented Oct 28, 2020

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 28, 2020
COPY --from=windows-base /Windows/System32/config/SYSTEM /windows/

RUN apk add chntpw
RUN printf "ed ControlSet001\Services\DiagTrack\Start\n\
Member

very interesting technique :)

Contributor

We were REALLY hoping to be able to do this and still build on Linux machines.

@dims
Member

dims commented Oct 28, 2020

@claudiubelu one nit. LGTM

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2020
@dims
Member

dims commented Oct 28, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2020
@dims
Member

dims commented Nov 3, 2020

@marosset @claudiubelu - ooh! one more thing: what happens if you just ensure that the diagtrack-related dll/exe are excluded from the final image? (delete the thing! or don't copy it over)

@marosset
Contributor

marosset commented Nov 4, 2020

ooh! one more thing: what happens if you just ensure that the diagtrack-related dll/exe are excluded from the final image? (delete the thing! or don't copy it over)

@dims It looks like this will actually work....!

I wrote some garbage into a file, named it diagtrack.dll, then copied it over the actual dll with
ADD diagtrack.dll /Windows/System32/diagtrack.dll

When I exec into the detached pause image I see

C:\>sc.exe query diagtrack
[SC] EnumQueryServicesStatus:OpenService FAILED 1060:

The specified service does not exist as an installed service.


C:\>REG QUERY HKLM\SYSTEM\CurrentControlSet\Services\DiagTrack
ERROR: The system was unable to find the specified registry key or value.

C:\>REG QUERY HKLM\SYSTEM\CurrentControlSet\Services

<snip>
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\condrv
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CryptSvc
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DCLocator
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DcomLaunch
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfsc
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dhcp
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\disk
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache
</snip>

It now looks as if the service never existed, which is really strange...
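
The manual check above could also be scripted. Below is a minimal Python sketch, assuming the text printed by `sc.exe query <name>` has already been captured as a string (for example from an exec into the container). `classify_sc_query` is a hypothetical helper, not part of this PR; it recognizes only error 1060, as shown in the output above, and the `STATE` line that `sc.exe` prints for an installed service.

```python
import re


def classify_sc_query(output: str) -> str:
    """Classify captured `sc.exe query <name>` text.

    Returns "missing" when the output reports error 1060 (service not
    installed), the state name from the STATE line (for example "STOPPED"
    or "RUNNING") when one is present, and "unknown" otherwise.
    """
    if re.search(r"FAILED\s+1060\b", output):
        return "missing"
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", output, re.MULTILINE)
    return match.group(1) if match else "unknown"


if __name__ == "__main__":
    # Sample text copied from the sc.exe output quoted in this comment.
    sample = (
        "[SC] EnumQueryServicesStatus:OpenService FAILED 1060:\n\n"
        "The specified service does not exist as an installed service.\n"
    )
    print(classify_sc_query(sample))
```

Treating "missing" and "STOPPED" as separate results matters here, because replacing the DLL and disabling the service through the registry may leave the service in different states.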

@dims
Member

dims commented Nov 4, 2020

LOL i swear i haven't touched a windows box for over a decade. So i have no clue :)

@claudiubelu
Contributor Author

LOL i swear i haven't touched a windows box for over a decade. So i have no clue :)

You might have just found your true calling. :D Come, join us. :D

IMO, we could go with the solution proposed by dims. The image can be built: https://paste.ubuntu.com/p/kDJzrqtyxF/ , and checking the service on the newly built images on all the different hosts shows that the DiagTrack service is STOPPED: https://paste.ubuntu.com/p/7Vy6CzsqJp/

It has been observed that the DiagTrack service in the pause
image is consuming a non-trivial amount of CPU. We don't need
this service in the pause image, so we should disable it.

We can disable the service by running chntpw in a docker buildx Linux stage
and then copy the SYSTEM file back to the final Windows image.

Co-Authored-By: Mark Rossetti <marosset@microsoft.com>
Co-Authored-By: Davanum Srinivas <davanum@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 4, 2020
@claudiubelu
Contributor Author

/test pull-kubernetes-bazel-test

@dims
Member

dims commented Nov 4, 2020

/approve

Do we need to update the version number?

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 4, 2020
@claudiubelu
Contributor Author

We bumped the pause image version to 3.4 when we added the initial Windows support to the image. That version has not been published / promoted yet [1]. The question is whether we should promote pause:3.4 as is or include this PR in it, especially considering the DiagTrack CPU consumption issue. @marosset, what do you think?

[1] https://github.com/kubernetes/k8s.io/blob/master/k8s.gcr.io/images/k8s-staging-kubernetes/images.yaml#L1273

@marosset
Contributor

marosset commented Nov 4, 2020

Since we haven't published/promoted this yet and this change will help with resource consumption, I think we should include it in pause:3.4.

@marosset
Contributor

marosset commented Nov 4, 2020

LGTM but someone else should probably add the label since I am listed as a co-author on this PR :)

@k8s-ci-robot
Contributor

@marosset: GitHub didn't allow me to assign the following users: m2.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @m2

@marosset
Contributor

marosset commented Nov 4, 2020

/assign @michmike

@jsturtevant
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2020
Contributor

@michmike michmike left a comment

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: claudiubelu, dims, michmike

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@michmike
Contributor

michmike commented Nov 5, 2020

good workaround :)

@marosset
Contributor

marosset commented Nov 5, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2020
@k8s-ci-robot k8s-ci-robot merged commit 1fdd68f into kubernetes:master Nov 5, 2020
SIG-Windows automation moved this from In Progress (v1.20) to Done (v1.20) Nov 5, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

/stats/summary endpoints failing with "Context Deadline Exceeded" on Windows nodes with very high CPU usage
6 participants