Register command for the windows node does not register the node #25582

Closed
sowmyav27 opened this issue Feb 25, 2020 · 8 comments

sowmyav27 commented Feb 25, 2020

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • Bring up a custom cluster with 3 Linux nodes: 1 etcd, 1 control plane, and 1 worker.
  • Register a Windows node using the registration command from the Rancher UI.
  • The command hangs at this point:

Screen Shot 2020-02-25 at 12 37 41 PM

  • The node does not get registered.
  • Only the rancher-agent container is present, in the Created state.

Screen Shot 2020-02-25 at 12 37 55 PM

Expected Result:
The register command should run. The node should be registered in the cluster.

Other details that may be helpful:
Running docker logs on the rancher-agent container also hung, and no logs were shown.

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or as shown at the bottom left of the UI): master-head - commit id: 6085a90e2
  • Installation option (single install/HA): single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom
  • Kubernetes version (use kubectl version): 1.17
@sowmyav27 sowmyav27 self-assigned this Feb 25, 2020
@sowmyav27 sowmyav27 added the kind/bug-qa and status/blocker labels Feb 25, 2020
@sowmyav27 sowmyav27 added this to the v2.4 milestone Feb 25, 2020
@sowmyav27
Contributor Author

sowmyav27 commented Feb 26, 2020

On further validation:

Using AMI Windows_Server-1903-English-Core-ContainersLatest-2020.02.12 (ami-01ce25dd3dd9591f0):
The Windows cluster deploys successfully on 2.3.5 and on the latest master-head.

Using AMI Windows_Server-1903-English-Core-Base-2019.12.16:
The Windows cluster deploys successfully on 2.3.5 but does not work on master-head.

Windows_Server-1903-English-Core-Base-2019.12.16 worked before, definitely in the alpha release, so it should also work on master-head.

@sowmyav27
Contributor Author

sowmyav27 commented Feb 27, 2020

On the latest master-head (commit id: 5e9cc17e8):

  • Windows AMI used: Windows_Server-1903-English-Core-Base-2020.02.12 (AWS)
  • The cluster comes up successfully with 1 Linux node (all roles) and 1 Windows worker.
  • The user is able to deploy workloads.

The original issue is also seen on a 2.3.6-rc2 setup: a node launched from the Windows_Server-1903-English-Core-Base-2019.12.16 AMI does not get registered.

@sangeethah
Contributor

Removed the "to-test" flag to discuss the impact on users upgrading from previous releases to the 2.4 and 2.3.6 releases.

@deniseschannon

@luthermonson Do you know how we build our Windows agent images? How did our agent images start building with the security patch?

@luthermonson
Contributor

@deniseschannon @sangeethah We build them like the agents for all other architectures: in Drone, inside a Dapper Windows container that we pull directly from a Microsoft container repository.

https://github.com/rancher/rancher/blob/master/package/windows/Dockerfile.agent
https://github.com/rancher/rancher/blob/master/package/windows/Dockerfile.agent#L68

@sowmyav27
Contributor Author

Verified on an upgrade from rancher:2.3.5 to rancher:2.3.6-rc2

Scenario #1

  • Use the Windows AMI Windows_Server-1903-English-Core-Base-2019.12.16.
  • Deploy a custom cluster with 1 Linux node (all roles) and 3 Windows nodes: N1, N2, and N3.
  • Deploy workloads on the Windows nodes.
  • Upgrade Rancher to 2.3.6-rc3.
  • One of the Windows agent nodes is stuck in the Unavailable state. This is the original problem reported in this issue: the rancher-agent container is only seen in the Created state.
  • Add a Windows node, N4, using AMI Windows_Server-1903-English-Core-Base-2020.02.12.
  • Delete node N1 from the cluster.
  • The Rancher agent on the newly added node N4 is deployed successfully.
  • Workload pods are removed from the old node N1 and added to the new node N4.
  • However, any workloads "scheduled" to deploy on N1 fail to redeploy once N1 is deleted from the cluster.

Scenario #2

  • Deploy a custom cluster with 6 nodes: 3 Linux nodes (1 etcd, 1 control plane, and 1 worker) and 3 Windows worker nodes using AMI Windows_Server-1903-English-Core-Base-2020.02.12.
  • The cluster is deployed successfully.
  • Deploy a few workloads.
  • Upgrade Rancher to 2.3.6-rc3.
  • The Windows node agents are upgraded successfully.
  • Workloads can be deployed on the cluster.

@luthermonson
Contributor

Documenting this issue further: the servercore build container we use comes from https://hub.docker.com/_/microsoft-windows-servercore, which was automatically updated to include the patch. You can see the change here:
old KB# (pre 2/12): https://hub.docker.com/layers/rancher/rancher-agent/v2.3.1-windows-1809/images/sha256-04d011c4cc21a0eb9b9e5c715d1adadbdeb017cfd19b27f14c5e5feed70f5d62?context=explore
new KB# (post 2/12): https://hub.docker.com/layers/rancher/rancher-agent/master-head-windows-1809/images/sha256-b5a22f23615ba30e03c155b7b6d09390b34601ad31d0bd6a999af5a5c971352a?context=explore

Ultimately the KB article surrounding this issue is here: https://support.microsoft.com/en-us/help/4542617/you-might-encounter-issues-when-using-windows-server-containers-with-t

Since the KB article says:

We strongly recommend you update the container host to the February 11, 2020 security update

We're going to document the upgrade process rather than change the container build process, and push anyone using Windows containers to make sure their hosts are up to date.
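
For anyone triaging a similar report, here is a rough sketch of a pre-flight check based on the AMI names mentioned in this thread. It assumes the trailing YYYY.MM.DD in an AWS Windows AMI name is its release date, and that an AMI released on or after the February 11, 2020 update date includes that update. Both are assumptions drawn from the observations above (2019.12.16 fails, 2020.02.12 works), not something Rancher or Microsoft documents. The helper names are hypothetical, and the authoritative check is still the list of updates installed on the host.

```python
import re
from datetime import date

# Date of the security update named in the KB article quoted above.
PATCH_DATE = date(2020, 2, 11)

# Assumption: the AMI names in this thread end in a YYYY.MM.DD release date.
_AMI_DATE = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})$")


def ami_release_date(ami_name):
    """Return the date encoded at the end of an AMI name, or None if absent."""
    match = _AMI_DATE.search(ami_name.strip())
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def host_likely_patched(ami_name):
    """Heuristic only: treat an AMI released on or after PATCH_DATE as patched."""
    released = ami_release_date(ami_name)
    return released is not None and released >= PATCH_DATE


if __name__ == "__main__":
    # AMI names taken from the comments in this issue.
    for name in (
        "Windows_Server-1903-English-Core-Base-2019.12.16",
        "Windows_Server-1903-English-Core-Base-2020.02.12",
    ):
        print(name, host_likely_patched(name))
```

A name without a trailing date is reported as not patched, so unknown hosts get flagged for a manual check rather than silently passing.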
