
Ingress controller has labels in pods #30045

Closed
sowmyav27 opened this issue Nov 12, 2020 · 7 comments


sowmyav27 commented Nov 12, 2020

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):
Upgrade use case:

  • On 2.4.8, deploy a DO cluster with Project Network Isolation enabled.
  • Create a workload and an ingress ing01 pointing to it.
  • Upgrade Rancher to 2.5.2.
  • Notice that user ingresses go to the Initializing state.
  • The ingress still has only one worker node's IP address under loadBalancer when doing View/Edit YAML for the ingress.
  • The ingress controller's pods have the podName label appearing and disappearing intermittently.
  • Disable and then re-enable project network isolation.
  • The field.cattle.io/podName: nginx-ingress-controller-bfnkt label is seen on the ingress controller pods.
  • User ingresses are stuck in the Initializing state.
  • Rancher logs in debug mode:
2020/11/12 20:02:19 [DEBUG] podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel map[app:ingress-nginx controller-revision-hash:7649c75c8c field.cattle.io/podName:nginx-ingress-controller-2r8pd pod-template-generation:1] in ingress-nginx
2020/11/12 20:02:19 [DEBUG] netpolMgr: delete: existing=nil, err=networkPolicy.networking.k8s.io "ingress-nginx/hp-nginx-ingress-controller-2r8pd" not found
2020/11/12 20:02:19 [DEBUG] podHandler: Sync: {TypeMeta:{Kind:Pod APIVersion:v1} ObjectMeta:{Name:nginx-ingress-controller-2r8pd GenerateName:nginx-ingress-controller- Namespace:ingress-nginx SelfLink:/api/v1/namespaces/ingress-nginx/pods/nginx-ingress-controller-2r8pd UID:c7437eac-db2f-446b-ad3f-93faec7be429 ResourceVersion:26731 Generation:0 CreationTimestamp:2020-11-12 19:29:32 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[app:ingress-nginx controller-revision-hash:7649c75c8c pod-template-generation:1] Annotations:map[prometheus.io/port:10254 prometheus.io/scrape:true] OwnerReferences:[{APIVersion:apps/v1 Kind:DaemonSet Name:nginx-ingress-controller UID:cc372d02-48ab-4082-b5e2-c974431a0ebb Controller:0xc0099ff579 BlockOwnerDeletion:0xc0099ff57a}] Finalizers:[] ClusterName: ManagedFields:[{Manager:kube-controller-manager Operation:Update APIVersion:v1 Time:2020-11-12 19:29:32 +0000 UTC FieldsType:FieldsV1 FieldsV1:&FieldsV1{Raw:*<redacted>
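
For reference, a quick way to check whether the controller pods currently carry the label (a minimal sketch, assuming the default RKE setup where the controller runs as the nginx-ingress-controller DaemonSet in the ingress-nginx namespace with the app=ingress-nginx pod label shown in the log above):

# List the ingress controller pods with all their labels and look for
# field.cattle.io/podName in the LABELS column.
kubectl -n ingress-nginx get pods -l app=ingress-nginx --show-labels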

Expected Result:

  • The field.cattle.io/podName: nginx-ingress-controller-bfnkt label should NOT be seen on the ingress controller pods.

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.4.8 upgraded to 2.5.2
  • Installation option (single install/HA): HA

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): RKE on DigitalOcean (DO)
  • Kubernetes version (use kubectl version): 1.18.10-rancher1-2
@sowmyav27 sowmyav27 added the kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement label Nov 12, 2020
@sowmyav27 sowmyav27 added this to the v2.5.3 milestone Nov 12, 2020
@UberKuber

@sowmyav27, the process I've now tested is as follows:

  1. Disable project network isolation on the cluster
  2. Redeploy nginx ingress controller daemonset
  3. Enable project network isolation on the cluster

I have tested this process against multiple downstream clusters behind Rancher v2.5.2 and Rancher v2.4.10, and it resolved the issue as far as I can tell.

@maggieliu maggieliu modified the milestones: v2.5.3, v2.4.11 Nov 12, 2020
@al45tair

Personally, I've just installed my patched ingress-nginx into our cluster again (here is the patch).

This resolves the problem by making the ingress controller ignore Rancher's podName label.


kinarashah commented Nov 18, 2020

For testing:

Fresh install of Rancher v2.4-head and v2.5-head:

  • Create a cluster with Project Network Isolation enabled and >=2 worker nodes.
  • Confirm that the nginx-ingress-controller pods in the System project don't have podName labels.
  • Turn on debug logging, redeploy nginx-ingress-controller, and confirm the pods don't get updated without cause. This can be confirmed by looking for podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel in the logs.
  • Create a user ingress and confirm that status.loadBalancer shows the IPs of all the worker nodes (see the check sketched below).
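
As a sketch of that last check, the loadBalancer IPs can be read straight from the ingress status; the namespace and ingress name here (my-app, ing01) are placeholders:

# Print the IPs reported under status.loadBalancer.ingress for the user ingress.
# Expect one IP per worker node once the controller DaemonSet is healthy.
kubectl -n my-app get ingress ing01 -o jsonpath='{.status.loadBalancer.ingress[*].ip}{"\n"}'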

Upgrade Scenario:

  • On Rancher v2.4.8 or earlier, create a cluster with Project Network Isolation enabled and >=2 worker nodes.
  • Create a few network policies (an example policy is sketched after this list). These are in addition to the ones Rancher creates by default for project network isolation.
  • Create a few user ingresses. status.loadBalancer might not have the IPs of all the worker nodes.
  • Upgrade to v2.4-head/v2.5-head/the corresponding RCs.
  • Confirm the nginx-ingress-controller pods in the System project don't have podName labels.
  • Turn on debug logging, redeploy nginx-ingress-controller, and confirm the pods don't get updated without cause. This can be confirmed by looking for podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel in the logs.
  • Delete the nginx-ingress-controller pods in the System project one by one. Confirm user ingresses don't go to the Initializing state.
  • Confirm user network policies remain as is and don't get deleted.
  • On user ingresses, confirm that status.loadBalancer shows the IPs of all the worker nodes.
  • Check the kubelet logs to confirm there are no unlimited SyncLoop updates on the nginx ingress pods.
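
For the "create a few network policies" step, any user-created policy in a project namespace will do; a minimal sketch (the name and namespace are placeholders) that only allows traffic from pods in the same namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # placeholder name
  namespace: my-app            # placeholder project namespace
spec:
  podSelector: {}              # applies to all pods in the namespace
  ingress:
  - from:
    - podSelector: {}          # only allow traffic from pods in the same namespace

After the upgrade, this policy should still be present alongside the policies Rancher itself manages for project network isolation.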

Note: There are a few ways the ingress controller can be redeployed.

  • Redeploy from the UI, or kubectl rollout restart ds nginx-ingress-controller -n ingress-nginx: these can temporarily send user ingresses to the Initializing state, but they should become Active again once all the pods are up and running.
  • Delete the pods of the DaemonSet manually, waiting for each new pod to be running before deleting the next one (a sketch follows below). This doesn't result in user ingresses going to the Initializing state (AFAIK, needs confirmation).
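
A minimal sketch of the one-pod-at-a-time approach (assuming the default DaemonSet name and namespace; the pod name is a placeholder):

# Delete one controller pod...
kubectl -n ingress-nginx delete pod nginx-ingress-controller-xxxxx
# ...and wait for its replacement to be Ready before deleting the next one.
kubectl -n ingress-nginx wait --for=condition=Ready pod -l app=ingress-nginx --timeout=120s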

Upgrade Scenario from v2.4.9/v2.4.10/v2.5.2:

  • On Rancher v2.4.8 or earlier, create a cluster with Project Network Isolation enabled and >=2 worker nodes.
  • Create a few network policies. These are in addition to the ones Rancher creates by default for project network isolation.
  • Create a few user ingresses. status.loadBalancer might not have the IPs of all the worker nodes.
  • Upgrade to v2.4.9/v2.4.10/v2.5.2.
  • The bug is reproduced.
  • Upgrade to v2.4-head/v2.5-head/the corresponding RCs.
  • Confirm the nginx-ingress-controller pods in the System project don't have podName labels.
  • Turn on debug logging, redeploy nginx-ingress-controller, and confirm the pods don't get updated without cause. This can be confirmed by looking for podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel in the logs.
  • Delete the nginx-ingress-controller pods in the System project one by one. Confirm user ingresses don't go to the Initializing state.
  • Confirm user network policies remain as is and don't get deleted.
  • On user ingresses, confirm that status.loadBalancer shows the IPs of all the worker nodes.
  • Check the kubelet logs to confirm there are no unlimited SyncLoop updates on the nginx ingress pods (sketches of the debug and kubelet log checks follow this list).
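
Sketches for the two log-based checks above, assuming an HA Rancher install (where debug logging can be toggled with the loglevel utility inside the rancher pods) and an RKE worker node (where the kubelet runs as a Docker container named kubelet):

# Turn on Rancher debug logging (run against the cluster Rancher is installed on).
kubectl -n cattle-system exec -it $(kubectl -n cattle-system get pods -l app=rancher -o name | head -n 1) -- loglevel --set debug

# On a worker node, look for repeated SyncLoop updates on the ingress controller pods.
docker logs kubelet 2>&1 | grep SyncLoop | grep nginx-ingress-controller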


sowmyav27 commented Nov 18, 2020

Fresh Install use case - On 2.4-head - commit id: 6a09f523f

  • Create a cluster with Project Network Isolation enabled, 1 etcd/control plane node and 2 worker nodes.
  • Pods of nginx-ingress-controller in the System project do not have podName labels.
  • In debug mode, redeploy nginx-ingress-controller and confirm the pods don't get updated without cause. podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel is NOT seen in the logs.
  • Deploy a user ingress; verified that status.loadBalancer shows the IPs of both worker nodes:
status:
  loadBalancer:
    ingress:
    - ip: <ip-1>
    - ip: <ip-2>

Upgrade from 2.4.8 to 2.4-head

On 2.4.8

  • Deploy a cluster - 2 worker nodes, 1 etcd/control plane node
  • Deploy a workload and an ingress pointing to the workload.
  • User ingress has only one worker node's IP address in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
  • Create network policies following the docs.
  • Upgrade Rancher to 2.4-head.
  • User ingress is in the Initializing state and does not recover.
  • User ingress has no worker node IP addresses in status.loadBalancer:
status:
  loadBalancer: {}
  • podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel --> is seen in the logs
2020/11/19 02:57:49 [DEBUG] podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel map[app:ingress-nginx controller-revision-hash:7649c75c8c field.cattle.io/podName:nginx-ingress-controller-p79hj pod-template-generation:1] in ingress-nginx
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • Redeploy nginx-ingress-controller; wait for the workload and pods to become Active (a CLI check is sketched after this list).
  • User ingress is seen in the Active state.
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • User ingress has both worker nodes' IP addresses in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
    - ip: <wk02>
  • User-created network policies do not get deleted.
  • Kubelet logs on the worker node do not show repeated SyncLoop updates/messages.
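
A sketch of the redeploy/wait step above (default DaemonSet name and namespace assumed); the rollout can be confirmed from the CLI before re-checking the ingress:

# Blocks until every pod of the DaemonSet has been replaced and is Ready.
kubectl -n ingress-nginx rollout status ds/nginx-ingress-controller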

Upgrade from 2.4.8 to 2.4-head - Scenario#2

On 2.4.8

  • Deploy a cluster - 2 worker nodes, 1 etcd/control plane node
  • Deploy a workload and an ingress pointing to the workload.
  • User ingress has only one worker node's IP address in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
  • Create network policies following the docs.
  • Upgrade Rancher to 2.4.10.
  • The bug is reproduced.
  • Upgrade Rancher to 2.4-head - commit id: 6a09f523f.
  • User ingress is in the Initializing state and does not recover.
  • User ingress has no worker node IP addresses in status.loadBalancer:
status:
  loadBalancer: {}
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • Redeploy nginx-ingress-controller by deleting the ingress controller pods; wait for the workload and pods to become Active.
  • User ingress is seen in the Active state.
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • User ingress has both worker nodes' IP addresses in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
    - ip: <wk02>
  • User-created network policies do not get deleted.
  • Kubelet logs on the worker node do not show repeated SyncLoop updates/messages.
  • Delete the pods of the ingress controller DaemonSet manually and wait for each new pod to be running before deleting the next one. This doesn't result in user ingresses going to the Initializing state.

@leflambeur

Thanks, @kinarashah and @sowmyav27 for the quick turnaround on this :)


sowmyav27 commented Nov 21, 2020

Fresh Install use case - On 2.5.3-rc1

  • Create a cluster with Project Network Isolation enabled, 1 etcd/control plane node and 2 worker nodes.
  • Pods of nginx-ingress-controller in the System project do not have podName labels.
  • In debug mode, redeploy nginx-ingress-controller and confirm the pods don't get updated without cause. podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel is NOT seen in the logs.
  • Deploy a user ingress; verified that status.loadBalancer shows the IPs of both worker nodes:
status:
  loadBalancer:
    ingress:
    - ip: <ip-1>
    - ip: <ip-2>

Upgrade from 2.5.2 to 2.5.3-rc1

On 2.5.2

  • Deploy a cluster - 2 worker nodes, 1 etcd/control plane node
  • Enable cluster level monitoring
  • Deploy a workload and an ingress pointing to the workload.
  • User ingress has only one worker node's IP address in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
  • Ingress controller pods in the System project have podName labels.
  • Create network policies following the docs.
  • Upgrade Rancher to 2.5.3-rc1.
  • User ingress is in the Initializing state and does not recover.
  • User ingress has no worker node IP addresses in status.loadBalancer:
status:
  loadBalancer: {}
  • podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel --> is seen in the logs
2020/11/19 02:57:49 [DEBUG] podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel map[app:ingress-nginx controller-revision-hash:7649c75c8c field.cattle.io/podName:nginx-ingress-controller-p79hj pod-template-generation:1] in ingress-nginx
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • Redeploy nginx-ingress-controller; wait for the workload and pods to become Active.
  • User ingress is seen in the Active state.
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • User ingress has both worker nodes' IP addresses in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
    - ip: <wk02>
  • User-created network policies do not get deleted.
  • Kubelet logs on the worker node do not show repeated SyncLoop updates/messages.
  • Cluster metrics: the Rancher upgrade happened around 20:30.

(Screenshot: cluster metrics around the time of the Rancher upgrade)

Upgrade from 2.5.1 to 2.5.2 to 2.5.3-rc1

On 2.5.1

  • Deploy a cluster - 3 worker nodes, 1 etcd/control plane node
  • Enable cluster level monitoring
  • Deploy a workload and an ingress pointing to the workload.
  • User ingress has only one worker node's IP address in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
  • Create network policies.
  • Upgrade to 2.5.2.
  • The bug is reproduced.
  • Upgrade to 2.5.3-rc1.
  • podHandler: addLabelIfHostPortsPresent: deleting podNameFieldLabel --> is seen in the logs.
  • User ingress is in the Initializing state and does not recover.
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • Redeploy nginx-ingress-controller by deleting the ingress controller pods; wait for the workload and pods to become Active.
  • User ingress is seen in the Active state.
  • Pods of nginx-ingress-controller in the System project don't have podName labels.
  • User ingress has all three worker nodes' IP addresses in status.loadBalancer:
status:
  loadBalancer:
    ingress:
    - ip: <wk01>
    - ip: <wk02>
    - ip: <wk03>
  • User-created network policies do not get deleted.
  • Kubelet logs on the worker node do not show repeated SyncLoop updates/messages.
  • Cluster metrics: 21:40 - upgrade to 2.5.2, and 21:52 - upgrade to 2.5.3.

(Screenshot: cluster metrics around the times of the two upgrades)

@sowmyav27

Closing this as it's been validated on the 2.5.3 and 2.4 branches.
