[2.6] Automatic creation/sync of services <-> workload ports permanently lost after 2.5.8->2.6.0 #34639

Closed
GiGurra opened this issue Sep 3, 2021 · 11 comments
Labels: kind/bug, release-note, status/dev-validate, status/has-dependency

Comments

@GiGurra

GiGurra commented Sep 3, 2021

Rancher Server Setup

  • Rancher version: 2.6.0
  • Installation option (Docker install/Helm Chart): docker standalone/single instance
  • Proxy/Cert Details: letsencrypt

Information about the Cluster

  • Kubernetes version: 1.20.8
  • Cluster Type (Local/Downstream): imported digitalocean managed kubernetes

Describe the bug
After upgrading from 2.5.8 -> 2.6.0, services are no longer automatically created/removed when configuring ports in the new "edit deployment config" GUI. All existing services were seemingly preserved in the upgrade, but all corresponding deployments ended up in "do not create service" in the workload config editor. I can no longer edit the ports in any workloads; if I try to set them to "cluster IP" I get errors saying "service already exists". If I manually attempt to delete the service objects, they are instantly recreated automatically. If I add new ports to the workload config, these changes are not automatically added to the service objects.

To Reproduce
just guessing:

  1. create a 2.5.8 cluster
  2. create deployment with cluster ip ports
  3. create ingress to it
  4. upgrade to 2.6.0
  5. Check if port config under deployment ended up in "do not create service"
  6. Try to change to "create service" or add any new ports. None of these settings have any effect on service objects (automatic service <-> workload sync is permanently lost)

Manually deleting all ports, then the service object, and then re-adding ports to the deployment config doesn't help. (Actually it creates a new service, even though the webpage says "service already exists", but then it ends up in the same de-synced state as explained above, where you can't edit anything on the workload.)

Result
Automatic service <-> workload sync is permanently lost. I can delete all ports on a workload and the services are still kept. This is a security issue in some sense. I can no longer reliably create port mappings in the deployment config GUI, so starting from 2.6.0 I must manually create services separately.

Expected Result
Same functionality as in previous rancher < 2.6, that is, you can edit ports on workloads and services are edited automatically.
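
To make the expected behaviour concrete, here is a minimal sketch of a drift check between the ports a workload declares and the ports its service exposes. It is not Rancher's implementation: the function names are hypothetical, the manifests are assumed to be plain dicts in the standard Kubernetes Deployment/Service shape, and named (string) target ports are skipped for simplicity.

```python
def container_ports(deployment):
    """Collect (port, protocol) pairs declared by all containers of a Deployment."""
    ports = set()
    containers = deployment["spec"]["template"]["spec"].get("containers", [])
    for container in containers:
        for p in container.get("ports", []):
            # Kubernetes defaults the protocol to TCP when it is omitted.
            ports.add((p["containerPort"], p.get("protocol", "TCP")))
    return ports


def service_target_ports(service):
    """Collect (targetPort, protocol) pairs exposed by a Service."""
    ports = set()
    for p in service["spec"].get("ports", []):
        # targetPort defaults to the service port when it is omitted.
        target = p.get("targetPort", p["port"])
        if isinstance(target, int):  # assumption: named target ports are ignored here
            ports.add((target, p.get("protocol", "TCP")))
    return ports


def port_drift(deployment, service):
    """Report ports present on one side but not on the other."""
    declared = container_ports(deployment)
    exposed = service_target_ports(service)
    return {
        "missing_from_service": sorted(declared - exposed),
        "stale_in_service": sorted(exposed - declared),
    }
```

With working sync, both lists would be empty after every workload edit. In the state described in this issue, ports added to the workload would show up under `missing_from_service`, and ports removed from it under `stale_in_service`.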

Other info
Have been using Rancher since the 1.x days and just about every single 2.x version. Never seen this issue before, and it is one of the worst usability issues I've had so far. I would not dare upgrade 2.5.x production clusters to 2.6.0 and risk hundreds of deployments losing sync with the service configurations/services maintained by Rancher.

@GiGurra GiGurra changed the title Automatic creation/sync of services to all workloads permanently lost after upgrading 2.5.8->2.6.0 [2.6] Automatic creation/sync of services to all workloads permanently lost after upgrading 2.5.8->2.6.0 Sep 3, 2021
@GiGurra
Author

GiGurra commented Sep 3, 2021

I can verify that if I create entirely new deployments AFTER the upgrade from 2.5.8->2.6.0, service<->deployment port config is synchronized and automatic (both adding and removing ports work fine), but for all deployments that existed prior to the upgrade, sync is permanently lost.

@GiGurra GiGurra changed the title [2.6] Automatic creation/sync of services to all workloads permanently lost after upgrading 2.5.8->2.6.0 [2.6] Automatic creation/sync of services <-> workload ports permanently lost after 2.5.8->2.6.0 Sep 3, 2021
@GiGurra
Author

GiGurra commented Sep 3, 2021

Well this is bad... I cannot create services for the workloads even manually (if svc name same as workload name). They are auto reverted to the configuration that existed before the rancher upgrade. Even if I delete all ports from workloads and then delete/edit the services, the services are reverted back to their pre 2.6 config (after a few minutes)

@GiGurra
Author

GiGurra commented Sep 4, 2021

This is even worse.. even if I delete the entire workload and recreate it again, the issue persists.
It seems like rancher remembers the workload name, and all workload names ever used before are broken.

@anupama2501
Contributor

Reproduced this when upgrading the Rancher server from v2.5.8 to v2.6.0.

  • Created a downstream rke1 cluster on v2.5.8
  • Created a few workloads with ports of type ClusterIP, NodePort, and HostPort
  • Created an ingress for each workload
  • Upgraded Rancher from v2.5.8 to v2.6.0
  • Workloads now show their ports as "Do not create a service"

[Screenshot: 2021-09-07, 2:39:13 PM]

- If we edit the config by removing the previous ports and adding a new port with service type `ClusterIP`, saving fails with the error:
{type: "error", links: {}, code: "AlreadyExists", message: "services "nodeport1" already exists",…}
code: "AlreadyExists"
links: {}
message: "services \"nodeport1\" already exists"
status: 409
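
The error payload above carries enough structure to tell a name collision apart from other save failures. The following is a minimal sketch of extracting the conflicting service name. It assumes the response body is JSON with the fields shown above (`code`, `message`, `status`); the helper name `conflicting_service` is hypothetical and not part of any Rancher API.

```python
import json
import re


def conflicting_service(error_body):
    """Return the service name from an AlreadyExists (409) error body, or None."""
    err = json.loads(error_body)
    if err.get("status") != 409 or err.get("code") != "AlreadyExists":
        return None
    # The message format is taken from the error shown in this comment.
    match = re.search(r'services "([^"]+)" already exists', err.get("message", ""))
    return match.group(1) if match else None
```

A caller could use the returned name to look up the pre-existing service before retrying the save.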

> This is even worse.. even if I delete the entire workload and recreate it again, the issue persists.
> It seems like rancher remembers the workload name, and all workload names ever used before are broken.

This happens because the service discovery (SD) entry for the workload is still present in the Services tab. If we delete the SD entry and then re-create the workload, the deployment can be created.

Issues seen:

  • Services should be deleted when the corresponding workload is deleted
  • Workload edit config should allow adding new ports. [If a workload is created on v2.6.0 with a service-type port, e.g. ClusterIP, and new ports of the same service type are added when editing, no error occurs.]

@anupama2501 anupama2501 added this to the v2.6.1 milestone Sep 7, 2021
@anupama2501 anupama2501 added the kind/bug Issues that are defects reported by users or that we know have reached a real release label Sep 7, 2021
@zube zube bot removed the [zube]: Next Up label Sep 8, 2021
@cbron
Contributor

cbron commented Sep 17, 2021

Waiting on that UI issue ^, once that is merged this can go to test. At this point there is no dev work to be done.

@deniseschannon

Since both issues linked here have been labeled to test in the UI, can we please test to see if this is still an issue @kinarashah before we verify and move it over to QA to finalize testing on this use case?

@kinarashah
Member

kinarashah commented Sep 21, 2021

We should release-note this for users upgrading to 2.6.1: all legacy UI services will need to be deleted manually when workloads are deleted. See rancher/dashboard#4212 (comment).

On upgrade, if workloads created from legacy UI are deleted, the corresponding services won't be deleted. Users will have to manually clean them up. UI now shows a text msg (screenshot here rancher/dashboard#4212 (comment)) to inform users when they're deleting such workloads.
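
Because this cleanup is manual, a small script can help find candidates. The sketch below assumes, as reported earlier in this thread, that a legacy service shares its name with the workload it was created for. It is a name-based heuristic, not how Rancher tracks ownership. The function name and inputs are hypothetical: plain lists of names from one namespace, taken before and after workloads were deleted.

```python
def leftover_services(previous_workloads, current_workloads, services):
    """List services whose name matches a workload that no longer exists.

    Assumption: legacy services are named after their workload. Services
    with unrelated names are never reported.
    """
    removed = set(previous_workloads) - set(current_workloads)
    return sorted(name for name in set(services) if name in removed)
```

The result is only a list of candidates. Each service should be reviewed before it is deleted, since a manually created service may happen to share a name with a removed workload.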

@kinarashah kinarashah added the release-note Note this issue in the milestone's release notes label Sep 21, 2021
@brendarearden brendarearden self-assigned this Sep 28, 2021
@kinarashah
Member

Validation update:

This issue needs to be tested against the 3 issues mentioned in rancher/dashboard#4159, of which the first is failing in the following scenario:

  • Create workload in old ui without passing any ports
  • In new UI, try to edit workload and add port
  • Edit/save is blocked with an error msg: service already exists

@gaktive
Member

gaktive commented Sep 29, 2021

Per rancher/dashboard#4159 (comment), UI is running into a backend issue tied to some Norman vs. Steve differences. Not sure if we need to spawn a new ticket to look into that but for UI, there is a workaround to allow editing at least through the old UI.

@kinarashah
Member

Fresh Install v2.6.1-rc7:

  • Fresh Install v2.6.1-rc7 single node docker Install
  • Create workload in Ember with clusterIP, service for workload gets created automatically
  • Edit Config the workload in Vue, the ports show up correctly as clusterIP
  • Add a new port clusterIP/NodePort, no error shows up and Edit goes through successfully
  • Delete the workload, Warning shows up in UI saying this is a legacy workload and service needs to be deleted manually

Upgrade v2.5.8 to v2.6.1-rc7:

  • Fresh Install v2.5.8 single node docker Install
  • Create workload with clusterIP port, service for workload gets created automatically
  • Upgrade to v2.6.1-rc7
  • Edit Config the workload, the ports show up correctly as clusterIP
  • Add a new port clusterIP/NodePort, no error shows up and Edit goes through successfully
  • Delete the workload, Warning shows up in UI saying this is a legacy workload and service needs to be deleted manually

Upgrade v2.5.8 to v2.6.0 to v2.6.1-rc7:

  • Fresh Install v2.5.8 single node docker Install
  • Create workload with clusterIP port, service for workload gets created automatically
  • Upgrade to v2.6.0
  • Edit Config the workload, the service shows up as Do not create service
  • Try to add clusterIP port, save gives an error mentioned in the issue, service already exists
  • Try to save multiple times, gives same error.
  • Upgrade to v2.6.1-rc7
  • Edit Config the workload, the ports show up correctly as clusterIP
  • Add a new port clusterIP/NodePort, no error shows up and Edit goes through successfully
  • Delete the workload, Warning shows up in UI saying this is a legacy workload and service needs to be deleted manually

[Screenshot: 2021-09-29, 7:23:33 PM]

[Screenshot: 2021-09-29, 7:23:24 PM]

@kinarashah
Member

@sowmyav27 ^ This issue has been validated, ok to close.
