Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenStack] Use new hash format in instance names #10557

Merged
merged 4 commits into from
Jan 13, 2021

Conversation

zetaab
Copy link
Member

@zetaab zetaab commented Jan 11, 2021

So this will modify the instance name format that we will see in OpenStack instance names AND in Kubernetes cluster. The current problem that we have is that if we delete instance from OpenStack and run kops update cluster --yes - the new instance will come back with same name. However, the OpenStack instanceid is changed and it leads to problems (there is mismatch between Kubernetes API node instanceid and real OpenStack instanceid). The current way to fix this is to delete node from Kubernetes API and restart OpenStack instance.

old format: <ig name>-<index>-<clustername>
new format: <ig name>-<6 chars random hash>

tested with following cases:

with new code

  • create new cluster calc.k8s.local
% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-kivf21   Ready    master   2m16s   v1.19.5
master-zone-2-8hxz8y   Ready    master   2m29s   v1.19.5
master-zone-3-yv9sdw   Ready    master   2m25s   v1.19.5
nodes-zone-1-nm2gcp    Ready    node     67s     v1.19.5
nodes-zone-2-ryxop5    Ready    node     64s     v1.19.5
  • run kubectl update cluster without changing cluster config (no changes, kops update cluster --name calc.k8s.local)
  • update k8s version in cluster config and run update cluster (no changes, kops update cluster --name calc.k8s.local)
  • run rolling-update (kops rolling-update cluster calc.k8s.local --yes)
% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-chvtmx   Ready    master   28m     v1.19.6
master-zone-2-n7zs4y   Ready    master   21m     v1.19.6
master-zone-3-wmskzc   Ready    master   11m     v1.19.6
nodes-zone-1-miqpky    Ready    node     6m23s   v1.19.6
nodes-zone-2-bujc69    Ready    node     48s     v1.19.6
  • scale down instancegroup nodes from 1 -> 0 (should delete one node, kops edit ig nodes-zone-2 && kops update cluster --name calc.k8s.local)
% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-chvtmx   Ready    master   32m     v1.19.6
master-zone-2-n7zs4y   Ready    master   26m     v1.19.6
master-zone-3-wmskzc   Ready    master   16m     v1.19.6
nodes-zone-1-miqpky    Ready    node     11m     v1.19.6
  • delete cluster

with old code create and then updating with newer code

  • create new cluster with old naming format (using old code)
% kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
master-zone-1-1-calc-k8s-local   Ready    master   2m38s   v1.19.5
master-zone-2-1-calc-k8s-local   Ready    master   2m35s   v1.19.5
master-zone-3-1-calc-k8s-local   Ready    master   2m2s    v1.19.5
nodes-zone-1-1-calc-k8s-local    Ready    node     94s     v1.19.5
nodes-zone-2-1-calc-k8s-local    Ready    node     90s     v1.19.5
  • compile new code and use it
  • run kubectl update cluster without changing cluster config (no changes, kops update cluster --name calc.k8s.local)
  • update k8s version in cluster config and run update cluster (no changes, kops update cluster --name calc.k8s.local)
  • run rolling-update (kops rolling-update cluster calc.k8s.local --yes)
% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-orb9gv   Ready    master   31m     v1.19.6
master-zone-2-diw28n   Ready    master   18m     v1.19.6
master-zone-3-srsk15   Ready    master   12m     v1.19.6
nodes-zone-1-lhgo7i    Ready    node     6m51s   v1.19.6
nodes-zone-2-gyziid    Ready    node     2m12s   v1.19.6
  • scale down instancegroup nodes from 1 -> 0 (should delete one node, kops edit ig nodes-zone-2 && kops update cluster --name calc.k8s.local)
% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-orb9gv   Ready    master   33m     v1.19.6
master-zone-2-diw28n   Ready    master   20m     v1.19.6
master-zone-3-srsk15   Ready    master   14m     v1.19.6
nodes-zone-1-lhgo7i    Ready    node     8m35s   v1.19.6
  • delete cluster

with old code create and then scaling down with new code

  • create new cluster with old naming format (using old code)
% kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
master-zone-1-1-calc-k8s-local   Ready    master   2m20s   v1.19.6
master-zone-2-1-calc-k8s-local   Ready    master   2m32s   v1.19.6
master-zone-3-1-calc-k8s-local   Ready    master   2m34s   v1.19.6
nodes-zone-1-1-calc-k8s-local    Ready    node     82s     v1.19.6
nodes-zone-2-1-calc-k8s-local    Ready    node     85s     v1.19.6
  • compile new code and use it
  • scale down instancegroup nodes from 1 -> 0 (should delete one node, kops edit ig nodes-zone-2 && kops update cluster --name calc.k8s.local)
% kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
master-zone-1-1-calc-k8s-local   Ready    master   4m17s   v1.19.6
master-zone-2-1-calc-k8s-local   Ready    master   4m29s   v1.19.6
master-zone-3-1-calc-k8s-local   Ready    master   4m31s   v1.19.6
nodes-zone-1-1-calc-k8s-local    Ready    node     3m19s   v1.19.6
  • delete cluster

this code is backward compatible with old clusters with old format

@zetaab zetaab requested a review from olemarkus January 11, 2021 17:28
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/provider/openstack Issues or PRs related to openstack provider labels Jan 11, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 11, 2021
@zetaab
Copy link
Member Author

zetaab commented Jan 11, 2021

seems that tests will fail, I will fix tests tomorrow

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 11, 2021
@zetaab
Copy link
Member Author

zetaab commented Jan 12, 2021

@olemarkus @rifelpet could you guys help me here, why tests are failing? I cannot find the reason for that, some fields missing from dummy data?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zetaab

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zetaab
Copy link
Member Author

zetaab commented Jan 12, 2021

I do not understand why its trying to create two instances with same name? If I add debug https://github.com/kubernetes/kops/blob/master/cloudmock/openstack/mockcompute/servers.go#L193 I can see that it will create two nodes always with same name!? Its not working like that when I execute it against real OpenStack

@olemarkus
Copy link
Member

Does that happen prior to this PR? I tried to duplicate that on master, but doesn't seem to happen there.

@olemarkus
Copy link
Member

Just wondering if Find actually can find anything using the mock. The new behaviour expects a tag to be set, but we don't set metadata in the mock.

@zetaab
Copy link
Member Author

zetaab commented Jan 12, 2021

@zetaab
Copy link
Member Author

zetaab commented Jan 12, 2021

@olemarkus now it works! What is your opinion about this?

Ping @mitch000001

@olemarkus
Copy link
Member

I think this looks good.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2021
Copy link
Contributor

@mitch000001 mitch000001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just some minor naming issues :)

pkg/model/openstackmodel/servergroup.go Outdated Show resolved Hide resolved
pkg/resources/openstack/floatingip.go Show resolved Hide resolved
upup/pkg/fi/cloudup/openstacktasks/instance.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/openstacktasks/instance.go Outdated Show resolved Hide resolved
@olemarkus
Copy link
Member

/hold

In case you want to implement the requested changes

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2021
@mitch000001
Copy link
Contributor

looks good to me

@zetaab
Copy link
Member Author

zetaab commented Jan 13, 2021

@olemarkus can you recheck (and remove hold if ok)

@mitch000001
Copy link
Contributor

@zetaab is that a change we want to also cherry-pick for 1.19?

@zetaab
Copy link
Member Author

zetaab commented Jan 13, 2021

@mitch000001 no idea, do we want it already in 1.19? I am fine with that if someone wants it

@olemarkus
Copy link
Member

/hold cancel
/lgtm

I think this needs to bake a while, so a bit too late for 1.19.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jan 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit fb0fbb5 into kubernetes:master Jan 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Jan 13, 2021
@zetaab zetaab deleted the feature/hashname branch January 13, 2021 20:35
@kciredor
Copy link

Missed this one, but while upgrading to kops 1.20 from 1.19 on OpenStack I found out ;-) I understand this solves a problem, but it also means that the server name does not include the cluster name anymore. Meaning a quick openstack server list does not show which vm belongs to which cluster (It's not uncommon to have multiple clusters per OpenStack project). Ofcourse it's easy to display vm's in more detail, but still. What if introducing the hash would not imply dropping the cluster name?

@mitch000001
Copy link
Contributor

I think the introduced change does more than just adding a hash to the name. The prior problem was that the server name also contained the order in the instance group, e.g. if the instance group has 3 members (size 3) then the first node was called ig-name-1-cluster-name, the second ig-name-2-cluster-name and so on. Secondly the hostnames may only be 64 characters normally, so if your cluster name was too long you also ran into problems.
If you "just" want to have the cluster name in place to make it easier to find servers, I mean, I don't want to just point out greping for it (as we do) but I am not convinced that this outweighs the solved problems with this PR.
Having said that, this also is a preparation on enabling rolling update features later on like detaching nodes from the management of kOps (surging, +1/-1) which is only possible by stripping the instance group number because otherwise we can't easily create a replacement node because names could collide.

@kciredor
Copy link

Thanks @mitch000001 this makes a lot of sense now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/openstack Issues or PRs related to openstack provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants