[OpenStack] Use new hash format in instance names #10557

zetaab · 2021-01-11T17:28:51Z

This changes the instance name format that we see in OpenStack instance names and in the Kubernetes cluster. The current problem is that if we delete an instance from OpenStack and run kops update cluster --yes, the new instance comes back with the same name. However, the OpenStack instance ID has changed, and this leads to problems: the instance ID recorded on the node in the Kubernetes API no longer matches the real OpenStack instance ID. The current fix is to delete the node from the Kubernetes API and restart the OpenStack instance.

old format: <ig name>-<index>-<clustername>
new format: <ig name>-<6 chars random hash>
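A minimal sketch of how a name in the new format could be built. This is not the code from this PR: the function names, the use of crypto/rand, and the alphabet (lowercase letters and digits, guessed from the sample node names below) are all assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// suffixAlphabet is an assumption: lowercase letters and digits,
// chosen to resemble the sample names in this PR description.
const suffixAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// randomSuffix returns n characters drawn uniformly from suffixAlphabet.
func randomSuffix(n int) (string, error) {
	limit := big.NewInt(int64(len(suffixAlphabet)))
	buf := make([]byte, n)
	for i := range buf {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		buf[i] = suffixAlphabet[idx.Int64()]
	}
	return string(buf), nil
}

// newInstanceName builds "<ig name>-<6 chars random hash>" (hypothetical helper).
func newInstanceName(igName string) (string, error) {
	suffix, err := randomSuffix(6)
	if err != nil {
		return "", err
	}
	return igName + "-" + suffix, nil
}

func main() {
	name, err := newInstanceName("nodes-zone-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

Because the suffix is random, a recreated instance gets a name different from the one it replaces, so a stale node entry is not silently reused.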

Tested with the following cases:

with new code

create new cluster calc.k8s.local

% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-kivf21   Ready    master   2m16s   v1.19.5
master-zone-2-8hxz8y   Ready    master   2m29s   v1.19.5
master-zone-3-yv9sdw   Ready    master   2m25s   v1.19.5
nodes-zone-1-nm2gcp    Ready    node     67s     v1.19.5
nodes-zone-2-ryxop5    Ready    node     64s     v1.19.5

run kops update cluster without changing cluster config (no changes, kops update cluster --name calc.k8s.local)
update k8s version in cluster config and run update cluster (no changes, kops update cluster --name calc.k8s.local)
run rolling-update (kops rolling-update cluster calc.k8s.local --yes)

% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-chvtmx   Ready    master   28m     v1.19.6
master-zone-2-n7zs4y   Ready    master   21m     v1.19.6
master-zone-3-wmskzc   Ready    master   11m     v1.19.6
nodes-zone-1-miqpky    Ready    node     6m23s   v1.19.6
nodes-zone-2-bujc69    Ready    node     48s     v1.19.6

scale down instancegroup nodes from 1 -> 0 (should delete one node, kops edit ig nodes-zone-2 && kops update cluster --name calc.k8s.local)

% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-chvtmx   Ready    master   32m     v1.19.6
master-zone-2-n7zs4y   Ready    master   26m     v1.19.6
master-zone-3-wmskzc   Ready    master   16m     v1.19.6
nodes-zone-1-miqpky    Ready    node     11m     v1.19.6

delete cluster

create with old code, then update with new code

create new cluster with old naming format (using old code)

% kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
master-zone-1-1-calc-k8s-local   Ready    master   2m38s   v1.19.5
master-zone-2-1-calc-k8s-local   Ready    master   2m35s   v1.19.5
master-zone-3-1-calc-k8s-local   Ready    master   2m2s    v1.19.5
nodes-zone-1-1-calc-k8s-local    Ready    node     94s     v1.19.5
nodes-zone-2-1-calc-k8s-local    Ready    node     90s     v1.19.5

compile new code and use it
run kops update cluster without changing cluster config (no changes, kops update cluster --name calc.k8s.local)
update k8s version in cluster config and run update cluster (no changes, kops update cluster --name calc.k8s.local)
run rolling-update (kops rolling-update cluster calc.k8s.local --yes)

% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-orb9gv   Ready    master   31m     v1.19.6
master-zone-2-diw28n   Ready    master   18m     v1.19.6
master-zone-3-srsk15   Ready    master   12m     v1.19.6
nodes-zone-1-lhgo7i    Ready    node     6m51s   v1.19.6
nodes-zone-2-gyziid    Ready    node     2m12s   v1.19.6

scale down instancegroup nodes from 1 -> 0 (should delete one node, kops edit ig nodes-zone-2 && kops update cluster --name calc.k8s.local)

% kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
master-zone-1-orb9gv   Ready    master   33m     v1.19.6
master-zone-2-diw28n   Ready    master   20m     v1.19.6
master-zone-3-srsk15   Ready    master   14m     v1.19.6
nodes-zone-1-lhgo7i    Ready    node     8m35s   v1.19.6

delete cluster

create with old code, then scale down with new code

create new cluster with old naming format (using old code)

% kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
master-zone-1-1-calc-k8s-local   Ready    master   2m20s   v1.19.6
master-zone-2-1-calc-k8s-local   Ready    master   2m32s   v1.19.6
master-zone-3-1-calc-k8s-local   Ready    master   2m34s   v1.19.6
nodes-zone-1-1-calc-k8s-local    Ready    node     82s     v1.19.6
nodes-zone-2-1-calc-k8s-local    Ready    node     85s     v1.19.6

compile new code and use it
scale down instancegroup nodes from 1 -> 0 (should delete one node, kops edit ig nodes-zone-2 && kops update cluster --name calc.k8s.local)

% kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
master-zone-1-1-calc-k8s-local   Ready    master   4m17s   v1.19.6
master-zone-2-1-calc-k8s-local   Ready    master   4m29s   v1.19.6
master-zone-3-1-calc-k8s-local   Ready    master   4m31s   v1.19.6
nodes-zone-1-1-calc-k8s-local    Ready    node     3m19s   v1.19.6

delete cluster

This code is backward compatible with existing clusters that use the old format.

zetaab · 2021-01-11T17:32:53Z

It seems the tests will fail; I will fix them tomorrow.

zetaab · 2021-01-12T06:59:27Z

@olemarkus @rifelpet could you help me here: why are the tests failing? I cannot find the reason. Are some fields missing from the dummy data?

k8s-ci-robot · 2021-01-12T07:53:47Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zetaab

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [zetaab]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zetaab · 2021-01-12T08:09:54Z

I do not understand why it's trying to create two instances with the same name. If I add debug output at https://github.com/kubernetes/kops/blob/master/cloudmock/openstack/mockcompute/servers.go#L193 I can see that it always creates two nodes with the same name. It does not behave like that when I run it against real OpenStack.

olemarkus · 2021-01-12T08:50:57Z

Does that happen prior to this PR? I tried to reproduce it on master, but it doesn't seem to happen there.

olemarkus · 2021-01-12T08:58:52Z

Just wondering whether Find can actually find anything when using the mock. The new behaviour expects a tag to be set, but we don't set metadata in the mock.

zetaab · 2021-01-12T09:29:42Z

hmm the metadata is set here https://github.com/kubernetes/kops/blob/master/cloudmock/openstack/mockcompute/servers.go#L199 ?

pkg/model/openstackmodel/servergroup.go

zetaab · 2021-01-12T15:48:55Z

@olemarkus now it works! What is your opinion about this?

Ping @mitch000001

olemarkus · 2021-01-13T07:17:07Z

I think this looks good.

/lgtm

mitch000001

Overall looks good, just some minor naming issues :)

pkg/model/openstackmodel/servergroup.go

pkg/resources/openstack/floatingip.go

upup/pkg/fi/cloudup/openstacktasks/instance.go

olemarkus · 2021-01-13T07:46:17Z

/hold

In case you want to implement the requested changes

mitch000001 · 2021-01-13T10:07:35Z

looks good to me

zetaab · 2021-01-13T15:16:51Z

@olemarkus can you recheck (and remove hold if ok)

mitch000001 · 2021-01-13T15:27:17Z

@zetaab is that a change we want to also cherry-pick for 1.19?

zetaab · 2021-01-13T15:38:40Z

@mitch000001 no idea, do we want it already in 1.19? I am fine with that if someone wants it

olemarkus · 2021-01-13T16:44:19Z

/hold cancel
/lgtm

I think this needs to bake a while, so a bit too late for 1.19.

kciredor · 2021-07-28T11:46:32Z

Missed this one, but I found out while upgrading from kops 1.19 to 1.20 on OpenStack ;-) I understand this solves a problem, but it also means that the server name no longer includes the cluster name. That means a quick openstack server list does not show which VM belongs to which cluster (it's not uncommon to have multiple clusters per OpenStack project). Of course it's easy to display VMs in more detail, but still. What if introducing the hash did not imply dropping the cluster name?

mitch000001 · 2021-07-28T15:59:22Z

I think the introduced change does more than just add a hash to the name. The prior problem was that the server name also contained the node's position in the instance group, e.g. if the instance group has 3 members (size 3), the first node was called ig-name-1-cluster-name, the second ig-name-2-cluster-name, and so on. Secondly, hostnames may normally be only 64 characters long, so if your cluster name was too long you also ran into problems.
If you "just" want the cluster name in place to make servers easier to find, I don't want to simply point at grepping for it (as we do), but I am not convinced that this outweighs the problems solved by this PR.
Having said that, this is also preparation for enabling rolling-update features later on, like detaching nodes from kOps management (surging, +1/-1). That is only possible by stripping the instance group index, because otherwise we can't easily create a replacement node: the names could collide.

kciredor · 2021-07-29T08:44:31Z

Thanks @mitch000001 this makes a lot of sense now!

zetaab requested a review from olemarkus January 11, 2021 17:28

k8s-ci-robot added the cncf-cla: yes, size/M, and area/provider/openstack labels Jan 11, 2021

k8s-ci-robot requested review from drekle and johngmyers January 11, 2021 17:29

k8s-ci-robot added the approved label Jan 11, 2021

k8s-ci-robot added the size/L label and removed the size/M label Jan 11, 2021

zetaab mentioned this pull request Jan 12, 2021

WIP: OpenStack: possibility to detach instances from the load #9865

Closed

zetaab force-pushed the feature/hashname branch from 0173169 to 3e68591 Compare January 12, 2021 06:36

zetaab force-pushed the feature/hashname branch from 3e68591 to 9b11b99 Compare January 12, 2021 07:53

olemarkus reviewed Jan 12, 2021

zetaab force-pushed the feature/hashname branch from b270349 to 6bb0b7c Compare January 12, 2021 11:58

Use random instance names in OpenStack

185ccba

zetaab force-pushed the feature/hashname branch from 6bb0b7c to 185ccba Compare January 12, 2021 12:52

fix test

38831ff

k8s-ci-robot assigned olemarkus Jan 13, 2021

k8s-ci-robot added the lgtm label Jan 13, 2021

mitch000001 reviewed Jan 13, 2021

k8s-ci-robot added the do-not-merge/hold label Jan 13, 2021

nameprefix -> groupname

1bc330b

k8s-ci-robot removed the lgtm label Jan 13, 2021

fix comment

6439973

k8s-ci-robot added the lgtm label and removed the do-not-merge/hold label Jan 13, 2021

k8s-ci-robot merged commit fb0fbb5 into kubernetes:master Jan 13, 2021

k8s-ci-robot added this to the v1.20 milestone Jan 13, 2021

zetaab deleted the feature/hashname branch January 13, 2021 20:35
