🌱 Add design doc for OpenStackServer #2021

Merged

Conversation

@mdbooth (Contributor) commented Apr 16, 2024

Design proposal for #2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2024
@k8s-ci-robot k8s-ci-robot requested a review from dulek April 16, 2024 16:58
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 16, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 16, 2024
netlify bot commented Apr 16, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Latest commit: 9b05e94
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/661fc31e4cf1760008165153
😎 Deploy Preview: https://deploy-preview-2021--kubernetes-sigs-cluster-api-openstack.netlify.app

@mdbooth (Contributor, Author) commented Apr 16, 2024

@lentzi90 Meta point: I recall you doing something similar before, but I can't find it to re-use it. I'm thinking of writing 2-4 of these in fairly quick succession, so it might be an opportunity to try to make something stick. If we have a better process already I'll switch to that instead.

Hoping to write them for:

  • This (obvs)
  • External endpoint controllers
  • Spine-leaf deployments

The latter likely has tentacles and may turn into more than one (e.g. a separate doc for failure domains).

My thoughts on this are:

  • Strictly for 'big' changes only, where 'big' is ill-defined.
  • Numeric prefix is the ID of an associated feature request which may just be a stub
  • We can and should merge to docs in proposed fairly freely without requiring consensus. This makes change tracking easier, and we can always delete it if we end up not planning to implement it after all.
  • Consensus is required to move out of the proposed directory.

Purposes:

  • Design review
  • Discussion reference for users or external stakeholders (e.g. CAPI/OpenShift)
  • Implementation guide
  • A roadmap of sorts

@jichenjc (Contributor) left a comment:

Thanks for the great summary, it's really helpful. Just one minor question:



```go
type OpenStackServerSpec struct {
```

@jichenjc (Contributor) commented:
Is there anything existing users need to do to migrate to this? Will it be transparent, or do they need to define these themselves?

@mdbooth (Contributor, Author) replied:

That's a great point. I should make it clear in the doc that this change will happen automatically and the end user does not need to take any action. I'll update the doc.

@mdbooth (Contributor, Author) followed up:

I've just pushed a small change which adds:

  • The upgrade is automatic and will not recreate existing OpenStack resources
  • The new API will be in v1alpha1

@lentzi90 (Contributor) left a comment:

Looks good overall. I have no objections 🙂

* A Volume specified with `from: Machine` will use the value of `AvailabilityZone` instead of `Machine.FailureDomain`.

It will set the following `Conditions`:
* `Ready`: `True` when the server is fully provisioned. Once set to `True`, it will never subsequently be set to `False`.
@lentzi90 (Contributor) commented:

It sounds strange to me to never change it back, even though we could have Failed=True. Is that intentional?
So we could have Ready=True and Failed=True at the same time 🤔

@mdbooth (Contributor, Author) replied:

How about 3 Conditions:

  • Provisioned: a latch which is never set to false after the server has been provisioned
  • Ready: an ephemeral 'readiness' state. Can be false due to transient errors.
  • Failed: permanent failure. No further reconciliation. Never set to false after being set to true.
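
For illustration, a minimal sketch of how these latch semantics might look using the `metav1.Condition` helpers from apimachinery. The condition names are taken from the proposal above; the helper function is hypothetical, and the actual implementation may well use Cluster API's condition utilities instead:

```go
package server

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const (
	conditionProvisioned = "Provisioned" // latches True once provisioned
	conditionReady       = "Ready"       // ephemeral readiness; may flip freely
	conditionFailed      = "Failed"      // latches True on permanent failure
)

// setCondition implements the proposed semantics: Provisioned and Failed are
// never reset to False once True; Ready may change with transient errors.
func setCondition(conds *[]metav1.Condition, cond metav1.Condition) {
	if cond.Type == conditionProvisioned || cond.Type == conditionFailed {
		if existing := meta.FindStatusCondition(*conds, cond.Type); existing != nil &&
			existing.Status == metav1.ConditionTrue {
			return // latched: ignore any attempt to unset
		}
	}
	meta.SetStatusCondition(conds, cond)
}
```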

@lentzi90 (Contributor) commented:

The previous thing I did is not merged so that's probably why you didn't find it 😄
For that I used date-based naming because I just copied what CAPI does. They also have a template if you want to steal it. I have no strong opinion on this though. 😄

@EmilienM (Contributor) commented:

Awesome proposal 👍

Comment on lines +131 to +134:

```
if server spec does not match bastion.Spec {
    deleteOpenStackServer(bastion)
    return
}
```
A contributor commented:

Don't we need to modify the OpenStackCluster here too to force an event?
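
For illustration only, a rough sketch of what the quoted pseudocode might expand to in the cluster controller. All names here are hypothetical, the spec comparison is simplified, and `OpenStackServer` stands in for the proposed API type:

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileBastion deletes the bastion's OpenStackServer when the desired
// spec has changed; a later reconcile recreates it from the new spec.
func reconcileBastion(ctx context.Context, c client.Client, desired, bastion *OpenStackServer) (ctrl.Result, error) {
	if !equality.Semantic.DeepEqual(desired.Spec, bastion.Spec) {
		if err := c.Delete(ctx, bastion); err != nil {
			return ctrl.Result{}, err
		}
		// Whether an explicit event on the owning OpenStackCluster is also
		// needed here is exactly the open question raised above.
		return ctrl.Result{}, nil
	}
	return ctrl.Result{}, nil
}
```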

```go
ServerGroup       *ServerGroupParam
IdentityRef       *OpenStackIdentityReference
FloatingIPPoolRef *corev1.TypedLocalObjectReference
UserDataRef       *corev1.TypedLocalObjectReference
```
A contributor commented:

So you want a new object for this one? Any reason why we're no longer passing a string?
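
For context, `corev1.TypedLocalObjectReference` carries `Kind` and `APIGroup` alongside the name, so the reference is not tied to a single resource type the way a bare string name would be. A hypothetical example of a `UserDataRef` pointing at a Secret:

```go
import corev1 "k8s.io/api/core/v1"

// Hypothetical example: user data stored in a Secret in the same namespace.
// A nil APIGroup means the core API group, where Secret lives; other kinds
// could be referenced later without changing the field's type.
var userDataRef = &corev1.TypedLocalObjectReference{
	Kind: "Secret",
	Name: "worker-user-data",
}
```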

* ProviderID is not present: this is a Machine property
* UserDataRef is added. If present it must exist.
* AvailabilityZone is added, and refers explicitly to the availability zone rather than being taken implicitly from `Machine.FailureDomain`
* It is an error for Ports to be empty. Defaulting the cluster network must be done by the controller creating the object.
A contributor commented:

For ports in the OpenStackServerSpec? Do we want the same API as OpenStackMachineSpec?
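
To illustrate the defaulting split described in the quoted bullet, a hypothetical sketch of what the creating controller might do (type and variable names are assumptions, not the actual CAPO API):

```go
// Hypothetical: the controller that creates the OpenStackServer (e.g. the
// machine controller) defaults Ports to the cluster network before creation;
// OpenStackServerSpec itself rejects empty Ports rather than defaulting them.
if len(serverSpec.Ports) == 0 {
	serverSpec.Ports = []PortOpts{{Network: &clusterNetworkFilter}}
}
```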

@EmilienM (Contributor) commented:

/lgtm
we can iterate later on the design but I think overall it's a solid start.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2024
@huxcrux (Contributor) commented Apr 26, 2024

I'm a bit late to the party, but I think the proposal sounds good 👍
/lgtm

@mdbooth (Contributor, Author) commented Apr 30, 2024

I'm going to move this around to conform to the pattern set by #1565

@lentzi90 (Contributor) commented:

Do we want to include a paragraph or two about if/how this could be used for MachinePool support in CAPO? From my understanding we would "just" need to add one higher level "pool" object to handle OpenStackServers in order to achieve that.

@mdbooth (Contributor, Author) commented May 28, 2024

> Do we want to include a paragraph or two about if/how this could be used for MachinePool support in CAPO? From my understanding we would "just" need to add one higher level "pool" object to handle OpenStackServers in order to achieve that.

What would this look like? I think MachinePool is for when the cloud has some efficiently-implemented native concept of a scalable set of machines. I don't think we have that in OpenStack?

@lentzi90 (Contributor) commented:

> Do we want to include a paragraph or two about if/how this could be used for MachinePool support in CAPO? From my understanding we would "just" need to add one higher level "pool" object to handle OpenStackServers in order to achieve that.

> What would this look like? I think MachinePool is for when the cloud has some efficiently-implemented native concept of a scalable set of machines. I don't think we have that in OpenStack?

Good point. I think it would be doable, but we would end up with all the logic in CAPO to manage the pool, which is not ideal. Forget that I mentioned it 😅

In the same way that OpenStackMachine currently has an 'adoption' phase where it will adopt existing resources, OpenStackServer should adopt matching resources which it would have created. On upgrade to a new version of CAPO which manages the bastion with an OpenStackServer object, I would expect the flow to be:

* Cluster controller creates OpenStackServer and waits for it to report Ready
* OpenStackServer controller looks for existing resources matching its given spec
A contributor commented:

This part is a bit tricky: we'd need to compare every field of instanceStatus. I started to play with it and it became messy. I'm not against this idea, just making sure we really want to go down that path.

The risk of not doing that is that OpenStackServer could adopt an instance which doesn't 100% match the spec it would have had if the OpenStackServer controller had created it.

@mdbooth (Contributor, Author) replied:

I was thinking the adoption behaviour would be identical to what we currently do in the OpenStackMachine controller. Literally the same code. We're basically just adopting by name.
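
As a minimal sketch of what adopt-by-name could look like against the Nova API (gophercloud v1-style calls; the function and variable names are hypothetical, not the actual OpenStackMachine controller code):

```go
import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

// adoptServerByName looks for an existing server with the exact name this
// OpenStackServer would have created, and adopts it rather than creating a
// duplicate. Returns nil if there is nothing to adopt.
func adoptServerByName(compute *gophercloud.ServiceClient, name string) (*servers.Server, error) {
	// Nova's name filter is a regular expression, so anchor it for an exact match.
	pages, err := servers.List(compute, servers.ListOpts{Name: "^" + name + "$"}).AllPages()
	if err != nil {
		return nil, err
	}
	all, err := servers.ExtractServers(pages)
	if err != nil {
		return nil, err
	}
	switch len(all) {
	case 0:
		return nil, nil // not found: fall through to creation
	case 1:
		return &all[0], nil
	default:
		return nil, fmt.Errorf("found %d servers named %q, cannot adopt safely", len(all), name)
	}
}
```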

@EmilienM (Contributor) commented:

/hold cancel
/approve
we can merge the design since the implementation has merged.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 25, 2024
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: EmilienM

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2024
@EmilienM (Contributor) commented:

I'm expediting this but that doesn't mean we can't iterate on the design document that was initially proposed.

@k8s-ci-robot k8s-ci-robot merged commit 16d4e87 into kubernetes-sigs:main Jul 25, 2024
9 checks passed