Easier, simplified UX for cluster and machine creation #1227

Closed · ncdc opened this issue Aug 5, 2019 · 40 comments

Labels

  • kind/design: Categorizes issue or PR as related to design.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@ncdc (Contributor) commented Aug 5, 2019

/kind feature

Describe the solution you'd like
In v1alpha1, you typically have to create two resources to have a minimally functional Kubernetes cluster: a Cluster and a Machine. Anything that is provider-specific, from kubeadm configuration to infrastructure details such as instance type, is stored in the providerSpec of each resource.

In v1alpha2, now that providerSpec has moved to a standalone CRD, you have to create the following to get a functional cluster (using CAPA and kubeadm as an example):

  1. Cluster
  2. AWSCluster
  3. Machine
  4. AWSMachine
  5. KubeadmConfig

This is five resources now, instead of two.

I'm thinking we could simplify the UX by allowing defaulting for both bootstrap and infrastructure resources, either via opt-in or opt-out. For example, with an opt-in flow, we could do something like this (annotation names TBD):

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  annotations:
    bootstrap.cluster.x-k8s.io/default: kubeadm
    infrastructure.cluster.x-k8s.io/default: aws
    aws.infrastructure.cluster.x-k8s.io/region: us-east-1
spec:
  # ...
---
apiVersion: cluster.x-k8s.io
kind: Machine
metadata:
  namespace: foo
  name: controlplane-0
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
spec:
  bootstrap:
    configRef: nil
    data: nil
  infrastructureRef: nil/empty

The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machine that does not have infrastructureRef and/or bootstrap configRef filled in. The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations. This way, the user could likely go back to creating only two resources -- Cluster and Machine -- and end up with a functional cluster.
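
For illustration, under this opt-in flow the providers might generate something roughly like the following for the Machine above (a sketch only; the names, field values, and defaults are assumptions, not defined behavior):

apiVersion: bootstrap.cluster.x-k8s.io
kind: KubeadmConfig
metadata:
  namespace: foo
  name: controlplane-0      # generated and owned by the Machine
spec: {}                    # filled in with provider-chosen kubeadm defaults
---
apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSMachine
metadata:
  namespace: foo
  name: controlplane-0      # generated and owned by the Machine
spec:
  instanceType: t3.large    # assumed "sensible default", plus values from the annotations

The providers would then back-fill the Machine's bootstrap.configRef and infrastructureRef to point at the generated objects.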

The alternative to opt-in would be opt-out, which essentially means that a bootstrap and infra provider could do the logic mentioned in the previous paragraph, except the annotations wouldn't be required. That might end up being problematic (what happens if two infra providers both try to be the default?), but I figure it's worth mentioning as an alternative.

WDYT?

@k8s-ci-robot added the kind/feature label on Aug 5, 2019
@sethp-nr (Contributor) commented Aug 5, 2019

This is really interesting! I really like where you're going with this as a concept. Beyond providing what seems to me like a good experience, it would make (more) possible an idea that I've been not-so-seriously kicking around: having a control plane from one provider (say, a dynamically sizable cloud) manage nodes from a different one (say, expensive bare metal).

One detail I'm wondering about is:

aws.infrastructure.cluster.x-k8s.io/region: us-east-1

How would we avoid that becoming a vehicle for moving the runtime.RawExtension from the spec into the annotations' map[string]string?

Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?

@ncdc (Contributor, Author) commented Aug 5, 2019

How would we avoid that becoming a vehicle for moving the runtime.RawExtension from the spec into the annotations' map[string]string?

Ideally, each provider that wants to support this would expose an absolute minimum number of annotations. Anything beyond that strict minimum would force the user to create their own resources (AWSMachine, etc.).

Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?

I feel like this is AWSCluster and AWSMachine, which we already have, and which are the full-feature CRDs you'd fill out. Right?

@moshloop (Contributor) commented Aug 5, 2019

I really like the goal of a better, simplified UX.

This already seems like a lot of boilerplate, if our goal is to reduce boilerplate.

Perhaps it makes more sense to have these as first-class fields on a new Cluster API Operator.

apiVersion: cluster.x-k8s.io
kind: ClusterOperator
metadata:
  namespace: foo
  name: my-cluster
spec:
  bootstrap:
    provider: kubeadm
    version: 0.1
  infrastructure:
    provider: aws
    version: 1.0

Region, I think, can be handled in a similar way to the proposed service topologies:

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  labels:
    region: us-east-1
spec:

And then the AWS provider can define "topology" mappings from labels to region, zone, subnet, etc.
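
A hypothetical sketch of such a mapping (the AWSTopology kind and its fields are invented here purely for illustration; nothing like it exists today):

apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSTopology            # hypothetical kind
metadata:
  name: us-east-1
spec:
  selector:
    region: us-east-1        # matches the label on the Cluster above
  region: us-east-1
  zones:
    - us-east-1a
    - us-east-1b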

@ncdc (Contributor, Author) commented Aug 5, 2019

@moshloop this issue isn't about having an operator install specific versions of specific providers (which is a fantastic, separate idea - see https://docs.google.com/document/d/1ox_2dVzxJtWy_zv1SID0bGIes-hQyOVlAIC5ARcVdak/edit if you haven't already). It's about making it easier to get up and running by removing the need to create the 3 extra resources and replacing them with a small set of annotations plus a lot of defaulting logic.

@moshloop (Contributor) commented Aug 5, 2019

clusteradm could start out by providing just this UX before growing into a full-blown provider lifecycle manager, or this could be implemented in a standalone defaulting controller.

The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.

/assign @timothysc

@ncdc (Contributor, Author) commented Aug 5, 2019

or this could be implemented in a standalone defaulting controller

This is essentially what I'm proposing.

The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.

How - could you please explain?

@moshloop (Contributor) commented Aug 5, 2019

I think this is where we diverge:

The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machine that does not have infrastructureRef and/or bootstrap configRef filled in.

Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.

On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch Machine and create AWSMachine), and therefore easy to test.

@ncdc (Contributor, Author) commented Aug 5, 2019

Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.

I'm not sure I see this adding as much complexity as you're suggesting.

On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch Machine and create AWSMachine), and therefore easy to test.

If we add a standalone controller to CAPI, it then has to know about all the different providers and their data types. I'd rather have the logic live in each provider.

@moshloop (Contributor) commented Aug 5, 2019

This might not be complex to create, but from a readability, e2e testing, and reasoning perspective, it adds complexity.

it then has to know about all the different providers and their data types

Isn't that an eventuality/requirement for clusteradm?

@ncdc (Contributor, Author) commented Aug 5, 2019

Isn't that an eventuality/requirement for clusteradm?

Yes, and I'll admit I haven't been thinking of this as a clusteradm feature.

@detiber (Member) commented Aug 5, 2019

Isn't that an eventuality/requirement for clusteradm?

@moshloop I hope not, otherwise we remove support for custom providers that clusteradm does not know about, and we are back in a situation similar to v1alpha1, where users would have to custom-build their own clusteradm for non-blessed providers.

@moshloop (Contributor) commented Aug 5, 2019

I think clusteradm init --infra=aws,vsphere --bootstrap=kubeadm would require knowing about these providers? Non-blessed providers could implement a provider plugin, similar to how Packer does it today for custom provisioners/builders.

@timothysc (Member):

Let's table some of this for a high-bandwidth conversation on Wednesday. I generally like the core idea, but a question I would have is: "Shouldn't we be able to cache the defaults and not even have to specify them, e.g. have them added on admission?"

@timothysc removed their assignment on Aug 6, 2019
@detiber (Member) commented Aug 6, 2019

@ncdc I've been thinking a bit more on this. I'm not sure I'm a fan of encouraging users to set annotations for this type of behavior, as it essentially creates an unversioned shadow API that exists in parallel to the defined and versioned API.

I think it's one thing to leverage annotations for tracking internal state, since we do not expose that directly to the user.

For defaulting based on which provider to use, why not optional high-level fields:

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  defaultInfrastructureProvider: aws
  defaultBootstrapProvider: kubeadm

The provider configuration settings are potentially more compelling as an annotation, but it still feels like we are trending back to opaque embedded configuration with the annotations. Would it make sense instead to leverage a single provider-defined type per provider that can be attached to the Cluster spec to provide cluster-wide defaults?
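
As a rough sketch of that last idea (the AWSClusterDefaults kind and the reference field below are hypothetical, for illustration only):

apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSClusterDefaults      # hypothetical provider-defined defaults type
metadata:
  namespace: foo
  name: aws-defaults
spec:
  region: us-east-1
  instanceType: t3.large
---
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  defaultInfrastructureProvider: aws
  defaultBootstrapProvider: kubeadm
  infrastructureDefaultsRef:  # hypothetical reference attaching the defaults to the Cluster
    kind: AWSClusterDefaults
    name: aws-defaults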

@ncdc (Contributor, Author) commented Aug 6, 2019

@detiber I like your suggestions better than mine - thanks!

@vincepri (Member) commented Aug 7, 2019

Could we leverage the template resources in the example that @detiber suggested to let users specify the default configurations?

@ncdc (Contributor, Author) commented Aug 7, 2019

SGTM

@dlipovetsky (Contributor):

I think we all agree that we want to improve the user experience by reducing the boilerplate that the client has to send to the server.

But so far we are discussing reducing this boilerplate server-side. I'd like to hear alternatives.

@dlipovetsky (Contributor):

Here's a naive alternative: Reduce boilerplate client-side. Create a shell script, kubectl plugin, or standalone CLI that generates the boilerplate.

Advantages:

  • It's easier to iterate on client-side functionality.

Known drawbacks:

  • Users must use a special client to reduce boilerplate.

Potential drawbacks (not yet sure if these are true):

  • Every infra and bootstrap provider has to provide their own client.

@detiber (Member) commented Aug 7, 2019

@dlipovetsky the client-side solution you bring up seems like it falls in line with the clusteradm discussion.

@dlipovetsky (Contributor):

@detiber Good point. I assumed clusteradm would not be available in time. I also feel like there's room for experimentation, which I expect would be easier to do with something like a kubectl plugin.

@ncdc added this to the Next milestone on Aug 7, 2019
@ncdc added the priority/important-soon label on Aug 7, 2019
@randomvariable (Member):

I think some of the confusion around the use cases comes from the word "default". It makes sense from our perspective as people developing Cluster API, but from an end-user perspective, the question would be: '"default" in what sense?'

Is there better terminology?

@dlipovetsky (Contributor):

The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations

Some providers do not (or cannot) provide sensible defaults. For example, CAPO requires the end user to manually create some infrastructure.

As an end-user, how will I discover that setting defaults (using fields or annotations) for these providers will not work?

Reducing boilerplate is important for UX, but so is having consistent behavior. So some providers may need help to support defaults.

@vincepri (Member):

Following up on @detiber's and my suggestion from above: I'd like to see a way to specify default configuration values using the <Provider>Template resources, like AWSMachineTemplate.

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate: <object-reference>
    bootstrapTemplate: <object-reference>

The documentation should make clear that the bootstrapTemplate reference is only going to be valid for worker nodes.
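
For example, the object references could point at existing template kinds such as AWSMachineTemplate and KubeadmConfigTemplate (a sketch; the machineDefaults field is still only a proposal, and the names are placeholders):

spec:
  machineDefaults:
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io
      kind: AWSMachineTemplate
      name: my-cluster-workers
    bootstrapTemplate:
      apiVersion: bootstrap.cluster.x-k8s.io
      kind: KubeadmConfigTemplate
      name: my-cluster-workers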

WDYT?

@juan-lee (Contributor) commented Oct 9, 2019

Following up on @detiber's and my suggestion from above: I'd like to see a way to specify default configuration values using the <Provider>Template resources, like AWSMachineTemplate.

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate: <object-reference>
    bootstrapTemplate: <object-reference>

The documentation should make clear that the bootstrapTemplate reference is only going to be valid for worker nodes.

WDYT?

If I understand correctly, most of the UX improvement is gained when multiple clusters rely on the same template? There's still a lot to define for a single cluster. Is the goal to optimize the multi-cluster or the single-cluster case?

@juan-lee (Contributor) commented Oct 9, 2019

@ncdc @detiber @vincepri Another thought I had: when the user is using a GitOps approach with kustomize, they'll be able to do their own templating. I haven't had a lot of time to think about this, but we should ask ourselves whether we are trying to solve concerns that can be solved more effectively elsewhere.
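
For example, a user could keep the five resources in a kustomize base and stamp out clusters with a small overlay, something along these lines (paths and patch file names are assumptions):

# kustomization.yaml for one cluster overlay
resources:
  - ../../base                 # base holds Cluster, AWSCluster, Machine, AWSMachine, KubeadmConfig
patchesStrategicMerge:
  - awscluster-region.yaml     # small per-cluster patch, e.g. setting the AWSCluster region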

@detiber (Member) commented Oct 9, 2019

@juan-lee I think the concern is more for those that are providing a managed cluster type experience.

For example, a centralized IT department wants to give developers access to request clusters using Cluster API, but wants to ensure that the clusters that are created are configured in a way that is compliant with their org requirements while still allowing users to customize other parts of the configuration.

@timothysc (Member) commented Oct 22, 2019

Kubernetes already has defaulting today with the concept of LimitRanges, which provide defaults based on namespace. LimitRanges only default a parameter if it has not been specified, and they enable us to set a global, or "default", override. On inspection, I'd assert that you could easily cover 80+% of users by providing a single bootstrap config in the "default" namespace. LimitRanges also enable settings at a per-namespace granularity, which would give administrators the ability to override policy.
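
For reference, this is what the existing LimitRange defaulting looks like in core Kubernetes; the analogy would be a similar namespace-scoped default object for bootstrap/infrastructure configuration (values here are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
    - type: Container
      default:             # applied when a container specifies no limits
        cpu: 500m
        memory: 256Mi
      defaultRequest:      # applied when a container specifies no requests
        cpu: 250m
        memory: 128Mi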

@timothysc (Member):

cc @andrewsykim

@ncdc added the priority/backlog label and removed the priority/important-soon label on Dec 6, 2019
@ncdc removed their assignment on Dec 6, 2019
@ncdc modified the milestones: v0.3.0, Next on Dec 6, 2019
@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 5, 2020
@vincepri (Member) commented Mar 5, 2020

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label on Mar 5, 2020
@vincepri (Member):

/help
/kind design
This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.

@k8s-ci-robot (Contributor):

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/kind design
This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the kind/design and help wanted labels on Mar 11, 2020
@vincepri (Member):

/area ux

@willemm commented Dec 29, 2020

I think you can address both cases by using (infrastructure/bootstrap) templates, and by asking the provider devs to install a default template along with their provider installation. So, for example, the AWS provider would add an 'AWSMachineTemplateWithSensibleDefaults.yaml' to their manifest.

That way, you can still get a cluster running with just the two resource definitions (i.e. optimized for the single-cluster case), but you also enable admins to provide their own templates for users (i.e. optimized for the multi-cluster case). And you even give them a jumping-off point for the template, with all the sensible defaults in it.
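
Such a provider-shipped default could look roughly like this (a sketch; the name and values are placeholders, not something CAPA actually ships):

apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSMachineTemplate
metadata:
  namespace: foo
  name: aws-machine-sensible-defaults   # installed alongside the provider manifests
spec:
  template:
    spec:
      instanceType: t3.large            # assumed defaults
      sshKeyName: default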

@fabriziopandini (Member):

/lifecycle frozen

@CecileRobertMichon (Contributor):

related to #4430

/remove help

@CecileRobertMichon removed the help wanted label on Apr 22, 2021
@vincepri (Member):

We can probably close this in favor of #4430, thoughts?

@fabriziopandini (Member):

Given thumbs up on ^^
/close

@k8s-ci-robot (Contributor):

@fabriziopandini: Closing this issue.

In response to this:

Given thumbs up on ^^
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
