Easier, simplified UX for cluster and machine creation #1227

Closed · ncdc opened this issue Aug 5, 2019 · 40 comments

Labels

  • kind/design: Categorizes issue or PR as related to design.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@ncdc (Contributor) commented Aug 5, 2019

/kind feature

Describe the solution you'd like
In v1alpha1, you typically have to create two resources to have a minimally functional Kubernetes cluster: a Cluster and a Machine. Anything that is provider-specific, from kubeadm configuration to infrastructure details such as instance type, is stored in the providerSpec of each resource.

In v1alpha2, now that providerSpec has moved to a standalone CRD, you have to create the following to get a functional cluster (using CAPA and kubeadm as an example):

  1. Cluster
  2. AWSCluster
  3. Machine
  4. AWSMachine
  5. KubeadmConfig

This is five resources now, instead of two.

I'm thinking we could simplify the UX by allowing defaulting for both bootstrap and infrastructure resources, either via opt-in or opt-out. For example, with an opt-in flow, we could do something like this (annotation names TBD):

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  annotations:
    bootstrap.cluster.x-k8s.io/default: kubeadm
    infrastructure.cluster.x-k8s.io/default: aws
    aws.infrastructure.cluster.x-k8s.io/region: us-east-1
spec:
  # ...
---
apiVersion: cluster.x-k8s.io
kind: Machine
metadata:
  namespace: foo
  name: controlplane-0
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
spec:
  bootstrap:
    configRef: nil
    data: nil
  infrastructureRef: nil/empty

The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machine that does not have infrastructureRef and/or bootstrap configRef filled in. The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations. This way, the user could likely go back to creating only two resources -- Cluster and Machine -- and end up with a functional cluster.
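
For illustration, under this opt-in flow the providers might generate something roughly like the following for the Machine above (a sketch only; the names, field values, and defaults are assumptions, not defined behavior):

apiVersion: bootstrap.cluster.x-k8s.io
kind: KubeadmConfig
metadata:
  namespace: foo
  name: controlplane-0      # generated and owned by the Machine
spec: {}                    # filled in with provider-chosen kubeadm defaults
---
apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSMachine
metadata:
  namespace: foo
  name: controlplane-0      # generated and owned by the Machine
spec:
  instanceType: t3.large    # assumed "sensible default", plus values from the annotations

The providers would then back-fill the Machine's bootstrap.configRef and infrastructureRef to point at the generated objects.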

The alternative to opt-in would be opt-out, which essentially means that a bootstrap and infra provider could do the logic mentioned in the previous paragraph, except the annotations wouldn't be required. That might end up being problematic (what happens if two infra providers both try to be the default?), but I figure it's worth mentioning as an alternative.

WDYT?

@k8s-ci-robot added the kind/feature label on Aug 5, 2019
@sethp-nr (Contributor) commented Aug 5, 2019

This is really interesting! I really like where you're going with this as a concept. Beyond providing what seems to me like a good experience, it would make (more) possible an idea that I've been not-so-seriously kicking around: having a control plane from one provider (say, a dynamically sizable cloud) manage nodes from a different one (say, expensive bare metal).

One detail I'm wondering about is:

aws.infrastructure.cluster.x-k8s.io/region: us-east-1

How would we avoid that becoming a vehicle for moving the runtime.RawExtension from the spec into the annotations' map[string]string?

Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?

@ncdc (Contributor, Author) commented Aug 5, 2019

How would we avoid that becoming a vehicle for moving the runtime.RawExtension from the spec into the annotations' map[string]string?

Ideally, each provider that wants to support this would expose an absolute minimum number of annotations. Anything beyond that strict minimum would force the user to create their own resources (AWSMachine, etc.).

Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?

I feel like this is AWSCluster and AWSMachine, which we already have, and which are the full-feature CRDs you'd fill out. Right?

@moshloop (Contributor) commented Aug 5, 2019

I really like the goal of a better, simplified UX.

This already seems like a lot of boilerplate, if our goal is to reduce boilerplate.

Perhaps it makes more sense to have these as first-class fields on a new Cluster API Operator.

apiVersion: cluster.x-k8s.io
kind: ClusterOperator
metadata:
  namespace: foo
  name: my-cluster
spec:
  bootstrap:
    provider: kubeadm
    version: 0.1
  infrastructure:
    provider: aws
    version: 1.0

Region, I think, can be handled in a similar way to the proposed service topologies:

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  labels:
    region: us-east-1
spec:

And then the AWS provider can define "topology" mappings from labels to region, zone, subnet, etc.
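
A hypothetical sketch of such a mapping (the AWSTopology kind and its fields are invented here purely for illustration; nothing like it exists today):

apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSTopology            # hypothetical kind
metadata:
  name: us-east-1
spec:
  selector:
    region: us-east-1        # matches the label on the Cluster above
  region: us-east-1
  zones:
    - us-east-1a
    - us-east-1b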

@ncdc (Contributor, Author) commented Aug 5, 2019

@moshloop this issue isn't about having an operator install specific versions of specific providers (which is a fantastic, separate idea - see https://docs.google.com/document/d/1ox_2dVzxJtWy_zv1SID0bGIes-hQyOVlAIC5ARcVdak/edit if you haven't already). It's about making it easier to get up and running by removing the need to create the 3 extra resources and replacing them with a small set of annotations plus a lot of defaulting logic.

@moshloop (Contributor) commented Aug 5, 2019

clusteradm could start out by providing just this UX before growing into a full-blown provider lifecycle manager, or this could be implemented in a standalone defaulting controller.

The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.

/assign @timothysc

@ncdc (Contributor, Author) commented Aug 5, 2019

or this could be implemented in a standalone defaulting controller

This is essentially what I'm proposing.

The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.

How - could you please explain?

@moshloop (Contributor) commented Aug 5, 2019

I think this is where we diverge:

The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machine that does not have infrastructureRef and/or bootstrap configRef filled in.

Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.

On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch Machine and create AWSMachine), and therefore easy to test.

@ncdc (Contributor, Author) commented Aug 5, 2019

Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.

I'm not sure I see this adding as much complexity as you're suggesting.

On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch Machine and create AWSMachine), and therefore easy to test.

If we add a standalone controller to CAPI, it then has to know about all the different providers and their data types. I'd rather have the logic live in each provider.

@moshloop (Contributor) commented Aug 5, 2019

This might not be complex to create, but from a readability, e2e testing, and reasoning perspective, it adds complexity.

it then has to know about all the different providers and their data types

Isn't that an eventuality/requirement for clusteradm?

@ncdc (Contributor, Author) commented Aug 5, 2019

Isn't that an eventuality/requirement for clusteradm?

Yes, and I'll admit I haven't been thinking of this as a clusteradm feature.

@detiber (Member) commented Aug 5, 2019

Isn't that an eventuality/requirement for clusteradm?

@moshloop I hope not, otherwise we remove support for custom providers that clusteradm does not know about, and we are back in a situation similar to v1alpha1, where users would have to custom-build their own clusteradm for non-blessed providers.

@moshloop (Contributor) commented Aug 5, 2019

I think clusteradm init --infra=aws,vsphere --bootstrap=kubeadm would require knowing about these providers? Non-blessed providers could implement a provider plugin, similar to how Packer does it today for custom provisioners/builders.

@timothysc (Member):

Let's table some of this for a high-bandwidth conversation on Wednesday. I generally like the core idea, but a question I would have is: "Shouldn't we be able to cache the defaults and not even have to specify them, e.g. have them added on admission?"

@timothysc removed their assignment on Aug 6, 2019
@detiber (Member) commented Aug 6, 2019

@ncdc I've been thinking a bit more on this. I'm not sure I'm a fan of encouraging users to set annotations for this type of behavior, as it essentially creates an unversioned shadow API that exists in parallel to the defined and versioned API.

I think it's one thing to leverage annotations for tracking internal state, since we do not expose that directly to the user.

For defaulting based on which provider to use, why not optional high-level fields:

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  defaultInfrastructureProvider: aws
  defaultBootstrapProvider: kubeadm

The provider configuration settings are potentially more compelling as an annotation, but it still feels like we are trending back to opaque embedded configuration with the annotations. Would it make sense instead to leverage a single provider-defined type per provider that can be attached to the Cluster spec to provide cluster-wide defaults?
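
As a rough sketch of that last idea (the AWSClusterDefaults kind and the reference field below are hypothetical, for illustration only):

apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSClusterDefaults      # hypothetical provider-defined defaults type
metadata:
  namespace: foo
  name: aws-defaults
spec:
  region: us-east-1
  instanceType: t3.large
---
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  defaultInfrastructureProvider: aws
  defaultBootstrapProvider: kubeadm
  infrastructureDefaultsRef:  # hypothetical reference attaching the defaults to the Cluster
    kind: AWSClusterDefaults
    name: aws-defaults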

@ncdc (Contributor, Author) commented Aug 6, 2019

@detiber I like your suggestions better than mine - thanks!

@vincepri (Member) commented Aug 7, 2019

Could we leverage the template resources in the example that @detiber suggested to let users specify the default configurations?

@ncdc (Contributor, Author) commented Aug 7, 2019

SGTM

@dlipovetsky (Contributor):

I think we all agree that we want to improve the user experience by reducing the boilerplate that the client has to send to the server.

But so far we are discussing reducing this boilerplate server-side. I'd like to hear alternatives.

@dlipovetsky (Contributor):

Here's a naive alternative: Reduce boilerplate client-side. Create a shell script, kubectl plugin, or standalone CLI that generates the boilerplate.

Advantages:

  • It's easier to iterate on client-side functionality.

Known drawbacks:

  • Users must use a special client to reduce boilerplate.

Potential drawbacks (not yet sure if these are true):

  • Every infra and bootstrap provider has to provide their own client.

@detiber (Member) commented Aug 7, 2019

@dlipovetsky the client-side solution you bring up seems like it falls in line with the clusteradm discussion.

@dlipovetsky (Contributor):

@detiber Good point. I assumed clusteradm would not be available in time. I also feel like there's room for experimentation, which I expect would be easier to do with something like a kubectl plugin.

@ncdc added this to the Next milestone on Aug 7, 2019
@ncdc added the priority/important-soon label on Aug 7, 2019
@randomvariable (Member):

I think some of the confusion around the use cases comes from the word "default". It makes sense from our perspective as people developing Cluster API, but from an end-user perspective, the question would be: '"default" in what sense?'

Is there better terminology?

@dlipovetsky (Contributor):

The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations

Some providers do not (or cannot) provide sensible defaults. For example, CAPO requires the end user to manually create some infrastructure.

As an end-user, how will I discover that setting defaults (using fields or annotations) for these providers will not work?

Reducing boilerplate is important for UX, but so is having consistent behavior. So some providers may need help to support defaults.

@vincepri (Member):

Following up on @detiber's and my suggestion from above: I'd like to see a way to specify default configuration values using the <Provider>Template resources, like AWSMachineTemplate.

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate: <object-reference>
    bootstrapTemplate: <object-reference>

The documentation should make clear that the bootstrapTemplate reference is only going to be valid for worker nodes.
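
For example, the object references could point at existing template kinds such as AWSMachineTemplate and KubeadmConfigTemplate (a sketch; the machineDefaults field is still only a proposal, and the names are placeholders):

spec:
  machineDefaults:
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io
      kind: AWSMachineTemplate
      name: my-cluster-workers
    bootstrapTemplate:
      apiVersion: bootstrap.cluster.x-k8s.io
      kind: KubeadmConfigTemplate
      name: my-cluster-workers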

WDYT?

@juan-lee (Contributor) commented Oct 9, 2019

Following up on @detiber's and my suggestion from above: I'd like to see a way to specify default configuration values using the <Provider>Template resources, like AWSMachineTemplate.

apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate: <object-reference>
    bootstrapTemplate: <object-reference>

The documentation should make clear that the bootstrapTemplate reference is only going to be valid for worker nodes.

WDYT?

If I understand correctly, most of the UX improvement is gained when multiple clusters rely on the same template? There's still a lot to define for a single cluster. Is the goal to optimize the multi-cluster or the single-cluster case?

@juan-lee (Contributor) commented Oct 9, 2019

@ncdc @detiber @vincepri Another thought I had: when the user is using a GitOps approach with kustomize, they'll be able to do their own templating. I haven't had a lot of time to think about this, but we should ask ourselves whether we are trying to solve concerns that can be solved more effectively elsewhere.
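
For example, a user could keep the five resources in a kustomize base and stamp out clusters with a small overlay, something along these lines (paths and patch file names are assumptions):

# kustomization.yaml for one cluster overlay
resources:
  - ../../base                 # base holds Cluster, AWSCluster, Machine, AWSMachine, KubeadmConfig
patchesStrategicMerge:
  - awscluster-region.yaml     # small per-cluster patch, e.g. setting the AWSCluster region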

@detiber (Member) commented Oct 9, 2019

@juan-lee I think the concern is more for those that are providing a managed cluster type experience.

For example, a centralized IT department wants to give developers access to request clusters using Cluster API, but wants to ensure that the clusters that are created are configured in a way that is compliant with their org requirements while still allowing users to customize other parts of the configuration.

@timothysc (Member) commented Oct 22, 2019

Kubernetes already has defaulting today with the concept of LimitRanges, which provide defaults based on namespace. LimitRanges only default a parameter if it has not been specified, and they enable us to set a global, or "default", override. On inspection, I'd assert that you could easily cover 80+% of users by providing a single bootstrap config in the "default" namespace. LimitRanges also enable settings at a per-namespace granularity, which would give administrators the ability to override policy.
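
For reference, this is what the existing LimitRange defaulting looks like in core Kubernetes; the analogy would be a similar namespace-scoped default object for bootstrap/infrastructure configuration (values here are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
    - type: Container
      default:             # applied when a container specifies no limits
        cpu: 500m
        memory: 256Mi
      defaultRequest:      # applied when a container specifies no requests
        cpu: 250m
        memory: 128Mi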

@timothysc (Member):

cc @andrewsykim

@ncdc added the priority/backlog label and removed the priority/important-soon label on Dec 6, 2019
@ncdc removed their assignment on Dec 6, 2019
@ncdc modified the milestones: v0.3.0, Next on Dec 6, 2019
@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 5, 2020
@vincepri (Member) commented Mar 5, 2020

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label on Mar 5, 2020
@vincepri (Member):

/help
/kind design
This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.

@k8s-ci-robot (Contributor):

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/kind design
This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the kind/design and help wanted labels on Mar 11, 2020
@vincepri (Member):

/area ux

@willemm commented Dec 29, 2020

I think you can address both cases by using (infrastructure/bootstrap) templates, and by asking the provider devs to install a default template along with their provider installation. So, for example, the AWS provider would add an 'AWSMachineTemplateWithSensibleDefaults.yaml' to their manifest.

That way, you can still get a cluster running with just the two resource definitions (i.e. optimized for the single-cluster case), but you also enable admins to provide their own templates for users (i.e. optimized for the multi-cluster case). And you even give them a jumping-off point for the template, with all the sensible defaults in it.
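
Such a provider-shipped default could look roughly like this (a sketch; the name and values are placeholders, not something CAPA actually ships):

apiVersion: infrastructure.cluster.x-k8s.io
kind: AWSMachineTemplate
metadata:
  namespace: foo
  name: aws-machine-sensible-defaults   # installed alongside the provider manifests
spec:
  template:
    spec:
      instanceType: t3.large            # assumed defaults
      sshKeyName: default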

@fabriziopandini (Member):

/lifecycle frozen

@CecileRobertMichon (Contributor):

related to #4430

/remove help

@CecileRobertMichon removed the help wanted label on Apr 22, 2021
@vincepri (Member):

We can probably close this in favor of #4430, thoughts?

@fabriziopandini (Member):

Given thumbs up on ^^
/close

@k8s-ci-robot (Contributor):

@fabriziopandini: Closing this issue.

In response to this:

Given thumbs up on ^^
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
