pkg/asset: Add asset for Worker machinesets #468

Merged

Conversation

abhinavdahiya
Contributor

/cc @wking

This allows combining values for machine configuration from various
sources, like:
1. Defaults decided by the installer
2. Platform-level defaults in InstallConfig
3. Per-pool options
@abhinavdahiya
Contributor Author

/cc @crawford @trawler

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 15, 2018
@abhinavdahiya abhinavdahiya changed the title pkg/asset: Add assest for Worker machinesets pkg/asset: Add asset for Worker machinesets Oct 15, 2018
@abhinavdahiya
Contributor Author

abhinavdahiya commented Oct 15, 2018

The workers did not come up because:

Seeing this error when creating machines on AWS:

error creating EC2 instance: InvalidParameterValue: Duplicate tag
        key 'kubernetes.io/cluster/ci-op-nn8nk5b5-1d3f3' specified.\n\tstatus code:
        400, request id: a22b595d-6be4-4fe1-b358-c50029d330f9

The machine object:

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  creationTimestamp: 2018-10-15T23:02:01Z
  finalizers:
  - machine.cluster.k8s.io
  generateName: ci-op-nn8nk5b5-1d3f3-worker-0-
  generation: 1
  labels:
    sigs.k8s.io/cluster-api-cluster: ci-op-nn8nk5b5-1d3f3
    sigs.k8s.io/cluster-api-machine-role: worker
    sigs.k8s.io/cluster-api-machine-type: worker
    sigs.k8s.io/cluster-api-machineset: worker
  name: ci-op-nn8nk5b5-1d3f3-worker-0-jjpq4
  namespace: openshift-cluster-api
  ownerReferences:
  - apiVersion: cluster.k8s.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: ci-op-nn8nk5b5-1d3f3-worker-0
    uid: 42eb4d20-d0ce-11e8-a52e-0a580a020206
  resourceVersion: "275"
  selfLink: /apis/cluster.k8s.io/v1alpha1/namespaces/openshift-cluster-api/machines/ci-op-nn8nk5b5-1d3f3-worker-0-jjpq4
  uid: 5321c7b1-d0ce-11e8-a52e-0a580a020206
spec:
  metadata:
    creationTimestamp: null
  providerConfig:
    ValueFrom: null
    value:
      ami:
        filters:
        - name: name
          values:
          - rhcos*
        - name: architecture
          values:
          - x86_64
        - name: virtualization-type
          values:
          - hvm
        - name: owner-id
          values:
          - "531415883065"
        - name: image-type
          values:
          - machine
        - name: state
          values:
          - available
      apiVersion: aws.cluster.k8s.io/v1alpha1
      iamInstanceProfile:
        id: ci-op-nn8nk5b5-1d3f3-worker-profile
      instanceType: m2.medium
      kind: AWSMachineProviderConfig
      placement:
        region: us-east-1
      securityGroups:
      - filters:
        - name: tag:Name
          values:
          - ci-op-nn8nk5b5-1d3f3_worker_sg
      subnet:
        filters:
        - name: tag:Name
          values:
          - ci-op-nn8nk5b5-1d3f3-worker-*
      tags:
      - name: expirationDate
        value: 2018-10-16T02:52+0000
      - name: kubernetes.io/cluster/ci-op-nn8nk5b5-1d3f3
        value: owned
      - name: tectonicClusterID
        value: 8c846f3f-db33-4707-bb01-e8604f6976f3
      userDataSecret:
        name: worker-user-data
  versions:
    kubelet: ""
status:
  lastUpdated: 2018-10-15T23:12:21Z
  providerStatus:
    apiVersion: aws.cluster.k8s.io/v1alpha1
    conditions:
    - lastProbeTime: 2018-10-15T23:12:21Z
      lastTransitionTime: 2018-10-15T23:02:06Z
      message: "error creating EC2 instance: InvalidParameterValue: Duplicate tag
        key 'kubernetes.io/cluster/ci-op-nn8nk5b5-1d3f3' specified.\n\tstatus code:
        400, request id: a22b595d-6be4-4fe1-b358-c50029d330f9"
      reason: MachineCreationFailed
      status: "True"
      type: MachineCreation
    instanceId: null
    instanceState: null
    kind: AWSMachineProviderStatus

It looks like this code https://github.com/openshift/cluster-api-provider-aws/blob/master/cloud/aws/actuators/machine/actuator.go#L337-L345 is not de-duplicating the tags.

/cc @bison


tags := map[string]string{
    "tectonicClusterID": ic.ClusterID,
    fmt.Sprintf("kubernetes.io/cluster/%s", ic.ObjectMeta.Name): "owned",
}
Contributor

I'm not sure what's more correct here. The actuator needs to ensure the tag is there, so you shouldn't need to set it here, and ideally the actuator would catch the duplicate. I'd just drop it here and file a request for the cloud team to check for dupes.
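
For reference, a minimal sketch (not code from cluster-api-provider-aws; the helper name mergeTags and its arguments are made up for illustration) of the kind of de-duplication being asked of the actuator, merging installer-supplied tags with the actuator's own cluster tag through a map keyed by tag name:

package machine

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/ec2"
)

// mergeTags merges the actuator's own cluster tag with the tags from the
// machine's provider config, de-duplicating on tag key so EC2 never sees the
// same key twice; provider-config values win on conflict.
func mergeTags(clusterID string, machineTags map[string]string) []*ec2.Tag {
    merged := map[string]string{
        "kubernetes.io/cluster/" + clusterID: "owned",
    }
    for k, v := range machineTags {
        merged[k] = v
    }
    tags := make([]*ec2.Tag, 0, len(merged))
    for k, v := range merged {
        tags = append(tags, &ec2.Tag{
            Key:   aws.String(k),
            Value: aws.String(v),
        })
    }
    return tags
}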

@abhinavdahiya
Contributor Author

apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
metadata:
  name: {{.ClusterName}}-worker-0
Contributor

What's up with the -0 suffix on these? This will be copied into generateName by the controller, so the nodes will end up with names like openshift-worker-0-abdef.

Contributor Author

@abhinavdahiya abhinavdahiya Oct 16, 2018

What's up with the -0 suffix on these

We're going to have multiple machinesets (on AWS, one per AZ).

so the nodes will end up with names like openshift-worker-0-abdef

I think the node names are not dictated by the machine object name; they will be whatever the kubelet decides. On AWS it is usually the internal name, on libvirt the host name.

From the ci:

NAME                           STATUS    ROLES       AGE       VERSION
ip-10-0-13-26.ec2.internal     Ready     master      10m       v1.11.0+d4cacc0
ip-10-0-14-47.ec2.internal     Ready     bootstrap   10m       v1.11.0+d4cacc0
ip-10-0-151-203.ec2.internal   Ready     worker      5m        v1.11.0+d4cacc0
ip-10-0-152-41.ec2.internal    Ready     worker      5m        v1.11.0+d4cacc0
ip-10-0-155-96.ec2.internal    Ready     worker      4m        v1.11.0+d4cacc0
ip-10-0-16-202.ec2.internal    Ready     master      10m       v1.11.0+d4cacc0
ip-10-0-39-160.ec2.internal    Ready     master      10m       v1.11.0+d4cacc0

Contributor

Yeah, forgot about the multiple AZ thing.

So, yeah, this is kind of confusing. The node object's name is determined by what the kubelet registers. The machine object's name is this string copied into generateName, so basically this with some extra characters.
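
A tiny sketch of that generateName mechanic, using only apimachinery types (the helper machineMeta is illustrative; the real behavior lives in the upstream machineset controller):

package machines

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// machineMeta shows how a controller typically derives a Machine's metadata
// from its MachineSet: the MachineSet name becomes the generateName prefix,
// and the API server appends a random suffix, so a MachineSet named
// "demo-worker-0" yields Machines like "demo-worker-0-jjpq4".
func machineMeta(machineSetName, namespace string) metav1.ObjectMeta {
    return metav1.ObjectMeta{
        GenerateName: fmt.Sprintf("%s-", machineSetName),
        Namespace:    namespace,
    }
}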

@bison
Contributor

bison commented Oct 16, 2018

I think @trawler is working on the duplicate tags issue.

@wking
Member

wking commented Oct 16, 2018

From 16e86c9:

This depends on the cluster-apiserver to be deployed by the machine-api-operator...

I still have a pretty fuzzy grasp of how all these operators fit together :p. We could push most of these without waiting on the cluster-API server, right? We just can't push the kind: Cluster template without the cluster-API server.

Also, do we want to address "worker" vs. "compute" for new-to-this-PR names?

@abhinavdahiya
Contributor Author

We could push most of these without waiting on the cluster-API server, right? We just can't push the kind: Cluster template without the cluster-API server.

The API for all these objects is provided by the aggregated apiserver installed by the MAO.

Also, do we want to address "worker" vs. "compute" for new-to-this-PR names?

We still call them workers.

filters:
- name: "name"
  values:
  - rhcos*
Member

We don't want all of this filtering, do we? I think we want to calculate the AMI ID in Go and inject it directly, like we're currently doing over here. That will also make it easier to allow callers to override the AMI ID if/when we restore support for that.

Contributor Author

InstallConfig does not have an AMI override, so we don't support that for workers.

Member

InstallConfig does not have an AMI override, so we don't support that for workers.

But we do support AMI lookup in Go. And we use that lookup to set the master/bootstrap AMI ID here. I think we want to use the same AMI for workers, which means we should pull that lookup out to a higher level so we don't have to perform it twice. InstallConfig seems like a reasonable place to put it, but I'm ok if you want to stick it somewhere else for the time being. I don't think we need to supply a user prompt for it.
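
As a rough sketch of that suggestion (the amiCache type is hypothetical and not part of this PR; only the rhcos.AMI call and rhcos.DefaultChannel appear in the installer code quoted later in this thread, and the import path is assumed), the lookup could be done once and shared by both consumers:

package machines

import (
    "context"
    "sync"
    "time"

    "github.com/openshift/installer/pkg/rhcos" // assumed import path for the installer's rhcos package
)

// amiCache resolves the RHCOS AMI for one region exactly once and reuses the
// result, so the worker and master/bootstrap assets cannot race and end up
// with two different images.
type amiCache struct {
    once sync.Once
    ami  string
    err  error
}

func (c *amiCache) get(region string) (string, error) {
    c.once.Do(func() {
        ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
        defer cancel()
        c.ami, c.err = rhcos.AMI(ctx, rhcos.DefaultChannel, region)
    })
    return c.ami, c.err
}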

Contributor Author

I think we want to use the same AMI for workers.

That's not true. New workers created by scaling the machineset should not be stuck with old AMIs.
Bootstrap and master are separate because we create them.
But in the future we want masters to be adopted by something similar.

Member

I think we want to use the same AMI for workers.

That's not true. New workers created by scaling the machineset should not be stuck with old AMIs.
Bootstrap and master are separate because we create them.

And with this PR, we're effectively creating the workers as well.

But in the future we want masters to be adopted by something similar.

Bootstrap doesn't matter, since it's a throw-away node. I expect we'll want a machine-set for masters too (or is there a different approach to scaling masters?). And however we handle rolling updates for workers, I expect we'll want the same handling for masters. Isn't that what the machine-config operator is going to handle? Can't it update the machine-sets when it decides to bump to a new AMI? For now, I thought we'd want to punt on all of this and just pin to the creation-time AMI for both masters and workers.

Contributor Author

Isn't that what the machine-config operator is going to handle? Can't it update the machine-sets when it decides to bump to a new AMI?

It doesn't control machine characteristics, so no.

I thought we'd want to punt on all of this and just pin to the creation-time AMI for both masters and workers.

I would still want the installer to not choose the AMI for workers.

Member

@wking wking Oct 16, 2018

Isn't that what the machine-config operator is going to handle? Can't it update the machine-sets when it decides to bump to a new AMI?

It doesn't control machine characteristics, so no.

Who is (or will be?) in charge of rolling AMI updates out across masters/workers? Or maybe with #281 in the pipe, nobody will care about AMIs at all?

I thought we'd want to punt on all of this and just pin to the creation-time AMI for both masters and workers.

I would still want the installer to not choose the AMI for workers.

But you're fine having it choose the AMI for the masters? They feel like the same thing to me. Would you rather we drop the AMI lookup from Go and just perform the lookup via tags? See also openshift/os#353 about long-term issues with the tag-based approach and release channels. It's not clear to me how the approach you have here will transition to the eventual non-tag AMI lookup.

Contributor Author

Who is (or will be?) in charge of rolling AMI updates out across masters/workers?

That is TBD.

But you're fine having it choose the AMI for the masters?

I don't want to do that for masters either, but we cannot do that now, as we cannot run the cluster-api stack during bootstrap (for various reasons) to give us masters.

Contributor Author

Switched to a specific AMI to keep consistency with the current MAO implementation.

pods:
  cidrBlocks:
  - {{.PodCIDR}}
serviceDomain: unused
Member

We've had this since openshift/machine-api-operator@04253cb9, but I'm not clear on why we aren't using our base domain or some such. @enxebre, do you remember why you chose this value?

Member

None of the cluster object's fields is actually used. However, it's currently coupled into the actuator interface upstream, hence we need it to exist for now: https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machine/actuator.go#L25
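
Paraphrasing the linked interface from memory of the upstream code at the time (treat the import path and exact signatures as approximate; the linked actuator.go is authoritative), every method takes the Cluster alongside the Machine, which is why the object has to exist even when its fields are unused:

package actuator

import clusterv1 "sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"

// Actuator is the provider-specific machine controller interface; each method
// receives the Cluster as well as the Machine it operates on.
type Actuator interface {
    Create(*clusterv1.Cluster, *clusterv1.Machine) error
    Delete(*clusterv1.Cluster, *clusterv1.Machine) error
    Update(*clusterv1.Cluster, *clusterv1.Machine) error
    Exists(*clusterv1.Cluster, *clusterv1.Machine) (bool, error)
}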

kind: LibvirtMachineProviderConfig
domainMemory: 2048
domainVcpu: 2
ignKey: /var/lib/libvirt/images/worker.ign
Member

Can we make this just worker.ign? More on why in openshift/machine-api-operator#70, in case you want any fodder for the explanatory commit message (although we probably don't have time to explain everything in these big "drop in a bunch of stuff which was worked up somewhere else" commits).

Contributor Author

I'm fine with changing this to worker.ign.

Long term I want consistency between libvirt and AWS, as AWS uses a secret to specify the userdata.

Contributor Author

@abhinavdahiya abhinavdahiya Oct 16, 2018

@wking this doesn't work

The actuator doesn't support this. Running it locally:

Coud not create libvirt machine: error creating domain: error creating libvirt domain: virError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2018-10-16T17:59:15.773168Z qemu-system-x86_64: -fw_cfg name=opt/com.coreos/config,file=worker.ign: can't load worker.ign')

@abhinavdahiya abhinavdahiya force-pushed the worker_machinesets branch 3 times, most recently from a4e70fb to b9ef603 Compare October 16, 2018 18:54
Add assets that can create the machinesets for workers.
This depends on the cluster-apiserver to be deployed by the machine-api-operator,
so tectonic.sh will block on creating these machineset objects until then.

Using a specific AMI for worker machinesets to keep consistency with the
current MAO implementation.

ctx, cancel := context.WithTimeout(context.TODO(), 60*time.Second)
defer cancel()
ami, err := rhcos.AMI(ctx, rhcos.DefaultChannel, config.Region)
Member

This duplicates the later lookup for master/bootstrap AMIs. Are we ok with the redundant request (and possible race coming up with two different AMIs), or do we want to pull this lookup back to a separate location where it can be shared by both consumers?

Contributor Author

This is no different from what we have today in MAO, here.

Contributor Author

Also, when we move masters to machinesets, it will end up in the same location.

@wking
Member

wking commented Oct 16, 2018

I'll follow up on the /var/lib/libvirt stuff later.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 16, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 079fe51 into openshift:master Oct 16, 2018
wking added a commit to wking/openshift-installer that referenced this pull request Oct 20, 2018
Adds ClusterK8sIO from e2dc955 (pkg/asset: add ClusterK8sIO,
machines.Worker assets, 2018-10-15, openshift#468), Master from 586ad45
(pkg/asset: Add asset for Master machines, 2018-10-18, openshift#491).  Removes
KubeCoreOperator from c9b0e2f (manifests: stop using kube core
operator, 2018-10-08, openshift#420).

Generated with:

  $ openshift-install graph | dot -Tsvg >docs/design/resource_dep.svg

using:

  $ dot -V
  dot - graphviz version 2.30.1 (20170916.1124)
wking added a commit to wking/openshift-installer that referenced this pull request Oct 24, 2018
We'd added this in 4d636d3 (asset/manifests: bootstrap manifest
generation, 2018-08-31, openshift#286) to support the machine-API operator
which had been generating worker machine-sets.  But since e2dc955
(pkg/asset: add ClusterK8sIO, machines.Worker assets, 2018-10-15, openshift#468),
we've been creating those machine-sets ourselves.  And the machine-API
operator dropped their consuming code in
openshift/machine-api-operator@c59151f6 (delete machine/cluster object
loops, 2018-10-22, openshift/machine-api-operator#286).

Dropping this dependency fixes bootstrap Ignition config edits during
a multi-step deploy.  For example, with:

  $ openshift-install --dir=wking create ignition-configs
  $ tree wking
  wking
  ├── bootstrap.ign
  ├── master-0.ign
  └── worker.ign

before this commit, any edits to bootstrap.ign were clobbered in a
subsequent 'create cluster' call, because:

1. The bootstrap Ignition config depends on the manifests.
2. The manifests depended on the worker Ignition config.
3. The worker Ignition config is on disk, so it gets marked dirty.
wking added a commit to wking/openshift-installer that referenced this pull request Dec 17, 2018
Workers have not had public IPs since (at least) we moved them to
cluster-API creation in e2dc955 (pkg/asset: add ClusterK8sIO,
machines.Worker assets, 2018-10-15, openshift#468).  But it turns out a number
of e2e tests assume SSH access to workers (e.g. [1]), and we don't
have time to fix those tests now.  We'll remove this once the tests
have been fixed.

[1]: https://github.com/kubernetes/kubernetes/blob/v1.13.1/test/e2e/node/ssh.go#L43