
Remove InstanceGroup from NodeupModelContext #9294

Merged: 14 commits merged into kubernetes:master from johngmyers:refactor-nodeup-context on Jun 12, 2021

Conversation

@johngmyers (Member) commented Jun 8, 2020

Move all data from InstanceGroup into NodeupModelContext.NodeupConfig to address some of the issues mentioned in #9229.

There could be an issue if the Hooks or FileAssets are too big to fit in userdata.

Creates a new NodeupAuxConfig struct which is read from an instancegroup-specific file in the state store.
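For orientation, here is a minimal Go sketch of the split described above, under simplified types: the real structs use kops API types, and `loadAuxConfig`, `ConfigBase`, and the string fields are illustrative stand-ins for the vfs-based loading, following the igconfig/<role>/<ig>/auxconfig.yaml layout that comes up later in the review.

```go
package model

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"sigs.k8s.io/yaml"
)

// Config is the small bootstrap configuration that travels in userdata.
type Config struct {
	InstanceGroupName string `json:"instanceGroupName"`
	ConfigBase        string `json:"configBase"` // state store root (illustrative)
}

// AuxConfig carries the per-instancegroup data that may be too big for
// userdata, e.g. hooks and file assets (simplified to strings here).
type AuxConfig struct {
	Hooks      []string `json:"hooks,omitempty"`
	FileAssets []string `json:"fileAssets,omitempty"`
}

// loadAuxConfig reads igconfig/<role>/<ig>/auxconfig.yaml relative to the
// state store root, mirroring the layout this PR adds.
func loadAuxConfig(cfg *Config, role string) (*AuxConfig, error) {
	p := filepath.Join(cfg.ConfigBase, "igconfig", strings.ToLower(role), cfg.InstanceGroupName, "auxconfig.yaml")
	b, err := os.ReadFile(p)
	if err != nil {
		return nil, fmt.Errorf("reading aux config %q: %w", p, err)
	}
	aux := &AuxConfig{}
	if err := yaml.Unmarshal(b, aux); err != nil {
		return nil, fmt.Errorf("parsing aux config %q: %w", p, err)
	}
	return aux, nil
}
```

The point of the split is that Config stays small enough to always fit in userdata, while AuxConfig can grow without hitting the userdata cap.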

@k8s-ci-robot added the area/api, area/nodeup, size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files), and cncf-cla: yes (the PR's author has signed the CNCF CLA) labels Jun 8, 2020
@hakman (Member) commented Jun 8, 2020

/retest

1 similar comment
@johngmyers (Member, Author)

/retest

@hakman (Member) commented Jun 8, 2020

I think we may need to rebase :)

@johngmyers force-pushed the refactor-nodeup-context branch 2 times, most recently from 3b87312 to 04c870b on June 9, 2020 06:52
@johngmyers changed the title from "Remove InstanceGroup from NodeupModelContext" to "WIP Remove InstanceGroup from NodeupModelContext" Jun 9, 2020
@k8s-ci-robot added the do-not-merge/work-in-progress label Jun 9, 2020
@johngmyers changed the title from "WIP Remove InstanceGroup from NodeupModelContext" to "Remove InstanceGroup from NodeupModelContext" Jun 10, 2020
@k8s-ci-robot added the needs-rebase label and removed the do-not-merge/work-in-progress label Jun 10, 2020
@k8s-ci-robot removed the needs-rebase label Jun 12, 2020
@johngmyers (Member, Author)

/retest

@johngmyers force-pushed the refactor-nodeup-context branch 2 times, most recently from cb7bec1 to aed047f on June 17, 2020 16:20
@johngmyers (Member, Author)

/retest

@justinsb (Member)

Sorry about the delay in reviewing this. So I believe we had problems fitting in the 16KB limit in the past, which is why we had this split. I don't have an example to hand though. I suspect the most likely case would be if we were adding some certificates or keys, simply because those compress so poorly.

For the nodes, we could move to fetching this data from kops-controller, which would have two advantages: no S3 access required (so fewer node permissions), and effectively no limit on the size.

That doesn't really help us on the control plane nodes, though.

I do really like the simplification of the code here, so I would like to see us get this in; we just need to satisfy ourselves that we aren't going to paint users that have large fileAssets/keys into a corner.

I propose:

  • We verify that we do compress the userdata if it's > 16KB, just so we know the limit we're dealing with on AWS (GCE has a similar limit, but it's much bigger: 256KB, if memory serves); a quick size check is sketched below.
  • We bring it up at office hours on Friday, to see if there are use cases we're missing (and whether people still do this).
  • We think about adding an external mechanism for these files: PKI data could be referenced from the kops CAStore, and fileAssets or hooks could be referenced by URL (including an S3 URL). I don't love the URL option in particular; I just think we probably need to offer something here.
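A minimal sketch of the size check the first bullet proposes, assuming gzip compression and the 16KB EC2 cap; `fitsAfterGzip` and `awsUserDataLimit` are illustrative names, not kops functions:

```go
package userdata

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// awsUserDataLimit is the EC2 userdata cap discussed above (16KB; GCE's
// metadata limit is far larger, around 256KB).
const awsUserDataLimit = 16 * 1024

// fitsAfterGzip reports whether the rendered userdata fits the limit once
// gzip-compressed, and returns the compressed size for diagnostics.
func fitsAfterGzip(userdata []byte) (bool, int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(userdata); err != nil {
		return false, 0, fmt.Errorf("compressing userdata: %w", err)
	}
	if err := zw.Close(); err != nil {
		return false, 0, fmt.Errorf("flushing gzip: %w", err)
	}
	return buf.Len() <= awsUserDataLimit, buf.Len(), nil
}
```

This also illustrates why certificates and keys are the likely overflow case: high-entropy PKI material barely compresses, so gzip buys little headroom there.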

@johngmyers (Member, Author)

The only other thing I can think of is to store them in a per-launchconfiguration directory in vfs, with deletion tied into the code that deletes old launchconfigurations (/launchtemplates).

@k8s-ci-robot removed the needs-rebase label Jun 4, 2021
@k8s-ci-robot (Contributor) commented Jun 4, 2021

@johngmyers: The following test failed, say /retest to rerun all failed tests:

Test name                Commit    Rerun command
pull-kops-verify-hashes  1f8d717   /test pull-kops-verify-hashes

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@johngmyers (Member, Author)

/retest

```diff
@@ -161,6 +161,9 @@ func DeleteAllClusterState(basePath vfs.Path) error {
 		if strings.HasPrefix(relativePath, "instancegroup/") {
 			continue
 		}
+		if strings.HasPrefix(relativePath, "igconfig/") {
```
Member:

How about nodeconfig or configversion or fullconfig?

@johngmyers (Member, Author):

It's the config specific to particular instancegroups. First I partition by IG role (because that's the granularity our IAM roles currently have), then by IG. Whether the config is "full" or versioned is secondary to this partitioning.

```go
c.AddTask(&fitasks.ManagedFile{
	Name:      fi.String("auxconfig-" + ig.Name),
	Lifecycle: b.Lifecycle,
	Location:  fi.String("igconfig/" + strings.ToLower(string(ig.Spec.Role)) + "/" + ig.Name + "/auxconfig.yaml"),
```
Member:

Does strings.ToLower(string(ig.Spec.Role)) need to be in the path?

@johngmyers (Member, Author) Jun 12, 2021:

That's so we can deny the nodes IG role read access to other IG roles' config.
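A hypothetical sketch of how that scoping could look on AWS: because the role is a path segment, an Allow on one role's prefix cannot reach another role's config. `igConfigReadARN`, the bucket, and the cluster name are all made up for illustration, not kops code.

```go
package iampolicy

import (
	"fmt"
	"strings"
)

// igConfigReadARN builds the S3 resource pattern a given IG role would be
// allowed to read. With the role as a path segment, an Allow on
// ".../igconfig/node/*" cannot reach ".../igconfig/master/*".
func igConfigReadARN(bucket, clusterName, role string) string {
	return fmt.Sprintf("arn:aws:s3:::%s/%s/igconfig/%s/*",
		bucket, clusterName, strings.ToLower(role))
}
```

For example, igConfigReadARN("my-state-store", "example.k8s.local", "Node") yields arn:aws:s3:::my-state-store/example.k8s.local/igconfig/node/*, which a nodes-role policy could allow while everything else under igconfig/ stays unreadable.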

```diff
@@ -53,6 +53,8 @@ type Config struct {

 	// DefaultMachineType is the first-listed instance machine type, used if querying instance metadata fails.
 	DefaultMachineType *string `json:",omitempty"`
+	// EnableLifecycleHook defines whether we need to complete a lifecycle hook.
+	EnableLifecycleHook bool `json:",omitempty"`
```
Member:

Should we put these fields (EnableLifecycleHook, UpdatePolicy) into the AuxConfig? I figure we make this config only what we need to load the other configuration.

@johngmyers (Member, Author) Jun 12, 2021:

I've been leaning towards putting as much of the small stuff into the userdata as possible. If we can gain confidence that we have enough testing to catch any breakage in the AuxConfig mechanism, I'd like to have nodeup skip loading the AuxConfig in the common case where everything fits in userdata.

I think we can get to the case where in the common case the worker nodes don't need any access to the state store.

@johngmyers (Member, Author):

Ah, I see. If you have to go to the state store, you might as well get everything from there instead of going field-by-field. So we'd put everything in Config and when it's too big, spill it to the state store and put a smaller bootstrap thing in userdata.

I'll put it on the backlog, but unblocking v1beta1 apiversion is higher priority.
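A rough sketch of that spill idea, under assumed names (`bootstrapDoc`, `renderBootstrap`, and the file path are illustrative, not the code that landed): marshal the full config, inline it in userdata when it fits the budget, otherwise write it to the state store and put only a pointer in userdata.

```go
package bootstrap

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// userdataBudget approximates the AWS userdata cap discussed above; a real
// check would account for wrapper overhead and compression.
const userdataBudget = 16 * 1024

// bootstrapDoc is what actually goes into userdata: either the full config
// inline, or a pointer to where it was spilled in the state store.
type bootstrapDoc struct {
	Inline    json.RawMessage `json:"inline,omitempty"`
	SpilledTo string          `json:"spilledTo,omitempty"`
}

func renderBootstrap(fullConfig interface{}, stateStoreDir string) ([]byte, error) {
	full, err := json.Marshal(fullConfig)
	if err != nil {
		return nil, err
	}
	// Common case: everything fits, so nodeup never touches the state store.
	if len(full) <= userdataBudget {
		return json.Marshal(bootstrapDoc{Inline: full})
	}
	// Exceptional case: spill the full config and hand out a pointer.
	p := filepath.Join(stateStoreDir, "config-full.json")
	if err := os.WriteFile(p, full, 0o600); err != nil {
		return nil, fmt.Errorf("spilling full config: %w", err)
	}
	return json.Marshal(bootstrapDoc{SpilledTo: p})
}
```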

```go
}

// AuxConfig is the configuration for the nodeup binary that might be too big to fit in userdata.
type AuxConfig struct {
```
Member:

Maybe instead of calling this AuxConfig, NodeFullConfig would be consistent with ClusterFullConfig. I see nodeup.Config as more the auxiliary configuration ... it is the small config (constrained by userdata size) that lets us locate and load the real config.

@johngmyers (Member, Author):

I'd prefer not to put duplicate, unused fields in this object.

I have thought of using the same schema for userdata's Config and this file in the state store. That would allow putting the hooks, etc. in the userdata when they fit. The userdata would then have metadata saying "pull these fields from the one in the state store" when things don't fit.

But that is all more complicated and should be deferred until after we get more experience and better testing around the exceptional cases.

@justinsb (Member) commented Jun 12, 2021

A few quibbles, but I like where this is going and I think we can discuss the quibbles separately.

Going to apply a hold in case you immediately agree with any of the suggestions (and want to change them here), but otherwise please just remove the hold.

/approve
/lgtm

/hold

@k8s-ci-robot added the do-not-merge/hold label Jun 12, 2021
@k8s-ci-robot added the lgtm label Jun 12, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jun 12, 2021
@johngmyers (Member, Author)

These comments are all things that can be dealt with in future PRs, possibly in future releases.
/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label Jun 12, 2021
@k8s-ci-robot merged commit cfc93e5 into kubernetes:master Jun 12, 2021
@k8s-ci-robot added this to the v1.22 milestone Jun 12, 2021
@johngmyers deleted the refactor-nodeup-context branch June 12, 2021 20:43
Labels: approved, area/api, area/kops-controller, area/nodeup, area/provider/aws, area/provider/azure, area/provider/openstack, cncf-cla: yes, lgtm, size/XXL

6 participants