
Remove InstanceGroup from NodeupModelContext #9294

Merged: 14 commits merged into kubernetes:master from johngmyers:refactor-nodeup-context on Jun 12, 2021

Conversation

@johngmyers (Member) commented Jun 8, 2020

Move all data from InstanceGroup into NodeupModelContext.NodeupConfig to address some of the issues mentioned in #9229.

There could be an issue if the Hooks or FileAssets are too big to fit in userdata.

Creates a new NodeupAuxConfig struct which is read from an instancegroup-specific file in the state store.
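For orientation, here is a minimal Go sketch of the split described above, under simplified types: the real structs use kops API types, and `loadAuxConfig`, `ConfigBase`, and the string fields are illustrative stand-ins for the vfs-based loading, following the igconfig/<role>/<ig>/auxconfig.yaml layout that comes up later in the review.

```go
package model

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"sigs.k8s.io/yaml"
)

// Config is the small bootstrap configuration that travels in userdata.
type Config struct {
	InstanceGroupName string `json:"instanceGroupName"`
	ConfigBase        string `json:"configBase"` // state store root (illustrative)
}

// AuxConfig carries the per-instancegroup data that may be too big for
// userdata, e.g. hooks and file assets (simplified to strings here).
type AuxConfig struct {
	Hooks      []string `json:"hooks,omitempty"`
	FileAssets []string `json:"fileAssets,omitempty"`
}

// loadAuxConfig reads igconfig/<role>/<ig>/auxconfig.yaml relative to the
// state store root, mirroring the layout this PR adds.
func loadAuxConfig(cfg *Config, role string) (*AuxConfig, error) {
	p := filepath.Join(cfg.ConfigBase, "igconfig", strings.ToLower(role), cfg.InstanceGroupName, "auxconfig.yaml")
	b, err := os.ReadFile(p)
	if err != nil {
		return nil, fmt.Errorf("reading aux config %q: %w", p, err)
	}
	aux := &AuxConfig{}
	if err := yaml.Unmarshal(b, aux); err != nil {
		return nil, fmt.Errorf("parsing aux config %q: %w", p, err)
	}
	return aux, nil
}
```

The point of the split is that Config stays small enough to always fit in userdata, while AuxConfig can grow without hitting the userdata cap.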

@k8s-ci-robot added the area/api, area/nodeup, size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files), and cncf-cla: yes (the PR's author has signed the CNCF CLA) labels Jun 8, 2020
@hakman (Member) commented Jun 8, 2020

/retest

1 similar comment
@johngmyers (Member, Author)

/retest

@hakman (Member) commented Jun 8, 2020

I think we may need to rebase :)

@johngmyers force-pushed the refactor-nodeup-context branch 2 times, most recently from 3b87312 to 04c870b on June 9, 2020 06:52
@johngmyers changed the title from "Remove InstanceGroup from NodeupModelContext" to "WIP Remove InstanceGroup from NodeupModelContext" Jun 9, 2020
@k8s-ci-robot added the do-not-merge/work-in-progress label Jun 9, 2020
@johngmyers changed the title from "WIP Remove InstanceGroup from NodeupModelContext" to "Remove InstanceGroup from NodeupModelContext" Jun 10, 2020
@k8s-ci-robot added the needs-rebase label and removed the do-not-merge/work-in-progress label Jun 10, 2020
@k8s-ci-robot removed the needs-rebase label Jun 12, 2020
@johngmyers (Member, Author)

/retest

@johngmyers force-pushed the refactor-nodeup-context branch 2 times, most recently from cb7bec1 to aed047f on June 17, 2020 16:20
@johngmyers (Member, Author)

/retest

@justinsb (Member)

Sorry about the delay in reviewing this. So I believe we had problems fitting in the 16KB limit in the past, which is why we had this split. I don't have an example to hand though. I suspect the most likely case would be if we were adding some certificates or keys, simply because those compress so poorly.

For the nodes, we could move to fetching this data from kops-controller, which would have two advantages: no S3 access required (so fewer node permissions), and effectively no limit on the size.

That doesn't really help us on the control plane nodes, though.

I do really like the simplification of the code here, so I would like to see us get this in; we just need to satisfy ourselves that we aren't going to paint users that have large fileAssets/keys into a corner.

I propose:

  • We verify that we do compress the userdata if it's > 16KB, just so we know the limit we're dealing with on AWS (GCE has a similar limit, but it's much bigger: 256KB, if memory serves); a quick size check is sketched below.
  • We bring it up at office hours on Friday, to see if there are use cases we're missing (and whether people still do this).
  • We think about adding an external mechanism for these files: PKI data could be referenced from the kops CAStore, and fileAssets or hooks could be referenced by URL (including an S3 URL). I don't love the URL option in particular; I just think we probably need to offer something here.
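A minimal sketch of the size check the first bullet proposes, assuming gzip compression and the 16KB EC2 cap; `fitsAfterGzip` and `awsUserDataLimit` are illustrative names, not kops functions:

```go
package userdata

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// awsUserDataLimit is the EC2 userdata cap discussed above (16KB; GCE's
// metadata limit is far larger, around 256KB).
const awsUserDataLimit = 16 * 1024

// fitsAfterGzip reports whether the rendered userdata fits the limit once
// gzip-compressed, and returns the compressed size for diagnostics.
func fitsAfterGzip(userdata []byte) (bool, int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(userdata); err != nil {
		return false, 0, fmt.Errorf("compressing userdata: %w", err)
	}
	if err := zw.Close(); err != nil {
		return false, 0, fmt.Errorf("flushing gzip: %w", err)
	}
	return buf.Len() <= awsUserDataLimit, buf.Len(), nil
}
```

This also illustrates why certificates and keys are the likely overflow case: high-entropy PKI material barely compresses, so gzip buys little headroom there.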

@johngmyers (Member, Author)

The only other thing I can think of is to store them in a per-launchconfiguration directory in vfs, with deletion tied into the code that deletes old launchconfigurations (/launchtemplates).

@k8s-ci-robot removed the needs-rebase label Jun 4, 2021
@k8s-ci-robot (Contributor) commented Jun 4, 2021

@johngmyers: The following test failed, say /retest to rerun all failed tests:

Test name                Commit    Rerun command
pull-kops-verify-hashes  1f8d717   /test pull-kops-verify-hashes

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@johngmyers (Member, Author)

/retest

```diff
@@ -161,6 +161,9 @@ func DeleteAllClusterState(basePath vfs.Path) error {
 		if strings.HasPrefix(relativePath, "instancegroup/") {
 			continue
 		}
+		if strings.HasPrefix(relativePath, "igconfig/") {
```
Member:

How about nodeconfig or configversion or fullconfig?

@johngmyers (Member, Author):

It's the config specific to particular instancegroups. First I partition by IG role (because that's the granularity our IAM roles currently have), then by IG. Whether the config is "full" or versioned is secondary to this partitioning.

```go
c.AddTask(&fitasks.ManagedFile{
	Name:      fi.String("auxconfig-" + ig.Name),
	Lifecycle: b.Lifecycle,
	Location:  fi.String("igconfig/" + strings.ToLower(string(ig.Spec.Role)) + "/" + ig.Name + "/auxconfig.yaml"),
```
Member:

Does strings.ToLower(string(ig.Spec.Role)) need to be in the path?

@johngmyers (Member, Author) Jun 12, 2021:

That's so we can deny the nodes IG role read access to other IG roles' config.
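A hypothetical sketch of how that scoping could look on AWS: because the role is a path segment, an Allow on one role's prefix cannot reach another role's config. `igConfigReadARN`, the bucket, and the cluster name are all made up for illustration, not kops code.

```go
package iampolicy

import (
	"fmt"
	"strings"
)

// igConfigReadARN builds the S3 resource pattern a given IG role would be
// allowed to read. With the role as a path segment, an Allow on
// ".../igconfig/node/*" cannot reach ".../igconfig/master/*".
func igConfigReadARN(bucket, clusterName, role string) string {
	return fmt.Sprintf("arn:aws:s3:::%s/%s/igconfig/%s/*",
		bucket, clusterName, strings.ToLower(role))
}
```

For example, igConfigReadARN("my-state-store", "example.k8s.local", "Node") yields arn:aws:s3:::my-state-store/example.k8s.local/igconfig/node/*, which a nodes-role policy could allow while everything else under igconfig/ stays unreadable.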

```diff
@@ -53,6 +53,8 @@ type Config struct {

 	// DefaultMachineType is the first-listed instance machine type, used if querying instance metadata fails.
 	DefaultMachineType *string `json:",omitempty"`
+	// EnableLifecycleHook defines whether we need to complete a lifecycle hook.
+	EnableLifecycleHook bool `json:",omitempty"`
```
Member:

Should we put these fields (EnableLifecycleHook, UpdatePolicy) into the AuxConfig? I figure we make this config only what we need to load the other configuration.

@johngmyers (Member, Author) Jun 12, 2021:

I've been leaning towards putting as much of the small stuff into the userdata as possible. If we can gain confidence that we have enough testing to catch any breakage in the AuxConfig mechanism, I'd like to have nodeup skip loading the AuxConfig in the common case where everything fits in userdata.

I think we can get to the case where in the common case the worker nodes don't need any access to the state store.

@johngmyers (Member, Author):

Ah, I see. If you have to go to the state store, you might as well get everything from there instead of going field-by-field. So we'd put everything in Config and when it's too big, spill it to the state store and put a smaller bootstrap thing in userdata.

I'll put it on the backlog, but unblocking v1beta1 apiversion is higher priority.
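A rough sketch of that spill idea, under assumed names (`bootstrapDoc`, `renderBootstrap`, and the file path are illustrative, not the code that landed): marshal the full config, inline it in userdata when it fits the budget, otherwise write it to the state store and put only a pointer in userdata.

```go
package bootstrap

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// userdataBudget approximates the AWS userdata cap discussed above; a real
// check would account for wrapper overhead and compression.
const userdataBudget = 16 * 1024

// bootstrapDoc is what actually goes into userdata: either the full config
// inline, or a pointer to where it was spilled in the state store.
type bootstrapDoc struct {
	Inline    json.RawMessage `json:"inline,omitempty"`
	SpilledTo string          `json:"spilledTo,omitempty"`
}

func renderBootstrap(fullConfig interface{}, stateStoreDir string) ([]byte, error) {
	full, err := json.Marshal(fullConfig)
	if err != nil {
		return nil, err
	}
	// Common case: everything fits, so nodeup never touches the state store.
	if len(full) <= userdataBudget {
		return json.Marshal(bootstrapDoc{Inline: full})
	}
	// Exceptional case: spill the full config and hand out a pointer.
	p := filepath.Join(stateStoreDir, "config-full.json")
	if err := os.WriteFile(p, full, 0o600); err != nil {
		return nil, fmt.Errorf("spilling full config: %w", err)
	}
	return json.Marshal(bootstrapDoc{SpilledTo: p})
}
```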

```go
}

// AuxConfig is the configuration for the nodeup binary that might be too big to fit in userdata.
type AuxConfig struct {
```
Member:

Maybe instead of calling this AuxConfig, NodeFullConfig would be consistent with ClusterFullConfig. I see nodeup.Config as more the auxiliary configuration ... it is the small config (constrained by userdata size) that lets us locate and load the real config.

@johngmyers (Member, Author):

I'd prefer not to put duplicate, unused fields in this object.

I have thought of using the same schema for userdata's Config and this file in the state store. That would allow putting the hooks, etc. in the userdata when they fit. The userdata would then have metadata saying "pull these fields from the one in the state store" when things don't fit.

But that is all more complicated and should be deferred until after we get more experience and better testing around the exceptional cases.

@justinsb (Member) commented Jun 12, 2021

A few quibbles, but I like where this is going and I think we can discuss the quibbles separately.

Going to apply a hold in case you immediately agree with any of the suggestions (and want to change them here), but otherwise please just remove the hold.

/approve
/lgtm

/hold

@k8s-ci-robot added the do-not-merge/hold label Jun 12, 2021
@k8s-ci-robot added the lgtm label Jun 12, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jun 12, 2021
@johngmyers (Member, Author)

These comments are all things that can be dealt with in future PRs, possibly in future releases.
/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label Jun 12, 2021
@k8s-ci-robot merged commit cfc93e5 into kubernetes:master Jun 12, 2021
@k8s-ci-robot added this to the v1.22 milestone Jun 12, 2021
@johngmyers deleted the refactor-nodeup-context branch June 12, 2021 20:43
Labels: approved, area/api, area/kops-controller, area/nodeup, area/provider/aws, area/provider/azure, area/provider/openstack, cncf-cla: yes, lgtm, size/XXL

6 participants