Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cluster spec to node user data so component config changes are detected #3120

Conversation

KashifSaadat
Copy link
Contributor

@KashifSaadat KashifSaadat commented Aug 2, 2017

Related to #3076

Some cluster changes such as component config modifications are not picked up when performing updates (nodes are not marked as NEEDUPDATE). This change introduces the ability to:

  1. Include certain cluster specs within the node user data file (enableClusterSpecInUserData: true)
  2. Encode the cluster spec string before placing within the user data file (enableClusterSpecInUserData: true)

The above flags default to false so shouldn't cause any changes to existing clusters.

Following feedback I've removed the optional API flags, so component config is included by default within the user data. This WILL cause all nodes to have a required update to their bootstrap scripts.

@k8s-ci-robot
Copy link
Contributor

Hi @KashifSaadat. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 2, 2017
@KashifSaadat KashifSaadat force-pushed the diff-on-component-config-changes branch from 5a22358 to c497d4d Compare August 2, 2017 16:12
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@KashifSaadat
Copy link
Contributor Author

/assign @zmerlynn

@justinsb
Copy link
Member

justinsb commented Aug 4, 2017

So I was a little confused, but then I realized that we never merged #2167

I don't think this needs to be optional; it should be the default behaviour. In #2167 I just hashed "everything", and I like how you only hash the relevant data. My thought was to start with everything and then start excluding things we knew to be irrelevant.

I'm not sure how we want to reconcile the two approaches. We could merge #2167 and then we can take your code to be more selective in what we fingerprint. Or we could remove the API fields from this PR (if you agree) and I think this PR will then be simpler.

I'll rebase #2167 - feel free to have a look and review. If you like it you can even comment with LGTM, even if the bots don't technically count it, I always encourage people to review & LGTM - at the end of the day there are people making the final decisions :-)

If you'd rather follow this PR, that's great also. My instinct is that we don't need the API fields, but I am open to an explanation of why they are needed :-)

@justinsb justinsb assigned justinsb and unassigned zmerlynn Aug 4, 2017
@justinsb justinsb self-requested a review August 4, 2017 15:09
@justinsb
Copy link
Member

justinsb commented Aug 4, 2017

Also I just sent #3139 which exposes the name of the tempfile used during the CF tests, which can be a time-saver :-)

@KashifSaadat
Copy link
Contributor Author

Hey @justinsb, thank you for the feedback and the call! :)

I'll get the PR updated and re-push following your comments (full cluster spec not hashed, drop API fields).

@KashifSaadat KashifSaadat force-pushed the diff-on-component-config-changes branch 2 times, most recently from 4cedca3 to 03e1a71 Compare August 7, 2017 10:06
@KashifSaadat
Copy link
Contributor Author

Updated the PR following comments, let me know what you think! :)

@@ -34,7 +37,7 @@ type BootstrapScript struct {
NodeUpConfigBuilder func(ig *kops.InstanceGroup) (*nodeup.Config, error)
}

func (b *BootstrapScript) ResourceNodeUp(ig *kops.InstanceGroup) (*fi.ResourceHolder, error) {
func (b *BootstrapScript) ResourceNodeUp(ig *kops.InstanceGroup, cs *kops.ClusterSpec) (*fi.ResourceHolder, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have a comment on the exported func

@KashifSaadat KashifSaadat force-pushed the diff-on-component-config-changes branch 4 times, most recently from 6b4cccf to c4b72db Compare August 7, 2017 12:58
@KashifSaadat
Copy link
Contributor Author

Hi @justinsb, updated the PR following comments & test failures:

  • API flags have been removed which has simplified the logic
  • Tests updated accordingly (including integration test with user data base64 check)
  • Validated locally and can see that PopulateClusterSpec is called on cluster apply, so the full component config data is correctly included within the bootstrap script

@k8s-github-robot
Copy link

@KashifSaadat PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2017
@KashifSaadat KashifSaadat force-pushed the diff-on-component-config-changes branch from c4b72db to 56d0748 Compare August 8, 2017 08:12
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2017
@KashifSaadat
Copy link
Contributor Author

KashifSaadat commented Aug 9, 2017

Friendly poke @justinsb :) Also included parts of the IG Spec (Node Labels, Taints, Kubelet override), as it seems these were also not being picked up correctly: e0461b9

@KashifSaadat KashifSaadat force-pushed the diff-on-component-config-changes branch from 6d9b1fd to f4b7909 Compare August 9, 2017 12:25
so component config changes are detected and causes nodes to be updated
@KashifSaadat KashifSaadat force-pushed the diff-on-component-config-changes branch from f4b7909 to e0461b9 Compare August 9, 2017 13:15
@justinsb
Copy link
Member

/ok-to-test

/lgtm

Looks great - solves a real problem, and positions us very well to actually pass the data to nodeup via this mechanism (assuming that we can squeeze under the size limit!)

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 11, 2017
spec["masterKubelet"] = cs.MasterKubelet
}

content, err := yaml.Marshal(spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should actually use the typed version i.e. take the spec & zero out the bits we don't want, and then use the existing API marshalling. Then we could use this as the versioned API between kops & nodeup...

But I'm getting ahead of myself - this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a good option. When I was using the existing API marshalling I was doing conversion from Spec -> JSON -> YAML, which looked a bit messy. Was a bit unsure on the most ideal approach.

{{ ClusterSpec }}
__EOF_CLUSTER_SPEC

cat > ig_spec.yaml << __EOF_IG_SPEC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is going :-)

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KashifSaadat, justinsb

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 11, 2017
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit b7efd3b into kubernetes:master Aug 11, 2017
k8s-github-robot pushed a commit that referenced this pull request Aug 18, 2017
Automatic merge from submit-queue

Add hooks to bootstrapscript output

Ping @gambol99 

Related to:
- #3120
- #3063

Cluster Hooks are now added in their entirety to the nodeup user data scripts, dependant on whether the instance group roles match. This could increase the size of the user-data files significantly (getting closer to the 16KB mark) depending on users' hooks content. I considered hashing and fingerprinting the cluster hooks, but following discussions in #3120 thought we should keep the full output instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants