
[OCPCLOUD-1209] Run KubeletConfig FeatureGate sync during bootstrap #2668

Merged

Conversation


@JoelSpeed JoelSpeed commented Jul 8, 2021

Note: the first two commits of this PR are currently the content of #2647. There would be conflicts if I tried to implement this independently, so I thought it was quicker to leave this based on that.

- What I did
I've extracted some of the logic in the kubeletconfig package into standalone functions rather than having it be part of the Controller. This is the first commit, "Extract KubeConfig FeatureGate ignition to a standalone function".

Then I've copied the logic from the feature gate sync into a bootstrap-ready RunFeatureGateBootstrap function, which we run if a FeatureGate is detected in the manifests directory; a rough sketch of this wiring is shown below.
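
To make the shape of that change concrete, here is a rough sketch of the wiring, assembled from the hunks quoted later in this thread; the surrounding loop and error handling are my paraphrase (assumptions), not the exact diff:

// Sketch only: while decoding the bootstrap manifests, remember a
// FeatureGate named "cluster" ...
var featureGate *apicfgv1.FeatureGate
// ... inside the manifest type switch ...
case *apicfgv1.FeatureGate:
	if obj.GetName() == ctrlcommon.ClusterFeatureInstanceName {
		featureGate = obj
	}
// ... then, after all manifests have been read, run the extracted sync:
if featureGate != nil {
	kConfigs, err := kubeletconfig.RunFeatureGateBootstrap(b.templatesDir, featureGate, cconfig, pools)
	if err != nil {
		return err // assumed handling; see the actual hunk quoted below
	}
	configs = append(configs, kConfigs...)
}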

- How to verify it

  • openshift-install create install-config
  • openshift-install create manifests
  • Add a FeatureGate yaml resource to the manifests directory
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
  • openshift-install create cluster
  • Observe that the cluster bootstraps correctly

Without this fix, there is a discrepancy between /etc/mcs-machine-config-content.json on the master hosts and the initial rendered master config, because the feature gate sync handler has not run before the initial bootstrap machine config is generated.

I have manually tested this locally by taking the resources from the bootstrap node of a cluster I bootstrapped, running this patch against them, and observing the output of the rendered master config.

- Description for the changelog

Clusters bootstrapped with a FeatureGate will now successfully complete installation.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 8, 2021
@JoelSpeed JoelSpeed changed the title [WIP] Run KubeletConfig FeatureGate sync during bootstrap [OCPCLOUD-1209] Run KubeletConfig FeatureGate sync during bootstrap Jul 8, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 8, 2021

@yuqi-zhang yuqi-zhang left a comment


General code LGTM, 2 questions below:

Also cc @rphillips and @QiWang19 from the kubelet controller side

Edit: one more question, just to make sure: if you supply a KubeletConfig and a FeatureGate, they do not conflict, right?

@@ -112,6 +116,10 @@ func (b *Bootstrap) Run(destDir string) error {
icspRules = append(icspRules, obj)
case *apicfgv1.Image:
imgCfg = obj
case *apicfgv1.FeatureGate:
if obj.GetName() == ctrlcommon.ClusterFeatureInstanceName {
Contributor

If I'm interpreting this correctly, since you are not storing this as an array, the expected behaviour is that the user only applies one config. On the off chance they supply multiple, it will not error, but the alphanumerically later one will take priority?

Contributor Author

Yep, the expectation is that a customer should only create one; whichever is read last will win. I'm not sure if you want logic in here to account for that or not, but I'd expect a customer to only supply one.

Also note that it has to have a particular name, cluster, so that's why we expect a singleton here.

// Check to see if FeatureGates are equal
if reflect.DeepEqual(originalKubeConfig.FeatureGates, featureGates) {
	// When there is no difference, this isn't an error, but no machine config should be created
	return nil, nil
}
Contributor

I'm not sure if this will work. Upon an upgrade, all MCs should be updated with at least the new controller version. If the feature gate doesn't change, we still have to return the same MC so that the sync function above doesn't hit the

if rawCfgIgn == nil {
	continue
}

which will not update the

mc.ObjectMeta.Annotations = map[string]string{
	ctrlcommon.GeneratedByControllerVersionAnnotationKey: version.Hash,
}

which in turn causes the MCO to fail because of a rendered version mismatch for the controller (i.e. the MCC sees that a controller-created config never got updated). So we should still return the same config so that the sync above happens.

Have you tried installing (from existing releases), adding a feature gate and then upgrading to this PR? That should be a good check.

Contributor Author

As far as I understand it, the logic here hasn't changed; the idea of this function was just to extract the logic that was already in place, not to change it.

Before, this same check caused a continue in the loop; I've just replaced it with an early return, and the continue stays at the top of the loop.

I think the reason this works is that we are comparing the default feature gates (i.e. what you'd have if you didn't have a FeatureGate) with what is in the FeatureGate in the cluster, if it exists. Once you've added a feature gate, you can't remove it or modify it to remove any feature that has been enabled. So if an MC has been generated because of the FG, these will always differ, and the MC will always be updated.

Also note that if you have a FeatureGate in the cluster, you can't upgrade your cluster, so the version should never change anyway if this logic has been triggered.
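
As a self-contained illustration of that reasoning (the feature names and values here are hypothetical; only the shape matters): the sync compares a static default map against the FeatureGate-derived map, so once the cluster's feature set diverges from the defaults, the comparison can never become equal again, and an MC is regenerated on every sync.

package main

import (
	"fmt"
	"reflect"
)

func main() {
	// Hypothetical static defaults, compiled into the controller.
	defaultFeatures := map[string]bool{"SomeFeature": false}
	// Hypothetical map derived from a TechPreviewNoUpgrade FeatureGate.
	clusterFeatures := map[string]bool{"SomeFeature": true}

	// Equal -> early return, no MachineConfig generated.
	// Different -> a MachineConfig is (re)generated on every sync.
	fmt.Println(reflect.DeepEqual(defaultFeatures, clusterFeatures)) // false
}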

Contributor

Oh I see, defaultFeatures is static, ok that makes sense I think.

When we say "can't upgrade", we mean we cannot upgrade y-stream, correct? Based on the CI job, it seems feature gates are blocked from upgrading at the y-stream level but not at the z-stream level, since this makes use of upgradeable=false. Just wanted to check that understanding.

@yuqi-zhang

Also, is #2670 needed? I see it's closed but I'm not sure why

@JoelSpeed

Also, is #2670 needed? I see it's closed but I'm not sure why

I believe this was opened accidentally. Looks like it was a combination of this PR and #2669. #2669 will be required; it's just a vendor update against library-go though, so it shouldn't be too controversial.

@JoelSpeed

just to make sure, if you supply a kubeletconfig and a featuregate, they do not conflict right?

This behaviour will mimic whatever the behaviour is today if you did this in-cluster. If you supply a FeatureGate in a cluster today and you also have a KubeletConfig, what happens? This isn't something I've ever tried, but I'd expect someone from the Node team to have a better idea.

@rphillips

We need to combine this PR with the KubeletConfig bootstrap PR... #2547

To make sure KubeletConfigs and FeatureGates work in combination.

@JoelSpeed

JoelSpeed commented Jul 15, 2021

Hi all, I just wanted to add some notes about testing for this PR:

Manual testing

I have manually taken this PR and a 4.9 CI build and run through the following process:

  • openshift-install create install-config
  • openshift-install create manifests
  • Add the following manifest to the manifests folder as featureGate.yaml:
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
  • openshift-install create cluster

This succeeds; however, if I do this without this patch, the bootstrap fails because the MCO is degraded.

Note, I've also done this with 4.8 and a variation on this PR (with the 4.9 CCM part removed), and it has the same effect.

QE testing

@sunzhaohua2 has already come across this issue when trying to test CCM as part of our QE cycle. I have asked them to take this PR and try out the testing procedure using a custom build. All of our testing efforts for our CCM project across AWS, Azure and OpenStack will involve this patch, so I expect it to be thoroughly tested. With @sunzhaohua2's blessing, I would like to volunteer the Cluster Infrastructure QE team to dedicate the time needed for MCO-specific QE on this feature too.

E2E testing

I am currently working on (at the request of @deads2k) a set of release informing E2E tests (eventually to be run on all platforms). These will help to inform the release team about the stability of tech preview features in perpetuity (note this is not specifically tied to any one project, but is intended to cover any and all tech preview features).

These tests will be running daily and will inform the release status dashboard. The work in this PR also sets up the ability to add a PR-blocking E2E which will also exercise this mechanism. We will be setting PR-blocking E2E for this on all CCM-related repos until we are out of TP (4.12-ish), so there will be some signal on our repositories and PRs if this does break in the future.

If desired, I am happy to add a new PR E2E job for the MCO repo as well, so that this fix can be tested regularly here too.


If there are any further ideas for how this can/should be tested, please let me know; I'm happy to investigate further testing if there are gaps.

@rphillips

I believe a KubeletConfig and a FeatureGate at day 1 will not work correctly. One will get overridden by the other at installation time.

@JoelSpeed

We need to combine this PR with the KubeletConfig bootstrap PR... #2547

To make sure KubeletConfigs and FeatureGates work in combination.

@rphillips How does this work today in-cluster? As I understand it, the two controllers (kubelet config and kubelet config feature sync) exist in the cluster today, and could cause this same issue if a customer were to configure feature gates and a kubelet config on a cluster post-install?

I did already look into this code-wise (though I haven't had a chance to spin up a cluster) after Jerry mentioned a similar concern. As far as I can tell, the behaviour won't be different day 1 vs day 2. Have you ever had a chance to try this on a cluster that is bootstrapped?

@rphillips

rphillips commented Jul 15, 2021

FeatureGates will be added as a singleton into the system early in the installation process. By the time the MCO installs the KubeletConfig as a day-2 operation, the FeatureGate is already installed.

@JoelSpeed

FeatureGates will be added as a singleton into the system early in the installation process. By the time the MCO installs the KubeletConfig as a day-2 operation, the FeatureGate is already installed.

I may not be following this correctly, but I think what you're suggesting here is that, day 2, the kubeletconfig controller picks up the existing MachineConfig that was created by the kubelet feature sync and modifies it based on the input from the user?

Because 2547 is not taking into account existing MachineConfigs that 2668 might have generated, this will cause it to create a new MachineConfig, causing a clash?

If that's the case, then yes, we definitely need to make sure these go together or, at least, are followed closely and handled appropriately.

@rphillips

That is correct... We need a merge process of sorts at day 1 to handle the FeatureGate and the KubeletConfig.

@JoelSpeed

If this is appropriate, and with @QiWang19's blessing, I can work on getting a patch combining the two and testing that out tomorrow. Thanks, Ryan, for helping me understand that nuance.

@yuqi-zhang

Joel and I did some testing. I think this works day 2 (feature gates are considered when we render kubeletconfigs; maybe in the future they can just be one sync loop). So this PR by itself should work today.

#2547, if we want to consolidate the two together, needs to be modified to consider feature gates in this case.

@JoelSpeed

To extend Jerry's comment above:

Today

When I create a FeatureGate, the KubeletConfig FeatureHandler takes this and creates a machine config at level 98.

If I then create a KubeletConfig, the KubeletConfig Controller takes this AND the FeatureGate and produces a machine config at level 99. Importantly, the Controller includes the logic for the feature gates from the feature handler, so the two produce the same output WRT feature gates.

With 2668 and 2547

As 2668 stands today, it will create a config at level 98, identical to the behaviour of the FeatureHandler.

As 2547 stands today, it will create a config at level 99, but it does not account for FeatureGates, so its behaviour differs from the Controller that it is trying to implement for day 1. This is where a conflict might occur.

We need to include FeatureGate handling within 2547 to ensure that it creates the same output as the Controller does day 2. I took this branch, cherry-picked 2547 and added an extra commit on top which does this.

In my opinion, and I appreciate I'm not an owner here, this means that, if we can ensure that 2547 includes the changes I'm suggesting above when it is fixed up and merged, then it's not a strict requirement to merge them both in at the same time.

The Controller today stomps the FeatureHandler when a KubeletConfig exists, so 2547 should stomp 2668 once it is added, matching the behaviour of the day 2 operation.
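
To see why the higher level wins: as I understand it, MachineConfigs are merged in ascending name order, so the later name takes priority for conflicting settings. A tiny illustration (the config names below are hypothetical stand-ins for the generated objects):

package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hypothetical generated MachineConfig names; the numeric prefix sets priority.
	names := []string{"99-master-generated-kubelet", "98-master-generated-kubelet"}
	sort.Strings(names)
	// The last name in the merge order wins: the KubeletConfig Controller's
	// 99-level output stomps the FeatureHandler's 98-level output.
	fmt.Println(names[len(names)-1]) // 99-master-generated-kubelet
}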

If we are able to merge 2668 (or I can do this beforehand, if preferred), I can set up a new PR E2E job that proves the cluster bootstraps correctly when a feature gate is installed, hopefully preventing regressions in the future (i.e. when 2547 is merged).

@rphillips

I think part of the problem is that we have not gotten #2547 right for KubeletConfigs and any changes there could impact this PR, since the configs need to be identical.

@kikisdeliveryservice

I think part of the problem is that we have not gotten #2547 right for KubeletConfigs and any changes there could impact this PR, since the configs need to be identical.

#2547 is a large feature in its own right, and I would very much prefer to get #2547 in as planned and ensure it is working correctly with the appropriate amount of care and consideration. Once that is working, we can iterate with other changes. But doing both at once makes this all destined to be very brittle.

@sunzhaohua2

I built a release image using cluster-bot with this PR and launched a cluster on AWS with this image; the cluster was created successfully.

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-139-89.us-east-2.compute.internal    Ready    worker   22m   v1.21.1+f15374c
ip-10-0-140-153.us-east-2.compute.internal   Ready    master   28m   v1.21.1+f15374c
ip-10-0-172-94.us-east-2.compute.internal    Ready    master   28m   v1.21.1+f15374c
ip-10-0-180-211.us-east-2.compute.internal   Ready    worker   22m   v1.21.1+f15374c
ip-10-0-192-112.us-east-2.compute.internal   Ready    master   28m   v1.21.1+f15374c
ip-10-0-194-163.us-east-2.compute.internal   Ready    worker   22m   v1.21.1+f15374c
$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.ci.test-2021-07-16-014839-ci-ln-gfgy1gt-latest   True        False         34s     Cluster version is 4.8.0-0.ci.test-2021-07-16-014839-ci-ln-gfgy1gt-latest
$ oc get ns
NAME                                               STATUS   AGE
default                                            Active   37m
kube-node-lease                                    Active   37m
kube-public                                        Active   37m
kube-system                                        Active   37m
openshift                                          Active   25m
openshift-apiserver                                Active   27m
openshift-apiserver-operator                       Active   36m
openshift-authentication                           Active   27m
openshift-authentication-operator                  Active   36m
openshift-cloud-controller-manager                 Active   36m
openshift-cloud-controller-manager-operator        Active   36m
$ oc get featuregate cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
  resourceVersion: "693"
  uid: add7c41b-a641-4347-ba05-ccbc480fc72f
spec:
  featureSet: TechPreviewNoUpgrade

@JoelSpeed

As part of the effort to test this further, I've added a PR introducing a new presubmit test that checks clusters bootstrap correctly when a feature gate is added to the release manifests: openshift/release#20372

@JoelSpeed

/test e2e-aws-techpreview-featuregate

@JoelSpeed

While the feature gate E2E test failed, you can see from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2668/pull-ci-openshift-machine-config-operator-master-e2e-aws-techpreview-featuregate/1417752892005158912 that the cluster bootstrapped successfully and the E2E suite flaked, rather than it being a problem with the MCO.

Compare this with https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/20372/rehearse-20372-pull-ci-openshift-machine-config-operator-master-e2e-aws-techpreview-featuregate/1417478739909939200, where the cluster doesn't even get that far, because the MCO degrades during the bootstrap.

@@ -561,29 +578,21 @@ func (ctrl *Controller) syncKubeletConfig(key string) error {
delete(specKubeletConfig.SystemReserved, "cpu")
}

// FeatureGates must be set from the FeatureGate.
// Remove them here to prevent the specKubeletConfig merge overwriting them.
specKubeletConfig.FeatureGates = nil
Contributor

Where do the FeatureGates get injected for this object?

Contributor Author

Line 543 uses a new kubelet generator function as part of the refactor, which is shared between this and the feature gate handler and handles the feature gate injection:

originalKubeConfig, err := generateOriginalKubeletConfigWithFeatureGates(cc, ctrl.templatesDir, role, features)

If you look at 1b4429b specifically, it's a bit easier to see what the change here is.

I could expand this comment to make this clearer:

// FeatureGates must be set from the FeatureGate.
// Remove them here to prevent the specKubeletConfig merge overwriting them.
// The originalKubeConfig already has these merged in from the FeatureGate.
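
For context, a condensed sketch of how those two pieces fit together in the sync (assembled from the lines quoted in this thread; surrounding code elided):

// The shared helper renders the base kubelet config with the
// FeatureGate-derived gates already merged in.
originalKubeConfig, err := generateOriginalKubeletConfigWithFeatureGates(cc, ctrl.templatesDir, role, features)
if err != nil {
	return err
}
// Clear FeatureGates on the user's spec so the merge below cannot
// overwrite what the FeatureGate already set.
specKubeletConfig.FeatureGates = nil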

@rphillips

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2021
@JoelSpeed

/test e2e-aws-techpreview-featuregate

I'd like to see if the E2E flakes pass a second time around; the MCO is healthy throughout in the existing test failure though, proving IMO that this PR fixes what it says it does :)

@JoelSpeed

/test e2e-aws-techpreview-featuregate

1 similar comment
@sinnykumari

/test e2e-aws-techpreview-featuregate

@yuqi-zhang

So CI-wise, I see that of the existing failures (excluding permanently failing jobs), most of them seem unrelated to the MCO. At least for the upgrade jobs and aws* jobs, the MCO operator status and pool status both look fine. We can retest a few more times on the latest commit to make sure, but I do see a few past commits go green.

In terms of e2e-aws-techpreview-featuregate, that one fails pretty consistently on the same issues; a few notable ones:

  1. Some cluster operators are not ready: kube-apiserver (Upgradeable=False FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates) -> I guess this one is expected since the FeatureGate is supposed to be non-upgradable if enabled. Our CI however does not like that. Is that some adjustable flag?
  2. Prometheus failures -> not sure why there are so many Prom failures (the etcd one is also Prom). Do they pass on other test setups?
  3. Horizontal pod autoscaling (scale resource: CPU) ReplicationController light Should scale from 1 pod to 2 pods -> this one is the odd one out. Not sure what that's about

So for the test to get a good signal we need at least to resolve those 3 categories of issues it seems.

/retest


@yuqi-zhang yuqi-zhang left a comment


This generally lgtm pending CI concerns. One general question below:

@@ -133,6 +141,14 @@ func (b *Bootstrap) Run(destDir string) error {
}
configs = append(configs, rconfigs...)

if featureGate != nil {
kConfigs, err := kubeletconfig.RunFeatureGateBootstrap(b.templatesDir, featureGate, cconfig, pools)
Contributor

Ok so I want to wrap my head around this scenario so I understand this properly.

In case of no user-provided featureGate (the default scenario), this Bootstrap portion does not run, which is normal, so there is no featuregate machineconfig being created.

In the sync loop of the KubeletConfigController, we fetch in-cluster feature objects if they exist (can this be nil?), and pass them to generateFeatureMap to parse.

I guess my question here is: can that feature object fetched by ctrl.featLister.Get(ctrlcommon.ClusterFeatureInstanceName) be nil? If it can, why does generateFeatureMap not fail; and if it can't, and there is always a default set of features on the cluster, why does this not cause a drift, since the in-cluster KCC now always considers feature gates (the defaults), whereas the bootstrap here will not if there aren't any user-provided ones? Based on CI this does not fail, so I'm a bit confused.

Hopefully that question made a bit of sense.

Contributor

Ok, Joel helped me understand this a bit more. The default generation and "user provided" generation are different: there is always a default set of feature gates that's provided as part of the base kubelet MC.
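
In other words, a hypothetical paraphrase of the two paths (the helper names here are invented for illustration and do not exist in the MCO):

// The base kubelet MachineConfig always carries the default gates from
// the templates; the FeatureGate-specific path only diverges from them
// when the singleton object exists with a non-default feature set.
features := defaultFeatureMap() // hypothetical: static defaults
if fg, err := ctrl.featLister.Get(ctrlcommon.ClusterFeatureInstanceName); err == nil {
	features = mergeFeatureGate(features, fg) // hypothetical merge helper
}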

@JoelSpeed

In terms of e2e-aws-techpreview-featuregate that one fails pretty consistently on the same issues, a few notable ones:

  1. Some cluster operators are not ready: kube-apiserver (Upgradeable=False FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates) -> I guess this one is expected since the FeatureGate is supposed to be non-upgradable if enabled. Our CI however does not like that. Is that some adjustable flag?

Yep, this is something we can set. I need to raise a PR to the release repo to exclude that particular test with that particular result from being a failure. I haven't had a chance to do this yet, but I know roughly how (or at least I know my team has done this before, so I can find an example easily).

  2. Prometheus failures -> not sure why there are so many Prom failures (the etcd one is also Prom). Do they pass on other test setups?

The forum-monitoring folks rather handily told me they believe Prometheus is displaying symptoms that it cannot write to its PVC. Given that enabling the feature gate enables CSI migration on AWS, I suspect I'll need to chat to the storage team to find out what's wrong here.

  3. Horizontal pod autoscaling (scale resource: CPU) ReplicationController light Should scale from 1 pod to 2 pods -> this one is the odd one out. Not sure what that's about

This one I haven't started looking into yet, but what I do know about the HPA is that it can rely on Prometheus for scaling metrics, so this could just be a case of Prometheus being broken breaking this particular test too.

So for the test to get a good signal we need at least to resolve those 3 categories of issues it seems.

@sinnykumari and I were actually discussing on #2687 whether or not this test is the best signal for this particular feature. For example, we can see that the MCO has worked, as the cluster bootstrapped correctly and the MCO isn't degraded; the rest of the failures are unrelated to the MCO. The work I'm doing in #2687 proves that the bootstrap and day-2 configs are identical, so maybe it is a better signal for this feature overall. WDYT?

@JoelSpeed

I've liaised with the storage team regarding the Prometheus failures. It turns out these are known: their CSI migration logic currently breaks the PVs. They have this bug, https://bugzilla.redhat.com/show_bug.cgi?id=1977807, which will be resolved by the openshift/kubernetes 1.22 rebase. So I won't be able to get the techpreview-featuregate CI job passing until the rebase has happened 😞

@JoelSpeed

I've raised a PR on the release repo, openshift/release#20554, to fix the alert about the kube-apiserver blocking upgrades.

@yuqi-zhang

@sinnykumari and I were actually discussing on #2687 whether or not this test is the best signal for this particular feature. For example, we can see that the MCO has worked, as the cluster bootstrapped correctly and the MCO isn't degraded; the rest of the failures are unrelated to the MCO. The work I'm doing in #2687 proves that the bootstrap and day-2 configs are identical, so maybe it is a better signal for this feature overall. WDYT?

Yeah, I think so. I just mostly wanted to make sure that we have some type of signal, and e2e-aws-techpreview-featuregate sounds like it will be failing for a while. Maybe it's worth disabling it as always-run for now (keeping it optional so we can still trigger it) until the storage bug gets fixed? Just to reduce the number of permanently failing jobs.

I also agree with the general sentiment that this shows the bootstrap configs do work; namely, I see in the cluster configs for the test:

    "APIPriorityAndFairness": true,
    "CSIDriverAzureDisk": true,
    "CSIDriverVSphere": true,
    "CSIMigrationAWS": true,
    "CSIMigrationAzureDisk": true,
    "CSIMigrationGCE": true,
    "CSIMigrationOpenStack": true,
    "DownwardAPIHugePages": true,
    "LegacyNodeRoleBehavior": false,
    "NodeDisruptionExclusion": true,
    "RotateKubeletServerCertificate": true,
    "ServiceNodeExclusion": true,
    "SupportPodPidsLimit": true

That looks correct. I am fine with merging this so long as the other tests also pass. I do see that the upgrade tests have succeeded this time around.

/retest

@yuqi-zhang

yuqi-zhang commented Jul 23, 2021

Ok, I think this PR is good as it stands. I did some manual testing and everything looks good except one bug, which I verified is not caused by this PR (I believe it always existed), so we can fix that separately.

The bug happens post-install and depends on the order of operations. Basically, if you:

  1. add a kubeletconfig
  2. add a featuregate

It doesn't work. The 99-kubelet config object gets generated first, which overrides the 98-kubelet feature config generated later, BUT the 99-kubelet config doesn't resync when you add the FeatureGate (maybe due to generation checks seeing that the KubeletConfig itself wasn't changed?). So you end up with nothing for the FeatureGate.

If you do the reverse and add the FeatureGate and then the KubeletConfig, the 99-kubelet config has the FeatureGate in it (as expected) and it works fine. So the merge logic there is working.

/approve
/retest

@openshift-ci

openshift-ci bot commented Jul 23, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed, rphillips, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2021
@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci

openshift-ci bot commented Jul 23, 2021

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-techpreview-featuregate 46b6ace link /test e2e-aws-techpreview-featuregate
ci/prow/e2e-aws-single-node 46b6ace link /test e2e-aws-single-node
ci/prow/e2e-aws-workers-rhel7 46b6ace link /test e2e-aws-workers-rhel7
ci/prow/e2e-aws-disruptive 46b6ace link /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.
