MCO-831: added feature gate to mco for on cluster builds #4060
/retest
/retest
Looks good, this should work since the operator is already wired up to wait for featuregates.
@dkhater-redhat here is some info on the failures: it looks like it's possible the FGs changed since we last vendored in the API, which is why unit is failing. Make sure you check the cluster config operator and/or the API to see whether anyone removed any features from the default FG. If they did, you'll need to edit … Other than that, verify is failing because you bumped k8s to a version that changed what errorf and a few other functions allow as formatted args; you'll need to go into the functions and change … I'm going to take a look at the e2e failures now; I have a feeling it's all FG-related.
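If the unit failures come from the vendored feature-gate list drifting, one quick way to see what changed is a set difference between the gates the code or tests expect and the gates the vendored default set actually lists. A minimal Go sketch; missingGates and the example gate lists are hypothetical placeholders, not real openshift/api or MCO identifiers.

```go
package main

import (
	"fmt"
	"sort"
)

// missingGates returns the names in expected that do not appear in vendored,
// sorted so the output is stable across runs.
func missingGates(expected, vendored []string) []string {
	have := make(map[string]struct{}, len(vendored))
	for _, name := range vendored {
		have[name] = struct{}{}
	}
	var missing []string
	for _, name := range expected {
		if _, ok := have[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical inputs: gates a test assumes exist vs. gates found in the
	// vendored default feature set after an API bump.
	expected := []string{"OnClusterBuild", "SomeRemovedGate"}
	vendored := []string{"OnClusterBuild", "EventedPLEG"}
	fmt.Println(missingGates(expected, vendored))
}
```

Any name it prints is a gate the code still references but the vendored default set no longer lists.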
Actually, it seems the failure is on build: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1732782030187401216/build-log.txt at the bottom either
I'll go look at c/common to make sure they didn't delete anything.
Could you please squash the commits? Thanks!
f3efccc to b62162e
/test ci/prow/unit
@dkhater-redhat: The specified target(s) for
/test unit
99c8cf4 to 06c6bb6
Do you have unit tests that depend on the build controller working? If so, those aren't going to work, since the FG isn't enabled. Also, if they are in unit tests and you try to check whether a FG exists, you will get errors. I can't tell if that is the issue or if the pods can't build. Maybe try make binaries locally and see if there are any failures?
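One way to keep unit tests independent of the cluster's real feature gates is to hand the controller a gate accessor and have each test register exactly the gates it relies on. A minimal Go sketch; GateChecker, fakeGates and shouldStartBuildController are hypothetical names, not the MCO or library-go API, and treating an unregistered gate as an error is an assumption modeled on the errors described above.

```go
package main

import "fmt"

// GateChecker is a hypothetical stand-in for whatever feature-gate accessor
// the controller receives; it is not the real library-go interface.
type GateChecker interface {
	Enabled(name string) (bool, error)
}

// fakeGates is a test double: every gate a test relies on must be listed
// explicitly, and asking about any other gate returns an error.
type fakeGates struct {
	enabled map[string]bool
}

func (f fakeGates) Enabled(name string) (bool, error) {
	on, ok := f.enabled[name]
	if !ok {
		return false, fmt.Errorf("feature gate %q not registered in test fixture", name)
	}
	return on, nil
}

// shouldStartBuildController is a hypothetical helper showing the gating
// decision a build-controller unit test would exercise.
func shouldStartBuildController(g GateChecker) (bool, error) {
	return g.Enabled("OnClusterBuild")
}

func main() {
	on, err := shouldStartBuildController(fakeGates{enabled: map[string]bool{"OnClusterBuild": true}})
	fmt.Println(on, err)
	_, err = shouldStartBuildController(fakeGates{enabled: map[string]bool{}})
	fmt.Println(err)
}
```

With this shape, a test that forgets to register OnClusterBuild fails with a clear message instead of depending on whatever the vendored default feature set happens to contain.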
There are also some "does not support" formatting errors in here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-unit/1732880126053453824/build-log.txt
/test unit
9e1a1ac to 5b26e22
/retest-required
I know the SCOS test isn't required, but the operator pod on SCOS is panicking because the SCOS featuregate list doesn't have the new featuregate in it:
And it's right, it doesn't: But it does have
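The panic is consistent with a gate lookup that has no answer for a name missing from the feature set. One defensive option, sketched minimally in Go under the assumption that the operator can see the full enabled and disabled lists, is to report whether a gate is known at all and let the caller decide. featureSet and lookupGate are hypothetical names, not the real openshift/api or library-go types.

```go
package main

import "fmt"

// featureSet is a hypothetical snapshot of the gates a cluster reports,
// split into enabled and disabled names.
type featureSet struct {
	enabled  map[string]bool
	disabled map[string]bool
}

// lookupGate reports whether name is enabled and whether the feature set
// knows about it at all, instead of panicking on an unknown name.
func (f featureSet) lookupGate(name string) (enabled, known bool) {
	if f.enabled[name] {
		return true, true
	}
	if f.disabled[name] {
		return false, true
	}
	return false, false
}

func main() {
	// Hypothetical SCOS-like list that predates the new gate.
	scos := featureSet{
		enabled:  map[string]bool{},
		disabled: map[string]bool{"EventedPLEG": true},
	}
	on, known := scos.lookupGate("OnClusterBuild")
	if !known {
		fmt.Println("OnClusterBuild not in this feature set; treating as disabled")
	}
	fmt.Println(on)
}
```

Whether an unknown gate should fall back to disabled or fail loudly is a design choice: failing loudly surfaces list drift like this SCOS case sooner, while falling back keeps the operator running.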
/hold
/retest-required
By default, featureGate: OnClusterBuild is disabled.
$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[].disabled' | grep OnClusterBuild
- name: OnClusterBuild
Try to turn on OCB while this featureGate is disabled:
# create custom mcp
$ cat infra.mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: infra
spec:
machineConfigSelector:
matchExpressions:
- {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
nodeSelector:
matchLabels:
node-role.kubernetes.io/infra: ""
$ oc apply -f infra.mcp.yaml
machineconfigpool.machineconfiguration.openshift.io/infra created
$ oc label node/ip-10-0-118-209.us-west-1.compute.internal node-role.kubernetes.io/infra=
node/ip-10-0-118-209.us-west-1.compute.internal labeled
$ mcp infra
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
infra rendered-infra-ee03b72baa24ec9b3184492526b023b3 True False False 1 1 1 0 11m
# create configmap, pullsecret etc.
$ cat cm-on-cluster-build-config.yaml
apiVersion: v1
data:
baseImagePullSecretName: mco-global-pull-secret
finalImagePushSecretName: mco-test-pull-secret
finalImagePullspec: "quay.io/mcoqe/layering"
kind: ConfigMap
metadata:
name: on-cluster-build-config
namespace: openshift-machine-config-operator
$ oc apply -f cm-on-cluster-build-config.yaml
configmap/on-cluster-build-config created
$ oc apply -f base-image-pull-secret.yaml
secret/mco-global-pull-secret created
$ oc apply -f final-image-pull-secret.yaml
secret/mco-test-pull-secret created
$ oc get cm/on-cluster-build-config -n openshift-machine-config-operator
NAME DATA AGE
on-cluster-build-config 3 60s
$ oc get secret -n openshift-machine-config-operator | grep pull-secret
mco-global-pull-secret kubernetes.io/dockerconfigjson 1 81s
mco-test-pull-secret kubernetes.io/dockerconfigjson 1 65s
# label mcp/infra
$ oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=
machineconfigpool.machineconfiguration.openshift.io/infra labeled
$ oc get mcp/infra -o yaml | yq -y '.metadata.labels'
machineconfiguration.openshift.io/layering-enabled: ''
# check deployment
$ oc get deployment -n openshift-machine-config-operator
NAME READY UP-TO-DATE AVAILABLE AGE
machine-config-controller 1/1 1 1 63m
machine-config-operator 1/1 1 1 67m
So when the featureGate is disabled, the machine-os-builder deployment is not created even though the required resources exist; the feature cannot be enabled.
Patch featuregate/cluster to enable featureGate: OnClusterBuild:
$ cat ../mc/featuregate_techpreview.yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
name: cluster
spec:
featureSet: TechPreviewNoUpgrade
$ oc apply -f ../mc/featuregate_techpreview.yaml
Warning: resource featuregates/cluster is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
featuregate.config.openshift.io/cluster configured
$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[].enabled' | grep OnClusterBuild
- name: OnClusterBuild
Check whether the deployment and pod are ready and running:
$ oc get deployment/machine-os-builder -n openshift-machine-config-operator
NAME READY UP-TO-DATE AVAILABLE AGE
machine-os-builder 1/1 1 1 20m
$ oc get pod -n openshift-machine-config-operator -l k8s-app=machine-os-builder
NAME READY STATUS RESTARTS AGE
machine-os-builder-bdc4f7d8c-cm6c2 1/1 Running 0 20m
By the way, the featureGates that remain disabled:
$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[].disabled'
- name: ClusterAPIInstall
- name: DisableKubeletCloudCredentialProviders
- name: EventedPLEG
- name: MachineAPIOperatorDisableMachineHealthCheckController
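The yq | grep checks in the walkthrough can also be done programmatically. Below is a minimal Go sketch that decodes the JSON from oc get featuregate/cluster -o json after it has been saved to a file. It models only the fields those yq queries touch (status.featureGates[].enabled[].name and .disabled[].name). The rule that an enabled listing in any per-version entry wins is a simplifying assumption, and featuregate-check is a hypothetical program name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// gateAttr mirrors the "- name: ..." items seen in the yq output.
type gateAttr struct {
	Name string `json:"name"`
}

// featureGate models only the status fields queried in the walkthrough.
type featureGate struct {
	Status struct {
		FeatureGates []struct {
			Version  string     `json:"version"`
			Enabled  []gateAttr `json:"enabled"`
			Disabled []gateAttr `json:"disabled"`
		} `json:"featureGates"`
	} `json:"status"`
}

// gateState returns "enabled", "disabled" or "absent" for name, scanning
// every per-version entry; an enabled listing in any entry wins.
func gateState(raw []byte, name string) (string, error) {
	var fg featureGate
	if err := json.Unmarshal(raw, &fg); err != nil {
		return "", fmt.Errorf("decoding featuregate: %w", err)
	}
	state := "absent"
	for _, details := range fg.Status.FeatureGates {
		for _, g := range details.Enabled {
			if g.Name == name {
				return "enabled", nil
			}
		}
		for _, g := range details.Disabled {
			if g.Name == name {
				state = "disabled"
			}
		}
	}
	return state, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: featuregate-check <featuregate.json>")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	state, err := gateState(raw, "OnClusterBuild")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("OnClusterBuild:", state)
}
```

Usage, following the walkthrough's conventions:
$ oc get featuregate/cluster -o json > fg.json
$ go run . fg.json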
/retitle MCO-831 added feature gate to mco for on cluster builds
CI is just rotten today.
/hold
Revision 251523f was retested 3 times: holding
There were issues with build02 earlier; hopefully it's better now.
/hold cancel
/test e2e-gcp-op
/override ci/prow/e2e-gcp-op-single-node
@jkyros: Overrode contexts on behalf of jkyros: ci/prow/e2e-gcp-op-single-node
Hahaha, I don't think build02 is fixed:
/test e2e-gcp-op
It's not telling me here what the conflict is, but locally it looks like we just got beaten to a dependency bump, probably by #4119?
That looks like it worked -- the tests passed, we just failed in teardown. I'd pull out that "merge" commit 09ec41d (I assume that was just collateral damage from the rebase or some such) and then we can try again to get this in? 😄
12a01d0 to 1f41fcb
1f41fcb to df610a6
/test e2e-hypershift
/lgtm
/override ci/prow/e2e-gcp-op-single-node
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cdoern, dkhater-redhat, jkyros
@jkyros: Overrode contexts on behalf of jkyros: ci/prow/e2e-gcp-op-single-node
@dkhater-redhat: The following tests failed, say
merged commit 3d833c0 into openshift:master
- What I did
- How to verify it
- Description for the changelog