MCO-565: MCO-568: MCO-659: MCO-660 On-cluster build opt-in function, building machine-os-builder stub, RBAC and service acct inclusions #3763
@dkhater-redhat: This pull request references MCO-565, which is a valid Jira issue.
I don't think the e2e test failures are related to the changes this PR introduces.
Force-pushed 4eb1826 to 6c70aca.
/test e2e-gcp-op
Force-pushed deb6feb to 7eb7bba.
/retest-required
Awesome! Looking pretty good, just a couple of small things and whatever you want to do with that e2e test.
I did try this on a cluster, and rocked it back and forth with:
oc label mcp worker machineconfiguration.openshift.io/layering-enabled=
oc label mcp worker machineconfiguration.openshift.io/layering-enabled-
And it seemed to work great 🎉
EDIT: If you could update the commit name/description to match what this PR does (vs. pausing NodeController), future generations will be extremely grateful 😄
  resources: ["*"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["*"]
- apiGroups: ["config.openshift.io"]
  resources: ["images", "clusterversions", "featuregates", "nodes", "nodes/status"]
  verbs: ["*"]
Some of these were probably copy-pasted from the machine-config-controller. It's probably fine for now, but eventually the least-privilege/wildcard police will come for us.
Like, I'm sure the MOB doesn't need to create/delete nodes, etc.; ideally we'd only give it the verbs and resources it actually needs 😄
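To make the least-privilege point concrete, here is a minimal Go sketch of the kind of check a "wildcard police" audit might run over these rules. It assumes a simplified, hypothetical PolicyRule struct that mirrors the shape of rbacv1.PolicyRule from k8s.io/api/rbac/v1, and a hypothetical helper wildcardFindings. The narrowed get/list/watch rule is illustrative only, not a claim about what the machine-os-builder actually needs.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical stand-in for a Kubernetes RBAC
// rule (the real type is rbacv1.PolicyRule in k8s.io/api/rbac/v1).
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// wildcardFindings returns one message per rule that grants "*" for resources
// or for verbs: the rules a least-privilege audit would flag first.
func wildcardFindings(rules []PolicyRule) []string {
	var findings []string
	for i, r := range rules {
		for _, res := range r.Resources {
			if res == "*" {
				findings = append(findings, fmt.Sprintf("rule %d: wildcard resources", i))
				break
			}
		}
		for _, v := range r.Verbs {
			if v == "*" {
				findings = append(findings, fmt.Sprintf("rule %d: wildcard verbs", i))
				break
			}
		}
	}
	return findings
}

func main() {
	// The two fully visible rules from the RBAC snippet above, plus a narrowed
	// rule whose verbs are chosen purely for illustration.
	rules := []PolicyRule{
		{APIGroups: []string{""}, Resources: []string{"configmaps", "secrets"}, Verbs: []string{"*"}},
		{APIGroups: []string{"config.openshift.io"}, Resources: []string{"images", "clusterversions", "featuregates", "nodes", "nodes/status"}, Verbs: []string{"*"}},
		{APIGroups: []string{""}, Resources: []string{"configmaps"}, Verbs: []string{"get", "list", "watch"}},
	}
	for _, f := range wildcardFindings(rules) {
		fmt.Println(f)
	}
}
```

Only the first two rules are reported; the narrowed rule passes because it names explicit verbs.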
Agreed. Once we've evolved on-cluster builds, we should audit these and get them down to a bare minimum.
Lookin' good. Not trying to be nitpicky, just making sure the test is telling us what we want it to 😄
After this we can poke QE and have them take a look.
/retest-required
/test okd-scos-e2e-aws-ovn
Looks like gcp-op is failing on the machine-os-builder test; the pod isn't coming up. It could be flaky, but this is the first failure of the new test vs. all the previous runs.
/retest-required
Looking at the machine-os-builder pod log, it looks like it's stuck in a deadlock. Any ideas? @dkhater-redhat
...and I'm sorry for the 3 separate comments :P
@djoshy Thank you for pointing that out! I removed something in main.go that I believe was causing the deadlock. Hoping it works!
Without that, I assume the select was put in there to make sure it was long-running. (The deadlock detector is right to report "all goroutines are asleep", because there is only one goroutine and it's asleep.) A sleep might work; I don't think that counts as a deadlock.
At first I was just like, "we could just start a webserver!", but then I was like, "hmm, I don't know how the FIPS scanner works; do I want to risk triggering it by shipping a dummy webserver right now?" 😄
/retest-required
/test verify
/retest-required
/test e2e-gcp-op
…BAC and service acct inclusion, e2e tests
/retest-required
@dkhater-redhat: The following tests failed, say
/retest-required
@sergiordlr @rioliu-rh ready for QE :-)
No initial builder pod deployed
Verify synced resources:
Verify deployment behavior: when an MCP is labeled, the machine-os-builder deployment is scaled to 1 and a new pod is created.
It worked fine with the "worker" pool, the "master" pool, and a custom "infra" pool. When the label is removed, the deployment is scaled to zero.
When we label more than one MCP, only one pod is created: we can label the "master", "worker", and "infra" pools and only one machine-os-builder pod will be created. This pod is only removed once we remove the label from all the MCPs.
A "version" command is added in this PR, but it is never used. We assume that it will be included later when developing the full machine-os-builder functionality.
Could you please confirm that the "version" command behavior and the machine-os-builder deployment behavior when several MCPs are labeled are expected? That's my only concern; if those behaviors are expected, we will add the qe-approved label. Thank you very much! PS: I have not checked the permissions against the bare-minimum requirements, since the comments above said that would be done at the end of development.
After talking with @dkhater-redhat, we have confirmed that the behavior we are seeing is the expected one. We can add the qe-approved label.
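The scaling behavior QE confirmed boils down to "any labeled pool means one builder replica; no labeled pools means zero." A minimal Go sketch of that rule follows. It assumes a hypothetical, trimmed-down Pool type and a hypothetical helper desiredBuilderReplicas; it is not the controller code from this PR, only an illustration keyed on the label used in the `oc label mcp` commands earlier in the thread.

```go
package main

import "fmt"

// layeringLabel is the pool label toggled in the review comments above.
// `oc label mcp worker machineconfiguration.openshift.io/layering-enabled=`
// sets it with an empty value, so only the key's presence matters here.
const layeringLabel = "machineconfiguration.openshift.io/layering-enabled"

// Pool is a hypothetical, trimmed-down MachineConfigPool: a name and labels.
type Pool struct {
	Name   string
	Labels map[string]string
}

// desiredBuilderReplicas returns 1 if at least one pool carries the layering
// label (with any value), and 0 otherwise. However many pools opt in, there is
// still only a single machine-os-builder replica.
func desiredBuilderReplicas(pools []Pool) int32 {
	for _, p := range pools {
		if _, ok := p.Labels[layeringLabel]; ok {
			return 1
		}
	}
	return 0
}

func main() {
	pools := []Pool{
		{Name: "master", Labels: map[string]string{layeringLabel: ""}},
		{Name: "worker", Labels: map[string]string{layeringLabel: ""}},
		{Name: "infra", Labels: map[string]string{}},
	}
	fmt.Println(desiredBuilderReplicas(pools)) // 1: two pools opted in, still one pod

	// Removing the label from every pool scales the builder back to zero.
	// p is a copy of the struct, but Labels is a map, so delete affects the original.
	for _, p := range pools {
		delete(p.Labels, layeringLabel)
	}
	fmt.Println(desiredBuilderReplicas(pools)) // 0
}
```

Returning a fixed 1 rather than a count of labeled pools is what produces the "only one pod, removed only when every pool is unlabeled" behavior described in the QE report.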
woohoo!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dkhater-redhat, jkyros.
Merged 4954066 into openshift:master.
…ler with label and rework Machine OS Builder startup logic
- What I did
- How to verify it
- Description for the changelog