Update openshift/api to disable EventedPLEG featuregate in techpreview #317
Conversation
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: JoelSpeed, sairameshv. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Needs valid BZ
freeing up featuregates.
@JoelSpeed: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
For context, if you observed the earlier test failures on tech preview: some symptoms we were seeing are pods stuck in leader election and pods with bind-already-in-use issues. Diving into this with the node team, we found that CRI-O had created duplicate containers at the host level. So, for example, we had two machine-config-controller containers which were conflicting: one held the leader lease and the other did not, one had bound its port and the other had not. By putting in a hack, I prevented the problem. Now that it has been taken out, we can try re-enabling individually once openshift/machine-config-operator#3688 is merged, as we will be able to promote the feature via a single PR.
@mrunalp before EventedPLEG goes in again, we should have e2e tests that fail on this condition so we get a clear fix signal on payload tests.
I just want to highlight here that Evented PLEG was never enabled in kubelet config because the wiring to do that is still in progress. |
The MCO, in master today, takes feature gates from the openshift/api repository and passes them directly through to the kubelet. So as soon as something exists in openshift/api and is revendored into the MCO, it gets passed through to the kubelet, no additional wiring needed. In the future it will come from cluster-config-operator rather than being vendored, but it will still be passed through. Both myself and @sergiordlr grabbed kubelet configs from our clusters and could see the gate present there.

I think we need to have a chat about how the feature gate stuff works, as gating an upstream feature gate behind a configuration in the API is not necessarily straightforward, and it needs some work in the MCO to support it, if that's something we want to support.
@JoelSpeed Just want to point out that the diff in the Slack link is for the OpenShift feature gates, not for the kubelet configuration. Kubelet feature gates can be found at https://github.com/kubernetes/kubernetes/blob/bba9833c398230f68cd30ff23e22f303e898e6a9/pkg/features/kube_features.go#L858. As you can see, some of the enabled features in that Slack message are not kubelet feature gates at all.

In case any of the OpenShift feature gates require changes on the node (such as additional arguments for kubelet, CRI-O settings, or kernel arguments), we usually use the MCO to handle that wiring. Good examples are WorkerLatencyProfiles, cgroup v2 support, etc.

The point @sairameshv was trying to convey is that just having a feature gate declaration in the OpenShift API does not automatically set any values on the node-level components (e.g. kubelet, CRI-O, the kernel). We need to add some wiring in the background (usually in the MCO) to make that happen. Looking at the logs on your cluster, it was very evident that evented PLEG wasn't even used on the nodes, and that is expected: there is no way for the MCO to know that it should be enabled in the kubelet as well as CRI-O, because @sairameshv had not yet put in the required code to orchestrate that.
I'm sorry, but that's not true. The excerpt was taken from the
Yes, you're right, but the MCO doesn't know that. The MCO has a features controller that generates a kubelet config using the feature gates enabled in the cluster. It blindly passes these to the kubelet, and you can see in the kubelet logs on my PR that the kubelet is warning that there are unknown feature gates. Ideally the MCO would filter out the OpenShift-only feature gates, but that means maintaining a list of gates for it to filter out, which is hard to keep free of bit-rot. For example, from one of the failed test runs, I've pulled out
That's also not true. As mentioned above, the MCO blindly sends any and all feature gates configured in the
I don't know what logs you're looking at, have you got a link? In my testing last week I was observing Kubelet logs discarding unknown feature gates and enabling |
@JoelSpeed thanks for that clarification. When I logged into your system, evented PLEG was not in use; it was the generic PLEG that was updating the cache. This was expected because, even if you enable evented PLEG in the kubelet but not in CRI-O, the kubelet will automatically fall back to the generic PLEG. Anyway, we will look into it more closely from the node side. Thanks for your help.
This currently also pulls in the changes from #314
See openshift/api#1483 for context on why this is being disabled temporarily