-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cannot reference imagestream name and tag from StatefulSet image field, only DeploymentConfig in OCP 4.10.3 #339
Comments
yes, it should work. Can you provide the exact steps(oc commands, in order) you are taking along w/ your yaml files for the imagestream and statefulset? Due to some limitations in the admission logic, i think it's critical that your imagestream exist and be successfully imported before you create the statefulset. At least depending what mechanism you're using (which will be clearer if we can see your statefulset yaml and imagestream). i'm guessing you've already been here, but here are the docs on these features: |
Hi Ben, thank you for grabbing this topic, I really appreciate it. Imagestream is applied by a kustomize-style yaml files deployment. Statefulset is created by Kubeflow Notebooks Controller as a result of a Notebook CR being present. The imagestream exists and tag has been successfully imported at created: '2022-11-21T13:14:31Z'. We do not use an the internal openshift docker registry. The statefulset references imagestream name and tag in its container image field. Its creation timestamp is way past the imagestream one creationTimestamp: '2022-12-07T16:54:40Z' We are getting the following events on the pod, which is why I got to the bugzilla link, access.redhat link and so on in the first message of this issue here.
I am referencing the yamls below. The problem occurs on OCP 4.10.3 and also on the latest DEV versions 4.13.x. Imagestream and StatefulSet are in the same namespace. Imagestream, when referenced by DeploymentConfigs, Deployments, and Pods in the image-field, resolves to the src location of the imagestream-tag. In those types of objects, the resolve and image pull through is working fine, just not with StatefulSet. There are two containers in the pod, disregard the oauth-one, please, we are trying to instead use the central openshift imagestream for that some day, too :-)
It does not really matter, too, if it is that particular StatefulSet or one created ad-hoc by me, the behavior and bug is always the same. This is problematic because KF Notebook Controller instantiation of pods is based on StatefuSet, not Deployment or DeploymentConfig or Pod directly. |
I attempted to create this behavior and could not. Here's my version of the yamls (just stripped the status fields and the fields that are populated at creation time, and stripped the NS):
I ran:
then i ran
to confirm the import, which i did by checking the status section of the output:
and then i confirmed that the image field is substituted as expected (this happens during creation admission):
No pods ended up being created and i didn't really look into why, but fundamentally the statefulset's pod template is as expected: the image field points to the resolved imagestreamtag value and not 's2i-generic-data-science-notebook:v0.0.5' as was in the original statefulset yaml. (your pod's "failure to pull the image" error are directly caused by the fact that the image field in the statefulset was not substituted correctly when the statefulset was created) The OCP version i used was one of our 4.13 CI builds: 4.13.0-0.ci-2022-12-07-190258 but this behavior hasn't been modified in several releases as far as i know (since the bug you pointed to in 4.6 i believe) i'm not really sure what to tell you since i used pretty much your exact yaml files to attempt the recreate and it seems to be working as expected. What OCP version are you on? |
I am on OCP 4.10.3 on-prem without openshift registry and on 4.13 DEV on AWS with openshift registry. So hold on, I am probably doing circular reasoning, should work more along the lines of Occam. You analysis helps for now. @bparees Do not go any further for now, thank you. The differentiating feature I have is the KF Notebook and ODH Notebook Controller, that watches for Notebook CRs and then creates StatefulSets from them. I think I until now only manually modified the existing, KF or ODH Notebook controller-created StatefulSet / Notebook CR. Will try with a brand new one without any annotations from notebook controller. What bugs me is that the substitution you mention works for everything except a StatefulSet, which is really pretty weird. The substitution in my case is |
ah hah. Yes, if you have some controller that is reconciling the statefulset, it is very likely that the substituted value is being stomped back by the controller, after the admission logic has mutated it.
assuming those other resources don't have a controller reconciling them, that would explain the difference. The substitution is probably working, but then immediately getting stomped/reverted by the notebook controller. Try creating a statefulset directly and i think you'll see it work correctly.
more accurately, it is probably reverting the change after it is made, but yes, fundamentally it's a case of multiple writers competing over the resource. |
Will look more into that tomorrow, but we are on to something, yes. |
hah, indeed. Turns out with a fresh StatefulSet, without any notebook controller annotations, the image name and digest is resolved correctly from the imagestream tag and name. So yes, the Notebook controller Kubeflow Notebook CR owns StatefulSets created from a Notebook def instance and thus interferes with or overrides imagestream resolve on admission of the StatefulSet image resolve to the cluster.
@kpouget @DaoDaoNoCode seems the notebook controller overrides the value of the container.image field as resolved before by the Openshift image policy admission plugin/controller from the imagestream tag and name. As far as notebook controller goes, reconciling logic and StatefulSet logic is here, it seems: Apparently, there is a kubeflow reconcilehelper package and the reconciliation logic is here (update-logic) The whole StatefulSet template spec, of which the image-field is a part of, is continually updated, thus overwriting any changes from the image policy admission plug-in: Anyway, the image policy admission plugin itself is working as expected on the Kubernetes / Openshift side for StatefulSets not owned by KF Notebook controller. Here is the result with a fresh StatefulSet, working as expected: Statefulset before apply, non-notebook-controller-related, referencing imagestream name and tag in same namespace:
Result working as expected, excerpt from the StatefulSet container spec after applied in cluster:
With this fresh StatefulSet and the pod created from it, all is working fine afterwards.
Looks like I need to escalate this error with the ODH Notebook Controller team. Leaving ticket open for now, but thank you for all the help. I also created a ticket at kubeflow explaining the context kubeflow/kubeflow#6829 |
@bparees We have had some issues on a hyperscaler-based Openshift 4.11 and 4.13.0-ec.0 cluster with the image policy admission plugin / metadata annotation doing nothing. Where in Openshift's config is it defined which plugins are active and which are not? |
it's not configurable in 4.x. |
weird, thank you. Could PodSecurityPolicy be an issue potentially interfering with image-resolve? Context: fresh StatefulSet on a hyperscaler cluster, annotation in metadata
on 4.10.37 in-house, it worked. On the hyperscalers with 4.11 and 4.13, it did not work. The image-field did not get updated. We carefully made sure to reference the correct container name, imagestream name and tag, all that. I wonder if something keeps changing in the backend. For example, on 4.10.3 in-house, I had to either have image: " " or no image-field specified initially in the containers section of the StatefulSet for lookup to work. Now, on 4.10.37 in-house, I can use whatever value I want for the image-field, and it gets updated accordingly. |
does the same annotation work on a deployment or any other resource type? I don't think PSP could be interfering. The way this is supposed to work, iirc is:
I don't think the initial value of the image field should matter, though you should be aware that an initial rollout of pods may occur w/ the initial value of the image field, before the imagechangetrigger observes the resources and updates them (which will then cause a subsequent rollout) |
Yes, the annotation works on Pods, for example, on my in-house OCP 4.10.3 and 4.10.37 clusters. Just not on the hyperscalers 4.11 and 4.13. That one-space-string or no-field-necessary issue for image-value might have been something specific to 4.10.3. Now, on 4.10.37, I can have any string I want as the image-value, the actual value from the imagestream gets inserted / replaced. Once I apply the definition, the image-field gets updated accordingly. Also, the imagestream is initialized and applied, no changes to the images referenced by the imagestream. So, in that way, that substitution of the image-field does not even necessitate an image-change when applying to a new Pod, Deployment, StatefulSet. |
You may have hit a HyperShift bug - https://issues.redhat.com/browse/OCPBUGS-4025 |
hyper-cool you found that, Oleg, děkuji / merci vielmal |
@shalberd these are hypershift guest clusters where you're seeing the issue? When you said hyperscalers i just thought you meant a traditional ocp cluster running on AWS/GCP/Azure. |
It is on aws *aws-2.ci.openshift.org .... For more @andrewballantyne from Open Data Hub (ODH) team can answer. I am not a member of Open Data Hub, but pushing the use of imagestream abstractions, non-dependence on openshift-internal docker registry, ImageContentSourcePolicy support from an enterprise perspective, very on-prem OCP oriented, but making sure to test extensively with the ODH team. Why use hypershift with AWS and Azure anyways? I mean, does that not add an unneccessary layer of abstraction, with reconciliations interfering there, too. Just interested ... |
@bparees Regarding collisions between the image policy admission plugin / imagechangetrigger writing to the image-field and the notebook controller overwriting it on reconcilation, what is the conceptually best way to avoid overwriting the image-field by KF notebook controller? Is it a) excluding the image-field from reconciliation, for example, or is it instead to To me, the approach that the Elastic operator took looks like duplicating logic, but I am very much new to the topic of operators and object reconciliations. Also, since imagestream is specific to openshift and not always is the image-field of a notebook CR content based an a metadata annotation, this being a speciality case on openshift, I'd go towards approach a. |
I think approach (a) is reasonable in this case, but keep in mind that it means that the kf notebook controller now loses the ability to update the image to a new value (such as it might want to do during an upgrade). Unless your plan is that the resource would always be defined with the imagetrigger annotation, in which case upgrades would be handled by updating the imagestream itself and letting the trigger flow through to rollout the workload resource. But now you've tied your logic to openshift behavior (the imagetrigger annotation and imagestream resource, which aren't things that exist on vanilla k8s) So you need to fully think through who should be controlling the value of the image field, and where this operator/controller will be used. One way to approach it would be for the kf controller to manage the image field value if, and only if, there is no And then if there is not such an annotation, the kf notebook controller can directly set the image field, allowing it to update the image field to new values during upgrades of the kf notebook controller. This would allow a user to update the workload resource to be imagetriggered based on an imagestream of their choice, when they want that, but if they didn't want it (or weren't on openshift), they'd get whatever standard image the kf notebook controller specifies/defaults. Again, ultimately this really comes down to a philosophy of how the image value should be managed, and by whom. Is it a default provided by the notebook, but intended to be modified/controlled by the user? Or is it something that's intended to be tightly specified/controlled by the notebook? That will guide you as to how best to implement the behavior and what controls to offer users. (e.g. another option would be to introduce an entirely new annotation that tells the controller whether or not to manage the image field. Then users could disable controller management of the image field for both that case of "i want the imagetrigger to manage it" but also "i just want to manually set it to something else for some reason") |
btw, https://kubernetes.io/docs/reference/using-api/server-side-apply/ I suspect that would first require updates to the imagechangetrigger logic to respect those semantics (in addition to the kf notebook controller supporting them), but since serverside apply is k8s' long term strategic answer to these sorts of conflicts, i'd be remiss in not at least mentioning it. |
Thanks a lot, that helps me get started, I really appreciate your input. Here's to a good new year. |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
@dmage @benluddy @bparees I hope I am at the correct place here, found this online for context:
When I have an imagestream in the same namespace as a StatefulSet, I cannot use the image field and reference imagestream name and tag there. It does not get resolved.
In contrast, when I reference the imagestream from a DeploymentConfig, or from a Pod, or from a Deployment, the image name and digest get resolved correctly, as it should be.
Should this not have been fixed in the admission controller or something like that a long time ago, for all kinds of objects?
See https://bugzilla.redhat.com/show_bug.cgi?id=2000216 and https://access.redhat.com/solutions/6455191
In any case, the expected result would be: image stream tags get resolved in created and edited StatefulSets.
The text was updated successfully, but these errors were encountered: