cluster-samples-operator and stuck upgrade from 4.6 to 4.7 with restricted networks #385
Comments
Yeah across all the openshift operators, you should be using Removed instead of Unmanaged if you don't want that operator running. Definitely something that has evolved since 4.1 when this stuff first arrived. I can take a look at updating the README to better explain this. |
We also have a recent fix for startup delays in disconnected clusters; see #384. Also, overriding is for development purposes only. Having overrides in place violates the terms of official OCP support. I'll make sure the README makes that clear. |
Why are samples a clusteroperator? Why not just another operator that is completely removable? |
Yeah unfortunately it was the only reliable means of delivering samples back in the 4.0/4.x days a few years ago. Although not a given/mandatory as you noted, more of our customers at the time (and probably still the case) needed them, so this is how we landed. There have been internal discussions in engineering as well as external requests from time to time to pull samples out of the payload. I am told that among other things https://issues.redhat.com/browse/RFE-722 will be re-opened soon and decoupling samples from the OCP payload could be part of that. It is also possible it will be achieved via separate RFEs as well. @sbose78 is currently working on some internal proposals for this. But it is not far enough along @toastbrotch that I can give you a definitive target date. @dperaza4dustbit FYI Short term, as I noted originally in your issue, I'll be making some README updates here wrt some of the related details. |
Even with "managementState: Removed", the upgrade on another cluster stopped, again with the same problem as originally reported. Everything else worked flawlessly. So in my opinion this cluster operator is broken in network-restricted setups where only your own registry is allowed via allowedRegistriesForImport (https://docs.openshift.com/container-platform/4.7/post_installation_configuration/preparing-for-users.html#images-configuration-parameters_post-install-preparing-for-users)! If you happen to experience the same, try this:
# temporarily whitelist the Red Hat registries
#
# WARNING: this might not work as your firewall is not open or you're not allowed to get direct access to those registries, ...
#
oc edit images.config cluster
- domainName: quay.io
insecure: false
- domainName: registry.redhat.io
insecure: false
- domainName: registry.access.redhat.com
insecure: false
# set the samples operator to Managed
oc edit configs.samples.operator.openshift.io -n openshift-cluster-samples-operator
managementState: Managed
# restart operators
oc delete pods -n openshift-cluster-samples-operator --all
# upgrade should proceed after some minutes
# let it be removed again
oc edit configs.samples.operator.openshift.io -n openshift-cluster-samples-operator
managementState: Removed
# remove those added registries
oc edit images.config cluster
# restart operator
oc delete pods -n openshift-cluster-samples-operator --all
Additionally, here is some information on how to build air-gap friendly operators: |
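The interactive `oc edit` steps in the workaround above could also be scripted with `oc patch`. The following is only a sketch under assumptions, not a tested procedure: `MIRROR` and its default `registry.example.internal` are placeholders for your own registry, the helper names (`registries_patch`, `state_patch`, `run`, `apply_workaround`, `revert_workaround`) are made up for illustration, and both resources are assumed to be the cluster-scoped objects named `cluster`. A JSON merge patch replaces the whole `allowedRegistriesForImport` list, so every registry that should stay allowed has to be passed in.

```shell
#!/bin/sh
# Sketch only: non-interactive variant of the workaround in this comment.
set -eu

# ASSUMPTION: placeholder for your own mirror registry host.
MIRROR="${MIRROR:-registry.example.internal}"
# 1 = only print the oc commands, 0 = actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Build a JSON merge patch for spec.allowedRegistriesForImport
# from the registry hosts given as arguments.
registries_patch() (
  items=""
  sep=""
  for host in "$@"; do
    items="${items}${sep}{\"domainName\":\"${host}\",\"insecure\":false}"
    sep=","
  done
  printf '{"spec":{"allowedRegistriesForImport":[%s]}}' "$items"
)

# Build a JSON merge patch for spec.managementState.
state_patch() {
  printf '{"spec":{"managementState":"%s"}}' "$1"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Before the upgrade: allow the Red Hat registries, manage samples, restart.
apply_workaround() {
  run oc patch images.config.openshift.io cluster --type merge \
    -p "$(registries_patch "$MIRROR" quay.io registry.redhat.io registry.access.redhat.com)"
  run oc patch configs.samples.operator.openshift.io cluster --type merge \
    -p "$(state_patch Managed)"
  run oc delete pods -n openshift-cluster-samples-operator --all
}

# After the upgrade: remove samples again, restrict imports to the mirror.
revert_workaround() {
  run oc patch configs.samples.operator.openshift.io cluster --type merge \
    -p "$(state_patch Removed)"
  run oc patch images.config.openshift.io cluster --type merge \
    -p "$(registries_patch "$MIRROR")"
  run oc delete pods -n openshift-cluster-samples-operator --all
}
```

Calling `apply_workaround` with the default `DRY_RUN=1` only prints the three `oc` commands for review; set `DRY_RUN=0` to execute them, wait for the upgrade to proceed, then call `revert_workaround`.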
Thanks for the detail @toastbrotch. As to where the problem may lie: from what you described, it very well could be in the samples operator's main dependencies (the imagestream controller/apiserver support and the internal image registry) and in the CVO itself. Even though we are removed, there are a few "in payload" imagestreams, like cli and for a time hello openshift, which are actually created / maintained by the CVO, even though their specifications are in the samples operator manifest. I'll have to reach out to our QE to redo this test with the precise setup you have described, but ideally, if you could provide me
That would expedite progress. Also, in case I do need to pull in other teams, ideally, if you are an OCP customer, open a support case or even bugzilla if you have direct access to that. A bugzilla would facilitate me bringing in other teams if need be. thanks again |
@gabemontero attached the output after upgrade with my described workaround |
Thanks for the data @toastbrotch. It was interesting in that everything from a samples operator perspective is clean and what we would expect. It reports 4.7, everything removed, not degraded. And yes, the hello openshift imagestream works when you specify quay.io in the allowed registries, facilitating your mirror. The image imported: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9da94ec207ee7ea43ba0749ad89df14eccc0676f98eef7d355585f68085d35cd
I have been in contact with the team that owns imagestream import (both development and test). I'll tag them here: @dmage for dev, @xiuwang for test. @xiuwang is performing one more test to confirm, but from what @dmage tells me, having to specify your local mirror in allowedImageRegistries while disconnected is required. He says the OpenShift API server doesn't know about the local mirror registry. I'll defer to him on how possible a change might be so that having the imagestream import logic in the OpenShift API server automatically
The further complication with all this is that even though the
From @dmage :
So, next steps:
|
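Whether the local mirror is currently allowed for import can be checked before starting an upgrade. A minimal sketch, assuming the list in question is the `allowedRegistriesForImport` field of `images.config cluster` referenced earlier in this thread; `MIRROR`, its default `registry.example.internal`, and the function names are placeholders invented for illustration:

```shell
# ASSUMPTION: placeholder for your own mirror registry host.
MIRROR="${MIRROR:-registry.example.internal}"

# Succeeds if host $1 appears in the whitespace-separated list $2.
# Exact string match only; entries with ports or paths must match verbatim.
is_allowed() {
  for entry in $2; do
    if [ "$entry" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Prints the domainName of every allowedRegistriesForImport entry,
# separated by spaces.
allowed_registries() {
  oc get images.config.openshift.io cluster \
    -o jsonpath='{.spec.allowedRegistriesForImport[*].domainName}'
}

# Reports whether the mirror is in the allowed list.
check_mirror() {
  if is_allowed "$MIRROR" "$(allowed_registries)"; then
    echo "$MIRROR is allowed for import"
  else
    echo "$MIRROR is missing from allowedRegistriesForImport" >&2
    return 1
  fi
}
```

Running `check_mirror` against a logged-in cluster prints one line and returns non-zero if the mirror is absent, so it can gate a pre-upgrade script.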
OK everyone, my last status update wrt this:
when #394 merges, I'll be closing this item out in favor of the bugzillas referenced (which are open to the public). |
Hi
We had set cluster-samples-operator to Unmanaged and limited the registries to our own (a Quay instance that cannot act as a pull-through cache) with allowedRegistriesForImport in the "images.config cluster" in our network-restricted environment.
When we started the upgrade from 4.6.latest to 4.7.22, the whole upgrade got stuck on the creation of is/hello-openshift. The only way we were able to finish the upgrade was to add quay.io, registry.redhat.io and registry.access.redhat.com to allowedRegistriesForImport and set cluster-samples-operator to Managed. I'm not sure whether the steps described at https://github.com/openshift/cluster-samples-operator#troubleshooting helped us at all. Nevertheless, I had to remove the override in the clusterversion, as otherwise OpenShift kept showing a warning (sorry, I forgot which one).
To be honest, this took us almost a day to figure out, which is way too much for samples we're not even interested in. I hope complete removal of the cluster-samples-operator will be possible in the future, so that it's off and stays off without breaking my cluster.
regards,ivo
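For anyone hitting the same stuck state, the objects mentioned in this report can be inspected with a few read-only `oc get` calls. This is a sketch under assumptions rather than an official troubleshooting procedure: the `run` and `inspect_samples` helpers are made up for illustration, and the `openshift` namespace for the `hello-openshift` imagestream is an assumption that may differ on your cluster.

```shell
# 1 = only print the commands, 0 = print and run them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command, then execute it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

inspect_samples() {
  # Any overrides still set on the ClusterVersion object.
  run oc get clusterversion version -o jsonpath='{.spec.overrides}'
  # Status of the samples cluster operator.
  run oc get clusteroperator openshift-samples
  # Current management state of the samples operator config.
  run oc get configs.samples.operator.openshift.io cluster -o jsonpath='{.spec.managementState}'
  # ASSUMPTION: the imagestream lives in the "openshift" namespace.
  run oc get imagestream hello-openshift -n openshift -o yaml
}
```

With `DRY_RUN=0`, `inspect_samples` runs each command in turn; the import conditions in the imagestream's YAML status are where a blocked registry would typically show up.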