force default to rhel and registry to old redhat one #81
Conversation
@bparees: looking at the image-ecosystem failures, some of them seem related to not finding the wildfly imagestream. The test does a check on the imagestreams before continuing, and:
There are a few other similar examples besides the nodejs one noted above.
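For context, a minimal sketch of the kind of pre-flight check described above: confirm an imagestream exists and that every spec tag has actually imported before the rest of a test proceeds. This is illustrative only, not the image-ecosystem test code; the context-aware Get signature assumes a recent openshift/client-go, and the openshift/wildfly names simply echo the example in this thread.

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"

	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// imageStreamReady reports whether the named imagestream exists and every
// spec tag has at least one imported item in status (i.e. the import landed).
func imageStreamReady(ctx context.Context, c imageclient.Interface, ns, name string) (bool, error) {
	stream, err := c.ImageV1().ImageStreams(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err // a NotFound here is what bit the wildfly-based tests
	}
	imported := map[string]bool{}
	for _, statusTag := range stream.Status.Tags {
		if len(statusTag.Items) > 0 {
			imported[statusTag.Tag] = true
		}
	}
	for _, specTag := range stream.Spec.Tags {
		if !imported[specTag.Name] {
			return false, nil // tag declared but nothing imported for it yet
		}
	}
	return true, nil
}

func main() {
	// Assumes KUBECONFIG points at a running cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := imageclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ready, err := imageStreamReady(context.Background(), client, "openshift", "wildfly")
	fmt.Println("wildfly ready:", ready, "err:", err)
}
```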
On the operator e2e there were a bunch of apiserver communication errors, and the samples did not stabilize within the tests' time expectations.
/retest
1 similar comment
/retest
this one. I don't think we have too many actual tests that rely on it (beyond checking for its presence), so hopefully it's not too painful. If you do run into cases where we need to do a legitimate java-based s2i build or something, you should be able to switch the TC to create its own wildfly imagestream.
OK ... shall I update openshift/origin#21762 with those additional e2e tweaks, or do you prefer a separate PR?
separate please
Some api server comm hiccups, but also some image import failures going to registry.access.redhat.com, prevented the e2e tests from getting to the expected final state ... the switch to rhel might force us to implement image import retry within the operator (along with longer grace times to get the samples stable) @bparees :-( or allow for image import failures wrt e2e validations and cluster operator status :-<
fyi note, I had to manually retry an image import to get all the rhel imagestreams clean during that test (it was in an eap/jboss imagestream that was unrelated to image_ecosystem). Per discussions with @bparees I'll start tackling the operator retrying image imports with this PR, in the hopes of getting ...
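Since the retry work is going into its own commit, here is a rough sketch (not taken from the PR) of one way "the operator retrying image imports" could be done: scan an imagestream's status for tags whose last import failed, then ask the apiserver to import them again via an ImageStreamImport. The package and function names are made up for illustration; the imagev1 types and the ImageStreamImports client come from openshift/api and a recent openshift/client-go with context-aware signatures.

```go
package retry

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// retryFailedImports re-requests an import for every spec tag of the stream
// whose corresponding status tag reports ImportSuccess=False.
func retryFailedImports(ctx context.Context, c imageclient.Interface, stream *imagev1.ImageStream) error {
	failed := map[string]bool{}
	for _, statusTag := range stream.Status.Tags {
		for _, cond := range statusTag.Conditions {
			if cond.Type == imagev1.ImportSuccess && cond.Status == corev1.ConditionFalse {
				failed[statusTag.Tag] = true
			}
		}
	}
	if len(failed) == 0 {
		return nil
	}

	isi := &imagev1.ImageStreamImport{
		ObjectMeta: metav1.ObjectMeta{Name: stream.Name, Namespace: stream.Namespace},
		Spec:       imagev1.ImageStreamImportSpec{Import: true},
	}
	for _, specTag := range stream.Spec.Tags {
		if !failed[specTag.Name] || specTag.From == nil {
			continue
		}
		isi.Spec.Images = append(isi.Spec.Images, imagev1.ImageImportSpec{
			From:            *specTag.From, // e.g. a DockerImage reference into registry.access.redhat.com
			To:              &corev1.LocalObjectReference{Name: specTag.Name},
			ImportPolicy:    specTag.ImportPolicy,
			ReferencePolicy: specTag.ReferencePolicy,
		})
	}
	if len(isi.Spec.Images) == 0 {
		return nil
	}
	// Creating an ImageStreamImport asks the apiserver to perform the import again.
	_, err := c.ImageV1().ImageStreamImports(stream.Namespace).Create(ctx, isi, metav1.CreateOptions{})
	if err != nil {
		return fmt.Errorf("retrying import for imagestream %s/%s: %w", stream.Namespace, stream.Name, err)
	}
	return nil
}
```

A caller would invoke this from the operator's periodic sync loop, so failed imports get retried on each relist rather than only once.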
these bits lgtm. please put the retry logic in a separate commit for easy reviewing.
Force-pushed from 580e654 to 121b594.
OK @bparees ptal. Please note, a piece of the retry commit (adding a retry condition) proved to be unnecessary and a bit clunky to maintain ... removed it in the perf/fixes commit.
/test e2e-aws
image eco is passing now
the e2e-operator test suffered from various api server conn timeouts and enough image import errors going to registry.access.redhat.com
/retest
ok @bparees ptal, 2 commits:
left the old form present but commented out (for reference in case we switch back) and added some comments on the rationale
Some of the condition logic looks off to me, let's talk through it on Monday. What I'd naively expect:
available=true - content created (doesn't mean the import happened yet). We have no choice about this, to avoid blocking the installer.
progressing=true - ideally: we are in the process of creating/updating the content. Maybe also we are in the process of importing content (import hasn't finished yet). Maaaaaaaaybe also "some content is currently failing to import but we're periodically retrying it".
failing=true - we couldn't create content for some reason. I'm also ok w/ the idea that failing=true means an import failed (which is what this currently does, I think).
However, right now it looks like when we have import errors, failing will be set to true, and when failing is set to true, we'll report progressing=true: a4f6417#diff-b62740dde2d9512574a75d6568b42fefR425
which ... might be ok? but it seems weird that upon creating an imagestream we might go from progressing=false to progressing=true if the import fails.
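To make that mapping concrete, here is a small, runnable sketch using the config.openshift.io/v1 ClusterOperator types. It is not the operator's actual code: sampleState and importErrorsMeanFailing are invented for illustration (the latter toggles exactly the point under debate), the "Failing" type is written as a string literal since that naming belongs to the API of this era, and the "couldn't create content" case is left out to keep it short.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configv1 "github.com/openshift/api/config/v1"
)

// sampleState is an invented summary of where the samples operator is.
type sampleState struct {
	ContentCreated bool // imagestreams/templates exist (imports may still be pending)
	UpdateInFlight bool // we are in the middle of creating/updating content
	ImportErrors   bool // some imagestream imports are currently failing and being retried
}

// conditionsFor maps that summary onto the three ClusterOperator conditions.
// importErrorsMeanFailing toggles the debated question: does an import failure
// surface on Failing, or does it only keep Progressing true?
func conditionsFor(s sampleState, importErrorsMeanFailing bool, now metav1.Time) []configv1.ClusterOperatorStatusCondition {
	failingType := configv1.ClusterStatusConditionType("Failing") // the condition name used at the time

	available := toStatus(s.ContentCreated) // content exists, so the installer is not blocked on us
	progressing := toStatus(s.UpdateInFlight || s.ImportErrors)
	failing := toStatus(importErrorsMeanFailing && s.ImportErrors)

	return []configv1.ClusterOperatorStatusCondition{
		{Type: configv1.OperatorAvailable, Status: available, LastTransitionTime: now},
		// Progressing can carry a "negative sounding message" while imports are retried.
		{Type: configv1.OperatorProgressing, Status: progressing, LastTransitionTime: now, Message: progressMessage(s)},
		{Type: failingType, Status: failing, LastTransitionTime: now},
	}
}

func toStatus(b bool) configv1.ConditionStatus {
	if b {
		return configv1.ConditionTrue
	}
	return configv1.ConditionFalse
}

func progressMessage(s sampleState) string {
	switch {
	case s.ImportErrors:
		return "some imagestream imports are failing and are being retried"
	case s.UpdateInFlight:
		return "creating or updating sample content"
	default:
		return ""
	}
}

func main() {
	// Content is created but an import is failing: Available=True, Progressing=True,
	// and Failing depends on which reading of the conditions is settled on.
	now := metav1.Now()
	for _, c := range conditionsFor(sampleState{ContentCreated: true, ImportErrors: true}, false, now) {
		fmt.Printf("%s=%s %s\n", c.Type, c.Status, c.Message)
	}
}
```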
ok, I'll see about looking into that in detail over the weekend and update the commit / comment as needed in prep for talking on Monday
Couldn't help it ... some more thought embedded below, and I will be changing some
On Fri, Feb 1, 2019 at 6:24 PM Ben Parees ***@***.***> wrote:
> Some of the condition logic looks off to me, let's talk through it on Monday.
> What I'd naively expect:
> available=true - content created (doesn't mean the import happened yet). We have no choice about this to avoid blocking the installer
> progressing=true - ideally: we are in the process of creating/updating the content. maybe also we are in the process of importing content (import hasn't finished yet). *maaaaaaaaybe* also "some content is currently failing to import but we're periodically retrying it"
My interpretation is already waffling ... my first read today of https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions was that both progressing and available being true only pertained to a release migration (4.0.0 to 4.0.1 in his example).
But your comment prompted me to re-review ... even on the same level, we are progressing to some goal.
I'm going to revert that to the old form.
> failing=true - we couldn't create content for some reason. I'm also ok w/ the idea that failing=true means an import failed (which is what this currently does I think).
No - the "an import failed means failing is true" logic is commented out / that was removed.
But I could revert that if we like. My thought was the CVO at some point could "block" on failing==true.
But yeah, we can talk on Monday.
> However right now it looks like when we have import errors, failing will be set to true, and when failing is set to true, we'll report progressing=true: a4f6417#diff-b62740dde2d9512574a75d6568b42fefR425
> which..might be ok? but it seems weird that upon creating an imagestream we might go from progressing=false, to progressing=true if the import fails.
I based that (even prior to this change) off of Clayton's last example under https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions
Progressing will be true, but it will have a negative sounding message about things going wrong.
I think the idea is that if you have not reached your "goal", you are still "progressing", even if you may not be actively doing anything else, though we will for image import on the sync/relist interval.
Force-pushed from a4f6417 to 11a70f6.
/retest
There we go @bparees ... between the e2e test change and adding the operator stub here, we've got clean image-eco as well. Going to retest again to see how consistent it is.
/test all
image-eco failed, but it got past the verification of the imagestreams being present that bit us before; this time, it was a problem pulling an image from the RH registry during a build:
/retest
This image-eco failure was just an incredibly slow dancer build. See these contents of the build log. The start:
and the end:
That's 9 minutes.
…stale deploy yaml; gate clusteroperator available on samples exists; don't set failing==true on import errors
Force-pushed from 11a70f6 to a67081f.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bparees, gabemontero. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
slow registry
/retest
/test e2e-aws
per clayton's email, let's make sure this passes e2e-aws a few times so we know the new clusteroperator object doesn't cause us to block the install due to flakes/failures in our operator.
unrelated flakes.
networking flakes during initial install bringup of the api server
/test e2e-aws
A passing e2e-aws run @bparees ... I think that is 2 passing with the latest changes, with some unrelated flakes in between.
@gabemontero cool, I'm ready to remove the hold if you are.
/hold cancel
https://jira.coreos.com/browse/DEVEXP-251
/assign @bparees