force default to rhel and registry to old redhat one #81
Conversation
@bparees: looking at the image-ecosystem failures, some of them seem related to not finding the wildfly imagestream. The test does a check on the imagestreams before continuing, and:
There are a few other similar examples besides the nodejs one noted above.
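For context, a minimal sketch of the kind of pre-flight check described above: confirm an imagestream exists and that every spec tag has actually imported before the rest of a test proceeds. This is illustrative only, not the image-ecosystem test code; the context-aware Get signature assumes a recent openshift/client-go, and the openshift/wildfly names simply echo the example in this thread.

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"

	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// imageStreamReady reports whether the named imagestream exists and every
// spec tag has at least one imported item in status (i.e. the import landed).
func imageStreamReady(ctx context.Context, c imageclient.Interface, ns, name string) (bool, error) {
	stream, err := c.ImageV1().ImageStreams(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err // a NotFound here is what bit the wildfly-based tests
	}
	imported := map[string]bool{}
	for _, statusTag := range stream.Status.Tags {
		if len(statusTag.Items) > 0 {
			imported[statusTag.Tag] = true
		}
	}
	for _, specTag := range stream.Spec.Tags {
		if !imported[specTag.Name] {
			return false, nil // tag declared but nothing imported for it yet
		}
	}
	return true, nil
}

func main() {
	// Assumes KUBECONFIG points at a running cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := imageclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ready, err := imageStreamReady(context.Background(), client, "openshift", "wildfly")
	fmt.Println("wildfly ready:", ready, "err:", err)
}
```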
On the operator e2e there were a bunch of apiserver communication errors, and the samples did not stabilize within the tests' time expectations.
/retest
1 similar comment
/retest
this one. I don't think we have too many actual tests that rely on it (beyond checking for its presence), so hopefully it's not too painful. If you do run into cases where we need to do a legitimate java-based s2i build or something, you should be able to switch the TC to create its own wildfly imagestream.
OK ... shall I update openshift/origin#21762 with those additional e2e tweaks, or do you prefer a separate PR?
separate please
Some api server comm hiccups, but also some image import failures going to registry.access.redhat.com, prevented the e2e tests from getting to the expected final state ... the switch to rhel might force us to implement image import retry within the operator (along with longer grace times to get the samples stable) @bparees :-( or allow for image import failures wrt e2e validations and cluster operator status :-<
fyi note, I had to manually retry an image import to get all the rhel imagestreams clean during that test (it was in an eap/jboss imagestream that was unrelated to image_ecosystem). Per discussions with @bparees I'll start tackling the operator retrying image imports with this PR, in the hopes of getting ...
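Since the retry work is going into its own commit, here is a rough sketch (not taken from the PR) of one way "the operator retrying image imports" could be done: scan an imagestream's status for tags whose last import failed, then ask the apiserver to import them again via an ImageStreamImport. The package and function names are made up for illustration; the imagev1 types and the ImageStreamImports client come from openshift/api and a recent openshift/client-go with context-aware signatures.

```go
package retry

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// retryFailedImports re-requests an import for every spec tag of the stream
// whose corresponding status tag reports ImportSuccess=False.
func retryFailedImports(ctx context.Context, c imageclient.Interface, stream *imagev1.ImageStream) error {
	failed := map[string]bool{}
	for _, statusTag := range stream.Status.Tags {
		for _, cond := range statusTag.Conditions {
			if cond.Type == imagev1.ImportSuccess && cond.Status == corev1.ConditionFalse {
				failed[statusTag.Tag] = true
			}
		}
	}
	if len(failed) == 0 {
		return nil
	}

	isi := &imagev1.ImageStreamImport{
		ObjectMeta: metav1.ObjectMeta{Name: stream.Name, Namespace: stream.Namespace},
		Spec:       imagev1.ImageStreamImportSpec{Import: true},
	}
	for _, specTag := range stream.Spec.Tags {
		if !failed[specTag.Name] || specTag.From == nil {
			continue
		}
		isi.Spec.Images = append(isi.Spec.Images, imagev1.ImageImportSpec{
			From:            *specTag.From, // e.g. a DockerImage reference into registry.access.redhat.com
			To:              &corev1.LocalObjectReference{Name: specTag.Name},
			ImportPolicy:    specTag.ImportPolicy,
			ReferencePolicy: specTag.ReferencePolicy,
		})
	}
	if len(isi.Spec.Images) == 0 {
		return nil
	}
	// Creating an ImageStreamImport asks the apiserver to perform the import again.
	_, err := c.ImageV1().ImageStreamImports(stream.Namespace).Create(ctx, isi, metav1.CreateOptions{})
	if err != nil {
		return fmt.Errorf("retrying import for imagestream %s/%s: %w", stream.Namespace, stream.Name, err)
	}
	return nil
}
```

A caller would invoke this from the operator's periodic sync loop, so failed imports get retried on each relist rather than only once.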
these bits lgtm. please put the retry logic in a separate commit for easy reviewing.
Force-pushed from 580e654 to 121b594.
OK @bparees ptal. Please note, a piece of the retry commit (adding a retry condition) proved to be unnecessary and a bit clunky to maintain ... removed it in the perf/fixes commit.
/test e2e-aws
image eco is passing now
the e2e-operator test suffered from various api server conn timeouts and enough image import errors going to registry.access.redhat.com
/retest
ok @bparees ptal, 2 commits:
left the old form present but commented out (for reference in case we switch back) and added some comments on the rationale
Some of the condition logic looks off to me, let's talk through it on Monday. What I'd naively expect:
available=true - content created (doesn't mean the import happened yet). We have no choice about this, to avoid blocking the installer.
progressing=true - ideally: we are in the process of creating/updating the content. Maybe also we are in the process of importing content (import hasn't finished yet). Maaaaaaaaybe also "some content is currently failing to import but we're periodically retrying it".
failing=true - we couldn't create content for some reason. I'm also ok w/ the idea that failing=true means an import failed (which is what this currently does, I think).
However, right now it looks like when we have import errors, failing will be set to true, and when failing is set to true, we'll report progressing=true: a4f6417#diff-b62740dde2d9512574a75d6568b42fefR425
which ... might be ok? but it seems weird that upon creating an imagestream we might go from progressing=false to progressing=true if the import fails.
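To make that mapping concrete, here is a small, runnable sketch using the config.openshift.io/v1 ClusterOperator types. It is not the operator's actual code: sampleState and importErrorsMeanFailing are invented for illustration (the latter toggles exactly the point under debate), the "Failing" type is written as a string literal since that naming belongs to the API of this era, and the "couldn't create content" case is left out to keep it short.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configv1 "github.com/openshift/api/config/v1"
)

// sampleState is an invented summary of where the samples operator is.
type sampleState struct {
	ContentCreated bool // imagestreams/templates exist (imports may still be pending)
	UpdateInFlight bool // we are in the middle of creating/updating content
	ImportErrors   bool // some imagestream imports are currently failing and being retried
}

// conditionsFor maps that summary onto the three ClusterOperator conditions.
// importErrorsMeanFailing toggles the debated question: does an import failure
// surface on Failing, or does it only keep Progressing true?
func conditionsFor(s sampleState, importErrorsMeanFailing bool, now metav1.Time) []configv1.ClusterOperatorStatusCondition {
	failingType := configv1.ClusterStatusConditionType("Failing") // the condition name used at the time

	available := toStatus(s.ContentCreated) // content exists, so the installer is not blocked on us
	progressing := toStatus(s.UpdateInFlight || s.ImportErrors)
	failing := toStatus(importErrorsMeanFailing && s.ImportErrors)

	return []configv1.ClusterOperatorStatusCondition{
		{Type: configv1.OperatorAvailable, Status: available, LastTransitionTime: now},
		// Progressing can carry a "negative sounding message" while imports are retried.
		{Type: configv1.OperatorProgressing, Status: progressing, LastTransitionTime: now, Message: progressMessage(s)},
		{Type: failingType, Status: failing, LastTransitionTime: now},
	}
}

func toStatus(b bool) configv1.ConditionStatus {
	if b {
		return configv1.ConditionTrue
	}
	return configv1.ConditionFalse
}

func progressMessage(s sampleState) string {
	switch {
	case s.ImportErrors:
		return "some imagestream imports are failing and are being retried"
	case s.UpdateInFlight:
		return "creating or updating sample content"
	default:
		return ""
	}
}

func main() {
	// Content is created but an import is failing: Available=True, Progressing=True,
	// and Failing depends on which reading of the conditions is settled on.
	now := metav1.Now()
	for _, c := range conditionsFor(sampleState{ContentCreated: true, ImportErrors: true}, false, now) {
		fmt.Printf("%s=%s %s\n", c.Type, c.Status, c.Message)
	}
}
```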
ok, I'll see about looking into that in detail over the weekend and update the commit / comment as needed in prep for talking on Monday
Couldn't help it ... some more thought embedded below, and I will be changing some
On Fri, Feb 1, 2019 at 6:24 PM Ben Parees ***@***.***> wrote:
> Some of the condition logic looks off to me, let's talk through it on Monday.
> What I'd naively expect:
> available=true - content created (doesn't mean the import happened yet). We have no choice about this to avoid blocking the installer
> progressing=true - ideally: we are in the process of creating/updating the content. maybe also we are in the process of importing content (import hasn't finished yet). *maaaaaaaaybe* also "some content is currently failing to import but we're periodically retrying it"
My interpretation is already waffling ... my first read today of https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions was that both progressing and available being true only pertained to a release migration (4.0.0 to 4.0.1 in his example).
But your comment prompted me to re-review ... even on the same level, we are progressing to some goal.
I'm going to revert that to the old form.
> failing=true - we couldn't create content for some reason. I'm also ok w/ the idea that failing=true means an import failed (which is what this currently does I think).
No - the "an import failed means failing is true" logic is commented out / that was removed.
But I could revert that if we like. My thought was the CVO at some point could "block" on failing==true.
But yeah, we can talk on Monday.
> However right now it looks like when we have import errors, failing will be set to true, and when failing is set to true, we'll report progressing=true: a4f6417#diff-b62740dde2d9512574a75d6568b42fefR425
> which..might be ok? but it seems weird that upon creating an imagestream we might go from progressing=false, to progressing=true if the import fails.
I based that (even prior to this change) off of Clayton's last example under https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions
Progressing will be true, but it will have a negative sounding message about things going wrong.
I think the idea is that if you have not reached your "goal", you are still "progressing", even if you may not be actively doing anything else, though we will for image import on the sync/relist interval.
Force-pushed from a4f6417 to 11a70f6.
/retest
There we go @bparees ... between the e2e test change and adding the operator stub here, we've got clean image-eco as well. Going to retest again to see how consistent it is.
/test all
image-eco failed, but it got past the verification of the imagestreams being present that bit us before; this time, it was a problem pulling an image from the RH registry during a build:
/retest
This image-eco failure was just an incredibly slow dancer build. See these contents of the build log. The start:
and the end:
That's 9 minutes.
…stale deploy yaml; gate clusteroperator available on samples exists; don't set failing==true on import errors
Force-pushed from 11a70f6 to a67081f.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bparees, gabemontero. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
slow registry
/retest
/test e2e-aws
per clayton's email, let's make sure this passes e2e-aws a few times so we know the new clusteroperator object doesn't cause us to block the install due to flakes/failures in our operator.
unrelated flakes.
networking flakes during initial install bringup of the api server
/test e2e-aws
A passing e2e-aws run @bparees ... I think that is 2 passing with the latest changes, with some unrelated flakes in between.
@gabemontero cool, I'm ready to remove the hold if you are.
/hold cancel
https://jira.coreos.com/browse/DEVEXP-251
/assign @bparees