
force default to rhel and registry to old redhat one #81

Conversation

@gabemontero (Contributor) commented Jan 16, 2019

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 16, 2019
@gabemontero (Contributor Author) commented Jan 17, 2019

@bparees : looking at the image-ecosystem failures, some of them seem related to not finding the wildfly imagestream:

With openshift-tests [image_ecosystem][Slow] openshift sample application repositories [image_ecosystem][nodejs] images with nodejs-ex repo Building nodejs app from new-app should build a nodejs image and run it in a pod [Suite:openshift] 46s

It does a check on the imagestreams before continuing, and:

Jan 16 23:48:57.056: INFO: ImageStream Error: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:""}, Status:"Failure", Message:"imagestreams.image.openshift.io \"wildfly\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc422232a20), Code:404}} 

I presume wildfly is not included in the rhel/ocp version, given all the EAP/JBoss content.

There are a few other similar examples besides the nodejs one noted above.

  • Start updating the e2e's to not look for wildfly?
  • Retag wildfly in openshift/library to be included in ocp?
  • Special case the samples operator to include wildfly in rhel/ocp?

@gabemontero (Contributor Author):

On the operator e2e there were a bunch of apiserver communication errors, and the samples did not stabilize within the tests' time expectations.

@gabemontero (Contributor Author):

/retest

1 similar comment
@gabemontero (Contributor Author):

/retest

@bparees (Contributor) commented Jan 17, 2019

Start updating the e2e's to not look for wildfly?

this one.

I don't think we have too many actual tests that rely on it (beyond checking for its presence), so hopefully it's not too painful. If you do run into cases where we need to do a legitimate java-based s2i build or something, you should be able to switch the test case to create its own wildfly imagestream.
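
For illustration only (this is not code from this PR; the builder image location and the clientset call shown are assumptions), a test could own its wildfly imagestream along these lines:

```go
package e2e

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// createWildflyImageStream gives a test case its own wildfly imagestream so it
// no longer depends on the samples operator shipping one on rhel/ocp.
func createWildflyImageStream(client imageclient.Interface, namespace string) error {
	is := &imagev1.ImageStream{
		ObjectMeta: metav1.ObjectMeta{Name: "wildfly"},
		Spec: imagev1.ImageStreamSpec{
			Tags: []imagev1.TagReference{{
				Name: "latest",
				From: &corev1.ObjectReference{
					Kind: "DockerImage",
					// assumed location of the community wildfly builder image
					Name: "quay.io/wildfly/wildfly-centos7:latest",
				},
				// scheduled import lets a transient registry blip heal on its own
				ImportPolicy: imagev1.TagImportPolicy{Scheduled: true},
			}},
		},
	}
	// the Create signature here is the pre-context form in use around the time
	// of this PR; newer client-go generations also take a context and options
	_, err := client.ImageV1().ImageStreams(namespace).Create(is)
	return err
}
```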

@gabemontero (Contributor Author):

OK ...shall I update openshift/origin#21762 with those additional e2e tweaks, or do you prefer a separate PR?

@bparees (Contributor) commented Jan 17, 2019

OK ...shall I update openshift/origin#21762 with those additional e2e tweaks, or do you prefer a separate PR?

separate please

@gabemontero (Contributor Author):

Some apiserver comm hiccups, but also some image import failures going to registry.access.redhat.com, prevented the e2e tests from getting to the expected final state .....

... the switch to rhel might force us to implement image import retry within the operator (along with longer grace times to get the samples stable) @bparees :-(

or allow for image import failures wrt e2e validations and cluster operator status :-<

@gabemontero (Contributor Author):

fyi aws-image-ecosystem passed on my local AWS cluster with this PR's changes as well as openshift/origin#21824

note, I had to manually retry an image import to get all the rhel imagestreams clean during that test (it was in an eap/jboss imagestream that was unrelated to image_ecosystem)

Per discussions with @bparees I'll start tackling the operator retrying image imports with this PR in the hopes of getting e2e-aws-operator happier
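
As a rough sketch of that direction (an assumption about the shape, not the code that lands in this PR), a retry pass could look for tags whose last import failed and re-drive them with an ImageStreamImport:

```go
package stub

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	imagev1 "github.com/openshift/api/image/v1"
	imageclient "github.com/openshift/client-go/image/clientset/versioned"
)

// retryFailedImports re-requests the import of any tag on a samples imagestream
// whose last import attempt failed (e.g. a registry.access.redhat.com timeout).
func retryFailedImports(client imageclient.Interface, is *imagev1.ImageStream) error {
	// index the spec tags so we can recover the upstream reference for a failed tag
	specFrom := map[string]*corev1.ObjectReference{}
	for i := range is.Spec.Tags {
		specFrom[is.Spec.Tags[i].Name] = is.Spec.Tags[i].From
	}

	isi := &imagev1.ImageStreamImport{
		ObjectMeta: metav1.ObjectMeta{Name: is.Name, Namespace: is.Namespace},
		Spec:       imagev1.ImageStreamImportSpec{Import: true},
	}
	for _, tag := range is.Status.Tags {
		for _, cond := range tag.Conditions {
			if cond.Type != imagev1.ImportSuccess || cond.Status != corev1.ConditionFalse {
				continue
			}
			if from := specFrom[tag.Tag]; from != nil {
				isi.Spec.Images = append(isi.Spec.Images, imagev1.ImageImportSpec{
					From: *from,
					To:   &corev1.LocalObjectReference{Name: tag.Tag},
				})
			}
			break
		}
	}
	if len(isi.Spec.Images) == 0 {
		return nil // nothing failed, nothing to retry
	}
	// pre-context Create signature, as with the other sketches in this thread
	_, err := client.ImageV1().ImageStreamImports(is.Namespace).Create(isi)
	return err
}
```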

@bparees (Contributor) commented Jan 18, 2019

these bits lgtm.

please put the retry logic in a separate commit for easy reviewing.

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2019
Review threads (outdated, resolved): README.md, pkg/stub/handler.go, pkg/stub/imagestreams.go
@gabemontero (Contributor Author):

OK @bparees ptal

please note, a piece of the retry commit (adding a retry condition) proved to be unnecessary and a bit clunky to maintain ... I removed it in the perf/fixes commit

@gabemontero (Contributor Author):

/test e2e-aws

@gabemontero (Contributor Author):

image eco is passing now

@gabemontero (Contributor Author):

The e2e-operator test suffered from various apiserver connection timeouts, plus enough image import errors against registry.access.redhat.com (Client.Timeout exceeded while awaiting headers) that the retries also failed and the tests did not observe their expected results in time.

@gabemontero (Contributor Author):

/retest

@gabemontero (Contributor Author):

ok @bparees ptal

2 commits

  1. the clusteroperator stub yaml
  2. cluster operator available / progressing only gated by CR samples exists

I left the old form present but commented out (for reference in case we switch back) and added some comments on the rationale.
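
For reference, a minimal sketch of what the stub in commit 1 boils down to (shown in Go rather than the yaml the commit actually adds, and with the object name assumed): make sure the ClusterOperator object exists up front, with conditions filled in later by the handler.

```go
package stub

import (
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

// ensureClusterOperatorStub creates the ClusterOperator object if it does not
// already exist, so the CVO can see the samples operator before any samples
// content (or any flaky image import) has been processed. The name
// "openshift-samples" is an assumption in this sketch.
func ensureClusterOperatorStub(client configclient.Interface) error {
	co := &configv1.ClusterOperator{
		ObjectMeta: metav1.ObjectMeta{Name: "openshift-samples"},
	}
	// pre-context Create signature for illustration
	_, err := client.ConfigV1().ClusterOperators().Create(co)
	if kerrors.IsAlreadyExists(err) {
		return nil
	}
	return err
}
```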

@bparees (Contributor) commented Feb 1, 2019

Some of the condition logic looks off to me, let's talk through it on monday.

What i'd naively expect:

available=true - content created (doesn't mean the import happened yet). We have no choice about this to avoid blocking the installer

progressing=true - ideally: we are in the process of creating/updating the content. maybe also we are in the process of importing content (import hasn't finished yet). maaaaaaaaybe also "some content is currently failing to import but we're periodically retrying it"

failing=true - we couldn't create content for some reason. I'm also ok w/ the idea that failing=true means an import failed (which is what this currently does I think).

However right now it looks like when we have import errors, failing will be set to true, and when failing is set to true, we'll report progressing=true:

a4f6417#diff-b62740dde2d9512574a75d6568b42fefR425

which... might be OK? But it seems weird that upon creating an imagestream we might go from progressing=false to progressing=true if the import fails.
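
To make the expected mapping concrete, here is a small illustrative sketch of the semantics described above (not the PR's code; condition type names follow the openshift/api config/v1 constants of that era, where Failing was later renamed Degraded):

```go
package stub

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configv1 "github.com/openshift/api/config/v1"
)

// samplesState is a simplified view of what the operator knows at any moment.
type samplesState struct {
	ContentCreated     bool // imagestreams/templates have been written to the cluster
	CreatingOrUpdating bool // we are still creating or updating content
	ImportsInFlight    bool // imports requested but not yet finished
	ImportRetrying     bool // some imports failed and are being retried
	CreateFailed       bool // we could not create content at all
}

// expectedConditions maps that state onto the naive Available/Progressing/Failing
// semantics sketched in the comment above.
func expectedConditions(s samplesState) []configv1.ClusterOperatorStatusCondition {
	toStatus := func(b bool) configv1.ConditionStatus {
		if b {
			return configv1.ConditionTrue
		}
		return configv1.ConditionFalse
	}
	now := metav1.Now()
	return []configv1.ClusterOperatorStatusCondition{
		// available as soon as content is created, so the installer is not blocked on imports
		{Type: configv1.OperatorAvailable, Status: toStatus(s.ContentCreated), LastTransitionTime: now},
		// progressing while creating/updating content or while imports are pending or retrying
		{Type: configv1.OperatorProgressing, Status: toStatus(s.CreatingOrUpdating || s.ImportsInFlight || s.ImportRetrying), LastTransitionTime: now},
		// failing only when content itself could not be created
		{Type: configv1.OperatorFailing, Status: toStatus(s.CreateFailed), LastTransitionTime: now},
	}
}
```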

@gabemontero (Contributor Author):

OK, I'll look into that in detail over the weekend and update the commit / comments as needed in prep for talking on Monday.

@gabemontero (Contributor Author) commented Feb 2, 2019 via email

@gabemontero (Contributor Author):

/retest

@gabemontero (Contributor Author):

there we go @bparees ... between the e2e test change and adding the operator stub here we've got clean image-eco as well

going to retest again to see how consistent it is

@gabemontero (Contributor Author):

/test all

@gabemontero (Contributor Author):

image-eco failed, but it got past the verification of the imagestreams being present that bit us before; this time it was a problem pulling an image from the Red Hat registry during a build:

Feb  3 22:18:27.156: INFO: Running 'oc logs --config=/tmp/admin.kubeconfig --namespace=e2e-test-s2i-php-jw8mm pod/cakephp-mysql-example-1-build -c sti-build -n e2e-test-s2i-php-jw8mm'
Feb  3 22:18:27.590: INFO: Log for pod "cakephp-mysql-example-1-build"/"sti-build"
---->
Caching blobs under "/var/cache/blobs".
Pulling docker://image-registry.openshift-image-registry.svc:5000/openshift/php@sha256:3895d8c39906fb07578ad5eb6dbdfb91471d2ebba570c67b2a1fccdf56c40c20
error: build error: 1 error occurred:
	* Error determining manifest MIME type for docker://image-registry.openshift-image-registry.svc:5000/openshift/php@sha256:3895d8c39906fb07578ad5eb6dbdfb91471d2ebba570c67b2a1fccdf56c40c20: Error reading manifest sha256:3895d8c39906fb07578ad5eb6dbdfb91471d2ebba570c67b2a1fccdf56c40c20 in image-registry.openshift-image-registry.svc:5000/openshift/php: unknown: unable to pull manifest from registry.access.redhat.com/rhscl/php-71-rhel7:latest: Get https://registry.access.redhat.com/v2/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
<----end of log for "cakephp-mysql-example-1-build"/"sti-build"

@gabemontero (Contributor Author):

/retest

@gabemontero (Contributor Author):

This image eco failure was just an incredibly slow dancer build. See these contents of the build log.

The start:

Feb  4 04:18:42.138: INFO: Running 'oc logs --config=/tmp/admin.kubeconfig --namespace=e2e-test-dancer-repo-test-bpbdx -f bc/dancer-example --timestamps'
Feb  4 04:19:54.647: INFO: 

  build logs : 2019-02-04T04:10:24.511992547Z Cloning "https://github.com/sclorg/dancer-ex.git" ...
2019-02-04T04:10:28.401366083Z 	Commit:	96e1791a65027d96707a28035f74c1bee78c2882 (Merge pull request #75 from liangxia/okd)
2019-02-04T04:10:28.401366083Z 	Author:	Honza Horak <hhorak@redhat.com>
2019-02-04T04:10:28.401366083Z 	Date:	Tue Oct 16 15:46:22 2018 +0200
2019-02-04T04:10:41.767674694Z Caching blobs under "/var/cache/blobs".
2019-02-04T04:10:41.769595241Z Pulling docker://image-registry.openshift-image-registry.svc:5000/openshift/perl@sha256:32456e743a929894f32aa7319401399c32cd9043391a1a4ed348b49455626555

and the end:

2019-02-04T04:19:40.182365362Z Copying blob sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
2019-02-04T04:19:45.317481138Z Copying config sha256:168025691acb68568fc7fd35c100ad4213fb48ea65b0851a41757ee52aa99be8
2019-02-04T04:19:46.505204907Z Writing manifest to image destination
2019-02-04T04:19:46.756201437Z Storing signatures
2019-02-04T04:19:46.756201437Z Successfully pushed //image-registry.openshift-image-registry.svc:5000/e2e-test-dancer-repo-test-bpbdx/dancer-example:latest@sha256:fb8dd2c77ea71b4766c86332bd02d68d75858bcbd37c9da2694e19bba9ec48e9
2019-02-04T04:19:46.800940757Z Push successful

That's 9 minutes.

…tale deploy yaml

gate clusteroperator available on samples exists; don't set failing==true on import errors
@bparees (Contributor) commented Feb 4, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. labels Feb 4, 2019
@openshift-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bparees,gabemontero]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment

@gabemontero (Contributor Author):

slow registry

@gabemontero (Contributor Author):

/retest

@bparees (Contributor) commented Feb 4, 2019

/test e2e-aws
/hold

Per Clayton's email, let's make sure this passes e2e-aws a few times so we know the new clusteroperator object doesn't cause us to block the install due to flakes/failures in our operator.

@bparees bparees added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2019
@bparees (Contributor) commented Feb 4, 2019

unrelated flakes.
/test e2e-aws

@gabemontero (Contributor Author):

networking flakes during initial install bringup of api server

/test e2e-aws

@gabemontero (Contributor Author):

A passing e2e-aws run @bparees ... I think that is 2 passing runs with the latest changes, with some unrelated flakes in between.

@bparees (Contributor) commented Feb 5, 2019

@gabemontero cool, I'm ready to remove the hold if you are.

@gabemontero (Contributor Author):

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2019
@openshift-merge-robot openshift-merge-robot merged commit b118904 into openshift:master Feb 5, 2019
@gabemontero gabemontero deleted the rhel-def-but-jenkins-centos branch February 5, 2019 18:01
Labels: approved, lgtm, size/XXL