Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust clusteroperator msgs to https://github.com/openshift/cluster-v… #70

Merged

Conversation

gabemontero
Copy link
Contributor

…ersion-operator/blob/master/docs/dev/clusteroperator.md#what-should-an-operator-report-with-clusteroperator-custom-resource

Fixes #69

@bparees @smarterclayton

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 18, 2018
@gabemontero
Copy link
Contributor Author

getting ci setup errors so far, but have some changes coming based on local testing

/hold

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 18, 2018
@gabemontero gabemontero added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 18, 2018
@gabemontero
Copy link
Contributor Author

yeah @bparees @smarterclayton don't bother reviewing until I push my next commit and announce this is viable

pkg/operatorstatus/operatorstatus.go Outdated Show resolved Hide resolved
@bparees
Copy link
Contributor

bparees commented Dec 18, 2018

yeah @bparees @smarterclayton don't bother reviewing until I push my next commit and announce this is viable

sorry, didn't see this in time :)

@gabemontero
Copy link
Contributor Author

gabemontero commented Dec 18, 2018 via email

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 19, 2018
@gabemontero
Copy link
Contributor Author

OK the installer (both aws and libvirt) quit working for me between the office and home, so I couldn't finish testing, but I think I got far enough along to at least start reviewing.

Hopefully I can test tomorrow AM in conjunction with iterating over any review we get to over PTO (which starts tomorrow PM for me, but I plan on spending some cycles over PTO).

@bparees ^^

@gabemontero
Copy link
Contributor Author

got cluster install working again @bparees but have found some things that need to fix ... so please consider "paused" again until I update

@gabemontero
Copy link
Contributor Author

OK things look good in my local testing now. Both initial startup and release upgrade flows.

@bparees ptal whenever you have the chance

pkg/operatorstatus/operatorstatus.go Outdated Show resolved Hide resolved
pkg/operatorstatus/operatorstatus.go Outdated Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 27, 2018
@gabemontero
Copy link
Contributor Author

OK I've pushed updates @bparees .

  • overall better organized and cleaner I think ... see if you agree
  • Also a little more settled on the logic around determining state after re-review
  • Added a few Conditions per discussion above to help manage state and errors
  • I don't have a "truth table" documented in one place, but I think the comments around various decisions are a reasonable substitute ... see if you agree
  • some minor fixes,
  • and also a change to the imagestream retry between samples upserts and setting of the image import condition to true ... it worked better in local testing (started seeing fragility in the counter based retry), and the operator state being maintained can be lost with no ill consequence

@gabemontero
Copy link
Contributor Author

bump @bparees .... happy new year

Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm loving all these comments :)

// scheduled imports for example to fail ... the imagestream "state" could
// be sufficiently compromised, so we'll flag false there
if needCreds {
return falseRC, falseMsg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not clear on why this isn't based on whether the samplesexist==true && importcomplete? isn't that what can resolve whether the imagestream state actually is compromised or not?

(sorry if i'm dragging us back into a discussion we resolved pre-holiday)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or per your comments below, shouldn't it be checking notAtAnyVersionYet? (if we're at some version, then we're available, even if we currently "needCreds" to move to the desired version)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmm ... so in a) taking in your point, and b) rereading (yet again) https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions, I think there are these considerations here:

  1. An API error during the initial install
  2. We have installed at least once, are still at the installed version, but now have an error of some sort (could be bad api server interactions, needs creds for rhel, or the config is invalid
  3. Item 2), but are migrating to a new version vs. staying at the installed version

For 1), I'll keep the empty message, and false ... details in the other two conditions

For 3), per the guideline, we need to set available to false, with the unable to reach message

For 2), as you say, we should set available to true, with the ok at the version in question, where again the error will show up in the failing/progressing messages

To get to this behavior, I think we

  • add spec.Version/status.Version comparison, accounting for not initially installed, to determine if we are migrating versions
  • if migrating, ANY error condition means we return false with the unable to reach message
  • if not migrating, we just leverage not any version, etc. as you note below

Thoughts? ... I'll work on the update if we agree here ...

thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per hallway discussion we'll revisit this after there is org-wide agreement on how failing/progressing/available should be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK assuming openshift/cluster-version-operator#73 holds up, scenario 3) goes away @bparees

If so, I'll create a new PR to get rid of the apiError and needsCreds conditionals.

@bparees
Copy link
Contributor

bparees commented Jan 2, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. labels Jan 2, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bparees,gabemontero]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bparees,gabemontero]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gabemontero gabemontero removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 2, 2019
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Copy link
Contributor Author

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 85ee5a9 into openshift:master Jan 3, 2019
@gabemontero gabemontero deleted the msgs-clayton-will-like branch January 3, 2019 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants