Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report template controller's status in ControllerConfig #339

Merged
merged 5 commits into from
Jan 27, 2019

Conversation

abhinavdahiya
Copy link
Contributor

@abhinavdahiya abhinavdahiya commented Jan 23, 2019

ControllerConfig is currently used to configure the template controller.
And, while all other controllers report progress on various objects like,

  • render controller reports status on machine config pools
  • node controller reports status on machine config pools etc.

Any status updates for template controller have been missing..

This adds status field to ControllerConfig to allow reporting various conditions.
Currently 3 conditions have been added for template controller namely, completed, running and failing.

The operator needs to know when the internal machineconfigs in template controller have been synced to the cluster.
In terms of operator done is when template controller reports completed, not failing and not running for that current generation of ControllerConfig.

This makes sure that we are installing the server and daemon when the controller is installing its operands which has been causing some races like #338.

/cc @cgwalters @ashcrow

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2019
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2019
@abhinavdahiya abhinavdahiya force-pushed the template_status branch 2 times, most recently from 0eb682b to cc394d7 Compare January 23, 2019 02:05
@cgwalters
Copy link
Member

Is this working towards a fix for #338 ?

@abhinavdahiya
Copy link
Contributor Author

/test e2e-aws

@abhinavdahiya abhinavdahiya force-pushed the template_status branch 3 times, most recently from 6224fb0 to 802ff4a Compare January 23, 2019 23:06
@abhinavdahiya abhinavdahiya changed the title WIP: report template controller's status in ControllerConfig report template controller's status in ControllerConfig Jan 23, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2019
@abhinavdahiya
Copy link
Contributor Author

Is this working towards a fix for #338 ?

@cgwalters it is ready for review, updated the PR message too.

@abhinavdahiya
Copy link
Contributor Author

failures from e2e-aws

[sig-storage] Subpath [Volume type: hostPathSymlink] should fail for new directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support existing directory [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPathSymlink] should support readOnly file specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: hostPath] should support existing directory [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: nfsPVC] should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: nfs] should support existing directory [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Subpath [Volume type: nfs] should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Volumes iSCSI [Feature:Volumes] should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@kikisdeliveryservice
Copy link
Contributor

Pulling this down now, I should expect to see status in oc get -o yaml controllerconfig?

@ashcrow
Copy link
Member

ashcrow commented Jan 25, 2019

/retest

@abhinavdahiya
Copy link
Contributor Author

Pulling this down now, I should expect to see status in oc get -o yaml controllerconfig?

Yes

ControllerConfig is currently used to configure the `template controller`.
 And, while all other controllers report progress on various objects like,
- `render controller` reports status on `machine config pools`
- `node controller` reports status on `machine config pools` etc.

Any status updates for `template controller` have been missing..

This adds `status` field to `ControllerConfig` to allow reporting various conditions.
Currently 3 conditions have been added for `template controller` namely, `completed`, `running` and `failing`.
The operator needs to know when the internal machineconfigs in `template controller` have been synced to the cluster.
In terms of operator `done` is when `template controller` reports `completed`, not `failing` and not `running` for that current generation of `ControllerConfig`.
Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated images in cluster to use this PR and can config that I now see:

status:
    conditions:
    - lastTransitionTime: 2019-01-25T22:59:17Z
      reason: sync completed towards (2) generation using controller version 3.11.0-507-gf19f48d4
      status: "True"
      type: TemplateContollerCompleted
    - lastTransitionTime: 2019-01-25T22:59:17Z
      status: "False"
      type: TemplateContollerRunning
    - lastTransitionTime: 2019-01-25T22:59:17Z
      status: "False"
      type: TemplateContollerFailing
    observedGeneration: 2

when I run: oc get -o yaml controllerconfig

Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated review: Can we take a look at this logging to see if we can improve the UX?

pkg/operator/sync.go Outdated Show resolved Hide resolved
return wait.Poll(controllerConfigCompletedInterval, controllerConfigCompletedTimeout, func() (bool, error) {
if err := isControllerConfigCompleted(resource, optr.ccLister.Get); err != nil {
glog.Errorf("controllerconfig is not completed: %v", err)
return false, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return false, err ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to try to poll until timeout to prevent us exiting on interruptions / transient errors. the pool time out is 1 minute, which seems not too long.

return optr.waitForDeploymentRollout(mcc)
var waitErrs []error
waitErrs = append(waitErrs, optr.waitForDeploymentRollout(mcc))
waitErrs = append(waitErrs, optr.waitForControllerConfigToBeCompleted(cc))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. OK, I think I see how this approach should solve the problem.

And so then it should obsolete this outstanding change: c8109db

Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checked the logging tweak that I requested and it's resolved 👍
Will let @cgwalters loop back in to give his final review.

@cgwalters
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 26, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,cgwalters]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ashcrow
Copy link
Member

ashcrow commented Jan 26, 2019

/test e2e-aws

1 similar comment
@ashcrow
Copy link
Member

ashcrow commented Jan 26, 2019

/test e2e-aws

@ashcrow
Copy link
Member

ashcrow commented Jan 26, 2019

/retest

@ashcrow
Copy link
Member

ashcrow commented Jan 26, 2019

/test e2e-aws

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 59b15ca into openshift:master Jan 27, 2019
@abhinavdahiya abhinavdahiya deleted the template_status branch February 5, 2019 19:09
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
Add IBM Cloud managed annotations to CVO manifests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants