Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload controller for CAO #243

Conversation

p0lyn0mial
Copy link
Contributor

@p0lyn0mial p0lyn0mial commented Jan 20, 2020

it's based on the workload controller in cas-o except it

  • tries to be generic at least in (CAO)
  • doesn't take operator.Generation into account because CAO will manage multiple operands and that would make them compete.

TODO:

TODO:

QUESTIONS:

  • do we need WithoutClusterOperatorStatusController, WithoutFinalizerController, WithoutLogLevelController, WithoutConfigUpgradableController ?

    Answer from Standa: we do want to call WithoutClusterOperatorStatusController, WithoutLogLevelController, WithoutConfigUpgradableController as these are already running

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 20, 2020
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 20, 2020
@p0lyn0mial
Copy link
Contributor Author

tested it on my local cluster and it seems to be working.

k get ds -n openshift-oauth-apiserver 
NAME        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
apiserver   3         3         3       3            3           node-role.kubernetes.io/master=   47m
k get po -n openshift-oauth-apiserver 
NAME              READY   STATUS    RESTARTS   AGE
apiserver-4htqj   1/1     Running   0          44m
apiserver-5q2hd   1/1     Running   0          44m
apiserver-w5rbv   1/1     Running   0          44m

@p0lyn0mial
Copy link
Contributor Author

p0lyn0mial commented Jan 20, 2020

the new conditions look really good (thanks @deads2k)

  - lastTransitionTime: "2020-01-20T12:23:40Z"
    reason: AsExpected
    status: "True"
    type: OAuthAPIServerOperatorDaemonSetAvailable
  - lastTransitionTime: "2020-01-20T12:18:34Z"
    status: "False"
    type: OAuthAPIServerOperatorWorkloadDegraded
  - lastTransitionTime: "2020-01-20T12:20:53Z"
    reason: AsExpected
    status: "False"
    type: OAuthAPIServerOperatorsDaemonSetDegraded
  - lastTransitionTime: "2020-01-20T12:23:40Z"
    reason: AsExpected
    status: "False"
    type: OAuthAPIServerOperatorsDaemonSetProgressing

Copy link
Member

@stlaz stlaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start, needs polishing though

Comment on lines 62 to 88
// manage assets
directResourceResults := resourceapply.ApplyDirectly(c.kubeClient, c.eventRecorder, assets.Asset,
"oauth-apiserver/ns.yaml",
"oauth-apiserver/apiserver-clusterrolebinding.yaml",
"oauth-apiserver/svc.yaml",
"oauth-apiserver/sa.yaml",
"oauth-apiserver/cm.yaml",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should eventually be handled by openshift/cluster-openshift-apiserver-operator#304 from within the APIServerControllerset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

pkg/operator2/workload/sync_openshift_oauth_apiserver.go Outdated Show resolved Hide resolved
pkg/operator2/workload/workload_controller.go Outdated Show resolved Hide resolved
pkg/operator2/workload/sync_openshift_oauth_apiserver.go Outdated Show resolved Hide resolved
return err
}

if run, err := c.canRunOperator(operatorSpec); !run {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe c.shouldSync(operatorSpec)?


type syncOperatorFunc func() (*appsv1.DaemonSet, []error)

type OAuthAPIServerOperator struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/OAuthAPIServerOperator/OAuthAPIServerWorkloadController? Or maybe just APIServerWorkloadController

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I should change the name, thx.

Comment on lines 232 to 291
dsAvailableCondition.Status = operatorv1.ConditionFalse
dsAvailableCondition.Reason = "NoDaemon"
dsAvailableCondition.Message = message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a helper function for each condition

dsDegradedCondition.Reason = "NoDaemon"
dsDegradedCondition.Message = message

return errors.NewAggregate(errs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the conditions won't ever get set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thx.

return true, nil
}

func (c *OAuthAPIServerOperator) updateOperatorStatus(workload *appsv1.DaemonSet, errs []error) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having a bit of a trouble following everything that's being checked in this function, could you try to split it a bit to smaller separate units, have helpers for each condition type?

eventRecorder events.Recorder
versionRecorder status.VersionGetter
preRunCachesSynced []cache.InformerSynced
queue workqueue.RateLimitingInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

squash formatting commit to the introducing one :)

@@ -57,7 +57,7 @@ spec:
--etcd-keyfile=/var/run/secrets/etcd-client/tls.key \
--etcd-certfile=/var/run/secrets/etcd-client/tls.crt \
--shutdown-delay-duration=3s \
--v=2
${FLAGS}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the other repo we dealt with this using a specific replace, not a general flags

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the --v flags we did, replace or append.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a replace if I understand the code right, just with a clever string replace syntax ${FLAGS}. So this is what I expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I do use string replace and append --v to all flags.

@@ -57,7 +57,7 @@ spec:
--etcd-keyfile=/var/run/secrets/etcd-client/tls.key \
--etcd-certfile=/var/run/secrets/etcd-client/tls.crt \
--shutdown-delay-duration=3s \
--v=2
${FLAGS}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call verbosity a standard replacer. YOu can imagine having those. it doesn't preclude a ${FLAGS} too

// TODO: add LatestAvailableRevision support
//"${REVISION}", strconv.Itoa(int(authOperator.Status.LatestAvailableRevision)),
"${REVISION}", "1",
"${VERBOSITY}", loglevelToKlog(authOperator.Spec.LogLevel),
"${FLAGS}", strings.Join(padFlags(operandFlags, strings.Repeat(" ", 14)), " \\\n"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there some clever bash escape func we can use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has to be a valid YAM file - I tried to look up a library that would know how to format it properly but didn't find anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escape for bash and then for yaml 😱

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for yaml escape, we can switch from string replace to yaml replace by unmarshalling first, and then marshalling again.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 23, 2020
@p0lyn0mial p0lyn0mial force-pushed the workload-controller-for-oauth-apiserver branch from 69c961d to 5c1b9b7 Compare January 24, 2020 15:13
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 24, 2020
@p0lyn0mial p0lyn0mial changed the title [WIP] workload controller for CAO workload controller for CAO Jan 24, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 24, 2020
@p0lyn0mial
Copy link
Contributor Author

/retest

@p0lyn0mial
Copy link
Contributor Author

/test e2e-aws

@p0lyn0mial
Copy link
Contributor Author

/retest

1 similar comment
@p0lyn0mial
Copy link
Contributor Author

/retest

@p0lyn0mial
Copy link
Contributor Author

p0lyn0mial commented Jan 27, 2020

e2e-aws looks quite good, openshift-oauth-apiserver namespace was created and authentication operator reported success:

  - lastTransitionTime: "2020-01-27T13:10:11Z"
    reason: AsExpected
    status: "True"
    type: OAuthAPIServerControllerWorkloadCtrlDaemonSetAvailable
  - lastTransitionTime: "2020-01-27T13:05:17Z"
    status: "False"
    type: OAuthAPIServerControllerWorkloadCtrlWorkloadDegraded
  - lastTransitionTime: "2020-01-27T13:10:51Z"
    reason: AsExpected
    status: "False"
    type: OAuthAPIServerControllerWorkloadCtrlsDaemonSetDegraded
  - lastTransitionTime: "2020-01-27T13:05:19Z"
    reason: AsExpected
    status: "False"
    type: OAuthAPIServerControllerWorkloadCtrlsDaemonSetProgressing

the only thing that worries me a bit is the initial logs from the servers - as if they were installed too early (error building REST storage: context deadline exceeded)

@p0lyn0mial
Copy link
Contributor Author

actually error building REST storage: context deadline exceeded might be a transient error - e2e-aws-console-login looks better

@p0lyn0mial
Copy link
Contributor Author

/test e2e-aws-upgrade

@p0lyn0mial
Copy link
Contributor Author

/test e2e-aws

@p0lyn0mial
Copy link
Contributor Author

Alright, so etcd related errors (for example, or the errors like) seem to have been occurring during "the initial phase" and are considered normal. Eventually, the app is up and ready.

pkg/operator2/starter.go Outdated Show resolved Hide resolved
pkg/operator2/starter.go Outdated Show resolved Hide resolved
kyaml "k8s.io/apimachinery/pkg/util/yaml"
"k8s.io/client-go/kubernetes"
"regexp"
"strings"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order

makes it compile again
gens bindata
adds oauth-apiserver to image references
wires apiServerArguments
escapes args and wires PreconditionFulfilled
wires ObserveStorageURLsToArguments
wires the encryption-config secret
switches to ApplyDeployment and the new EnsureAtMostOnePodPerNode
adds config observer for oauth apiserver
adds config observer for encryption-config secret
wires ObserveStorageURLsToArguments
adds the finalizer controller to the oauth-apiserver controllerset and the secret revision pruner
moves configOverridesController and logLevelController to RunOperator
@p0lyn0mial p0lyn0mial force-pushed the workload-controller-for-oauth-apiserver branch from 3887f85 to ce68aac Compare July 29, 2020 14:07
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 29, 2020
@p0lyn0mial p0lyn0mial force-pushed the workload-controller-for-oauth-apiserver branch from ce68aac to fdac04e Compare July 29, 2020 14:11
@p0lyn0mial p0lyn0mial force-pushed the workload-controller-for-oauth-apiserver branch from fdac04e to 3582236 Compare July 29, 2020 14:43
@sttts
Copy link
Contributor

sttts commented Jul 29, 2020

/approve
Lgtm

Waiting for test feedback.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2020
@p0lyn0mial
Copy link
Contributor Author

Error: Error launching source instance: RequestLimitExceeded: Request limit exceeded

/retest

@sttts
Copy link
Contributor

sttts commented Jul 29, 2020

Registry down.

/retest

@sttts
Copy link
Contributor

sttts commented Jul 29, 2020

Network operator degraded.

/retest

@p0lyn0mial
Copy link
Contributor Author

[sig-arch][Early] Managed cluster should start all core operators [Suite:openshift/conformance/parallel]
[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Suite:openshift/conformance/parallel]

/retest

@p0lyn0mial
Copy link
Contributor Author

p0lyn0mial commented Jul 29, 2020

Entrypoint received interrupt: terminated for (e2e-aws-operator)

/test e2e-aws-operator

@p0lyn0mial
Copy link
Contributor Author

Entrypoint received interrupt for (e2e-aws-operator )

/retest

@sttts
Copy link
Contributor

sttts commented Jul 30, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 6224122 into openshift:master Jul 30, 2020
@sttts
Copy link
Contributor

sttts commented Jul 30, 2020

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants