things david doesn't like #141

Closed
deads2k opened this issue Jun 28, 2019 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

deads2k commented Jun 28, 2019

  1. https://github.com/openshift/cluster-authentication-operator/blob/master/pkg/operator2/configmap.go#L55-L59 . route.spec isn't the same as route.status.
    Your route hasn't been accepted yet, so the kube-apiserver could end up forwarding to an endpoint that isn't you!
  2. This configmap is an output, not an input. Please make it a separate control loop to keep your main loop tight with logical dependencies. Also, you don't want a rollout failure to disrupt keeping this outbound information correct. (See the sketch after this list.)

     ```go
     // make sure API server sees our metadata as soon as we've got a route with a host
     metadata, _, err := resourceapply.ApplyConfigMap(c.configMaps, c.recorder, getMetadataConfigMap(route))
     if err != nil {
         return fmt.Errorf("failure applying configMap for the .well-known endpoint: %v", err)
     }
     resourceVersions = append(resourceVersions, metadata.GetResourceVersion())
     ```
  3. Use a different FooDegraded for each control loop and allow the status union to combine them. It will make each condition's writes more obvious.
  4. handleAuthConfig is outbound state. Move it into a different sync loop.
  5. Your check functions are good, but they should each set different FooDegraded conditions.
  6. serviceCA, servingCert, err := c.handleServiceCA() appears unnecessary. Directly depend on the key and the kubelet will properly put your pod into pending.
  7. Hard-code the values in your oauth config that never actually vary, the service name for instance. Flexibility that isn't flexible is really hard to read.
  8. Instead of passing a bunch of state through various handlers, use your clients to look up values. You can make them caching if the load becomes too high.
  9. accessTokenInactivityTimeoutSeconds appears to be a default. Why didn't you default it?
  10. handleOAuthConfig appears to be an attempt at combining multiple different configobservers into a single loop. You do logically own all these things, but configobservation (even of a single value) distinct from the main loop will give you working generations and the logical separation you're lacking here.
  11. Sync-ing of user config is driven outside the main loop of other operators to ensure that it always works regardless of the state of other rollouts. Secrets must rotate without choice; the old values are invalid.
  12. ensureBootstrappedOAuthClients looks completely distinct.
  13. Looks like availability may be worth separating out, the way the static pod operators do for their complicated availability rules.
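
A minimal sketch of what item 2's dedicated control loop could look like, reusing the resourceapply.ApplyConfigMap call from the snippet above. The controller type and the getMetadataConfigMap stub are illustrative, not the operator's actual code, and newer library-go releases also pass a context to ApplyConfigMap:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	coreclientv1 "k8s.io/client-go/kubernetes/typed/core/v1"

	routev1 "github.com/openshift/api/route/v1"
	"github.com/openshift/library-go/pkg/operator/events"
	"github.com/openshift/library-go/pkg/operator/resource/resourceapply"
)

// metadataController owns only the outbound .well-known configmap, so a rollout
// failure in the main loop can no longer block keeping this information correct.
type metadataController struct {
	configMaps coreclientv1.ConfigMapsGetter
	recorder   events.Recorder
}

func (c *metadataController) sync(route *routev1.Route) error {
	if _, _, err := resourceapply.ApplyConfigMap(c.configMaps, c.recorder, getMetadataConfigMap(route)); err != nil {
		return fmt.Errorf("failure applying configMap for the .well-known endpoint: %v", err)
	}
	return nil
}

// getMetadataConfigMap stands in for however the operator renders the configmap
// from the admitted route host.
func getMetadataConfigMap(route *routev1.Route) *corev1.ConfigMap {
	return &corev1.ConfigMap{ /* name, namespace, and .well-known payload go here */ }
}
```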

enj commented Jun 28, 2019

100% agree with all above except:

> 1. https://github.com/openshift/cluster-authentication-operator/blob/master/pkg/operator2/configmap.go#L55-L59 . route.spec isn't the same as route.status.
>    Your route hasn't been accepted yet, so the kube-apiserver could end up forwarding to an endpoint that isn't you!

We never use the route until it is accepted and we explicitly set the host value. We also ignore any changes to that value; it is always `oauth-openshift.` + `ingress.spec.domain`. We can make the code clearer, but there is no chance of forwarding traffic to the wrong place.
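
For reference, a hedged sketch of what gating on route admission could look like with the standard routev1 types; this is illustrative, not the operator's code:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"

	routev1 "github.com/openshift/api/route/v1"
)

// routeAdmittedHost returns the host a router has actually admitted, or false if
// no ingress has reported Admitted=True yet. route.Spec.Host alone says nothing
// about whether traffic forwarded to that host will reach this operand.
func routeAdmittedHost(route *routev1.Route) (string, bool) {
	for _, ingress := range route.Status.Ingress {
		for _, cond := range ingress.Conditions {
			if cond.Type == routev1.RouteAdmitted && cond.Status == corev1.ConditionTrue {
				return ingress.Host, true
			}
		}
	}
	return "", false
}
```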

> 6. `serviceCA, servingCert, err := c.handleServiceCA()` appears unnecessary. Directly depend on the key and the kubelet will properly put your pod into pending.

That code only exists to force a rollout on service CA rotation. Most of it could be moved to manifests and delegated to CVO though.

> 8. Instead of passing a bunch of state through various handlers, use your clients to look up values. You can make them caching if the load becomes too high.

I think this will naturally fall out as things are broken apart. The passing of state is mostly a by-product of the state being available in one giant sync loop.
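
As an illustration of the "caching clients" point, informer-backed listers are the usual way to get cheap lookups at the point of use instead of threading state through handlers; a minimal sketch with hypothetical namespace and configmap names:

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corev1listers "k8s.io/client-go/listers/core/v1"
)

// oauthSync looks values up from a cached lister where they are needed instead of
// having an earlier handler pass them along.
type oauthSync struct {
	configMapLister corev1listers.ConfigMapLister
}

func (s *oauthSync) sync() error {
	// namespace and name are illustrative placeholders
	cm, err := s.configMapLister.ConfigMaps("openshift-authentication").Get("example-config")
	if err != nil {
		return err
	}
	_ = cm // use the value here; no plumbing through intermediate handlers
	return nil
}

// The lister is backed by a shared informer cache, so repeated Gets are served
// from memory. The caller must Start the factory and wait for cache sync.
func newOAuthSync(client kubernetes.Interface) (*oauthSync, informers.SharedInformerFactory) {
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	return &oauthSync{configMapLister: factory.Core().V1().ConfigMaps().Lister()}, factory
}
```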

> 9. `accessTokenInactivityTimeoutSeconds` appears to be a default. Why didn't you default it?

I do not understand exactly what you are saying. Tokens have no inactivity timeout by default since we do not hate our users.

accessTokenInactivityTimeoutSeconds is actually dead code in the operator since the logic to honor it lives in KAS, and KAS-O has no config observer to wire it into the authenticator.

> 12. ensureBootstrappedOAuthClients looks completely distinct.

Standa was having a hard time writing his first controller so I asked him to do the minimal work in the main loop to unblock Sally. But yeah, it needs to go.


stlaz commented Dec 10, 2019

2019-12-10 update:
Still todo:

  1. skip, based on the above comment? 🆗
  2. move oauth-metadata CM handling to a separate control loop
  3. naturally each controller should have its own degraded messages; we only have two (about to be three) controllers, and any new controller should stick to the rule of owning its degraded messages (see the sketch after this list)
  4. separate control loop for authentication.config
  5. this should now be fine 🆗
  6. looks like this could safely be moved to manifests now
  7. hard-code the values in bindata similarly to what OAS-o does in its workload controller
  8. this should mostly be solved by 10.; there might still be some fallout from the original mono-loop
  9. this would be a topic for a standalone feature
  10. this is attempted in Bug 1777137: add observation of idp config and validation of its cm/secrets #222
  11. I wonder if this means just syncing the config outside the main sync loop, say in the resource syncer, or if that's something else. @deads2k is that it or did you have something else in mind?
  12. yeah, this needs to be moved; I wanted to go with the framework used elsewhere in the operator in [WIP] Handle the challenging and browser oauth clients #136, which means splitting this into two separate controllers, but never got to finish it
  13. condition handling seems somewhat fine at the moment; it would make sense to review it to see whether it roughly matches what OAS-o and KAS-o are doing
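
On the per-controller degraded messages (item 3 above), a hedged sketch of the pattern, assuming the openshift/api operator types and library-go's v1helpers; the condition name is illustrative:

```go
package main

import (
	"fmt"

	operatorv1 "github.com/openshift/api/operator/v1"
	"github.com/openshift/library-go/pkg/operator/v1helpers"
)

// Each control loop owns exactly one FooDegraded condition; the operator's status
// controller then unions every *Degraded condition into the ClusterOperator's
// single Degraded condition, so each write stays small and obvious.
func setRouteDegraded(conditions *[]operatorv1.OperatorCondition, err error) {
	cond := operatorv1.OperatorCondition{
		Type:   "RouteDegraded", // hypothetical; one distinct type per controller
		Status: operatorv1.ConditionFalse,
		Reason: "AsExpected",
	}
	if err != nil {
		cond.Status = operatorv1.ConditionTrue
		cond.Reason = "SyncError"
		cond.Message = fmt.Sprintf("failed to sync route: %v", err)
	}
	v1helpers.SetOperatorCondition(conditions, cond)
}
```

A second controller's condition (say, ConfigObservationDegraded) would look the same with its own type, which keeps each controller's status writes independent.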


stlaz commented Dec 10, 2019

We have a gdoc splitting the work among other team members (only viewable if you're from Red Hat; sorry, community, not enough sharing options 😞): https://docs.google.com/document/d/1S1o_K9CqQIwqnwL76PZczOaUCtznaH1K7eBX9CE0FOE/edit?usp=sharing

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Sep 23, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 24, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
