things david doesn't like #141

Closed
deads2k opened this issue Jun 28, 2019 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

deads2k commented Jun 28, 2019

  1. https://github.com/openshift/cluster-authentication-operator/blob/master/pkg/operator2/configmap.go#L55-L59 . route.spec isn't the same as route.status.
    Your route hasn't been accepted yet, so the kube-apiserver could end up forwarding to an endpoint that isn't you!
  2. This configmap is an output, not an input. Please make it a separate control loop to keep your main loop tight with logical dependencies. Also, you don't want a rollout failure to disrupt keeping this outbound information correct. (See the sketch after this list.)

     ```go
     // make sure API server sees our metadata as soon as we've got a route with a host
     metadata, _, err := resourceapply.ApplyConfigMap(c.configMaps, c.recorder, getMetadataConfigMap(route))
     if err != nil {
         return fmt.Errorf("failure applying configMap for the .well-known endpoint: %v", err)
     }
     resourceVersions = append(resourceVersions, metadata.GetResourceVersion())
     ```
  3. Use a different FooDegraded for each control loop and allow the status union to combine them. It will make each condition's writes more obvious.
  4. handleAuthConfig is outbound state. Move it into a different sync loop.
  5. Your check functions are good, but they should each set different FooDegraded conditions.
  6. serviceCA, servingCert, err := c.handleServiceCA() appears unnecessary. Directly depend on the key and the kubelet will properly put your pod into pending.
  7. Hard-code the values in your oauth config that never actually vary, the service name for instance. Flexibility that isn't flexible is really hard to read.
  8. Instead of passing a bunch of state through various handlers, use your clients to look up values. You can make them caching if the load becomes too high.
  9. accessTokenInactivityTimeoutSeconds appears to be a default. Why didn't you default it?
  10. handleOAuthConfig appears to be an attempt at combining multiple different configobservers into a single loop. You do logically own all these things, but configobservation (even of a single value) distinct from the main loop will give you working generations and the logical separation you're lacking here.
  11. Sync-ing of user config is driven outside the main loop of other operators to ensure that it always works regardless of the state of other rollouts. Secrets must rotate without choice; the old values are invalid.
  12. ensureBootstrappedOAuthClients looks completely distinct.
  13. Looks like availability may be worth separating out, the way the static pod operators do for their complicated availability rules.
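
A minimal sketch of what item 2's dedicated control loop could look like, reusing the resourceapply.ApplyConfigMap call from the snippet above. The controller type and the getMetadataConfigMap stub are illustrative, not the operator's actual code, and newer library-go releases also pass a context to ApplyConfigMap:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	coreclientv1 "k8s.io/client-go/kubernetes/typed/core/v1"

	routev1 "github.com/openshift/api/route/v1"
	"github.com/openshift/library-go/pkg/operator/events"
	"github.com/openshift/library-go/pkg/operator/resource/resourceapply"
)

// metadataController owns only the outbound .well-known configmap, so a rollout
// failure in the main loop can no longer block keeping this information correct.
type metadataController struct {
	configMaps coreclientv1.ConfigMapsGetter
	recorder   events.Recorder
}

func (c *metadataController) sync(route *routev1.Route) error {
	if _, _, err := resourceapply.ApplyConfigMap(c.configMaps, c.recorder, getMetadataConfigMap(route)); err != nil {
		return fmt.Errorf("failure applying configMap for the .well-known endpoint: %v", err)
	}
	return nil
}

// getMetadataConfigMap stands in for however the operator renders the configmap
// from the admitted route host.
func getMetadataConfigMap(route *routev1.Route) *corev1.ConfigMap {
	return &corev1.ConfigMap{ /* name, namespace, and .well-known payload go here */ }
}
```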

enj commented Jun 28, 2019

100% agree with all above except:

> 1. https://github.com/openshift/cluster-authentication-operator/blob/master/pkg/operator2/configmap.go#L55-L59 . route.spec isn't the same as route.status.
>    Your route hasn't been accepted yet, so the kube-apiserver could end up forwarding to an endpoint that isn't you!

We never use the route until it is accepted and we explicitly set the host value. We also ignore any changes to that value; it is always `oauth-openshift.` + `ingress.spec.domain`. We can make the code clearer, but there is no chance of forwarding traffic to the wrong place.
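
For reference, a hedged sketch of what gating on route admission could look like with the standard routev1 types; this is illustrative, not the operator's code:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"

	routev1 "github.com/openshift/api/route/v1"
)

// routeAdmittedHost returns the host a router has actually admitted, or false if
// no ingress has reported Admitted=True yet. route.Spec.Host alone says nothing
// about whether traffic forwarded to that host will reach this operand.
func routeAdmittedHost(route *routev1.Route) (string, bool) {
	for _, ingress := range route.Status.Ingress {
		for _, cond := range ingress.Conditions {
			if cond.Type == routev1.RouteAdmitted && cond.Status == corev1.ConditionTrue {
				return ingress.Host, true
			}
		}
	}
	return "", false
}
```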

> 6. `serviceCA, servingCert, err := c.handleServiceCA()` appears unnecessary. Directly depend on the key and the kubelet will properly put your pod into pending.

That code only exists to force a rollout on service CA rotation. Most of it could be moved to manifests and delegated to CVO though.

> 8. Instead of passing a bunch of state through various handlers, use your clients to look up values. You can make them caching if the load becomes too high.

I think this will naturally fall out as things are broken apart. The passing of state is mostly a by-product of the state being available in one giant sync loop.
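
As an illustration of the "caching clients" point, informer-backed listers are the usual way to get cheap lookups at the point of use instead of threading state through handlers; a minimal sketch with hypothetical namespace and configmap names:

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corev1listers "k8s.io/client-go/listers/core/v1"
)

// oauthSync looks values up from a cached lister where they are needed instead of
// having an earlier handler pass them along.
type oauthSync struct {
	configMapLister corev1listers.ConfigMapLister
}

func (s *oauthSync) sync() error {
	// namespace and name are illustrative placeholders
	cm, err := s.configMapLister.ConfigMaps("openshift-authentication").Get("example-config")
	if err != nil {
		return err
	}
	_ = cm // use the value here; no plumbing through intermediate handlers
	return nil
}

// The lister is backed by a shared informer cache, so repeated Gets are served
// from memory. The caller must Start the factory and wait for cache sync.
func newOAuthSync(client kubernetes.Interface) (*oauthSync, informers.SharedInformerFactory) {
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	return &oauthSync{configMapLister: factory.Core().V1().ConfigMaps().Lister()}, factory
}
```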

> 9. `accessTokenInactivityTimeoutSeconds` appears to be a default. Why didn't you default it?

I do not understand exactly what you are saying. Tokens have no inactivity timeout by default since we do not hate our users.

accessTokenInactivityTimeoutSeconds is actually dead code in the operator since the logic to honor it lives in KAS, and KAS-O has no config observer to wire it into the authenticator.

> 12. ensureBootstrappedOAuthClients looks completely distinct.

Standa was having a hard time writing his first controller so I asked him to do the minimal work in the main loop to unblock Sally. But yeah, it needs to go.


stlaz commented Dec 10, 2019

2019-12-10 update:
Still todo:

  1. skip, based on the above comment? 🆗
  2. move oauth-metadata CM handling to a separate control loop
  3. naturally each controller should have its own degraded messages; we only have two (about to be three) controllers, and any new controller should stick to the rule of owning its degraded messages (see the sketch after this list)
  4. separate control loop for authentication.config
  5. this should now be fine 🆗
  6. looks like this could safely be moved to manifests now
  7. hard-code the values in bindata similarly to what OAS-o does in its workload controller
  8. this should mostly be solved by 10.; there might still be some fallout from the original mono-loop
  9. this would be a topic for a standalone feature
  10. this is attempted in Bug 1777137: add observation of idp config and validation of its cm/secrets #222
  11. I wonder if this means just syncing the config outside the main sync loop, say in the resource syncer, or if that's something else. @deads2k is that it or did you have something else in mind?
  12. yeah, this needs to be moved; I wanted to go with the framework used elsewhere in the operator in [WIP] Handle the challenging and browser oauth clients #136, which means splitting this into two separate controllers, but never got to finish it
  13. condition handling seems somewhat fine at the moment; it would make sense to review it to see whether it roughly matches what OAS-o and KAS-o are doing
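
On the per-controller degraded messages (item 3 above), a hedged sketch of the pattern, assuming the openshift/api operator types and library-go's v1helpers; the condition name is illustrative:

```go
package main

import (
	"fmt"

	operatorv1 "github.com/openshift/api/operator/v1"
	"github.com/openshift/library-go/pkg/operator/v1helpers"
)

// Each control loop owns exactly one FooDegraded condition; the operator's status
// controller then unions every *Degraded condition into the ClusterOperator's
// single Degraded condition, so each write stays small and obvious.
func setRouteDegraded(conditions *[]operatorv1.OperatorCondition, err error) {
	cond := operatorv1.OperatorCondition{
		Type:   "RouteDegraded", // hypothetical; one distinct type per controller
		Status: operatorv1.ConditionFalse,
		Reason: "AsExpected",
	}
	if err != nil {
		cond.Status = operatorv1.ConditionTrue
		cond.Reason = "SyncError"
		cond.Message = fmt.Sprintf("failed to sync route: %v", err)
	}
	v1helpers.SetOperatorCondition(conditions, cond)
}
```

A second controller's condition (say, ConfigObservationDegraded) would look the same with its own type, which keeps each controller's status writes independent.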


stlaz commented Dec 10, 2019

We have a gdoc splitting the work among other team members (only viewable if you're from Red Hat; sorry, community, not enough sharing options 😞): https://docs.google.com/document/d/1S1o_K9CqQIwqnwL76PZczOaUCtznaH1K7eBX9CE0FOE/edit?usp=sharing

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Sep 23, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 24, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
