Update Idling to v1alpha2 #19205
@knobunc ptal
@sallyom FYI
/lgtm
Nice work Solly.
Force-pushed from 0120b7a to d1585f3.
I haven't actually tested any of the new commits to see if they cause explosions or whatnot, and they certainly don't pass unit tests, but it compiles. The router was a bit hairy. That took up a decent chunk of my evening trying to get a clean-ish solution.
Wow... the router changes are rather large. This is going to need some time to review, and a change this large is a bit scary.
/retest
Force-pushed from 95e0a4d to 9137c9a.
	glog.Fatalf("error: Could not initialize Kubernetes Proxy. You must run this process as root (and if containerized, in the host network namespace as privileged) to use the service proxy: %v", err)
}
signaler := unidler.NewIdlerSignaler(c.IdlingClientset.IdlingV1alpha2(), lookup)
// NB: this must not be the masquerade bit (0x1 by default) or the drop bit (0x8000 by default)
Can you clarify what "this" applies to? And perhaps move it to where the mark bit is defined, or set...
pkg/idling/lookup.go
Outdated
	return res, nil
}

// IdlerServiceLookup knows how to look which idler
This comment is weird and redundant with the one two lines later... perhaps just move that one up and get rid of this one?
pkg/proxy/hybrid/proxy.go
Outdated
	glog.V(8).Infof("hybrid proxy: ignore unchanged idled state for %s/%s on update", newIdler.Namespace, newIdler.Name)
	return
}
// TODO: deal with the case where someone adds a trigger service during idling
Do we need to do this now?
pkg/proxy/hybrid/proxy.go
Outdated
func (h *idlerChangeHandler) OnAdd(obj interface{}) {
	// the addition of an idler is always going to be either a no-op
	// (if not idled) or a switch to idled (if idled)
The wording is a little confusing.
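One way to read the rule in that code comment: an idler seen for the first time either changes nothing (it is not idled) or marks its services as idled. A minimal sketch of that reading follows. The `idler` struct, its `Idled` and `TriggerServices` fields, and the `idledServices` map are all hypothetical stand-ins for illustration, not the types used in this PR.

```go
package main

import "fmt"

// idler is a hypothetical, stripped-down stand-in for the Idler object.
type idler struct {
	Namespace       string
	Name            string
	Idled           bool     // assumed flag: true once the idler's targets are idled
	TriggerServices []string // assumed list of services that wake the targets
}

// idledServices records which namespace/service pairs are currently idled.
type idledServices map[string]bool

// onAdd handles a newly observed idler: a no-op when the idler is not
// idled, otherwise a switch to idled for each of its trigger services.
func (s idledServices) onAdd(i idler) {
	if !i.Idled {
		return
	}
	for _, svc := range i.TriggerServices {
		s[i.Namespace+"/"+svc] = true
	}
}

func main() {
	state := idledServices{}
	state.onAdd(idler{Namespace: "demo", Name: "a", Idled: false, TriggerServices: []string{"web"}})
	state.onAdd(idler{Namespace: "demo", Name: "b", Idled: true, TriggerServices: []string{"db"}})
	fmt.Println(len(state), state["demo/db"], state["demo/web"])
}
```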
pkg/proxy/hybrid/proxy.go
Outdated
func (h *idlerChangeHandler) switchIdledOn(namespace string, svcNames []string) {
	glog.V(6).Infof("hybrid proxy: switch idling on for %s/%v", namespace, svcNames)
	for _, svcName := range svcNames {
		// make sure this is before the endpoints, so that if we have missing endpoints, we can still switch endpoints on and off
"services on and off"
pkg/proxy/unidler/unidlerproxy.go
Outdated
}

func (p *UnidlerProxy) OnServiceSynced() {
	// TODO: do we need to do anything here?
Enlighten me...
pkg/oc/cli/cmd/idle.go
Outdated
kcmdutil.AddFilenameOptionFlags(cmd, &opts.FilenameOptions, "Filename, directory, or URL to a file identifying the resource to get from a server.")
cmd.Flags().StringVarP(&opts.LabelSelector, "selector", "l", opts.LabelSelector, "Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)")
cmd.Flags().BoolVar(&opts.AllNamespaces, "all-namespaces", opts.AllNamespaces, "If present, list the requested object(s) across all namespaces. Namespace in current context is ignored even if specified with --namespace.")
cmd.Flags().BoolVar(&opts.Unidle, "un", opts.Unidle, "If present, unidle the given idlers.")
I'd go with --unidle... But I like --un more.
test/extended/idling/idling.go
Outdated
})

// TODO: Work out how to make this test work correctly when run on AWS
Delete this?
pkg/cmd/infra/router/template.go
Outdated
@@ -119,6 +127,7 @@ func (o *TemplateRouter) Bind(flag *pflag.FlagSet) {
	flag.StringVar(&o.Ciphers, "ciphers", util.Env("ROUTER_CIPHERS", ""), "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list.")
	flag.BoolVar(&o.StrictSNI, "strict-sni", isTrue(util.Env("ROUTER_STRICT_SNI", "")), "Use strict-sni bind processing (do not use default cert).")
	flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.")
	flag.BoolVar(&o.EnableUnidling, "enable-unidling", false, "Whether or not to enable support for interacting with the idling API to properly wake up idled services.")
Can we make this an env too please? We'll need to add 'oc adm' support and that's easier with env.
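For illustration, the request amounts to taking the flag's default from an environment variable, as the neighbouring flags already do via `util.Env`. The sketch below uses only the Go standard library rather than the router's own helpers. The variable name `ROUTER_ENABLE_UNIDLING` is an assumption modelled on the other `ROUTER_*` names, and `parseEnableUnidling` and `envBool` are hypothetical helpers, not code from this PR.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// envBool returns the boolean value of the named variable as reported by
// lookup, or def when the variable is unset or cannot be parsed.
func envBool(lookup func(string) (string, bool), name string, def bool) bool {
	v, ok := lookup(name)
	if !ok {
		return def
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return def
	}
	return b
}

// parseEnableUnidling registers the flag with an env-derived default and
// parses args. ROUTER_ENABLE_UNIDLING is an assumed name, not one from the PR.
func parseEnableUnidling(args []string, lookup func(string) (string, bool)) (bool, error) {
	fs := flag.NewFlagSet("router", flag.ContinueOnError)
	enable := fs.Bool("enable-unidling", envBool(lookup, "ROUTER_ENABLE_UNIDLING", false),
		"Whether or not to enable support for interacting with the idling API.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enable, nil
}

func main() {
	enabled, err := parseEnableUnidling(os.Args[1:], os.LookupEnv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("unidling enabled:", enabled)
}
```

With this shape an explicit command-line flag still overrides the environment, while a deployment that only sets environment variables gets the same behaviour.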
pkg/cmd/infra/router/template.go
Outdated
	if err != nil {
		return err
	}

	factory := o.RouterSelection.NewFactory(routeclient, projectclient.Project().Projects(), kc)
	factory.RouteModifierFn = o.RouteUpdate
Where did I go?
fixed
Force-pushed from 9137c9a to cf0f4ab.
Force-pushed from 83f6606 to b4f2a8a.
Force-pushed from b4f2a8a to 26cb6d2.
seems like the node proxy is now running under the sdn pod, but nobody bound the system:node-proxier role to the openshift-sdn:sdn service account...
Force-pushed from 26cb6d2 to 07324c8.
Force-pushed from b25f591 to c1d7389.
/hold @openshift/api-review the API needs review, I plan on looking at it but haven't yet.
// NFQueueNumber is the number of the NFQueue used to intercept unidling traffic.
// It must be unique.
NFQueueNumber uint16 `json:"nfQueueNumber"`
These are incredibly specific config values - I'm not sure a human should ever set them.
Well, specifically, it needs to never conflict with the two Kubernetes mark bits (one is the masquerade bit, and the other is similar), and it needs to not conflict with any other rules people set on their systems -- that's why it's configurable.
Again, this is not something a human should have to set ever. Can you determine values for this for 99% of cases that won't conflict?
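The constraint under discussion can be stated as a simple check: the chosen mark must share no bits with the reserved marks. The sketch below is a hedged illustration, not the validation used in this PR. The reserved values 0x1 and 0x8000 are the defaults quoted in the code comment earlier in this review, and `validateMarkBit` is a hypothetical helper.

```go
package main

import "fmt"

// Reserved marks, using the defaults quoted in the review's code comment.
const (
	masqueradeMark uint32 = 0x1    // masquerade bit (default per the comment)
	dropMark       uint32 = 0x8000 // drop bit (default per the comment)
)

// validateMarkBit converts a bit index (0-31) into a packet mark and
// rejects it when it overlaps any of the reserved marks.
func validateMarkBit(bit uint, reserved ...uint32) (uint32, error) {
	if bit > 31 {
		return 0, fmt.Errorf("mark bit %d is out of range 0-31", bit)
	}
	mark := uint32(1) << bit
	for _, r := range reserved {
		if mark&r != 0 {
			return 0, fmt.Errorf("mark bit %d (0x%x) conflicts with reserved mark 0x%x", bit, mark, r)
		}
	}
	return mark, nil
}

func main() {
	for _, bit := range []uint{0, 2, 15} {
		mark, err := validateMarkBit(bit, masqueradeMark, dropMark)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("bit %d is usable as mark 0x%x\n", bit, mark)
	}
}
```

A check like this only covers the marks it is told about; it cannot see other rules an administrator has installed, which is the argument made above for keeping the value configurable.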
	return fmt.Errorf("no idler found for service %s", service.String())
}

if idler.Spec.WantIdle == false {
I haven't reviewed the API design yet, but unless this becomes a constant you should never be comparing to a boolean, just testing.
whoops, agreed -- must have been tired when I wrote that :-P
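The point about testing rather than comparing can be shown in a few lines. In this sketch `idlerSpec` is a hypothetical stand-in; only the `WantIdle` field name is taken from the diff above.

```go
package main

import "fmt"

// idlerSpec is a hypothetical stand-in; only the WantIdle field name
// comes from the diff under review.
type idlerSpec struct {
	WantIdle bool
}

// describe tests the boolean directly instead of writing
// `spec.WantIdle == false`.
func describe(spec idlerSpec) string {
	if !spec.WantIdle {
		return "not idled"
	}
	return "idled"
}

func main() {
	fmt.Println(describe(idlerSpec{WantIdle: false}))
	fmt.Println(describe(idlerSpec{WantIdle: true}))
}
```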
@DirectXMan12: The following tests failed, say
@DirectXMan12: PR needs rebase.
I don't think this is going to make it for 3.11. There's too much unresolved in the install path, the API needs a thorough design review, and most of the reviewers have other critical path items right now.
@smarterclayton is there no way that we can get this for 3.11? We've been waiting for it so that we can fix the issues that openshift.io has uncovered with the existing unidling (e.g. events getting eaten due to rate limiting). I had hoped that since @DirectXMan12 had this PR open so early that we would be able to get it for this release. I understand that the installation is effectively a manual process at the moment, but in 3.12 we would make an operator for it. Given that this is effectively for internal use, so that we can get some mileage on it, how much work do you need to have done for the installer in 3.11? And since the API is alpha and subject to change, how thorough does the review need to be?
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: DirectXMan12, knobunc
Looks like it's coming from this repo: https://github.com/openshift/service-idler, but the whole repo has only ever had two pulls. Was the rest reviewed sometime? It's pretty unusual not to have an api in https://github.com/openshift/api when we want to use and support it. I think the only exceptions have been: 1) things originated in origin that we forgot, 2) a toy API we immediately mutated/removed for clusterup.
/hold
It looks like the main API controlling the behavior is coming in via https://github.com/openshift/service-idler . For APIs that are going to be exposed to end users and that we expect to support (and if I'm reading the API right, we would end up supporting it and at least providing some kind of migration instructions), we expect those to live in https://github.com/openshift/api . A good test is probably, "is this owned by openshift and does it have a client". The idea of the API seems reasonable: a persistent object describing the idled/unidled state of a particular set of resources. The details, particularly around edge-case behavior and some potential for conflicting information, deserve more discussion.
Yes @knobunc has reviewed it, and we've been testing it unofficially for a while now. Particularly, @sallyom has been using it to rewrite and simplify the OpenShift Online automated idling.
The CRD approach was originally suggested by @smarterclayton. In my mind, there are two reasons to keep this as a CRD, at least for the time being.
We have had a lot of that discussion, both when we (@knobunc and I) designed the original system (more on that below) and when we were iterating on the new system based on what we learned from the existing one, but I'd be happy to discuss further. We know the existing system doesn't work at the scales we need for Online (in addition to not handling some corner cases we'd like it to handle), and we'd like to get some mileage on this system in Online before we fully support it (i.e. tech preview for a release or two), but we can't do that until we can deploy both the idling/unidling controller (the easy part -- it's a self-contained pod with a CRD) and an updated unidling proxy with router support (the harder part, since the router and proxy are part of OpenShift proper).
Using a CRD is not a way to skip an API review. See https://github.com/openshift/api/blob/master/servicecertsigner/v1alpha1/types.go#L82-L88 for an example of a CRD that is based on an API-reviewed type. openshift/api has a light footprint and openshift APIs should live there. Living in openshift/api does not mean that your code has to live in openshift/origin.
/test launch-gcp
As per a discussion with @deads2k, I'm going to file a PR against
Issues go stale after 90d of inactivity.
/lifecycle stale
Stale issues rot after 30d of inactivity.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
/close
@openshift-bot: Closed this PR.
This updates idling to v1alpha2. It strips away the idling controller and API types (in favor of https://github.com/openshift/service-idler), and rewrites the proxy logic to use the `Idler` type instead of annotations (and similarly with `oc idle`).

TODO:
- `oc idle` (simple toggle version)
- `oc create idler` with previous discovery code (follow-up PR, if needed)
- `oc adm router` support (follow-up PR)