Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Flag Routing #4736

Open
duglin opened this issue Jul 12, 2019 · 28 comments
Open

Feature Flag Routing #4736

duglin opened this issue Jul 12, 2019 · 28 comments
Labels
area/API API objects and controllers area/networking kind/feature Well-understood/specified features, ready for coding. kind/spec Discussion of how a feature should be exposed to customers. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Issues which should be fixed (post-triage)
Milestone

Comments

@duglin
Copy link

duglin commented Jul 12, 2019

In what area(s)?

/area API
/area networking

/kind spec
/kind proposal

Describe the feature

One of the things we hear from customers is that routing between revisions based on percentage isn't really of interest to them once in production. In practice they tend to lean more towards a model where only a select group of users are expected to use the latest versions so that everyone else (who need more stability) are not impacted during the testing phase. We call that "Feature Flag Routing". Where the routers (e.g. istio) route requests to certain versions based on some metadata in the message (e.g. perhaps some http header) rather than the random % based stuff. We'd like for Knative to eventually support this as an option. It's not clear if this can be done solely under the covers or whether it might result in an API change. If an API change might be needed it might be best to at least design it out prior to v1.0 when the API is locked down.

Some of the uses for the "select group":

  • testers to test stuff prior to opening it up to everyone
  • customers who opt-in for some kind of beta program
  • customers who pay extra for access to really old versions that you may not want available to everyone
  • might even want to direct certain users to certain versions of the service based on other criteria such as their SLA. Might be for slight difference in runtime performance or even to give certain customers their own dedicated revisions to use to provide a bit of isolation (either for security concerns or fear of noisy neighbors).
@duglin duglin added the kind/feature Well-understood/specified features, ready for coding. label Jul 12, 2019
@knative-prow-robot knative-prow-robot added area/API API objects and controllers area/networking kind/spec Discussion of how a feature should be exposed to customers. labels Jul 12, 2019
@mattmoor
Copy link
Member

If an API change might be needed it might be best to at least design it out prior to v1.0 when the API is locked down.

A nit, this is only true if a breaking API change is needed. The core kube APIs (e.g. Pod) have continued to evolve long past v1 to incorporate new (non-trivial) functionality. An example of this is projection of secrets and configmaps into environment variables, which didn't land until 1.3 (IIRC), and this is far from the last change.

Do you have a reason to think that this would require a breaking API change?

route requests to certain versions based on some metadata in the message (e.g. perhaps some http header)

I am keen to add a new abstraction that can do these kinds of things as well:

  • Headers
  • Paths (e.g. /v1/foo/bar)
  • Method (e.g. GET, POST)

I've always seen this as a higher-level abstraction (atop Service) because often the things you are switching over are fairly different. e.g.

  • Different methods of an API (e.g. different paths or methods)
  • Authorization handler (e.g. no auth header)
  • An early version of a new application (your example)

My feeling is that this should target Addressable things, which includes Service and Route, but opens the possibility for this to target other abstractions implementing Addressable. I also think it itself should be Addressable so it can be composed with consumers of Addressable (e.g. Eventing).

We should be careful about folding everything into a single abstraction as Istio's VirtualService has, as it can become incredibly hard to use properly. This is (part of) why my bias is towards much more focused and clear abstractions that can be composed.


Another notable challenge this introduces is that it will require an expansion of our networking interface. Istio's API can clearly handle it, but I'm unsure of Gloo (or others I have heard are in the works).

@duglin
Copy link
Author

duglin commented Jul 13, 2019

Do you have a reason to think that this would require a breaking API change?

Off the top of my head I see one potential area... the percent field. When not present it defaults to zero, so adding a new field to specify some other routing thingy might feel like a bit of a conflict. So, we may need to modify it to say "w/o any routing data at all, default is percent=0", instead of "if no percent, default is percent=0".

I agree with you that I think this is mostly a higher-level abstraction thing, and mainly something that can be additive after v1.0 if we don't make the timeline. But, I wanted to bring it up now for 2 reasons:

  1. get a sense for interest among the community
  2. make sure there's nothing in v1.0 that might cause an API conflict/breaking change to support this.

And so far, it's just that percent thing that's worrying me - which might not be too hard to address.

But I'd like to hear if others know of spots to think about....

duglin pushed a commit to duglin/serving that referenced this issue Jul 26, 2019
Closes: knative#4736

Signed-off-by: Doug Davis <dug@us.ibm.com>
@markusthoemmes
Copy link
Contributor

I do agree with @mattmoore‘s point that this can be folded into a higher level abstraction.

My understanding of the need for percentage based traffic splitting is that we need to have a mechnism for unobstrusive rollouts. Percentage based traffic splitting allows us to do what deployment does with its rolling upgrade but also allows us to have immutable deployments (the rolling is a result of mutating the deployment).

Other means are undoubtedly super useful for various use-cases but maybe should not be part of the core types to keep them somewhat focused on what they need to be and to not further push requirements into the networking layer as Matt mentioned.

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2019
@knative-housekeeping-robot

Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle rotten

@knative-prow-robot knative-prow-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 26, 2020
@knative-housekeeping-robot

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

@knative-prow-robot
Copy link
Contributor

@knative-housekeeping-robot: Closing this issue.

In response to this:

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@duglin
Copy link
Author

duglin commented May 10, 2020

@mattmoor (or anyone) can I get this reopened? I missed the lifecycle messages.

@vagababov
Copy link
Contributor

/reopen

@knative-prow-robot
Copy link
Contributor

@vagababov: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@duglin
Copy link
Author

duglin commented May 10, 2020

@vagababov thanks!

@knative-housekeeping-robot

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

@knative-prow-robot
Copy link
Contributor

@knative-housekeeping-robot: Closing this issue.

In response to this:

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@duglin
Copy link
Author

duglin commented Jun 9, 2020

/reopen

@knative-prow-robot
Copy link
Contributor

@duglin: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-housekeeping-robot

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

@knative-prow-robot
Copy link
Contributor

@knative-housekeeping-robot: Closing this issue.

In response to this:

Rotten issues close after 30 days of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@duglin
Copy link
Author

duglin commented Jul 10, 2020

/reopen

@knative-prow-robot
Copy link
Contributor

@duglin: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aliok
Copy link
Member

aliok commented Aug 27, 2020

@mattmoor
Copy link
Member

/lifecycle frozen

This was sent to our mailing list as well. I'm a little reluctant to creep ksvc too much vs. adding new resources that compose well with ksvc (and others) in the same way we're trying to do vanity domains.

I sketched some thoughts here over the weekend, and have a PoC that does (part of) this: https://docs.google.com/document/d/1Cp_h4MIRGt2Vy-EE0Yy5L5Y0wRNaMyp6SUq0TCi6XLM/edit

@knative-prow-robot knative-prow-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 27, 2020
@evankanderson
Copy link
Member

Other than the request for the user to choose the request header, it seems like Header-based Tag Routing should be able to do this?

/triage needs-user-input

@knative-prow-robot knative-prow-robot added the triage/needs-user-input Issues which are waiting on a response from the reporter label Mar 22, 2021
@evankanderson
Copy link
Member

Assuming that header-based tag routing will work here unless someone wants to chime in with a reason why it won't.

/close

@knative-prow-robot
Copy link
Contributor

@evankanderson: Closing this issue.

In response to this:

Assuming that header-based tag routing will work here unless someone wants to chime in with a reason why it won't.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@duglin
Copy link
Author

duglin commented Jun 24, 2021

/reopen
actually, allowing other headers is kind of important since we can't mandate that the user the inject Knative header into each message. This is why we may want to route based on some other header that's already flowing, like an auth header. Also, it would be nice to route to a non-tagged revision - similar to how % based routing doesn't need tagged either. Plus, if I'm remembering correctly, with tags you're required to expose those to the internet - you can't make them private but then have the main URL public. So, if I want to only allow certain users access to a revision I wouldn't want it visible via a tag.

@knative-prow-robot
Copy link
Contributor

@duglin: Reopened this issue.

In response to this:

/reopen
actually, allowing other headers is kind of important since we can't mandate that the user the inject Knative header into each message. This is why we may want to route based on some other header that's already flowing, like an auth header. Also, it would be nice to route to a non-tagged revision - similar to how % based routing doesn't need tagged either. Plus, if I'm remembering correctly, with tags you're required to expose those to the internet - you can't make them private but then have the main URL public. So, if I want to only allow certain users access to a revision I wouldn't want it visible via a tag.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mapleeit
Copy link

mapleeit commented Mar 17, 2022

Agree with @duglin It would help us a lot if supporting route traffic by custom headers, because in our case, we can't gurantee we can inject the Knative-Serving-Tag: rev1 header to every request. And it also expose the purpose of trying to treat somebody differently.

@ReToCode ReToCode added triage/accepted Issues which should be fixed (post-triage) and removed triage/needs-user-input Issues which are waiting on a response from the reporter labels Feb 15, 2023
@mkloveyy
Copy link

Agree with @duglin, It would be helpful for us too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/API API objects and controllers area/networking kind/feature Well-understood/specified features, ready for coding. kind/spec Discussion of how a feature should be exposed to customers. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Issues which should be fixed (post-triage)
Projects
None yet
Development

Successfully merging a pull request may close this issue.