Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic pod labels: Spinnaker pod quarantine #17332

Closed
mandarjog opened this issue Sep 24, 2019 · 20 comments
Closed

Dynamic pod labels: Spinnaker pod quarantine #17332

mandarjog opened this issue Sep 24, 2019 · 20 comments
Labels
area/extensions and telemetry area/networking lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed
Milestone

Comments

@mandarjog
Copy link
Contributor

Spinnaker uses dynamic pod labels to quarantine pods out of a service.
Removing a specific label directly from a pod, removes the endpoint from the service.

This is an example of dynamic pod labels.

Recently we have made several decisions regarding static nature of pod labels.

  1. Pod labels are placed in nodemetada and deemed constants for the life of the proxy process.
  2. Pilot uses this node metadata to decide ServiceInstances.

a. Investigate the mode of failure for spinnaker like behaviour.
b. Decide if we want to support this mode.

@mandarjog
Copy link
Contributor Author

@howardjohn
Copy link
Member

@hzxuzhonghu

@kyessenov
Copy link
Contributor

Is this on k8s?

@mandarjog
Copy link
Contributor Author

yes, this is used with k8s and other platforms.

@kyessenov
Copy link
Contributor

on k8s, this is also breaking downwards API.

@howardjohn
Copy link
Member

howardjohn commented Sep 24, 2019 via email

@kyessenov
Copy link
Contributor

kyessenov commented Sep 24, 2019 via email

@hzxuzhonghu
Copy link
Member

We can use downward api volume mount the labels to sidecar, and watch the file changes, restart envoy when it changes. This is same as And when we change labels in deployment, the pod will recreate too.

I am not sure if it is acceptable in this case.

@rshriram
Copy link
Member

I think that is fine. And given that its not a common thing to do, its okay to do the envoy restart thing, given that the other 90% of users would benefit greatly from higher decoupling from k8s, lesser reliance on k8s api, as well as the reduced impact of eventual consistency. It seems like a worthwhile tradeoff for usability. What do you guys think?

@rshriram
Copy link
Member

Ofcourse, if it turns out that 50% of the planet is using Spinnaker in production and that they all rely a LOT on this dynamic label thing, then we have to rethink. May be one option is to have some flag in Pilot that allows these people to choose the old style mode of always going to k8s for the pod labels.

@douglas-reid
Copy link
Contributor

Having the labels be considered consts for the lifetime of the proxy was a known compromise to support the simple use case without adding a whole bunch of complexity.

What is the Envoy story around support for dynamic metadata? I know that we could add a filter that could run for the extensions use case, but that won't help Pilot.

@kyessenov
Copy link
Contributor

Envoy supports dynamic metadata on listeners, routes, clusters, endpoints, but not on singleton resources such as bootstrap.

We probably want to implement some sort of MetadataDiscoveryService that can subscribe to metadata and update node-level metadata with it.

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Dec 24, 2019
@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Jan 8, 2020
@mandarjog
Copy link
Contributor Author

@rshriram i don’t think we can restart envoy, because quarantine is the main use case and we typically want to inspect the proxy and app after quarantine and not restart it.
Perhaps we can directly support quarantine instead of this particular method.

@mandarjog mandarjog reopened this Jan 8, 2020
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jan 8, 2020
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 7, 2020
@mandarjog
Copy link
Contributor Author

I think this is still an issue.

@mandarjog mandarjog reopened this Aug 24, 2020
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Aug 24, 2020
@mandarjog mandarjog removed the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Aug 24, 2020
@mandarjog mandarjog added this to the Backlog milestone Aug 24, 2020
@howardjohn howardjohn added this to P1 in Prioritization Sep 4, 2020
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 23, 2020
@mandarjog
Copy link
Contributor Author

Not stale

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 23, 2020
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Feb 21, 2021
Prioritization automation moved this from P1 to Done Mar 8, 2021
@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Mar 8, 2021
@kyessenov kyessenov reopened this Nov 18, 2021
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 18, 2021
@kyessenov
Copy link
Contributor

Not stale.

@mandarjog mandarjog added lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed and removed lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. labels Nov 18, 2021
@hzxuzhonghu
Copy link
Member

We probably want to implement some sort of MetadataDiscoveryService that can subscribe to metadata and update node-level metadata with it.

As we have xds proxy now, i think this is easier now. I can dig into it.

@mandarjog
Copy link
Contributor Author

I think we have multiple use cases. The quarantine main use case is to remove the it from receiving traffic. We want the outbound config to be unaffected.
Inbound policies should apply just like before if the pod happens to receive traffic. Perhaps even start drain.

we want the outbound traffic to work as normal if the pod needs to reach outside.

@kyessenov
Copy link
Contributor

We don't have restart envoy when labels change. That argues for not having labels in the bootstrap to begin with which is a radical change from today. An interesting case is that pods can re-assign their canonical services as well.

@howardjohn
Copy link
Member

dynamic change was implemented a while back

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/extensions and telemetry area/networking lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed
Projects
Development

No branches or pull requests

7 participants