Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

istioctl pod troubleshooting commands should be able to determine associated control plane #22274

Open
Tracked by #34682
esnible opened this issue Mar 18, 2020 · 22 comments
Labels
area/user experience lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed
Milestone

Comments

@esnible
Copy link
Contributor

esnible commented Mar 18, 2020

In a dual control plane scenario, the istioctl troubleshooting commands should be able to identify the control plane without a new --revision <rev> argument.

The Istio sidecar injector, istio-sidecar-injector<-revision>, SHOULD record the revision of the injector within the pod metadata. (Currently it sets MESH_CONFIG to contain the name of the Istiod service, but does set the owning control plane revision). I recommend it add a "istio.io/rev" label.

Currently istioctl uses kubectl exec to find ALL Pilot pods. istioctl COULD use metadata added by the injector to distinguish between Pilot pods in the same revision as the sidecar. If we go this route, the Pilot deployment should include "istio.io/rev" (it currently doesn't).

Alternately, istioctl COULD make an additional API call to get services, to find the istiod<-revision> service with "istio.io/rev" label, and then use its selector to find the Istiod pods.

My preference is for the former, which has one less API call, but I could support either approach.

cc @howardjohn @ostromart

@esnible esnible added area/user experience area/environments/operator Issues related to Operator or installation labels Mar 18, 2020
@esnible esnible changed the title istioctl pod troubleshooting commands should be able to determine control plane istioctl pod troubleshooting commands should be able to determine associated control plane Mar 18, 2020
@howardjohn
Copy link
Member

Everything in the pilot component (deployment, configmap,etc) should get the istio.io/rev label if revision is set. are you not seeing this?

@esnible
Copy link
Contributor Author

esnible commented Mar 18, 2020

@howardjohn

Using master, I did one istioctl manifest apply and another using revision: canary.

  • The Canary istiod pod has label "version: canary" NOT istio.io/rev
  • Non-canary istiod pod has no label "version".
  • The Canary svc has label "istio.io/rev: canary"
  • Non-canary svc has no label "istio.io/rev"
  • The Canary istiod-svc has label "istio.io/rev: canary"
  • Non-canary istiod-svc has label no label "istio.io/rev"
  • The Canary mutatingwebhookconfigurations has label "istio.io/rev: canary"
  • Non-canary mutatingwebhookconfigurations has no label "istio.io/rev"

@howardjohn
Copy link
Member

howardjohn commented Mar 18, 2020 via email

@esnible
Copy link
Contributor Author

esnible commented Mar 18, 2020

@howardjohn The configmap has it.

Are we married to the idea that if revision is "" the istio.io/rev is not needed? I recommend we set istio.io/rev to something, perhaps "". I can do kubectl -n istio-system get cm --selector istio.io/rev=canary and it works. If we set istio.io/rev to "" I can do --selector istio.io/rev==, but there is no way to --selector a missing label.

@howardjohn
Copy link
Member

howardjohn commented Mar 18, 2020 via email

@esnible
Copy link
Contributor Author

esnible commented Mar 26, 2020

@sdake @ostromart

We need a design for how the Revision is recorded in the istio[-<revision>] and istio-sidecar-injector[-<revision>] ConfigMaps.

The manifest commands MUST add the istio.io/rev label tag when creating both ConfigMaps.

inject.IntoObject() MUST label pods with istio.io/rev so that istioctl troubleshooting commands can associate pods with their control plane. The logic is simple, just metadata.Labels[model.RevisionLabel] = <something>.

That method doesn't receive the metadata of the ConfigMap, so we need to store the Revision in the configuration data as well.

The obvious place to put it is in the MeshConfig. To do that, we need to modify api/mesh/v1alpha1/proxy.proto. We could also put this information into the .data.values or .data.config of the injector.

@esnible
Copy link
Contributor Author

esnible commented Apr 28, 2020

Moving to 1.7. 1.6 has support for explicit --revision <rev>. In some cases for 1.6.0 --revision will be optional because istioctl will consult all control planes. The work to test that --revision is entirely optional will wait for 1.7.

@esnible
Copy link
Contributor Author

esnible commented May 15, 2020

If argument is specified, get the pod metadata to determine the control plane and its troubleshooting API endpoint.
Currently that endpoint is an arg to the sidecar, not an annotation or label! [esnible]
We need to reimplement for the troubleshooting API and/or new XDS anyway

@esnible
Copy link
Contributor Author

esnible commented May 15, 2020

We need to be able to find the out-of-cluster control plane, instead of just assuming all control planes are in istio-system istiod pods.

Currently the control plane address is passed to the proxy as an argument, that isn’t good enough.

@esnible
Copy link
Contributor Author

esnible commented Jun 23, 2020

The initial plan was based on the idea that istioctl could look at the istio.io/rev label and then find matching Istiod in istio-system.

With Central Istiod this will not be possible. We need #24394 to find a pod's control plane. That item has not been triaged yet.

@esnible
Copy link
Contributor Author

esnible commented Jun 23, 2020

@costinm @howardjohn I added an area/networking to this item because under Central Istiod it is may not be sufficient that istioctl looks for breadcrumbs left by the injector.

A command like istioctl proxy-status <pod> need to make an XDS request on the same control plane (or shard?) that the pod is using. Please either triage #24394 or suggest an alternative.

@therealmitchconnors
Copy link
Contributor

To clarify, we are talking about identifying which control plane a sidecar is using, including which revision, or perhaps the address of a Central Istiod the sidecar is connected to. We are not talking about identifying which particular instance of Istiod we are connecting to inside the control plane.

@esnible
Copy link
Contributor Author

esnible commented Jul 6, 2020

Without #24394 there is no way to determine control plane.

The injector adds annotations, but these annotation don't help find XDS endpoint under Central Istiod.
The injector may (or may not) add pass the XDS endpoint to the pod, but not in a declarative way. Even if it did, without an unsharded view provided by XDS there is no way to connect to it and be sure it gives correct information about the pod.

@esnible
Copy link
Contributor Author

esnible commented Jul 12, 2020

I don't think this will happen for 1.7.

Any objection to targeting 1.8 for this work?

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Oct 11, 2020
@sdake sdake added the lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed label Oct 11, 2020
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Oct 11, 2020
@esnible esnible modified the milestones: 1.8, 1.9 Nov 17, 2020
@esnible
Copy link
Contributor Author

esnible commented Dec 29, 2020

Testing against 1.8.0

  • injected pods are labeled with istio.io/rev: default.
  • injected pods receive spec.containers[*].env.PROXY_CONFIG JSON that includes "discoveryAddress":"<hostname>:15012" or has no discoveryAddress.

@howardjohn Would it be reasonable for the troubleshooting commands to default the --xds-address to the value scraped from the pod's env.PROXY_CONFIG if it exists? And if there is nothing, default to port-forwarding to an Istiod pod co-located in the cluster?

@howardjohn
Copy link
Member

It is impossible to accurately derive the discovery address from a pod. If we are okay with potential inaccuracies that could work.

@howardjohn
Copy link
Member

although it does seem quite dependant on implementation details that are very likely to change in the future

@howardjohn
Copy link
Member

Which command even needs this? Is it just ps POD?

@esnible
Copy link
Contributor Author

esnible commented Jan 4, 2021

Which command even needs this? Is it just ps POD?

@howardjohn

Currently it is experimental ps and experimental pc. The commands that use the debug interface should all get this: authz, authn, describe, wait.

My goal is to avoid the user, in a external control plane config, supply the --xds-address every time. You are correct, pulling it out of the env.PROXY_CONFIG is a detail that will change in the future. As the injector supplies that, my hope is that before it changes we can come up with an approved way to do it. In the past I have suggested it would be a good annotation for the injector to add to every pod.

@howardjohn
Copy link
Member

In the past I have suggested it would be a good annotation for the injector to add to every pod.

Example of where this goes wrong: user builds a custom docker image with PROXY_CONFIG preconfigured. No amount of introspection in the pod spec will solve this

experimental pc

Why does this need istiod access? Isn't it just reading envoy config dump?

experimental ps

If I am right about pc, this is the only command we need this for, and its used to diff between envoy and istiod config. IMO this is not a very useful command. I think a lot of istioctl commands were added because we could add them, or they were useful long ago, and we haven't re-evaluated. if this is really the only command we are adding all of this complexity for, it would be much better, in my opinion, to just remove the command.

If the ps command actually does show a diff, its a clear bug in Istiod, which is generally pretty rare these days, andthe correct debugging procedure is to run istioctl bug-report which already captures this info.

@esnible
Copy link
Contributor Author

esnible commented Jan 5, 2021

@howardjohn You are correct, I should not have said pc.

Ignoring the mechanism, do you have an objection to pods having some kind of breadcrumb trail so that tools can find their control plane? Am I asking for the wrong kind of mechanism, or is any mechanism going to be a problem?

@howardjohn
Copy link
Member

It depends if it needs to be accurate, or you want a best effort. A best effort trail isn't a problem, to get the real value is much harder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/user experience lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed
Projects
Status: P1
Development

No branches or pull requests

7 participants