Determine ServiceInstances from Proxy labels instead of Pod Spec when possible #16483

Merged
merged 1 commit into from
Aug 30, 2019


howardjohn
Member

No description provided.

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Aug 22, 2019
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Aug 22, 2019
@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 22, 2019
@howardjohn
Member Author

/retest

@howardjohn
Member Author

seems this breaks multicluster

@howardjohn
Member Author

/retest

@hzxuzhonghu
Member

This looks like a WorkloadUpdate issue; the cause is that the pod has not come up yet.

As #16476 says, there is still a potential bug for this.

@howardjohn
Member Author

howardjohn commented Aug 23, 2019

the problem is that with split listeners we need to be able to apply inbound rules, etc., immediately; otherwise there will be a period of time when inbound traffic comes in and the rules are not applied

the reason this is currently safe is that we have the readiness check to ensure inbound listeners are received, but this won't work now since container port is not required.

better ideas are welcome, as I'm not sure this will work

@hzxuzhonghu
Member

but this won't work now since container port is not required.

I'm not sure about this; could you explain?

@howardjohn
Member Author

@hzxuzhonghu see changes like #16528. We now capture all ports in the inbound iptables rules. If you don't declare a containerPort, you will get Passthrough.
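A minimal Go sketch of the behavior described above, assuming a hypothetical `inboundTarget` helper: once iptables captures every inbound port, a port declared as a containerPort gets its own inbound handling (where inbound rules such as RBAC apply), and any other captured port falls through to Passthrough. The `inbound|<port>` string and the `declared` set are illustrative assumptions, not Pilot's actual cluster naming.

```go
package main

import "fmt"

// passthrough is the fallback target for captured ports that have no
// matching containerPort (named after Istio's PassthroughCluster).
const passthrough = "PassthroughCluster"

// inboundTarget is a hypothetical helper: it decides where traffic captured
// on an inbound port goes, given the set of ports the pod declares.
func inboundTarget(port int, declared map[int]bool) string {
	if declared[port] {
		// Declared ports get a dedicated inbound target with inbound rules applied.
		return fmt.Sprintf("inbound|%d", port)
	}
	// Every port is captured now, so undeclared ports still reach the sidecar,
	// but they are only forwarded as-is.
	return passthrough
}

func main() {
	declared := map[int]bool{8080: true}
	fmt.Println(inboundTarget(8080, declared)) // inbound|8080
	fmt.Println(inboundTarget(9000, declared)) // PassthroughCluster
}
```

This is why the readiness check no longer guarantees safety: a port without a containerPort never gets inbound rules of its own, so Pilot needs the instance information as soon as the proxy connects.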

Member

@rshriram rshriram left a comment


..

// metadata already. Because of this, we can still get most of the information we need.
// While this may not be 100% accurate, the pod information should show up within a few seconds, so the impact should be minimal.
// With this information we can set up RBAC, so we are not vulnerable during this time.
if len(out) == 0 && len(proxy.WorkloadLabels) > 0 {
Member


I would not do this as a fallback, but as the main way of looking up the pod labels, since looking up by labels is a very costly operation.

Member Author


Yeah, I agree now. I'm working on changing this to detect whether we have all the required info in the labels and, if so, use that as the main lookup.
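The lookup order agreed on here (metadata first, real pod only when the metadata is incomplete) can be sketched in Go as follows. Every type and function name is hypothetical and heavily simplified; for example, `WorkloadLabels` is a single map here, whereas the diff indexes `proxy.WorkloadLabels[0]`. The point is only the control flow: the cheap path returns an error instead of guessing, and only then is the costly pod lookup attempted.

```go
package main

import (
	"errors"
	"fmt"
)

// Proxy is a hypothetical stand-in for Pilot's proxy model.
type Proxy struct {
	IP             string
	WorkloadLabels map[string]string
	PodPorts       string // raw JSON from proxy metadata; empty if not injected
}

// ServiceInstance is a simplified, hypothetical instance record.
type ServiceInstance struct {
	Service string
	Source  string // "metadata" or "pod", to show which path produced it
}

var errIncompleteMetadata = errors.New("proxy metadata is incomplete")

// instancesFromMetadata builds instances purely from what the proxy sent,
// with no Kubernetes lookups. It refuses to guess when data is missing.
func instancesFromMetadata(p Proxy) ([]ServiceInstance, error) {
	if len(p.WorkloadLabels) == 0 || p.PodPorts == "" {
		return nil, errIncompleteMetadata
	}
	return []ServiceInstance{{Service: "svc-for-" + p.WorkloadLabels["app"], Source: "metadata"}}, nil
}

// instancesFromPod stands in for the costly pod-by-IP lookup.
func instancesFromPod(p Proxy, podsByIP map[string]string) []ServiceInstance {
	app, ok := podsByIP[p.IP]
	if !ok {
		return nil // pod not in the cache yet
	}
	return []ServiceInstance{{Service: "svc-for-" + app, Source: "pod"}}
}

// proxyServiceInstances tries the cheap metadata path first and falls back
// to the pod lookup only when the metadata cannot be trusted.
func proxyServiceInstances(p Proxy, podsByIP map[string]string) []ServiceInstance {
	if out, err := instancesFromMetadata(p); err == nil {
		return out
	}
	return instancesFromPod(p, podsByIP)
}

func main() {
	pods := map[string]string{"10.0.0.2": "reviews"}
	withMeta := Proxy{IP: "10.0.0.1", WorkloadLabels: map[string]string{"app": "ratings"}, PodPorts: "[]"}
	noMeta := Proxy{IP: "10.0.0.2"}
	fmt.Println(proxyServiceInstances(withMeta, pods)[0].Source) // metadata
	fmt.Println(proxyServiceInstances(noMeta, pods)[0].Source)   // pod
}
```

Swapping the two calls in `proxyServiceInstances` gives the ordering suggested later in the review (pod first, metadata only when the pod is missing); the counter-argument there is that the pod lookup would then run for every connecting proxy.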

@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 26, 2019
@istio-testing istio-testing added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 27, 2019
@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 28, 2019
@howardjohn
Member Author

/retest

infra flake

@howardjohn
Member Author

Fixes #16630

@@ -405,6 +405,10 @@ global:
# have a shared root CA for this model to work.
enabled: false

# Should be set to the name of the cluster this installation will run in. This is required for sidecar injection
# to properly label proxies
Contributor


The name must match the secret name configured for multicluster (or whatever we use on the other side).

// metadata already. Because of this, we can still get most of the information we need.
// If we cannot accurately construct ServiceInstances from just the metadata, this will return an error and we can
// attempt to read the real pod.
instances, err := c.getProxyServiceInstancesFromMetadata(proxy)
Contributor


How about doing this only if getPodByIP returns null?

Member


That defeats the purpose of the effort in some sense. It is a lot of overhead to keep looking things up from k8s, especially in large clusters. Even if the k8s code returns null, we have "done" the work, which actually increases the backlog of connections that Pilot needs to service. OTOH, quickly creating the instance and starting to service the proxy will relieve the backlog.

Contributor

@costinm costinm left a comment


Looks very good, but I would still move the logic so it only runs after the pod was not found.

@lambdai
Contributor

lambdai commented Aug 28, 2019

/lgtm
I think it would address #16630. Thanks!

@howardjohn
Member Author

howardjohn commented Aug 28, 2019 via email

Member

@rshriram rshriram left a comment


Looks good except for one major comment on pre-processing the pod ports to avoid repeated JSON unmarshalling. Please fix that and then merge. Approving in good faith.

@@ -405,6 +405,10 @@ global:
# have a shared root CA for this model to work.
enabled: false

# Should be set to the name of the cluster this installation will run in. This is required for sidecar injection
# to properly label proxies
clusterName: ""
Member


Nit: call it clusterID if the metadata field is called id. I wonder if we can auto-initialize this to some UUID.

Contributor


See my comment above: this must match the name used in the 'registry' config. I don't think it can be a generated UUID (without making it hard to configure the registries). And there are plans to use the cluster name in telemetry, policy, etc.

dummyPod := &v1.Pod{
ObjectMeta: metav1.ObjectMeta{
Namespace: proxy.ConfigNamespace,
Labels: proxy.WorkloadLabels[0],
Member


I don't think that's the case in Consul either. I have no idea why WorkloadLabels was an array of maps in the first place. But make sure the map is not split across array entries (with one key-value pair per entry).
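The concern about `WorkloadLabels[0]` is that, if the labels ever arrive split across entries, indexing the first entry silently drops the rest. One defensive option, sketched below with a hypothetical `mergeWorkloadLabels` helper (not what the PR itself does), is to flatten every entry into a single label set before building the dummy pod.

```go
package main

import "fmt"

// mergeWorkloadLabels is a hypothetical helper that flattens a
// []map[string]string (the shape of proxy.WorkloadLabels in this diff) into
// one label set. If labels were split across entries, indexing [0] would
// drop keys; merging keeps them all. Later entries win on key conflicts.
func mergeWorkloadLabels(collection []map[string]string) map[string]string {
	out := make(map[string]string)
	for _, labels := range collection {
		for k, v := range labels {
			out[k] = v
		}
	}
	return out
}

func main() {
	split := []map[string]string{{"app": "reviews"}, {"version": "v2"}}
	fmt.Println(len(split[0]))                   // 1: indexing [0] drops "version"
	fmt.Println(len(mergeWorkloadLabels(split))) // 2
}
```

If the labels are guaranteed to arrive as one map, the merge is a no-op copy, so it only costs an allocation.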

}

// Find the Service associated with the pod.
svcLister := listerv1.NewServiceLister(c.services.informer.GetIndexer())
Member


I don't know much about the right or most efficient k8s client APIs to use. It would be good to get an ack from @hzxuzhonghu on this piece alone, as he works in that area.

Member


This is currently the most efficient way using the native k8s methods, but it is more costly than the reverse search.
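To show where the cost comes from, here is a simplified Go sketch of what a lister-style "which Services select this pod" query has to do: scan every Service in the namespace and check its selector against the pod's labels. The `Service` type and helper names are hypothetical stand-ins for `v1.Service` and the lister; note that this sketch treats a missing or empty selector as matching nothing, which is a choice made here rather than a claim about client-go's exact semantics.

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a simplified, hypothetical stand-in for v1.Service.
type Service struct {
	Namespace string
	Name      string
	Selector  map[string]string
}

// selects reports whether every selector pair is present in labels.
// In this sketch a missing or empty selector matches nothing.
func selects(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// servicesForLabels scans every Service in the namespace: O(services) work
// per call, which is the cost being weighed against a reverse index.
func servicesForLabels(all []Service, ns string, labels map[string]string) []string {
	var names []string
	for _, s := range all {
		if s.Namespace == ns && selects(s.Selector, labels) {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	svcs := []Service{
		{Namespace: "default", Name: "reviews", Selector: map[string]string{"app": "reviews"}},
		{Namespace: "default", Name: "reviews-v2", Selector: map[string]string{"app": "reviews", "version": "v2"}},
		{Namespace: "default", Name: "headless", Selector: nil},
	}
	fmt.Println(servicesForLabels(svcs, "default", map[string]string{"app": "reviews", "version": "v1"})) // [reviews]
}
```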

Network: c.endpointNetwork(proxy.IPAddresses[0]),
Locality: util.LocalityToString(proxy.Locality),
},
Service: svcMap[hostname],
Member


hopefully we are guaranteed that this map is not being changed while these instances are being constructed?

Member Author


good catch, I think this could happen. I'll fix this

}

// findPortFromMetadata resolves the TargetPort of a Service Port, by reading the Pod spec.
func findPortFromMetadata(svcPort v1.ServicePort, podPorts string) (int, error) {
Member


I strongly suggest not doing this metadata lookup in that double for loop above, as this function does a bunch of JSON unmarshalling, which is going to kill perf. Instead, parse podPorts into some useful structure, maybe even our own Port object, where the name can be "" if the port is of type int. Then in https://github.com/istio/istio/pull/16483/files#diff-288be3f23828adceb9ebb059ce53a23eR619, you can simply do a lookup from the pre-processed array.

Member Author


good call, will move it out
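A sketch of the suggested restructuring: unmarshal the pod ports once, build a name-to-port index, and make the per-Service-port resolution a map lookup. It assumes the metadata carries the container ports as a JSON array with `name` and `containerPort` fields (suggested by the `podPorts string` parameter, but the exact layout is an assumption). `targetPort` is a simplified stand-in for `intstr.IntOrString`, and unlike Kubernetes' own named-port resolution, which also compares the protocol, this sketch matches on name only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podPort mirrors the container-port fields this sketch needs.
type podPort struct {
	Name          string `json:"name,omitempty"`
	ContainerPort int    `json:"containerPort"`
	Protocol      string `json:"protocol,omitempty"`
}

// targetPort is a simplified, hypothetical stand-in for intstr.IntOrString.
type targetPort struct {
	IsName bool
	Num    int
	Name   string
}

// parsePodPorts unmarshals the metadata once, building a name -> port index
// that can be reused for every Service port.
func parsePodPorts(raw string) (map[string]int, error) {
	var ports []podPort
	if err := json.Unmarshal([]byte(raw), &ports); err != nil {
		return nil, err
	}
	byName := make(map[string]int, len(ports))
	for _, p := range ports {
		if p.Name != "" {
			byName[p.Name] = p.ContainerPort
		}
	}
	return byName, nil
}

// resolveTargetPort is a cheap lookup with no JSON work inside the loop.
func resolveTargetPort(tp targetPort, byName map[string]int) (int, error) {
	if !tp.IsName {
		return tp.Num, nil // numeric targetPort is used as-is
	}
	if p, ok := byName[tp.Name]; ok {
		return p, nil
	}
	return 0, fmt.Errorf("no container port named %q", tp.Name)
}

func main() {
	byName, err := parsePodPorts(`[{"name":"http","containerPort":8080,"protocol":"TCP"}]`)
	if err != nil {
		panic(err)
	}
	for _, tp := range []targetPort{{IsName: true, Name: "http"}, {Num: 9090}} {
		p, err := resolveTargetPort(tp, byName)
		fmt.Println(p, err) // 8080 <nil>, then 9090 <nil>
	}
}
```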

@howardjohn howardjohn added the do-not-merge/hold Block automatic merging of a PR. label Aug 28, 2019
@howardjohn howardjohn removed the do-not-merge/hold Block automatic merging of a PR. label Aug 29, 2019
@howardjohn
Member Author

/retest

@istio-testing istio-testing merged commit 2e789b9 into istio:master Aug 30, 2019
@istio-testing
Collaborator

In response to a cherrypick label: #16483 failed to apply on top of branch "release-1.3":

Using index info to reconstruct a base tree...
M	install/kubernetes/helm/istio/files/injection-template.yaml
M	install/kubernetes/helm/istio/values.yaml
M	pilot/pkg/model/context.go
M	pilot/pkg/serviceregistry/kube/controller/controller.go
M	pilot/pkg/serviceregistry/kube/controller/controller_test.go
A	pkg/kube/inject/testdata/inject/app_probe/hello-probes-with-flag-set-in-annotation.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/hello-probes-with-flag-unset-in-annotation.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/hello-probes.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/hello-readiness.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/https-probes.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/named_port.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/one_container.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/ready_only.yaml.injected
A	pkg/kube/inject/testdata/inject/app_probe/two_container.yaml.injected
A	pkg/kube/inject/testdata/inject/auth.cert-dir.yaml.injected
A	pkg/kube/inject/testdata/inject/auth.non-default-service-account.yaml.injected
A	pkg/kube/inject/testdata/inject/auth.yaml.injected
A	pkg/kube/inject/testdata/inject/cronjob.yaml.injected
A	pkg/kube/inject/testdata/inject/daemonset.yaml.injected
A	pkg/kube/inject/testdata/inject/deploymentconfig-multi.yaml.injected
A	pkg/kube/inject/testdata/inject/deploymentconfig.yaml.injected
A	pkg/kube/inject/testdata/inject/enable-core-dump.yaml.injected
A	pkg/kube/inject/testdata/inject/format-duration.yaml.injected
A	pkg/kube/inject/testdata/inject/frontend.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-always.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-config-map-name.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-ignore.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-multi.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-namespace.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-never.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-proxy-override.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-template-in-values.yaml.injected
A	pkg/kube/inject/testdata/inject/hello-tproxy.yaml.injected
A	pkg/kube/inject/testdata/inject/hello.yaml.injected
A	pkg/kube/inject/testdata/inject/job.yaml.injected
A	pkg/kube/inject/testdata/inject/kubevirtInterfaces.yaml.injected
A	pkg/kube/inject/testdata/inject/kubevirtInterfaces_list.yaml.injected
A	pkg/kube/inject/testdata/inject/list-frontend.yaml.injected
A	pkg/kube/inject/testdata/inject/list.yaml.injected
A	pkg/kube/inject/testdata/inject/multi-init.yaml.injected
A	pkg/kube/inject/testdata/inject/pod.yaml.injected
A	pkg/kube/inject/testdata/inject/replicaset.yaml.injected
A	pkg/kube/inject/testdata/inject/replicationcontroller.yaml.injected
A	pkg/kube/inject/testdata/inject/statefulset.yaml.injected
A	pkg/kube/inject/testdata/inject/status_annotations.yaml.injected
A	pkg/kube/inject/testdata/inject/status_params.yaml.injected
A	pkg/kube/inject/testdata/inject/traffic-annotations-empty-includes.yaml.injected
A	pkg/kube/inject/testdata/inject/traffic-annotations-wildcards.yaml.injected
A	pkg/kube/inject/testdata/inject/traffic-annotations.yaml.injected
A	pkg/kube/inject/testdata/inject/traffic-params-empty-includes.yaml.injected
A	pkg/kube/inject/testdata/inject/traffic-params.yaml.injected
A	pkg/kube/inject/testdata/webhook/daemonset.yaml.injected
A	pkg/kube/inject/testdata/webhook/deploymentconfig-multi.yaml.injected
A	pkg/kube/inject/testdata/webhook/deploymentconfig.yaml.injected
A	pkg/kube/inject/testdata/webhook/frontend.yaml.injected
A	pkg/kube/inject/testdata/webhook/hello-config-map-name.yaml.injected
A	pkg/kube/inject/testdata/webhook/hello-multi.yaml.injected
A	pkg/kube/inject/testdata/webhook/hello-probes.yaml.injected
A	pkg/kube/inject/testdata/webhook/job.yaml.injected
A	pkg/kube/inject/testdata/webhook/list-frontend.yaml.injected
A	pkg/kube/inject/testdata/webhook/list.yaml.injected
A	pkg/kube/inject/testdata/webhook/replicaset.yaml.injected
A	pkg/kube/inject/testdata/webhook/replicationcontroller.yaml.injected
A	pkg/kube/inject/testdata/webhook/resource_annotations.yaml.injected
A	pkg/kube/inject/testdata/webhook/statefulset.yaml.injected
A	pkg/kube/inject/testdata/webhook/status_annotations.yaml.injected
A	pkg/kube/inject/testdata/webhook/traffic-annotations-empty-includes.yaml.injected
A	pkg/kube/inject/testdata/webhook/traffic-annotations-wildcards.yaml.injected
A	pkg/kube/inject/testdata/webhook/traffic-annotations.yaml.injected
A	pkg/kube/inject/testdata/webhook/user-volume.yaml.injected
Falling back to patching base and 3-way merge...
Auto-merging pilot/pkg/serviceregistry/kube/controller/controller_test.go
Auto-merging pilot/pkg/serviceregistry/kube/controller/controller.go
CONFLICT (content): Merge conflict in pilot/pkg/serviceregistry/kube/controller/controller.go
Auto-merging pilot/pkg/model/context.go
Auto-merging pilot/pkg/kube/inject/testdata/webhook/user-volume.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/traffic-annotations.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/traffic-annotations-wildcards.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/traffic-annotations-empty-includes.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/status_annotations.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/statefulset.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/resource_annotations.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/replicationcontroller.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/replicaset.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/list.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/list-frontend.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/job.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/hello-probes.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/hello-multi.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/hello-config-map-name.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/frontend.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/deploymentconfig.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/deploymentconfig-multi.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/webhook/daemonset.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/traffic-params.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/traffic-params-empty-includes.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/traffic-annotations.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/traffic-annotations-wildcards.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/traffic-annotations-empty-includes.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/status_params.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/status_annotations.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/statefulset.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/replicationcontroller.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/replicaset.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/pod.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/multi-init.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/list.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/list-frontend.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/kubevirtInterfaces_list.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/kubevirtInterfaces.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/job.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-tproxy.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-template-in-values.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-proxy-override.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-never.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-namespace.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-multi.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-ignore.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-config-map-name.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/hello-always.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/frontend.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/format-duration.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/enable-core-dump.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/deploymentconfig.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/deploymentconfig-multi.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/daemonset.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/cronjob.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/auth.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/auth.non-default-service-account.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/auth.cert-dir.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/two_container.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/ready_only.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/one_container.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/named_port.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/https-probes.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/hello-readiness.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/hello-probes.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/hello-probes-with-flag-unset-in-annotation.yaml.injected
Auto-merging pilot/pkg/kube/inject/testdata/inject/app_probe/hello-probes-with-flag-set-in-annotation.yaml.injected
Auto-merging install/kubernetes/helm/istio/values.yaml
Auto-merging install/kubernetes/helm/istio/files/injection-template.yaml
error: Failed to merge in the changes.
Patch failed at 0001 Determine ServiceInstances from Proxy labels

@rlenglet
Contributor

@howardjohn Please submit a manual backport PR.

howardjohn added a commit to howardjohn/istio that referenced this pull request Aug 30, 2019
@howardjohn
Member Author

howardjohn commented Aug 30, 2019 via email

istio-testing pushed a commit that referenced this pull request Aug 30, 2019
Port: targetPort,
ServicePort: svcPort,
Network: c.endpointNetwork(proxy.IPAddresses[0]),
Locality: util.LocalityToString(proxy.Locality),
Member


@howardjohn This is a regression: we can no longer get the real locality info from the node labels.
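For illustration, a Go sketch of the locality precedence this comment implies: take region and zone from the node's well-known labels when present, and fall back to the locality reported in proxy metadata only when the node has none. The `instanceLocality` helper is hypothetical, and the slash-separated `region/zone` string is an assumption modeled on the `util.LocalityToString` output used in the diff. The `failure-domain.beta.kubernetes.io/*` keys are the ones in use when this PR was written; `topology.kubernetes.io/*` replaced them later.

```go
package main

import "fmt"

// Well-known Kubernetes node labels, newest first.
var (
	regionKeys = []string{"topology.kubernetes.io/region", "failure-domain.beta.kubernetes.io/region"}
	zoneKeys   = []string{"topology.kubernetes.io/zone", "failure-domain.beta.kubernetes.io/zone"}
)

// firstLabel returns the value of the first non-empty key in keys.
func firstLabel(labels map[string]string, keys []string) string {
	for _, k := range keys {
		if v := labels[k]; v != "" {
			return v
		}
	}
	return ""
}

// instanceLocality is a hypothetical helper: prefer the node's labels and
// use the locality reported in proxy metadata only when the node has none.
func instanceLocality(nodeLabels map[string]string, proxyLocality string) string {
	region := firstLabel(nodeLabels, regionKeys)
	zone := firstLabel(nodeLabels, zoneKeys)
	switch {
	case region == "":
		return proxyLocality
	case zone == "":
		return region
	default:
		return region + "/" + zone
	}
}

func main() {
	node := map[string]string{
		"failure-domain.beta.kubernetes.io/region": "us-east1",
		"failure-domain.beta.kubernetes.io/zone":   "us-east1-b",
	}
	fmt.Println(instanceLocality(node, ""))                   // us-east1/us-east1-b
	fmt.Println(instanceLocality(nil, "eu-west1/eu-west1-a")) // eu-west1/eu-west1-a
}
```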

Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
9 participants