Skip to content

The linkerd proxy does not work with headless services#3470

Merged
ihcsim merged 2 commits intolinkerd:masterfrom
JohannesEH:fix-3308
Oct 15, 2019
Merged

The linkerd proxy does not work with headless services#3470
ihcsim merged 2 commits intolinkerd:masterfrom
JohannesEH:fix-3308

Conversation

@JohannesEH
Copy link
Copy Markdown
Contributor

@JohannesEH JohannesEH commented Sep 24, 2019

The linkerd proxy does not work with headless services (i.e. endpoints not referencing a pod).

Changed endpoints_watcher to also return endpoints with no targetref. Changed endpoint_translator to handle addresses with no associated pod.

Fixes #3308

Ran tests in minikube verifying that the proxy handles headless services correctly both in cases with and without port-remapping.

Signed-of-by: Johannes Hansen johannesh1980@gmail.com

@JohannesEH
Copy link
Copy Markdown
Contributor Author

JohannesEH commented Sep 24, 2019

I have to admit. I'm pretty unsure if this change can have unforeseen consequences. Specifically regarding the test case I altered. Is it a normal thing that endpoints keep ip addresses around if a pod goes down? What will the linkerd proxy do in this case?

Secondly it seems "dirty" to return a podset from the function if no pods exists. Maybe the code should use another abstraction.

Also if I need to write more tests please let me know.

I have had very limited time to setup the project, code and test. So I accept if this change needs to be scrapped after review due to my lack of understanding the codebase.

@olix0r olix0r requested review from adleong and ihcsim September 24, 2019 14:32
@olix0r
Copy link
Copy Markdown
Member

olix0r commented Sep 24, 2019

Thanks @JohannesEH. We really appreciate you taking the time to submit a fix and for providing helpful context about the change!

Copy link
Copy Markdown
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JohannesEH! This change looks like the correct behavior to me. We trust the Kubernetes endpoints API to be the authoritative source of truth on what endpoints exist. So whenever an endpoint (be it a pod, or otherwise) is deleted, the Linkerd proxy will remove that IP from its load balancer.

I'll do a bit of manual testing to be sure, but this all looks good to me.

@adleong
Copy link
Copy Markdown
Member

adleong commented Sep 25, 2019

I'm getting this stack trace in the destination controller when I try this:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb0 pc=0x13feefa]

goroutine 288 [running]:
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).toWeightedAddr(0xc000969630, 0xc000427300, 0xc, 0x50, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0004d1f40, ...)
	/linkerd-build/controller/api/destination/endpoint_translator.go:123 +0x3a
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add(0xc000969630, 0xc00049c5d0)
	/linkerd-build/controller/api/destination/endpoint_translator.go:50 +0x1c5
github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).subscribe(0xc0004ae340, 0x1b1d100, 0xc000969630)
	/linkerd-build/controller/api/destination/watcher/endpoints_watcher.go:501 +0x68
github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).subscribe(0xc000422480, 0xc000000050, 0x0, 0x0, 0x1b1d100, 0xc000969630)
	/linkerd-build/controller/api/destination/watcher/endpoints_watcher.go:322 +0xda
github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).Subscribe(0xc0004064e0, 0xc0005806d2, 0x5, 0xc0005806c0, 0x11, 0xc000000050, 0x0, 0x0, 0x1b1d100, 0xc000969630, ...)
	/linkerd-build/controller/api/destination/watcher/endpoints_watcher.go:158 +0x290
github.com/linkerd/linkerd2/controller/api/destination.(*server).Get(0xc0003a8a20, 0xc0009694f0, 0x1b4a800, 0xc0005916d0, 0x0, 0x0)
	/linkerd-build/controller/api/destination/server.go:123 +0x70e
github.com/linkerd/linkerd2-proxy-api/go/destination._Destination_Get_Handler(0x16aaf00, 0xc0003a8a20, 0x1b42280, 0xc0006a4ec0, 0xc0009694a0, 0x20)
	/go/pkg/mod/github.com/linkerd/linkerd2-proxy-api@v0.1.9/go/destination/destination.pb.go:1823 +0x109
github.com/grpc-ecosystem/go-grpc-prometheus.StreamServerInterceptor(0x16aaf00, 0xc0003a8a20, 0x1b424c0, 0xc0002de3c0, 0xc0006a4d60, 0x192c7d0, 0x1b30440, 0xc000329c20)
	/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v0.0.0-20160910222444-6b7015e65d36/server.go:40 +0xe3
google.golang.org/grpc.(*Server).processStreamingRPC(0xc0000ad980, 0x1b55a20, 0xc000766480, 0xc0004aa900, 0xc0004069c0, 0x294db00, 0x0, 0x0, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.22.0/server.go:1209 +0x462
google.golang.org/grpc.(*Server).handleStream(0xc0000ad980, 0x1b55a20, 0xc000766480, 0xc0004aa900, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.22.0/server.go:1282 +0xd3f
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc000322710, 0xc0000ad980, 0x1b55a20, 0xc000766480, 0xc0004aa900)
	/go/pkg/mod/google.golang.org/grpc@v1.22.0/server.go:717 +0x9f
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/grpc@v1.22.0/server.go:715 +0xa1

@JohannesEH
Copy link
Copy Markdown
Contributor Author

I've tried getting a linkerd test environment up and running with minikube on my windows machine through wsl, but had no luck getting it to work. So I'm not sure if I can take this change any further as I would essentially be coding "in the dark".

@ihcsim
Copy link
Copy Markdown
Contributor

ihcsim commented Oct 2, 2019

Looks like the panic is caused by an Address struct built without the Pod field, as part of the new conditional added in this PR. And the panic happens inside the toWeightedAddr function.

It seems to me that we shouldn't be calling the toWeightedAddr function for endpoints (addresses) without an associated pods. @adleong does this sound right?

@JohannesEH The following diff works for me, when tested with a headless service. Can you verify if it works for you?

diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go
index 9ed2087e..5e337163 100644
--- a/controller/api/destination/endpoint_translator.go
+++ b/controller/api/destination/endpoint_translator.go
@@ -47,7 +47,23 @@ func newEndpointTranslator(
 func (et *endpointTranslator) Add(set watcher.PodSet) {
 	addrs := []*pb.WeightedAddr{}
 	for _, address := range set {
-		wa, err := et.toWeightedAddr(address)
+		var (
+			wa  *pb.WeightedAddr
+			err error
+		)
+		if address.Pod != nil {
+			wa, err = et.toWeightedAddr(address)
+		} else {
+			// handling address with no associated pod, such as those
+			// belonging to headless services
+			var addr *net.TcpAddress
+			addr, err = et.toAddr(address)
+			wa = &pb.WeightedAddr{
+				Addr:   addr,
+				Weight: defaultWeight,
+			}
+		}
+
 		if err != nil {
 			et.log.Errorf("Failed to translate endpoints to weighted addr: %s", err)
 			continue
diff --git a/controller/api/destination/watcher/endpoints_watcher.go b/controller/api/destination/watcher/endpoints_watcher.go
index c3095fd1..c145fd12 100644
--- a/controller/api/destination/watcher/endpoints_watcher.go
+++ b/controller/api/destination/watcher/endpoints_watcher.go
@@ -421,7 +421,7 @@ func (pp *portPublisher) endpointsToAddresses(endpoints *corev1.Endpoints) PodSe
 				continue
 			}
 			if endpoint.TargetRef == nil {
-				id := PodID{
+				id := ServiceID{ // I think using ServiceID here makes it clearer that we aren't dealing with pods
 					Name: strings.Join([]string{
 						endpoints.ObjectMeta.Name,
 						endpoint.IP,

YAML I used for testing:

apiVersion: v1
kind: Service
metadata:
  name: some-external-svc
spec:
  clusterIP: None
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
---
apiVersion: v1
kind: Endpoints
metadata:
  name: some-external-svc
subsets:
- addresses:
  - ip: 52.86.78.202 # postman-echo.com
  ports:
  - port: 80

Using a curl container:

✗ k run curl --image=appropriate/curl --restart=Never --generator=run-pod/v1 --command -- sleep 3600 
✗ k exec -it curl -- /bin/sh 
/ # curl -4 -v "some-external-svc/get?foo1=bar1&foo2=bar2"
*   Trying 52.86.78.202...
* TCP_NODELAY set
* Connected to some-external-svc (52.86.78.202) port 80 (#0)
> GET /get?foo1=bar1&foo2=bar2 HTTP/1.1
> Host: some-external-svc
> User-Agent: curl/7.59.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json; charset=utf-8
< date: Wed, 02 Oct 2019 20:55:58 GMT
< etag: W/"127-b7yb7plbgUmi0PtqWNltq5LT/yI"
< server: nginx
< set-cookie: sails.sid=s%3AWhDZkwCRgiSzOwXrPCyiN1zWep-ORrIM.osc4%2BVy%2Fe0XNXhmsSWGSCTnTwoJ1BtI58DVbUrU4nQg; Path=/; HttpOnly
< vary: Accept-Encoding
< content-length: 295
<
* Connection #0 to host some-external-svc left intact
{"args":{"foo1":"bar1","foo2":"bar2"},"headers":{"x-forwarded-proto":"https","host":"some-external-svc","accept":"*/*","l5d-dst-canonical":"some-external-svc.temp.svc.cluster.local:80","user-agent":"curl/7.59.0","x-forwarded-port":"80"},"url":"https://some-external-svc/get?foo1=bar1&foo2=bar2"}/

…s not referencing a pod).

Changed endpoints_watcher to also return endpoints with no targetref.

Fixes #3308

Signed-off-by: Johannes Hansen <johannesh1980@gmail.com>
@JohannesEH
Copy link
Copy Markdown
Contributor Author

JohannesEH commented Oct 3, 2019

@ihcsim thanks!

I finally got the test environment working on my windows machine. Basically I had some problems because I fetched the source on windows leading to some bash scripts having CRLF line endings instead of LF (Maybe I should add a note contributing.md to clarify the gotchas on windows). Also, the hyper-v minikube setup script is running kubernetes version 1.8.0 and should be updated to 1.12+. Finally the mkube script is looking for minikube in "Program Files (x86)" instead of "Program Files" which was the minikube install folder on my machine. With all that out of the way I was able to build and run the proxy.

I applied your diff and ran some tests. It works in the simple case with port & targetPort being the same, but not with port remapping (i.e. port != targetPort). I'm not sure how (or if) this should be handled by the linkerd proxy? Maybe it's handled by kube_proxy?

@JohannesEH
Copy link
Copy Markdown
Contributor Author

JohannesEH commented Oct 3, 2019

@ihcsim I also have some problems getting port remapping on headless services to work without linkerd so I'm not sure the issues lies with the proxy.

According to this https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-mapping-external-services (scenario 3) it should work.

It might be my minikube that have some problem.

Sorry, got it to work now. With and without the linkerd proxy. I'll merge the changes.

Signed-off-by: Johannes Hansen <johannesh1980@gmail.com>
@ihcsim
Copy link
Copy Markdown
Contributor

ihcsim commented Oct 3, 2019

@JohannesEH Have you tried Docker Desktop for Windows? I don't have much experience developing on Windows, but wonder if you will have better luck with Docker Desktop.

@JohannesEH
Copy link
Copy Markdown
Contributor Author

@ihcsim yup I'm using that. However all bash build scripts needs something like wsl to execute. But it wasn't that bad once I got I working, it just took some time. But honestly my plan is to redo my work machine and setup dual boot with windows and some Linux distro. I would much rather do k8s/docker/go development on Linux.

@JohannesEH JohannesEH requested a review from adleong October 4, 2019 07:12
@JohannesEH
Copy link
Copy Markdown
Contributor Author

@adleong or @ihcsim Will you review/merge this after the 2.6 release or is there something more I can do to move this along?

@ihcsim
Copy link
Copy Markdown
Contributor

ihcsim commented Oct 9, 2019

@JohannesEH Yeah, this PR will go into 2.7. btw, I haven't had a chance to look into the port != targetPort case you mentioned. Have you had a chance to investigate further.

@JohannesEH
Copy link
Copy Markdown
Contributor Author

@ihcsim Cool. :) Regarding the error I reported with port remapping I wasn't able to repro the problem in my following tests. So I wrote of the initial test result as a fluke.

Copy link
Copy Markdown
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐ 🚫 💇‍♂

This looks great to me! Thanks!

Copy link
Copy Markdown
Contributor

@ihcsim ihcsim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohannesEH LGTM! Thanks again for the PR 👍!

If you haven't done so, please fill up this form so that we can send you some awesome Linkerd swags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can't connect to headless service

4 participants