
Simple Broker does not become Ready, 502 #2107

Closed
LeonardAukea opened this issue Apr 20, 2022 · 16 comments · Fixed by #2112
Labels
area/broker Kafka Broker related issues area/channel Kafka Channel related issues area/control-plane area/sink KafkaSink related issues kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)

Comments

LeonardAukea commented Apr 20, 2022

Hi all, we have a weird issue with a simple Broker not becoming ready:

kn broker list
NAME                             URL   AGE   CONDITIONS   READY   REASON
onboarding-ci-kn-kafka-cluster         13m   4 OK / 7     False   ProbeStatus : status: NotReady

Looking at the logs from kafka-controller, we see that the probe fails with a 502 Bad Gateway:

kafka-controller-7bc844bb6b-x4frd controller {"level":"debug","ts":"2022-04-20T11:19:23.200Z","logger":"kafka-broker-controller","caller":"broker/broker.go:223","msg":"Updated dispatcher pod annotation","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","knative.dev/controller":"knative.dev.eventing-kafka-broker.control-plane.pkg.reconciler.broker.Reconciler","knative.dev/kind":"eventing.knative.dev.Broker","knative.dev/traceid":"eae1957d-361e-4364-a22b-afa2af7241a2","knative.dev/key":"knative-eventing/onboarding-ci-kn-kafka-cluster","action":"reconcile","uuid":"88c7538e-5722-4d83-88ea-bdf01191af7d"}
kafka-controller-7bc844bb6b-x4frd controller {"level":"info","ts":"2022-04-20T11:19:23.200Z","logger":"kafka-broker-controller","caller":"controller/controller.go:543","msg":"Reconcile succeeded","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","knative.dev/controller":"knative.dev.eventing-kafka-broker.control-plane.pkg.reconciler.broker.Reconciler","knative.dev/kind":"eventing.knative.dev.Broker","knative.dev/traceid":"eae1957d-361e-4364-a22b-afa2af7241a2","knative.dev/key":"knative-eventing/onboarding-ci-kn-kafka-cluster","duration":0.337743497}
kafka-controller-7bc844bb6b-x4frd controller {"level":"debug","ts":"2022-04-20T11:19:23.200Z","logger":"kafka-broker-controller","caller":"prober/prober.go:63","msg":"Sending probe request","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","scope":"prober","pod.metadata.name":"kafka-broker-receiver-7549f88579-jcmk6","address":"http://100.101.218.144:8080/knative-eventing/onboarding-ci-kn-kafka-cluster"}
kafka-controller-7bc844bb6b-x4frd controller {"level":"info","ts":"2022-04-20T11:19:23.201Z","logger":"kafka-broker-controller","caller":"prober/prober.go:86","msg":"Resource not ready","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","scope":"prober","pod.metadata.name":"kafka-broker-receiver-7549f88579-jcmk6","address":"http://100.101.218.144:8080/knative-eventing/onboarding-ci-kn-kafka-cluster","statusCode":502}

So the IP belongs to the kafka-broker-receiver pod. To be honest, we have no clue what might be wrong here. It also seems weird to probe the pod directly instead of going through a service.

Expected behavior
The broker to become ready

To Reproduce
Steps to reproduce the behavior.

Knative release version
1.3.0
Additional context
Add any other context about the problem here such as proposed priority

@LeonardAukea LeonardAukea added the kind/bug Categorizes issue or PR as related to a bug. label Apr 20, 2022
LeonardAukea (Author) commented:

This started working when disabling istio-injection. However, it seems that Knative Eventing is now incompatible with Istio, since you are not allowed to connect via the pod IP directly.

https://knative.slack.com/archives/C02C56QF282/p1650453957212779

LeonardAukea (Author) commented:

@sel-vcc


pierDipi commented Apr 20, 2022

Hi @LeonardAukea,
probing pods directly gives the user the benefit of a consistent Ready condition. In particular, if you have 3 replicas of the receiver, each replica may learn about a new Broker being added to the system at a different time due to differing network latencies, etc. By probing the pods directly, we know when all of them are ready, and we can propagate that information to the top-level Broker Ready condition (which is 1 condition compared to N receiver replicas).

The probing happens from the kafka-controller to the kafka-broker-receiver (or kafka-sink-receiver) pods. Isn't it possible to allow this traffic, or perhaps to add kafka-controller to the Istio mesh?


sel-vcc commented Apr 20, 2022

Hi @pierDipi,
In this case both kafka-controller and kafka-broker-receiver pods were part of the istio mesh (injected with istio-proxy sidecar containers). The issue is that it is not possible to connect to a Pod IP address because there is no VirtualService to define the route.

I have confirmed this behaviour with Istio's sleep and httpbin samples deployed to a namespace with istio-injection enabled:

  • Curl the httpbin service domain name ✅
$ kubectl -n istio-test exec -it svc/sleep -c sleep -- curl -sS -D /dev/stderr -o /dev/null http://httpbin:8000/status/200
HTTP/1.1 200 OK
server: envoy
date: Wed, 20 Apr 2022 12:27:28 GMT
content-type: text/html; charset=utf-8
access-control-allow-origin: *
access-control-allow-credentials: true
content-length: 0
x-envoy-upstream-service-time: 36
  • Curl a httpbin Pod by its IP address ❌
$ kubectl -n istio-test exec -it svc/sleep -c sleep -- curl -sS -D /dev/stderr -o /dev/null http://100.106.34.197:8000/status/200
HTTP/1.1 502 Bad Gateway
date: Wed, 20 Apr 2022 12:27:09 GMT
server: envoy
content-length: 0


pierDipi commented Apr 21, 2022

I've created a patch in #2112. After the CI jobs run and are green, is anyone willing to test the patch with Istio and your setup? (I will give you custom manifests unless you want to build the project from source.)

pierDipi (Member) commented:

/cc @LeonardAukea @sel-vcc @markhulia


sel-vcc commented Apr 21, 2022

> I've created a patch in #2112, after CI jobs run and they are green, is anyone willing to test the patch with Istio and your setup (I will give you custom manifests unless you want to build the project from source code)?

Thanks @pierDipi, we can definitely test the patch.

I had a quick look through the PR and, if I have understood correctly, the probing is still based on the Pod IP addresses? We can certainly test the fix, but I'm fairly sure that we cannot connect to those IPs from within Istio. The reason is that the Envoy config provided by Istio is based on the k8s Service DNS address, which Envoy can resolve to an IP and match against an incoming request's authority. However, Envoy does not know about the relationship between the k8s Service and the set of Pods that back it, so it has no upstream config for those Pod IPs and returns a 502 response.

pierDipi (Member) commented:

Here's a gist with patched artifacts: https://gist.github.com/pierDipi/b584b0a9167dfeeffd0f934847c1dffa (you have to scroll a bit to find all the files you might need; probably it's only eventing-kafka-controller.yaml and eventing-kafka-broker.yaml).

@pierDipi pierDipi added area/control-plane triage/accepted Issues which should be fixed (post-triage) area/broker Kafka Broker related issues area/channel Kafka Channel related issues area/sink KafkaSink related issues labels Apr 21, 2022
LeonardAukea (Author) commented:

> Here's a gist with patched artifacts https://gist.github.com/pierDipi/b584b0a9167dfeeffd0f934847c1dffa (you have to scroll a bit to find all the files you might need, probably it's only eventing-kafka-controller.yaml, eventing-kafka-broker.yaml)

OK, so I guess it will work since the service is exposed via Kubernetes DNS, but I suppose it's advisable to use a VirtualService for kafka-broker-ingress.
@sel-vcc


sel-vcc commented Apr 21, 2022

> OK, so I guess it will work since service is exposed via Kubernetes DNS, but I guess it's advised to use VirtualService for kafka-broker-ingress.

Yes, it should work OK with a k8s Service. As I understand it, knative-eventing no longer has a dependency on Istio, so there is no VirtualService for kafka-broker-ingress.
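For anyone who does want to route mesh traffic to the ingress explicitly, a VirtualService might look roughly like this. This is entirely hypothetical and not shipped by knative-eventing: the resource name, host, and port are assumptions (8080 matches the receiver port seen in the probe logs above).

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: kafka-broker-ingress   # hypothetical name
  namespace: knative-eventing
spec:
  hosts:
    - kafka-broker-ingress.knative-eventing.svc.cluster.local
  http:
    - route:
        - destination:
            host: kafka-broker-ingress.knative-eventing.svc.cluster.local
            port:
              number: 8080     # assumed receiver port, matching the probe logs above
```

Note this only covers the Service DNS name; it would still not make bare Pod IPs routable, for the authority-matching reason described above.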


LeonardAukea commented Apr 22, 2022

I think we should go ahead and test this. Moreover, it makes sense for us to have knative-eventing within the Istio mesh, as these issues would otherwise propagate to Knative Serving, assuming those Knative Services are used as sinks, and all our knative-serving resources are within the mesh. WDYT @sel-vcc @markhulia?

Also, thanks for the quick fix @pierDipi. We will report back to verify that things work as expected.

@pierDipi pierDipi self-assigned this Apr 26, 2022

matzew commented Apr 29, 2022

Thanks for filing the issue and being willing to test the patch!

markhulia commented:

Hi @matzew ! Sorry for the late reply. The patch worked for us, thanks!


pierDipi commented May 4, 2022

Thank you!


matzew commented May 5, 2022

@markhulia Thanks for testing the patch!
