
Kafka trigger - recover broken function instances? #93

Closed
jeffhollan opened this issue Apr 19, 2019 · 1 comment
jeffhollan commented Apr 19, 2019

I'm not sure of the right fix for this. I was playing with the Kafka trigger again today; here's the cycle:

  1. Created a Kafka topic with a single partition.
  2. Deployed a function with KEDA. KEDA activated, and the first function instance locked the partition.
  3. KEDA kept scaling out (which is fine for now) until I had 4 instances. Only 1 was active (the first one). Once it caught up, KEDA scaled down to 1 instance.
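The behaviour in the steps above follows from how Kafka assigns partitions within a consumer group: a topic partition is consumed by at most one member, so surplus members sit idle. A rough sketch of a range-style assignor in plain Python (illustrative only, not the actual KEDA or librdkafka code; `range_assign` and the instance names are made up):

```python
# Minimal simulation of Kafka's range partition assignment: each consumer in a
# group gets a contiguous slice of partitions, so when there are fewer
# partitions than consumers, the surplus consumers are assigned nothing.
def range_assign(partitions, consumers):
    consumers = sorted(consumers)
    per = len(partitions) // len(consumers)
    extra = len(partitions) % len(consumers)
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

if __name__ == "__main__":
    # One partition, four function instances: only the first one gets work.
    print(range_assign([0], ["inst-1", "inst-2", "inst-3", "inst-4"]))
    # {'inst-1': [0], 'inst-2': [], 'inst-3': [], 'inst-4': []}
```

So three of the four instances never had anything to consume in the first place, regardless of which one KEDA kept around.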

However, at this point the instance that was left running was one of the additional instances that never got a partition lock. When I checked the logs for that function, it was more or less dead:

```
info: Host.General[0]
      Host lock lease acquired by instance ID '000000000000000000000000448490CC'.
fail: Host.Triggers.Kafka[0]
      kafka-cp-kafka-headless:9092/bootstrap: Failed to resolve 'kafka-cp-kafka-headless:9092': Temporary failure in name resolution (after 5298ms in state CONNECT)
fail: Host.Triggers.Kafka[0]
      1/1 brokers are down
```

I'm not sure whether I actually hit a reliability issue, or whether this was simply one of the instances that never had an available partition.
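For what it's worth, a "keep retrying" policy for the name-resolution failure above could look like the sketch below. This is illustrative only: `resolve_with_retry` and the injected `resolver` are hypothetical names, not anything in the Functions host or librdkafka.

```python
import time

# Sketch of a retry-with-exponential-backoff loop around hostname resolution,
# the kind of recovery a trigger could apply instead of giving up after the
# first "Temporary failure in name resolution". The resolver and sleep
# functions are injected so the policy is testable without live DNS.
def resolve_with_retry(resolver, host, attempts=5, base_delay=0.1, sleep=time.sleep):
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host)
        except OSError as exc:                  # socket.gaierror subclasses OSError
            last_error = exc
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise last_error
```

With a real resolver this would be called as e.g. `resolve_with_retry(lambda h: socket.getaddrinfo(h, 9092), "kafka-cp-kafka-headless")`.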

A few thoughts come to mind:

  1. Should the Kafka trigger keep retrying the connection if it fails? I assume the runtime doesn't do this in general?
  2. Should Kubernetes know that this function is in a dead state so it can apply CrashLoopBackOff and restart it? If so, is there an existing health probe we should be hooking up?

I realize this isn't really a KEDA issue, but I didn't know where else to put it.

/cc @ahmedelnably @fabiocav would be interested to get your thoughts here

@jeffhollan
Member Author

Apologies: these were actually the instances running on a virtual node that weren't able to resolve the DNS name. That said, I'm still interested in what we should do if a function instance gets into a bad state. This is likely the wrong repo for it, though.

preflightsiren pushed a commit to preflightsiren/keda that referenced this issue Nov 7, 2021