
Kafka trigger - recover broken function instances? #93

Closed
jeffhollan opened this issue Apr 19, 2019 · 1 comment
jeffhollan commented Apr 19, 2019

I'm not sure of the right fix for this. I was playing with the Kafka trigger again today; here's the cycle:

  1. Created a Kafka topic with a single partition.
  2. Deployed a function with KEDA. KEDA activated, and the first function instance locked the partition.
  3. KEDA kept scaling out (which is fine for now) until I had 4 instances. Only 1 was active (the first one). Once it caught up, KEDA scaled down to 1 instance.
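The behaviour in the steps above follows from how Kafka assigns partitions within a consumer group: a topic partition is consumed by at most one member, so surplus members sit idle. A rough sketch of a range-style assignor in plain Python (illustrative only, not the actual KEDA or librdkafka code; `range_assign` and the instance names are made up):

```python
# Minimal simulation of Kafka's range partition assignment: each consumer in a
# group gets a contiguous slice of partitions, so when there are fewer
# partitions than consumers, the surplus consumers are assigned nothing.
def range_assign(partitions, consumers):
    consumers = sorted(consumers)
    per = len(partitions) // len(consumers)
    extra = len(partitions) % len(consumers)
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

if __name__ == "__main__":
    # One partition, four function instances: only the first one gets work.
    print(range_assign([0], ["inst-1", "inst-2", "inst-3", "inst-4"]))
    # {'inst-1': [0], 'inst-2': [], 'inst-3': [], 'inst-4': []}
```

So three of the four instances never had anything to consume in the first place, regardless of which one KEDA kept around.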

However, at this point the instance that was left running was one of the additional instances that never got a partition lock. When I checked the logs for that function, it was more or less dead:

```
info: Host.General[0]
      Host lock lease acquired by instance ID '000000000000000000000000448490CC'.
fail: Host.Triggers.Kafka[0]
      kafka-cp-kafka-headless:9092/bootstrap: Failed to resolve 'kafka-cp-kafka-headless:9092': Temporary failure in name resolution (after 5298ms in state CONNECT)
fail: Host.Triggers.Kafka[0]
      1/1 brokers are down
```

I'm not sure whether I actually hit a reliability issue, or whether this was simply one of the instances that never had an available partition.
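For what it's worth, a "keep retrying" policy for the name-resolution failure above could look like the sketch below. This is illustrative only: `resolve_with_retry` and the injected `resolver` are hypothetical names, not anything in the Functions host or librdkafka.

```python
import time

# Sketch of a retry-with-exponential-backoff loop around hostname resolution,
# the kind of recovery a trigger could apply instead of giving up after the
# first "Temporary failure in name resolution". The resolver and sleep
# functions are injected so the policy is testable without live DNS.
def resolve_with_retry(resolver, host, attempts=5, base_delay=0.1, sleep=time.sleep):
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host)
        except OSError as exc:                  # socket.gaierror subclasses OSError
            last_error = exc
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise last_error
```

With a real resolver this would be called as e.g. `resolve_with_retry(lambda h: socket.getaddrinfo(h, 9092), "kafka-cp-kafka-headless")`.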

A few thoughts come to mind:

  1. Should the Kafka trigger keep retrying the connection if it fails? I assume the runtime doesn't do this in general?
  2. Should Kubernetes know that this function is in a dead state so it can apply CrashLoopBackOff and restart it? If so, is there an existing health probe we should be hooking up?

I realize this isn't really a KEDA issue, but I didn't know where else to put it.

/cc @ahmedelnably @fabiocav would be interested to get your thoughts here

@jeffhollan
Member Author

Apologies: these were actually the instances running on a virtual node that weren't able to resolve the DNS name. That said, I'm still interested in what we should do if a function instance gets into a bad state. This is likely the wrong repo for it, though.

preflightsiren pushed a commit to preflightsiren/keda that referenced this issue Nov 7, 2021