Not sure of the right fix for this. I was playing with the Kafka trigger again today; here's the cycle:
Created a Kafka topic with a single partition.
Deployed a function with KEDA. KEDA activated it, and the first function instance locked the partition.
KEDA kept scaling out (which is fine for now) until I had 4 instances. Only 1 was active (the first one). Once it caught up, KEDA scaled down to 1 instance.
However, at this point the instance left remaining was one of the additional instances that never got a lock. Checking the logs for that function, it was more or less dead:
```
info: Host.General[0]
      Host lock lease acquired by instance ID '000000000000000000000000448490CC'.
fail: Host.Triggers.Kafka[0]
      kafka-cp-kafka-headless:9092/bootstrap: Failed to resolve 'kafka-cp-kafka-headless:9092': Temporary failure in name resolution (after 5298ms in state CONNECT)
fail: Host.Triggers.Kafka[0]
      1/1 brokers are down
```
I'm not sure if I really had a reliability issue, or if this was one of the instances that didn't have an available partition.
In my mind, a few thoughts:
Should the Kafka trigger keep retrying to connect if it fails? I assume the runtime in general doesn't do this? (See the retry sketch below for the kind of behavior I mean.)
Should Kubernetes know that this function instance is in a dead state so it can go into CrashLoopBackOff and restart it? If so, is there an existing health probe we should be hooking up? (See the probe sketch further below.)
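To make the first question concrete, here's the kind of retry loop I have in mind, sketched with the Python confluent-kafka client rather than the Functions extension itself (the topic and group names are made up for the example):

```python
# Sketch only: retry-with-backoff around a Kafka consumer, so a connection
# failure tears the consumer down and rebuilds it instead of dying silently.
import time
from confluent_kafka import Consumer, KafkaException

MAX_BACKOFF_S = 60  # cap on the delay between reconnect attempts

def consume_with_retry(bootstrap, topic, group):
    backoff = 1
    while True:
        consumer = Consumer({
            "bootstrap.servers": bootstrap,
            "group.id": group,
            "auto.offset.reset": "earliest",
        })
        try:
            consumer.subscribe([topic])
            while True:
                msg = consumer.poll(timeout=5.0)
                if msg is None:
                    continue  # nothing within the timeout; keep polling
                if msg.error():
                    raise KafkaException(msg.error())
                backoff = 1  # healthy again; reset the backoff window
                print(f"offset {msg.offset()}: {msg.value()}")
        except KafkaException as e:
            # e.g. "brokers are down" / name resolution failure:
            # back off and rebuild rather than sitting dead forever
            print(f"consumer error: {e}; reconnecting in {backoff}s")
        finally:
            consumer.close()
        time.sleep(backoff)
        backoff = min(backoff * 2, MAX_BACKOFF_S)

consume_with_retry("kafka-cp-kafka-headless:9092", "my-topic", "my-group")
```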
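And for the second question, a minimal sketch of the liveness endpoint I'm imagining. The `broker_ok` flag, port, and path are all illustrative assumptions on my part; as far as I know the host doesn't expose anything like this today:

```python
# Sketch: a health endpoint the host could expose so Kubernetes can restart
# a dead instance. Trigger code would clear `broker_ok` on "N/N brokers are down".
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

broker_ok = threading.Event()
broker_ok.set()  # assume healthy at startup

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and broker_ok.is_set():
            self.send_response(200)  # healthy: kubelet leaves the pod alone
        else:
            self.send_response(503)  # repeated failures -> restart, then CrashLoopBackOff
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), HealthHandler).serve_forever()
```

A livenessProbe in the pod spec pointed at /healthz would then restart the instance after repeated 503s instead of leaving it "more or less dead" like above.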
I realize this isn't really a KEDA issue, but I didn't know where else to put it.
Apologies - these were actually the instances running on a virtual node that weren't able to resolve the DNS name. That said, I'm still interested in what we should do if a function instance gets into a bad state. This is likely the wrong repo for it, though.
/cc @ahmedelnably @fabiocav would be interested to get your thoughts here