Description
The bottom line is, I want to implement an orchestration semaphore using a Durable Entity that limits the number of concurrent calls to a certain resource provider (in a real-world function app, this will limit the number of concurrent LLM calls).
In the demo app I attached, I added a REST function that can be called with a POST request (e.g. http://localhost:7238/api/start_orchestration/123456). This starts an orchestration of type MainOrchestrator. That orchestration starts some page processing activities in parallel (but batched to a certain limit). When these have all completed, 2 orchestrations of type SubOrchestrator are started. These also start some text processing activities in parallel (again batched to a certain limit).
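For readers without the attachment, the "parallel but batched" fan-out described above can be pictured with the following sketch. It uses plain Python `asyncio` rather than the Durable Functions API, and it assumes that "batched" means each batch is awaited in full before the next one starts; the demo solution may batch differently. The names `run_in_batches`, `worker`, and `batch_size` are hypothetical.

```python
import asyncio


async def run_in_batches(items, worker, batch_size):
    """Run worker(item) for every item, at most batch_size at a time.

    Assumption: a batch must finish completely before the next one starts.
    """
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Fan out one batch and wait for every call in it to complete.
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results
```

In an orchestrator, the equivalent step would schedule activity calls and await them all, which is exactly the point where the orchestrations in this report stay stuck.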
Each activity, i.e. each page processing activity and each text processing activity, is guarded by an orchestration semaphore (cf. helper class GlobalLlmLimiterSemaphore, which limits the maximum number of concurrent activities to 100).
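To make the intended semantics concrete, here is a minimal model of the bookkeeping a semaphore entity of this kind needs. It is an illustrative Python sketch, not the code of GlobalLlmLimiterSemaphore from the demo solution: the class `SemaphoreState`, its fields, and the `acquire`/`release` operations are all hypothetical names, and signaling the waiting orchestration is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SemaphoreState:
    """Hypothetical state a semaphore entity could persist."""
    max_concurrent: int
    holders: set = field(default_factory=set)      # requesters holding a slot
    waiters: deque = field(default_factory=deque)  # requesters queued in FIFO order

    def acquire(self, requester_id: str) -> bool:
        """Grant a slot if one is free; otherwise queue the requester."""
        if requester_id in self.holders:
            return True  # idempotent: a repeated request must not take a second slot
        if len(self.holders) < self.max_concurrent:
            self.holders.add(requester_id)
            return True
        if requester_id not in self.waiters:
            self.waiters.append(requester_id)
        return False

    def release(self, requester_id: str) -> list:
        """Free a slot and return the queued requesters that now get one."""
        self.holders.discard(requester_id)
        granted = []
        while self.waiters and len(self.holders) < self.max_concurrent:
            next_id = self.waiters.popleft()
            self.holders.add(next_id)
            granted.append(next_id)
        return granted
```

Every requester returned by `release` would have to be notified (for example with an external event) so that its orchestration can continue. If such a notification is lost or never sent, the waiting orchestration would never resume.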
The problem is this: I often notice, both in local dev and on Azure, that the orchestrations get stuck awaiting the completion of activities, even though all activities DO complete and there are no exceptions of any kind.
It seems that, at a certain point, the completion of activities no longer triggers a replay of the orchestration function.
Can someone explain what I am doing wrong, or is this a bug?
The demo solution can be downloaded from https://www.dropbox.com/scl/fi/0l9abvmnec4r5i4bc3e3s/OrchestrationSemaphore.zip?rlkey=hsi4xq9wu0z3jr8aadyf47328&dl=0
I have also attached a screenshot from the Visual Studio Code Durable Functions extension that shows how one SubOrchestrator completed and another did not. Since one SubOrchestrator never completes, the MainOrchestrator never completes either: https://www.dropbox.com/scl/fi/wyxawmvtvehwdnsmeru5y/2026-03-19_08h42_40.png?rlkey=72q3v6xzgt8obfl221skb1wvk&dl=0
Note that 2 runs with the MainOrchestrator and 2 SubOrchestrators ran without any issues; the 3rd run got stuck. The fact that the main and sub orchestrations can run successfully already rules out some possible causes.