Azure customers can provision many Azure OpenAI endpoints, each with a limited Tokens per Minute (TPM) quota, and other vendors may impose similar limits in the future. Because SK can use only one endpoint at a time, a workload can exhaust that endpoint's token quota even while other endpoints still have capacity.
The idea is to extend the KernelBuilder or Kernel with an attribute named "EnableLoadBalancing". Currently, the Kernel's GetRequiredService returns only the last registered service. This can easily be changed to return a randomly selected service, which adds simple load balancing when multiple services are registered in the Kernel.
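A minimal sketch of the proposed selection logic follows, written in Python purely for illustration. `ServiceRegistry`, `enable_load_balancing`, `add_service`, and `get_required_service` are hypothetical names, not the actual SK API (the real Kernel is .NET and its service lookup differs). It assumes services are kept in registration order: with the flag off it mimics the current "last registered wins" behaviour, and with the flag on it picks a service uniformly at random.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ServiceRegistry(Generic[T]):
    """Hypothetical stand-in for the Kernel's service lookup (not the real SK API)."""

    def __init__(self, enable_load_balancing: bool = False,
                 rng: Optional[random.Random] = None) -> None:
        self.enable_load_balancing = enable_load_balancing
        self._services: List[T] = []
        # An injectable RNG makes the random selection reproducible in tests.
        self._rng = rng or random.Random()

    def add_service(self, service: T) -> None:
        # Services are stored in registration order.
        self._services.append(service)

    def get_required_service(self) -> T:
        if not self._services:
            raise LookupError("No service registered")
        if self.enable_load_balancing:
            # Uniform random choice spreads requests across all registered endpoints.
            return self._rng.choice(self._services)
        # Current behaviour described above: the last registered service is returned.
        return self._services[-1]


if __name__ == "__main__":
    endpoints = ["endpoint-a", "endpoint-b", "endpoint-c"]

    default = ServiceRegistry()
    for ep in endpoints:
        default.add_service(ep)
    print(default.get_required_service())  # endpoint-c

    balanced = ServiceRegistry(enable_load_balancing=True)
    for ep in endpoints:
        balanced.add_service(ep)
    print(balanced.get_required_service())  # any one of the three endpoints
```

Uniform random selection is the simplest option and needs no shared state, but it does not track how much of each endpoint's TPM quota has been used; a weighted or round-robin selector could be swapped in behind the same flag.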
I would be happy to take on the development myself.