Description
Is your feature request related to a problem? Please describe.
lightspeed-stack instantiates a new LlamaStackClient on every incoming request.
When use_as_library_client: true is set, constructing the client is a very expensive operation.
It would be better to instantiate the client once and reuse it as a singleton for all operations.
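A minimal sketch of the singleton idea, assuming the client can safely be shared across requests. `ClientHolder`, `FakeExpensiveClient` and `handle_request` are hypothetical names used only for illustration; they are not part of lightspeed-stack or llama-stack, and the real client construction would replace the stand-in factory.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ClientHolder(Generic[T]):
    """Build a client lazily on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._client: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: the client already exists, so no locking is needed.
        if self._client is None:
            with self._lock:
                # Re-check under the lock so concurrent first requests
                # do not each run the expensive factory.
                if self._client is None:
                    self._client = self._factory()
        return self._client


class FakeExpensiveClient:
    """Hypothetical stand-in for a client that is costly to construct."""

    constructions = 0  # counts how many times the constructor has run

    def __init__(self) -> None:
        FakeExpensiveClient.constructions += 1


# One holder for the whole process (assumption: a module-level
# instance is an acceptable place to keep the shared client).
holder = ClientHolder(FakeExpensiveClient)


def handle_request() -> FakeExpensiveClient:
    # Every request reuses the same client instead of building a new one.
    return holder.get()
```

With this shape, the expensive construction runs once on the first request, and later requests only pay for an attribute check.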
Additional context
We're using a stack configuration that loads external providers.
The runtime stack implementation is recreated for every request, which makes responses unnecessarily slow.
An alternative is to use an external llama-stack service/server; however, that would require us to maintain another Docker container image that references the llama-stack libraries directly.
We have previously run our own llama-stack instance, but it led to (political) problems over ownership of the llama-stack libraries when used in a downstream/product deployment.