[RFE] Instantiating LlamaStackClient on each request is expensive #122

**Is your feature request related to a problem? Please describe.**

`lightspeed-stack` instantiates a `LlamaStackClient` on each and every incoming request.

Constructing the client is a very expensive operation when `use_as_library_client: true` is set.

It might be better to instantiate the client _once_ and keep it as a singleton for all operations.
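As a rough illustration of the idea, here is a minimal sketch of a lazily built, shared client. It is not taken from the `lightspeed-stack` code base: `ClientHolder`, `FakeExpensiveClient`, and `handle_request` are hypothetical names, and the fake client only stands in for whatever the real library-mode construction does.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ClientHolder(Generic[T]):
    """Build a client lazily on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._client: Optional[T] = None

    def get(self) -> T:
        # Double-checked locking: only the first caller pays the
        # construction cost, later callers get the cached instance.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._factory()
        return self._client


class FakeExpensiveClient:
    """Hypothetical stand-in for the real client; counts constructions."""

    instances = 0

    def __init__(self) -> None:
        FakeExpensiveClient.instances += 1


# Assumption: one module-level holder shared by all request handlers.
holder = ClientHolder(FakeExpensiveClient)


def handle_request() -> FakeExpensiveClient:
    # Each request reuses the same client instead of constructing a new one.
    return holder.get()
```

With something like this in place, the factory passed to the holder would be the only spot that knows how the client is configured, so the request handlers would not need to change if the construction details do.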

**Additional context**
We're using a stack configuration that loads external providers.

The runtime stack implementation is recreated for every request, which makes responses unnecessarily slow.

An alternative is to use an _external_ `llama-stack` service/server. However, that would require _us_ to maintain another Docker container image referencing the `llama-stack` libraries directly.

We previously ran our own `llama-stack` instance, but it led to (political) problems relating to _ownership_ of the `llama-stack` libraries when used in a _downstream_/product deployment.
