[RFE] Instantiating LlamaStackClient on each request is expensive #122

@manstis

Is your feature request related to a problem? Please describe.

lightspeed-stack instantiates a new LlamaStackClient for every incoming request.

Constructing the client is a very expensive operation when use_as_library_client: true is set.

It might be better to instantiate the client once and keep it as a singleton for all operations.
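
To illustrate the singleton idea, here is a minimal sketch of a lazily built, thread-safe holder. It does not use the real LlamaStackClient API: `LazySingleton`, `FakeClient`, `client_holder` and `handle_request` are hypothetical names made up for this example, and `FakeClient` only stands in for whatever expensive construction lightspeed-stack performs today.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Builds a value once on first use and returns the same instance afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: the instance already exists, so no locking is needed.
        if self._instance is None:
            with self._lock:
                # Re-check inside the lock so only one thread runs the factory.
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


class FakeClient:
    """Hypothetical stand-in for the real client; counts how often it is built."""

    constructed = 0

    def __init__(self) -> None:
        FakeClient.constructed += 1


# Created once at import time; the client itself is built on the first get().
client_holder = LazySingleton(FakeClient)


def handle_request() -> FakeClient:
    # Every request reuses the one shared client instead of building a new one.
    return client_holder.get()
```

With something along these lines, the expensive construction would happen on the first request only. Whether the real client is safe to share across concurrent requests is an assumption that would need checking.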

Additional context
We're using a stack configuration that loads external providers.

The runtime stack implementation is recreated for every request, which makes responses unnecessarily slow.

An alternative is to use an external llama-stack service/server; however, that would require us to maintain another Docker container image that references the llama-stack libraries directly.

We previously ran our own llama-stack instance; however, it led to (political) problems relating to ownership of the llama-stack libraries when used in a downstream/product deployment.

Labels: enhancement (New feature or request)
