Native Litert-lm integration with local request queuing and KV cache management #5575

@mlnomadpy

Description

Is your feature request related to a specific problem?

Yes. When running multi-agent systems locally with Litert-lm, concurrent agents compete for limited local inference resources. The standard workaround today is to stand up an external serving layer (such as the lit serve method) and point the standard Gemini or LiteLlm classes at it. This adds unnecessary overhead and removes fine-grained control over local resources.

Describe the Solution You'd Like

A native LitertModel integration that extends the BaseLLM class and is optimized specifically for local orchestration. The two critical requirements are:

  1. Built-in Request Queuing: A mechanism to queue requests at the model-instance level, so that multiple local agents don't hit the local inference engine simultaneously and cause resource starvation.
  2. KV Cache Management: Exposed controls to manage the KV cache specifically for instruction-based agents to optimize context switching and memory usage.
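To make requirement 1 concrete, here is a minimal sketch of the queuing idea: a single worker thread drains a queue so that concurrent agents are serialized in front of the inference engine rather than contending for it. The `run_inference` callable is a hypothetical stand-in for the actual Litert-lm call, not a real API.

```python
import queue
import threading

class SerializedEngine:
    """Serializes concurrent generate() calls through one worker thread."""

    def __init__(self, run_inference):
        # run_inference is a placeholder for the real local inference call.
        self._run_inference = run_inference
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # Exactly one request reaches the engine at a time.
        while True:
            prompt, done = self._queue.get()
            try:
                done["result"] = self._run_inference(prompt)
            finally:
                done["event"].set()
                self._queue.task_done()

    def generate(self, prompt: str) -> str:
        # Blocks the calling agent until its request has been served.
        done = {"event": threading.Event(), "result": None}
        self._queue.put((prompt, done))
        done["event"].wait()
        return done["result"]
```

Agents from many threads can call `generate()` safely; the queue guarantees the engine never sees overlapping requests, which is the property the feature request asks for at the model-instantiation level.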

Impact on your work

This is critical for building robust, offline multi-agent setups. It lets developers run complex agent hierarchies locally without hitting out-of-memory (OOM) errors or maintaining a separate, heavy serving infrastructure alongside ADK.

Willingness to contribute

Yes. I already have a working implementation utilizing a custom BaseLLM subclass to handle the queueing and cache management, and I would be happy to submit a PR.


Describe Alternatives You've Considered

  1. Using the lit serve method to create a local endpoint and pointing a standard ADK cloud model class at it. This works but adds unnecessary latency, architectural complexity, and abstracts away direct cache control.
  2. Relying on the standard LiteLlm wrapper, which does not inherently solve the multi-agent concurrent queuing problem for constrained local resources.

Proposed API / Implementation

from google.adk.models import BaseLLM
import queue

class LiteRTModel(BaseLLM):
    def __init__(self, model_path: str, manage_kv_cache: bool = True):
        super().__init__()
        self.model_path = model_path
        self.manage_kv_cache = manage_kv_cache
        # Internal queue that serializes concurrent agent requests.
        self._request_queue = queue.Queue()
        # ... Litert-lm initialization ...

    # The generation override would pull requests off the queue one at a
    # time and swap KV cache contexts between differently instructed agents.
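For the KV cache side of the proposal, a hypothetical sketch of the bookkeeping follows: caches are keyed by each agent's system instruction and evicted LRU-style to bound memory. The `KVCacheManager` name and the placeholder cache payload are illustrative assumptions, not part of any existing Litert-lm or ADK API.

```python
from collections import OrderedDict

class KVCacheManager:
    """Keeps one KV cache per agent instruction, bounded by LRU eviction."""

    def __init__(self, max_entries: int = 4):
        self._caches = OrderedDict()
        self._max_entries = max_entries

    def acquire(self, instruction: str):
        # Reuse the cache for this instruction if one exists; otherwise
        # start fresh, evicting the least recently used entry if full.
        if instruction in self._caches:
            self._caches.move_to_end(instruction)
            return self._caches[instruction]
        if len(self._caches) >= self._max_entries:
            self._caches.popitem(last=False)  # evict LRU entry
        cache = {"instruction": instruction, "tokens": []}  # placeholder state
        self._caches[instruction] = cache
        return cache
```

Bounding the number of live caches is one way to achieve the context-switching and memory-usage control the request describes, without ever letting per-agent state grow unboundedly.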

Additional Context

This aligns with ADK's goal of being deployment-agnostic by making local-first, offline execution a first-class citizen for complex multi-agent workflows.
