
Implement safe MLX KV reuse with scoped GPU cache eviction #147

Merged
mattt merged 4 commits into main from mattt/mlx-kv-cache on Mar 23, 2026


Conversation

@mattt (Owner) commented Mar 23, 2026

Alternative to #139

@mattt force-pushed the mattt/mlx-kv-cache branch from d8c6072 to 770e3bf on March 23, 2026 11:12
@mattt marked this pull request as ready for review on March 23, 2026 11:12
@mattt requested a review from Copilot on March 23, 2026 11:13
Copilot AI left a comment

Pull request overview

This PR extends MLXLanguageModel with session-scoped KV cache reuse and introduces a scoped GPU buffer-cache limit/eviction mechanism to improve multi-turn stability and reduce redundant prefill work in MLX-backed sessions.

Changes:

  • Add MLX-specific GenerationOptions customizations (KV cache sizing + quantization knobs) and map them into MLX generate parameters.
  • Implement per-LanguageModelSession KV cache storage/reuse plus a global “one active generation per session” gate and GPU cache-limit scoping.
  • Add tests for multi-turn behavior and cache-clearing safety; document new MLX configuration in the README.
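The second bullet above (per-session KV cache storage plus a "one active generation per session" gate) can be illustrated with a minimal sketch. This is not the code from this PR: `SketchKVCache`, `SessionCacheStore`, `SessionGateError`, and `prefill` are hypothetical names invented for illustration, and the "cache" here is only a list of token IDs rather than real MLX key/value tensors. The sketch assumes a session is identified by a string key and that an overlapping generation for the same session should be rejected rather than queued.

```swift
import Foundation

/// Hypothetical stand-in for a KV cache: only tracks which tokens were prefilled.
struct SketchKVCache {
    var tokens: [Int] = []
}

/// Hypothetical error for an overlapping generation on the same session.
enum SessionGateError: Error {
    case generationAlreadyActive
}

/// Hypothetical store that keeps one cache per session and allows at most
/// one active generation per session at a time.
final class SessionCacheStore {
    private let lock = NSLock()
    private var caches: [String: SketchKVCache] = [:]
    private var active: Set<String> = []

    /// Runs `body` with the session's cache. Throws if that session already
    /// has a generation in flight. The lock is not held while `body` runs.
    func withCache<T>(
        for sessionID: String,
        _ body: (inout SketchKVCache) throws -> T
    ) throws -> T {
        lock.lock()
        guard !active.contains(sessionID) else {
            lock.unlock()
            throw SessionGateError.generationAlreadyActive
        }
        active.insert(sessionID)
        var cache = caches[sessionID] ?? SketchKVCache()
        lock.unlock()

        defer {
            // Store the (possibly updated) cache and release the gate.
            lock.lock()
            caches[sessionID] = cache
            active.remove(sessionID)
            lock.unlock()
        }
        return try body(&cache)
    }

    /// Drops the stored cache for a session, forcing a full prefill next time.
    func evict(sessionID: String) {
        lock.lock()
        caches[sessionID] = nil
        lock.unlock()
    }
}

/// Returns how many prompt tokens still need prefill, given the cached prefix,
/// then marks the whole prompt as cached.
func prefill(_ prompt: [Int], into cache: inout SketchKVCache) -> Int {
    var shared = 0
    while shared < cache.tokens.count,
          shared < prompt.count,
          cache.tokens[shared] == prompt[shared] {
        shared += 1
    }
    cache.tokens = prompt
    return prompt.count - shared
}
```

In this sketch, a second turn whose prompt extends the first turn's tokens only pays for the new suffix, and calling `evict(sessionID:)` resets that session to a full prefill. A real implementation would also need to decide what to do with a cache left in a partial state when `body` throws; the sketch simply stores whatever state it ended in.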

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| Sources/AnyLanguageModel/Models/MLXLanguageModel.swift | Adds MLX custom options, session KV reuse, concurrency gate, and GPU cache-limit scoping/eviction. |
| Tests/AnyLanguageModelTests/MLXLanguageModelTests.swift | Adds multi-turn same-session test and cache-clear-then-respond test. |
| Tests/AnyLanguageModelTests/CustomGenerationOptionsTests.swift | Adds tests for MLXLanguageModel.CustomGenerationOptions initialization, integration, and Codable. |
| README.md | Documents MLX KV-cache tuning via GenerationOptions and GPU cache configuration via MLXLanguageModel init. |
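The "GPU cache-limit scoping/eviction" idea mentioned for MLXLanguageModel.swift can be sketched as a scoped-resource pattern: apply a temporary limit, run the work, and restore the previous limit on exit. The following is an assumption-laden illustration, not the PR's implementation and not the MLX API: `SketchBufferCache` and all of its members are hypothetical, and the cache is modeled as a simple byte counter.

```swift
/// Hypothetical stand-in for a GPU buffer cache; this is not the MLX API.
final class SketchBufferCache {
    private(set) var limitBytes: Int
    private(set) var cachedBytes: Int = 0

    init(limitBytes: Int) {
        self.limitBytes = limitBytes
    }

    /// Records freed buffer bytes as cached, keeping the total at or below the limit.
    func recycle(bytes: Int) {
        cachedBytes = min(cachedBytes + bytes, limitBytes)
    }

    /// Applies a new limit, evicts cached bytes above it, and returns the previous limit.
    @discardableResult
    func setLimit(_ newLimit: Int) -> Int {
        let previous = limitBytes
        limitBytes = newLimit
        cachedBytes = min(cachedBytes, newLimit)
        return previous
    }

    /// Runs `body` under a temporary limit and restores the previous limit
    /// afterwards, even if `body` throws.
    func withLimit<T>(_ scopedLimit: Int, _ body: () throws -> T) rethrows -> T {
        let previous = setLimit(scopedLimit)
        defer { setLimit(previous) }
        return try body()
    }
}
```

Using `defer` keeps the restore step on every exit path, which is the property that makes the limit "scoped". Note that in this sketch the bytes evicted on entry are not brought back when the limit is restored; only the limit itself returns to its previous value.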


Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.



Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



@mattt force-pushed the mattt/mlx-kv-cache branch from 27f0292 to 3cf7548 on March 23, 2026 12:43
@mattt merged commit 0392da5 into main on Mar 23, 2026 (3 checks passed)
@mattt deleted the mattt/mlx-kv-cache branch on March 23, 2026 13:03
