Implement safe MLX KV reuse with scoped GPU cache eviction by mattt · Pull Request #147 · mattt/AnyLanguageModel

mattt · 2026-03-23T10:36:34Z

Alternative to #139

Copilot

Pull request overview

This PR extends MLXLanguageModel with session-scoped KV cache reuse and introduces a scoped GPU buffer-cache limit/eviction mechanism to improve multi-turn stability and reduce redundant prefill work in MLX-backed sessions.

Changes:

- Add MLX-specific `GenerationOptions` customizations (KV cache sizing and quantization knobs) and map them into MLX generate parameters.
- Implement per-`LanguageModelSession` KV cache storage and reuse, plus a global “one active generation per session” gate and GPU cache-limit scoping.
- Add tests for multi-turn behavior and cache-clearing safety; document the new MLX configuration in the README.
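The session-scoped reuse and the per-session gate described above can be illustrated with a minimal sketch. It is not the code from this PR: `Session`, `SessionKVStore`, and every method name below are hypothetical stand-ins, token ids stand in for real KV tensors, and keying by `ObjectIdentifier` is an assumption made only to keep the example short (an identifier can be recycled once its object is deallocated, so real code would need to clean up entries).

```swift
import Foundation

// Hypothetical stand-in for LanguageModelSession.
final class Session {}

// Hypothetical store: remembers which tokens each session has already
// processed and allows only one active generation per session.
final class SessionKVStore {
    private let lock = NSLock()
    private var cachedTokens: [ObjectIdentifier: [Int]] = [:]
    private var active: Set<ObjectIdentifier> = []

    /// Marks the session as generating. Returns false if it already is.
    func begin(_ session: Session) -> Bool {
        lock.lock(); defer { lock.unlock() }
        return active.insert(ObjectIdentifier(session)).inserted
    }

    /// Ends the generation and records the tokens now covered by the cache.
    func end(_ session: Session, processed tokens: [Int]) {
        lock.lock(); defer { lock.unlock() }
        let id = ObjectIdentifier(session)
        active.remove(id)
        cachedTokens[id] = tokens
    }

    /// Number of leading prompt tokens that need no new prefill.
    /// A prompt that diverges from the cached tokens drops the entry.
    func reusablePrefixLength(for session: Session, prompt: [Int]) -> Int {
        lock.lock(); defer { lock.unlock() }
        let id = ObjectIdentifier(session)
        guard let cached = cachedTokens[id] else { return 0 }
        if prompt.starts(with: cached) { return cached.count }
        cachedTokens[id] = nil
        return 0
    }
}

let store = SessionKVStore()
let session = Session()

let firstBegin = store.begin(session)    // true: no generation running yet
let secondBegin = store.begin(session)   // false: gate rejects the overlap
store.end(session, processed: [1, 2, 3])

let reused = store.reusablePrefixLength(for: session, prompt: [1, 2, 3, 4, 5])
let afterMismatch = store.reusablePrefixLength(for: session, prompt: [9, 9])
print(firstBegin, secondBegin, reused, afterMismatch)  // true false 3 0
```

In this sketch only an exact prefix match is reused; anything else falls back to a full prefill, which is the conservative choice when the goal is safety rather than maximum reuse.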

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `Sources/AnyLanguageModel/Models/MLXLanguageModel.swift` | Adds MLX custom options, session KV reuse, concurrency gate, and GPU cache-limit scoping/eviction. |
| `Tests/AnyLanguageModelTests/MLXLanguageModelTests.swift` | Adds multi-turn same-session test and cache-clear-then-respond test. |
| `Tests/AnyLanguageModelTests/CustomGenerationOptionsTests.swift` | Adds tests for `MLXLanguageModel.CustomGenerationOptions` initialization, integration, and Codable. |
| `README.md` | Documents MLX KV-cache tuning via `GenerationOptions` and GPU cache configuration via `MLXLanguageModel` init. |
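The idea of scoping a GPU buffer-cache limit to a block of work can be sketched as follows. This is an illustration only and does not call MLX: `BufferCache` and its members are hypothetical, byte counts stand in for real GPU buffers, and the sketch assumes scopes do not overlap across threads (overlapping scopes would restore the wrong limit).

```swift
import Foundation

// Hypothetical model of a buffer cache with an adjustable byte limit.
final class BufferCache {
    private let lock = NSLock()
    private var limit: Int
    private var cachedBytes = 0

    init(limit: Int) { self.limit = limit }

    var currentLimit: Int {
        lock.lock(); defer { lock.unlock() }
        return limit
    }

    var currentCachedBytes: Int {
        lock.lock(); defer { lock.unlock() }
        return cachedBytes
    }

    /// Keeps a freed buffer for reuse unless it would exceed the limit.
    func recycle(bytes: Int) {
        lock.lock(); defer { lock.unlock() }
        if cachedBytes + bytes <= limit { cachedBytes += bytes }
    }

    /// Applies a new limit, evicts what no longer fits, returns the old limit.
    private func setLimit(_ newLimit: Int) -> Int {
        lock.lock(); defer { lock.unlock() }
        let old = limit
        limit = newLimit
        cachedBytes = min(cachedBytes, newLimit)
        return old
    }

    /// Runs `body` under a temporary limit and restores the previous limit
    /// afterwards, even if `body` throws.
    func withLimit<T>(_ scoped: Int, _ body: () throws -> T) rethrows -> T {
        let previous = setLimit(scoped)
        defer { _ = setLimit(previous) }
        return try body()
    }
}

let cache = BufferCache(limit: 1_000)
cache.recycle(bytes: 800)

// Inside the scope the cache is trimmed to the tighter limit.
let insideScope = cache.withLimit(256) { cache.currentCachedBytes }
print(insideScope, cache.currentLimit)  // 256 1000
```

Restoring the limit in a `defer` keeps the tighter setting from leaking into unrelated work when generation fails partway through.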

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

mattt mentioned this pull request Mar 23, 2026

Feature/mlx kv cache #139

Closed

mattt force-pushed the mattt/mlx-kv-cache branch from d8f85d7 to d8c6072 March 23, 2026 10:54

mattt mentioned this pull request Mar 23, 2026

Add additionalContext support to MLXLanguageModel #145

Open

Implement safe MLX KV reuse with scoped GPU cache eviction

770e3bf

mattt force-pushed the mattt/mlx-kv-cache branch from d8c6072 to 770e3bf March 23, 2026 11:12

mattt marked this pull request as ready for review March 23, 2026 11:12

mattt requested a review from Copilot March 23, 2026 11:13

Copilot started reviewing on behalf of mattt March 23, 2026 11:13

Copilot AI reviewed Mar 23, 2026

Incorporate feedback from review

3a2b9ea

mattt requested a review from Copilot March 23, 2026 11:25

Copilot started reviewing on behalf of mattt March 23, 2026 11:25

Copilot AI reviewed Mar 23, 2026

mattt requested a review from Copilot March 23, 2026 12:27

Copilot started reviewing on behalf of mattt March 23, 2026 12:28

Copilot AI reviewed Mar 23, 2026

Incorporate feedback from review

3cf7548

mattt force-pushed the mattt/mlx-kv-cache branch from 27f0292 to 3cf7548 March 23, 2026 12:43

Incorporate feedback from second round of review

38ed0df

mattt merged commit 0392da5 into main Mar 23, 2026
3 checks passed

mattt deleted the mattt/mlx-kv-cache branch March 23, 2026 13:03

mattt mentioned this pull request Mar 23, 2026

Fixes for Issue #112 concerning errors with tool calling with jinja templates and MLX #140

Closed

mattt merged 4 commits into main from mattt/mlx-kv-cache


2 participants
