Conversation

@davidkoski
Collaborator

Support for mlx-swift 0.29.1. In particular, this contains updates for the changes in quantization.

@davidkoski davidkoski requested a review from awni October 13, 2025 20:32
Package.swift Outdated

      ],
      dependencies: [
-         .package(url: "https://github.com/ml-explore/mlx-swift", .upToNextMinor(from: "0.25.5")),
+         .package(url: "https://github.com/ml-explore/mlx-swift", branch: "mlx-0291"),
Collaborator Author

For now just point to the branch.

static public let gpt_oss_20b_MXFP4_Q8 = ModelConfiguration(
id: "mlx-community/gpt-oss-20b-MXFP4-Q8",
defaultPrompt: "Why is the sky blue?"
)
Collaborator Author

The MXFP4 quantization is now supported. This model was used to test that and the quantized kvcache.
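For context, MXFP4 packs each element as a 4-bit FP4 (E2M1) value, with a shared power-of-two scale for each group of 32 elements. A minimal sketch of that decoding arithmetic (the format only, not the MLX implementation; names are hypothetical):

```swift
import Foundation

// Sketch of MXFP4-style decoding, assuming the standard FP4 (E2M1)
// value set and a shared power-of-two scale per group. Not the MLX code.
let fp4Magnitudes: [Float] = [0, 0.5, 1, 1.5, 2, 3, 4, 6]

func decodeFP4(code: UInt8, groupScaleExponent: Int) -> Float {
    let sign: Float = (code & 0b1000) != 0 ? -1 : 1
    let magnitude = fp4Magnitudes[Int(code & 0b0111)]
    // every element in a group of 32 shares the scale 2^exponent
    return sign * magnitude * powf(2, Float(groupScaleExponent))
}
```

Note that MXFP4 groups carry only a scale, no bias, which is why optional biases show up throughout the rest of this PR.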

  class QuantizedSwitchLinear: SwitchLinear, Quantized {
      @ModuleInfo(key: "scales") var scales: MLXArray
-     @ModuleInfo(key: "biases") var biases: MLXArray
+     @ModuleInfo(key: "biases") var biases: MLXArray?
Collaborator Author

Biases are now optional.
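The shape of that change, sketched with plain Swift types (hypothetical names, not the MLX API): affine dequantization treats a missing bias as zero, which is what MXFP4-style formats need.

```swift
// Hypothetical sketch: a quantized group whose bias may be absent.
struct QuantizedGroup {
    var codes: [Float]   // decoded integer codes for one group
    var scale: Float
    var bias: Float?     // nil for formats that store no biases
}

func dequantize(_ group: QuantizedGroup) -> [Float] {
    // value = code * scale + bias, with a missing bias acting as 0
    let bias = group.bias ?? 0
    return group.codes.map { $0 * group.scale + bias }
}
```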

      public let bits: Int
-     public var quantMethod: String? = nil
-     public var linearClass: String? = nil
-     public var quantizationMode: String? = nil
Collaborator Author

These were defined so that they could be skipped (below). We can just skip them directly.


// additional keys that are not layer instructions, see
// mlx-community/bitnet-b1.58-2B-4T-4bit
case "quant_method", "linear_class", "quantization_mode": continue
Collaborator Author

Skip directly.

  /// - Returns: Quantized tuples (keys, values) as ((weight, scales, biases), (weight, scales, biases))
  func updateQuantized(keys: MLXArray, values: MLXArray) -> (
-     (MLXArray, MLXArray, MLXArray), (MLXArray, MLXArray, MLXArray)
+     (MLXArray, MLXArray, MLXArray?), (MLXArray, MLXArray, MLXArray?)
Collaborator Author

Support for optional biases.

case groupSize = "group_size"
case bits
}
}
Collaborator Author

We can just use the standard configuration for this.

Comment on lines -460 to 462

  let fullPath = "language_model.\(path)"
- if weights["\(fullPath).scales"] != nil && weights["\(fullPath).biases"] != nil
+ if weights["\(fullPath).scales"] != nil
+     && weights["\(fullPath).weight"]?.dtype == .uint32
Collaborator Author

Biases are now optional -- handle that case. I am not sure this is strictly required, as the load() method handles it, though not with this exact logic.
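A self-contained sketch of that predicate, with hypothetical `DType`/`Tensor` stand-ins for the real weight dictionary: a layer counts as quantized when it has scales and a packed-uint32 weight, without requiring biases.

```swift
// Hypothetical stand-ins for the weight-sanitization check above.
enum DType { case float16, uint32 }
struct Tensor { var dtype: DType }

func isQuantized(_ weights: [String: Tensor], path: String) -> Bool {
    // scales must exist and the weight must be packed uint32;
    // biases are no longer required
    weights["\(path).scales"] != nil
        && weights["\(path).weight"]?.dtype == .uint32
}
```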

@awni (Member) left a comment

Looks great. You added mode support in the QKV cache which is a step beyond mlx-lm 😄

- handle changes in quantization
@davidkoski davidkoski merged commit 9bff95c into main Oct 16, 2025
2 checks passed
@davidkoski davidkoski deleted the mlx-0291 branch October 16, 2025 18:27