prep for mlx-swift 0.29.1 #411
Conversation
Package.swift (outdated)
```diff
     ],
     dependencies: [
-        .package(url: "https://github.com/ml-explore/mlx-swift", .upToNextMinor(from: "0.25.5")),
+        .package(url: "https://github.com/ml-explore/mlx-swift", branch: "mlx-0291"),
```
For now just point to the branch.
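Once an mlx-swift 0.29.1 tag exists, the branch reference can presumably go back to a pinned version range. A minimal manifest sketch, assuming a future "0.29.1" tag (not released at the time of this PR):

```swift
// swift-tools-version:5.9
import PackageDescription

// Sketch: once an mlx-swift 0.29.1 tag exists, the temporary branch
// dependency can return to a version pin. "0.29.1" is an assumption here.
let package = Package(
    name: "Example",
    dependencies: [
        .package(url: "https://github.com/ml-explore/mlx-swift", .upToNextMinor(from: "0.29.1"))
    ]
)
```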
```diff
+    static public let gpt_oss_20b_MXFP4_Q8 = ModelConfiguration(
+        id: "mlx-community/gpt-oss-20b-MXFP4-Q8",
+        defaultPrompt: "Why is the sky blue?"
+    )
```
The MXFP4 quantization is now supported. This model was used to test that and the quantized KV cache.
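As a hedged sketch (not part of this diff), exercising the test model might look like the following; `LLMModelFactory.loadContainer` and `ModelContainer` are assumed from the surrounding codebase, and generation details are omitted:

```swift
import MLXLLM
import MLXLMCommon

// Sketch: load the MXFP4-quantized test model by id. Assumes the
// LLMModelFactory API from this repo; generation code is omitted.
func loadTestModel() async throws -> ModelContainer {
    let config = ModelConfiguration(
        id: "mlx-community/gpt-oss-20b-MXFP4-Q8",
        defaultPrompt: "Why is the sky blue?"
    )
    return try await LLMModelFactory.shared.loadContainer(configuration: config)
}
```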
```diff
 class QuantizedSwitchLinear: SwitchLinear, Quantized {
     @ModuleInfo(key: "scales") var scales: MLXArray
-    @ModuleInfo(key: "biases") var biases: MLXArray
+    @ModuleInfo(key: "biases") var biases: MLXArray?
```
Biases are now optional.
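A minimal sketch of the pattern (not the repo's actual `QuantizedSwitchLinear`): with an optional `@ModuleInfo` parameter, a checkpoint that ships no `biases` tensor (such as MXFP4 weights) loads without a missing-key error, while affine-quantized checkpoints still populate it.

```swift
import MLX
import MLXNN

// Sketch only: an optional @ModuleInfo parameter lets `biases`
// be absent from the checkpoint.
class ExampleQuantized: Module {
    @ModuleInfo(key: "scales") var scales: MLXArray
    @ModuleInfo(key: "biases") var biases: MLXArray?

    init(scales: MLXArray, biases: MLXArray? = nil) {
        self._scales.wrappedValue = scales
        self._biases.wrappedValue = biases
        super.init()
    }
}
```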
```diff
     public let bits: Int
-    public var quantMethod: String? = nil
-    public var linearClass: String? = nil
-    public var quantizationMode: String? = nil
```
These were defined so that they could be skipped (below). We can just skip them directly.
```diff
+            // additional keys that are not layer instructions, see
+            // mlx-community/bitnet-b1.58-2B-4T-4bit
+            case "quant_method", "linear_class", "quantization_mode": continue
```
Skip directly.
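A self-contained sketch of the skip-directly pattern in plain Codable (names are hypothetical, and per-layer values are simplified to `Bool`): when walking a quantization dictionary with dynamic per-layer keys, the known metadata keys are filtered in the loop instead of being modeled as stored properties.

```swift
import Foundation

// Hypothetical dynamic coding key for walking a JSON object
// whose keys are arbitrary per-layer paths.
struct DynamicKey: CodingKey {
    let stringValue: String
    let intValue: Int? = nil
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Sketch: decode per-layer flags, skipping metadata keys inline.
// See mlx-community/bitnet-b1.58-2B-4T-4bit for a config that carries them.
func decodePerLayer(from decoder: Decoder) throws -> [String: Bool] {
    let container = try decoder.container(keyedBy: DynamicKey.self)
    var layers: [String: Bool] = [:]
    for key in container.allKeys {
        switch key.stringValue {
        // additional keys that are not layer instructions
        case "quant_method", "linear_class", "quantization_mode":
            continue
        default:
            layers[key.stringValue] = try container.decode(Bool.self, forKey: key)
        }
    }
    return layers
}
```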
```diff
     /// - Returns: Quantized tuples (keys, values) as ((weight, scales, biases), (weight, scales, biases))
     func updateQuantized(keys: MLXArray, values: MLXArray) -> (
-        (MLXArray, MLXArray, MLXArray), (MLXArray, MLXArray, MLXArray)
+        (MLXArray, MLXArray, MLXArray?), (MLXArray, MLXArray, MLXArray?)
```
Support for optional biases.
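A small illustration of consuming the widened return type (hypothetical helper, not in the diff); with MXFP4-style quantization the third tuple element is expected to be nil:

```swift
import MLX

// Sketch: unpack one (weight, scales, biases) triple where biases may be nil.
func describe(_ q: (MLXArray, MLXArray, MLXArray?)) -> String {
    let (weight, scales, biases) = q
    if let biases {
        return "affine: weight \(weight.shape), scales \(scales.shape), biases \(biases.shape)"
    }
    return "scale-only: weight \(weight.shape), scales \(scales.shape)"
}
```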
```diff
-        case groupSize = "group_size"
-        case bits
-    }
-}
```
We can just use the standard configuration for this.
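For reference, a standard group-size/bits configuration of the kind the comment refers to might look like the following; the type name is illustrative, not necessarily the repo's actual declaration:

```swift
// Sketch of a shared quantization configuration with snake_case JSON keys,
// removing the need for per-model copies of these CodingKeys.
public struct QuantizationConfig: Codable, Sendable {
    public let groupSize: Int
    public let bits: Int

    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}
```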
```diff
         let fullPath = "language_model.\(path)"
-        if weights["\(fullPath).scales"] != nil && weights["\(fullPath).biases"] != nil
+        if weights["\(fullPath).scales"] != nil
+            && weights["\(fullPath).weight"]?.dtype == .uint32
```
Biases are now optional -- handle that case. I am not sure this is strictly required, since the load() method handles this, though not with this exact logic.
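The new condition in helper form (hypothetical function, mirroring the diff): the presence of scales plus a uint32-packed weight marks a layer as quantized, without depending on a biases tensor.

```swift
import MLX

// Sketch: detect a quantized layer from flattened weight paths.
func isQuantizedLayer(_ weights: [String: MLXArray], path: String) -> Bool {
    weights["\(path).scales"] != nil
        && weights["\(path).weight"]?.dtype == .uint32
}
```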
Looks great. You added mode support in the QKV cache, which is a step beyond mlx-lm 😄
- handle changes in quantization
Support for mlx-swift 0.29.1. In particular, this contains updates for the changes in quantization.