feat: exact weight-byte accounting from safetensors metadata #53
Closed
Labels: area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers), area:models (Model architectures, weights, loading, metadata), priority:medium (Medium priority), status:done (Completed), type:enhancement (New features, capabilities, or significant additions)
Part of #52
Goal
Compute the byte-accurate model weight size before loading by reading the safetensors header, with the existing analytical estimate (`ModelProfile::total_param_bytes`) as a fallback.
Why
The most accurate weight number is what is actually on disk. `model.safetensors.index.json` carries `metadata.total_size` (the exact sum of all tensor bytes), and a single-file `model.safetensors` carries a header with each tensor's dtype, shape, and byte offsets. mlxcel already opens the index (`parse_shard_index` in `src/lib/mlxcel-core/src/weights.rs`) but extracts only the shard filenames and discards `total_size`. The analytical estimate (`build_profile_from_json`) is only accurate to within a few percent, and it currently lives only in the distributed path.
Scope / implementation
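The single-file path needs only the standard library once the header bytes are in hand. A minimal sketch, assuming the weight file is opened as any `std::io::Read`: it reads the 8-byte little-endian header length, then exactly that many JSON bytes, leaving the tensor data untouched, and computes per-tensor bytes as dtype itemsize × shape product. The function names, the dtype table (a common subset of safetensors dtypes), and the 100 MB header cap are assumptions for illustration, not mlxcel's API; turning the JSON into `(dtype, shape)` pairs (skipping the `__metadata__` key) is left to whichever JSON reader `weights.rs` already uses.

```rust
use std::io::{self, Read};

// Hypothetical sanity cap on the JSON header size (assumed 100 MB).
const MAX_HEADER_BYTES: u64 = 100_000_000;

/// Reads only the header: an 8-byte little-endian u64 length followed by
/// that many bytes of UTF-8 JSON. Tensor data after the header is not read.
fn read_header_json<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes);
    if len > MAX_HEADER_BYTES {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "safetensors header too large"));
    }
    let mut json = vec![0u8; len as usize];
    reader.read_exact(&mut json)?;
    String::from_utf8(json).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Bytes per element for a subset of safetensors dtype strings (assumed list).
/// Unknown dtypes return None so the caller can fall back to the estimate.
fn dtype_itemsize(dtype: &str) -> Option<u64> {
    match dtype {
        "BOOL" | "U8" | "I8" | "F8_E4M3" | "F8_E5M2" => Some(1),
        "U16" | "I16" | "F16" | "BF16" => Some(2),
        "U32" | "I32" | "F32" => Some(4),
        "U64" | "I64" | "F64" => Some(8),
        _ => None,
    }
}

/// One tensor's size: itemsize × product of the shape (a scalar has one element).
/// Checked arithmetic turns a corrupt header into None instead of overflowing.
fn tensor_bytes(dtype: &str, shape: &[u64]) -> Option<u64> {
    let elems = shape.iter().try_fold(1u64, |acc, &d| acc.checked_mul(d))?;
    elems.checked_mul(dtype_itemsize(dtype)?)
}

/// Total over all (dtype, shape) pairs parsed from the header; None if any
/// tensor has an unknown dtype or the sum overflows.
fn sum_tensor_bytes(tensors: &[(&str, &[u64])]) -> Option<u64> {
    tensors
        .iter()
        .try_fold(0u64, |acc, &(dtype, shape)| acc.checked_add(tensor_bytes(dtype, shape)?))
}
```

In this sketch any I/O error, unknown dtype, or overflow collapses to "no exact answer", which keeps the analytical estimate as the fallback rather than reporting a wrong number.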
- A helper (`weights::weight_footprint_bytes(model_dir) -> Option<u64>`) that:
  - returns `metadata.total_size` from `model.safetensors.index.json` when present;
  - otherwise, for a single-file `model.safetensors`, reads the 8-byte header length plus the JSON header and sums the per-tensor byte sizes (dtype itemsize × shape product) without loading tensor data;
  - returns `None` when neither is available, so the caller falls back to the analytical estimate.
- Build on the index parsing already in `weights.rs` rather than adding a second JSON reader.
Integration (required for completion; not a standalone helper)
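The fallback rule amounts to one `match` on the helper's `Option<u64>`. A sketch of how a caller such as the "Model size" line might choose and label its number, assuming the exact value comes from `weight_footprint_bytes` and the estimate from `ModelProfile::total_param_bytes`; `SizeSource`, `model_size_bytes`, `model_size_line`, and the output wording are hypothetical, not mlxcel's actual output.

```rust
/// Which path produced the reported weight size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SizeSource {
    Exact,
    Analytical,
}

/// Prefers exact bytes from safetensors metadata; uses the analytical
/// estimate only when the exact path returned None.
fn model_size_bytes(exact: Option<u64>, analytical: u64) -> (u64, SizeSource) {
    match exact {
        Some(bytes) => (bytes, SizeSource::Exact),
        None => (analytical, SizeSource::Analytical),
    }
}

/// Formats a hypothetical "Model size" line in GiB, marking estimates with "~".
fn model_size_line(exact: Option<u64>, analytical: u64) -> String {
    let (bytes, source) = model_size_bytes(exact, analytical);
    let gib = bytes as f64 / (1u64 << 30) as f64;
    match source {
        SizeSource::Exact => format!("Model size: {:.2} GiB (exact)", gib),
        SizeSource::Analytical => format!("Model size: ~{:.2} GiB (estimate)", gib),
    }
}
```

Returning the source alongside the number lets both `--recommend-quant` and the new estimator share one decision point while still telling the user whether the figure is exact.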
- Use the helper in `--recommend-quant` (`src/execution/quant_advisor.rs`) and in the new estimator (sub-issue D), so the analytical path is only a fallback.
- The "Model size" line printed by `--recommend-quant` must reflect exact bytes when a safetensors header is available.
Acceptance criteria
- Works for both sharded (`index.json` / `total_size`) and single-file (`model.safetensors` header) layouts;
- `None` triggers the analytical fallback path;
- … `ModelProfile::total_param_bytes` and ≤ the on-disk weight file size.
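The on-disk upper bound can be checked more tightly than against raw file sizes: a safetensors file is the 8-byte length, then the JSON header, then the tensor data, so exact tensor bytes can never exceed the files' data regions. A sketch of that check, assuming each weight file's length and declared header length are already known; `ShardOnDisk`, `data_region`, and `exact_within_disk` are hypothetical names, and only the on-disk bound is sketched here.

```rust
/// One weight file: its total length on disk and the header length stored
/// in its first 8 bytes.
struct ShardOnDisk {
    file_len: u64,
    header_len: u64,
}

/// Bytes left for tensor data after the 8-byte prefix and the JSON header;
/// None if the recorded header length does not fit in the file.
fn data_region(shard: &ShardOnDisk) -> Option<u64> {
    shard.file_len.checked_sub(8)?.checked_sub(shard.header_len)
}

/// Upper-bound check: exact weight bytes must fit in the files' data regions,
/// which is stricter than comparing against raw file sizes.
fn exact_within_disk(exact: u64, shards: &[ShardOnDisk]) -> bool {
    let available = shards
        .iter()
        .try_fold(0u64, |acc, s| acc.checked_add(data_region(s)?));
    match available {
        Some(bytes) => exact <= bytes,
        None => false, // malformed shard metadata fails the check
    }
}
```

For a single-file model the list holds one entry; for a sharded model it holds one entry per shard named in the index.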