feat: exact weight-byte accounting from safetensors metadata #53

@inureyes

Part of #52

Goal

Compute byte-accurate model weight size before load by reading the safetensors header, with the existing analytical estimate (ModelProfile::total_param_bytes) as a fallback.

Why

The most accurate weight number is what is actually on disk. model.safetensors.index.json carries metadata.total_size (the exact sum of all tensor bytes), and a single-file model.safetensors carries a header listing each tensor's dtype, shape, and byte offsets. mlxcel already opens the index (parse_shard_index in src/lib/mlxcel-core/src/weights.rs) but extracts only the shard filenames and discards total_size. The analytical estimate (build_profile_from_json) is accurate only to within a few percent and currently lives only in the distributed path.

Scope / implementation

  • Add a function (e.g. weights::weight_footprint_bytes(model_dir) -> Option<u64>) that:
    • prefers metadata.total_size from model.safetensors.index.json when present;
    • else, for a single model.safetensors, reads the 8-byte header length + JSON header and sums per-tensor byte sizes (dtype itemsize × shape product) without loading tensor data;
    • returns None when neither is available (caller falls back to analytical).
  • Reuse / extend the existing index parser in weights.rs rather than adding a second JSON reader.
  • Provide a single accessor in the estimation module so callers don't each pick between exact/analytical.
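
The single-file branch above can be sketched with the standard library alone. This is a minimal illustration, not mlxcel's code: `read_header_json`, `dtype_itemsize`, `tensor_bytes`, and the `MAX_HEADER_BYTES` cap are hypothetical names and values, and parsing the returned JSON into (dtype, shape) pairs is assumed to be handled by the existing parser in weights.rs.

```rust
use std::io::{self, Read};

/// Read the safetensors JSON header from the start of a reader.
/// Layout: an 8-byte little-endian u64 length N, then N bytes of UTF-8 JSON.
/// Tensor data after the header is never touched.
fn read_header_json<R: Read>(mut reader: R) -> io::Result<String> {
    let mut len_buf = [0u8; 8];
    reader.read_exact(&mut len_buf)?;
    let header_len = u64::from_le_bytes(len_buf);

    // Assumed sanity cap so a corrupt length cannot trigger a huge allocation.
    const MAX_HEADER_BYTES: u64 = 100 * 1024 * 1024;
    if header_len > MAX_HEADER_BYTES {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "header too large"));
    }

    let mut header = vec![0u8; header_len as usize];
    reader.read_exact(&mut header)?;
    String::from_utf8(header).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Item size in bytes for a safetensors dtype string; None for unknown dtypes.
fn dtype_itemsize(dtype: &str) -> Option<u64> {
    match dtype {
        "BOOL" | "U8" | "I8" | "F8_E4M3" | "F8_E5M2" => Some(1),
        "U16" | "I16" | "F16" | "BF16" => Some(2),
        "U32" | "I32" | "F32" => Some(4),
        "U64" | "I64" | "F64" => Some(8),
        _ => None,
    }
}

/// Bytes for one tensor: itemsize times the shape product, with overflow checks.
/// An empty shape is a scalar and counts as one element.
fn tensor_bytes(dtype: &str, shape: &[u64]) -> Option<u64> {
    let elems = shape.iter().try_fold(1u64, |acc, &d| acc.checked_mul(d))?;
    elems.checked_mul(dtype_itemsize(dtype)?)
}
```

Returning None for an unknown dtype or an overflowing shape lets the caller fall back to the analytical estimate instead of reporting a wrong "exact" number.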

Integration (required for completion — not a standalone helper)

  • Wire the exact footprint into the shared estimator used by --recommend-quant (src/execution/quant_advisor.rs) and the new estimator (sub-issue D), so the analytical path is only a fallback.
  • At minimum, --recommend-quant's "Model size" line must reflect exact bytes when a safetensors header is available.
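
One possible shape for the shared accessor is sketched below. The `WeightBytes` type and `resolve_weight_bytes` function are hypothetical and not part of the existing code. The sketch assumes the caller passes in the optional exact footprint and the analytical estimate, and it keeps track of which one was used so the "Model size" line can label the figure.

```rust
/// Where a weight-size figure came from, so callers can label it in output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WeightBytes {
    /// Read from safetensors metadata (index total_size or file header).
    Exact(u64),
    /// Derived from the analytical model profile.
    Estimated(u64),
}

impl WeightBytes {
    /// The byte count, regardless of how it was obtained.
    fn bytes(self) -> u64 {
        match self {
            WeightBytes::Exact(b) | WeightBytes::Estimated(b) => b,
        }
    }
}

/// Prefer the exact footprint; use the analytical estimate only as a fallback.
fn resolve_weight_bytes(exact: Option<u64>, analytical: u64) -> WeightBytes {
    match exact {
        Some(b) => WeightBytes::Exact(b),
        None => WeightBytes::Estimated(analytical),
    }
}
```

Keeping the exact-versus-estimated choice in one function means --recommend-quant and the new estimator cannot disagree about which figure they report.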

Acceptance criteria

  • Exact byte count returned for both sharded (index.json / total_size) and single-file (model.safetensors header) layouts; None triggers the analytical fallback path.
  • Unit tests with fixture headers / index JSON (sharded + single-file + missing).
  • One real-model smoke check: exact footprint is within a few % of ModelProfile::total_param_bytes and ≤ the on-disk weight file size.
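
The smoke-check condition can be written as a small predicate. The names `within_pct` and `smoke_check` and the tolerance parameter are assumptions for illustration; the issue does not fix a numeric threshold. The footprint should not exceed the on-disk size because a safetensors file also contains the 8-byte length prefix and the JSON header in addition to the tensor data.

```rust
/// True when `exact` differs from `estimate` by at most `tolerance_pct` percent
/// of `estimate`.
fn within_pct(exact: u64, estimate: u64, tolerance_pct: f64) -> bool {
    if estimate == 0 {
        return exact == 0;
    }
    let diff = exact.abs_diff(estimate) as f64;
    diff / estimate as f64 * 100.0 <= tolerance_pct
}

/// Smoke check: the exact footprint is close to the analytical estimate and
/// does not exceed the total size of the weight file(s) on disk.
fn smoke_check(exact: u64, estimate: u64, on_disk: u64, tolerance_pct: f64) -> bool {
    within_pct(exact, estimate, tolerance_pct) && exact <= on_disk
}
```

For a sharded model, `on_disk` would be the sum of the shard file sizes.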

Metadata

    Labels

      • area:core (mlxcel-core: MLX FFI, primitives, KV cache, layers)
      • area:models (Model architectures, weights, loading, metadata)
      • priority:medium (Medium priority)
      • status:done (Completed)
      • type:enhancement (New features, capabilities, or significant additions)
