refactor: unify quantization config and integrate quant module into compiler

## Summary

The compiler module has its own internal quantization pipeline (calibrate → qdq stages) that duplicates the quant module's functionality with bugs and config mismatches. This needs to be unified.

## Key Findings

### 1. Compiler's internal quantization is broken
`CalibrateStage` collects calibration ranges via ORT's `MinMaxCalibrater`, but `QDQStage` ignores them entirely — it creates a `PrecomputedCalibrationReader` with all-ones dummy data and passes that to `quantize_static()`. The calibration work is wasted. The quant module's `get_qdq_config()` + `quantize()` does not have this bug.

### 2. Duplicate configs with different defaults
`QDQConfig` (compiler) and `WinMLQuantizationConfig` (quant) overlap on four fields, and the calibration sample count is configured separately in each module:
- `weight_type`: compiler defaulted to `int8`, quant defaults to `uint8` — **caused QNN EP to reject LayerNorm nodes** (30 CPU fallback partitions instead of 1 EPContext). Fixed in #232 but root cause is the duplication.
- `activation_type`, `per_channel`, `symmetric`: same semantics, different classes.
- `samples`: quant defaults to 10, compiler's `CalibrationConfig` defaults to 100.
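
A default drift like this can be caught mechanically. The sketch below is only an illustration: the three dataclasses are simplified stand-ins that carry just the fields whose defaults are stated above (with the compiler's pre-#232 `weight_type`), and `default_mismatches` is a hypothetical helper, not an existing function in either module.

```python
from dataclasses import dataclass, fields


@dataclass
class QDQConfig:
    """Stand-in for the compiler config (pre-#232 default)."""
    weight_type: str = "int8"


@dataclass
class CalibrationConfig:
    """Stand-in for the compiler's calibration config."""
    samples: int = 100


@dataclass
class WinMLQuantizationConfig:
    """Stand-in for the quant module config."""
    weight_type: str = "uint8"
    samples: int = 10


def default_mismatches(reference, *others):
    """Return {field: (reference_default, other_default)} for shared fields
    whose defaults differ from the reference class."""
    ref_defaults = {f.name: f.default for f in fields(reference)}
    mismatches = {}
    for other in others:
        for f in fields(other):
            if f.name in ref_defaults and f.default != ref_defaults[f.name]:
                mismatches[f.name] = (ref_defaults[f.name], f.default)
    return mismatches


drift = default_mismatches(WinMLQuantizationConfig, QDQConfig, CalibrationConfig)
print(drift)
```

With a single config class there is nothing left to compare, which is the point of Step 1 below.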

### 3. `compile.quantize=True` is misleading
`WinMLCompileConfig.to_dict()` emits `quantize: True` whenever `qdq_config` is not None. But `DetectStage` silently overrides this when Q/DQ nodes already exist (the normal build pipeline path). The flag appears to control behavior but is effectively a no-op in production.
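
The interaction can be modelled in a few lines. This is a simplified sketch, not the actual `DetectStage` code: both function names are hypothetical, and the model is reduced to a list of ONNX op types (`QuantizeLinear` and `DequantizeLinear` are the standard ONNX Q/DQ operators).

```python
QDQ_OPS = {"QuantizeLinear", "DequantizeLinear"}


def emitted_quantize_flag(qdq_config):
    """Mirror of the described to_dict() behaviour: True whenever a config exists."""
    return qdq_config is not None


def effective_quantize(requested, op_types):
    """Mirror of the described DetectStage override: an already-quantized
    model is never quantized again, whatever the flag says."""
    already_quantized = any(op in QDQ_OPS for op in op_types)
    return requested and not already_quantized


flag = emitted_quantize_flag({"weight_type": "uint8"})
# Normal build pipeline path: the quant stage has already inserted Q/DQ nodes.
print(flag, effective_quantize(flag, ["QuantizeLinear", "Conv", "DequantizeLinear"]))
```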

## Proposed Design

### Step 1: Unify quantization config
Make `WinMLQuantizationConfig` the single source of truth. Remove `QDQConfig` + `CalibrationConfig` from compiler. Port missing compiler-only fields (`distribution`, `seed`, `load_path`) to `WinMLQuantizationConfig`.
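
A minimal sketch of what the unified class could look like. Only the `weight_type` and `samples` defaults come from the findings above; every other type and default here (including those of the ported `distribution`, `seed`, and `load_path` fields) is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WinMLQuantizationConfig:
    # Fields shared with the old QDQConfig.
    weight_type: str = "uint8"        # quant module default (see Finding 2)
    activation_type: str = "uint8"    # assumed default
    per_channel: bool = False         # assumed default
    symmetric: bool = False           # assumed default
    # Calibration settings.
    samples: int = 10                 # quant module default (see Finding 2)
    # Fields ported from the compiler's configs; types and defaults are guesses.
    distribution: str | None = None
    seed: int | None = None
    load_path: str | None = None


config = WinMLQuantizationConfig(seed=0)
print(config)
```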

### Step 2: Replace compiler's calibrate+qdq with quant module call
Replace `CalibrateStage` + `QDQStage` with a single `QuantizeStage` that calls `quantize_onnx()` from the quant module. This removes the code path in which the collected calibration ranges are discarded.
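
One possible shape for the stage is sketched below. The real signature of `quantize_onnx()` is not given in this issue, so the sketch takes the quantize function as a constructor argument with an assumed `(input_path, output_path, config)` signature; the `run` method, the `model_path` field, and the `.qdq.onnx` suffix are likewise assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional


@dataclass
class CompileContext:
    model_path: Path
    quant_config: Optional[Any] = None


class QuantizeStage:
    """Single stage that delegates calibration and Q/DQ insertion to the quant module."""

    def __init__(self, quantize_fn: Callable[[Path, Path, Any], Any]):
        # In the compiler this would be quantize_onnx from the quant module.
        self._quantize_fn = quantize_fn

    def run(self, ctx: CompileContext) -> Optional[Any]:
        if ctx.quant_config is None:
            return None  # nothing to do, model passes through unchanged
        output_path = ctx.model_path.with_suffix(".qdq.onnx")
        result = self._quantize_fn(ctx.model_path, output_path, ctx.quant_config)
        ctx.model_path = output_path  # later stages compile the quantized model
        return result


calls = []


def fake_quantize(src, dst, cfg):
    """Test double standing in for the quant module."""
    calls.append((src, dst, cfg))
    return {"nodes_quantized": 0}


ctx = CompileContext(Path("model.onnx"), quant_config={"weight_type": "uint8"})
result = QuantizeStage(fake_quantize).run(ctx)
print(ctx.model_path, result)
```

Injecting the function keeps the stage testable without a real model or calibration data.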

### Step 3: Thread `quant_config` through `CompileContext`
Add `quant_config: WinMLQuantizationConfig | None` to `CompileContext`. Update `context.quantize` to check `self.quant_config is not None`.
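
In code, this step amounts to deriving `quantize` from the config instead of storing it as a separate flag. A minimal sketch, with both classes reduced to stand-ins and `quantize` assumed to be a property:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WinMLQuantizationConfig:
    weight_type: str = "uint8"


@dataclass
class CompileContext:
    quant_config: WinMLQuantizationConfig | None = None

    @property
    def quantize(self) -> bool:
        # Derived from the config, so the two can never disagree.
        return self.quant_config is not None


print(CompileContext().quantize)
print(CompileContext(quant_config=WinMLQuantizationConfig()).quantize)
```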

### Step 4: Fix serialization round-trip
`WinMLCompileConfig.to_dict()` / `from_dict()` should serialize/deserialize via `WinMLQuantizationConfig.to_dict()` instead of the flat field extraction that caused the int8/uint8 mismatch.
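
A sketch of the nested round-trip, assuming the quant settings live under a `quant_config` key (the key name and the reduced field set are illustrative, not the current schema):

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class WinMLQuantizationConfig:
    weight_type: str = "uint8"
    samples: int = 10

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> WinMLQuantizationConfig:
        return cls(**data)


@dataclass
class WinMLCompileConfig:
    quant_config: WinMLQuantizationConfig | None = None

    def to_dict(self) -> dict:
        # Delegate instead of copying individual fields, so no field or
        # default can be re-declared (and drift) on the compile side.
        nested = None if self.quant_config is None else self.quant_config.to_dict()
        return {"quant_config": nested}

    @classmethod
    def from_dict(cls, data: dict) -> WinMLCompileConfig:
        raw = data.get("quant_config")
        nested = None if raw is None else WinMLQuantizationConfig.from_dict(raw)
        return cls(quant_config=nested)


original = WinMLCompileConfig(WinMLQuantizationConfig(weight_type="int8", samples=32))
restored = WinMLCompileConfig.from_dict(original.to_dict())
print(restored == original)
```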

### Step 5: Add build config validation
When both `config.quant` and `config.compile.quant_config` are set, raise `ValueError`. The build pipeline runs quant before compile, so the model that reaches the compiler already contains Q/DQ nodes and the two cannot both be active.
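
The check itself is small. In the sketch below, `BuildConfig`, `CompileConfig`, and `validate_build_config` are hypothetical names that only mirror the `config.quant` / `config.compile.quant_config` attribute paths used above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CompileConfig:
    quant_config: Optional[Any] = None


@dataclass
class BuildConfig:
    quant: Optional[Any] = None
    compile: Optional[CompileConfig] = None


def validate_build_config(config: BuildConfig) -> None:
    """Reject configs that request quantization in both the quant and compile stages."""
    compile_quant = None if config.compile is None else config.compile.quant_config
    if config.quant is not None and compile_quant is not None:
        raise ValueError(
            "config.quant and config.compile.quant_config are both set; "
            "the build pipeline quantizes before compile, so set only one."
        )


validate_build_config(BuildConfig(quant={"weight_type": "uint8"}, compile=CompileConfig()))
```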

### Step 6: Propagate `QuantizeResult` into build manifest
Thread `nodes_quantized`, `calibration_time_seconds`, `qdq_insertion_time_seconds` from `QuantizeResult` into the build manifest for observability.
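
A sketch of the propagation, assuming the manifest is a plain dict and the metrics are grouped under a `quantization` key (both are assumptions; only the three field names come from this issue):

```python
from dataclasses import dataclass


@dataclass
class QuantizeResult:
    nodes_quantized: int
    calibration_time_seconds: float
    qdq_insertion_time_seconds: float


def record_quantization(manifest: dict, result: QuantizeResult) -> dict:
    """Return a copy of the manifest with the quantization metrics attached."""
    updated = dict(manifest)
    updated["quantization"] = {
        "nodes_quantized": result.nodes_quantized,
        "calibration_time_seconds": result.calibration_time_seconds,
        "qdq_insertion_time_seconds": result.qdq_insertion_time_seconds,
    }
    return updated


manifest = {"model": "model.onnx"}
updated = record_quantization(manifest, QuantizeResult(120, 4.5, 0.75))
print(updated)
```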

## Additional: `wmk build` for pre-exported ONNX models

The build command currently requires a HuggingFace model ID and runs Export → Optimize → Analyze → Quantize → Compile. We also need a path where:

```bash
# Build from existing ONNX (skip export/optimize/analyze)
wmk build --onnx model.onnx --device qnn --precision uint8 -o output/
```

This would:
- Accept a pre-exported ONNX model as input (`--onnx` flag)
- Skip export, optimize, and analyze stages
- Accept `--device` (qnn, dml, cpu) and `--precision` (uint8, int8, fp16, fp32)
- Map `--device` + `--precision` to the appropriate `WinMLQuantizationConfig` + `WinMLCompileConfig`
- Invoke quantize → compile with the unified config

This enables the "I already have an ONNX model, just quantize and compile it for my target device" workflow.
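
The `--device` + `--precision` mapping could start as simply as the sketch below. `plan_build` is a hypothetical helper, and the policy it encodes is an assumption: integer precisions produce a quant config with the matching `weight_type`, while `fp16` and `fp32` skip the quantize stage.

```python
DEVICES = {"qnn", "dml", "cpu"}
INTEGER_PRECISIONS = {"uint8", "int8"}
FLOAT_PRECISIONS = {"fp16", "fp32"}


def plan_build(device: str, precision: str) -> dict:
    """Translate CLI flags into the settings the quantize and compile stages need."""
    if device not in DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    if precision not in INTEGER_PRECISIONS | FLOAT_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    # Assumption: only integer precisions trigger the quantize stage.
    quant = {"weight_type": precision} if precision in INTEGER_PRECISIONS else None
    return {"device": device, "precision": precision, "quant_config": quant}


plan = plan_build("qnn", "uint8")
print(plan)
```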

## Acceptance Criteria

- [ ] `QDQConfig` and `CalibrationConfig` removed from compiler
- [ ] `WinMLQuantizationConfig` is the single quant config used by both modules
- [ ] Compiler uses `quantize_onnx()` internally (no more broken calibrate+qdq)
- [ ] `wmk build --onnx model.onnx --device qnn --precision uint8` works
- [ ] Build manifest records quantization metrics
- [ ] No regression in `wmk perf` for any existing model

🤖 Generated with [Claude Code](https://claude.com/claude-code)
