bug: --device npu resolves to QNN on AMD machines (should use VitisAI)

## Summary

`winml perf --device npu` (and any other command that resolves device → EP) always selects `qnn` as the NPU provider, because the mapping is hardcoded. On an AMD machine `QNNExecutionProvider` is not present, so the session logs a warning, falls through, and likely degrades to CPU.

## Context

The NPU EP is platform-dependent:
- **Qualcomm / Intel NPU** → `QNNExecutionProvider` (short name `qnn`)
- **AMD NPU (Ryzen AI)** → `VitisAIExecutionProvider` (short name `vitisai`)

The current code has `"npu": "qnn"` hardcoded as the single NPU provider, so any AMD NPU machine hits the wrong EP.

## Observed behavior (screenshot)

```
winml perf -m openai/clip-vit-base-patch32 --device npu --iterations 100
...
[2026-04-30T15:16:37] WARNING: EP 'qnn' (QNNExecutionProvider) not found in available devices
```

The model runs and the reported device is still `npu`, but the QNN warning indicates the session fell back to policy-based selection, which likely ends up on CPU.

## Root Cause

Three locations share the same hardcoded assumption:

1. **`src/winml/modelkit/config/precision.py:66`** — `_DEVICE_TO_PROVIDER` dict:
   ```python
   _DEVICE_TO_PROVIDER: dict[str, str | None] = {
       "npu": "qnn",   # ← wrong for AMD
       "gpu": "dml",
       "cpu": None,
   }
   ```
   Used at `precision.py:296` to set `compile_provider` during `resolve_precision()`.

2. **`src/winml/modelkit/commands/compile.py:280-283`** — `_resolve_compile_provider()`:
   ```python
   provider = _DEVICE_TO_PROVIDER.get(device.lower())
   if provider is None:
       return "cpu" if device.lower() == "cpu" else "qnn"   # ← also hardcoded
   return provider
   ```

3. **`src/winml/modelkit/session/session.py:438-451`** — once `self._ep = "qnn"` is propagated from the compile config, `_build_session_options` tries to find `QNNExecutionProvider` via `_find_ep_device()`. On AMD it returns `None` → warning fires → falls through to policy-based selection.

The session already has the right mechanism for discovery (`_find_ep_for_device`, `session.py:487-510`), but it is bypassed when `self._ep` is pre-set to `"qnn"`.
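
The mismatch can be illustrated in isolation. The snippet below is a hypothetical, self-contained reproduction rather than the project's code: `_SHORT_TO_EP`, `static_provider_is_available`, and the two sample EP lists are names and data assumed for illustration only.

```python
# Static device -> provider map, copied from the issue.
_DEVICE_TO_PROVIDER = {"npu": "qnn", "gpu": "dml", "cpu": None}

# Assumed short-name -> ONNX Runtime EP name table (illustrative only).
_SHORT_TO_EP = {
    "qnn": "QNNExecutionProvider",
    "vitisai": "VitisAIExecutionProvider",
    "dml": "DmlExecutionProvider",
}


def static_provider_is_available(device: str, available_eps: list[str]) -> bool:
    """Return True if the statically mapped EP for `device` is actually present."""
    short = _DEVICE_TO_PROVIDER.get(device.lower())
    if short is None:
        # "cpu" (or an unmapped device) needs no dedicated EP in this sketch.
        return True
    return _SHORT_TO_EP[short] in available_eps


# Sample EP lists standing in for what runtime discovery might report.
amd_eps = ["VitisAIExecutionProvider", "CPUExecutionProvider"]
qualcomm_eps = ["QNNExecutionProvider", "CPUExecutionProvider"]

print(static_provider_is_available("npu", amd_eps))       # static map misses on AMD
print(static_provider_is_available("npu", qualcomm_eps))  # static map matches on Qualcomm
```

Because the static map never consults the available EPs, the AMD case can only be detected after the fact, which is where the session warning comes from.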

## Desired State

When `--device npu` is requested:
- On Qualcomm/Intel: resolves to `qnn` (QNNExecutionProvider) — current behavior, keep
- On AMD Ryzen AI: resolves to `vitisai` (VitisAIExecutionProvider)

The EP selection for NPU should inspect which NPU EP is actually available at runtime (via `ort.get_ep_devices()` or `_get_available_eps()` from `sysinfo/device.py`) rather than hardcoding `qnn`.
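
A minimal sketch of such a runtime lookup follows. It assumes the caller has already obtained the list of available EP names (for example from `_get_available_eps()`); `pick_npu_provider` and `_NPU_EP_PRIORITY` are hypothetical names, and the priority order is only a suggestion.

```python
from collections.abc import Iterable
from typing import Optional

# Assumed priority order: (ONNX Runtime EP name, short name used by the tool).
_NPU_EP_PRIORITY = (
    ("VitisAIExecutionProvider", "vitisai"),
    ("QNNExecutionProvider", "qnn"),
)


def pick_npu_provider(available_eps: Iterable[str]) -> Optional[str]:
    """Return the short name of the first available NPU EP, or None if there is none."""
    available = set(available_eps)
    for ep_name, short_name in _NPU_EP_PRIORITY:
        if ep_name in available:
            return short_name
    return None


print(pick_npu_provider(["VitisAIExecutionProvider", "CPUExecutionProvider"]))
print(pick_npu_provider(["QNNExecutionProvider", "CPUExecutionProvider"]))
print(pick_npu_provider(["CPUExecutionProvider"]))
```

Returning `None` when no NPU EP is present leaves the caller free to decide whether to raise an error or fall back explicitly, instead of degrading without a clear signal.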

## Acceptance Criteria

- [ ] `winml perf --device npu` on an AMD machine uses `VitisAIExecutionProvider` without warnings
- [ ] `winml perf --device npu` on a Qualcomm/Intel machine continues to use `QNNExecutionProvider`
- [ ] `_DEVICE_TO_PROVIDER["npu"]` is no longer a static string — it is either removed or replaced with a runtime lookup
- [ ] The compile command (`winml compile --device npu`) resolves to `vitisai` on AMD
- [ ] No new architecture-specific hardcoding is introduced (per CLAUDE.md Cardinal Rule 1)
- [ ] Existing tests pass; new tests cover AMD-style EP discovery for NPU device

## Technical Notes

- `sysinfo/device.py` already has the right EP-to-device map (`_EP_DEVICE_MAP`) where both `QNNExecutionProvider` and `VitisAIExecutionProvider` map to `"npu"`. The inverse `_DEVICE_EP_MAP["npu"]` already contains both.
- `resolve_device()` (`sysinfo/device.py:146`) uses `_DEVICE_EP_MAP` and `_get_available_eps()` correctly — it returns the right device string. The problem is downstream: `get_provider_for_device()` then re-maps that device back to a single hardcoded EP.
- The fix should replace `get_provider_for_device("npu")` with a function that inspects `_get_available_eps()` and picks the first available NPU EP from `_DEVICE_EP_MAP["npu"]` (suggested priority: `VitisAIExecutionProvider`, then `QNNExecutionProvider`, then any other NPU EP that is present).
- `precision.py` is pure decision logic (no I/O, no imports of sysinfo at module level) — keep sysinfo calls in callers (`build.py`, `compile.py`) rather than inside `precision.py` to preserve testability.
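
One way to keep the decision logic free of I/O is to have callers pass the discovery step in. The sketch below is an assumption about how that could look, not the existing signature: `resolve_compile_provider` and its `npu_lookup` parameter are hypothetical, and the `gpu` and `cpu` results simply mirror the current `_DEVICE_TO_PROVIDER` values.

```python
from typing import Callable, Optional


def resolve_compile_provider(
    device: str,
    npu_lookup: Callable[[], Optional[str]],
) -> Optional[str]:
    """Map a device name to a provider short name without doing any I/O itself.

    `npu_lookup` is supplied by the caller (e.g. compile.py or build.py) and is
    the only place where runtime EP discovery happens.
    """
    device = device.lower()
    if device == "cpu":
        return None
    if device == "gpu":
        return "dml"
    if device == "npu":
        provider = npu_lookup()
        if provider is None:
            raise RuntimeError("No NPU execution provider is available")
        return provider
    raise ValueError(f"Unknown device: {device!r}")


# In a unit test the lookup is a plain lambda, so no NPU hardware is needed.
print(resolve_compile_provider("npu", lambda: "vitisai"))
print(resolve_compile_provider("npu", lambda: "qnn"))
```

Raising when the lookup finds nothing is a design choice made for this sketch; it turns the current warn-and-fall-through behavior into an explicit failure that the caller can handle.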

## Related Files

- `src/winml/modelkit/config/precision.py:64-68` — `_DEVICE_TO_PROVIDER` dict (primary fix location)
- `src/winml/modelkit/config/precision.py:296` — `compile_provider` assignment in `resolve_precision()`
- `src/winml/modelkit/commands/compile.py:271-284` — `_resolve_compile_provider()` secondary hardcode
- `src/winml/modelkit/session/session.py:438-451` — EP matching in `_build_session_options()` (symptom surface)
- `src/winml/modelkit/sysinfo/device.py:37-58` — `_EP_DEVICE_MAP` / `_DEVICE_EP_MAP` (already correct, use this)
- `src/winml/modelkit/sysinfo/device.py:112-143` — `_get_available_eps()` (runtime EP discovery)
