6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -4,8 +4,8 @@ We're always looking for your help to improve the product (bug fixes, new featur

## Contribute a code change

* Start by reading the project [README](./README.md) to understand the scope and goals of ModelKit.
* If your change is non-trivial or introduces new public facing APIs, please use the [feature request issue template](https://github.com/microsoft/ModelKit/issues/new) to discuss it with the team first.
* Start by reading the project [README](./README.md) to understand the scope and goals of WinML CLI.
* If your change is non-trivial or introduces new public facing APIs, please use the [feature request issue template](https://github.com/microsoft/WinML-ModelKit/issues/new) to discuss it with the team first.
* For all other changes, you can directly create a pull request (PR) and we'll be happy to take a look.
* Make sure your PR adheres to the coding conventions and standards below.

@@ -22,7 +22,7 @@ This installs all dependencies and enables [pre-commit hooks](https://pre-commit

### Runtime check rules

When running ModelKit from a source tree (`uv run winml ...`), you need to populate the runtime check rule zips locally. See [`src/winml/modelkit/analyze/rules/runtime_check_rules/README.md`](./src/winml/modelkit/analyze/rules/runtime_check_rules/README.md) for setup options (GitHub release for external contributors, `gim-home` script for Microsoft internal, `MODELKIT_RULES_DIR` override).
When running WinML CLI from a source tree (`uv run winml ...`), you need to populate the runtime check rule zips locally. See [`src/winml/modelkit/analyze/rules/runtime_check_rules/README.md`](./src/winml/modelkit/analyze/rules/runtime_check_rules/README.md) for setup options (GitHub release for external contributors, `gim-home` script for Microsoft internal, `WINMLCLI_RULES_DIR` override).

## Coding conventions and standards

40 changes: 20 additions & 20 deletions README.md
@@ -1,15 +1,15 @@
# ModelKit
# WinML CLI

[![ModelKit CI](https://github.com/microsoft/WinML-ModelKit/actions/workflows/modelkit-ci.yml/badge.svg)](https://github.com/microsoft/WinML-ModelKit/actions/workflows/modelkit-ci.yml)
![Status](https://img.shields.io/badge/status-early%20access-blue)
![Python](https://img.shields.io/badge/python-3.10%2B-blue?logo=python&logoColor=white)
![License](https://img.shields.io/badge/license-MIT-green)

**ModelKit** is a CLI toolkit to build **portable, performant, and high-quality** models for Windows ML. It covers the entire journey from pretrained model to on-device inference — export, optimization, quantization, compilation, and benchmarking — across **all execution providers**, regardless of silicon.
**WinML CLI** is a CLI toolkit to build **portable, performant, and high-quality** models for Windows ML. It covers the entire journey from pretrained model to on-device inference — export, optimization, quantization, compilation, and benchmarking — across **all execution providers**, regardless of silicon.

---

## :dart: ModelKit Is Right for You If
## :dart: WinML CLI Is Right for You If

- [x] You want to build models that run on **any Windows device** — Qualcomm, Intel, AMD, NVIDIA, or CPU
- [x] You want to benchmark a model with **one command** — latency, throughput, and live hardware utilization
@@ -32,7 +32,7 @@
| **Dml** | Hardware-agnostic GPU backend | 🔶 Planned | `--ep dml` | `--device gpu` |
| **CPU** | Cross-platform fallback | ⚪ Always available | `--ep cpu` | `--device cpu` |

> **Tip:** Use `--device auto` and ModelKit picks the best available device — NPU first, then GPU, then CPU.
> **Tip:** Use `--device auto` and WinML CLI picks the best available device — NPU first, then GPU, then CPU.
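
The fallback order in the tip above (NPU, then GPU, then CPU) can be sketched as a simple priority lookup. This is an illustrative sketch only: `DEVICE_PRIORITY` and `pick_device` are hypothetical names, not part of the WinML CLI codebase, and the real tool may detect devices differently.

```python
# Hypothetical sketch of the `--device auto` priority order described in the README.
# Nothing here is taken from the WinML CLI source.
DEVICE_PRIORITY = ("npu", "gpu", "cpu")


def pick_device(available: set[str]) -> str:
    """Return the highest-priority device that appears in `available`."""
    for device in DEVICE_PRIORITY:
        if device in available:
            return device
    # The README lists CPU as always available, so use it as the last resort.
    return "cpu"
```

On a machine without an NPU, `pick_device({"gpu", "cpu"})` would select the GPU under these assumptions.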

---

@@ -45,19 +45,19 @@
| **Windows 11** (x64 or ARM64) | Windows 11 24H2+ required for NPU support |
| **UV** | Install [UV](https://github.com/astral-sh/uv) |
| **Windows App SDK Runtime 1.8** | [Latest Windows App SDK downloads](https://learn.microsoft.com/en-us/windows/apps/windows-app-sdk/downloads) |
| **ModelKit** (Python wheel) | See release instructions |
| **WinML CLI** (Python wheel) | See release instructions |

### Required Hardware

**ModelKit targets NPU.** We recommend testing on one of the following NPU devices:
**WinML CLI targets NPU.** We recommend testing on one of the following NPU devices:

| Device | EP | Flag |
|--------|-----|------|
| Snapdragon X Elite (Qualcomm) | QNN | `--ep qnn --device npu` |
| Intel AI Boost (Meteor Lake / Lunar Lake) | OpenVINO | `--ep openvino --device npu` |
| AMD Ryzen AI (Phoenix / Hawk Point / Strix) | VitisAI | `--ep vitisai --device npu` |

**No NPU?** Use `--device auto` — ModelKit will fall back to the best available device (GPU → CPU). Note that `winml compile` requires NPU and cannot run without one.
**No NPU?** Use `--device auto` — WinML CLI will fall back to the best available device (GPU → CPU). Note that `winml compile` requires NPU and cannot run without one.

### Accepted Inputs

@@ -78,7 +78,7 @@ If `inspect` prints an error or shows `Unsupported`, **skip that model**. Only m

## :package: Installation

ModelKit requires **Python 3.10** and is distributed as a Python wheel. We recommend [uv](https://docs.astral.sh/uv/) for fast, reproducible environment setup.
WinML CLI requires **Python 3.10** and is distributed as a Python wheel. We recommend [uv](https://docs.astral.sh/uv/) for fast, reproducible environment setup.

**1. Create a Python 3.10 environment**

@@ -114,7 +114,7 @@ Confirm that your target device and EP appear in the output:
- **Intel AI Boost** — look for `OpenVINOExecutionProvider`
- **AMD Ryzen AI** — look for `VitisAIExecutionProvider`

If no NPU is detected, you can still use ModelKit with `--device auto` for most commands. The only exception is `winml compile`, which requires an NPU device.
If no NPU is detected, you can still use WinML CLI with `--device auto` for most commands. The only exception is `winml compile`, which requires an NPU device.

---

@@ -177,7 +177,7 @@ If no NPU is detected, you can still use ModelKit with `--device auto` for most

**`winml doctor`** — Diagnose environment issues. Checks runtimes, execution providers, and dependencies to identify configuration problems.

**`winml setting`** — Configure ModelKit preferences. Set default EPs, output directories, and other global options.
**`winml setting`** — Configure WinML CLI preferences. Set default EPs, output directories, and other global options.

**`winml sys`** — System information and capability reporting. Prints detected hardware, available EPs, Python version, and installed package versions.

@@ -300,15 +300,15 @@ The simplest way to evaluate a model — one command, zero setup:
winml perf -m facebook/convnext-base-224 --device npu --monitor
```

ModelKit handles everything behind the scenes: download the model from Hugging Face, export to ONNX, optimize the graph, and run the benchmark on your NPU. The `--monitor` flag enables live hardware monitoring — real-time CPU utilization, RAM usage, and NPU activity alongside the latency results.
WinML CLI handles everything behind the scenes: download the model from Hugging Face, export to ONNX, optimize the graph, and run the benchmark on your NPU. The `--monitor` flag enables live hardware monitoring — real-time CPU utilization, RAM usage, and NPU activity alongside the latency results.

This is ideal for quick smoke tests: does the model run on this device, and how fast is it?

---

## :arrows_counterclockwise: The BYOM Workflow

The **Build Your Own Model** (BYOM) workflow is the philosophy behind ModelKit. It defines how a source model becomes a production-ready, device-optimized artifact.
The **Build Your Own Model** (BYOM) workflow is the philosophy behind WinML CLI. It defines how a source model becomes a production-ready, device-optimized artifact.

### The Pipeline

@@ -318,7 +318,7 @@ Source Model --> Export --> Analyze --> Optimize --> Quantize --> Compile --> Be

![BYOM Workflow](docs/assets/workflow-only.svg)

Each arrow is a ModelKit command. You can enter the pipeline at any stage (for example, start with a local ONNX file and skip export), exit early (stop after optimization if you do not need quantization), or loop back to repeat a stage with different settings.
Each arrow is a WinML CLI command. You can enter the pipeline at any stage (for example, start with a local ONNX file and skip export), exit early (stop after optimization if you do not need quantization), or loop back to repeat a stage with different settings.

### Primitive Commands vs. Config-Driven Pipeline

@@ -361,17 +361,17 @@ Run `winml catalog` to browse the full catalog interactively.

</details>

These models are verified against ModelKit's full pipeline and serve as reliable starting points. You are not limited to this list — any Hugging Face model that passes `winml inspect` is a valid input.
These models are verified against WinML CLI's full pipeline and serve as reliable starting points. You are not limited to this list — any Hugging Face model that passes `winml inspect` is a valid input.

For models not in this table, run `winml inspect -m <model-id>` to verify support before proceeding.

---

## :warning: Scope & Limitations

### What ModelKit supports
### What WinML CLI supports

ModelKit targets **classic deep learning models** — CNNs, encoders, vision transformers, NLP classifiers, token classifiers, object detection models, and segmentation models.
WinML CLI targets **classic deep learning models** — CNNs, encoders, vision transformers, NLP classifiers, token classifiers, object detection models, and segmentation models.

Supported tasks include:
- Image classification (ResNet, ViT, Swin, ConvNeXT)
@@ -380,9 +380,9 @@ Supported tasks include:
- Object detection (Table Transformer)
- Image segmentation (SegFormer)

### What ModelKit does not support
### What WinML CLI does not support

**LLMs and generative models are not in scope.** Do not use ModelKit with GPT, LLaMA, Phi, Mistral, Stable Diffusion, or any model with a decoder-only or sequence-to-sequence generative architecture. LLM support (with LoRA) is planned for Q3-Q4 2026.
**LLMs and generative models are not in scope.** Do not use WinML CLI with GPT, LLaMA, Phi, Mistral, Stable Diffusion, or any model with a decoder-only or sequence-to-sequence generative architecture. LLM support (with LoRA) is planned for Q3-Q4 2026.

### Known constraints

@@ -432,7 +432,7 @@ Supported tasks include:

## :lock: Data / Telemetry

Official ModelKit releases can collect anonymous usage telemetry to
Official WinML CLI releases can collect anonymous usage telemetry to
help improve the product. Telemetry is classified as **Optional**. A
one-time prompt on your first run asks for consent (default: accept —
press Enter to enable, type `n` to decline).
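
The consent default described above (Enter accepts, `n` declines) can be sketched as a small parser. This is a sketch under stated assumptions: `parse_consent` is a hypothetical helper, and treating every reply that does not start with `n` as acceptance is a guess, since the README only specifies the Enter and `n` cases.

```python
def parse_consent(answer: str) -> bool:
    """Interpret a reply to the 'Enable telemetry? [Y/n]' prompt.

    Hypothetical helper: only the empty reply and 'n' are specified
    by the README; all other handling is an assumption.
    """
    reply = answer.strip().lower()
    if reply == "":
        # Pressing Enter takes the documented default, which is to accept.
        return True
    # Assumption: any reply starting with "n" declines, everything else accepts.
    return not reply.startswith("n")
```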
@@ -459,7 +459,7 @@ locations.

We welcome contributions! Please see the [contribution guidelines](CONTRIBUTING.md).

For feature requests or bug reports, please file a [GitHub Issue](https://github.com/microsoft/ModelKit/issues).
For feature requests or bug reports, please file a [GitHub Issue](https://github.com/microsoft/WinML-ModelKit/issues).

---

2 changes: 1 addition & 1 deletion SUPPORT.md
@@ -2,7 +2,7 @@

## How to file issues and get help

This project uses [GitHub Issues](https://github.com/microsoft/ModelKit/issues) to track bugs and feature requests. Please search the existing
This project uses [GitHub Issues](https://github.com/microsoft/WinML-ModelKit/issues) to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

22 changes: 11 additions & 11 deletions docs/Privacy.md
@@ -1,12 +1,12 @@
# ModelKit Privacy Statement
# WinML CLI Privacy Statement

ModelKit collects limited, anonymous telemetry to help improve the
WinML CLI collects limited, anonymous telemetry to help improve the
product. This page describes exactly what is collected, what is not,
and how to control it.

## Data category

All ModelKit telemetry is classified as **Optional** under Microsoft's
All WinML CLI telemetry is classified as **Optional** under Microsoft's
data categorization model. None of it is required to run any feature;
it exists solely to support product improvement.

@@ -20,15 +20,15 @@ see the prompt and default to off.

## Events collected

When telemetry is enabled, ModelKit emits three event types:
When telemetry is enabled, WinML CLI emits three event types:

### ModelKitHeartbeat
### WinMLCLIHeartbeat

Sent once per CLI invocation, just before the requested command runs.
Carries only context attributes (OS, architecture, app version, device
ID) — no per-event payload.

### ModelKitAction
### WinMLCLIAction

Sent once per command completion.

@@ -41,7 +41,7 @@ Sent once per command completion.
| `duration_ms` | Wall-clock execution time in milliseconds. |
| `success` | Whether the command completed without raising. |
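
To illustrate how the `duration_ms` and `success` fields could be derived, here is a minimal sketch. It assumes the command is a zero-argument callable; `run_with_metrics` is a hypothetical name, and the actual telemetry wrapper is not shown in this diff.

```python
import time


def run_with_metrics(command):
    """Run `command` and return wall-clock duration_ms plus a success flag.

    Hypothetical sketch: the exception is swallowed here only to keep the
    example small; a real CLI would likely re-raise after recording it.
    """
    start = time.perf_counter()
    success = True
    try:
        command()
    except Exception:
        # "success" means the command completed without raising.
        success = False
    duration_ms = int((time.perf_counter() - start) * 1000)
    return {"duration_ms": duration_ms, "success": success}
```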

### ModelKitError
### WinMLCLIError

Sent only when a command raises an unhandled exception.

@@ -61,7 +61,7 @@ not by the command code):
| `device_id` | SHA256 hash of a randomly generated UUID, persisted per machine. Enables counting distinct users without identifying them. |
| `id_status` | `EXISTING`, `NEW`, or `FAILED` — how the device ID was obtained on this run. |
| `os.name`, `os.version`, `os.release`, `os.arch` | Operating system and architecture (e.g., `Windows`, `10.0.26200`, `11`, `AMD64`). |
| `app_version` | ModelKit package version. |
| `app_version` | WinML CLI package version. |
| `app_instance_id` | A random UUID generated for this process only; not persisted. |
| `initTs` | Epoch timestamp when telemetry was initialized. |
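
The `device_id` row describes a SHA256 hash of a randomly generated UUID. A minimal sketch of that derivation follows, assuming the UUID's canonical string form is hashed as UTF-8 and the hex digest is used; those details and the name `new_device_id` are assumptions, and the per-machine persistence is omitted.

```python
import hashlib
import uuid


def new_device_id() -> str:
    """Return the SHA-256 hex digest of a freshly generated random UUID.

    Hypothetical sketch: input encoding and digest format are assumptions.
    """
    raw = uuid.uuid4()  # random UUID; only its hash would be reported
    return hashlib.sha256(str(raw).encode("utf-8")).hexdigest()
```

Because the input is random rather than derived from hardware or account data, the hash lets distinct installs be counted without identifying the machine.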

@@ -80,7 +80,7 @@ not by the command code):

### Consent

On the first run of any command, ModelKit prompts:
On the first run of any command, WinML CLI prompts:

```
Enable telemetry? [Y/n]
@@ -125,15 +125,15 @@ variables are set, and no prompt is shown:
Events that fail to send (e.g., transient network errors) are cached
locally and retried on the next run. The cache file lives at:

`%USERPROFILE%\.winml\telemetry\modelkit.cache`
`%USERPROFILE%\.winml\telemetry\winmlcli.cache`

The cache is append-only on failure and drain-then-resend on recovery.
When telemetry is disabled, the cache is cleared so a disabled session
never resends events the user has since opted out of.
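
The cache behavior described above (append on failure, drain then resend on recovery, clear when telemetry is disabled) might look roughly like the following. This is a sketch only: the JSON-lines file format and all three function names are assumptions, since the diff does not show the cache implementation.

```python
import json
from pathlib import Path


def cache_failed_event(cache_file: Path, event: dict) -> None:
    """Append one unsent event to the cache (assumed format: one JSON object per line)."""
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def drain_cache(cache_file: Path) -> list[dict]:
    """Remove the cache file and return its events so the caller can resend them."""
    if not cache_file.exists():
        return []
    lines = cache_file.read_text(encoding="utf-8").splitlines()
    cache_file.unlink()
    return [json.loads(line) for line in lines if line.strip()]


def clear_cache(cache_file: Path) -> None:
    """Drop cached events without sending them, as when telemetry is disabled."""
    cache_file.unlink(missing_ok=True)
```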

## Dev installs

ModelKit installed from source (`pip install -e .`) or run directly
WinML CLI installed from source (`pip install -e .`) or run directly
from a checkout never sends telemetry. The InstrumentationKey is blank
in source and is only populated by the official build pipeline. Only
official binary releases are capable of sending telemetry, and only
4 changes: 2 additions & 2 deletions docs/naming-convention.md
@@ -1,6 +1,6 @@
# ModelKit Naming Convention
# WinML CLI Naming Convention

This document defines the naming rules for the ModelKit codebase. All new code and refactored code must follow these conventions.
This document defines the naming rules for the WinML CLI codebase. All new code and refactored code must follow these conventions.

## 1. Acronyms in Class Names

8 changes: 4 additions & 4 deletions pyproject.toml
@@ -90,10 +90,10 @@ optional-dependencies.openvino = [ "openvino>=2023" ]
optional-dependencies.qnn = [
"onnxruntime-qnn>=1.24.1; python_version>='3.11'",
]
urls."Bug Tracker" = "https://github.com/microsoft/ModelKit/issues"
urls.Documentation = "https://github.com/microsoft/ModelKit/blob/main/README.md"
urls.Homepage = "https://github.com/microsoft/ModelKit"
urls.Repository = "https://github.com/microsoft/ModelKit.git"
urls."Bug Tracker" = "https://github.com/microsoft/WinML-ModelKit/issues"
urls.Documentation = "https://github.com/microsoft/WinML-ModelKit/blob/main/README.md"
urls.Homepage = "https://github.com/microsoft/WinML-ModelKit"
urls.Repository = "https://github.com/microsoft/WinML-ModelKit.git"
# =============================================================================
# SETUPTOOLS - Package Configuration (Flat Layout with Namespace Prefix)
# =============================================================================
2 changes: 1 addition & 1 deletion scripts/e2e_eval/README.md
@@ -1,6 +1,6 @@
# E2E Evaluation Scripts

Batch-evaluate ModelKit's `winml perf` pipeline against a curated set of HuggingFace models.
Batch-evaluate WinML CLI's `winml perf` pipeline against a curated set of HuggingFace models.
Captures pass/fail, failure classification, and generates interactive reports.

## Quick Start
14 changes: 7 additions & 7 deletions scripts/e2e_eval/run_eval.py
@@ -117,9 +117,9 @@ def _get_timeout_skip_reason(hf_id: str, task: str) -> str:
)

_HF_CACHE = Path.home() / ".cache" / "huggingface"
_WML_CACHE = Path.home() / ".cache" / "winml"
_WINML_CACHE = Path.home() / ".cache" / "winml"
_TEMP_DIR = Path(os.environ.get("TEMP", os.environ.get("TMP", tempfile.gettempdir())))
_TEMP_PREFIXES = ("wmk_", "modelkit_compat_")
_TEMP_PREFIXES = ("winmlcli_", "winmlcli_compat_")


def _is_no_space_error(proc: dict) -> bool:
@@ -129,8 +129,8 @@ def _is_no_space_error(proc: dict) -> bool:


def _clear_disk_caches() -> None:
"""Delete HuggingFace, WML cache directories and leaked temp files."""
for cache_dir in (_HF_CACHE, _WML_CACHE):
"""Delete HuggingFace, WinML cache directories and leaked temp files."""
for cache_dir in (_HF_CACHE, _WINML_CACHE):
if cache_dir.exists():
safe_print(f" [cleanup] Removing cache: {cache_dir}")
try:
@@ -139,7 +139,7 @@ def _clear_disk_caches() -> None:
except OSError as exc:
safe_print(f" [cleanup] Warning: could not remove {cache_dir}: {exc}")

# Clean leaked temp directories/files (wmk_*, modelkit_compat_*, tmp*.onnx*)
# Clean leaked temp directories/files (winmlcli_*, winmlcli_compat_*, tmp*.onnx*)
if _TEMP_DIR.is_dir():
cleaned = 0
for entry in _TEMP_DIR.iterdir():
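
The loop body is cut off by the hunk boundary, so the matching logic is not visible here. As a sketch, assuming entries are matched with `str.startswith` on the prefix tuple plus a glob for `tmp*.onnx*` (both assumptions, and `is_leaked_temp` is a hypothetical name), the check could look like this:

```python
from fnmatch import fnmatch

# Mirrors the renamed prefix tuple from the hunk above.
TEMP_PREFIXES = ("winmlcli_", "winmlcli_compat_")


def is_leaked_temp(name: str) -> bool:
    """Return True if a temp entry name matches the cleanup patterns.

    Hypothetical sketch: the real loop body is not shown in the diff.
    """
    # str.startswith accepts a tuple and matches if any prefix matches.
    return name.startswith(TEMP_PREFIXES) or fnmatch(name, "tmp*.onnx*")
```

If the real code does use `str.startswith`, any name beginning with `winmlcli_compat_` also begins with `winmlcli_`, so the second prefix would be redundant after the rename.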
@@ -363,8 +363,8 @@ def _run_build(
config_path = model_dir / "build_config.json"
model_dir.mkdir(parents=True, exist_ok=True)

# Remove any stale suffixed sub-configs BEFORE `wmk config` runs.
# For composite models `wmk config` writes files matching {stem}_*.json
# Remove any stale suffixed sub-configs BEFORE `winml config` runs.
# For composite models `winml config` writes files matching {stem}_*.json
# (e.g., build_config_encoder.json); cleaning those AFTER the command would
# delete the freshly-written configs and silently degrade composite builds
# to single-model. Running cleanup first removes prior-run artifacts without
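
The ordering constraint in the comment above (remove stale `{stem}_*.json` files before `winml config` writes new ones) can be sketched as follows. The helper name `remove_stale_subconfigs` is hypothetical, and the actual cleanup code sits outside the visible hunk.

```python
from pathlib import Path


def remove_stale_subconfigs(config_path: Path) -> list[Path]:
    """Delete sibling files matching {stem}_*.json and return the removed paths.

    Hypothetical sketch: call this BEFORE the config command runs, so that
    freshly written sub-configs are never deleted.
    """
    removed = []
    pattern = f"{config_path.stem}_*.json"
    for stale in sorted(config_path.parent.glob(pattern)):
        stale.unlink()
        removed.append(stale)
    return removed
```

The pattern requires an underscore after the stem, so `build_config.json` itself is left in place while `build_config_encoder.json` is removed.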
4 changes: 2 additions & 2 deletions scripts/e2e_eval/run_pytorch_baseline.py
@@ -11,7 +11,7 @@

Dataset config is read from ``utils/dataset_config.py`` — the authoritative
source shared with run_eval.py. When ``winml eval`` is implemented inside
ModelKit, it should import from the same location.
WinML CLI, it should import from the same location.

Output: prints a single JSON object as the last line on stdout:
{"metric": "<name>", "value": <float>, "num_samples": <int>}
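
A caller consuming this output contract could read the last non-empty stdout line and validate its keys. The sketch below assumes exactly that; `parse_baseline_result` is a hypothetical name and is not taken from `run_eval.py`.

```python
import json


def parse_baseline_result(stdout: str) -> dict:
    """Parse the JSON object printed as the last non-empty line of stdout.

    Hypothetical sketch of a consumer for the documented output format.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    result = json.loads(lines[-1])
    # The docstring names these three keys as the contract.
    missing = {"metric", "value", "num_samples"} - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return result
```

Reading only the last line lets the script log progress freely on earlier lines without breaking the parser.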
@@ -59,8 +59,8 @@ def _emit_result(metric: str, value: float, num_samples: int) -> None:
def _load_pytorch_model(model_id: str, task: str, device_str: str):
"""Load a native PyTorch model with the task-appropriate AutoModel class."""
import torch

from transformers import AutoConfig

from winml.modelkit.loader.task import resolve_task_and_model_class

config = AutoConfig.from_pretrained(model_id)