Merged
38 commits
1b8da4c
re init branch
Apr 14, 2026
eed03eb
feat: fp16 KV cache support and session_options passthrough
Apr 14, 2026
d406434
revert: remove subfolder from config command (not part of this PR)
Apr 14, 2026
4488fd2
refactor: WinMLCache hierarchy with polymorphic interface
Apr 15, 2026
4ab05cd
docs: consolidate pipeline design docs into source file docstrings
Apr 15, 2026
cd8125e
refactor: sliding window cache outputs new-token KV (not full buffer)
Apr 15, 2026
d10a9e8
WinMLPipelineModel -> WinMLCompositeModel
Apr 15, 2026
2f0474c
Merge branch 'main' into reny/multi_model
Apr 15, 2026
cac83d8
feat: sliding window KV cache for Qwen3 + refactor cache interface
Apr 15, 2026
6551adb
refactor: polymorphic KV cache for decoder-only prefill + gen
Apr 16, 2026
989dd94
fix: remove unused _pad_inputs from decoder_only.py
Apr 16, 2026
9e42116
docs: add static cache switching instructions to mu2.py
Apr 16, 2026
0973dbd
feat: WinMLCompositeModel.from_onnx + from_pretrained composite routing
Apr 16, 2026
0d9d10e
feat: composite model support in run_eval.py + T5 summarization
Apr 16, 2026
a4cdecc
feat: remove_isnan_in_attention_mask surgery + optim configs for T5/Q…
Apr 16, 2026
e57b337
fix: enable DepthPro ONNX registration + update timeout skip list
Apr 16, 2026
d4be9dc
test: add unit tests for polymorphic KV cache and composite from_onnx
Apr 16, 2026
ad2f624
Merge remote-tracking branch 'origin/main' into reny/multi_model
Apr 16, 2026
80cb39a
Potential fix for pull request finding 'CodeQL / Cyclic import'
vortex-captain Apr 16, 2026
ebcb916
fix: resolve ruff F821/UP037 for WinMLCompositeModel type annotation
Apr 16, 2026
2000b69
refactor: move encoder_decoder and kv_cache from hf/ to winml/
Apr 17, 2026
b47d983
revert: undo clip.py change that slipped into refactor commit
Apr 17, 2026
44f1f20
refactor: make hf_config and sub_model_kwargs explicit params in from…
Apr 17, 2026
7b42d5b
refactor: move _pad_inputs to utils/data_utils.py with mode param
Apr 17, 2026
54bc808
Merge branch 'main' into reny/multi_model
Apr 17, 2026
ff3a8cd
add comment on winml build
Apr 17, 2026
50d29ce
fix naming
Apr 17, 2026
ede3764
refactor: _run_build returns onnx_paths as dict {label: path}
Apr 17, 2026
c90ac1d
feat: add google/flan-t5-base to e2e eval registry (translation + sum…
Apr 17, 2026
e970c8f
Revert "feat: add google/flan-t5-base to e2e eval registry (translati…
Apr 17, 2026
e11b953
Merge remote-tracking branch 'origin/main' into reny/multi_model
Apr 17, 2026
dfbc65e
Merge branch 'main' into reny/multi_model
Apr 20, 2026
3c76601
refactor: T5 uses WinMLSlidingWindowCache; const-fold relative-positi…
Apr 20, 2026
d28efe8
fix(models): defensive fixes from PR #334 review
tezheng Apr 22, 2026
e317536
fix(models): Phase 1 follow-up — 6 review findings + 2 critic fixes
tezheng Apr 22, 2026
3b1a983
fix(models): Phase 2 follow-up — 9 review findings + critic regression
tezheng Apr 22, 2026
5d2ce5f
Merge branch 'main' into reny/multi_model
Apr 23, 2026
4fa8c56
fix(models): I8 — widen WinMLCompositeModel.from_onnx onnx_path type
tezheng Apr 23, 2026
35 changes: 20 additions & 15 deletions scripts/e2e_eval/cache/timeout_skip_list.json
@@ -29,21 +29,6 @@
"task": "translation",
"reason": "Build hangs >14000s, m2m_100 enc-dec export issue"
},
-{
-"hf_id": "google-t5/t5-3b",
-"task": "translation",
-"reason": "Build hangs >14000s, t5 enc-dec + network download issue"
-},
-{
-"hf_id": "google-t5/t5-base",
-"task": "summarization",
-"reason": "Build hangs >2000s, t5 enc-dec export issue"
-},
-{
-"hf_id": "google-t5/t5-small",
-"task": "translation",
-"reason": "Build hangs >8000s, t5 enc-dec export issue"
-},
{
"hf_id": "knkarthick/MEETING_SUMMARY",
"task": "summarization",
@@ -58,5 +43,25 @@
"hf_id": "philschmid/bart-large-cnn-samsum",
"task": "summarization",
"reason": "Build hangs >10000s, bart enc-dec export issue"
+},
+{
+"hf_id": "apple/DepthPro-hf",
+"task": "depth-estimation",
+"reason": "OOM in quantization (model too large for in-process quantize)"
+},
+{
+"hf_id": "Qwen/Qwen3-0.6B",
+"task": "text-generation",
+"reason": "OOM in quantization (segfault during quantize, 2.9GB model)"
+},
+{
+"hf_id": "Qwen/Qwen3-1.7B",
+"task": "text-generation",
+"reason": "OOM in quantization (model too large for in-process quantize)"
+},
+{
+"hf_id": "Qwen/Qwen3-8B",
+"task": "text-generation",
+"reason": "OOM in quantization (model too large for in-process quantize)"
}
]
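For reviewers unfamiliar with the skip list, here is a minimal sketch of how a file shaped like `timeout_skip_list.json` could be used to filter evaluation candidates. It is not taken from this PR: `build_skip_index` and `split_candidates` are hypothetical names, and how `run_eval.py` actually reads the file is an assumption. Only the entry schema (`hf_id`, `task`, `reason`) comes from the diff.

```python
import json

# Two entries copied from the diff above; the real file holds more.
SAMPLE_SKIP_LIST = """
[
  {"hf_id": "apple/DepthPro-hf", "task": "depth-estimation",
   "reason": "OOM in quantization (model too large for in-process quantize)"},
  {"hf_id": "Qwen/Qwen3-0.6B", "task": "text-generation",
   "reason": "OOM in quantization (segfault during quantize, 2.9GB model)"}
]
"""


def build_skip_index(raw_json: str) -> dict[tuple[str, str], str]:
    """Map (hf_id, task) -> reason for every entry in the skip list.

    Hypothetical helper: keying on the pair means a model skipped for one
    task can still be evaluated on another.
    """
    return {(e["hf_id"], e["task"]): e["reason"] for e in json.loads(raw_json)}


def split_candidates(candidates, skip_index):
    """Partition (hf_id, task) pairs into (to_run, skipped_with_reason)."""
    to_run, skipped = [], []
    for key in candidates:
        if key in skip_index:
            skipped.append((key, skip_index[key]))
        else:
            to_run.append(key)
    return to_run, skipped


skip_index = build_skip_index(SAMPLE_SKIP_LIST)
to_run, skipped = split_candidates(
    [
        ("Qwen/Qwen3-0.6B", "text-generation"),   # listed in the sample: skipped
        ("google-t5/t5-small", "translation"),    # not listed in the sample: runs
    ],
    skip_index,
)
```

Keeping the `reason` next to each skipped pair lets a runner log why a model was left out, not just that it was.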