feat: multi-model support with KV cache (T5, Qwen, Mu2) #334
Merged
38 commits
1b8da4c
re init branch
eed03eb
feat: fp16 KV cache support and session_options passthrough
d406434
revert: remove subfolder from config command (not part of this PR)
4488fd2
refactor: WinMLCache hierarchy with polymorphic interface
4ab05cd
docs: consolidate pipeline design docs into source file docstrings
cd8125e
refactor: sliding window cache outputs new-token KV (not full buffer)
d10a9e8
WinMLPipelineModel -> WinMLCompositeModel
2f0474c
Merge branch 'main' into reny/multi_model
cac83d8
feat: sliding window KV cache for Qwen3 + refactor cache interface
6551adb
refactor: polymorphic KV cache for decoder-only prefill + gen
989dd94
fix: remove unused _pad_inputs from decoder_only.py
9e42116
docs: add static cache switching instructions to mu2.py
0973dbd
feat: WinMLCompositeModel.from_onnx + from_pretrained composite routing
0d9d10e
feat: composite model support in run_eval.py + T5 summarization
a4cdecc
feat: remove_isnan_in_attention_mask surgery + optim configs for T5/Q…
e57b337
fix: enable DepthPro ONNX registration + update timeout skip list
d4be9dc
test: add unit tests for polymorphic KV cache and composite from_onnx
ad2f624
Merge remote-tracking branch 'origin/main' into reny/multi_model
80cb39a
Potential fix for pull request finding 'CodeQL / Cyclic import'
vortex-captain
ebcb916
fix: resolve ruff F821/UP037 for WinMLCompositeModel type annotation
2000b69
refactor: move encoder_decoder and kv_cache from hf/ to winml/
b47d983
revert: undo clip.py change that slipped into refactor commit
44f1f20
refactor: make hf_config and sub_model_kwargs explicit params in from…
7b42d5b
refactor: move _pad_inputs to utils/data_utils.py with mode param
54bc808
Merge branch 'main' into reny/multi_model
ff3a8cd
add comment on winml build
50d29ce
fix naming
ede3764
refactor: _run_build returns onnx_paths as dict {label: path}
c90ac1d
feat: add google/flan-t5-base to e2e eval registry (translation + sum…
e970c8f
Revert "feat: add google/flan-t5-base to e2e eval registry (translati…
e11b953
Merge remote-tracking branch 'origin/main' into reny/multi_model
dfbc65e
Merge branch 'main' into reny/multi_model
3c76601
refactor: T5 uses WinMLSlidingWindowCache; const-fold relative-positi…
d28efe8
fix(models): defensive fixes from PR #334 review
tezheng
e317536
fix(models): Phase 1 follow-up — 6 review findings + 2 critic fixes
tezheng
3b1a983
fix(models): Phase 2 follow-up — 9 review findings + critic regression
tezheng
5d2ce5f
Merge branch 'main' into reny/multi_model
4fa8c56
fix(models): I8 — widen WinMLCompositeModel.from_onnx onnx_path type
tezheng