Skip loading external weight data during static analysis. #379

Merged
chinazhangchao merged 18 commits into main from chao/largemodel, Apr 28, 2026

@chinazhangchao chinazhangchao commented Apr 22, 2026

Contributor

Problem

Running winml analyze on large models with external data (e.g., Qwen3-8B with a 30.5 GB .data sidecar) causes the process to consume all available memory and disk, hanging indefinitely. The analyzer called onnx.load(path, load_external_data=True), loading the entire weight file into RAM despite never inspecting weight values.

Root Cause

The static analyzer only needs graph structure (operator types, shapes, connectivity, and small embedded constants) to perform op-support checks. Three call sites were loading or attempting to access the full weight tensors unnecessarily:

  1. ONNXStaticAnalyzer.analyze() — explicitly passed load_external_data=True
  2. ONNXLoader.load() — called bare onnx.load() which defaults to load_external_data=True
  3. RuntimeCheckerQuery — called numpy_helper.to_array() on every initializer and embedded full TensorProtos into single-node models

Changes

| File | Change |
| --- | --- |
| analyzer.py | load_external_data=True → False |
| onnx_loader.py | Add explicit load_external_data=False |
| runtime_checker_query.py | For external-data initializers: extract shape from dims instead of to_array(), and emit graph inputs instead of embedding empty tensors in single-node models |

Impact

  • Models with external data (Qwen3-8B, Llama, etc.) now use on the order of megabytes of RAM instead of 30+ GB
  • No behavior change for models with inline weights (the data_location != EXTERNAL path is unchanged)
  • 849 unit tests pass, 0 failures

Performance (39.65% improvement)

| Model | Operators | Time |
| --- | --- | --- |
| Qwen/Qwen3-8B | 5333 | 57.36s |
| dbmdz/bert-large-cased-finetuned-conll03-english | 2663 | 28.395s |

@chinazhangchao chinazhangchao changed the title from "Do not load external data when analyze" to "Skip loading external weight data during static analysis." Apr 22, 2026
@chinazhangchao chinazhangchao marked this pull request as ready for review April 22, 2026 07:37
@chinazhangchao chinazhangchao requested a review from a team as a code owner April 22, 2026 07:37
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py Outdated
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py Outdated
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Collaborator

Overall good fix for a real and impactful problem — static analysis never needs multi-GB weight tensors, and the three-site fix is comprehensive. A few items to address before merging.

Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py Outdated
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
chinazhangchao and others added 5 commits April 23, 2026 14:03
Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
Comment thread tests/unit/analyze/core/test_runtime_checker.py Fixed
Comment thread tests/unit/analyze/core/test_runtime_checker_query_helpers.py Dismissed
chinazhangchao and others added 8 commits April 23, 2026 03:05
…ith 'import' and 'import from''

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ith 'import' and 'import from''

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Collaborator

Good work addressing the previous round of feedback — the is_constant=False fix, test coverage additions, and the raw_data comment all look solid. The three-site fix is correct and comprehensive. A few new items for this round:

Non-inline note — _collect_node_tags / ALL_INPUTS_CONSTANT:
External-data initializers are still in self.initializers, so a node whose inputs are all unloaded external weights would be tagged ALL_INPUTS_CONSTANT even though weight data is unavailable. This is informational-only and doesn't affect runtime check results, but could confuse future debugging. Consider filtering external-data initializers without loaded data in that check.

Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
@chinazhangchao chinazhangchao enabled auto-merge (squash) April 28, 2026 02:51
@chinazhangchao chinazhangchao disabled auto-merge April 28, 2026 02:51
@chinazhangchao chinazhangchao enabled auto-merge (squash) April 28, 2026 02:55
@chinazhangchao chinazhangchao merged commit 1877cd4 into main Apr 28, 2026
9 checks passed
@chinazhangchao chinazhangchao deleted the chao/largemodel branch April 28, 2026 02:56
4 participants