
feat: Named graph specializations in specializations.json (Prefill/Decode/Vision/Encoder/Embedding) #894

Merged
quic-rishinr merged 9 commits into quic:main from vbaddi:feat/enabling_nested_new_spec_format
Apr 2, 2026


vbaddi commented Mar 27, 2026

Summary

The backend compiler team requested a new specializations.json format in which each entry carries a meaningful graph name (e.g. "Prefill", "Decode").
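
For illustration, here is a minimal sketch of what the format change might look like for a causal LM with one prefill and one decode graph. The `{name, symbols}` entry shape comes from the commit message in this PR; the top-level `"specializations"` wrapper key and the concrete symbol values are assumptions made for the example, not confirmed by this description.

```python
import json

# Old style: flat dicts with no indication of which graph each one describes.
flat_specializations = [
    {"batch_size": 1, "seq_len": 32, "ctx_len": 128},
    {"batch_size": 1, "seq_len": 1, "ctx_len": 128},
]

# Names chosen by hand here; in the PR they are assigned by the library.
graph_names = ["Prefill", "Decode"]

# New style: each entry carries a graph name plus its symbols.
named_specializations = [
    {"name": name, "symbols": dict(symbols)}
    for name, symbols in zip(graph_names, flat_specializations)
]

# Assumption: the file keeps a top-level "specializations" list.
payload = {"specializations": named_specializations}
print(json.dumps(payload, indent=2))
```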

Changes

  • QEfficient/utils/_utils.py — new _infer_specialization_name() and to_named_specializations() helpers
  • QEfficient/base/modeling_qeff.py — _compile() uses new format
  • QEfficient/compile/qnn_compiler.py — QNN path uses new format
  • QEfficient/compile/compile_helper.py — legacy create_and_dump_specializations() uses new format

Name inference rules

| Keys present | Assigned name |
|---|---|
| vision_size / img_size / grid_*, no seq_len | Vision |
| encoder_ctx_len, no seq_len | Encoder |
| sequence_length, no seq_len | Embedding |
| seq_len != 1 | Prefill |
| seq_len == 1 | Decode |
| anything else | Graph_N |
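
The rules above can be sketched roughly as follows. This is not the PR's `_infer_specialization_name()`; the function name `infer_name_sketch`, its signature, and the choice to use the entry's list index for `N` in `Graph_N` are assumptions made for illustration.

```python
from typing import Any, Dict


def infer_name_sketch(spec: Dict[str, Any], index: int) -> str:
    """Hypothetical key-based name inference following the table above."""
    keys = set(spec)

    # seq_len decides Prefill vs Decode; every other rule requires its absence.
    if "seq_len" in keys:
        # int() tolerates values stored as strings, e.g. "1".
        return "Decode" if int(spec["seq_len"]) == 1 else "Prefill"

    has_grid_key = any(key.startswith("grid_") for key in keys)
    if "vision_size" in keys or "img_size" in keys or has_grid_key:
        return "Vision"
    if "encoder_ctx_len" in keys:
        return "Encoder"
    if "sequence_length" in keys:
        return "Embedding"

    # Assumption: N is the position of the entry in the specialization list.
    return f"Graph_{index}"
```

Checking `seq_len` first is equivalent to the table's ordering, since the Vision, Encoder, and Embedding rows only apply when `seq_len` is absent.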

Testing

21 unit tests added to tests/unit_test/models/test_model_quickcheck.py covering causal LM, continuous batching, VLM vision/language, Whisper, encoder/decoder, text embedding, and end-to-end JSON roundtrip.

cc: @anujgupt-github

vbaddi self-assigned this Mar 27, 2026
vbaddi added the enhancement label Mar 27, 2026
vbaddi requested a review from quic-rishinr March 31, 2026 10:33
vbaddi added 9 commits April 1, 2026 20:51
Add to_named_specializations() helper that converts flat specialization dicts to the {name, symbols} format requested by the backend compiler team.
Names are inferred from dict keys: Prefill/Decode (seq_len), Vision (vision_size/img_size/grid_*), Encoder (encoder_ctx_len), Embedding (sequence_length), with Graph_N as fallback.
Updated all three serialization sites: modeling_qeff.py (_compile), qnn_compiler.py, and compile_helper.py (create_and_dump_specializations).

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…ore reading fields

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…JSON write block, fixing this

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…he generic seq_len==1 → Decode rule

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Replace brittle key-sniffing heuristics with a _graph_name tag set at
the point of specialization creation, where the semantic context is known with certainty.

Changes:
  - Add _graph_name tag in build_prefill/decode_specialization (Prefill/Decode)
  - Tag vision _compile call site in modeling_auto.py (Vision)
  - Tag Whisper get_specializations entries (Encoder/Decode)
  - Tag QEFFAutoModel, QEFFAutoModelForSequenceClassification,
    QEFFAutoModelForCTC compile() with Embedding/SeqClassification/CTC;
    multi-seq_len lists get Embedding_0..N to avoid duplicate graph names
  - Tag diffusers pipeline_utils with module name; Wan model_type entries
    get transformer_model_type_1/2
  - to_named_specializations reads _graph_name first, strips it from
    symbols before serialization; seq_len heuristic retained as fallback
    for raw user-supplied dicts only
  - Remove specialization_module_name kwarg and all key-sniffing logic

  All model families covered with no Graph_N in any supported path.
  41 unit tests passing.
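
A minimal sketch of the tag-first approach described in this commit, under stated assumptions: `to_named_sketch`, `fallback_name`, and `tag_embedding_specs` are hypothetical names, not the repository's functions. Only the `_graph_name` key, the `{name, symbols}` shape, the `Embedding_0..N` naming, and the retained seq_len fallback come from the commit message; whether the real fallback still ends in `Graph_N` is an assumption.

```python
from typing import Any, Dict, List

GRAPH_NAME_KEY = "_graph_name"


def fallback_name(symbols: Dict[str, Any], index: int) -> str:
    """seq_len heuristic, meant only for raw user-supplied dicts without a tag."""
    if "seq_len" in symbols:
        return "Decode" if int(symbols["seq_len"]) == 1 else "Prefill"
    return f"Graph_{index}"  # assumption: last-resort name for untagged entries


def to_named_sketch(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    named = []
    for index, spec in enumerate(specs):
        # Strip the tag so it never leaks into the serialized symbols.
        symbols = {k: v for k, v in spec.items() if k != GRAPH_NAME_KEY}
        # The tag wins; the heuristic runs only when no tag was set.
        name = spec.get(GRAPH_NAME_KEY) or fallback_name(symbols, index)
        named.append({"name": name, "symbols": symbols})
    return named


def tag_embedding_specs(seq_lens: List[int], batch_size: int = 1) -> List[Dict[str, Any]]:
    """Tag at creation time; suffix with an index when several seq_lens exist."""
    multiple = len(seq_lens) > 1
    return [
        {
            "batch_size": batch_size,
            "sequence_length": seq_len,
            GRAPH_NAME_KEY: f"Embedding_{i}" if multiple else "Embedding",
        }
        for i, seq_len in enumerate(seq_lens)
    ]
```

Tagging where the specialization is built means the name no longer depends on which keys happen to be present, which is what made the earlier heuristics brittle.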

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
quic-rishinr force-pushed the feat/enabling_nested_new_spec_format branch from 2d09ed1 to 7822563 April 1, 2026 15:21
quic-rishinr merged commit cc07ab0 into quic:main Apr 2, 2026
5 checks passed