feat: Named graph specializations in specializations.json (Prefill/Decode/Vision/Encoder/Embedding) by vbaddi · Pull Request #894 · quic/efficient-transformers

vbaddi · 2026-03-27T07:08:50Z

Summary

The backend compiler team requested a new specializations.json format in which each entry carries a meaningful graph name (e.g. "Prefill", "Decode").

Changes

QEfficient/utils/_utils.py — new _infer_specialization_name() and to_named_specializations() helpers
QEfficient/base/modeling_qeff.py — _compile() uses new format
QEfficient/compile/qnn_compiler.py — QNN path uses new format
QEfficient/compile/compile_helper.py — legacy create_and_dump_specializations() uses new format

Name inference rules

| Keys present | Assigned name |
| --- | --- |
| `vision_size` / `img_size` / `grid_*`, no `seq_len` | `Vision` |
| `encoder_ctx_len`, no `seq_len` | `Encoder` |
| `sequence_length`, no `seq_len` | `Embedding` |
| `seq_len != 1` | `Prefill` |
| `seq_len == 1` | `Decode` |
| anything else | `Graph_N` |
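The rules above can be sketched as a small function. This is an illustrative reading of the table, not the code from the PR: the function name `infer_name_sketch`, the top-to-bottom evaluation order, and the use of the entry's list index for `N` in `Graph_N` are all assumptions.

```python
from typing import Any, Dict

# Keys that mark a vision graph according to the table (plus any "grid_*" key).
VISION_KEYS = ("vision_size", "img_size")


def infer_name_sketch(spec: Dict[str, Any], index: int) -> str:
    """Hypothetical helper: map a flat specialization dict to a graph name."""
    if "seq_len" not in spec:
        if any(k in VISION_KEYS or k.startswith("grid_") for k in spec):
            return "Vision"
        if "encoder_ctx_len" in spec:
            return "Encoder"
        if "sequence_length" in spec:
            return "Embedding"
        # Assumption: N is the position of the entry in the specialization list.
        return f"Graph_{index}"
    # Compare as strings so both 1 and "1" count as decode (assumption about value types).
    return "Decode" if str(spec["seq_len"]) == "1" else "Prefill"


specs = [
    {"batch_size": 1, "seq_len": 32, "ctx_len": 128},
    {"batch_size": 1, "seq_len": 1, "ctx_len": 128},
    {"batch_size": 1, "img_size": 336},
    {"batch_size": 1},
]
names = [infer_name_sketch(s, i) for i, s in enumerate(specs)]
```

Because every dict that contains `seq_len` resolves to either `Prefill` or `Decode`, the `Graph_N` fallback in this sketch can only be reached by dicts without `seq_len`.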

Testing

21 unit tests added to tests/unit_test/models/test_model_quickcheck.py covering causal LM, continuous batching, VLM vision/language, Whisper, encoder/decoder, text embedding, and end-to-end JSON roundtrip.
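A JSON roundtrip check of the kind mentioned above might look like the following minimal sketch. It does not reproduce the tests in test_model_quickcheck.py; the helper name `roundtrip` and the top-level `"specializations"` key are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def roundtrip(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: write the payload to a temp file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "specializations.json"
        path.write_text(json.dumps(payload, indent=2))
        return json.loads(path.read_text())


payload = {
    "specializations": [  # assumed top-level key
        {"name": "Prefill", "symbols": {"batch_size": "1", "seq_len": "32"}},
        {"name": "Decode", "symbols": {"batch_size": "1", "seq_len": "1"}},
    ]
}
loaded = roundtrip(payload)
graph_names = [entry["name"] for entry in loaded["specializations"]]
```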

cc: @anujgupt-github

QEfficient/utils/_utils.py

QEfficient/generation/text_generation_inference.py

QEfficient/compile/compile_helper.py

Add to_named_specializations() helper that converts flat specialization dicts to the {name, symbols} format requested by the backend compiler team.

Names are inferred from dict keys: Prefill/Decode (seq_len), Vision (vision_size/img_size/grid_*), Encoder (encoder_ctx_len), Embedding (sequence_length), with Graph_N as fallback.

Updated all three serialization sites: modeling_qeff.py (_compile), qnn_compiler.py, and compile_helper.py (create_and_dump_specializations).

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
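To make the shape change concrete, here is a minimal sketch of wrapping flat dicts into {name, symbols} entries. The function `wrap_named` is hypothetical and takes the names explicitly; the top-level `"specializations"` key in the printed JSON is also an assumption, since the commit message only describes the per-entry format.

```python
import json
from typing import Any, Dict, List


def wrap_named(flat_specs: List[Dict[str, Any]], names: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: pair each flat dict with a graph name."""
    if len(flat_specs) != len(names):
        raise ValueError("need exactly one name per specialization")
    # Copy each dict so the caller's flat specializations are left untouched.
    return [{"name": name, "symbols": dict(spec)} for name, spec in zip(names, flat_specs)]


flat = [
    {"batch_size": 1, "seq_len": 32, "ctx_len": 128},
    {"batch_size": 1, "seq_len": 1, "ctx_len": 128},
]
named = wrap_named(flat, ["Prefill", "Decode"])
print(json.dumps({"specializations": named}, indent=2))
```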

Replace brittle key-sniffing heuristics with a _graph_name tag set at the point of specialization creation, where the semantic context is known with certainty.

Changes:
- Add _graph_name tag in build_prefill/decode_specialization (Prefill/Decode)
- Tag vision _compile call site in modeling_auto.py (Vision)
- Tag Whisper get_specializations entries (Encoder/Decode)
- Tag QEFFAutoModel, QEFFAutoModelForSequenceClassification, QEFFAutoModelForCTC compile() with Embedding/SeqClassification/CTC; multi-seq_len lists get Embedding_0..N to avoid duplicate graph names
- Tag diffusers pipeline_utils with module name; Wan model_type entries get transformer_model_type_1/2
- to_named_specializations reads _graph_name first, strips it from symbols before serialization; seq_len heuristic retained as fallback for raw user-supplied dicts only
- Remove specialization_module_name kwarg and all key-sniffing logic

All model families covered with no Graph_N in any supported path. 41 unit tests passing.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
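The tag-first approach described in this commit can be sketched as follows. The names `to_named_sketch` and `tag_embedding` are hypothetical, and two details are assumptions rather than facts from the PR: a single-entry list is tagged plain `Embedding`, and untagged dicts without `seq_len` fall back to `Graph_N`.

```python
from typing import Any, Dict, List

TAG = "_graph_name"  # tag key named in the commit message


def tag_embedding(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical call-site tagging: suffix names only when there are several entries."""
    if len(specs) == 1:
        return [{**specs[0], TAG: "Embedding"}]
    return [{**spec, TAG: f"Embedding_{i}"} for i, spec in enumerate(specs)]


def to_named_sketch(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Read the tag first, strip it from symbols, fall back to the seq_len heuristic."""
    named = []
    for i, spec in enumerate(specs):
        symbols = {k: v for k, v in spec.items() if k != TAG}
        name = spec.get(TAG)
        if name is None:
            if "seq_len" in symbols:
                name = "Decode" if str(symbols["seq_len"]) == "1" else "Prefill"
            else:
                name = f"Graph_{i}"  # assumed last-resort fallback
        named.append({"name": name, "symbols": symbols})
    return named


tagged = tag_embedding([{"batch_size": 1, "seq_len": 32}, {"batch_size": 1, "seq_len": 64}])
result = to_named_sketch(tagged)
```

Setting the name where the specialization is built avoids guessing from keys: in this sketch both entries have `seq_len != 1` and would both be named `Prefill` by the heuristic alone, producing duplicate graph names.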

vbaddi requested review from quic-hemagnih and quic-rishinr March 27, 2026 07:08

vbaddi self-assigned this Mar 27, 2026

vbaddi added the enhancement New feature or request label Mar 27, 2026

quic-rishinr requested changes Mar 31, 2026

vbaddi requested a review from quic-rishinr March 31, 2026 10:33

vbaddi added 9 commits April 1, 2026 20:51

nit: get_compilation_dims: resolve the symbols dict transparently bef…

cd88c05

…ore reading fields Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: enable the named spec. for diffusers too

b3aefb8

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: extend to all the VLMs

7a40e64

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: specialization_module_name was popped inside the specializations …

cb6d16d

…JSON write block, fixing this Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: added a feature_len check in _infer_specialization_name before t…

29e97e9

…he generic seq_len==1 → Decode rule Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: update the review comments and add additional test case

7fbdae6

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: skip the tests from the samplers in CI

7822563

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
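Commit cd88c05 above makes get_compilation_dims resolve the symbols dict before reading fields. A reader that accepts both the legacy flat layout and the named layout could look like the sketch below. The helper names `resolve_symbols` and `read_dims`, and the choice of `batch_size` and `ctx_len` as the fields read, are assumptions for illustration only.

```python
from typing import Any, Dict, Tuple


def resolve_symbols(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return the nested symbols dict if present, else treat the entry as flat."""
    symbols = entry.get("symbols")
    return symbols if isinstance(symbols, dict) else entry


def read_dims(entry: Dict[str, Any]) -> Tuple[int, int]:
    """Hypothetical reader: works on both layouts without the caller knowing which."""
    symbols = resolve_symbols(entry)
    return int(symbols["batch_size"]), int(symbols["ctx_len"])


legacy = {"batch_size": "1", "seq_len": "32", "ctx_len": "128"}
named = {"name": "Prefill", "symbols": legacy}
dims_legacy = read_dims(legacy)
dims_named = read_dims(named)
```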

quic-rishinr force-pushed the feat/enabling_nested_new_spec_format branch from 2d09ed1 to 7822563 April 1, 2026 15:21

quic-rishinr approved these changes Apr 1, 2026

quic-rishinr merged commit cc07ab0 into quic:main Apr 2, 2026
5 checks passed

quic-rishinr mentioned this pull request Apr 2, 2026

Revert "feat: Named graph specializations in specializations.json (Prefill/Decode/Vision/Encoder/Embedding)" #902

Open

quic-rishinr merged 9 commits into quic:main from vbaddi:feat/enabling_nested_new_spec_format
