feat: Named graph specializations in specializations.json (Prefill/Decode/Vision/Encoder/Embedding) #894
Merged
quic-rishinr merged 9 commits into quic:main on Apr 2, 2026
quic-rishinr
requested changes
Mar 31, 2026
Add to_named_specializations() helper that converts flat specialization dicts to the {name, symbols} format requested by the backend compiler team.
Names are inferred from dict keys: Prefill/Decode (seq_len), Vision (vision_size/img_size/grid_*), Encoder (encoder_ctx_len), Embedding (sequence_length), with Graph_N as fallback.
Updated all three serialization sites: modeling_qeff.py (_compile), qnn_compiler.py, and compile_helper.py (create_and_dump_specializations).
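As a rough illustration of the key-based inference described in this commit message, the sketch below wraps flat specialization dicts into {name, symbols} entries. It is not the merged implementation: the names `infer_name` and `to_named`, the check order (seq_len first), and the use of the list position for the Graph_N fallback are assumptions made for the example.

```python
from typing import Any, Dict, List

# Keys treated as vision markers; grid_* keys are matched by prefix below.
VISION_KEYS = ("vision_size", "img_size")


def infer_name(spec: Dict[str, Any], index: int) -> str:
    """Guess a graph name from the keys of one flat specialization dict."""
    if "seq_len" in spec:
        # seq_len == 1 is taken to mean the decode graph, anything else prefill.
        return "Decode" if int(spec["seq_len"]) == 1 else "Prefill"
    if any(k in VISION_KEYS or k.startswith("grid_") for k in spec):
        return "Vision"
    if "encoder_ctx_len" in spec:
        return "Encoder"
    if "sequence_length" in spec:
        return "Embedding"
    # Assumption: N in Graph_N is the position of the dict in the input list.
    return f"Graph_{index}"


def to_named(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap each flat dict as {"name": ..., "symbols": {...}}."""
    return [
        {"name": infer_name(spec, i), "symbols": dict(spec)}
        for i, spec in enumerate(specs)
    ]


example_specs = [
    {"batch_size": 1, "seq_len": 32, "ctx_len": 128},
    {"batch_size": 1, "seq_len": 1, "ctx_len": 128},
    {"batch_size": 1, "img_size": 336},
    {"foo": 1},
]
example_named = to_named(example_specs)
```

Checking seq_len before the other keys means a dict that carries both seq_len and a vision key is still named Prefill or Decode, which is one way to read the "no seq_len" conditions in the PR description.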
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…ore reading fields Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…JSON write block, fixing this Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…he generic seq_len==1 → Decode rule Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Replace brittle key-sniffing heuristics with a _graph_name tag set at
the point of specialization creation, where the semantic context is known with certainty.
Changes:
- Add _graph_name tag in build_prefill/decode_specialization (Prefill/Decode)
- Tag vision _compile call site in modeling_auto.py (Vision)
- Tag Whisper get_specializations entries (Encoder/Decode)
- Tag QEFFAutoModel, QEFFAutoModelForSequenceClassification,
  QEFFAutoModelForCTC compile() with Embedding/SeqClassification/CTC;
  multi-seq_len lists get Embedding_0..N to avoid duplicate graph names
- Tag diffusers pipeline_utils with module name; Wan model_type entries
  get transformer_model_type_1/2
- to_named_specializations reads _graph_name first, strips it from
  symbols before serialization; seq_len heuristic retained as fallback
  for raw user-supplied dicts only
- Remove specialization_module_name kwarg and all key-sniffing logic
All model families covered with no Graph_N in any supported path.
41 unit tests passing.
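The tag-first approach described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that was merged: `tag_unique` and `to_named` are hypothetical names, and the exact fallback behaviour (seq_len heuristic, then Graph_N by list position) is inferred from the description above.

```python
from typing import Any, Dict, List

GRAPH_NAME_KEY = "_graph_name"  # tag key named in the commit message


def tag_unique(specs: List[Dict[str, Any]], base: str) -> List[Dict[str, Any]]:
    """Tag dicts with `base`, or base_0..N when there are several entries."""
    if len(specs) == 1:
        return [{**specs[0], GRAPH_NAME_KEY: base}]
    # Numbered suffixes keep graph names unique across multiple seq_len entries.
    return [{**spec, GRAPH_NAME_KEY: f"{base}_{i}"} for i, spec in enumerate(specs)]


def to_named(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Read the tag first, strip it from symbols, fall back to seq_len."""
    named = []
    for i, spec in enumerate(specs):
        # The tag is metadata only, so it never reaches the serialized symbols.
        symbols = {k: v for k, v in spec.items() if k != GRAPH_NAME_KEY}
        name = spec.get(GRAPH_NAME_KEY)
        if name is None:
            # Fallback for raw user-supplied dicts that carry no tag.
            if "seq_len" in symbols:
                name = "Decode" if int(symbols["seq_len"]) == 1 else "Prefill"
            else:
                name = f"Graph_{i}"  # assumption: N is the list position
        named.append({"name": name, "symbols": symbols})
    return named


tagged = tag_unique(
    [{"batch_size": 1, "seq_len": 32}, {"batch_size": 1, "seq_len": 64}],
    "Embedding",
)
named = to_named(tagged + [{"batch_size": 1, "seq_len": 1}])
```

Setting the name where the specialization is built keeps the serializer free of model-specific knowledge; the heuristic only runs for dicts that arrive without a tag.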
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
2d09ed1 to
7822563
quic-rishinr
approved these changes
Apr 1, 2026
Summary
The backend compiler team requested a new specializations.json format where each entry carries a meaningful graph name (e.g. "Prefill", "Decode").
Changes
- QEfficient/utils/_utils.py: new _infer_specialization_name() and to_named_specializations() helpers
- QEfficient/base/modeling_qeff.py: _compile() uses new format
- QEfficient/compile/qnn_compiler.py: QNN path uses new format
- QEfficient/compile/compile_helper.py: legacy create_and_dump_specializations() uses new format
Name inference rules
- vision_size/img_size/grid_*, no seq_len → Vision
- encoder_ctx_len, no seq_len → Encoder
- sequence_length, no seq_len → Embedding
- seq_len != 1 → Prefill
- seq_len == 1 → Decode
- fallback → Graph_N
Testing
21 unit tests added to tests/unit_test/models/test_model_quickcheck.py covering causal LM, continuous batching, VLM vision/language, Whisper, encoder/decoder, text embedding, and end-to-end JSON roundtrip.
cc: @anujgupt-github