Question about serving Qwen3.5 text-only SFT model saved as Qwen3_5ForCausalLM #27807
Hi everyone,

I have a question about serving a fine-tuned Qwen3.5 model with SGLang. We are doing text-only SFT on Qwen3.5 using Transformers. During training, we load the base model with:

    model = AutoModelForCausalLM.from_pretrained(
        args.model_name_or_path,
        trust_remote_code=True,
        dtype=torch.bfloat16 if args.bf16 else None,
        attn_implementation=args.attn_implementation,
    )

After SFT and save_pretrained(), the saved config.json lists Qwen3_5ForCausalLM as its architecture. However, when we try to serve this fine-tuned model with SGLang, the server fails with an error.

I checked the SGLang source code and noticed that Qwen3_5ForCausalLM is defined in qwen3_5.py, but it does not seem to be registered as an entry class. The official Qwen3.5-9B model seems to use Qwen3_5ForConditionalGeneration instead.

So I am trying to understand the expected behavior here.
For now, my understanding is that Qwen3_5ForCausalLM is used internally as the language model body, while Qwen3_5ForConditionalGeneration is the expected full model entry point for serving in SGLang. Therefore, the most reasonable workaround might be to restore the saved config architecture to Qwen3_5ForConditionalGeneration, assuming the rest of the config and weights are still consistent with the original Qwen3.5 model.

Could someone confirm whether this is the recommended approach? Any guidance would be appreciated. Thanks!
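One way to check the "rest of the config is still consistent" assumption is to diff the saved config.json against the original model's config.json before changing anything. The following is only a minimal sketch: the helper names diff_config_keys and load_config are hypothetical, it compares top-level keys only, and it says nothing about whether the weight names match.

```python
import json
from pathlib import Path


def load_config(model_dir):
    """Read config.json from a checkpoint directory (hypothetical helper)."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))


def diff_config_keys(saved_config, reference_config):
    """Report top-level keys that were added, dropped, or changed.

    Both arguments are plain dicts, e.g. as returned by load_config().
    """
    saved_keys = set(saved_config)
    ref_keys = set(reference_config)
    return {
        "only_in_saved": sorted(saved_keys - ref_keys),
        "only_in_reference": sorted(ref_keys - saved_keys),
        "changed": sorted(
            key
            for key in saved_keys & ref_keys
            if saved_config[key] != reference_config[key]
        ),
    }
```

If the result shows differences beyond architectures, renaming the architecture alone may not be enough.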
Replies: 1 comment 1 reply
This is a known issue with how save_pretrained() saves the architecture name. SGLang registers its models under Qwen3_5ForConditionalGeneration, but when you do text-only SFT with AutoModelForCausalLM, the saved config uses Qwen3_5ForCausalLM.

Fix: Edit your saved config.json and change the architectures field to Qwen3_5ForConditionalGeneration. This tells SGLang to use its optimized implementation instead of falling back to Transformers.

Alternatively, you can launch with the --trust-remote-code flag to allow the Transformers fallback, but you lose SGLang-specific performance benefits like FlashInfer attention.

For text-only SFT, the model architecture is functionally identical; the only difference is the class name in config.json.
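The config edit can also be scripted rather than done by hand. This is a minimal sketch, assuming the checkpoint directory contains a standard config.json with a top-level architectures list. The function name set_architecture and the config.json.bak backup name are hypothetical. It only rewrites the class name and does not check that the weights or the rest of the config match what SGLang expects.

```python
import json
import shutil
from pathlib import Path


def set_architecture(model_dir, new_arch="Qwen3_5ForConditionalGeneration"):
    """Rewrite the architectures field of model_dir/config.json.

    Returns the previous value so the change can be logged or reverted.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("architectures")

    # Keep a copy of the untouched file next to it (hypothetical backup name).
    shutil.copyfile(config_path, config_path.with_name("config.json.bak"))

    config["architectures"] = [new_arch]
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous
```

Calling set_architecture on the SFT output directory before launching the server applies the change; restoring config.json.bak undoes it.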