
This is a known issue with how save_pretrained() saves the architecture name. SGLang registers its models under Qwen3_5ForConditionalGeneration, but when you do text-only SFT with AutoModelForCausalLM, the saved config uses Qwen3_5ForCausalLM.

Fix: Edit your saved config.json and change the architectures field (a list) to ["Qwen3_5ForConditionalGeneration"]. This tells SGLang to use its optimized implementation instead of falling back to Transformers.
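
If you'd rather script the edit than do it by hand, here is a minimal sketch using only the Python standard library. The helper name patch_architectures and the checkpoint path are hypothetical, not part of SGLang or Transformers; the only assumption is that your checkpoint directory contains the config.json written by save_pretrained().

```python
import json
from pathlib import Path

# Architecture name that SGLang registers, per the explanation above.
TARGET_ARCH = "Qwen3_5ForConditionalGeneration"


def patch_architectures(checkpoint_dir, target_arch=TARGET_ARCH):
    """Rewrite the `architectures` field in <checkpoint_dir>/config.json.

    Hypothetical helper for illustration. Returns the previous value so
    you can confirm what was replaced.
    """
    config_path = Path(checkpoint_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    previous = config.get("architectures")
    # `architectures` is stored as a list of class names.
    config["architectures"] = [target_arch]

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous


# Example call (the path is a placeholder, replace it with your own):
# patch_architectures("path/to/your/sft-checkpoint")
```

Only the architectures field is touched; every other key in config.json is written back unchanged. Keep a copy of the original file if you want to be able to revert.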

Alternatively, you can launch with the --trust-remote-code flag to allow the Transformers fallback, but you lose SGLang-specific performance benefits such as FlashInfer attention.

For text-only SFT, the model architecture is functionally identical - the only di…

Answer selected by nchennnn