Release v1.4.2 · mohitsoni48/TurboLLM

Bugfix release — vLLM (safetensors) models now load and chat correctly.

Chat on vLLM no longer fails with "Engine returned 400." Tool definitions were attached to requests for every engine, but vLLM rejects a tools array unless it is launched with --enable-auto-tool-choice and a --tool-call-parser. Tools are now sent only to engines that accept them (the llama.cpp family). Tool calling on vLLM remains unsupported for now.
Correct quant classification for vLLM/safetensors models. Compressed-tensors checkpoints were mislabeled as MLX fp16; the quant is now read from quantization_config (e.g. w4a16), so the model card shows the real quant instead of "MLX".
The vLLM "Max model length" control is settable again. Multimodal configs nest max_position_embeddings under text_config; the scanner now reads it, so a model's native context length is no longer reported as 0 (which had clamped the input to 0).
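
For readers who want to see the shape of these three fixes, here is a minimal Python sketch. It is not TurboLLM's actual code: the function names, the engine identifiers, and the TOOL_CAPABLE_ENGINES set are hypothetical. The config_groups / weights / input_activations layout is assumed from typical compressed-tensors checkpoints, and the 16-bit activation fallback is likewise an assumption.

```python
from typing import Any, Optional

# Hypothetical engine identifiers; the notes only say "the llama.cpp family".
TOOL_CAPABLE_ENGINES = {"llama.cpp"}


def build_chat_payload(engine: str, messages: list, tools: Optional[list] = None) -> dict:
    """Attach tool definitions only for engines assumed to accept them."""
    payload: dict = {"messages": messages}
    if tools and engine in TOOL_CAPABLE_ENGINES:
        payload["tools"] = tools
    return payload


def detect_quant(config: dict) -> Optional[str]:
    """Derive a quant label such as 'w4a16' from quantization_config.

    Assumes a compressed-tensors style layout; falls back to quant_method
    when no weight bit width can be found.
    """
    qc = config.get("quantization_config")
    if not isinstance(qc, dict):
        return None
    groups = qc.get("config_groups")
    if isinstance(groups, dict):
        for group in groups.values():
            if not isinstance(group, dict):
                continue
            weights = group.get("weights") or {}
            acts = group.get("input_activations") or {}
            w_bits = weights.get("num_bits")
            if isinstance(w_bits, int):
                # Assumption: activations left unquantized are treated as 16-bit.
                a_bits = acts.get("num_bits") or 16
                return f"w{w_bits}a{a_bits}"
    return qc.get("quant_method")


def native_context_length(config: dict) -> Optional[int]:
    """Read max_position_embeddings at the top level, then under text_config."""
    for section in (config, config.get("text_config")):
        if isinstance(section, dict):
            value: Any = section.get("max_position_embeddings")
            if isinstance(value, int) and value > 0:
                return value
    # Unknown is reported as None rather than 0, so a UI cannot clamp to 0.
    return None
```

Returning None instead of 0 for an unknown context length is a design choice of this sketch; it keeps "unknown" distinct from a real limit, which is the failure mode the third fix describes.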
