Worker vLLM 0.2.2 - What's New
- Custom Chat Templates: you can now specify a custom Jinja chat template via an environment variable.
- Custom Tokenizer: you can now specify a custom tokenizer for the model.
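As a sketch, the new options might be supplied as environment variables when launching the worker container. The variable names (`CUSTOM_CHAT_TEMPLATE`, `TOKENIZER`) and the image tag are assumptions here; confirm them against the worker's README before use:

```shell
# Illustrative launch command; env var names and image tag are assumed,
# verify against the worker-vllm documentation.
docker run --gpus all \
  -e MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.1" \
  -e TOKENIZER="mistralai/Mistral-7B-Instruct-v0.1" \
  -e CUSTOM_CHAT_TEMPLATE="{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}" \
  runpod/worker-vllm:0.2.2
```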
Fixes:
- Tensor Parallel/Multi-GPU Deployment
- Baking the model into the image: previously, the worker would re-download the model on every start instead of using the baked-in model.
- Crashes due to `MAX_PARALLEL_LOADING_WORKERS`.
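For context, `MAX_PARALLEL_LOADING_WORKERS` caps how many workers load model shards in parallel. A sketch of how it might be set to a conservative value; everything except the variable name itself is illustrative:

```shell
# Illustrative only: limit parallel model-loading workers.
# MODEL_NAME value and image tag are assumptions, not from the release notes.
docker run --gpus all \
  -e MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.1" \
  -e MAX_PARALLEL_LOADING_WORKERS=1 \
  runpod/worker-vllm:0.2.2
```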