Worker vLLM 0.2.2 - What's New
- Custom Chat Templates: you can now specify a custom Jinja chat template via an environment variable.
- Custom Tokenizer: you can now specify a custom tokenizer for the model.
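As a sketch, the new options might be supplied as environment variables when launching the worker container. The variable names (`CUSTOM_CHAT_TEMPLATE`, `TOKENIZER`) and the image tag are assumptions here; confirm them against the worker's README before use:

```shell
# Illustrative launch command; env var names and image tag are assumed,
# verify against the worker-vllm documentation.
docker run --gpus all \
  -e MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.1" \
  -e TOKENIZER="mistralai/Mistral-7B-Instruct-v0.1" \
  -e CUSTOM_CHAT_TEMPLATE="{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}" \
  runpod/worker-vllm:0.2.2
```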
Fixes:
- Tensor Parallel/Multi-GPU Deployment
- Baking the model into the image: previously, the worker would re-download the model on every start instead of using the baked-in model.
- Crashes due to `MAX_PARALLEL_LOADING_WORKERS`.
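For context, `MAX_PARALLEL_LOADING_WORKERS` caps how many workers load model shards in parallel. A sketch of how it might be set to a conservative value; everything except the variable name itself is illustrative:

```shell
# Illustrative only: limit parallel model-loading workers.
# MODEL_NAME value and image tag are assumptions, not from the release notes.
docker run --gpus all \
  -e MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.1" \
  -e MAX_PARALLEL_LOADING_WORKERS=1 \
  runpod/worker-vllm:0.2.2
```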