Skip to content

0.2.2

Compare
Choose a tag to compare
@alpayariyak alpayariyak released this 31 Jan 05:35
· 62 commits to main since this release

Worker vLLM 0.2.2 - What's New

  • Custom Chat Templates: you may now specify a Jinja chat template with an environment variable.
  • Custom Tokenizer

Fixes:

  • Tensor Parallel/Multi-GPU Deployment
  • Baking Model into the image. Previously, the worker would download the model every time, ignoring the baked in model.
  • Crashes due to MAX_PARALLEL_LOADING_WORKERS