preparation for 1.0.0 release
alpayariyak committed Jun 12, 2024
1 parent bad5ddd commit 2fe2f39
Showing 5 changed files with 9 additions and 11 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,5 +1,5 @@
ARG WORKER_CUDA_VERSION=11.8.0
-ARG BASE_IMAGE_VERSION=1.0.0preview
+ARG BASE_IMAGE_VERSION=1.0.0
FROM runpod/worker-vllm:base-${BASE_IMAGE_VERSION}-cuda${WORKER_CUDA_VERSION} AS vllm-base

RUN apt-get update -y \
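The two `ARG` lines above select the prebuilt vLLM base image, and the `FROM` line composes them into the tag `runpod/worker-vllm:base-<BASE_IMAGE_VERSION>-cuda<WORKER_CUDA_VERSION>`. A minimal sketch of overriding those arguments at build time (the ARG names come from the Dockerfile above; the local `-t` tag is illustrative):

```bash
# Override the Dockerfile ARGs to build against the CUDA 12.1.0 base image.
# ARG names are taken from the Dockerfile above; the -t tag is illustrative.
docker build \
  --build-arg WORKER_CUDA_VERSION=12.1.0 \
  --build-arg BASE_IMAGE_VERSION=1.0.0 \
  -t worker-vllm:local-cuda12.1.0 .
# The FROM line then resolves to:
#   runpod/worker-vllm:base-1.0.0-cuda12.1.0
```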
11 changes: 5 additions & 6 deletions README.md
@@ -18,16 +18,15 @@ Deploy OpenAI-Compatible Blazing-Fast LLM Endpoints powered by the [vLLM](https:
### 1. UI for Deploying vLLM Worker on RunPod console:
![Demo of Deploying vLLM Worker on RunPod console with new UI](media/ui_demo.gif)

-### 2. Worker vLLM `1.0.0preview` with vLLM `0.4.2` now available under `stable` tags
-Update 1.0.0preview is now available, use the image tag `runpod/worker-vllm:dev-cuda12.1.0` or `runpod/worker-vllm:dev-cuda11.8.0`.
+### 2. Worker vLLM `1.0.0` with vLLM `0.4.2` now available under `stable` tags
+Update 1.0.0 is now available, use the image tag `runpod/worker-vllm:stable-cuda12.1.0` or `runpod/worker-vllm:stable-cuda11.8.0`.

**Main Changes:**
- vLLM was updated from version `0.3.3` to `0.4.2`, adding compatibility for Llama 3 and other models, as well as increasing performance.
### 3. OpenAI-Compatible [Embedding Worker](https://github.com/runpod-workers/worker-infinity-embedding) Released
Deploy your own OpenAI-compatible Serverless Endpoint on RunPod with multiple embedding models and fast inference for RAG and more!

We will soon be adding more features from the updates, such as multi-LoRA, multi-modality, and more.


-### 3. Caching Accross RunPod Machines
+### 4. Caching Accross RunPod Machines
Worker vLLM is now cached on all RunPod machines, resulting in near-instant deployment! Previously, downloading and extracting the image took 3-5 minutes on average.


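For reference, the `stable` tags mentioned in the README hunk above can also be pulled directly; a hedged sketch (the tags come from the README, the commands are standard Docker, and on RunPod you would normally just reference the tag in your endpoint configuration rather than pull it yourself):

```bash
# Pull the stable Worker vLLM image matching the host's CUDA version.
# Tags are the ones referenced in the README above.
docker pull runpod/worker-vllm:stable-cuda12.1.0   # CUDA 12.1 hosts
docker pull runpod/worker-vllm:stable-cuda11.8.0   # CUDA 11.8 hosts
```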
2 changes: 1 addition & 1 deletion docker-bake.hcl
@@ -7,7 +7,7 @@ variable "REPOSITORY" {
}

variable "BASE_IMAGE_VERSION" {
default = "1.0.0preview"
default = "1.0.0"
}

group "all" {
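The `BASE_IMAGE_VERSION` variable above feeds the bake targets; a sketch of driving a build through the bake file (the `all` group name and variable name come from docker-bake.hcl above, and overriding an HCL variable via an environment variable of the same name is standard `docker buildx bake` behaviour):

```bash
# Build every target in the "all" group, pinning the base image version.
# Group and variable names are taken from docker-bake.hcl above.
BASE_IMAGE_VERSION=1.0.0 docker buildx bake all
```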
2 changes: 1 addition & 1 deletion vllm-base-image/vllm
3 changes: 1 addition & 2 deletions vllm-base-image/vllm-metadata.yml
@@ -1,3 +1,2 @@
-version: '0.3.3'
+version: '0.4.2'
dev_version: '0.4.2'
-worker_dev_version: '1.0.0preview'
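After this change the metadata file pins both `version` and `dev_version` to vLLM `0.4.2` and drops the separate `worker_dev_version` key. A purely hypothetical sketch of reading the pinned version in a build script (the key name comes from vllm-metadata.yml above; the repo's actual tooling may consume this file differently):

```bash
# Hypothetical: extract the pinned vLLM version from the metadata file.
# Using python3 + PyYAML here is an assumption, not the repo's own tooling.
VLLM_VERSION=$(python3 -c "import yaml; print(yaml.safe_load(open('vllm-base-image/vllm-metadata.yml'))['version'])")
echo "Building against vLLM ${VLLM_VERSION}"   # prints 0.4.2 after this commit
```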
