
update vllm patch to v0.2.6 and add regex support for /generate #481

Merged (3 commits) on Dec 29, 2023

Conversation

@tscholak (Contributor) commented Dec 24, 2023

closes #478

Here's a Dockerfile that I used for testing:

# syntax=docker/dockerfile:1.2

# Use the vllm/vllm-openai image with the specified tag
ARG BASE_TAG=latest
FROM vllm/vllm-openai:${BASE_TAG}

# Install git and openssh-client, then clean up to keep the image small
RUN apt-get update && \
    apt-get install -y git openssh-client python3.10-venv && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Clone the repository, checkout the specific commit, build the source distribution, install it, and clean up
ARG COMMIT
RUN --mount=type=secret,id=ssh_key \
    mkdir -p -m 0700 ~/.ssh && \
    ssh-keyscan github.com >> ~/.ssh/known_hosts && \
    cp /run/secrets/ssh_key ~/.ssh/id_rsa && \
    chmod 600 ~/.ssh/id_rsa && \
    git clone git@github.com:tscholak/outlines.git && \
    cd outlines && \
    git checkout ${COMMIT} && \
    python3 -m pip install build && \
    python3 -m build && \
    pip install dist/outlines-*.whl && \
    cd .. && \
    rm -rf outlines ~/.ssh /root/.cache/pip

# Install ray with pydantic-2.5 support and clean up
RUN pip install ray==2.9.0 && \
    rm -rf /root/.cache/pip

and the accompanying Makefile:

TAG=v0.2.6
COMMIT=c4e0b5ae2e97ec0377c726aa65a2207c939377de

# Build the Docker image with a configurable tag
build:
	DOCKER_BUILDKIT=1 \
	docker build \
	--secret id=ssh_key,src=$(SSH_KEY) \
	--build-arg BASE_TAG=$(TAG) \
	--build-arg COMMIT=$(COMMIT) \
	-t $(FULL_IMAGE_NAME) \
	.
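
With this setup the image can be built with something like `make build SSH_KEY=$HOME/.ssh/id_rsa FULL_IMAGE_NAME=vllm-outlines:test`. Note that `SSH_KEY` and `FULL_IMAGE_NAME` are assumed to be supplied on the make command line or in the environment; they are not defined in the Makefile above, and the key only needs to be able to clone the fork.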

@tscholak (Contributor, Author) commented Dec 24, 2023

Problem with `nest_asyncio` and uvloop:
/usr/local/lib/python3.10/dist-packages/outlines/fsm/regex.py:474: NumbaPendingDeprecationWarning: 
Encountered the use of a type that is scheduled for deprecation: type 'reflected set' found for argument 'fsm_finals' of function '_walk_fsm'.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "../usr/local/lib/python3.10/dist-packages/outlines/fsm/regex.py", line 226:
@numba.njit(nogil=True, cache=True)
def _walk_fsm(
^

  state_seq = _walk_fsm(
/usr/local/lib/python3.10/dist-packages/numba/core/ir_utils.py:2172: NumbaPendingDeprecationWarning: 
Encountered the use of a type that is scheduled for deprecation: type 'reflected set' found for argument 'fsm_finals' of function 'state_scan_tokens'.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "../usr/local/lib/python3.10/dist-packages/outlines/fsm/regex.py", line 462:
@numba.njit(cache=True, nogil=True)
def state_scan_tokens(
^

  warnings.warn(NumbaPendingDeprecationWarning(msg, loc=loc))
INFO 12-24 15:45:08 async_llm_engine.py:379] Received request 891d2ad827604ec28c56f22d882ae045: prompt: 'What is the capital of France?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: None.
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7fe7969551b0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7fe786667520>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7fe7969551b0>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7fe786667520>)>
Traceback (most recent call last):
  File "/workspace/vllm/engine/async_llm_engine.py", line 28, in _raise_exception_on_finish
    task.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 234, in __step
    result = coro.throw(exc)
  File "/workspace/vllm/engine/async_llm_engine.py", line 359, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/workspace/vllm/engine/async_llm_engine.py", line 338, in engine_step
    request_outputs = await self.engine.step_async()
  File "/workspace/vllm/engine/async_llm_engine.py", line 191, in step_async
    output = await self._run_workers_async(
  File "/workspace/vllm/engine/async_llm_engine.py", line 219, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    future.result()
  File "/usr/lib/python3.10/asyncio/tasks.py", line 234, in __step
    result = coro.throw(exc)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
INFO 12-24 15:45:08 async_llm_engine.py:134] Aborted request 891d2ad827604ec28c56f22d882ae045.
  File "/usr/lib/python3.10/asyncio/futures.py", line 285, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    future.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
ray.exceptions.RayTaskError(RaySystemError): ray::RayWorkerVllm.execute_method() (pid=762, ip=10.210.140.102, actor_id=81008059c117df77dfd1d76001000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7fb61c7604f0>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: Can't patch loop of type <class 'uvloop.Loop'>
traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/outlines/__init__.py", line 3, in <module>
INFO:     10.210.93.238:40022 - "POST /generate HTTP/1.1" 500 Internal Server Error
    import outlines.models
  File "/usr/local/lib/python3.10/dist-packages/outlines/models/__init__.py", line 12, in <module>
    from .openai import OpenAI, openai
  File "/usr/local/lib/python3.10/dist-packages/outlines/models/openai.py", line 11, in <module>
    from outlines.base import vectorize
  File "/usr/local/lib/python3.10/dist-packages/outlines/base.py", line 16, in <module>
    nest_asyncio.apply()
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 19, in apply
    loop = loop or asyncio.get_event_loop()
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 41, in _get_event_loop
    loop = events.get_event_loop_policy().get_event_loop()
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 68, in get_event_loop
    _patch_loop(loop)
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 184, in _patch_loop
    raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/workspace/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
    raise exc
  File "/workspace/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/workspace/vllm/engine/async_llm_engine.py", line 28, in _raise_exception_on_finish
    task.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 234, in __step
    result = coro.throw(exc)
  File "/workspace/vllm/engine/async_llm_engine.py", line 359, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/workspace/vllm/engine/async_llm_engine.py", line 338, in engine_step
    request_outputs = await self.engine.step_async()
  File "/workspace/vllm/engine/async_llm_engine.py", line 191, in step_async
    output = await self._run_workers_async(
  File "/workspace/vllm/engine/async_llm_engine.py", line 219, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    future.result()
  File "/usr/lib/python3.10/asyncio/tasks.py", line 234, in __step
    result = coro.throw(exc)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/usr/lib/python3.10/asyncio/futures.py", line 285, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.10/asyncio/tasks.py", line 304, in __wakeup
    future.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
ray.exceptions.RayTaskError(RaySystemError): ray::RayWorkerVllm.execute_method() (pid=762, ip=10.210.140.102, actor_id=81008059c117df77dfd1d76001000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7fb61c7604f0>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: Can't patch loop of type <class 'uvloop.Loop'>
traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/outlines/__init__.py", line 3, in <module>
    import outlines.models
  File "/usr/local/lib/python3.10/dist-packages/outlines/models/__init__.py", line 12, in <module>
    from .openai import OpenAI, openai
  File "/usr/local/lib/python3.10/dist-packages/outlines/models/openai.py", line 11, in <module>
    from outlines.base import vectorize
  File "/usr/local/lib/python3.10/dist-packages/outlines/base.py", line 16, in <module>
    nest_asyncio.apply()
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 19, in apply
    loop = loop or asyncio.get_event_loop()
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 41, in _get_event_loop
    loop = events.get_event_loop_policy().get_event_loop()
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 68, in get_event_loop
    _patch_loop(loop)
  File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 184, in _patch_loop
    raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/outlines/serve/serve.py", line 84, in generate
    async for request_output in results_generator:
  File "/workspace/vllm/engine/async_llm_engine.py", line 445, in generate
    raise e
  File "/workspace/vllm/engine/async_llm_engine.py", line 439, in generate
    async for request_output in stream:
  File "/workspace/vllm/engine/async_llm_engine.py", line 70, in __anext__
    raise result
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/workspace/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
    raise exc
  File "/workspace/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
(RayWorkerVllm pid=762) Can't patch loop of type <class 'uvloop.Loop'>
(RayWorkerVllm pid=762) Traceback (most recent call last):
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 404, in deserialize_objects
(RayWorkerVllm pid=762)     obj = self._deserialize_object(data, metadata, object_ref)
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 270, in _deserialize_object
(RayWorkerVllm pid=762)     return self._deserialize_msgpack_data(data, metadata_fields)
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 225, in _deserialize_msgpack_data
(RayWorkerVllm pid=762)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 215, in _deserialize_pickle5_data
(RayWorkerVllm pid=762)     obj = pickle.loads(in_band)
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/outlines/__init__.py", line 3, in <module>
(RayWorkerVllm pid=762)     import outlines.models
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/outlines/models/__init__.py", line 12, in <module>
(RayWorkerVllm pid=762)     from .openai import OpenAI, openai
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/outlines/models/openai.py", line 11, in <module>
(RayWorkerVllm pid=762)     from outlines.base import vectorize
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/outlines/base.py", line 16, in <module>
(RayWorkerVllm pid=762)     nest_asyncio.apply()
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 19, in apply
(RayWorkerVllm pid=762)     loop = loop or asyncio.get_event_loop()
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 41, in _get_event_loop
(RayWorkerVllm pid=762)     loop = events.get_event_loop_policy().get_event_loop()
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 68, in get_event_loop
(RayWorkerVllm pid=762)     _patch_loop(loop)
(RayWorkerVllm pid=762)   File "/usr/local/lib/python3.10/dist-packages/nest_asyncio.py", line 184, in _patch_loop
(RayWorkerVllm pid=762)     raise ValueError('Can\'t patch loop of type %s' % type(loop))
(RayWorkerVllm pid=762) ValueError: Can't patch loop of type <class 'uvloop.Loop'>
(RayWorkerVllm pid=762) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
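
The root cause visible in the traceback is that importing outlines runs nest_asyncio.apply(), and nest_asyncio can only patch the pure-Python asyncio event loop, not the uvloop loop used by the vLLM API server and its Ray workers. A minimal sketch that reproduces just that failure outside of vLLM (assuming nest_asyncio and uvloop are installed):

import asyncio

import nest_asyncio
import uvloop

# Use uvloop's event loop, as the vLLM API server does.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

# nest_asyncio refuses to patch non-asyncio loops, so this raises
# ValueError: Can't patch loop of type <class 'uvloop.Loop'>
nest_asyncio.apply()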

@tscholak (Contributor, Author) commented:

Some success:

INFO 12-24 21:56:26 llm_engine.py:73] Initializing an LLM engine with config: model='/mnt/llmd/base_models/Mistral-7B-Instruct-v0.2', tokenizer='/mnt/llmd/base_models/Mistral-7B-Instruct-v0.2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
INFO 12-24 21:57:40 llm_engine.py:223] # GPU blocks: 25213, # CPU blocks: 2048
INFO 12-24 21:57:42 model_runner.py:394] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 12-24 21:57:48 model_runner.py:437] Graph capturing finished in 6 secs.
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
/usr/local/lib/python3.10/dist-packages/outlines/fsm/regex.py:474: NumbaPendingDeprecationWarning: 
Encountered the use of a type that is scheduled for deprecation: type 'reflected set' found for argument 'fsm_finals' of function '_walk_fsm'.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "../usr/local/lib/python3.10/dist-packages/outlines/fsm/regex.py", line 226:
@numba.njit(nogil=True, cache=True)
def _walk_fsm(
^

  state_seq = _walk_fsm(
/usr/local/lib/python3.10/dist-packages/numba/core/ir_utils.py:2172: NumbaPendingDeprecationWarning: 
Encountered the use of a type that is scheduled for deprecation: type 'reflected set' found for argument 'fsm_finals' of function 'state_scan_tokens'.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "../usr/local/lib/python3.10/dist-packages/outlines/fsm/regex.py", line 462:
@numba.njit(cache=True, nogil=True)
def state_scan_tokens(
^

  warnings.warn(NumbaPendingDeprecationWarning(msg, loc=loc))
INFO 12-24 21:58:38 async_llm_engine.py:379] Received request c96271e42d38463b91549220e3c26e81: prompt: 'What is the capital of France?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt token ids: None.
INFO 12-24 21:58:38 llm_engine.py:653] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 12-24 21:58:38 async_llm_engine.py:111] Finished request c96271e42d38463b91549220e3c26e81.
INFO:     10.210.93.238:40684 - "POST /generate HTTP/1.1" 200 OK

At least it doesn't crash anymore.
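
For reference, the kind of request used in this test can be sent with a few lines of Python (a sketch, assuming the server listens on localhost:8000 and that /generate accepts an optional regex field alongside prompt, which is what this PR adds):

import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "What is the capital of France?",
        # Constrain the completion with a regular expression (new in this PR).
        "regex": "(Paris|London|Berlin)",
        "max_tokens": 16,
    },
)
print(response.json())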

@tscholak changed the title from "update vllm patch to v0.2.6" to "update vllm patch to v0.2.6 and add regex support for /generate" on Dec 26, 2023
@tscholak (Contributor, Author) commented:

Everything works.

@tscholak mentioned this pull request on Dec 26, 2023
@tscholak requested a review from @rlouf on December 26, 2023 at 18:10
@rlouf (Member) left a comment:
Should we default to an empty list of logits processors in case neither a schema nor a regex string is provided?

We also need to update the documentation to reflect the changes before merging.

Besides this, everything looks good.
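
A minimal sketch of that fallback, as an illustration only (the import paths and handler variable names are assumptions based on the serve module's conventions, not the actual diff):

# Hypothetical fragment of the /generate handler: fall back to unconstrained
# generation when neither a JSON schema nor a regex is supplied.
from vllm import SamplingParams

from outlines.serve.vllm import JSONLogitsProcessor, RegexLogitsProcessor

json_schema = request_dict.pop("schema", None)
regex_string = request_dict.pop("regex", None)

if json_schema is not None:
    logits_processors = [JSONLogitsProcessor(json_schema, engine.engine)]
elif regex_string is not None:
    logits_processors = [RegexLogitsProcessor(regex_string, engine.engine)]
else:
    logits_processors = []

# logits_processors is accepted here thanks to the vLLM patch this PR updates.
sampling_params = SamplingParams(**request_dict, logits_processors=logits_processors)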

@tscholak marked this pull request as ready for review on December 26, 2023 at 20:14
@tscholak (Contributor, Author) commented:

@rlouf thanks for the review! I made some changes based on your feedback.

@rlouf (Member) commented Dec 29, 2023

Great contribution, thank you!

@rlouf merged commit 5f6166a into outlines-dev:main on Dec 29, 2023 (5 checks passed).
Linked issue: Support regexs in vLLM