
Conversation

@meher-m (Contributor) commented on Sep 23, 2025

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Update the HTTP forwarder for model engine to add a new `routes` field. For now it is used the same way as `extra_routes`, and we no longer hard-code adding `/predict` and `/stream`. The end state is to remove `/predict` and `/stream` and use the forwarder purely as a passthrough for any endpoint specified in `routes` (getting rid of `extra_routes`). This is the first step toward that while maintaining backwards compatibility, so we don't force stakeholders to migrate for now.
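To make the compatibility story concrete, here is a hypothetical config shape during the transition (a sketch only, not the actual schema; keys other than `extra_routes`, `routes`, and `healthcheck_route` are illustrative):

```python
# Sketch of the forwarder config during the compatibility window, written as
# the dict the forwarder reads. Both fields are accepted and, for now,
# treated identically; the route values are illustrative only.
config = {
    "sync": {
        "extra_routes": ["/v1/chat/completions", "/v1/completions"],  # legacy field
        "routes": ["/v1/chat/completions", "/v1/completions"],        # new field
        "healthcheck_route": "/health",
    },
    "stream": {
        "extra_routes": ["/v1/chat/completions", "/v1/completions"],
        "routes": ["/v1/chat/completions", "/v1/completions"],
        "healthcheck_route": "/health",
    },
}
```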

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

Start the server:

```bash
(base) ➜  vllm git:(meher-m/vllm-upgrade-http-forwarder) ✗ export TARGET_TAG=0.10.2-test-rc1
(base) ➜  vllm git:(meher-m/vllm-upgrade-http-forwarder) ✗ export IMAGE=692474966980.dkr.ecr.us-west-2.amazonaws.com/vllm:${TARGET_TAG}

(base) ➜  vllm git:(meher-m/vllm-upgrade-http-forwarder) ✗ export MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct && export MODEL_PATH=/data/model_files/$MODEL
(base) ➜  vllm git:(meher-m/vllm-upgrade-http-forwarder) ✗ export REPO_PATH=/mnt/efs/mehermankikar
(base) ➜  vllm git:(meher-m/vllm-upgrade-http-forwarder) ✗ docker kill vllm; docker rm vllm;
(base) ➜  vllm git:(meher-m/vllm-upgrade-http-forwarder) ✗ docker run \
    --runtime nvidia \
    --shm-size=16gb \
    --gpus '"device=0,1,2,3"' \
    -v $MODEL_PATH:/workspace/model_files:ro -v /data/dmchoi:/data:ro \
    -p 5005:5005 \
    --name vllm \
    ${IMAGE} \
    python -m vllm_server --model model_files --served-model-name $MODEL model_files --tensor-parallel-size 4 --port 5005 --disable-log-requests --uvicorn-log-level info --gpu-memory-utilization 0.8 --enforce-eager
```

Test 1

Start the forwarder using `extra_routes`:

```bash
GIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \
    --config model_engine_server/inference/configs/service--http_forwarder.yaml \
    --num-workers 1 \
    --set "forwarder.sync.extra_routes=['/v1/chat/completions','/v1/completions']" \
    --set "forwarder.stream.extra_routes=['/v1/chat/completions','/v1/completions']" \
    --set "forwarder.sync.healthcheck_route=/health" \
    --set "forwarder.stream.healthcheck_route=/health"
```

Test a curl command:

```bash
curl -X POST localhost:5000/v1/chat/completions -H "Content-Type: application/json" -d "{\"args\": {\"model\":\"/data/model_files/$MODEL\", \"messages\":[{\"role\": \"systemr\", \"content\": \"Hey, what's the temperature in Paris right now?\"}],\"max_tokens\":100,\"temperature\":0.2,\"guided_regex\":\"Sean.*\"}}"
```

Works.

Test 2

Start the forwarder using `routes`:

```bash
GIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \
    --config model_engine_server/inference/configs/service--http_forwarder.yaml \
    --num-workers 1 \
    --set "forwarder.sync.routes=['/v1/chat/completions','/v1/completions']" \
    --set "forwarder.stream.routes=['/v1/chat/completions','/v1/completions']" \
    --set "forwarder.sync.healthcheck_route=/health" \
    --set "forwarder.stream.healthcheck_route=/health"
```

Tested the same curl command; it still works.

Test 3

Start the forwarder with no routes configured:

```bash
GIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \
    --config model_engine_server/inference/configs/service--http_forwarder.yaml \
    --num-workers 1 \
    --set "forwarder.sync.healthcheck_route=/health" \
    --set "forwarder.stream.healthcheck_route=/health"
```

Confirmed that the same curl request fails as expected:

```bash
(base) ➜  llm-engine git:(meher-m/vllm-upgrade-http-forwarder) ✗ curl -X POST localhost:5000/v1/chat/completions -H "Content-Type: application/json" -d "{\"args\": {\"model\":\"$MODEL\", \"messages\":[{\"role\": \"systemr\", \"content\": \"Hey, what's the temperature in Paris right now?\"}],\"max_tokens\":100,\"temperature\":0.2,\"guided_regex\":\"Sean.*\"}}"
{"detail":"Not Found"}
```

@meher-m self-assigned this on Sep 23, 2025.

Review comment on:

```python
protocol: Literal["http"] # TODO: add support for other protocols (e.g. grpc)
readiness_initial_delay_seconds: int = 120
extra_routes: List[str] = Field(default_factory=list)
routes: Optional[List[str]] = None
```
Collaborator commented:

You can do the same as above with `Field(default_factory=list)` to make it non-optional in code.
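A minimal sketch of that suggestion (the model name is hypothetical; the field names come from the diff above):

```python
from typing import List

from pydantic import BaseModel, Field

class ForwarderConfig(BaseModel):  # hypothetical name, for illustration only
    extra_routes: List[str] = Field(default_factory=list)
    # Non-optional with an empty-list default, mirroring extra_routes:
    routes: List[str] = Field(default_factory=list)
```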

Review comment on:

```yaml
forward_http_status: true
extra_routes: []
extra_routes: [] # Legacy field - still supported for backwards compatibility
# routes: [] # New field - can be used alongside or instead of extra_routes
```
Collaborator commented:

Don't need the comments here.

Review comment on:

```python
stream_forwarders[route] = load_streaming_forwarder(route)

# Add hardcoded routes to forwarders so they get handled consistently
sync_forwarders["/predict"] = load_forwarder(None)
```
Collaborator commented:

Gate this on whether `sync.predict_route` is provided; same with stream.
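A sketch of the gating being suggested; `config`, `load_forwarder`, and `load_streaming_forwarder` come from the surrounding code, while the `stream_route` key name is an assumption (only `predict_route` appears in the diffs):

```python
# Register the hardcoded routes only when the corresponding route is configured.
if config.get("sync", {}).get("predict_route") is not None:
    sync_forwarders["/predict"] = load_forwarder(None)
if config.get("stream", {}).get("stream_route") is not None:  # key name assumed
    stream_forwarders["/stream"] = load_streaming_forwarder(None)
```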

Review comment on:

```python
sync_forwarders: Dict[str, Forwarder] = dict()
stream_forwarders: Dict[str, StreamingForwarder] = dict()

# Handle legacy extra_routes configuration (backwards compatibility)
```
Collaborator commented:

Might be a good idea to deduplicate routes (e.g. place them into a set) before initializing the forwarders
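For example (a sketch using the config keys and helper names from the diffs in this PR):

```python
# Aggregate and deduplicate routes from both fields before creating forwarders.
sync_routes_to_add = set()
sync_routes_to_add.update(config.get("sync", {}).get("extra_routes", []))  # legacy
sync_routes_to_add.update(config.get("sync", {}).get("routes", []) or [])  # new
for route in sorted(sync_routes_to_add):
    sync_forwarders[route] = load_forwarder(route)
```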

Review comment on:

```python
passthrough_forwarders: Dict[str, PassthroughForwarder] = dict()

# Handle legacy extra_routes configuration (backwards compatibility)
for route in config.get("sync", {}).get("extra_routes", []):
```
Collaborator commented:

Same here; let's deduplicate and aggregate routes before creating the forwarders.

Review comment on:

```python
sync_routes_to_add.update(config.get("sync", {}).get("extra_routes", []))
sync_routes_to_add.update(config.get("sync", {}).get("routes", []))

if config.get("sync", {}).get("predict_route", None) is None:
```
Contributor (author) commented:

Maybe we don't need the if statement anymore?

Review comment on:

```python
protocol: Literal["http"] # TODO: add support for other protocols (e.g. grpc)
readiness_initial_delay_seconds: int = 120
extra_routes: List[str] = Field(default_factory=list)
routes: Optional[List[str]] = Field(default_factory=list)
```
Collaborator commented:

remove 'Optional'

Review comment on:

```python
sync_passthrough_routes_to_add = set()
sync_passthrough_routes_to_add.update(config.get("sync", {}).get("extra_routes", []))
sync_passthrough_routes_to_add.update(config.get("sync", {}).get("routes", []))
if config.get("sync", {}).get("predict_route", None) != "/predict":
```
Collaborator commented:

Oh, passthrough is a different case. We don't need `predict_route` for this; same with stream.
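That is, a sketch of the simplification (names from the diff above), with the `predict_route` check dropped entirely:

```python
# Passthrough routes come only from extra_routes/routes; no /predict special case.
sync_passthrough_routes_to_add = set()
sync_passthrough_routes_to_add.update(config.get("sync", {}).get("extra_routes", []))
sync_passthrough_routes_to_add.update(config.get("sync", {}).get("routes", []) or [])
```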

@meher-m requested a review from @dmchoiboi on September 24, 2025.

Review comment on:

```python
sync_routes_to_add.update(config.get("sync", {}).get("extra_routes", []))
sync_routes_to_add.update(config.get("sync", {}).get("routes", []))

if config.get("sync", {}).get("predict_route", None) == "/predict":
```
@dmchoiboi (Collaborator) commented on Sep 24, 2025:

I think we want to add `config.get("sync", {}).get("predict_route", None)` to `sync_routes_to_add`, and not necessarily only if it's equal to `/predict`.
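A sketch of that change (names from the diff above):

```python
# Register whatever predict_route is configured, not only the literal "/predict".
predict_route = config.get("sync", {}).get("predict_route", None)
if predict_route is not None:
    sync_routes_to_add.add(predict_route)
```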

Review comment on:

```python
app.add_api_route(path="/predict", endpoint=predict, methods=["POST"])
app.add_api_route(path="/stream", endpoint=stream, methods=["POST"])
# app.add_api_route(path="/predict", endpoint=predict, methods=["POST"])
# app.add_api_route(path="/stream", endpoint=stream, methods=["POST"])
```
Collaborator commented:

Delete the commented-out lines.

@meher-m changed the title from "Update http forwarder for model engine" to "[MLI-4665] Update http forwarder for model engine" on Sep 24, 2025.
@meher-m enabled auto-merge (squash) on September 24, 2025.
@meher-m merged commit 546eeff into main on Sep 24, 2025.
7 checks passed.
@meher-m deleted the meher-m/vllm-upgrade-http-forwarder branch on September 24, 2025.