-
Notifications
You must be signed in to change notification settings - Fork 67
[MLI-4665] Update http forwarder for model engine #714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| protocol: Literal["http"] # TODO: add support for other protocols (e.g. grpc) | ||
| readiness_initial_delay_seconds: int = 120 | ||
| extra_routes: List[str] = Field(default_factory=list) | ||
| routes: Optional[List[str]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do the same as above with Field(default_factory=list) to make it non-optional in code
| forward_http_status: true | ||
| extra_routes: [] | ||
| extra_routes: [] # Legacy field - still supported for backwards compatibility | ||
| # routes: [] # New field - can be used alongside or instead of extra_routes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need to comment
| stream_forwarders[route] = load_streaming_forwarder(route) | ||
|
|
||
| # Add hardcoded routes to forwarders so they get handled consistently | ||
| sync_forwarders["/predict"] = load_forwarder(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gate this on whether sync.predict_route is provided, same with stream
| sync_forwarders: Dict[str, Forwarder] = dict() | ||
| stream_forwarders: Dict[str, StreamingForwarder] = dict() | ||
|
|
||
| # Handle legacy extra_routes configuration (backwards compatibility) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good idea to deduplicate routes (e.g. place them into a set) before initializing the forwarders
| passthrough_forwarders: Dict[str, PassthroughForwarder] = dict() | ||
|
|
||
| # Handle legacy extra_routes configuration (backwards compatibility) | ||
| for route in config.get("sync", {}).get("extra_routes", []): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, let's deduplicate and aggregate routes before creating the forwarders
| sync_routes_to_add.update(config.get("sync", {}).get("extra_routes", [])) | ||
| sync_routes_to_add.update(config.get("sync", {}).get("routes", [])) | ||
|
|
||
| if config.get("sync", {}).get("predict_route", None) is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe don't need the if statement anymore?
| protocol: Literal["http"] # TODO: add support for other protocols (e.g. grpc) | ||
| readiness_initial_delay_seconds: int = 120 | ||
| extra_routes: List[str] = Field(default_factory=list) | ||
| routes: Optional[List[str]] = Field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove 'Optional'
| sync_passthrough_routes_to_add = set() | ||
| sync_passthrough_routes_to_add.update(config.get("sync", {}).get("extra_routes", [])) | ||
| sync_passthrough_routes_to_add.update(config.get("sync", {}).get("routes", [])) | ||
| if config.get("sync", {}).get("predict_route", None) != "/predict": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, passthrough is a different case. We don't need predict_route for this. same with stream
| sync_routes_to_add.update(config.get("sync", {}).get("extra_routes", [])) | ||
| sync_routes_to_add.update(config.get("sync", {}).get("routes", [])) | ||
|
|
||
| if config.get("sync", {}).get("predict_route", None) == "/predict": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to add config.get("sync", {}).get("predict_route", None) to sync_routes_to_add, and not necessarily only if it's equal to predict
| app.add_api_route(path="/predict", endpoint=predict, methods=["POST"]) | ||
| app.add_api_route(path="/stream", endpoint=stream, methods=["POST"]) | ||
| # app.add_api_route(path="/predict", endpoint=predict, methods=["POST"]) | ||
| # app.add_api_route(path="/stream", endpoint=stream, methods=["POST"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete
…eapi/llm-engine into meher-m/vllm-upgrade-http-forwarder
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Update the http forwarder for model engine to have a new
routesfield. For now this is used in the same way asextra_routesand we no longer hard code adding /predict and /stream. The end state will be removing /predict and /stream and using the forwarder just as a passthrough for any endpoint specified inroutes(getting rid ofextra_routes). This is the first step towards that while maintaining backwards compatibility so we don't force stakeholders to migrate for now.Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.
Start the server
Test 1
start the forwarder using
extra_routesGIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \ --config model_engine_server/inference/configs/service--http_forwarder.yaml \ --num-workers 1 \ --set "forwarder.sync.extra_routes=['/v1/chat/completions','/v1/completions']" \ --set "forwarder.stream.extra_routes=['/v1/chat/completions','/v1/completions']" \ --set "forwarder.sync.healthcheck_route=/health" \ --set "forwarder.stream.healthcheck_route=/health"Test a curl command
works
Test 2
start the forwarder using
routesGIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \ --config model_engine_server/inference/configs/service--http_forwarder.yaml \ --num-workers 1 \ --set "forwarder.sync.routes=['/v1/chat/completions','/v1/completions']" \ --set "forwarder.stream.routes=['/v1/chat/completions','/v1/completions']" \ --set "forwarder.sync.healthcheck_route=/health" \ --set "forwarder.stream.healthcheck_route=/health"test the same CURL command and it still works.
Test 3
started the forwarder using no
routesGIT_TAG=test python model_engine_server/inference/forwarding/http_forwarder.py \ --config model_engine_server/inference/configs/service--http_forwarder.yaml \ --num-workers 1 \ --set "forwarder.sync.healthcheck_route=/health" \ --set "forwarder.stream.healthcheck_route=/health"Confirmed that the same CURL request failed as expected.