[MLI-4665] Update vllm upgrade process #713
Conversation
model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh
```python
def parse_args(parser: FlexibleArgumentParser):
    parser = make_arg_parser(parser)
    parser.add_argument("--attention-backend", type=str, help="The attention backend to use")
```
you can remove run_server_worker and run_server, and just use run_server from vllm
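A rough sketch of that simplification. The vLLM import paths (`run_server` in `vllm.entrypoints.openai.api_server`, `make_arg_parser`, `FlexibleArgumentParser`) are assumptions based on recent vLLM versions and should be checked against the pinned release; the vLLM-specific lines are commented out so the sketch runs stand-alone:

```python
import argparse

# With vLLM installed, the local run_server/run_server_worker copies could be
# dropped in favor of (import paths assumed, verify against the pinned vLLM):
#   import uvloop
#   from vllm.entrypoints.openai.api_server import run_server
#   from vllm.entrypoints.openai.cli_args import make_arg_parser
#   from vllm.utils import FlexibleArgumentParser

def build_parser() -> argparse.ArgumentParser:
    # Real code would be: parser = make_arg_parser(FlexibleArgumentParser())
    parser = argparse.ArgumentParser(description="vLLM OpenAI-compatible server (sketch)")
    parser.add_argument("--attention-backend", type=str,
                        help="The attention backend to use")
    return parser

def main() -> None:
    args = build_parser().parse_args()
    # Delegate to vLLM's server loop instead of keeping a local copy:
    # uvloop.run(run_server(args))
    print(args.attention_backend)
```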
```python
FORWARDER_STORAGE_LIMIT=FORWARDER_STORAGE_USAGE,
USER_CONTAINER_PORT=USER_CONTAINER_PORT,
FORWARDER_EXTRA_ROUTES=flavor.extra_routes,
FORWARDER_SYNC_ROUTES=[flavor.predict_route] + flavor.routes,
```
I think we need to add flavor.extra_routes here for backwards compatibility since the data models are saved to the database
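A sketch of what that backwards-compatibility fix might look like. `Flavor` here is a hypothetical stand-in for the real flavor entity, reusing the field names from the diff above; records saved before the new `routes` field existed only populated `extra_routes`, so it has to keep feeding the sync routes:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Flavor:
    """Hypothetical stand-in for the real flavor entity."""
    predict_route: str = "/predict"
    routes: List[str] = field(default_factory=list)
    extra_routes: List[str] = field(default_factory=list)

def forwarder_sync_routes(flavor: Flavor) -> List[str]:
    # Records saved before the `routes` field existed only populated
    # extra_routes, so include it alongside the new field.
    return [flavor.predict_route] + flavor.routes + flavor.extra_routes

legacy = Flavor(extra_routes=["/v1/completions"])  # pre-migration record
print(forwarder_sync_routes(legacy))  # ['/predict', '/v1/completions']
```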
missed a spot for sync routes
Could you make this change for CPU sync and stream endpoints as well?
we're missing changes to the domain + database models
https://github.com/scaleapi/llm-engine/blob/55ff1dac87912b68a86c8f8e560a33847a7dcc99/model-engine/model_engine_server/domain/entities/model_bundle_entity.py#L154
https://github.com/scaleapi/llm-engine/blob/55ff1dac87912b68a86c8f8e560a33847a7dcc99/model-engine/model_engine_server/db/models/hosted_model_inference.py#L149
The database model change will require a db migration script to be created. I realize the Readme doesn't have instructions, but you should be able to follow https://alembic.sqlalchemy.org/en/latest/tutorial.html#running-our-second-migration. Could actually add that to db/migrations/README as well
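Following that Alembic tutorial, the migration could be created roughly like this (directory and revision message are illustrative; run from wherever the repo's `alembic.ini` actually lives):

```shell
cd model-engine/model_engine_server/db/migrations  # assumed alembic.ini location
alembic revision -m "add routes to model bundle flavors"
# edit the generated versions/<revision>_*.py to fill in upgrade()/downgrade()
alembic upgrade head  # apply against a dev database first
```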
```python
COMMAND=flavor.streaming_command,
PREDICT_ROUTE=flavor.predict_route,
STREAMING_PREDICT_ROUTE=flavor.streaming_predict_route,
# PREDICT_ROUTE=flavor.predict_route,
```
delete
…nhanced routes and we replaced it with ROUTES for the sync and streaming routes
…itonenhanced and lws ones that need it based on type. won't get used though
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.
and then run tests: