Adding vLLM inference entrypoints to Flamingo #18
Conversation
Compare: 7bfa435 to f5668a1
tests/unit/conftest.py (outdated)
@@ -22,6 +23,10 @@ def model_config_with_artifact():
    return AutoModelConfig(load_from=artifact, trust_remote_code=True)


def model_config_with_vllm():
Suggested change:
- def model_config_with_vllm():
+ def inference_server_config():
That's fine; I was following the pattern of model_config_with_repo_id and model_config_with_artifact.
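For reference, a minimal sketch of what the renamed fixture could look like; the import path and the constructor arguments of InferenceServerConfig are assumptions for illustration, not the code merged in this PR.

import pytest

from flamingo.integrations.vllm import InferenceServerConfig  # import path is an assumption


@pytest.fixture
def inference_server_config():
    # Example value only; any OpenAI-style endpoint URL would do here.
    return InferenceServerConfig(base_url="http://localhost:8000/v1")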
@@ -26,6 +24,9 @@ def lm_harness_ray_config():
    )


""" Test for HuggingFace model"""
What are these docstrings for?
To make it slightly clearer why we're running the same test twice. Alternatively, I could fold this into one parameterized test; actually, I think I'll do that.
That's fine then; I would just add that within the test function body, not in the middle of the file :D
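A rough sketch, assuming pytest, of how the two tests could be folded into a single parameterized test; the test name, fixture names, and the final assertion are illustrative placeholders rather than the actual test from this PR.

import pytest


@pytest.mark.parametrize(
    "model_config_fixture",
    ["model_config_with_artifact", "inference_server_config"],
)
def test_lm_harness_model_config(model_config_fixture, request):
    """Covers both the HuggingFace artifact path and the vLLM inference server path."""
    model_config = request.getfixturevalue(model_config_fixture)
    # Placeholder assertion; the real test would build and validate a job config.
    assert model_config is not None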
    )


    if request.param == "model_config_with_artifact":
changed this to be a bit cleaner
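One common way to make a branch on request.param cleaner (a sketch, assuming the fixture dispatches to other fixtures by name) is to resolve the requested fixture dynamically instead of using an if/elif chain:

import pytest


@pytest.fixture(params=["model_config_with_artifact", "inference_server_config"])
def model_config(request):
    # Look the underlying fixture up by name rather than branching on request.param.
    return request.getfixturevalue(request.param)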
Minor comment, but fine to go into dev branch from this point.
.pre-commit-config.yaml (outdated)
  - id: trailing-whitespace
  - id: end-of-file-fixer
  - id: requirements-txt-fixer
    exclude: requirements_lock.txt
This line isn't necessary, since requirements_lock.txt doesn't exist in this repo.
I would also suggest in the future not having a singular e.g., [...]
Gotcha, makes sense. I think we can adjust as need be based on what patterns we see as issues in future PRs; hopefully they'll become smaller as we go and we can use the dev branch as "staging" for what we want to release as a whole 🤞
This PR adds an additional model loading class for running evaluations against a pre-loaded model available at an OpenAI-style inference endpoint. See code in source here for instructions on how to load a model at a vLLM inference endpoint.
It implements:
- vllm, which allows for local model loading
- InferenceServerConfig, which implements a base_url pointing to the loaded inference endpoint

NOTE: This code change does not allow jobs to work yet, as we still need to implement changes to load_and_evaluate in the lm_harness entrypoint in order to pass the model parameters correctly, like here:

I don't want to block merges and this PR is big enough already, so if this is merged I'll follow up with this in the next PR.
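For context, a minimal sketch of what a config along these lines could look like; the pydantic base class and the optional fields beyond base_url are assumptions for illustration, not the actual Flamingo schema.

from pydantic import BaseModel


class InferenceServerConfig(BaseModel):
    """Configuration for evaluating against an already-running, OpenAI-style endpoint."""

    base_url: str  # e.g. "http://localhost:8000/v1" for a local vLLM server
    engine: str | None = None     # model name as registered with the server (assumed field)
    tokenizer: str | None = None  # optional tokenizer override (assumed field)


# Hypothetical usage: the evaluation job talks to the endpoint instead of loading weights.
config = InferenceServerConfig(base_url="http://localhost:8000/v1")

A model can typically be served at such an endpoint with vLLM's OpenAI-compatible server, e.g. python -m vllm.entrypoints.openai.api_server --model <model-name>, though the exact invocation may vary between vLLM versions.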
See: https://mzai.atlassian.net/browse/RD2024-71