Adding vLLM inference entrypoints to Flamingo #18

Merged
16 commits merged into development on Jan 31, 2024

Conversation

@veekaybee (Member) commented Jan 29, 2024

This PR adds an additional model-loading class for running evaluations against a pre-loaded model available at an OpenAI-style inference endpoint. See the code in source here for instructions on how to load a model at a vLLM inference endpoint.

It implements:

  • a new integrations module, vllm, that allows for local model loading
  • a class, InferenceServerConfig, that implements a base_url pointing to the loaded inference endpoint (a rough sketch follows this list)
  • additional unit tests for this new config
  • some ruff formatting cleanup of spaces, plus a pre-commit config
  • putting requirements.txt in .gitignore, since we only work with requirements files locally, as explained in CONTRIBUTING.md and the examples
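For illustration only, a minimal sketch of what the InferenceServerConfig described above could look like (the pydantic base class and docstring are assumptions, not the exact code in this PR; base_url is the field the PR describes):

from pydantic import BaseModel


class InferenceServerConfig(BaseModel):
    """Config for a model already being served at an OpenAI-style endpoint, e.g. by vLLM."""

    # Base URL of the running inference server, e.g. "http://localhost:8000/v1"
    base_url: str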

NOTE: This code change does not make jobs work yet; we still need to change load_and_evaluate in the lm_harness entrypoint so that the model parameters are passed correctly, like here:

lm_eval --model local-chat-completions --tasks gsm8k --model_args model=facebook/opt-125m,base_url=http://{yourip}:8000/v1
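As a rough sketch of that follow-up (not part of this PR; it assumes an lm-eval-harness version that exposes simple_evaluate, plus the hypothetical InferenceServerConfig shape above), load_and_evaluate would translate the config into the same model_args string the CLI command takes:

from lm_eval import simple_evaluate


def load_and_evaluate(server_config):
    # Build the comma-separated model_args string shown in the CLI example above.
    model_args = f"model=facebook/opt-125m,base_url={server_config.base_url}"
    # "local-chat-completions" points lm-eval-harness at an OpenAI-style chat endpoint.
    return simple_evaluate(
        model="local-chat-completions",
        model_args=model_args,
        tasks=["gsm8k"],
    )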

I don't want to block merges, and this PR is big enough already, so once this is merged I'll follow up with that change in the next PR.

See: https://mzai.atlassian.net/browse/RD2024-71

Resolved review threads:

  • .pre-commit-config.yaml
  • examples/configs/lm_harness_vllm_config.yaml
  • examples/configs/lm_harness_hf_config.yaml
  • src/flamingo/integrations/vllm/__init__.py
  • src/flamingo/jobs/lm_harness/entrypoint.py

@veekaybee marked this pull request as ready for review January 31, 2024 14:55
@@ -22,6 +23,10 @@ def model_config_with_artifact():
return AutoModelConfig(load_from=artifact, trust_remote_code=True)


def model_config_with_vllm():
Contributor:
Suggested change
def model_config_with_vllm():
def inference_server_config():

@veekaybee (Member, Author):

that's fine, was following the pattern of model_config_with_repo_id and model_config_with_artifact
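For context, the renamed fixture might look roughly like this (a sketch only: it assumes these conftest functions are pytest fixtures, that InferenceServerConfig is importable from flamingo.integrations.vllm, and the base_url value is a placeholder):

import pytest

from flamingo.integrations.vllm import InferenceServerConfig


@pytest.fixture
def inference_server_config():
    # Placeholder URL; real tests would point at whatever endpoint the job config expects.
    return InferenceServerConfig(base_url="http://localhost:8000/v1")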

@@ -26,6 +24,9 @@ def lm_harness_ray_config():
)


""" Test for HuggingFace model"""
Contributor:

What are these docstrings for?

@veekaybee (Member, Author):

To make it slightly clearer why we're running the same test twice. Alternatively, I could fold this into one parameterized test. Actually, I think I'll do that.

Contributor:

That's fine then, I would just add them within the test function body, not the middle of the file :D

@veekaybee changed the base branch from main to development on January 31, 2024 17:45
)


if request.param == "model_config_with_artifact":
@veekaybee (Member, Author) commented Jan 31, 2024:

changed this to be a bit cleaner
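For reference, a sketch of the parameterized-fixture pattern this thread converged on (not the PR's exact code; it assumes model_config_with_artifact and inference_server_config exist as fixtures, as the diff above suggests):

import pytest


@pytest.fixture(params=["model_config_with_artifact", "inference_server_config"])
def model_config(request):
    # Dispatch on the fixture name carried in request.param, as in the diff above.
    return request.getfixturevalue(request.param)


def test_lm_harness_job_config(model_config):
    # One test body now covers both the HuggingFace and vLLM config variants.
    assert model_config is not None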

@sfriedowitz (Contributor) left a comment:

Minor comment, but fine to go into dev branch from this point.

- id: trailing-whitespace
- id: end-of-file-fixer
- id: requirements-txt-fixer
  exclude: requirements_lock.txt
Contributor:

This line isn't necessary, since requirements_lock.txt doesn't exist in this repo.

@sfriedowitz (Contributor):

I would also suggest in the future not having a singular development branch for the entire repo, since multiple people working on separate tickets would cause conflicts trying to merge into the dev branch. It's probably best to keep dev branches isolated to features/mini-projects, and merge a handful of related PRs into that branch before finally bringing it together with main.

e.g., dev/vicky/inference-server, and then a series of PRs into that branch as the work evolves

@veekaybee (Member, Author):

Gotcha, makes sense. I think we can adjust as needed based on what patterns we see as issues in future PRs - hopefully they'll become smaller as we go, and we can use the dev branch as "staging" for what we want to release as a whole 🤞

@veekaybee merged commit ee3bceb into development on Jan 31, 2024
1 check passed

3 participants