Adding vLLM inference entrypoints to Flamingo #18

Merged
16 commits merged into development on Jan 31, 2024

Conversation

@veekaybee (Member) commented Jan 29, 2024

This PR adds an additional model-loading class for running evaluations against a pre-loaded model available at an OpenAI-style inference endpoint. See the code in source here for instructions on how to load a model at a vLLM inference endpoint.

It implements:

  • a new integrations module, vllm, that allows for local model loading
  • a class, InferenceServerConfig, that implements a base_url pointing to the loaded inference endpoint (a rough sketch follows this list)
  • additional unit tests for this new config
  • some ruff formatting cleanup of spaces, plus a pre-commit config
  • putting requirements.txt in .gitignore, since we only work with requirements files locally, as explained in CONTRIBUTING.md and the examples
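For illustration only, a minimal sketch of what the InferenceServerConfig described above could look like (the pydantic base class and docstring are assumptions, not the exact code in this PR; base_url is the field the PR describes):

from pydantic import BaseModel


class InferenceServerConfig(BaseModel):
    """Config for a model already being served at an OpenAI-style endpoint, e.g. by vLLM."""

    # Base URL of the running inference server, e.g. "http://localhost:8000/v1"
    base_url: str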

NOTE: This code change does not make jobs work yet; we still need to change load_and_evaluate in the lm_harness entrypoint so that the model parameters are passed correctly, like here:

lm_eval --model local-chat-completions --tasks gsm8k --model_args model=facebook/opt-125m,base_url=http://{yourip}:8000/v1
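As a rough sketch of that follow-up (not part of this PR; it assumes an lm-eval-harness version that exposes simple_evaluate, plus the hypothetical InferenceServerConfig shape above), load_and_evaluate would translate the config into the same model_args string the CLI command takes:

from lm_eval import simple_evaluate


def load_and_evaluate(server_config):
    # Build the comma-separated model_args string shown in the CLI example above.
    model_args = f"model=facebook/opt-125m,base_url={server_config.base_url}"
    # "local-chat-completions" points lm-eval-harness at an OpenAI-style chat endpoint.
    return simple_evaluate(
        model="local-chat-completions",
        model_args=model_args,
        tasks=["gsm8k"],
    )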

I don't want to block merges, and this PR is big enough already, so once this is merged I'll follow up with that change in the next PR.

See: https://mzai.atlassian.net/browse/RD2024-71

Resolved review threads:

  • .pre-commit-config.yaml
  • examples/configs/lm_harness_vllm_config.yaml
  • examples/configs/lm_harness_hf_config.yaml
  • src/flamingo/integrations/vllm/__init__.py
  • src/flamingo/jobs/lm_harness/entrypoint.py

@veekaybee marked this pull request as ready for review January 31, 2024 14:55
@@ -22,6 +23,10 @@ def model_config_with_artifact():
return AutoModelConfig(load_from=artifact, trust_remote_code=True)


def model_config_with_vllm():
Contributor:
Suggested change
def model_config_with_vllm():
def inference_server_config():

@veekaybee (Member, Author):

that's fine, was following the pattern of model_config_with_repo_id and model_config_with_artifact
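For context, the renamed fixture might look roughly like this (a sketch only: it assumes these conftest functions are pytest fixtures, that InferenceServerConfig is importable from flamingo.integrations.vllm, and the base_url value is a placeholder):

import pytest

from flamingo.integrations.vllm import InferenceServerConfig


@pytest.fixture
def inference_server_config():
    # Placeholder URL; real tests would point at whatever endpoint the job config expects.
    return InferenceServerConfig(base_url="http://localhost:8000/v1")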

@@ -26,6 +24,9 @@ def lm_harness_ray_config():
)


""" Test for HuggingFace model"""
Contributor:

What are these docstrings for?

@veekaybee (Member, Author):

To make it slightly clearer why we're running the same test twice. Alternatively, I could fold this into one parameterized test. Actually, I think I'll do that.

Contributor:

That's fine then, I would just add them within the test function body, not the middle of the file :D

@veekaybee changed the base branch from main to development on January 31, 2024 17:45
)


if request.param == "model_config_with_artifact":
@veekaybee (Member, Author) commented Jan 31, 2024:

changed this to be a bit cleaner
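For reference, a sketch of the parameterized-fixture pattern this thread converged on (not the PR's exact code; it assumes model_config_with_artifact and inference_server_config exist as fixtures, as the diff above suggests):

import pytest


@pytest.fixture(params=["model_config_with_artifact", "inference_server_config"])
def model_config(request):
    # Dispatch on the fixture name carried in request.param, as in the diff above.
    return request.getfixturevalue(request.param)


def test_lm_harness_job_config(model_config):
    # One test body now covers both the HuggingFace and vLLM config variants.
    assert model_config is not None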

@sfriedowitz (Contributor) left a comment:

Minor comment, but fine to go into dev branch from this point.

- id: trailing-whitespace
- id: end-of-file-fixer
- id: requirements-txt-fixer
  exclude: requirements_lock.txt
Contributor:

This line isn't necessary, since requirements_lock.txt doesn't exist in this repo.

@sfriedowitz (Contributor):

I would also suggest in the future not having a singular development branch for the entire repo, since multiple people working on separate tickets would cause conflicts trying to merge into the dev branch. It's probably best to keep dev branches isolated to features/mini-projects, and merge a handful of related PRs into that branch before finally bringing it together with main.

e.g., dev/vicky/inference-server, and then a series of PRs into that branch as the work evolves

@veekaybee (Member, Author):

Gotcha, makes sense. I think we can adjust as needed based on what patterns we see as issues in future PRs - hopefully they'll become smaller as we go, and we can use the dev branch as "staging" for what we want to release as a whole 🤞

@veekaybee merged commit ee3bceb into development on Jan 31, 2024
1 check passed

3 participants