
Implement ChatModel (pyfunc subclass) #10820

Merged
merged 12 commits on Feb 2, 2024

Conversation

daniellok-db (Collaborator) commented Jan 15, 2024

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10820/merge

Checkout with GitHub CLI

gh pr checkout 10820

Related Issues/PRs

What changes are proposed in this pull request?

This PR adds the ChatModel subclass to make it easier for users to implement and serve chat models. The ChatModel class requires users to implement a predict method with the following signature (corresponding to the OpenAI chat request format):

class MyChatModel(mlflow.pyfunc.ChatModel):
    def predict(self, context, messages: List[ChatMessage], params: ChatParams) -> ChatResponse:
        # user-defined behavior

This makes it so that the user doesn't have to implement any parsing logic, and can directly work with the pydantic objects that are passed in. Additionally, input/output signatures and an input example are automatically provided.
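
As a rough sketch (not part of this PR's test plan), the automatically attached signature and input example should make a saved ChatModel queryable through the standard pyfunc API with a plain OpenAI-style request dict; the "chat-model" path and the message below are just placeholders reused from the example further down:

import mlflow

loaded = mlflow.pyfunc.load_model("chat-model")
response = loaded.predict(
    {"messages": [{"role": "user", "content": "Hello!"}]}
)
print(response)  # dict in the OpenAI chat completion response format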

To support this, we implement a new custom loader for these types of models, defined in mlflow.pyfunc.loaders.chat_model. This loader wraps the ChatModel in a _ChatModelPyfuncWrapper class that accepts the standard chat request format, and breaks it up into messages and params for the user.
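
For illustration, here is a minimal sketch of the splitting step described above. It mirrors the conversion code quoted later in this PR, but split_chat_request is a hypothetical name rather than the wrapper's actual method; it assumes the incoming dict has a "messages" key and that every remaining key is a chat parameter:

from mlflow.types.llm import ChatMessage, ChatParams

def split_chat_request(request: dict):
    # convert the raw message dicts into typed ChatMessage objects
    messages = [ChatMessage(**m) for m in request["messages"]]
    # everything else (temperature, max_tokens, ...) becomes ChatParams
    params = ChatParams(**{k: v for k, v in request.items() if k != "messages"})
    return messages, params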

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Ran the following to create a chat model:

import json
from typing import List

import mlflow
from mlflow.types.llm import ChatMessage, ChatParams, ChatResponse


class TestChatModel(mlflow.pyfunc.ChatModel):
    def predict(self, context, messages: List[ChatMessage], params: ChatParams) -> ChatResponse:
        mock_response = {
            "id": "123",
            "object": "chat.completion",
            "created": 1677652288,
            "model": "MyChatModel",
            "choices": [
                {
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": json.dumps([m.model_dump(exclude_none=True) for m in messages]),
                    },
                    "finish_reason": "stop",
                },
                {
                    "index": 1,
                    "message": {
                        "role": "user",
                        "content": params.model_dump_json(exclude_none=True),
                    },
                    "finish_reason": "stop",
                },
            ],
            "usage": {
                "prompt_tokens": 10,
                "completion_tokens": 10,
                "total_tokens": 20,
            },
        }
        return ChatResponse(**mock_response)

mlflow.pyfunc.save_model(
    path="chat-model",
    python_model=TestChatModel(),
)

Then on the command line:

$ mlflow models serve -m chat-model

$ curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{ "messages": [ { "role": "system", "content": "You are a helpful assistant" }, { "role": "user", "content": "Hello!" } ] }' | jq

{
  "id": "123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "MyChatModel",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "[{\"role\": \"system\", \"content\": \"You are a helpful assistant\"}, {\"role\": \"user\", \"content\": \"Hello!\"}]"
      },
      "finish_reason": "stop"
    },
    {
      "index": 1,
      "message": {
        "role": "user",
        "content": "{\"temperature\":1.0,\"n\":1,\"stream\":false}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 10,
    "total_tokens": 20
  }
}

Also tried viewing the model in the MLflow UI:

Validate that the MLmodel file looks as expected
Screenshot 2024-01-15 at 12.57.49 PM

Validate that the signature looks correct:

Screen.Recording.2024-01-15.at.12.57.57.PM.mov

Does this PR require documentation update?

Requires a tutorial, but we can work on this in a follow-up PR

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Added the ChatModel pyfunc class, which allows for more convenient definition of chat models conforming to the OpenAI request/response format.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

github-actions bot commented Jan 15, 2024

Documentation preview for 4075860 will be available here when this CircleCI job completes successfully.

github-actions bot added the area/models (MLmodel format, model serialization/deserialization, flavors) and rn/feature (Mention under Features in Changelogs) labels on Jan 15, 2024
dbczumar self-requested a review on January 25, 2024 02:08
mlflow/types/llm.py (outdated review threads, resolved)
usage: TokenUsageStats
object: str = "chat.completion"
created: int = field(default_factory=lambda: int(time.time()))
id: str = field(default_factory=lambda: str(uuid.uuid4()))
Member:

Is there any constraint for id (e.g. must start with "chatcmpl-") in OpenAI?

Collaborator:

No, but I don't think we should make up random IDs here that don't have meaning. Can we leave this as None for now?

Collaborator Author:

changed it to None! my initial thought was that if people want the ID to have meaning, they can specify it directly when instantiating ChatRequest (e.g. ChatRequest(id=meaningful_id, ...) still works), but for people who just want a UUID, the default saves them a couple of lines of code

# is not supported, so the code here is a little ugly.


class _BaseDataclass:
Collaborator Author:

all this validation logic is mainly to support the output validation done here.

input validation shouldn't really be an issue, because it's handled by signature validation.

Comment on lines 65 to 70
:param role: The role of the entity that sent the message (e.g. ``"user"``, ``"system"``).
:type role: str
:param content: The content of the message.
:type content: str
:param name: The name of the entity that sent the message. **Optional**
:type name: str
Collaborator Author:

i can unindent all of this stuff to be consistent with the rest of the codebase, but i like the way docstrings look when they're aligned haha

Member:

Can we use the google docstring style?

Comment on lines +506 to +518
def _get_pyfunc_loader_module(python_model):
    if isinstance(python_model, ChatModel):
        return mlflow.pyfunc.loaders.chat_model.__name__
    return __name__
Collaborator Author:

what do we think of adding new pyfunc loaders to the mlflow.pyfunc.loaders module? i think it would be a clean way for us to implement future custom loaders (e.g. for RAGModel, CompletionModel).

Member:

Sounds good to me.

@@ -0,0 +1 @@
import mlflow.pyfunc.loaders.chat_model # noqa: F401
Collaborator Author:

is this necessary? or does python load all files in the subdirectory into the module by default?

Member:

Python doesn't. If we want to do from mlflow.pyfunc.loaders import chat_model, we need this line, otherwise we don't.

# output is not coercable to ChatResponse
messages = [ChatMessage(**m) for m in input_example["messages"]]
params = ChatParams(**{k: v for k, v in input_example.items() if k != "messages"})
output = python_model.predict(None, messages, params)
Collaborator Author:

is it a problem to perform inference during saving? i saw we do it when trying to infer output signature, but since this is kind of an LLM-specific API, inference can be kind of expensive. the input example specifies max_tokens=10, so hopefully it isn't too bad.

if it is a concern, maybe we can just skip output validation entirely (as far as i can tell, there wouldn't be another way to ensure the return type of the predict() method is actually a ChatResponse).

Member:

I think there are some risks:

  1. It may take a while (e.g. a few seconds) for the API request to finish.
  2. No guarantee that the LLM service is healthy. If OpenAI is down, this line would throw.

Collaborator:

+1 We shouldn't predict while saving the model, the error message would be confusing.

Collaborator Author:

from discussion offline, we'll keep the predict since we do it in transformers/other places already for output signature inference. i'll do some more testing here to make sure it's not a confusing experience


from mlflow.utils.model_utils import _get_flavor_configuration


def _load_pyfunc(model_path: str, model_config: Optional[Dict[str, Any]] = None):
Collaborator:

This looks the same as PythonModel's _load_pyfunc function (except for the wrapper it returns). Could we reuse the function and take the final class as a parameter?

Collaborator Author:

refactored the common part to _load_context_model_and_signature


def _convert_input(self, model_input):
    # model_input should be correct from signature validation, so just convert it to dict here
    dict_input = {key: value[0] for key, value in model_input.to_dict(orient="list").items()}
Collaborator:

Does to_dict accept orient param?

Collaborator Author:

it seems so: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html#pandas.DataFrame.to_dict

but i'm kind of new to pandas; is there something else i should use?
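
For reference, a small standalone example (not MLflow code) of what that to_dict(orient="list") conversion does to a one-row request DataFrame:

import pandas as pd

# a one-row DataFrame whose columns are the chat request fields
model_input = pd.DataFrame(
    [{"messages": [{"role": "user", "content": "Hello!"}], "temperature": 1.0, "n": 1}]
)

# orient="list" maps each column name to a list of its values (one per row);
# taking index 0 recovers the single request's value for each field
dict_input = {key: value[0] for key, value in model_input.to_dict(orient="list").items()}
# {'messages': [{'role': 'user', 'content': 'Hello!'}], 'temperature': 1.0, 'n': 1}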

Comment on lines 49 to 51
elif all(isinstance(v, cls) for v in values):
    pass
else:
Collaborator:

Suggested change
-elif all(isinstance(v, cls) for v in values):
-    pass
-else:
+elif any(not isinstance(v, cls) for v in values):

Comment on lines 170 to 157
if not isinstance(self.message, ChatMessage):
    self.message = ChatMessage(**self.message)
Collaborator:

This might raise an error if self.message is not a dictionary

self._validate_field("model", str, True)
self._convert_dataclass_list("choices", ChatChoice)
if not isinstance(self.usage, TokenUsageStats):
    self.usage = TokenUsageStats(**self.usage)
Collaborator:

same here

Collaborator Author:

changed this to check for dict, and to throw a ValueError afterwards if the field is not an instance of the expected type
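
A hedged sketch of the check-dict-then-raise pattern described above (the helper and field names here are illustrative, not the PR's actual implementation):

def _convert_dataclass(self, field_name, cls):
    value = getattr(self, field_name)
    if isinstance(value, dict):
        # coerce a plain dict into the expected dataclass
        setattr(self, field_name, cls(**value))
    elif not isinstance(value, cls):
        # neither a dict nor an instance of the expected type
        raise ValueError(
            f"Expected {field_name} to be a dict or {cls.__name__}, got {type(value).__name__}"
        )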

total_tokens: int

def __post_init__(self):
    self._validate_field("prompt_tokens", int, True)
Member:

Is defining this as a required set of fields going to preclude using this interface in transformers?

Collaborator Author:

i don't think it will preclude that, because we can generate these stats automatically for the user using the transformer's tokenizer. however, i can make it not required if it's a concern! it was unclear from the spec which fields are required and which are not.

Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
@@ -1999,6 +2009,25 @@ def predict(model_input: List[str]) -> List[str]:
python_model, input_arg_index, input_example=input_example
):
mlflow_model.signature = signature
elif isinstance(python_model, ChatModel):
B-Step62 (Collaborator), Feb 2, 2024:

Can we do any validation/warning if a customer specifies a custom signature with ChatModel? If it doesn't comply with our pydantic schema, we may want to reject it here rather than at runtime.

Collaborator Author:

oh yes that's true, i'll throw a warning to say that the signature will be overridden and that it must conform to the spec

Collaborator Author:

woah actually this brought up a bug in my implementation: if the user specifies a signature, the model actually doesn't get saved as a ChatModel due to the elif in line 2005 above. i guess it's elif because this block contains a lot of validation/signature inference logic that we can skip if the user provides the signature themselves. however, for ChatModel we always want to run these validations (e.g. output validation)

cc @B-Step62 what do you think about raising an exception when trying to save a ChatModel subclass with a signature, e.g:

if signature is not None:
  if isinstance(python_model, ChatModel):
    raise MlflowException("ChatModel subclasses specify a signature automatically, please remove the provided signature from the log_model() or save_model() call.")
  mlflow_model.signature = signature
elif python_model is not None:
  # no change from this PR

another way is making a separate block for ChatModels, e.g:

if isinstance(python_model, ChatModel):
  # move ChatModel logic to this block
  ...
elif signature is not None:
  # no change
  ...
elif python_model is not None:
  # no change
  ...

Collaborator:

Nice finding! I agree with throwing. A warning on the happy path can easily be overlooked and is almost invisible in automated environments.

if isinstance(response, ChatResponse):
    return response.to_dict()

# shouldn't happen since there is validation at save time ensuring that
Collaborator:

Should we raise instead? I'm not sure ignoring unexpected behavior is beneficial.


return messages, params

def predict(self, model_input: ChatRequest, params: Optional[Dict[str, Any]] = None):
B-Step62 (Collaborator), Feb 2, 2024:

Suggested change
-def predict(self, model_input: ChatRequest, params: Optional[Dict[str, Any]] = None):
+def predict(self, model_input: Dict[str, Any], params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:

super-nit

Collaborator Author:

ah yes that's true haha, it won't be a ChatRequest when coming in

Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
B-Step62 (Collaborator) left a review comment:

Left one very tiny comment, but otherwise LGTM! Awesome idea, it's always better to have typed objects than to handle dicts everywhere :)

messages, params = self._convert_input(model_input)
response = self.chat_model.predict(self.context, messages, params)

if isinstance(response, ChatResponse):
Collaborator:


    if not isinstance(response, ChatResponse):
        raise MlflowException(...)

    return response.to_dict()

super-minor thing, but probably a more common way to structure the block

assert isinstance(response.choices[0].message, ChatMessage)


def to_dict_converts_nested_dataclasses():
Member:

Suggested change
-def to_dict_converts_nested_dataclasses():
+def test_to_dict_converts_nested_dataclasses():

assert not isinstance(response["choices"][0]["message"], ChatMessage)


def to_dict_excludes_nones():
Member:

Suggested change
-def to_dict_excludes_nones():
+def test_to_dict_excludes_nones():


def to_dict_converts_nested_dataclasses():
response = ChatResponse(**MOCK_RESPONSE).to_dict()
assert not isinstance(response["choices"][0], ChatChoice)
Member:

what's the expected class? dict?

Collaborator Author:

yup it should be dict, i guess i should just assert that haha

Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
@daniellok-db daniellok-db merged commit bf141c7 into mlflow:master Feb 2, 2024
36 checks passed
ernestwong-db pushed a commit to ernestwong-db/mlflow that referenced this pull request Feb 6, 2024
Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
Signed-off-by: ernestwong-db <ernest.wong@databricks.com>
lu-wang-dl pushed a commit to lu-wang-dl/mlflow that referenced this pull request Feb 6, 2024
Signed-off-by: Daniel Lok <daniel.lok@databricks.com>
Signed-off-by: lu-wang-dl <lu.wang@databricks.com>