Support log_model with code model in langchain #11817

annzhang-db · 2024-04-24T23:53:00Z

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11817/merge

Checkout with GitHub CLI

gh pr checkout 11817

Related Issues/PRs

Work to follow:

Introduce ModelConfig class: Introduce ModelConfig #11841
Use model_config instead of code_paths[0] for the config file path, and move model_config under model_code: Support file paths for model_config in langchain #11843
Validate code at the provided file path (warn on magic commands, port over dbutils logic): Add validation for model code #11844
Validate the object type in mlflow.models.set_model(): [MLflow] Add ability to set_model for pyfunc and langchain model #11842

What changes are proposed in this pull request?

Remove the restriction that the file must be named "chain.py"
Add logic to read the model code from a file or a Databricks notebook
Introduce model_code as separate from code
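
As a rough illustration of the first change, the check below accepts any local `.py` file rather than requiring a specific file name. This is only a sketch: `validate_model_code_path` is a hypothetical helper, not the function added in this PR, and it covers the local-file case only (not Databricks notebooks). The error text is borrowed from the message shown later in this review.

```python
import os


def validate_model_code_path(lc_model):
    """Return the absolute path of a local Python file, whatever its file name.

    Hypothetical helper for illustration; not MLflow's implementation.
    """
    if not isinstance(lc_model, str):
        raise TypeError("expected a file path string")
    # Any name is allowed as long as it is an existing .py file.
    if not (lc_model.endswith(".py") and os.path.isfile(lc_model)):
        raise ValueError(
            f"If the provided model '{lc_model}' is a string, it must be a valid python "
            "file path containing the code for defining the chain instance."
        )
    return os.path.abspath(lc_model)
```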

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?

Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
Bug fixes, doc updates and new features usually go into minor releases.
Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
Bug fixes and doc updates usually go into patch releases.

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

github-actions · 2024-04-24T23:53:16Z

Documentation preview for 22af799 will be available when this CircleCI job
completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/8855806429.

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db · 2024-04-25T07:59:58Z

tests/langchain/test_langchain_model_export.py

@@ -2615,7 +2601,6 @@ def test_save_load_chain_as_code_optional_code_path():
            artifact_path="model_path",
            signature=signature,
            input_example=input_example,
-            code_paths=[],


Test with code_paths=None.

mlflow/langchain/utils.py

WeichenXu123 · 2024-04-25T09:41:35Z

mlflow/langchain/__init__.py

+                f"If the provided model '{lc_model}' is a string, it must be a valid python "
                "file path containing the code for defining the chain instance."


In this case, can we log it as separate model metadata, such as model_code_path, instead of reusing "code_paths"?
It differs from existing code_paths files: its code is not a common module, it should contain a set_chain method, and it might come from a Databricks notebook.
Also, we have recently been expanding MLflow's code_paths functionality (e.g. auto-inferring code_paths). The newly added model code path can't support the expanded code_paths functionality, and mixing the two would make the code messy.

I have some related discussion in this doc:
https://docs.google.com/document/d/144wAwgXsQ40C3dDsoObX0LfXp33aRbvEVCCq-VYgJqw/edit#bookmark=id.d1xwn4gq68ce

https://docs.google.com/document/d/144wAwgXsQ40C3dDsoObX0LfXp33aRbvEVCCq-VYgJqw/edit#bookmark=id.w62724445pwk

CC @BenWilson2

BenWilson2 · 2024-04-25

Yes, we should keep this logic separate. The main entry point to a chain definition should be handled distinctly. If it were overloaded into code_paths, we would have to add error handling to the branching logic that traverses directories to resolve dependent relative and absolute imports for dependency inference.

Regardless, what is the mechanism for handling dependent imports in this implementation? If a user's code relies on absolute imports of custom code, will this notebook path preserve its directory structure from the workspace root?

annzhang-db · 2024-04-26

Updated the PR to separate this logic. There isn't any mechanism for handling dependent imports right now, and the notebook path does not preserve its directory structure from the workspace root. Should this be a requirement here?

WeichenXu123 · 2024-04-26

> There isn't any mechanism for handling dependent imports right now, and the notebook path does not preserve its directory structure from the workspace root. Should this be a requirement here?

No need to handle this.


annzhang-db · 2024-04-29T18:16:55Z

mlflow/langchain/__init__.py

            )
-
-        if len(code_paths) > 1:
+        if code_paths and len(code_paths) > 1:


#11844 (comment)

Can we delete this check, since we will now use model_config instead?

This will be removed in the next PR, #11843, when we actually start using model_config.
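
The guard in the diff above matters because code_paths may be None once the test stops passing an empty list. A minimal sketch of the behaviour, using a hypothetical helper name (`has_multiple_code_paths` is not part of MLflow):

```python
def has_multiple_code_paths(code_paths):
    """Return True only when code_paths holds more than one entry."""
    # The truthiness check short-circuits on None or an empty list, so len()
    # is never called on None (which would raise TypeError).
    return bool(code_paths) and len(code_paths) > 1
```

Without the truthiness check, `len(None) > 1` would fail before the intended validation error could be raised.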

annzhang-db · 2024-04-29T18:18:02Z

mlflow/langchain/__init__.py

@@ -256,7 +258,8 @@ def load_retriever(persist_directory):
                f"Current code paths: {code_paths}"
            )

-    code_dir_subpath = _validate_and_copy_code_paths(formatted_code_path, path)
+    code_dir_subpath = _validate_and_copy_code_paths(code_paths, path)
+    model_code_dir_subpath = _validate_and_copy_model_code_path(model_code_path, path)

    if signature is None:


#11844 (comment)

When the signature is None, I am not sure this code would work: _LangChainModelWrapper(lc_model)
We need to figure out a way to load the model here and use it as the wrapped model so that we can infer the signature.

This could potentially resolve the signature issues: https://github.com/mlflow/mlflow/pull/11817/files#r1583519300

Maybe we can move this code to the top: when lc_model is a str, we load the model and make that the lc_model. A lot of things would be solved by that. What do you think?

annzhang-db · 2024-04-29T18:18:57Z

mlflow/langchain/__init__.py

        **model_data_kwargs,
    )

    if Version(langchain.__version__) >= Version("0.0.311"):
        checker_model = lc_model
        if isinstance(lc_model, str):
+            # TODO: use model_config instead of code_paths[0]


#11844 (comment)

Maybe we can move this code to the top: when lc_model is a str, we load the model and make that the lc_model. A lot of things would be solved by that. What do you think?
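
The suggestion could look roughly like the sketch below: resolve a string lc_model into the object it defines once, up front, so later steps (wrapping, signature inference) always see a model object. This is an assumption-heavy illustration, not MLflow's loader. The helper names are hypothetical, and it assumes the code file binds the model to a module-level variable (here `chain`), whereas the thread mentions a set_chain / set_model mechanism.

```python
import importlib.util
import os


def load_object_from_code(path, attr_name="chain"):
    """Execute the Python file at `path` and return the object bound to `attr_name`.

    Hypothetical helper: the attribute name is an assumption of this sketch.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such model code file: {path}")
    spec = importlib.util.spec_from_file_location("model_code_module", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import model code from: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, attr_name)


def resolve_model(lc_model):
    """Turn a file path into the model it defines; pass other objects through."""
    if isinstance(lc_model, str):
        return load_object_from_code(lc_model)
    return lc_model
```

Doing this at the top of the save path would mean the string case only needs to be handled in one place.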

annzhang-db · 2024-04-29T18:19:26Z

mlflow/langchain/utils.py

+                from databricks.sdk import WorkspaceClient
+                from databricks.sdk.service.workspace import ExportFormat
+
+                w = WorkspaceClient()


#11844 (comment)

Nit: This function is already huge :D
Can we extract this into a separate function?
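
One possible shape for the extracted function is sketched below. The function name is hypothetical, and the client and export format are passed in as parameters so the sketch can run without the Databricks SDK installed. It assumes the export response exposes the file body as a base64-encoded `content` string, which should be checked against the SDK version in use.

```python
import base64


def export_notebook_source(workspace_client, notebook_path, export_format):
    """Export a workspace notebook and return its source code as text.

    `workspace_client` is expected to behave like databricks.sdk.WorkspaceClient
    and `export_format` like ExportFormat.SOURCE (both are assumptions here).
    """
    response = workspace_client.workspace.export(path=notebook_path, format=export_format)
    # Assumed: the response carries the notebook body base64-encoded.
    return base64.b64decode(response.content).decode("utf-8")
```

With the SDK available, a call might look like `export_notebook_source(WorkspaceClient(), notebook_path, ExportFormat.SOURCE)`, reusing the imports shown in the diff above.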

annzhang-db · 2024-04-29T18:19:57Z

mlflow/utils/model_utils.py

@@ -162,6 +163,20 @@ def _validate_and_copy_code_paths(code_paths, path, default_subpath="code"):
    return code_dir_subpath


+def _validate_and_copy_model_code_path(code_path, path, default_subpath="model_code"):


#11844 (comment)

default_subpath=FLAVOR_CONFIG_MODEL_CODE
Can we update the above so that we don't define this value in two places?

This is reworked in the next PR, #11843; let's address it there.
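
For illustration, a helper along these lines could copy the single model code file into its own subdirectory while keeping the subpath name in one constant, which is the point of the review comment. This is a sketch under assumptions, not the body of `_validate_and_copy_model_code_path`: the names `MODEL_CODE_SUBPATH` and `copy_model_code` are hypothetical.

```python
import os
import shutil

# Defined once and reused as the default below (hypothetical constant name).
MODEL_CODE_SUBPATH = "model_code"


def copy_model_code(code_path, dst_root, subpath=MODEL_CODE_SUBPATH):
    """Copy one model code file into dst_root/subpath.

    Returns the subpath that was used, or None when no code path is given.
    """
    if code_path is None:
        return None
    if not os.path.isfile(code_path):
        raise FileNotFoundError(f"Model code file not found: {code_path}")
    target_dir = os.path.join(dst_root, subpath)
    os.makedirs(target_dir, exist_ok=True)
    shutil.copy(code_path, target_dir)  # keeps the original file name
    return subpath
```

Keeping this separate from the code_paths copy step mirrors the earlier request to treat the model entry point distinctly from ordinary dependency code.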

sunishsheth2009

Approving this so we can merge it and address the TODOs in small batches.

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db added 8 commits April 24, 2024 16:52

initial

767e92e

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

set chain

464d885

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

format

0190276

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

format again

208ab5e

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

update

f417867

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

docstring

c3b769b

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

update

d052868

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

update chain.py

776e425

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

github-actions bot added rn/none List under Small Changes in Changelogs. and removed rn/none List under Small Changes in Changelogs. labels Apr 24, 2024

annzhang-db added 4 commits April 24, 2024 16:57

catch all exceptions

5702feb

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

check code_paths existence

6ced8b2

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

tests

62c38e8

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

exception

5bca18b

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db changed the title ~~log_model~~ Support log_model with file path in langchain Apr 25, 2024

github-actions bot added rn/none List under Small Changes in Changelogs. and removed rn/none List under Small Changes in Changelogs. labels Apr 25, 2024

annzhang-db commented Apr 25, 2024

github-actions bot added the rn/none List under Small Changes in Changelogs. label Apr 25, 2024

WeichenXu123 reviewed Apr 25, 2024

mlflow/langchain/utils.py (outdated, resolved)

WeichenXu123 reviewed Apr 25, 2024

mlflow/langchain/utils.py (resolved)

annzhang-db requested a review from sunishsheth2009 April 25, 2024 16:36

annzhang-db added 3 commits April 25, 2024 23:38

update

f9ac68c

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

use model_code_dir_subpath

e319bb0

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

add test for different name

e8643d8

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db requested a review from WeichenXu123 April 26, 2024 07:52

.py suffix

c7bb50a

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

harupy reviewed Apr 26, 2024

mlflow/langchain/__init__.py (outdated, resolved)

harupy reviewed Apr 26, 2024

mlflow/langchain/__init__.py (outdated, resolved)

harupy reviewed Apr 26, 2024

mlflow/langchain/utils.py (resolved)

harupy reviewed Apr 26, 2024

tests/langchain/test_langchain_model_export.py (outdated, resolved)

WeichenXu123 reviewed Apr 26, 2024

mlflow/langchain/__init__.py (resolved)

annzhang-db added 4 commits April 26, 2024 09:11

rework temp file

c34d9e0

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

remove import

48fd538

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

remove set_chain

5b10dd5

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

add back code_paths validation

9335648

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db requested review from harupy and WeichenXu123 April 26, 2024 22:09

annzhang-db added 4 commits April 26, 2024 15:21

format

246d963

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

Merge remote-tracking branch 'upstream/master' into langchain-log-model

f588fa8

none check

94d5341

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

format

22af799

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db changed the title ~~Support log_model with file path in langchain~~ Support log_model with code model in langchain Apr 27, 2024

annzhang-db mentioned this pull request Apr 27, 2024

Add validation for model code #11844

Merged

39 tasks

WeichenXu123 reviewed Apr 29, 2024

mlflow/langchain/__init__.py (resolved)

WeichenXu123 reviewed Apr 29, 2024

mlflow/langchain/__init__.py (resolved)

annzhang-db commented Apr 29, 2024

sunishsheth2009 approved these changes Apr 29, 2024

sunishsheth2009 reviewed Apr 29, 2024

annzhang-db merged commit a990a30 into mlflow:master Apr 29, 2024
56 checks passed

github-actions bot added the patch-2.12.2 label Apr 29, 2024

BenWilson2 pushed a commit that referenced this pull request May 6, 2024

Support log_model with code model in langchain (#11817)

27f134a

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>


annzhang-db commented Apr 24, 2024 • edited

github-actions bot commented Apr 24, 2024 • edited

annzhang-db Apr 25, 2024

WeichenXu123 Apr 25, 2024 • edited

BenWilson2 Apr 25, 2024

annzhang-db Apr 26, 2024 • edited

WeichenXu123 Apr 26, 2024

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024 • edited

annzhang-db Apr 29, 2024

sunishsheth2009 left a comment
