-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support files for pyfunc models and model_config #11951
Conversation
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Documentation preview for afb8b84 will be available when this CircleCI job More info
|
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harupy can you help review the code as well? Specially pyfunc.load_model.
Other than that, it looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great - would it be possible for you to adjust the error message here as well?
mlflow/mlflow/pyfunc/__init__.py
Lines 2154 to 2155 in 1a665ca
"When `python_model` is a callable object, it must accept exactly one argument. " | |
f"Found {num_args} arguments.", |
could you also fill in the testing section w/ any DB notebooks you tested the loading behavior of paths that lead to db notebooks in? |
if python_model: | ||
model_code_path = None | ||
if isinstance(python_model, str): | ||
model_code_path = _validate_and_get_model_code_path(python_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if the python_model
file depends on other custom modules or external resources? Do they need to be logged as artifacts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly sure how that would work - can we address this in a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if the python_model file depends on other custom modules or external resources? Do they need to be logged as artifacts?
@harupy I think those custom modules and resources can be passed in as pip_requirements or code_paths and those should still work here right? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@annzhang-db @sunishsheth2009 can we write a test for that case, if we assume it will work? If it doesn't work, we need to either fix it or update documentation to indicate that it isn't supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote a test for this - code_paths works with the functionality 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments that should simplify and derisk these changes by removing lots of if / else logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @annzhang-db ! Left some comments - let me know if you have questions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to this file are all for clarity, no functionality changes.
lc_model = _load_model_code_path(model_code_path, model_config) | ||
_validate_and_copy_file_to_directory(model_code_path, path, "code") | ||
else: | ||
lc_model = lc_model_or_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this point, lc_model is always a langchain model instance - it cannot be a path.
@@ -252,49 +252,32 @@ def load_retriever(persist_directory): | |||
import langchain | |||
from langchain.schema import BaseRetriever | |||
|
|||
lc_model = _validate_and_wrap_lc_model(lc_model, loader_fn) | |||
lc_model_or_path = _validate_and_prepare_lc_model_or_path(lc_model, loader_fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use variable name lc_model_or_path while both a langchain model or file path are possible.
mlflow/langchain/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor - move shared functions from langchain/utils.py to models/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor - moved functions verbatim from langchain/utils.py here to be used for pyfunc as well. No code changes.
@@ -900,49 +882,6 @@ def load_model(model_uri, dst_path=None): | |||
return _load_model_from_local_fs(local_model_path) | |||
|
|||
|
|||
@contextmanager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved code to models/utils.py
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
mlflow/langchain/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved functions to models/utils.py.
@@ -2328,8 +2363,6 @@ def predict(model_input: List[str]) -> List[str]: | |||
) | |||
raise MlflowException(message=msg, error_code=INVALID_PARAMETER_VALUE) | |||
|
|||
_validate_and_prepare_target_save_path(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved up so we can copy the model code file to the save path.
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
Signed-off-by: Ann Zhang <ann.zhang@databricks.com>
from mlflow.langchain import _validate_and_prepare_lc_model_or_path | ||
|
||
# If its not a PyFuncModel, then it should be a Langchain model | ||
_validate_and_wrap_lc_model(model, None) | ||
# If its not a PyFuncModel, then it should be a Langchain model (not a path) | ||
# Check this since the validation function does not | ||
if isinstance(model, str): | ||
raise mlflow.MlflowException( | ||
"Model should either be an instance of PyFuncModel or Langchain type." | ||
) | ||
model = _validate_and_prepare_lc_model_or_path(model, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@annzhang-db @sunishsheth2009 I recall that we talked about making set_model
support a Python function too. This is also supported by mlflow.pyfunc.log_model()
. Can we support this if it's easy or file a follow-up internal ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we have a ticket for that option as well. We will have to do it as a follow up. :)
@@ -839,7 +820,8 @@ def predict( | |||
return result | |||
|
|||
|
|||
def _load_pyfunc(path): | |||
# TODO: Support loading langchain with model_config. For now, this is a no-op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@annzhang-db Do we have an internal ticket to track this? It would be very useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make a ticket!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@annzhang-db LGTM once remaining comments are addressed. Tried this out manually with a few variations on the following scripts; it works quite well:
- script.py
import mlflow
from mlflow.models import set_model
from mlflow.models import ModelConfig
config = ModelConfig()
class Model(mlflow.pyfunc.PythonModel):
def predict(self, context, model_input, params):
print("CONFIG", config.config)
print("CONFIG", config.get("A"))
print("PARAMS", params)
return model_input.apply(lambda column: column + 1)
set_model(Model())
- logger.py
import mlflow
from mlflow.models import infer_signature
import pandas
df = pandas.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "c"])
sig = infer_signature(df, params={"BAZ": "BAR"})
model_info = mlflow.pyfunc.log_model(
python_model="script.py",
artifact_path="baz",
signature=sig,
model_config={"A": "B"},
)
print(model_info.model_uri)
mod = mlflow.pyfunc.load_model(model_info.model_uri, model_config={"A": "B"})
print(mod)
print(mod.predict(df, params={"BAZ": "BAR"}))
🛠 DevTools 🛠
Install mlflow from this PR
Checkout with GitHub CLI
Related Issues/PRs
#xxxWhat changes are proposed in this pull request?
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/deployments
: MLflow Deployments client APIs, server, and third-party Deployments integrationsarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/recipes
: Recipes, Recipe APIs, Recipe configs, Recipe Templatesarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notesShould this PR be included in the next patch release?
Yes
should be selected for bug fixes, documentation updates, and other small changes.No
should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.