Support model evaluation with an endpoint URL #11262
Conversation
Documentation preview for 20a9741 will be available when this CircleCI job completes.
Force-pushed from 42937f2 to d0f261c
    return False


def _hash_array_of_dict_as_bytes(data):
note: This hashing support is required for passing a list of dictionaries (chat/completion payloads) or a similar pandas DataFrame as an input dataset. The hashing is needed to generate the dataset digest, but the existing logic doesn't handle a list or array that contains dictionaries. The implementation is not super clean; it's just a band-aid that avoids breaking existing use cases.
@@ -1140,7 +1173,9 @@ def _convert_data_to_mlflow_dataset(data, targets=None, predictions=None):
     from pyspark.sql import DataFrame as SparkDataFrame

     if isinstance(data, list):
-        return mlflow.data.from_numpy(np.array(data), targets=np.array(targets))
+        return mlflow.data.from_numpy(
+            np.array(data), targets=np.array(targets) if targets else None
+        )
note: This is a small but important change. In the original logic, the targets param was always wrapped in a numpy array when the input data is a list, so if targets is None checks downstream no longer worked. Concretely, passing a list as input data while leaving targets as None raised a confusing error.

This is not specific to the endpoint evaluation; however, implicitly requiring targets only for list input is confusing, so I think we should fix it in this PR.
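(For context, here is a minimal sketch of why the unconditional np.array(targets) wrapping broke the None check; the snippet is illustrative, not from the PR.)

```python
import numpy as np

targets = None

# np.array(None) produces a 0-d object array containing None,
# which is not the same object as None itself.
wrapped = np.array(targets)

print(wrapped is None)  # False -> downstream `if targets is None` branches never fire
print(wrapped.shape)    # () -- a 0-dimensional array
```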
def _hash_array_of_dict_as_bytes(data):
    # NB: If an array or list contains a dictionary element, it can't be hashed with
    # pandas.util.hash_array. Hence we need to manually hash the elements here. This is
    # particularly for the LLM use case where the input can be a list of dictionaries
    # (chat/completion payloads), so it doesn't handle more complex cases like nested lists.
    result = b""
    for elm in data:
        if isinstance(elm, (list, np.ndarray)):
            result += _hash_array_of_dict_as_bytes(elm)
        elif isinstance(elm, dict):
            result += _hash_dict_as_bytes(elm)
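(To illustrate the limitation the NB comment describes, this is roughly what fails without the manual path; the snippet is illustrative and assumes pandas raises TypeError for unhashable object elements.)

```python
import numpy as np
import pandas as pd

# An object array of chat-style payloads, as produced from a list of dicts.
payloads = np.array([{"role": "user", "content": "hi"}], dtype=object)

try:
    pd.util.hash_array(payloads)
except TypeError as err:
    print(err)  # e.g. "unhashable type: 'dict'"
```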
Can we move all of these hashing methods into a separate module, ideally located in mlflow.data, since these are all about hashing for use with MLflow Datasets?
Agreed this should be in a separate module, but for the location I would still put it under the evaluation module. These methods are used for hashing various inputs (list, numpy array, DataFrame) and are only used for the EvaluationDataset digest.

Now that the digest method for each flavor (e.g. TensorflowDataset) has been moved to its respective module (ref), I think we can create evaluation/dataset.py and put all these methods there. WDYT?
I would prefer to put all of EvaluationDataset in mlflow/data to maintain consistency with other datasets; e.g., we don't put TensorflowDataset in mlflow/tensorflow.
Feel free to tackle this in a follow-up ticket as long as we track it :)
> I would prefer to put all of EvaluationDataset in mlflow/data

Makes sense.

> Feel free to tackle this in a follow-up ticket as long as we track it :)

Yeah, let me do that. I created a JIRA ticket for the next sprint (ML-39039), but I may be able to pick it up earlier if no fun surprises happen in other tasks :)
Force-pushed from ca89855 to a62c2c3
LGTM!
LGTM with docs nits! Thanks @B-Step62
1. Add support for list input
2. Fix issue of None targets handling to unblock 1
3. Add support for a dictionary element in input data, to directly pass chat/completion payloads
4. Fix issue of dictionary hashing to unblock 3
5. Address other review comments

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Force-pushed from e59739b to 20a9741
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
This change enhances the mlflow.evaluate() function to accept an endpoint URL for the model argument. We have started to get frequent requests for this from users who want to evaluate proprietary model endpoints or Databricks-served models. Without this feature, they need to implement a custom predict function, which is not super difficult but is less convenient.

Important notes
A new inference_params argument is added to the API, but it is only used for endpoint evaluation, not for other model types. We can do a follow-up to support inference params for all model types.

Example
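(The example snippet is collapsed in the rendered PR page. A minimal sketch of the proposed usage follows; the endpoint name, payload shape, and metric choice are assumptions based on the description above, not copied from the PR.)

```python
import mlflow

# Chat payloads passed directly as a list of dictionaries -- the input
# format that the list/dict hashing changes in this PR enable.
data = [
    {"messages": [{"role": "user", "content": "What is MLflow?"}]},
    {"messages": [{"role": "user", "content": "What is Databricks?"}]},
]

with mlflow.start_run():
    results = mlflow.evaluate(
        model="endpoints:/my-chat-endpoint",  # endpoint URI instead of a model or pyfunc
        data=data,
        inference_params={"max_tokens": 100, "temperature": 0.0},
        extra_metrics=[mlflow.metrics.latency()],
    )
    print(results.metrics)
```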
How is this PR tested?
Tested in Databricks. Sample code for one test case:
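(The sample code itself is collapsed in the rendered page; a hypothetical reconstruction might look like the following, with the endpoint name, data, and targets as placeholders.)

```python
import mlflow

eval_data = [
    {"messages": [{"role": "user", "content": "What is the capital of France?"}]},
]

with mlflow.start_run():
    results = mlflow.evaluate(
        model="endpoints:/databricks-llama-2-70b-chat",  # placeholder endpoint name
        data=eval_data,
        targets=["Paris"],  # exercises the None/list targets fix in this PR
        model_type="question-answering",
        inference_params={"temperature": 0.0, "max_tokens": 10},
    )
    print(results.metrics)
```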
Does this PR require documentation update?
Release Notes
Is this a user-facing change?

The mlflow.evaluate() API supports a model endpoint URL as an evaluation subject. This is particularly useful when you want to evaluate deployed models or proprietary APIs such as Databricks Foundation Model APIs.

What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes