
[FR] Automatically split metrics, tags, etc into smaller chunks to avoid request limit error #6049

Closed
1 of 20 tasks
nzw0301 opened this issue Jun 12, 2022 · 2 comments · Fixed by #6052
Labels
area/tracking (Tracking service, tracking client APIs, autologging), enhancement (New feature or request)

Comments

@nzw0301
Contributor

nzw0301 commented Jun 12, 2022

Willingness to contribute

Yes. I would be willing to contribute this feature with guidance from the MLflow community.

Proposal Summary

As the title says, MLflow could avoid request-limit errors by automatically splitting data that contains too many elements into smaller chunks when we call mlflow.log_metrics, mlflow.log_params, and mlflow.set_tags (and possibly other APIs, too).

Motivation

What is the use case for this feature?

Consider the mlflow.log_metrics case. We would like to save many metrics at once as follows:

import mlflow
import mlflow.utils.validation  # MAX_METRICS_PER_BATCH lives in this module


metrics = {f"param_{i}": i for i in range(mlflow.utils.validation.MAX_METRICS_PER_BATCH + 1)}

# Log a batch of metrics (one more entry than the per-batch limit allows)
with mlflow.start_run():
    mlflow.log_metrics(metrics)

However, MLflow raises the following error:

File /opt/homebrew/Caskroom/miniconda/base/envs/optuna/lib/python3.9/site-packages/mlflow/utils/validation.py:296, in _validate_batch_limit(entity_name, limit, length)
    290 if length > limit:
    291     error_msg = (
    292         "A batch logging request can contain at most {limit} {name}. "
    293         "Got {count} {name}. Please split up {name} across multiple requests and try "
    294         "again."
    295     ).format(name=entity_name, count=length, limit=limit)
--> 296     raise MlflowException(error_msg, error_code=INVALID_PARAMETER_VALUE)

MlflowException: A batch logging request can contain at most 1000 metrics. Got 1001 metrics. Please split up metrics across multiple requests and try again.

Why is this use case valuable to support for MLflow users in general?

To avoid this, users currently have to split the metrics into smaller batches themselves, for example:

from itertools import islice

import mlflow
import mlflow.utils.validation  # MAX_METRICS_PER_BATCH lives in this module


metrics = {f"param_{i}": i for i in range(mlflow.utils.validation.MAX_METRICS_PER_BATCH + 1)}


# Log a batch of metrics, splitting it into chunks of at most MAX_METRICS_PER_BATCH entries
with mlflow.start_run():
    if len(metrics) > mlflow.utils.validation.MAX_METRICS_PER_BATCH:
        it = iter(metrics)
        for _ in range(0, len(metrics), mlflow.utils.validation.MAX_METRICS_PER_BATCH):
            sub_metrics = {k: metrics[k] for k in islice(it, mlflow.utils.validation.MAX_METRICS_PER_BATCH)}
            mlflow.log_metrics(sub_metrics)
    else:
        mlflow.log_metrics(metrics)

I suppose this splitting could be handled inside mlflow.log_metrics itself. That way, users would not need to care about the number of elements in metrics. A rough sketch of what this could look like is shown below.
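For illustration, here is a minimal sketch of how such automatic chunking could be built on top of the public API. The helper names (_chunk_dict, log_metrics_in_chunks) are hypothetical and chosen only for this example; the actual change may look different.

from itertools import islice

import mlflow
import mlflow.utils.validation


def _chunk_dict(d, chunk_size):
    # Yield successive sub-dicts with at most `chunk_size` items each (hypothetical helper).
    it = iter(d)
    for _ in range(0, len(d), chunk_size):
        yield {k: d[k] for k in islice(it, chunk_size)}


def log_metrics_in_chunks(metrics, step=None):
    # Illustration only: split an oversized batch so that each request stays within
    # MAX_METRICS_PER_BATCH, which is roughly what mlflow.log_metrics could do internally.
    for chunk in _chunk_dict(metrics, mlflow.utils.validation.MAX_METRICS_PER_BATCH):
        mlflow.log_metrics(chunk, step=step)


# Example usage: 2500 metrics are logged in three requests instead of failing.
with mlflow.start_run():
    log_metrics_in_chunks({f"metric_{i}": float(i) for i in range(2500)})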

Why is this use case valuable to support for your project(s) or organization?

I'm from the Optuna community, a black-box optimisation framework library. Optuna provides an MLflow callback that saves optimisation results through the MLflow API. If this feature request is handled on the MLflow side, we would not need the changes added in optuna/optuna#3651.

Why is it currently difficult to achieve this use case?

N/A

Details

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@nzw0301 nzw0301 added the enhancement (New feature or request) label on Jun 12, 2022
@github-actions github-actions bot added the area/tracking (Tracking service, tracking client APIs, autologging) label on Jun 12, 2022
@dbczumar
Collaborator

@nzw0301 This is an excellent idea, and we would be excited to review a PR that implements this capability. Thank you in advance for your contribution!

@nzw0301
Contributor Author

nzw0301 commented Jun 13, 2022

@dbczumar Thank you for your comments! I've sent a PR to resolve this issue as linked above.

@BenWilson2 BenWilson2 added this to the MLflow Roadmap milestone Jun 16, 2022