
[BUG]: Setting up async logging flag is not taking effect. #11518

Open
3 of 23 tasks
sagarsumant opened this issue Mar 25, 2024 · 5 comments
Assignees
Labels
area/tracking Tracking service, tracking client APIs, autologging bug Something isn't working integrations/azure Azure and Azure ML integrations

Comments

@sagarsumant
Contributor

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

  • Client: 1.x.y
  • Tracking server: 1.x.y

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python version:
  • yarn version, if running the dev UI:

Describe the problem

When I run the code below in an AzureML context, I expect it to call the async APIs on the backend, but it does not; it still calls the regular synchronous LogBatch API.
As a result, I see some 429 responses being thrown. I want this flow to work and call the async APIs on the backend.

Tracking information

import mlflow
import numpy as np
import pandas as pd
import xgboost

# Enable async logging globally before any logging calls are made.
mlflow.config.enable_async_logging(enable=True)

# Synthetic regression data: predict c from a and b.
data = pd.DataFrame({'a': np.arange(10000), 'b': np.arange(10000) * 10, 'c': np.arange(10000) * 100})
X = data[['a', 'b']]
y = data['c']

mlflow.set_experiment(experiment_name="mlflow-async-logging")

with mlflow.start_run():
    mlflow.xgboost.autolog()
    reg = xgboost.XGBRegressor(n_estimators=700)
    # 700 boosting rounds with an eval set give autologging many metrics to log.
    reg.fit(X, y, eval_set=[(X, y)])

Code to reproduce issue

N/A

Stack trace

N/A

Other info / logs

REPLACE_ME

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@sagarsumant sagarsumant added the bug Something isn't working label Mar 25, 2024
@github-actions github-actions bot added area/tracking Tracking service, tracking client APIs, autologging integrations/azure Azure and Azure ML integrations labels Mar 25, 2024
@WeichenXu123
Collaborator

Q:

The xgboost autologging integration uses async logging wherever it is feasible:

param_logging_operations = autologging_client.flush(synchronous=False)

However, log_artifact is synchronous, which might be the cause.
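To make the distinction concrete, here is a minimal stdlib-only sketch of a client whose flush call can be queued while its artifact call cannot. It is not MLflow's implementation: ToyLoggingClient, PendingOperation, and the send callback are hypothetical names invented for illustration, and only the flush(synchronous=False) call shape and the synchronous log_artifact behavior come from the comment above.

```python
import queue
import threading
from typing import Callable, List


class PendingOperation:
    """Handle for a submitted batch; wait() blocks until the batch is sent."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def mark_done(self) -> None:
        self._done.set()

    def wait(self, timeout: float = 5.0) -> bool:
        # Returns True if the batch was sent within the timeout.
        return self._done.wait(timeout)


class ToyLoggingClient:
    """Hypothetical client: batches are sent inline or by a worker thread."""

    def __init__(self, send: Callable[[List[str]], None]) -> None:
        self._send = send  # stands in for the HTTP call to the tracking server
        self._queue: queue.Queue = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _drain(self) -> None:
        # Background loop: send queued batches one at a time.
        while True:
            batch, op = self._queue.get()
            try:
                self._send(batch)
            finally:
                op.mark_done()

    def flush(self, batch: List[str], synchronous: bool = True) -> PendingOperation:
        op = PendingOperation()
        if synchronous:
            self._send(batch)  # the caller blocks on the request
            op.mark_done()
        else:
            self._queue.put((batch, op))  # the caller returns immediately
        return op

    def log_artifact(self, path: str) -> None:
        # No queued path here, mirroring the comment that log_artifact is synchronous.
        self._send([f"artifact:{path}"])
```

In this sketch, a training loop that calls flush(..., synchronous=False) never waits on the server, but every log_artifact call still blocks, so requests issued that way remain on the synchronous path regardless of the async flag.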

@WeichenXu123
Collaborator

@sagarsumant

We are considering making MLflow's log_artifact support async logging. Once that is supported, you can start working on this ticket :)

@sagarsumant
Contributor Author

> @sagarsumant
>
> We are considering making MLflow's log_artifact support async logging. Once that is supported, you can start working on this ticket :)

Hi @WeichenXu123, thank you for your reply.

The typical complaint I am seeing comes from customers who do not call MLflow directly and instead set the environment-level flag to turn on the async flow; it is not working for them.

Let me know what could be missing here, so that we can make it work for all such wrappers across different providers.

Example, for PyTorch Lightning:

class MLFlowLogger(OSSMLFlowLogger):
    def __init__(self):
        try:  # pragma: no cover
            run = Run.get_context()
            mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()
            experiment_name = run.experiment.name

            super().__init__(
                experiment_name=experiment_name,
                tracking_uri=mlflow_url,
            )

            # The customer tried this approach, which did not work.
            self.experiment.enable_async_logging(enable=True)

            # Nor did the approach below enable the async logging flow.
            import mlflow
            mlflow.enable_async_logging(enable=True)

            self._run_id = run.id
        except AttributeError:
            mlflow_url = None
            experiment_name = "lightning_logs"
            info("MLFlowLogger: Offline run, No MLFlow tracking uri found")
            super().__init__(
                experiment_name=experiment_name,
                tracking_uri=mlflow_url,
            )

    @rank_zero_only
    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        info(f"MLFlowLogger log metrics on step: {step}, metrics: {metrics}.")
        # ... Do something

        # This is not working as expected.
        self.experiment.log_batch(self.run_id, metrics=metrics_to_log)

@WeichenXu123
Collaborator

What type is self.experiment?

Is it an MlflowClient?

If so, then self.experiment.log_batch(self.run_id, metrics=metrics_to_log) causes the issue: log_batch hardcodes the default value of synchronous to True instead of checking the MLFLOW_ENABLE_ASYNC_LOGGING environment variable.

We can fix it.
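One way such a fix could look, sketched with the standard library only: treat synchronous=None as "not specified" and fall back to the environment flag. This is an illustration of the idea, not MLflow's actual patch. resolve_synchronous, async_logging_enabled, and the log_batch wrapper are hypothetical helpers, and the accepted flag values ("true" or "1") are an assumption; only the variable name MLFLOW_ENABLE_ASYNC_LOGGING and the synchronous parameter come from the comment above.

```python
import os
from typing import Any, Optional, Sequence

ASYNC_ENV_VAR = "MLFLOW_ENABLE_ASYNC_LOGGING"  # name taken from the comment above


def async_logging_enabled() -> bool:
    """Assumed parsing: 'true' or '1' (case-insensitive) means enabled."""
    value = os.environ.get(ASYNC_ENV_VAR, "false")
    return value.strip().lower() in ("true", "1")


def resolve_synchronous(synchronous: Optional[bool] = None) -> bool:
    """An explicit argument wins; None falls back to the environment flag."""
    if synchronous is not None:
        return synchronous
    return not async_logging_enabled()


def log_batch(
    client: Any,
    run_id: str,
    metrics: Sequence[Any] = (),
    synchronous: Optional[bool] = None,
) -> Any:
    """Hypothetical wrapper: forwards to client.log_batch with the resolved mode."""
    return client.log_batch(
        run_id,
        metrics=metrics,
        synchronous=resolve_synchronous(synchronous),
    )
```

With a default of None instead of True, a wrapper such as the Lightning logger above could keep calling log_batch without arguments and still pick up the environment-level setting, while callers that pass synchronous explicitly keep their current behavior.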


@BenWilson2 Please reply to comments.
