[BUG]: Setting up async logging flag is not taking effect. #11518

sagarsumant · 2024-03-25T18:16:14Z

Issues Policy acknowledgement

I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

Client: 1.x.y
Tracking server: 1.x.y

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Python version:
yarn version, if running the dev UI:

Describe the problem

When I run the code below in an AzureML context, I expect it to call the async APIs on the backend, but it does not; it still calls the regular synchronous LogBatch API.
As a result, I see some 429s being thrown. I want this flow to work and call the async APIs on the backend.

Tracking information

import mlflow
import numpy as np
import pandas as pd
import xgboost

# Turn on async logging before any run is started
mlflow.config.enable_async_logging(enable=True)

data = pd.DataFrame({'a': np.arange(10000), 'b': np.arange(10000) * 10, 'c': np.arange(10000) * 100})
X = data[['a', 'b']]
y = data['c']

mlflow.set_experiment(experiment_name="mlflow-async-logging")

with mlflow.start_run():
    mlflow.xgboost.autolog()
    reg = xgboost.XGBRegressor(n_estimators=700)
    reg.fit(X, y, eval_set=[(X, y)])

Code to reproduce issue

N/A

Stack trace

N/A

Other info / logs

REPLACE_ME

What component(s) does this bug affect?

What interface(s) does this bug affect?

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations


WeichenXu123 · 2024-03-26T10:35:32Z

The xgboost autologging integration uses async logging wherever it is feasible:

mlflow/mlflow/xgboost/__init__.py

Line 617 in 83b6c5f

param_logging_operations = autologging_client.flush(synchronous=False)

But log_artifact is synchronous; this might be the cause.
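To make the distinction between the two modes concrete, here is a minimal sketch of what "async" versus "sync" logging means in general. It is not MLflow's implementation: the `AsyncBatchLogger` class and the `send_batch` callback are hypothetical names used only for illustration. A synchronous call blocks until the backend request finishes, while an asynchronous call only enqueues the batch and lets a background thread send it.

```python
import queue
import threading
from typing import Callable, Dict


class AsyncBatchLogger:
    """Hypothetical illustration of sync vs. async logging (not MLflow code)."""

    def __init__(self, send_batch: Callable[[Dict[str, float]], None]):
        # send_batch stands in for the blocking request to the tracking backend.
        self._send_batch = send_batch
        self._queue: "queue.Queue[Dict[str, float]]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log_batch(self, metrics: Dict[str, float], synchronous: bool = True) -> None:
        if synchronous:
            # Blocks the caller until the backend call returns.
            self._send_batch(metrics)
        else:
            # Returns immediately; the worker thread sends the batch later.
            self._queue.put(metrics)

    def _drain(self) -> None:
        while True:
            batch = self._queue.get()
            try:
                self._send_batch(batch)
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        # Wait until every queued batch has been handed to send_batch.
        self._queue.join()


sent = []
logger = AsyncBatchLogger(sent.append)
logger.log_batch({"rmse": 1.0}, synchronous=False)
logger.flush()
```

In this sketch, anything that still goes through the `synchronous=True` path blocks the caller regardless of how the other calls are made, which matches the concern raised here about log_artifact.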

WeichenXu123 · 2024-03-27T05:48:53Z

@sagarsumant

We are considering making mlflow log_artifact support async logging. Once that is supported, you can start working on this ticket :)

sagarsumant · 2024-04-06T00:30:14Z

Hi @WeichenXu123, thank you for your reply.

The typical complaint I am seeing comes from customers who do not call mlflow directly and instead set the environment-level flag to turn on the async flow; it is not working for them.

Let me know what could be missing here, so that it works for all such wrappers for different providers.

Example, for PyTorch Lightning:

class MLFlowLogger(OSSMLFlowLogger):
    def __init__(self):
        try:  # pragma: no cover
            run = Run.get_context()
            mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()
            experiment_name = run.experiment.name

            super().__init__(
                experiment_name=experiment_name,
                tracking_uri=mlflow_url,
            )
            
            # Customer tried this approach, which did not work
            self.experiment.enable_async_logging(enable=True)             

              # Nor - below approach work for enabling async logging flow
             import mlflow
             mlflow.enable_async_logging(enable=True)
       
            self._run_id = run.id
        except AttributeError:
            mlflow_url = None
            experiment_name = "lightning_logs"
            info("MLFlowLogger: Offline run, No MLFlow tracking uri found")
            super().__init__(
                experiment_name=experiment_name,
                tracking_uri=mlflow_url,
            )

    @rank_zero_only
    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        info(f"MLFlowLogger log metrics on step: {step}, metrics: {metrics}.")
        # ... Do something

        # This is not working as expected.
        self.experiment.log_batch(self.run_id, metrics=metrics_to_log)

WeichenXu123 · 2024-04-08T06:15:34Z

What is the type of self.experiment?

Is it an MlflowClient?

If so, then self.experiment.log_batch(self.run_id, metrics=metrics_to_log) causes the issue: it hardcodes the default value of synchronous to True instead of checking the MLFLOW_ENABLE_ASYNC_LOGGING environment variable.

We can fix it.
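A minimal sketch of the kind of default resolution described above, under stated assumptions: `resolve_synchronous` is a hypothetical helper, not MLflow's actual code, and the set of values treated as "enabled" (`true`, `1`) is an assumption made for illustration. The idea is that an explicit `synchronous` argument wins, and only when the caller passes nothing does the environment variable decide.

```python
import os
from typing import Optional


def resolve_synchronous(synchronous: Optional[bool] = None) -> bool:
    """Hypothetical helper: decide whether a logging call should block.

    An explicit argument always wins. Otherwise the call is synchronous
    unless MLFLOW_ENABLE_ASYNC_LOGGING is set to an enabling value
    (assumed here to be "true" or "1", case-insensitive).
    """
    if synchronous is not None:
        return synchronous
    flag = os.environ.get("MLFLOW_ENABLE_ASYNC_LOGGING", "false").strip().lower()
    return flag not in ("true", "1")


os.environ["MLFLOW_ENABLE_ASYNC_LOGGING"] = "true"
default_with_flag = resolve_synchronous()        # env var decides
explicit_override = resolve_synchronous(True)    # explicit argument wins
```

Until the default is changed, a wrapper that calls log_batch on the client directly could presumably pass synchronous=False explicitly rather than relying on the environment variable.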

github-actions · 2024-04-23T00:13:17Z

@BenWilson2 Please reply to comments.

sagarsumant added the bug Something isn't working label Mar 25, 2024

github-actions bot added area/tracking Tracking service, tracking client APIs, autologging integrations/azure Azure and Azure ML integrations labels Mar 25, 2024

WeichenXu123 assigned sagarsumant Mar 26, 2024

WeichenXu123 assigned BenWilson2 Mar 27, 2024

santiagxf mentioned this issue Mar 27, 2024

[BUG] Too many 429 error responses from Azure mlflow API with XGBoost autolog and n_estimators > ~700 #11462

Open

23 tasks
