
Add timeout for signature/requirement inference during Transformer model logging. #11037

Merged
merged 15 commits into from Feb 8, 2024

Conversation

@B-Step62 (Collaborator) commented Feb 6, 2024

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11037/merge

Checkout with GitHub CLI

gh pr checkout 11037

What changes are proposed in this pull request?

Problem

Transformers model saving involves a few model predictions if an input example is provided.

  1. Prediction to generate model output for inferring model signature.
  2. Prediction to track imported module to infer pip requirements.

However, this can take significantly long for huge models such as LLMs, because we don't support saving models with optimized device mapping, i.e. models distributed across multiple devices. This is a limitation of the Transformers save_pretrained method we use during logging (context). As a result, users can only save models on a single CPU/GPU, where these predictions can take hours or more for huge models.

What didn't work

Initially, I tried to solve this problem by allowing models to be saved with device_mapping, moving the model to a single CPU/GPU just before calling save_pretrained. If this were possible, we could run the predictions for signature/requirement inference with the optimized device setting, which would keep the latency acceptably short.

However, it turns out that moving a model to a different device is challenging when it was originally distributed across multiple devices. For example, running model.to(torch.device("cpu")) raises RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.

We could implement device handling using the native torch/tf libraries, but I found that there is an upstream PR extending save_pretrained() to support models loaded with device mapping. Hence, I think we can wait for that PR to be merged.

What I did eventually

To mitigate the "stuck" issue, I added a timeout to those predictions:

  1. Add a 60-second timeout to signature inference, with a fallback to the default signature for each pipeline type.
  2. Add a 60-second timeout to requirements inference, with a fallback to the default pip requirements.

This PR also includes a small refactoring: extracting the signature-related logic into a separate file.
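Conceptually, the change is "try prediction-based inference under a timeout, otherwise use a static default". A minimal, self-contained sketch of that pattern is shown below; it uses a plain TimeoutError and placeholder arguments rather than the exact helpers added in this PR.

import signal
from contextlib import contextmanager


@contextmanager
def run_with_timeout(seconds):
    """Raise TimeoutError if the wrapped block runs longer than `seconds` (Unix only, not thread-safe)."""
    def handler(signum, frame):
        raise TimeoutError(f"Operation timed out after {seconds} seconds")

    signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # always cancel the pending alarm


def infer_signature_or_default(predict_fn, input_example, default_signature, timeout=60):
    """Prediction-based inference with a static fallback when the model hangs."""
    try:
        with run_with_timeout(timeout):
            return predict_fn(input_example)  # stands in for real signature/requirements inference
    except TimeoutError:
        # Fall back to the per-pipeline-type default instead of blocking model logging.
        return default_signature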

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Tested serving via Docker container (with accelerate)

Screenshot 2024-02-08 at 11 44 25

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Update the Transformers log_model() API to enforce a timeout for signature and requirements inference, with a static fallback, to avoid hanging when saving large models.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

github-actions bot commented Feb 6, 2024

Documentation preview for c92e64b will be available when this CircleCI job completes successfully.


@github-actions github-actions bot added area/models MLmodel format, model serialization/deserialization, flavors rn/bug-fix Mention under Bug Fixes in Changelogs. labels Feb 6, 2024
@daniellok-db (Collaborator):

overall it looks good to me, i think this is a great idea especially if the default signatures are good enough for typical use-cases!

i agree that we can probably wait for someone to raise a feature request if we want the timeout to be configurable, but i do think 60 seconds is a bit short based on my own usage of transformers (though i'm not sure if i've been doing things in an optimized way). maybe we can increase the timeout?

it looks like there are some test failures but happy to accept after those are resolved and if nobody else has any concerns!

@B-Step62 (Collaborator, Author) commented Feb 7, 2024

hmmm the failure of test_transformers_tf_model_log_without_conda_env_uses_default_env_with_expected_dependencies is so weird, I can't reproduce it with the same package versions.

tests/transformers/test_transformers_model_export.py::test_transformers_tf_model_log_without_conda_env_uses_default_env_with_expected_dependencies PASSED | MEM 1.6/61.8 GB | DISK 205.3/484.6 GB [100%]

#: Specifies the timeout for model inference with input example(s) when logging/saving a model.
#: MLflow runs a few inference requests against the model to infer model signature and pip
#: requirements. Sometimes the prediction hangs for a long time, especially for a large model.
#: This timeout avoid the hanging and fall back to the default signature and pip requirements.
Member (review comment):

Suggested change
#: This timeout avoid the hanging and fall back to the default signature and pip requirements.
#: This timeout limits the allowable time for performing a prediction for signature inference and will abort the prediction, falling back to the default signature and pip requirements.

Member:

How can we disable timeout? Setting this environment variable to 0?

@B-Step62 (Collaborator, Author):

we can set an empty string or an extremely large number like 99999
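For example (the variable name below is a hypothetical placeholder for illustration; check mlflow/environment_variables.py for the actual name introduced by this PR):

import os

# Hypothetical name used only for illustration -- see mlflow/environment_variables.py
# for the real environment variable added by this PR.
os.environ["MLFLOW_INPUT_EXAMPLE_INFERENCE_TIMEOUT"] = ""       # empty string: no timeout
# or
os.environ["MLFLOW_INPUT_EXAMPLE_INFERENCE_TIMEOUT"] = "99999"  # effectively unlimited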

Comment on lines 11 to 42
@contextmanager
def run_with_timeout(seconds):
    """
    Context manager to run a block of code with a timeout. If the block of code takes longer
    than `seconds` to execute, a `TimeoutError` is raised.

    NB: This function uses Unix signals to implement the timeout, so it is not thread-safe.
    Also it does not work on non-Unix platforms such as Windows.

    E.g.
    ```
    with run_with_timeout(5):
        model.predict(data)
    ```
    """
    assert _IS_UNIX, "Timeouts are not implemented yet for non-Unix platforms"

    def signal_handler(signum, frame):
        raise MLflowTimeoutError(f"Operation timed out after {seconds} seconds")

    signal.signal(signal.SIGALRM, signal_handler)
    signal.alarm(seconds)

    try:
        yield
    finally:
        signal.alarm(0)  # Disable the alarm after the operation completes or times out
@harupy (Member):

/ai are there any risks when we forcibly terminate operations like this function does?

AI bot reply:

@harupy

Yes, there are some risks when we forcibly terminate operations using a function like this.

  1. Data Consistency: If the block of code being executed involves altering some shared resource or global state, forcing a termination might leave that in an inconsistent state.

  2. Resource Leaks: Sometimes, a block of code may need to clean up resources (like open files, network connections, etc.) at the end of its execution. If it's forcibly terminated, these cleanup steps might be skipped, leading to resource leakages.

  3. Unpredictability: Since the block being forcefully terminated can contain any code, there's a risk that the handling of the TimeoutError may not be as intended. The programmer might not have considered the case where an operation could be terminated midway.

  4. Thread Safety: As the function documentation states, this implementation relies on Unix signals, which are not thread-safe. If the context manager wraps code executing in multiple threads, the signal could be delivered to a thread other than the one that set the timer, leading to undefined behavior.

These and possibly more problems make the forcible termination of any operation a decision that should be considered carefully.


@B-Step62 (Collaborator, Author):

/ai Does this disable the signal.alarm when the user interrupts while waiting?

AI bot reply:

@B-Step62

The provided code snippet doesn't provide a complete context.

import signal

It's just importing the signal module in Python, which provides mechanisms to handle different types of signals, including the SIGALRM signal for setting an alarm, and the SIGINT signal, which is typically generated by the user interrupting a process.

If you want to disable the signal.alarm() when the user interrupts while waiting, you should catch the SIGINT signal and disable the alarm in the signal handling function.

However, since no additional code/context is given, I cannot provide a specific answer to your question. Please provide more context or the complete piece of code that you're working with.


@B-Step62 (Collaborator, Author):

/ai Does this code disable the signal.alarm when the user interrupts the operation running inside the context manager?

@contextmanager
def run_with_timeout(seconds):
    def signal_handler(signum, frame):
        raise MLflowTimeoutError(f"Operation timed out after {seconds} seconds")

    signal.signal(signal.SIGALRM, signal_handler)
    signal.alarm(seconds)

    try:
        yield
    finally:
        signal.alarm(0)  # Disable the alarm after the operation completes or times out

AI bot reply:

@B-Step62

Yes, this code snippet does disable the alarm if the user interrupts the operation inside the context manager.

When the signal.alarm(0) function is executed in the finally block, it disables the alarm. The finally block is executed when the code within the try block either completes execution normally, or throws an exception due to user interruption or any other reason. So, even if a user interrupts the operation, the alarm would be disabled.

So in summary, regardless of how the operation within the context manager ends (even with an interruption), the alarm will be disabled.
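For example, a quick self-contained check of this behavior (a toy reproduction using a plain TimeoutError, not code from this PR) could look like:

import signal
import time


def _on_alarm(signum, frame):
    raise TimeoutError("timed out")


signal.signal(signal.SIGALRM, _on_alarm)
signal.alarm(10)          # arm a 10-second alarm, like run_with_timeout(10)
try:
    time.sleep(30)        # press Ctrl+C here to simulate a user interrupt
except (KeyboardInterrupt, TimeoutError):
    pass
finally:
    signal.alarm(0)       # the finally clause cancels the alarm even on interrupt

print(signal.alarm(0))    # prints 0: no alarm remains pending afterwards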


@B-Step62 (Collaborator, Author) commented Feb 7, 2024

Oookie, it appears that I finally resolved the failing test test_transformers_tf_model_log_without_conda_env_uses_default_env_with_expected_dependencies... Basically there are two layers of bugs: one caused the test failure in this PR, and the other has been hiding that bug, making the test pass until now even though it shouldn't.

Prerequisite Knowledge: How we capture dependencies for Transformer models

Requirement inference is done by monitoring import events during model loading (or model prediction if an input_example is provided). For Transformers specifically, we repeat this up to three times to determine whether the model uses TensorFlow, PyTorch, or both (context). This is done with a trial-and-error approach, roughly as follows (a minimal sketch appears after the list):

  1. Try loading the model with USE_TORCH = True set and validate that there is no tensorflow import => if this passes, the model depends only on PyTorch.
  2. Then try loading with USE_TF = True set and validate that there is no torch import => if this passes, the model depends only on TensorFlow.
  3. Finally, if both fail, record both TensorFlow and PyTorch as required.
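A minimal sketch of this trial-and-error flow (illustrative only; load_model_and_capture_imports is a hypothetical stand-in for MLflow's import-capturing subprocess):

def detect_frameworks(load_model_and_capture_imports):
    """Return the set of deep-learning frameworks a Transformers model needs.

    `load_model_and_capture_imports(env)` is assumed to load the model in a fresh
    subprocess with the given environment variables set and to return the set of
    top-level modules imported while doing so.
    """
    # 1. Force PyTorch; if TensorFlow never gets imported, torch alone is enough.
    imports = load_model_and_capture_imports({"USE_TORCH": "TRUE"})
    if "tensorflow" not in imports:
        return {"torch"}

    # 2. Force TensorFlow; if torch never gets imported, tensorflow alone is enough.
    imports = load_model_and_capture_imports({"USE_TF": "TRUE"})
    if "torch" not in imports:
        return {"tensorflow"}

    # 3. Otherwise, record both frameworks as requirements.
    return {"torch", "tensorflow"}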

This seems to work, but the import capturing is not as straightforward as it looks, which caused a few bugs.

Bug 1. The environment variables USE_TF / USE_TORCH are set after Transformers initializes the _torch_available flag.

Setting these environment variables is very important: they not only instruct Transformers to load the model in the specified framework, but also prevent it from importing the other library. For example, Transformers manages a boolean state _torch_available that acts as a switch for much of the logic requiring PyTorch. This flag is set to True when PyTorch is installed and USE_TF is not set to "True". Since our test environment installs both TensorFlow and PyTorch, the USE_TF env variable is necessary to override this flag to False.

However, the issue is that the flag is set only once, when the Transformers library is imported for the first time. Hence, it won't be flipped if we set the env var after that first import. At present we set the env var when starting the import capturing (code), but Transformers is actually imported earlier than that, so the flag is not set correctly.

Solution
To resolve the issue, the latest revision of this PR modifies the logic to set the environment variable when starting the subprocess for model loading (see the sketch below).
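A sketch of the idea, assuming for illustration that the capture script is invoked roughly like this (the real arguments of mlflow/utils/_capture_transformers_modules.py differ):

import os
import subprocess
import sys


def capture_imports_in_subprocess(model_path, framework_env):
    """Load the model in a fresh interpreter with USE_TF / USE_TORCH already set,
    so that transformers initializes _torch_available / _tf_available consistently
    with the requested framework (setting the env var after import is too late)."""
    env = os.environ.copy()
    env.update(framework_env)  # e.g. {"USE_TF": "TRUE"} or {"USE_TORCH": "TRUE"}
    # Hypothetical invocation for illustration; the real script takes different arguments.
    return subprocess.run(
        [sys.executable, "-m", "mlflow.utils._capture_transformers_modules", model_path],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )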

However, the question then is why this test had not failed before this PR, which relates to the next bug.

Bug 2. accelerate is captured as a dependency of the TensorFlow model and hides PyTorch in the logged requirements.

After MLflow captures all imported packages, it doesn't use the list as-is. Instead, it trims the list by removing packages that are installed by other packages anyway. For example, if the captured packages are ["scikit-learn", "numpy"], this pruning removes numpy because it is installed as part of scikit-learn anyway (a minimal sketch of this pruning is shown below).

What happened in this test before this PR is that accelerate was captured as a model dependency as well, and since it has torch as a core dependency, torch was pruned and not listed in the final requirements.txt. As a result, the assertion assert "torch" not in ... did not fail. However, the model requirements actually include PyTorch indirectly via accelerate.
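A minimal sketch of this pruning step (illustrative only; get_installed_dependencies is a hypothetical stand-in for the real package-metadata lookup in mlflow/utils/requirements_utils.py):

def prune_transitive_requirements(captured, get_installed_dependencies):
    """Drop packages that some other captured package already pulls in.

    `get_installed_dependencies(pkg)` is assumed to return the set of packages that
    `pkg` declares as install requirements (e.g. via importlib.metadata).
    """
    pruned = set(captured)
    for pkg in captured:
        # Anything required by another captured package is redundant in requirements.txt.
        pruned -= get_installed_dependencies(pkg) - {pkg}
    return sorted(pruned)


# Example: torch disappears because accelerate already depends on it.
deps = {"accelerate": {"torch", "numpy"}, "tensorflow": {"numpy"}, "torch": set(), "numpy": set()}
print(prune_transitive_requirements({"accelerate", "tensorflow", "torch"}, deps.get))
# -> ['accelerate', 'tensorflow']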

Solution
In the latest revision of this PR, accelerate is no longer captured as a dependency for the TensorFlow model. I couldn't pinpoint what triggered this change, but it is most likely similar internal state handling in Transformers. While this leaves a bit of ambiguity, the new behavior is correct because accelerate only supports PyTorch models, i.e. it should not be logged for a TensorFlow model. A PyTorch model still logs accelerate as a dependency when it is installed (validated by test_transformers_pt_model_save_without_conda_env_uses_default_env_with_expected_dependencies).

@BenWilson2 (Member) left a comment:

Great investigation and fixes! LGTM!
As a final round of checks, let's validate model serving of a small toy TF version of a pipeline and a Torch with accelerate version, just to ensure that the modifications to inferred requirements work seamlessly with inference container build logic (it should work just fine, let's just make sure)

"Attempted to generate a signature for the saved model or pipeline "
f"but encountered an error: {e}"
)
raise
Collaborator (review comment):

Should we raise the exception or just return None?

@B-Step62 (Collaborator, Author):

I think we should raise, cuz this case is highly likely a critical issue with model prediction that would cause the same failure after the model is loaded/served?

Member:

Good point. Can we attempt to use the fallback in the case of any failure that occurs and only raise if a signature cannot be generated at all?

  • remove the raise
  • modify the warning in line 125 to raise an MLflowException

The reason is that if a signature is not generated for these models on Databricks, they won't be eligible for registration in UC and won't be able to be submitted to model serving.

@B-Step62 (Collaborator, Author) commented Feb 8, 2024:

I see, but should we tolerate all errors during signature inference? For example, we raise MlflowException when the given model is not a Pipeline instance (L145). Also, whatever happens in this prediction will happen in production after serving, I guess, and solving an issue in model serving is kinda hard:

_TransformersWrapper(
    pipeline=pipeline, model_config=model_config, flavor_config=flavor_config
).predict(data, params=params)

What about at least letting those exceptions propagate so we fail fast, while falling back for any errors from our own code, i.e. the signature inference logic?

Collaborator:

I think if there are any errors during prediction result generation, the input example might be wrong (or the model has some problem), while the signature doesn't necessarily require an output schema.

@B-Step62 (Collaborator, Author):

the input example might be wrong (or the model has some problem)

Yeah, this is what I'm worried about; it's better to tell users "hey, something is wrong with your model or example". But I agree that the signature itself doesn't necessarily need the output, so such validation is probably beyond the responsibility of this function. Will update to fall back without throwing (which, I realize, is the same as what we do for requirements inference as well).

@B-Step62 (Collaborator, Author) commented Feb 8, 2024:

Reverted the change for the case where no default fallback signature is found. If we raise an exception in that case, it prevents customers from saving custom pipeline classes (and also caused failures in test cases like test_invalid_task_inference_raises_error). While this might not be ideal for the UC experience, I'm keeping the original behavior, i.e. just warn and return no signature, within the scope of this PR. I can do a follow-up if necessary.
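Put together, the behavior settled on in this thread looks roughly like the sketch below (illustrative helper names, not the actual MLflow code; the warning text mirrors the snippet quoted above):

import logging

_logger = logging.getLogger(__name__)


def infer_or_fallback_signature(pipeline, input_example, infer_fn, default_signatures):
    """Fall back to a static default on any inference error; if no default exists
    for the pipeline type, warn and return no signature instead of raising."""
    if input_example is not None:
        try:
            return infer_fn(pipeline, input_example)  # prediction-based inference (may time out)
        except Exception as e:
            _logger.warning(
                "Attempted to generate a signature for the saved model or pipeline "
                f"but encountered an error: {e}"
            )
    default = default_signatures.get(type(pipeline).__name__)
    if default is None:
        _logger.warning("A default signature could not be determined for this pipeline type.")
        return None
    return default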

@harupy (Member) left a comment:

LGTM!

@serena-ruan (Collaborator) left a comment:

LGTM!

@B-Step62 B-Step62 merged commit 8a72306 into mlflow:master Feb 8, 2024
61 checks passed
@B-Step62 B-Step62 deleted the fix-transformer-save-stuck branch February 8, 2024 10:58
sateeshmannar pushed a commit to StateFarmIns/mlflow that referenced this pull request Feb 20, 2024
…del logging. (mlflow#11037)

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Labels
area/models MLmodel format, model serialization/deserialization, flavors rn/bug-fix Mention under Bug Fixes in Changelogs.
5 participants