Fix model offloading and training tests + prevent examples timeout #14091

Open
GiGiKoneti wants to merge 2 commits into huggingface:main from GiGiKoneti:fix/vae-and-examples-ci-failures

@GiGiKoneti

Contributor

What does this PR do?

This PR fixes three pre-existing bugs in model offloading, training tests, and example test runs that were causing CI failures:

  1. AutoencoderVidTok return format mismatch:

    • Modified AutoencoderVidTok.forward to return (dec,) when return_dict=False, aligning it with the standard VAE return contract in Diffusers.
    • Unskipped test_outputs_equivalence since the outputs match correctly now.
  2. AutoencoderDC mixed precision skip:

    • Added a try-except block that catches RuntimeError: "GET was unable to find an engine to execute this computation" and calls pytest.skip when cuDNN cannot find a matching engine configuration for the computation.
  3. Examples timeout and distributed launch mitigation:

    • Refactored run_command in examples/test_examples_utils.py to use subprocess.run with a configurable timeout parameter (default 300s) to prevent tests from hanging indefinitely.
    • Appended --num_processes 1 --num_machines 1 to ExamplesTestsAccelerate launch arguments to prevent distributed launch deadlocks on single-device/CPU CI runners.
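
For reviewers, here is a minimal sketch of two of the patterns described above: the tuple return for `return_dict=False` and a `run_command` helper with a timeout. It is an illustration under stated assumptions, not the code in this PR. `SketchOutput` and `forward_sketch` are hypothetical stand-ins for the model's output class and `forward` method, and the `run_command` signature and error messages are assumed. Only the `timeout` parameter and its 300s default come from the description.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple, Union


@dataclass
class SketchOutput:
    # Hypothetical stand-in for the output class returned when return_dict=True.
    sample: Any


def forward_sketch(dec: Any, return_dict: bool = True) -> Union[SketchOutput, Tuple[Any]]:
    """Return a one-element tuple when return_dict=False, else an output object."""
    if not return_dict:
        return (dec,)
    return SketchOutput(sample=dec)


def run_command(
    command: List[str], return_stdout: bool = False, timeout: float = 300
) -> Optional[str]:
    """Run `command` and raise instead of hanging if it exceeds `timeout` seconds.

    Assumed signature for illustration; the real helper may differ.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
            timeout=timeout,  # subprocess.run kills the child on expiry
            check=True,  # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired as e:
        raise RuntimeError(
            f"Command `{' '.join(command)}` timed out after {timeout}s"
        ) from e
    except subprocess.CalledProcessError as e:
        raise RuntimeError(
            f"Command `{' '.join(command)}` failed with output:\n{e.output}"
        ) from e
    return result.stdout if return_stdout else None


if __name__ == "__main__":
    # A quick command that finishes well inside the timeout.
    print(run_command([sys.executable, "-c", "print('ok')"], return_stdout=True))
```

Converting `TimeoutExpired` into a regular failure means a stuck example script shows up as a failed test with a clear message rather than a job that stalls until the CI runner's global limit.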

Fixes #14090

Before submitting

Who can review?

@sayakpaul @DN6 @pcuenca

Comment thread examples/test_examples_utils.py Outdated
Comment thread tests/models/autoencoders/test_models_autoencoder_dc.py Outdated
Comment thread tests/models/autoencoders/test_models_autoencoder_vidtok.py
@GiGiKoneti GiGiKoneti force-pushed the fix/vae-and-examples-ci-failures branch from f1b7e44 to 5c98abf Compare June 30, 2026 08:53
@github-actions github-actions Bot removed the examples label Jun 30, 2026
Comment thread tests/models/testing_utils/training.py Outdated
@GiGiKoneti GiGiKoneti force-pushed the fix/vae-and-examples-ci-failures branch from 71f7b15 to f16aa73 Compare July 1, 2026 11:41
Development

Successfully merging this pull request may close these issues.

Fix model offloading and training tests + prevent examples timeout

2 participants