fix(offline): guard inference paths with HF_HUB_OFFLINE (#503)
PR #443 wrapped the model *load* path with `force_offline_if_cached` so cached models don't phone home at startup. The context manager restores `HF_HUB_OFFLINE` on exit, which left inference paths (generate, transcribe, voice-prompt creation) unguarded — and `qwen_tts`, `mlx_audio`, and `transformers` perform lazy tokenizer/processor/config lookups during inference. With internet on, those lookups are near-instant and invisible; with internet off, `requests` hangs on DNS or connect until the network returns. This is exactly what users in #462 describe: model shows "Loaded", internet drops, generation "thinks" forever, internet comes back, generation completes. Chatterbox and LuxTTS don't exhibit this because their engine libs resolve everything through already-cached paths at load time.

Fix: wrap each inference-sync body with `force_offline_if_cached(True, ...)`. Since inference only runs after a successful load, weights are known to be on disk, so `is_cached=True` is unconditional. Also adds the load-time guard that was missing from `qwen_custom_voice_backend.py` — CustomVoice previously had no offline protection at all.

Paths patched:

- PyTorchTTSBackend.create_voice_prompt (create_voice_clone_prompt)
- PyTorchTTSBackend.generate (generate_voice_clone)
- PyTorchSTTBackend.transcribe (Whisper generate + decoder-prompt-ids)
- MLXTTSBackend.generate (mlx_audio generate, all branches)
- MLXSTTBackend.transcribe (mlx_audio whisper generate)
- QwenCustomVoiceBackend._load_model_sync + generate

Does not address the secondary `check_model_inputs() missing 'func'` error reported in the same issue — that's a `transformers` 5.x version-skew bug on the install path, separate concern.

Fixes #462.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
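The guard pattern the commit describes can be sketched as follows. This is an illustrative stand-in, not the repo's actual helper: the name and `(is_cached, model_name)` arguments follow the PR text, but the body is the simplest single-threaded save/restore form (the review thread below refines it for concurrency), and `generate` is a dummy inference body:

```python
import os
from contextlib import contextmanager


@contextmanager
def force_offline_if_cached(is_cached: bool, model_name: str):
    """Set HF_HUB_OFFLINE=1 for the duration of the block when the model
    is known to be cached, then restore the previous value on exit."""
    if not is_cached:
        yield
        return
    saved = os.environ.get("HF_HUB_OFFLINE")
    os.environ["HF_HUB_OFFLINE"] = "1"
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop("HF_HUB_OFFLINE", None)
        else:
            os.environ["HF_HUB_OFFLINE"] = saved


def generate(model_name: str, text: str) -> str:
    # Inference runs only after a successful load, so is_cached=True
    # is unconditional; lazy tokenizer/config lookups inside the engine
    # now fail fast offline instead of hanging on DNS/connect.
    with force_offline_if_cached(True, model_name):
        return f"audio-for:{text}"
```

The key property is that the env var is only forced for the span of the inference call, so a later explicit download (e.g. loading a new model) is unaffected.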
📝 Walkthrough

Wraps synchronous model-load and generation/transcription calls in a refcounted offline guard (`force_offline_if_cached`).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Backend
    participant OfflineGuard as force_offline_if_cached
    participant HFModules as Transformers/HFHub
    participant Model
    Client->>Backend: request generate/transcribe
    Backend->>OfflineGuard: enter(model_name)
    OfflineGuard->>HFModules: set HF_HUB_OFFLINE + module flags (cached)
    Backend->>Model: synchronous generate/transcribe/load (no network)
    Model-->>Backend: result (audio/text)
    Backend->>OfflineGuard: exit(model_name)
    OfflineGuard->>HFModules: restore env + module flags when last ref
    Backend-->>Client: return result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit f3ed312.
```python
            # Model is loaded → weights are on disk. Force offline so
            # lazy tokenizer/config lookups inside mlx_audio don't hang
            # when the user is disconnected (issue #462).
            with force_offline_if_cached(True, model_name):
```
Race condition on os.environ in concurrent inference threads
Medium Severity
force_offline_if_cached saves and restores os.environ["HF_HUB_OFFLINE"] without any synchronization, but the new inference-path usages run concurrently via asyncio.to_thread. Thread A's finally can remove HF_HUB_OFFLINE while Thread B is still mid-inference, stripping Thread B's offline protection and causing the exact hang this PR aims to prevent. The prior load-time usage was safe because loads are effectively serialized per backend instance.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/backends/pytorch_backend.py`:
- Line 350: The variable sr returned from load_audio is unused and flagged by
Ruff; rename it to a dummy name (e.g., _sr or _) in the assignment where
load_audio is called (audio, sr = load_audio(...)) so it becomes audio, _sr =
load_audio(...), leaving the audio variable unchanged; ensure any other
references to sr in the surrounding scope are updated or removed accordingly to
avoid breaking references.
In `@backend/backends/qwen_custom_voice_backend.py`:
- Around line 108-120: The context manager force_offline_if_cached must be made
thread-safe and must update the HF offline constant in addition to os.environ:
implement a module-level threading.Lock and a refcount integer; on __enter__
increment the refcount inside the lock and set os.environ["HF_HUB_OFFLINE"]="1"
(or remove it per cached flag) and also set
huggingface_hub.constants.HF_HUB_OFFLINE accordingly; on __exit__ decrement the
refcount under the same lock and only restore the original os.environ value and
huggingface_hub.constants.HF_HUB_OFFLINE when refcount reaches zero (store the
original value at first enter); update imports to reference
huggingface_hub.constants.HF_HUB_OFFLINE and ensure the lock/refcount protect
all mutations so overlapping asyncio.to_thread calls won't race (adjust
force_offline_if_cached usage in the Qwen3TTSModel.from_pretrained call sites
accordingly).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0ad3b5b0-ebab-4f4c-8b50-84f514136419
📒 Files selected for processing (3)
- backend/backends/mlx_backend.py
- backend/backends/pytorch_backend.py
- backend/backends/qwen_custom_voice_backend.py
Review feedback on the initial fix surfaced two real issues:
1. ``os.environ`` toggles alone don't flip offline mode.
``huggingface_hub.constants.HF_HUB_OFFLINE`` is read once at import
time into a module-level bool; ``transformers.utils.hub._is_offline_mode``
mirrors that bool at its own import time. The hot paths
(``_http._default_backend_factory`` in huggingface_hub,
``is_offline_mode`` in transformers) read the cached bools — not the
env — so mutating only ``os.environ`` was a no-op.
2. Race condition on concurrent inference. Two threads running inside
``force_offline_if_cached`` via ``asyncio.to_thread`` could have
thread A's ``finally`` strip thread B's offline protection mid-run.
Rewrite the helper to:
- mutate ``huggingface_hub.constants.HF_HUB_OFFLINE`` and
``transformers.utils.hub._is_offline_mode`` directly
- refcount concurrent users under a single ``threading.RLock`` so a
shared offline window is restored only when the last caller exits
- still write ``os.environ`` for anything that reads it dynamically
Also addresses the unused-variable ruff flag on the Whisper transcribe
path (``audio, sr`` → ``audio, _sr``).
New unit tests cover the cached-constant mutation, env propagation,
no-op on ``is_cached=False``, nested contexts, and a threaded race
where a slow thread must retain offline mode after a peer exits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
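The rewrite described in this commit can be sketched as a refcounted guard. Everything below is an illustrative model, not the repo's actual `hf_offline_patch.py`: `hf_const` is a `SimpleNamespace` stand-in for the cached `huggingface_hub.constants` / `transformers.utils.hub` module flags, so the sketch runs without either library installed:

```python
import os
import threading
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for the module-level bools that huggingface_hub/transformers
# cache at import time (the hot paths read these, not os.environ).
hf_const = SimpleNamespace(HF_HUB_OFFLINE=False)

_lock = threading.RLock()
_refcount = 0
_saved_env = None
_saved_const = None


@contextmanager
def force_offline_if_cached(is_cached: bool, model_name: str):
    global _refcount, _saved_env, _saved_const
    if not is_cached:
        yield
        return
    with _lock:
        if _refcount == 0:               # first caller opens the window
            _saved_env = os.environ.get("HF_HUB_OFFLINE")
            _saved_const = hf_const.HF_HUB_OFFLINE
            os.environ["HF_HUB_OFFLINE"] = "1"   # for dynamic readers
            hf_const.HF_HUB_OFFLINE = True       # for cached readers
        _refcount += 1
    try:
        yield
    finally:
        with _lock:
            _refcount -= 1
            if _refcount == 0:           # last caller restores everything
                hf_const.HF_HUB_OFFLINE = _saved_const
                if _saved_env is None:
                    os.environ.pop("HF_HUB_OFFLINE", None)
                else:
                    os.environ["HF_HUB_OFFLINE"] = _saved_env
```

Overlapping callers now share one offline window: an early exit only decrements the refcount, and restoration happens exactly once, when the count returns to zero.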
Actionable comments posted: 2
🧹 Nitpick comments (2)
backend/utils/hf_offline_patch.py (1)
29-30: Nit: `None` sentinel overloads two distinct states.

`_saved_hf_const = None` is used both for "import failed, nothing to restore" and as a default before any save. On the restore path, `if _saved_hf_const is not None:` works today only because `HF_HUB_OFFLINE` / `_is_offline_mode` are always booleans in the live modules. If a future version of either library ever initializes them lazily to `None`, this check would silently skip a legitimate restore. A distinct sentinel (e.g., `_MISSING = object()`) would be more robust.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/utils/hf_offline_patch.py` around lines 29-30: the two module-level variables `_saved_hf_const` and `_saved_transformers_const` currently use `None` as a sentinel, which conflates "not-yet-saved" with a potential real `None` value from future lazy initialization. Introduce a unique sentinel (e.g., `_MISSING = object()`), initialize both variables to it, and update all save/restore checks to identity comparisons against `_MISSING` (`is not _MISSING`) instead of `is not None`. Keep existing behavior for `HF_HUB_OFFLINE` and `_is_offline_mode` by restoring their saved values when the saved slot is not `_MISSING`.

backend/tests/test_offline_guard.py (1)
73-105: Thread-race test is timing-dependent.

The assertion that the slow thread still observes offline mode after the fast thread exits relies on `time.sleep(0.15)` being long enough for `fast` to complete its context exit before `slow` observes the constant. On a heavily loaded CI runner (GIL contention, swap, QEMU on ARM), 150ms is tight and can flake. A more deterministic alternative is to replace the sleep with an `Event` that `fast` sets immediately before returning from its `with`, and have `slow` `event.wait()` before observing — guaranteeing `fast` has exited the context.

Proposed fix

```diff
     barrier = threading.Barrier(2)
+    fast_exited = threading.Event()

     def slow():
         try:
             with force_offline_if_cached(True, "slow"):
                 barrier.wait()  # sync with fast
-                time.sleep(0.15)  # fast will exit during this sleep
+                assert fast_exited.wait(timeout=5), "fast never exited"
                 observations.append(_hf_const().HF_HUB_OFFLINE)
         except Exception as exc:  # noqa: BLE001
             errors.append(exc)

     def fast():
         try:
             with force_offline_if_cached(True, "fast"):
                 barrier.wait()
         except Exception as exc:  # noqa: BLE001
             errors.append(exc)
+        finally:
+            fast_exited.set()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_offline_guard.py` around lines 73-105: the timing-dependent sleep in `test_concurrent_threads_share_offline_window` makes the test flaky. Replace the `time.sleep(0.15)` coordination in the slow/fast threads with a `threading.Event`: create a shared event (e.g., `exit_event`), have `fast` set it immediately after leaving the `with force_offline_if_cached(...)` block (so `fast` signals it has exited), and have `slow` call `exit_event.wait()` after the barrier and before appending to `observations`. Keep the existing barrier and error collection; the event deterministically guarantees `fast` has exited the context before `slow` checks `_hf_const().HF_HUB_OFFLINE`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/test_offline_guard.py`:
- Around line 35-105: Swap the Yoda-style comparisons flagged by Ruff (SIM300)
so literals are on the left in the assertions within
test_mutates_cached_huggingface_hub_constant,
test_mutates_cached_transformers_constant, test_sets_env_variable,
test_nested_contexts_respect_refcount, and the final assertion in
test_concurrent_threads_share_offline_window (e.g., change "assert
_hf_const().HF_HUB_OFFLINE is True" to "assert True is
_hf_const().HF_HUB_OFFLINE" or better yet use "assert _hf_const().HF_HUB_OFFLINE
is True" → "assert _hf_const().HF_HUB_OFFLINE is True" corrected to place
literal on left per your linter); additionally add a brief module-level comment
(or a pytestmark like pytest.mark.serial) above these tests warning that they
mutate global state in huggingface_hub.constants and transformers.utils.hub and
may conflict under parallel test execution so they should be run serially.
In `@backend/utils/hf_offline_patch.py`:
- Around line 55-78: The setup inside the _offline_lock is not atomic because
only ImportError is caught, so other exceptions can leave
hf_const.HF_HUB_OFFLINE or tf_hub._is_offline_mode mutated and _offline_refcount
unchanged; update the block that touches _offline_refcount, _saved_env,
_saved_hf_const, _saved_transformers_const, hf_const.HF_HUB_OFFLINE,
tf_hub._is_offline_mode and os.environ["HF_HUB_OFFLINE"] so that any exception
during imports or assignment triggers a rollback: either broaden the except
clauses to catch Exception for the import/assignment sections or (preferably)
wrap the entire initialization in a try/except that on failure restores
hf_const.HF_HUB_OFFLINE and tf_hub._is_offline_mode to their _saved_* values,
restores os.environ["HF_HUB_OFFLINE"] to _saved_env (or deletes it if None),
leaves _offline_refcount at 0, then re-raises the original exception; ensure
this logic references the existing symbols (_offline_lock, _offline_refcount,
_saved_env, _saved_hf_const, _saved_transformers_const, hf_const.HF_HUB_OFFLINE,
tf_hub._is_offline_mode, os.environ["HF_HUB_OFFLINE"]) so the code is atomic on
error.
---
Nitpick comments:
In `@backend/tests/test_offline_guard.py`:
- Around line 73-105: The timing-dependent sleep in
test_concurrent_threads_share_offline_window makes the test flaky; replace the
time.sleep(0.15) coordination in the slow/fast threads with a threading.Event:
create an Event (e.g., exit_event) shared by both threads, have fast set
exit_event.set() immediately before it leaves the with
force_offline_if_cached(...) block (so fast signals it has exited), and have
slow call exit_event.wait() after the barrier and before appending to
observations; keep the existing barrier and error collection, and ensure the
event is used to deterministically guarantee fast has exited the context before
slow checks _hf_const().HF_HUB_OFFLINE.
In `@backend/utils/hf_offline_patch.py`:
- Around line 29-30: The two module-level variables _saved_hf_const and
_saved_transformers_const currently use None as a sentinel which conflates
"not-yet-saved" and a potential real None value from future lazy initialization;
introduce a unique sentinel (e.g., _MISSING = object()) and initialize both
_saved_hf_const and _saved_transformers_const to that sentinel, then update all
save/restore checks to use identity comparisons against _MISSING (e.g., "is not
_MISSING") instead of "is not None"; keep existing behavior for HF_HUB_OFFLINE
and _is_offline_mode by restoring their saved values when the saved slot is not
_MISSING.
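The sentinel pattern the reviewer suggests, shown in isolation. The names `_MISSING` and `_saved_hf_const` mirror the review comment; `save`/`restore` are hypothetical helpers for illustration, not functions from the patch:

```python
# A unique object distinguishes "never saved" from a saved value that
# legitimately happens to be None.
_MISSING = object()
_saved_hf_const = _MISSING


def save(value):
    """Record a value to restore later (may itself be None)."""
    global _saved_hf_const
    _saved_hf_const = value


def restore(current):
    """Return the saved value, or `current` unchanged if nothing was saved."""
    global _saved_hf_const
    if _saved_hf_const is _MISSING:      # identity check, not truthiness
        return current
    value, _saved_hf_const = _saved_hf_const, _MISSING
    return value
```

With a plain `None` sentinel, a saved `None` would be indistinguishable from "nothing to restore" and silently dropped; the identity check against `_MISSING` avoids that.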
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73cffea6-c988-4e97-ac9b-d21e121b8c88
📒 Files selected for processing (3)
- backend/backends/pytorch_backend.py
- backend/tests/test_offline_guard.py
- backend/utils/hf_offline_patch.py
```python
def test_mutates_cached_huggingface_hub_constant():
    original = _hf_const().HF_HUB_OFFLINE
    with force_offline_if_cached(True, "t"):
        assert _hf_const().HF_HUB_OFFLINE is True
    assert _hf_const().HF_HUB_OFFLINE == original


def test_mutates_cached_transformers_constant():
    original = _tf_hub()._is_offline_mode
    with force_offline_if_cached(True, "t"):
        assert _tf_hub()._is_offline_mode is True
    assert _tf_hub()._is_offline_mode == original


def test_sets_env_variable():
    original = os.environ.get("HF_HUB_OFFLINE")
    with force_offline_if_cached(True, "t"):
        assert os.environ.get("HF_HUB_OFFLINE") == "1"
    assert os.environ.get("HF_HUB_OFFLINE") == original


def test_noop_when_not_cached():
    before = _hf_const().HF_HUB_OFFLINE
    with force_offline_if_cached(False, "t"):
        assert _hf_const().HF_HUB_OFFLINE == before


def test_nested_contexts_respect_refcount():
    original = _hf_const().HF_HUB_OFFLINE
    with force_offline_if_cached(True, "outer"):
        assert _hf_const().HF_HUB_OFFLINE is True
        with force_offline_if_cached(True, "inner"):
            assert _hf_const().HF_HUB_OFFLINE is True
        # inner exit must not restore while outer is still active
        assert _hf_const().HF_HUB_OFFLINE is True
    assert _hf_const().HF_HUB_OFFLINE == original


def test_concurrent_threads_share_offline_window():
    """A slow thread must keep seeing offline mode even if a peer exits first."""
    original = _hf_const().HF_HUB_OFFLINE
    observations: list[bool] = []
    errors: list[Exception] = []
    barrier = threading.Barrier(2)

    def slow():
        try:
            with force_offline_if_cached(True, "slow"):
                barrier.wait()  # sync with fast
                time.sleep(0.15)  # fast will exit during this sleep
                observations.append(_hf_const().HF_HUB_OFFLINE)
        except Exception as exc:  # noqa: BLE001
            errors.append(exc)

    def fast():
        try:
            with force_offline_if_cached(True, "fast"):
                barrier.wait()
        except Exception as exc:  # noqa: BLE001
            errors.append(exc)

    t_slow = threading.Thread(target=slow)
    t_fast = threading.Thread(target=fast)
    t_slow.start()
    t_fast.start()
    t_slow.join(timeout=5)
    t_fast.join(timeout=5)

    assert not errors, errors
    assert observations == [True], "slow thread lost offline protection"
    assert _hf_const().HF_HUB_OFFLINE == original
```
Minor: Ruff Yoda warnings + shared-state caveat.
Ruff (SIM300) flags four comparisons (lines 39, 59, 70, 105). Swap the operands to clear them, e.g.:
Proposed fix

```diff
-    assert _hf_const().HF_HUB_OFFLINE == original
+    assert original == _hf_const().HF_HUB_OFFLINE
```

(apply at 39, 59, 70, 105)
Separately, every test here mutates process-global state in huggingface_hub.constants / transformers.utils.hub. If this suite is ever run under pytest-xdist with --dist=loadfile/loadscope or cross-process, tests in different workers can observe each other's forced-offline window. Worth a module-level comment or a pytestmark (pytest.mark.serial) if parallel test execution is on the roadmap.
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 39-39: Yoda condition detected
Rewrite as original == _hf_const().HF_HUB_OFFLINE
(SIM300)
[warning] 59-59: Yoda condition detected
Rewrite as before == _hf_const().HF_HUB_OFFLINE
(SIM300)
[warning] 70-70: Yoda condition detected
Rewrite as original == _hf_const().HF_HUB_OFFLINE
(SIM300)
[warning] 105-105: Yoda condition detected
Rewrite as original == _hf_const().HF_HUB_OFFLINE
(SIM300)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_offline_guard.py` around lines 35 - 105, Swap the
Yoda-style comparisons flagged by Ruff (SIM300) so literals are on the left in
the assertions within test_mutates_cached_huggingface_hub_constant,
test_mutates_cached_transformers_constant, test_sets_env_variable,
test_nested_contexts_respect_refcount, and the final assertion in
test_concurrent_threads_share_offline_window (e.g., change "assert
_hf_const().HF_HUB_OFFLINE is True" to "assert True is
_hf_const().HF_HUB_OFFLINE" or better yet use "assert _hf_const().HF_HUB_OFFLINE
is True" → "assert _hf_const().HF_HUB_OFFLINE is True" corrected to place
literal on left per your linter); additionally add a brief module-level comment
(or a pytestmark like pytest.mark.serial) above these tests warning that they
mutate global state in huggingface_hub.constants and transformers.utils.hub and
may conflict under parallel test execution so they should be run serially.
Review follow-up:

- Wrap the `_offline_refcount == 0` setup in a try/except so any failure during the cached-constant mutation (including unexpected non-ImportError like RuntimeError or AttributeError from a half-initialized module) rolls back *all* partial state before re-raising. Without this, a mid-setup crash could leave `huggingface_hub.constants.HF_HUB_OFFLINE` mutated but the refcount at 0 — a persistent offline flag outliving the process.
- Swap ruff-flagged Yoda comparisons in the new test file (SIM300) and add a module-level note warning that these tests mutate global state and are not safe under cross-process parallelism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/tests/test_offline_guard.py (1)
54-58: ⚠️ Potential issue | 🟡 Minor

Clear the remaining Ruff SIM300 warning.
Line 57 still uses a Yoda-style comparison.
Proposed fix

```diff
-    assert "1" == os.environ.get("HF_HUB_OFFLINE")
+    assert os.environ.get("HF_HUB_OFFLINE") == "1"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_offline_guard.py` around lines 54 - 58, The test test_sets_env_variable uses a Yoda-style comparison ("1" == os.environ.get("HF_HUB_OFFLINE")); change it to a normal comparison by asserting os.environ.get("HF_HUB_OFFLINE") == "1" inside the with block where force_offline_if_cached(True, "t") is used, referencing the HF_HUB_OFFLINE env var and the test_sets_env_variable function to locate the assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/test_offline_guard.py`:
- Line 18: The concurrency test currently uses sleep(0.15) and barrier.wait()
without timeouts which is racy; replace the sleep-based ordering with explicit
synchronization (e.g., use a threading.Event or the existing threading.Barrier
with a timeout) to deterministically ensure the "fast" worker has exited before
the "slow" worker reads the flag, add timeouts to all barrier.wait(...) calls,
and join threads with a timeout and assert they are not alive
(thread.join(timeout) + assert not thread.is_alive()) so the test is bounded and
cannot hang; update occurrences of sleep(0.15) and barrier.wait() in the
test_offline_guard tests (including the block covering lines 78-110)
accordingly.
---
Duplicate comments:
In `@backend/tests/test_offline_guard.py`:
- Around line 54-58: The test test_sets_env_variable uses a Yoda-style
comparison ("1" == os.environ.get("HF_HUB_OFFLINE")); change it to a normal
comparison by asserting os.environ.get("HF_HUB_OFFLINE") == "1" inside the with
block where force_offline_if_cached(True, "t") is used, referencing the
HF_HUB_OFFLINE env var and the test_sets_env_variable function to locate the
assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a74e6e48-d2ae-4d4d-b637-80cc1ce44058
📒 Files selected for processing (2)
- backend/tests/test_offline_guard.py
- backend/utils/hf_offline_patch.py
Replace the `sleep(0.15)` ordering hack with an explicit `threading.Event` that the fast thread sets in `finally`. The slow thread waits on that event (bounded), then observes the flag — so we deterministically verify the slow thread still sees offline mode after the fast thread has exited.

Also add timeouts to `barrier.wait()` and assert `not thread.is_alive()` after the joins so the test can't hang on an unexpected failure path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
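The deterministic ordering this commit describes can be sketched in isolation with plain threads and no offline guard; all names below are illustrative:

```python
import threading

results = []
barrier = threading.Barrier(2, timeout=5)   # bounded rendezvous
fast_exited = threading.Event()


def slow():
    barrier.wait()                       # both threads are now in flight
    # Bounded wait replaces sleep-based guessing: by the time this
    # returns, fast has provably finished its work.
    assert fast_exited.wait(timeout=5), "fast never exited"
    results.append("observed-after-fast-exit")


def fast():
    try:
        barrier.wait()
    finally:
        fast_exited.set()                # set even on failure paths


threads = [threading.Thread(target=slow), threading.Thread(target=fast)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
# Joins are bounded too, so a stuck thread fails loudly instead of hanging:
assert not any(t.is_alive() for t in threads)
```

Setting the event in `finally` is the load-bearing detail: even if `fast` raises, `slow` is released and the test fails with an assertion rather than a timeout.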


Summary
Qwen3-TTS and Whisper were still requiring an internet connection for inference even when fully loaded — matching #462's repro (loaded model, disconnect, generate hangs; reconnect, generate completes).
PR #443 wrapped the model load paths with `force_offline_if_cached`. That context manager sets `HF_HUB_OFFLINE=1` on `__enter__` and restores it on `__exit__` — so inference paths were running without the guard. `qwen_tts`, `mlx_audio`, and `transformers` all perform lazy tokenizer/processor/config lookups during inference. With internet on, those are invisible; with internet off, `requests` hangs on DNS/connect until the network returns. Chatterbox/LuxTTS don't have this issue because their engine libs resolve everything through already-cached paths at load time.
Fix
Wrap each inference sync-body with `force_offline_if_cached(True, ...)`. Since inference only runs after a successful load, weights are known to be on disk, so `is_cached=True` is unconditional.

Paths patched:

- `PyTorchTTSBackend.create_voice_prompt` → `create_voice_clone_prompt`
- `PyTorchTTSBackend.generate` → `generate_voice_clone`
- `PyTorchSTTBackend.transcribe` → Whisper `generate` + `get_decoder_prompt_ids`
- `MLXTTSBackend.generate` → `mlx_audio` generate (all three branches)
- `MLXSTTBackend.transcribe` → `mlx_audio` Whisper generate
- `QwenCustomVoiceBackend.generate` → `generate_custom_voice`
- `QwenCustomVoiceBackend._load_model_sync` — added the load-time guard that was missing entirely (CustomVoice previously had no offline protection)

Out of scope
The secondary bug mentioned in #462 — CustomVoice download failing with `check_model_inputs() missing 1 required positional argument: 'func'` — is a separate `transformers` 5.x version-skew issue on the install path, not a runtime bug. Worth a follow-up issue.

Test plan

… (`is_cached=True`).

Fixes #462.
🤖 Generated with Claude Code
Note
Medium Risk
Wraps multiple TTS/STT inference calls in a context manager that toggles `HF_HUB_OFFLINE`, which could impact concurrent requests if env vars are shared across threads/processes. Behavior changes are otherwise localized and intended to prevent offline hangs when models are already cached.

Overview

Prevents offline hangs during inference by wrapping cached-generation/transcription code paths with `force_offline_if_cached(True, ...)`, ensuring lazy Hugging Face tokenizer/config lookups don't attempt network access after models are loaded.

This adds the guard around MLX TTS `generate`, MLX Whisper `transcribe`, PyTorch Qwen TTS voice-prompt creation and `generate_voice_clone`, and PyTorch Whisper transcription (including `get_decoder_prompt_ids`). It also applies the offline guard to Qwen CustomVoice model loading and `generate_custom_voice`.

Reviewed by Cursor Bugbot for commit f3ed312.