fix(offline): patch transformers mistral-regex check to survive HF failures #530
Conversation
transformers 4.57.x's `PreTrainedTokenizerBase._patch_mistral_regex` calls `huggingface_hub.model_info(repo_id)` unconditionally during any non-local tokenizer load to probe for Mistral-family models. The call raises on `HF_HUB_OFFLINE=1`, on network outages, and on slow/blocked HF endpoints, and transformers doesn't catch any of it — the exception bubbles out of `from_pretrained` and kills the load for unrelated engines (Qwen TTS, Qwen CustomVoice, TADA, etc.).

0.4.2's load-time `force_offline_if_cached` guard walked straight into this trap: for cached online users it flipped `HF_HUB_OFFLINE=1` and converted a healthy load into a hard crash. 0.4.3's inference-path guard masked it; #524 removed the inference guard in 0.4.4, and users updating to 0.4.4 started hitting the same error on the load path instead (#526).

Fix:
- Wrap `_patch_mistral_regex` so any exception from the inner HF metadata check is swallowed and the tokenizer is returned unchanged. Voicebox never loads Mistral models, so the regex rewrite this check gates is a no-op for us; this matches the success-path behavior for non-Mistral repos (tokenization_utils_base.py:2503).
- Drop the `force_offline_if_cached` wraps from every load path (pytorch_backend Qwen + Whisper, qwen_custom_voice_backend, mlx_backend Qwen + Whisper). With the mistral patch in place they provide zero value and only risk re-introducing the same class of bug. The helper and its unit tests stay — still correct for targeted future use.
- Add `backend/tests/test_offline_patch.py` covering OfflineModeIsEnabled / ConnectionError suppression, success pass-through, idempotence, and the missing-method no-op path.

Fixes #526.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info — ⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (1)
📝 Walkthrough
Removed conditional offline-forcing wrappers from model-loading paths (MLX, PyTorch, Qwen) and added an idempotent HF patch function that wraps Transformers' `_patch_mistral_regex`.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant Backend as Backend (model load)
    participant HFPatch as HF Offline Patch
    participant Transformers as Transformers
    participant HFHub as HuggingFace Hub
    Client->>Backend: request model load
    Backend->>HFPatch: ensure patch installed (import/init)
    HFPatch->>Transformers: wrap _patch_mistral_regex (idempotent)
    Backend->>Transformers: call from_pretrained()/load()
    Transformers->>Transformers: invoke _patch_mistral_regex
    Transformers->>HFHub: model_info() lookup
    alt HF metadata unreachable (offline/conn error)
        HFHub-->>Transformers: raises OfflineModeIsEnabled/ConnectionError
        Transformers->>Transformers: wrapper suppresses exception -> return tokenizer unchanged
    else Metadata available
        HFHub-->>Transformers: model metadata
        Transformers->>Transformers: apply mistral regex patch if needed
    end
    Transformers-->>Backend: model/processor returned
    Backend-->>Client: model ready
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/backends/pytorch_backend.py (1)
1-24: ⚠️ Potential issue | 🟠 Major — Same patch-wiring concern as `qwen_custom_voice_backend.py`.

See the major-issue comment on `qwen_custom_voice_backend.py` — this file similarly does not import `backend.utils.hf_offline_patch`, so `patch_transformers_mistral_regex()` is not guaranteed to have run before `WhisperProcessor.from_pretrained(...)` / `WhisperForConditionalGeneration.from_pretrained(...)` at lines 298-299 or `Qwen3TTSModel.from_pretrained(...)` at lines 109/116. If the verification there shows no app-level entry point imports the patch module, consider adding an import + `patch_transformers_mistral_regex()` call at the top of this module, mirroring `mlx_backend.py` (lines 13-18).

Not re-opening the verification script here — the one in `qwen_custom_voice_backend.py` covers the whole repo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/pytorch_backend.py` around lines 1-24: This file may call WhisperProcessor.from_pretrained, WhisperForConditionalGeneration.from_pretrained and Qwen3TTSModel.from_pretrained before the transformers regex patch is applied; import backend.utils.hf_offline_patch and call patch_transformers_mistral_regex() near the top of this module (mirroring mlx_backend.py) so the patch runs before any of those from_pretrained calls; reference the function patch_transformers_mistral_regex() and the classes/methods WhisperProcessor.from_pretrained, WhisperForConditionalGeneration.from_pretrained, and Qwen3TTSModel.from_pretrained to locate where the import+call must be added.

backend/backends/qwen_custom_voice_backend.py (1)
17-32: ⚠️ Potential issue | 🟠 Major — The mistral-regex patch is NOT guaranteed to run on non-MLX platforms if the qwen_custom_voice or pytorch backend is selected.

`patch_transformers_mistral_regex()` is installed only when `backend/utils/hf_offline_patch.py` is imported (module-level invocation at line 269). Currently, only `mlx_backend.py` imports this module. On non-MLX platforms (Windows/Linux/CUDA), `get_backend_type()` returns "pytorch", and if a user selects the `qwen_custom_voice` engine, the `QwenCustomVoiceBackend` is imported directly without the patch being installed. The `OfflineModeIsEnabled` crash from issue #526 will still surface inside `Qwen3TTSModel.from_pretrained(...)`.

Add an explicit import and call in `qwen_custom_voice_backend.py` and `pytorch_backend.py`:

```python
from ..utils.hf_offline_patch import (
    patch_huggingface_hub_offline,
    ensure_original_qwen_config_cached,
    patch_transformers_mistral_regex,
)

# At module level:
patch_huggingface_hub_offline()
patch_transformers_mistral_regex()
ensure_original_qwen_config_cached()
```

Or ensure `backend/utils/hf_offline_patch` is imported unconditionally at application startup (e.g., `backend/__init__.py` or `backend/app.py`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/qwen_custom_voice_backend.py` around lines 17 - 32, Import and invoke the HF offline/patch helpers at module import time so the mistral regex and offline behavior are always installed: add an unconditional import of backend.utils.hf_offline_patch and call patch_huggingface_hub_offline(), patch_transformers_mistral_regex(), and ensure_original_qwen_config_cached() at module level in qwen_custom_voice_backend.py (and mirror the same in pytorch_backend.py) so Qwen3TTSModel.from_pretrained(...) cannot trigger the OfflineModeIsEnabled crash when qwen_custom_voice is selected on non-MLX platforms; reference the helper symbols patch_huggingface_hub_offline, patch_transformers_mistral_regex, and ensure_original_qwen_config_cached to locate the needed functions.
🧹 Nitpick comments (2)
backend/tests/test_offline_patch.py (2)
105-109: `test_missing_method_is_noop` leaks state into subsequent tests.

When `monkeypatch.delattr(..., raising=False)` runs, the autouse fixture's `saved = PreTrainedTokenizerBase.__dict__.get("_patch_mistral_regex")` was already captured at fixture setup (so restore should work), but there is a subtle ordering issue: `monkeypatch` teardown runs before the `restore_mistral_regex` fixture teardown, so the final state is `saved` (correct). However, if this is the first test and the original method was never captured (hypothetically inherited), the fixture wouldn't restore it. Not a bug today, but add a sanity assertion — e.g., `assert "_patch_mistral_regex" in PreTrainedTokenizerBase.__dict__` at fixture entry — to fail loudly if a future transformers release moves the method.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_offline_patch.py` around lines 105 - 109, Add a sanity assertion at the start of the autouse fixture that saves/restores the tokenizer method (e.g., restore_mistral_regex) to ensure the original method exists on PreTrainedTokenizerBase: insert assert "_patch_mistral_regex" in PreTrainedTokenizerBase.__dict__ at the top of the fixture (before capturing saved = PreTrainedTokenizerBase.__dict__.get("_patch_mistral_regex")), so tests like test_missing_method_is_noop will fail loudly if a future transformers release removes or moves the method.
14-24: Prefer package-relative imports over `sys.path` mutation.

`sys.path.insert(0, str(Path(__file__).parent.parent))` followed by `import utils.hf_offline_patch as hf_offline_patch` works but (a) depends on CWD/collection order under pytest, (b) shadows any top-level `utils` package, and (c) skips the `backend/__init__.py` package init. Preferred pattern if `backend/` is a package: `from backend.utils import hf_offline_patch` (with a `conftest.py` at the repo root setting the rootdir), matching how the runtime code imports it (`from ..utils.hf_offline_patch import ...`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/test_offline_patch.py` around lines 14 - 24, Replace the sys.path mutation pattern and top-level import with a package-relative import: remove the sys.path.insert(0, str(Path(__file__).parent.parent)) line and stop using "import utils.hf_offline_patch as hf_offline_patch"; instead import the module via the package namespace used in production (e.g., from backend.utils import hf_offline_patch or from ..utils import hf_offline_patch depending on test package layout) so the test uses package-relative imports and loads backend/__init__.py correctly; ensure tests' conftest/rootdir is configured so the package import resolves under pytest.
ℹ️ Review info
⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro · Run ID: 0136552b-b2ff-4bff-b003-ae1cee20a390
📒 Files selected for processing (5):
- backend/backends/mlx_backend.py
- backend/backends/pytorch_backend.py
- backend/backends/qwen_custom_voice_backend.py
- backend/tests/test_offline_patch.py
- backend/utils/hf_offline_patch.py
```python
def test_passthrough_on_success(monkeypatch):
    """When model_info returns non-Mistral tags the original falls through and returns the tokenizer unchanged."""
    _apply_patch()

    import huggingface_hub

    class FakeInfo:
        tags = ["model-type:qwen", "language:en"]

    monkeypatch.setattr(huggingface_hub, "model_info", lambda *_a, **_kw: FakeInfo())

    sentinel = object()
    result = PreTrainedTokenizerBase._patch_mistral_regex(
        sentinel, "Qwen/Qwen3-TTS-12Hz-1.7B-Base"
    )
    assert result is sentinel
```
test_passthrough_on_success likely exercises the exception path, not the success path.
sentinel = object() has none of the attributes the real _patch_mistral_regex touches (it reaches into the tokenizer's pre_tokenizer / backend_tokenizer, calls cached_file(...), does setattr(tokenizer, "fix_mistral_regex", ...), etc.), so original(...) will raise AttributeError/TypeError before the non-Mistral fall-through branch is reached. The wrapper's except Exception swallows that and returns sentinel, so the assertion passes — but for the wrong reason, which defeats the test's stated purpose and hides regressions in the success branch.
Either rename the test to reflect what's actually tested (e.g., test_suppresses_attribute_errors) or feed a real PreTrainedTokenizerFast (e.g., load a tiny cached tokenizer) / a mock with the attributes _patch_mistral_regex accesses so the original can fall through naturally when FakeInfo.tags lacks base_model:*mistralai. Also consider mocking transformers.utils.hub.cached_file to avoid a real filesystem/network hit.
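To make the failure mode concrete, here is a self-contained toy (stand-in functions only — not the real transformers code): a bare `object()` trips an `AttributeError` inside the original before the non-Mistral fall-through is reached, and the wrapper's blanket `except` hands the sentinel back, so the assertion passes for the wrong reason.

```python
def original_patch(tokenizer, repo_id):
    """Stand-in for _patch_mistral_regex: it touches tokenizer attributes
    before the non-Mistral fall-through can return."""
    _ = tokenizer.backend_tokenizer  # AttributeError on a bare object()
    return tokenizer

def safe_patch(tokenizer, repo_id):
    """Stand-in for the offline wrapper: swallows *every* exception."""
    try:
        return original_patch(tokenizer, repo_id)
    except Exception:
        return tokenizer

sentinel = object()
# Passes -- but only via the except path, masking the success branch:
assert safe_patch(sentinel, "Qwen/repo") is sentinel

class FakeTokenizer:
    backend_tokenizer = None  # expose the attribute the original reads

tok = FakeTokenizer()
# Now the original actually falls through and returns the tokenizer:
assert safe_patch(tok, "Qwen/repo") is tok
```

With a mock like `FakeTokenizer`, the second assertion exercises the genuine success path rather than the exception handler.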
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 86-86: Mutable default value for class attribute (RUF012)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/test_offline_patch.py` around lines 79 - 94, The test
test_passthrough_on_success is currently passing due to an AttributeError
swallowed by the wrapper in PreTrainedTokenizerBase._patch_mistral_regex because
sentinel lacks attributes the wrapper touches; fix by making the test exercise
the actual non-Mistral fall-through: either rename the test to reflect it checks
suppression of attribute errors, or (preferred) construct/provide a real or
mocked tokenizer object that exposes the attributes the wrapper accesses (e.g.,
pre_tokenizer or backend_tokenizer, any methods _patch_mistral_regex reads, and
support for setting fix_mistral_regex) and ensure huggingface_hub.model_info
returns FakeInfo with non-mistral tags; also mock
transformers.utils.hub.cached_file to avoid network/filesystem access so the
original path executes and returns the original tokenizer instead of raising.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit fb9fe30.
```python
)
from ..utils.cache import get_cache_key, get_cached_voice_prompt, cache_voice_prompt
from ..utils.audio import load_audio
from ..utils.hf_offline_patch import force_offline_if_cached
```
Mistral regex patch never applied for PyTorch backends
High Severity
Removing the force_offline_if_cached import from pytorch_backend.py and qwen_custom_voice_backend.py means neither file imports hf_offline_patch at all anymore. The module-level patch_transformers_mistral_regex() call in hf_offline_patch.py only runs when the module is first imported, which only happens via mlx_backend.py. On Windows/Linux (PyTorch), hf_offline_patch is never imported, so the core fix of this PR — the mistral regex wrapper — is never applied.
The previous commit left the patch wired only through ``mlx_backend.py``'s existing import of ``hf_offline_patch``. On Windows/Linux/CUDA users who never load the MLX backend (everyone who hit #526), the patch module was never imported, so ``patch_transformers_mistral_regex`` never ran and the crash persisted.

Hoist the import into ``backends/__init__.py``. Every backend imports from this package, so the module-level patch install runs before any ``from_pretrained`` call regardless of which engine the user picks.

Caught by CodeRabbit and Cursor Bugbot on #530.
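The package-init wiring can be simulated without the real repo. In this sketch the module and function names follow the PR, but the in-memory module is a stand-in: it shows why an import in the package `__init__` guarantees the module-level patch call runs before any backend module is loaded.

```python
import sys
import types

# Stand-in for backend/utils/hf_offline_patch.py. In the real module the
# patch call sits at module level, so the first import installs it.
patch_mod = types.ModuleType("demo_hf_offline_patch")
patch_mod.install_count = 0

def patch_transformers_mistral_regex():
    # Idempotent: repeated imports elsewhere must not re-wrap.
    if patch_mod.install_count == 0:
        patch_mod.install_count += 1

patch_mod.patch_transformers_mistral_regex = patch_transformers_mistral_regex
patch_transformers_mistral_regex()  # module-level side effect on first import
sys.modules["demo_hf_offline_patch"] = patch_mod

# Stand-in for backends/__init__.py: importing the patch module here means
# any "from backends import pytorch_backend"-style import runs it first.
import demo_hf_offline_patch  # noqa: F401  (imported for side effect)

assert demo_hf_offline_patch.install_count == 1
print("patch installed exactly once, before any backend module")
```

The idempotence guard matters because every backend module triggers the same package init; only the first import does any work.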
Good catch from the bots — both flagged that the patch module was only imported via `mlx_backend.py`. Fixed in bc5faa2: moved the import to `backends/__init__.py`. Verified with …
Summary
Fixes the "Cannot reach ... offline mode is enabled" crash hitting users on 0.4.4 (#526). Users updating from 0.4.3 expected 0.4.4 to fix this error — it didn't, because 0.4.4's PR #524 only reverted the inference-path guards. The same trap on the load path has been shipping since 0.4.2 and was masked by the more visible inference-path regression.
Root cause
`transformers` 4.57.x's `PreTrainedTokenizerBase._patch_mistral_regex` calls `huggingface_hub.model_info(repo_id)` unconditionally for every non-local tokenizer load, to probe whether the repo is a Mistral variant. That call raises on `HF_HUB_OFFLINE=1`, on network outages, and on blocked HF endpoints. `_patch_mistral_regex` doesn't catch any of it — the exception bubbles out of `from_pretrained` and kills the load.

0.4.2's load-time `force_offline_if_cached` guard (from #503) walked straight into this: for cached online users it flipped `HF_HUB_OFFLINE=1`, converting a healthy load into `OfflineModeIsEnabled`. 0.4.3's inference-path guard got hit first, so nobody reported the load-path one. #524 removed the inference guard, users got further into the flow, and now the same error lands on load.

Fix

- Wrap `_patch_mistral_regex` so any exception from the inner HF metadata check is swallowed and the tokenizer is returned unchanged. Voicebox never loads Mistral models, so the regex rewrite this check gates is a no-op for us anyway — matches the success-path behavior for non-Mistral repos (tokenization_utils_base.py:2503).
- Drop the `force_offline_if_cached` wraps from every load path (pytorch_backend Qwen + Whisper, qwen_custom_voice_backend, mlx_backend Qwen + Whisper). With the mistral patch in place they provide zero value and only risk re-introducing the same class of bug. Helper + its unit tests stay untouched.
- Add `backend/tests/test_offline_patch.py` covering `OfflineModeIsEnabled`/`ConnectionError` suppression, success pass-through, idempotence, and the missing-method no-op path.

Code example
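The wrapper can be sketched as follows. This is a minimal illustration, not the PR's actual `hf_offline_patch.py`: `FakeTokenizerBase` stands in for transformers' `PreTrainedTokenizerBase` so the snippet runs without transformers installed, and the `_offline_safe` marker attribute is an assumed idempotence mechanism.

```python
import functools

class FakeTokenizerBase:
    """Stand-in for PreTrainedTokenizerBase; the real method probes
    huggingface_hub.model_info() and raises when HF is unreachable."""

    @staticmethod
    def _patch_mistral_regex(tokenizer, repo_id):
        raise ConnectionError("cannot reach huggingface.co (offline)")

def patch_mistral_regex(cls=FakeTokenizerBase):
    """Wrap _patch_mistral_regex so HF metadata failures return the
    tokenizer unchanged. Idempotent; no-op if the method is missing."""
    original = getattr(cls, "_patch_mistral_regex", None)
    if original is None:
        return  # a future transformers removed the method: nothing to do
    if getattr(original, "_offline_safe", False):
        return  # already wrapped: stay idempotent

    @functools.wraps(original)
    def safe(tokenizer, repo_id, *args, **kwargs):
        try:
            return original(tokenizer, repo_id, *args, **kwargs)
        except Exception:
            # Offline mode, outage, or blocked endpoint: skip the Mistral
            # regex rewrite (a no-op for non-Mistral repos anyway).
            return tokenizer

    safe._offline_safe = True
    cls._patch_mistral_regex = staticmethod(safe)

patch_mistral_regex()
patch_mistral_regex()  # second call is a no-op

tok = object()
assert FakeTokenizerBase._patch_mistral_regex(tok, "Qwen/Qwen3-TTS") is tok
```

The `except Exception` is deliberately broad — the whole point is that no HF metadata failure may escape into `from_pretrained`.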
Test plan
- `pytest tests/test_offline_patch.py tests/test_offline_guard.py -v` — 11/11 green
- `HF_HUB_OFFLINE=1` + calling `_patch_mistral_regex` on a Qwen repo ID returns the tokenizer unchanged instead of raising
- `model_info` timeout (closes the 0.4.4 "known caveat")
- `HF_HUB_OFFLINE=1 just server` — load works (skyooo's hf-mirror case)
- `just build`, run cached Qwen load against the PyInstaller-frozen server to confirm the import-time patch fires inside the frozen runtime

Fixes #526.
Note
Medium Risk
Touches model-loading paths and globally monkey-patches `transformers` tokenizer behavior, which could affect all HuggingFace/Transformers loads if the wrapper masks unexpected errors. Changes are targeted to offline/network-failure scenarios and covered by new unit tests.

Overview
Fixes offline-mode crashes during model/tokenizer loading by adding `patch_transformers_mistral_regex()` in `utils/hf_offline_patch.py`, which wraps `PreTrainedTokenizerBase._patch_mistral_regex` to swallow HuggingFace `model_info()` failures and return the tokenizer unchanged.

Removes `force_offline_if_cached(...)` usage from MLX, PyTorch (Qwen TTS + Whisper), and `qwen_custom_voice_backend` load paths so cached models no longer flip global offline mode during `from_pretrained`/load.

Adds `test_offline_patch.py` to validate the new wrapper (suppresses `OfflineModeIsEnabled`/network errors, passes through on success, is idempotent, and no-ops when the upstream method is missing).