
fix(offline): patch transformers mistral-regex check to survive HF failures#530

Merged
jamiepine merged 2 commits into main from investigate/hf-offline-526 on Apr 22, 2026

Conversation

@jamiepine
Owner

@jamiepine jamiepine commented Apr 22, 2026

Summary

Fixes the "Cannot reach ... offline mode is enabled" crash hitting users on 0.4.4 (#526). Users updating from 0.4.3 expected 0.4.4 to fix this error — it didn't, because 0.4.4's PR #524 only reverted the inference-path guards. The same trap on the load path has been shipping since 0.4.2 and was masked by the more visible inference-path regression.

Root cause

transformers 4.57.x's PreTrainedTokenizerBase._patch_mistral_regex calls huggingface_hub.model_info(repo_id) unconditionally for every non-local tokenizer load, to probe whether the repo is a Mistral variant. That call raises on HF_HUB_OFFLINE=1, on network outages, and on blocked HF endpoints. _patch_mistral_regex doesn't catch any of it — the exception bubbles out of from_pretrained and kills the load.

0.4.2's load-time force_offline_if_cached guard (from #503) walked straight into this: on cached online users it flipped HF_HUB_OFFLINE=1, converting a healthy load into OfflineModeIsEnabled. 0.4.3's inference-path guard got hit first so nobody reported the load-path one. #524 removed the inference guard, users got further into the flow, and now the same error lands on load.
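The failure mode above can be reproduced in miniature. All names below are stand-ins (no transformers or huggingface_hub required): an unguarded metadata probe inside the load path turns an offline error into a crash for callers that never needed the metadata.

```python
class OfflineModeIsEnabled(Exception):
    """Stand-in for huggingface_hub's offline-mode error."""

def model_info(repo_id):
    # Stand-in for huggingface_hub.model_info under HF_HUB_OFFLINE=1.
    raise OfflineModeIsEnabled(f"Cannot reach {repo_id}: offline mode is enabled")

def _patch_mistral_regex(tokenizer, repo_id):
    model_info(repo_id)  # unguarded probe: any failure propagates
    return tokenizer     # (actual Mistral handling elided)

def from_pretrained(repo_id):
    tokenizer = object()  # pretend the cached load itself succeeded
    return _patch_mistral_regex(tokenizer, repo_id)

try:
    from_pretrained("Qwen/some-cached-model")
    crashed = False
except OfflineModeIsEnabled:
    crashed = True

assert crashed  # a healthy cached load is converted into a hard failure
```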

Fix

  • Wrap _patch_mistral_regex so any exception from the inner HF metadata check is swallowed and the tokenizer is returned unchanged. Voicebox never loads Mistral models, so the regex rewrite this check gates is a no-op for us anyway — matches the success-path behavior for non-Mistral repos (tokenization_utils_base.py:2503).
  • Drop force_offline_if_cached wraps from every load path (pytorch_backend Qwen + Whisper, qwen_custom_voice_backend, mlx_backend Qwen + Whisper). With the mistral patch in place they provide zero value and only risk re-introducing the same class of bug. Helper + its unit tests stay untouched.
  • New test file backend/tests/test_offline_patch.py covers OfflineModeIsEnabled / ConnectionError suppression, success pass-through, idempotence, and the missing-method no-op path.

Code example

import logging

from transformers import PreTrainedTokenizerBase

logger = logging.getLogger(__name__)

_mistral_regex_patched = False

def patch_transformers_mistral_regex():
    """Wrap _patch_mistral_regex so HF metadata failures can't kill tokenizer loads."""
    global _mistral_regex_patched
    if _mistral_regex_patched:
        return  # idempotent: never wrap the wrapper

    original = getattr(PreTrainedTokenizerBase, "_patch_mistral_regex", None)
    if original is None:
        return  # method removed/renamed upstream; nothing to patch

    def safe_patch_mistral_regex(cls, tokenizer, pretrained_model_name_or_path, *args, **kwargs):
        try:
            return original(tokenizer, pretrained_model_name_or_path, *args, **kwargs)
        except Exception as exc:
            logger.debug(
                "[mistral-regex-patch] suppressed %s for %r, returning tokenizer as-is",
                type(exc).__name__, pretrained_model_name_or_path,
            )
            return tokenizer

    PreTrainedTokenizerBase._patch_mistral_regex = classmethod(safe_patch_mistral_regex)
    _mistral_regex_patched = True
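The patch itself needs transformers installed, but the wrap-and-swallow pattern it uses can be exercised on a stand-in class. All names below are hypothetical:

```python
# Stand-in for PreTrainedTokenizerBase; only the method shape matters here.
class FakeTokenizerBase:
    @classmethod
    def _patch_mistral_regex(cls, tokenizer, repo_id):
        if repo_id.startswith("offline/"):
            raise ConnectionError("metadata probe failed")
        return tokenizer

def install_patch(target):
    # getattr on the class yields the classmethod already bound, so the
    # wrapper can call it with (tokenizer, repo_id) directly.
    original = getattr(target, "_patch_mistral_regex", None)
    if original is None:
        return  # nothing to wrap

    def safe(cls, tokenizer, repo_id, *args, **kwargs):
        try:
            return original(tokenizer, repo_id, *args, **kwargs)
        except Exception:
            return tokenizer  # swallow the probe failure, hand back unchanged

    target._patch_mistral_regex = classmethod(safe)

install_patch(FakeTokenizerBase)

tok = object()
# Probe failure is suppressed; the success path is untouched.
assert FakeTokenizerBase._patch_mistral_regex(tok, "offline/repo") is tok
assert FakeTokenizerBase._patch_mistral_regex(tok, "Qwen/ok") is tok
```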

Test plan

  • pytest tests/test_offline_patch.py tests/test_offline_guard.py -v — 11/11 green
  • Import-time smoke test: HF_HUB_OFFLINE=1 + calling _patch_mistral_regex on a Qwen repo ID returns the tokenizer unchanged instead of raising
  • Online cached Qwen CustomVoice 1.7B load (the 0.4.4 regression) succeeds
  • Online cached TADA 3B load succeeds (also reported failing in the #526 comment thread)
  • Offline cached load completes without the 30s model_info timeout (closes the 0.4.4 "known caveat")
  • HF_HUB_OFFLINE=1 just server load works (skyooo's hf-mirror case)
  • Bundled-binary smoke test: just build, run cached Qwen load against the PyInstaller-frozen server to confirm the import-time patch fires inside the frozen runtime

Fixes #526.


Note

Medium Risk
Touches model-loading paths and globally monkey-patches transformers tokenizer behavior, which could affect all HuggingFace/Transformers loads if the wrapper masks unexpected errors. Changes are targeted to offline/network-failure scenarios and covered by new unit tests.

Overview
Fixes offline-mode crashes during model/tokenizer loading by adding patch_transformers_mistral_regex() in utils/hf_offline_patch.py, which wraps PreTrainedTokenizerBase._patch_mistral_regex to swallow HuggingFace model_info() failures and return the tokenizer unchanged.

Removes force_offline_if_cached(...) usage from MLX, PyTorch (Qwen TTS + Whisper), and qwen_custom_voice_backend load paths so cached models no longer flip global offline mode during from_pretrained/load.

Adds test_offline_patch.py to validate the new wrapper (suppresses OfflineModeIsEnabled/network errors, passes through on success, is idempotent, and no-ops when the upstream method is missing).

Reviewed by Cursor Bugbot for commit fb9fe30.

Summary by CodeRabbit

  • Bug Fixes

    • Model loading now respects standard offline behavior across MLX, PyTorch, and Qwen backends, reducing unexpected network access.
    • Tokenizer initialization for Mistral models now fails gracefully when metadata lookups are unavailable, improving offline robustness.
  • Tests

    • Added unit tests validating the offline/connection-failure patch and its idempotent, safe behavior.

fix(offline): patch transformers mistral-regex check to survive HF failures

transformers 4.57.x's `PreTrainedTokenizerBase._patch_mistral_regex` calls
`huggingface_hub.model_info(repo_id)` unconditionally during any non-local
tokenizer load to probe for Mistral-family models. The call raises on
`HF_HUB_OFFLINE=1`, on network outages, and on slow/blocked HF endpoints,
and transformers doesn't catch any of it — the exception bubbles out of
`from_pretrained` and kills the load for unrelated engines (Qwen TTS,
Qwen CustomVoice, TADA, etc.).

0.4.2's load-time `force_offline_if_cached` guard walked straight into
this trap: on cached online users it flipped `HF_HUB_OFFLINE=1` and
converted a healthy load into a hard crash. 0.4.3's inference-path guard
masked it; #524 removed the inference guard in 0.4.4, and users updating
to 0.4.4 started hitting the same error on the load path instead
(#526).

Fix:
- Wrap `_patch_mistral_regex` so any exception from the inner HF
  metadata check is swallowed and the tokenizer is returned unchanged.
  Voicebox never loads Mistral models, so the regex rewrite this check
  gates is a no-op for us; matches the success-path behavior for
  non-Mistral repos (tokenization_utils_base.py:2503).
- Drop the `force_offline_if_cached` wraps from every load path
  (pytorch_backend Qwen + Whisper, qwen_custom_voice_backend,
  mlx_backend Qwen + Whisper). With the mistral patch in place they
  provide zero value and only risk re-introducing the same class of
  bug. Helper and its unit tests stay — still correct for targeted
  future use.
- Add `backend/tests/test_offline_patch.py` covering
  OfflineModeIsEnabled / ConnectionError suppression, success
  pass-through, idempotence, and the missing-method no-op path.

Fixes #526.
@coderabbitai
Contributor

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c6f57cad-55f6-4e7c-8512-be0470eafc63

📥 Commits

Reviewing files that changed from the base of the PR and between fb9fe30 and bc5faa2.

📒 Files selected for processing (1)
  • backend/backends/__init__.py

📝 Walkthrough

Walkthrough

Removed conditional offline-forcing wrappers from model-loading paths (MLX, PyTorch, Qwen) and added an idempotent HF patch function that wraps Transformers' _patch_mistral_regex to suppress metadata-lookup failures (e.g., offline errors), installing that patch at backend import time.

Changes

Cohort / File(s) Summary
Backend Model Loading
backend/backends/mlx_backend.py, backend/backends/pytorch_backend.py, backend/backends/qwen_custom_voice_backend.py
Removed force_offline_if_cached(...) wrappers around TTS/STT model loading; loads now call library load/from_pretrained directly and rely on global HF offline behavior.
HF Offline Patch Infrastructure
backend/utils/hf_offline_patch.py, backend/backends/__init__.py
Added _mistral_regex_patched flag and patch_transformers_mistral_regex() to install an exception-suppressing wrapper around PreTrainedTokenizerBase._patch_mistral_regex; module import now triggers the patch (when enabled).
Tests
backend/tests/test_offline_patch.py
New unit tests for patch_transformers_mistral_regex(): verifies suppression of OfflineModeIsEnabled/ConnectionError, pass-through behavior, idempotency, and no-op when target method is absent; includes fixture to restore global state.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Backend as Backend (model load)
    participant HFPatch as HF Offline Patch
    participant Transformers as Transformers
    participant HFHub as HuggingFace Hub

    Client->>Backend: request model load
    Backend->>HFPatch: ensure patch installed (import/init)
    HFPatch->>Transformers: wrap _patch_mistral_regex (idempotent)
    Backend->>Transformers: call from_pretrained()/load()
    Transformers->>Transformers: invoke _patch_mistral_regex
    Transformers->>HFHub: model_info() lookup
    alt HF metadata unreachable (offline/conn error)
        HFHub-->>Transformers: raises OfflineModeIsEnabled/ConnectionError
        Transformers->>Transformers: wrapper suppresses exception -> return tokenizer unchanged
    else Metadata available
        HFHub-->>Transformers: model metadata
        Transformers->>Transformers: apply mistral regex patch if needed
    end
    Transformers-->>Backend: model/processor returned
    Backend-->>Client: model ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hopped through code with nimble feet,
I patched the regex, kept errors neat.
No forced-offline scares in sight,
Tokenizers now sleep well at night. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(offline): patch transformers mistral-regex check to survive HF failures' accurately describes the main change: adding a patch to handle HuggingFace failures in the mistral-regex check.
Linked Issues check ✅ Passed The PR successfully addresses issue #526 by preventing OfflineModeIsEnabled errors from aborting model loads through the new patch_transformers_mistral_regex wrapper and removing problematic force_offline_if_cached wrappers.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the HuggingFace offline mode crash: new patch logic, removal of force_offline_if_cached wrappers, tests for the patch, and early patch installation via backends/init.py.


@jamiepine jamiepine mentioned this pull request Apr 22, 2026
@jamiepine jamiepine marked this pull request as ready for review April 22, 2026 01:24
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/backends/pytorch_backend.py (1)

1-24: ⚠️ Potential issue | 🟠 Major

Same patch-wiring concern as qwen_custom_voice_backend.py.

See the major-issue comment on qwen_custom_voice_backend.py — this file similarly does not import backend.utils.hf_offline_patch, so patch_transformers_mistral_regex() is not guaranteed to have run before WhisperProcessor.from_pretrained(...) / WhisperForConditionalGeneration.from_pretrained(...) at lines 298-299 or Qwen3TTSModel.from_pretrained(...) at lines 109/116. If the verification there shows no app-level entry point imports the patch module, consider adding an import + patch_transformers_mistral_regex() call at the top of this module, mirroring mlx_backend.py (lines 13-18).

Not re-opening the verification script here — the one in qwen_custom_voice_backend.py covers the whole repo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/pytorch_backend.py` around lines 1 - 24, This file may call
WhisperProcessor.from_pretrained,
WhisperForConditionalGeneration.from_pretrained and
Qwen3TTSModel.from_pretrained before the transformers regex patch is applied;
import backend.utils.hf_offline_patch and call
patch_transformers_mistral_regex() near the top of this module (mirroring
mlx_backend.py) so the patch runs before any of those from_pretrained calls;
reference the function patch_transformers_mistral_regex() and the
classes/methods WhisperProcessor.from_pretrained,
WhisperForConditionalGeneration.from_pretrained, and
Qwen3TTSModel.from_pretrained to locate where the import+call must be added.
backend/backends/qwen_custom_voice_backend.py (1)

17-32: ⚠️ Potential issue | 🟠 Major

The mistral-regex patch is NOT guaranteed to run on non-MLX platforms if qwen_custom_voice or pytorch backends are selected.

patch_transformers_mistral_regex() is installed only when backend/utils/hf_offline_patch.py is imported (module-level invocation at line 269). Currently, only mlx_backend.py imports this module. On non-MLX platforms (Windows/Linux/CUDA), get_backend_type() returns "pytorch", and if a user selects the qwen_custom_voice engine, the QwenCustomVoiceBackend is imported directly without the patch being installed. The OfflineModeIsEnabled crash from issue #526 will still surface inside Qwen3TTSModel.from_pretrained(...).

Add an explicit import and call in qwen_custom_voice_backend.py and pytorch_backend.py:

from ..utils.hf_offline_patch import patch_huggingface_hub_offline, ensure_original_qwen_config_cached, patch_transformers_mistral_regex

# At module level:
patch_huggingface_hub_offline()
patch_transformers_mistral_regex()
ensure_original_qwen_config_cached()

Or ensure backend/utils/hf_offline_patch is imported unconditionally at application startup (e.g., backend/__init__.py or backend/app.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/qwen_custom_voice_backend.py` around lines 17 - 32, Import
and invoke the HF offline/patch helpers at module import time so the mistral
regex and offline behavior are always installed: add an unconditional import of
backend.utils.hf_offline_patch and call patch_huggingface_hub_offline(),
patch_transformers_mistral_regex(), and ensure_original_qwen_config_cached() at
module level in qwen_custom_voice_backend.py (and mirror the same in
pytorch_backend.py) so Qwen3TTSModel.from_pretrained(...) cannot trigger the
OfflineModeIsEnabled crash when qwen_custom_voice is selected on non-MLX
platforms; reference the helper symbols patch_huggingface_hub_offline,
patch_transformers_mistral_regex, and ensure_original_qwen_config_cached to
locate the needed functions.
🧹 Nitpick comments (2)
backend/tests/test_offline_patch.py (2)

105-109: test_missing_method_is_noop leaks state into subsequent tests.

When monkeypatch.delattr(..., raising=False) runs, the autouse fixture's saved = PreTrainedTokenizerBase.__dict__.get("_patch_mistral_regex") was already captured at fixture setup (so restore should work), but there is a subtle ordering issue: monkeypatch teardown runs before the restore_mistral_regex fixture teardown, so the final state is saved (correct). However, if this is the first test and the original method was never captured (hypothetically inherited), the fixture wouldn't restore it. Not a bug today, but add a sanity assertion — e.g., assert "_patch_mistral_regex" in PreTrainedTokenizerBase.__dict__ at fixture entry — to fail loudly if a future transformers release moves the method.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_offline_patch.py` around lines 105 - 109, Add a sanity
assertion at the start of the autouse fixture that saves/restores the tokenizer
method (e.g., restore_mistral_regex) to ensure the original method exists on
PreTrainedTokenizerBase: insert assert "_patch_mistral_regex" in
PreTrainedTokenizerBase.__dict__ at the top of the fixture (before capturing
saved = PreTrainedTokenizerBase.__dict__.get("_patch_mistral_regex")), so tests
like test_missing_method_is_noop will fail loudly if a future transformers
release removes or moves the method.

14-24: Prefer package-relative imports over sys.path mutation.

sys.path.insert(0, str(Path(__file__).parent.parent)) followed by import utils.hf_offline_patch as hf_offline_patch works but (a) depends on CWD/collection order under pytest, (b) shadows any top-level utils package, and (c) skips the backend/__init__.py package init. Preferred pattern if backend/ is a package: from backend.utils import hf_offline_patch as hf_offline_patch (with a conftest.py at the repo root setting the rootdir), matching how the runtime code imports it (from ..utils.hf_offline_patch import ...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_offline_patch.py` around lines 14 - 24, Replace the
sys.path mutation pattern and top-level import with a package-relative import:
remove the sys.path.insert(0, str(Path(__file__).parent.parent)) line and stop
using "import utils.hf_offline_patch as hf_offline_patch"; instead import the
module via the package namespace used in production (e.g., from backend.utils
import hf_offline_patch or from ..utils import hf_offline_patch depending on
test package layout) so the test uses package-relative imports and loads
backend/__init__.py correctly; ensure tests' conftest/rootdir is configured so
the package import resolves under pytest.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0136552b-b2ff-4bff-b003-ae1cee20a390

📥 Commits

Reviewing files that changed from the base of the PR and between 74e0044 and fb9fe30.

📒 Files selected for processing (5)
  • backend/backends/mlx_backend.py
  • backend/backends/pytorch_backend.py
  • backend/backends/qwen_custom_voice_backend.py
  • backend/tests/test_offline_patch.py
  • backend/utils/hf_offline_patch.py

Comment on lines +79 to +94
def test_passthrough_on_success(monkeypatch):
    """When model_info returns non-Mistral tags the original falls through and returns the tokenizer unchanged."""
    _apply_patch()

    import huggingface_hub

    class FakeInfo:
        tags = ["model-type:qwen", "language:en"]

    monkeypatch.setattr(huggingface_hub, "model_info", lambda *_a, **_kw: FakeInfo())

    sentinel = object()
    result = PreTrainedTokenizerBase._patch_mistral_regex(
        sentinel, "Qwen/Qwen3-TTS-12Hz-1.7B-Base"
    )
    assert result is sentinel
Contributor


⚠️ Potential issue | 🟡 Minor

test_passthrough_on_success likely exercises the exception path, not the success path.

sentinel = object() has none of the attributes the real _patch_mistral_regex touches (it reaches into the tokenizer's pre_tokenizer / backend_tokenizer, calls cached_file(...), does setattr(tokenizer, "fix_mistral_regex", ...), etc.), so original(...) will raise AttributeError/TypeError before the non-Mistral fall-through branch is reached. The wrapper's except Exception swallows that and returns sentinel, so the assertion passes — but for the wrong reason, which defeats the test's stated purpose and hides regressions in the success branch.

Either rename the test to reflect what's actually tested (e.g., test_suppresses_attribute_errors) or feed a real PreTrainedTokenizerFast (e.g., load a tiny cached tokenizer) / a mock with the attributes _patch_mistral_regex accesses so the original can fall through naturally when FakeInfo.tags lacks base_model:*mistralai. Also consider mocking transformers.utils.hub.cached_file to avoid a real filesystem/network hit.
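The vacuous-pass mechanism the comment describes is easy to demonstrate on a stand-in (all names below are hypothetical):

```python
# A healthy "original" that writes a tokenizer attribute, as the review says
# the real _patch_mistral_regex does.
class FakeTokenizerBase:
    @classmethod
    def _patch_mistral_regex(cls, tokenizer, repo_id):
        tokenizer.fix_mistral_regex = False  # attribute write on the tokenizer
        return tokenizer

original = FakeTokenizerBase._patch_mistral_regex

def safe(cls, tokenizer, repo_id):
    try:
        return original(tokenizer, repo_id)
    except Exception:
        return tokenizer

FakeTokenizerBase._patch_mistral_regex = classmethod(safe)

sentinel = object()  # bare object(): attribute writes raise AttributeError
# This passes, but only because the wrapper swallowed the AttributeError:
assert FakeTokenizerBase._patch_mistral_regex(sentinel, "Qwen/x") is sentinel

class MockTokenizer:  # a writable stand-in actually exercises the success path
    pass

tok = MockTokenizer()
assert FakeTokenizerBase._patch_mistral_regex(tok, "Qwen/x") is tok
assert tok.fix_mistral_regex is False
```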

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 86-86: Mutable default value for class attribute

(RUF012)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_offline_patch.py` around lines 79 - 94, The test
test_passthrough_on_success is currently passing due to an AttributeError
swallowed by the wrapper in PreTrainedTokenizerBase._patch_mistral_regex because
sentinel lacks attributes the wrapper touches; fix by making the test exercise
the actual non-Mistral fall-through: either rename the test to reflect it checks
suppression of attribute errors, or (preferred) construct/provide a real or
mocked tokenizer object that exposes the attributes the wrapper accesses (e.g.,
pre_tokenizer or backend_tokenizer, any methods _patch_mistral_regex reads, and
support for setting fix_mistral_regex) and ensure huggingface_hub.model_info
returns FakeInfo with non-mistral tags; also mock
transformers.utils.hub.cached_file to avoid network/filesystem access so the
original path executes and returns the original tokenizer instead of raising.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



)
from ..utils.cache import get_cache_key, get_cached_voice_prompt, cache_voice_prompt
from ..utils.audio import load_audio
from ..utils.hf_offline_patch import force_offline_if_cached


Mistral regex patch never applied for PyTorch backends

High Severity

Removing the force_offline_if_cached import from pytorch_backend.py and qwen_custom_voice_backend.py means neither file imports hf_offline_patch at all anymore. The module-level patch_transformers_mistral_regex() call in hf_offline_patch.py only runs when the module is first imported, which only happens via mlx_backend.py. On Windows/Linux (PyTorch), hf_offline_patch is never imported, so the core fix of this PR — the mistral regex wrapper — is never applied.

Additional Locations (2)

The previous commit left the patch wired only through ``mlx_backend.py``'s
existing import of ``hf_offline_patch``. On Windows/Linux/CUDA users who
never load the MLX backend (everyone who hit #526), the patch module was
never imported, so ``patch_transformers_mistral_regex`` never ran and the
crash persisted.

Hoist the import into ``backends/__init__.py``. Every backend imports from
this package, so the module-level patch install runs before any
``from_pretrained`` call regardless of which engine the user picks.

Caught by CodeRabbit and Cursor Bugbot on #530.
@jamiepine
Owner Author

Good catch from the bots — both flagged that patch_transformers_mistral_regex() was only being installed through mlx_backend.py's existing hf_offline_patch import, which meant Windows / Linux / CUDA users (i.e. everyone who actually hit #526) would never run the patch and the crash would persist.

Fixed in bc5faa2: moved the import to backends/__init__.py. Every backend module imports from this package, so the module-level patch install runs before any from_pretrained call regardless of which engine is loaded.

Verified with HF_HUB_OFFLINE=1 + a fresh interpreter that imports backend.backends — the wrapper is installed, and _patch_mistral_regex returns the tokenizer unchanged instead of raising OfflineModeIsEnabled.

@jamiepine jamiepine merged commit d61e884 into main Apr 22, 2026
2 checks passed