Skip to content

bug: HF_HUB_OFFLINE=1 bypasses local HF cache and triggers GCS download #615

@amasolov

Description

@amasolov

Describe the bug

When HF_HUB_OFFLINE=1 is set, download_model() does not properly use the local HuggingFace cache. Instead, the model_info() API call fails immediately with an EnvironmentError (offline mode), and fastembed falls back to downloading ~83MB from storage.googleapis.com. In air-gapped environments where Google Cloud Storage is also unreachable, fastembed cannot load models that are already present in the local cache.

Expected behavior

With HF_HUB_OFFLINE=1, fastembed should resolve models from the local HF cache (models--org--name/snapshots/...) without any network calls.

Actual behavior

  1. download_model() calls download_files_from_huggingface() with local_files_only=False
  2. Inside, model_info(hf_source_repo) is called — this is a network API call
  3. With HF_HUB_OFFLINE=1, huggingface_hub raises EnvironmentError: offline mode is enabled
  4. fastembed catches this and logs "Could not download model from HuggingFace... Falling back to other sources."
  5. Falls back to retrieve_model_gcs() → downloads ~83MB from storage.googleapis.com
  6. In air-gapped environments, GCS is also unreachable → complete failure

Note: on current main (v0.7.x), there is a local_files_only=True first pass before the retry loop. However, if that pass fails for any reason (e.g. missing metadata file), the retry loop still hits the network path described above.

Steps to reproduce

from fastembed import TextEmbedding
import os

# Step 1: Download the model (populates HF cache)
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")

# Step 2: Enable offline mode
os.environ["HF_HUB_OFFLINE"] = "1"

# Step 3: Try to load the same model — triggers GCS download instead of using local cache
TextEmbedding("sentence-transformers/all-MiniLM-L6-v2")

Environment

  • fastembed 0.6.0 (pinned in container image) and 0.7.4 (current main)
  • Deployed in NeMo Guardrails on Red Hat OpenShift AI
  • Corporate air-gapped environment (both HuggingFace and Google Cloud Storage unreachable)

Fix

PR #614 — sets local_files_only=True in download_model() when HF_HUB_OFFLINE=1 is detected, ensuring zero network calls.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions