
fix: defer fastembed import errors to use time instead of import time #280

Merged
bashandbone merged 2 commits into knitli:main from aiedwardyi:fix/defer-fastembed-import-error
Mar 27, 2026

Conversation

@aiedwardyi
Contributor

@aiedwardyi aiedwardyi commented Mar 26, 2026

Summary

  • Replace import-time ConfigurationError raises with conditional imports using has_package() sentinels, following the existing pattern in service_cards.py
  • Modules can now be safely imported when fastembed is unavailable — errors are deferred to the point where fastembed functionality is actually invoked
  • Applies the fix consistently across all 3 affected files: fastembed_extensions.py, embedding/providers/fastembed.py, and reranking/providers/fastembed.py

Approach

Each file follows the same pattern:

  1. Sentinel check: _FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
  2. Conditional import: real imports under TYPE_CHECKING or _FASTEMBED_AVAILABLE, Any placeholders in the else branch
  3. Use-time guard: _require_fastembed() raises ConfigurationError only when fastembed features are actually called

This mirrors the lazy import convention already used in service_cards.py and keeps type checking fully functional.

Test plan

  • Import chain no longer raises when fastembed is not installed
  • Python 3.14 test collection succeeds (fastembed-specific tests skipped via requires_fastembed marker)
  • Existing tests pass when fastembed IS installed — no behavior change
  • ConfigurationError is still raised with a clear message when fastembed functionality is invoked without the package
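
The PR does not show how the requires_fastembed marker is implemented. As a rough illustration of the idea only, a skip decorator with the same intent could be written with the standard library's unittest; requires_package and FastEmbedTests below are hypothetical names, and the project's real marker is presumably a pytest marker rather than this.

```python
import importlib.util
import unittest


def requires_package(name: str):
    """Skip the decorated test unless `name` can be imported (hypothetical helper)."""
    available = importlib.util.find_spec(name) is not None
    return unittest.skipUnless(available, f"{name} is not installed")


# Stand-in for the project's requires_fastembed marker (assumption).
requires_fastembed = requires_package("fastembed")


class FastEmbedTests(unittest.TestCase):
    @requires_fastembed
    def test_text_embedding_is_importable(self) -> None:
        # Runs only when fastembed is installed; otherwise reported as skipped.
        from fastembed import TextEmbedding

        self.assertTrue(callable(TextEmbedding))
```

Because the import of fastembed lives inside the test body, collecting this test class never touches the optional dependency.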

Closes #279

🤖 Generated with Claude Code

Summary by Sourcery

Defer FastEmbed dependency errors from import time to use time across embedding and reranking providers, allowing the package to be imported without FastEmbed installed while still failing clearly when FastEmbed-backed functionality is invoked.

Bug Fixes:

  • Prevent import-time failures when FastEmbed is not installed by guarding FastEmbed-dependent code with runtime availability checks in embedding and reranking providers.

Enhancements:

  • Introduce shared FastEmbed availability sentinels and lazy-loading patterns in embedding extensions and providers to align with existing lazy import conventions.

Replace import-time ConfigurationError raises with conditional imports
using has_package() sentinels, following the existing pattern in
service_cards.py. Modules can now be safely imported even when fastembed
is unavailable — errors are raised only when fastembed functionality is
actually invoked.

Files changed:
- fastembed_extensions.py: guard imports + model constants, add
  _require_fastembed() check in provider functions
- embedding/providers/fastembed.py: conditional import with Any fallbacks
- reranking/providers/fastembed.py: conditional import with Any fallbacks

Closes knitli#279

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 04:46
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Mar 26, 2026

Reviewer's Guide

Defers FastEmbed-related ConfigurationError exceptions from import time to use time by introducing package-availability sentinels, conditional imports, and runtime guards across the fastembed extensions, embedding provider, and reranking provider modules.

Sequence diagram for deferred FastEmbed ConfigurationError at use time

sequenceDiagram
    actor User
    participant Application
    participant FastEmbedEmbeddingProvider
    participant fastembed_extensions
    participant FastEmbedLibrary

    User->>Application: request_embeddings(texts)
    Application->>FastEmbedEmbeddingProvider: embed(texts)

    alt First use of FastEmbedEmbeddingProvider
        FastEmbedEmbeddingProvider->>fastembed_extensions: get_text_embedder()
        fastembed_extensions->>fastembed_extensions: _require_fastembed()
        alt FastEmbed is installed
            fastembed_extensions->>FastEmbedLibrary: import TextEmbedding, DenseModelDescription, BaseModelDescription
            fastembed_extensions-->>FastEmbedEmbeddingProvider: TextEmbedding subclass with custom models
            FastEmbedEmbeddingProvider->>FastEmbedLibrary: TextEmbedding(texts)
            FastEmbedLibrary-->>FastEmbedEmbeddingProvider: embeddings
            FastEmbedEmbeddingProvider-->>Application: embeddings
            Application-->>User: embeddings
        else FastEmbed is not installed
            fastembed_extensions-->>FastEmbedEmbeddingProvider: raise ConfigurationError
            FastEmbedEmbeddingProvider-->>Application: propagate ConfigurationError
            Application-->>User: error response
        end
    else Subsequent uses
        FastEmbedEmbeddingProvider->>FastEmbedLibrary: reuse _TextEmbedding(texts)
        FastEmbedLibrary-->>FastEmbedEmbeddingProvider: embeddings
        FastEmbedEmbeddingProvider-->>Application: embeddings
        Application-->>User: embeddings
    end
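
The "first use" versus "subsequent uses" branches in the diagram amount to resolving the class once and reusing it afterwards. A minimal sketch of that behavior, assuming a cached loader (load_class is a hypothetical helper, and ConfigurationError is a local stand-in for the project's exception):

```python
from __future__ import annotations

import importlib
from functools import cache
from typing import Any


class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError (assumption)."""


@cache
def load_class(module_name: str, class_name: str) -> Any:
    """Import `class_name` from `module_name` on first call, then reuse it."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Failures are not cached, so a later call retries the import.
        raise ConfigurationError(f"{module_name} is not installed.") from exc
    return getattr(module, class_name)


def get_text_embedder() -> Any:
    """Hypothetical accessor: resolves fastembed's TextEmbedding lazily."""
    return load_class("fastembed", "TextEmbedding")
```

The PR itself stores the resolved classes in module-level names such as _TextEmbedding; the cache here is just one way to get the same reuse.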

Flow diagram for FastEmbed provider import and sentinel-based initialization

flowchart TD
    A[Module import: embedding/providers/fastembed.py] --> B[Call has_package fastembed]
    B --> C[Call has_package fastembed-gpu]
    C --> D{Any FastEmbed package available?}

    D -- Yes --> E[Set _FASTEMBED_AVAILABLE to True]
    D -- No --> F[Set _FASTEMBED_AVAILABLE to False]

    E --> G[Import TextEmbedding, SparseTextEmbedding under TYPE_CHECKING or runtime]
    F --> H[Assign TextEmbedding = Any, SparseTextEmbedding = Any]

    G --> I[Import get_text_embedder and get_sparse_embedder]
    I --> J[Initialize _TextEmbedding and _SparseTextEmbedding via helpers]

    F --> K[Set _TextEmbedding = None, _SparseTextEmbedding = None]

    J --> L[Provider methods use _TextEmbedding and _SparseTextEmbedding]
    K --> L

    L --> M{FastEmbed requested at runtime?}

    M -- Yes, FastEmbed available --> N[Use underlying FastEmbed classes normally]
    M -- Yes, FastEmbed missing --> O[_require_fastembed raises ConfigurationError]
    M -- No --> P[No FastEmbed usage, module import remains successful]

    %% fastembed_extensions guard
    Q[Module import: fastembed_extensions.py] --> R[Compute _FASTEMBED_AVAILABLE with has_package]
    R --> S{_FASTEMBED_AVAILABLE?}

    S -- Yes --> T[Import real FastEmbed model description types and TextCrossEncoder]
    S -- No --> U[Assign BaseModelDescription, DenseModelDescription, ModelSource, PoolingType, TextCrossEncoder, SparseTextEmbedding, TextEmbedding to Any]

    T --> V[Populate DENSE_MODELS and RERANKING_MODELS tuples]
    U --> W[Set DENSE_MODELS = empty tuple, RERANKING_MODELS = empty tuple]

    V --> X[Runtime helpers call _require_fastembed before using FastEmbed]
    W --> X

    %% reranking provider
    AA[Module import: reranking/providers/fastembed.py] --> AB[Compute _FASTEMBED_AVAILABLE with has_package]
    AB --> AC{_FASTEMBED_AVAILABLE?}

    AC -- Yes --> AD[Import TextCrossEncoder under TYPE_CHECKING or runtime]
    AC -- No --> AE[Assign TextCrossEncoder = Any]

    AD --> AF[FastEmbedRerankingProvider uses real TextCrossEncoder]
    AE --> AF[FastEmbedRerankingProvider type checks but will error only on actual FastEmbed use]

    AF --> AG[Import completes without raising even if FastEmbed is missing]

File-Level Changes

Change: Introduce fastembed availability sentinel and conditional imports in fastembed extensions module to avoid import-time failures while preserving model registries when available.
Details:
  • Add TYPE_CHECKING/Any-based conditional imports guarded by a _FASTEMBED_AVAILABLE flag using has_package for fastembed and fastembed-gpu.
  • Replace import-time ConfigurationError raising with a _require_fastembed() helper that is called at the beginning of public accessors.
  • Wrap DENSE_MODELS and RERANKING_MODELS definitions in an _FASTEMBED_AVAILABLE conditional and provide empty fallbacks when fastembed is missing.
Files: src/codeweaver/providers/embedding/fastembed_extensions.py

Change: Lazy-load fastembed embedding classes and model getters in the embedding provider to allow module import without fastembed installed.
Details:
  • Introduce a _FASTEMBED_AVAILABLE flag using has_package and gate fastembed imports behind TYPE_CHECKING or that flag with Any fallbacks.
  • Remove the import-time ConfigurationError and instead only initialize _TextEmbedding and _SparseTextEmbedding when fastembed is available, using None placeholders otherwise.
Files: src/codeweaver/providers/embedding/providers/fastembed.py

Change: Make the FastEmbed reranking provider resilient to missing fastembed by using a sentinel and conditional import instead of raising at import time.
Details:
  • Add has_package-based _FASTEMBED_AVAILABLE flag and guard TextCrossEncoder import behind TYPE_CHECKING or that flag with Any as a fallback.
  • Remove the try/except ImportError block that logged a warning and raised ConfigurationError during module import.
Files: src/codeweaver/providers/reranking/providers/fastembed.py

Assessment against linked issues

Issue Objective Addressed Explanation
#279 Modify src/codeweaver/providers/embedding/fastembed_extensions.py so that it no longer raises ConfigurationError at import time when fastembed is missing, allowing the module (and any import chain touching it) to be imported even if fastembed is not installed.
#279 Defer fastembed-related errors in fastembed_extensions.py to the point where fastembed functionality is actually invoked, using the project’s lazy import pattern (e.g., sentinel-based availability checks and runtime guards).


@github-actions
Contributor

github-actions bot commented Mar 26, 2026

👋 Hey @aiedwardyi,

Thanks for your contribution to codeweaver! 🧵

You need to agree to the CLA first... 🖊️

Before we can accept your contribution, you need to agree to our Contributor License Agreement (CLA).

To agree to the CLA, please comment:

I read the contributors license agreement and I agree to it.

Those exact words are important [1], so please don't change them. 😉

You can read the full CLA here: Contributor License Agreement


@aiedwardyi has signed the CLA.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Footnotes

  1. Our bot needs those exact words to recognize that you agree to the CLA.

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • In embedding/providers/fastembed.py, _TextEmbedding and _SparseTextEmbedding are set to None when FastEmbed is unavailable; consider mirroring the _require_fastembed() pattern used in fastembed_extensions.py so that any code path that instantiates or uses these types raises a clear ConfigurationError instead of hitting None at runtime.
  • In reranking/providers/fastembed.py, FastEmbedRerankingProvider now type-checks without FastEmbed installed, but there is no corresponding runtime guard; it would be safer to add a use-time availability check (similar to _require_fastembed()) in the provider’s constructor or first-use methods to fail with a clear configuration error if FastEmbed is missing.
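
The difference between the two failure modes described above can be shown with a small sketch. Everything here is illustrative: the provider classes are hypothetical, ConfigurationError is a local stand-in, and _FASTEMBED_AVAILABLE is hard-coded to False to simulate a missing install.

```python
from typing import Any, List


class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError (assumption)."""


# Hard-coded to simulate a missing install; the PR derives this from has_package().
_FASTEMBED_AVAILABLE = False
_TextEmbedding: Any = None  # None placeholder used when fastembed is missing


class UnguardedProvider:
    """Hypothetical provider that relies on the None placeholder."""

    def embed(self, texts: List[str]) -> Any:
        # Fails with TypeError: 'NoneType' object is not callable.
        return _TextEmbedding()


class GuardedProvider:
    """Hypothetical provider that checks availability in its constructor."""

    def __init__(self) -> None:
        if not _FASTEMBED_AVAILABLE:
            raise ConfigurationError(
                "fastembed is not installed. Please install it with "
                "`pip install code-weaver[fastembed]`."
            )

    def embed(self, texts: List[str]) -> Any:
        return _TextEmbedding()
```

With the guard in place, selecting the provider fails immediately with an install hint instead of failing later inside embed().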
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `embedding/providers/fastembed.py`, `_TextEmbedding` and `_SparseTextEmbedding` are set to `None` when FastEmbed is unavailable; consider mirroring the `_require_fastembed()` pattern used in `fastembed_extensions.py` so that any code path that instantiates or uses these types raises a clear `ConfigurationError` instead of hitting `None` at runtime.
- In `reranking/providers/fastembed.py`, `FastEmbedRerankingProvider` now type-checks without FastEmbed installed, but there is no corresponding runtime guard; it would be safer to add a use-time availability check (similar to `_require_fastembed()`) in the provider’s constructor or first-use methods to fail with a clear configuration error if FastEmbed is missing.

## Individual Comments

### Comment 1
<location path="src/codeweaver/providers/embedding/providers/fastembed.py" line_range="55-56" />
<code_context>
+        )
+
+
+if _FASTEMBED_AVAILABLE:
+    """
+    SPARSE_MODELS = (
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `None` placeholders for embedding classes can lead to unclear runtime failures when fastembed is missing.

Previously this module raised a `ConfigurationError` at import time when FastEmbed was missing. With `_TextEmbedding` and `_SparseTextEmbedding` now set to `None` when `_FASTEMBED_AVAILABLE` is false, callers may hit `AttributeError`/`TypeError` instead of a clear configuration error if they construct or use this provider without checking the flag. Consider either raising `ConfigurationError` in the provider’s constructor/factory when `_FASTEMBED_AVAILABLE` is false, or avoiding `None` placeholders so selection of this provider fails fast with a clear message.
</issue_to_address>

### Comment 2
<location path="src/codeweaver/providers/reranking/providers/fastembed.py" line_range="26-31" />
<code_context>
 from codeweaver.core.di import dependency_provider
+from codeweaver.core.utils import has_package

+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")

-try:
</code_context>
<issue_to_address>
**issue (bug_risk):** Removing the configuration error on missing fastembed may cause harder-to-debug runtime issues for reranking.

Previously, a failed `TextCrossEncoder` import raised a `ConfigurationError` with a clear install hint. Now, with `TextCrossEncoder` typed as `Any` when fastembed is missing, `FastEmbedRerankingProvider` can still be constructed and will only fail on first use with a likely opaque error. Please add an explicit check (e.g., in `FastEmbedRerankingProvider.__init__` or its factory) that raises a clear `ConfigurationError` when `_FASTEMBED_AVAILABLE` is false.
</issue_to_address>


Contributor

Copilot AI left a comment


Pull request overview

This PR aims to make FastEmbed-related modules safe to import when fastembed/fastembed-gpu isn’t installed, deferring dependency errors until FastEmbed functionality is actually used (addressing #279 and improving Python 3.14+ test collection behavior).

Changes:

  • Added _FASTEMBED_AVAILABLE sentinels based on has_package() and conditional TYPE_CHECKING imports across FastEmbed embedding/reranking modules.
  • Introduced a use-time _require_fastembed() guard in fastembed_extensions.py and gated model registries behind availability checks.
  • Replaced prior import-time ConfigurationError raises with lazy patterns intended to defer failures.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File: src/codeweaver/providers/reranking/providers/fastembed.py
Description: Switches FastEmbed import logic to a sentinel + conditional imports to avoid import-time failure when dependency is absent.

File: src/codeweaver/providers/embedding/providers/fastembed.py
Description: Applies the same sentinel + conditional import approach for embedding/sparse embedding providers.

File: src/codeweaver/providers/embedding/fastembed_extensions.py
Description: Adds _require_fastembed() and gates model registries; intended to centralize use-time dependency errors.



raise ConfigurationError(
"fastembed is not installed. Please install it with "
"`pip install code-weaver[fastembed]` or `codeweaver[fastembed-gpu]`."

Copilot AI Mar 26, 2026


The ConfigurationError install hint uses pip install codeweaver[fastembed-gpu], but the project’s distribution name is code-weaver (see pyproject.toml). Using the wrong name will send users to a failing install command. Update the message so both extras use the correct distribution name consistently.

Suggested change
- "`pip install code-weaver[fastembed]` or `codeweaver[fastembed-gpu]`."
+ "`pip install code-weaver[fastembed]` or `pip install code-weaver[fastembed-gpu]`."

Comment on lines +16 to +18
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
+
-try:
+if TYPE_CHECKING or _FASTEMBED_AVAILABLE:

Copilot AI Mar 26, 2026


This PR changes import-time behavior (modules should be importable when fastembed isn’t available). There doesn’t appear to be a regression test asserting that these modules can be imported when has_package("fastembed")/has_package("fastembed-gpu") is false. Adding a unit test that temporarily forces has_package to return false (and reloads these modules) would help prevent reintroducing import-time failures.
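
One way such a regression test could look is sketched below. It is an assumption-laden illustration, not a test from the repository: the module under test is a throwaway demo_provider written to a temporary directory, and availability is detected with importlib.util.find_spec rather than the project's has_package, so that the check can be patched with the standard library's unittest.mock. The module is imported fresh (after removing any cached copy from sys.modules) instead of being reloaded.

```python
from __future__ import annotations

import importlib
import importlib.util
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest import mock

# Source of a throwaway module that follows the sentinel pattern (hypothetical).
MODULE_SOURCE = textwrap.dedent(
    '''
    import importlib.util
    from typing import Any

    _AVAILABLE = importlib.util.find_spec("fastembed") is not None
    if _AVAILABLE:
        from fastembed import TextEmbedding
    else:
        TextEmbedding = Any
    '''
)


def import_demo_with_fastembed_hidden() -> tuple[bool, object]:
    """Import the demo module while the availability check reports 'missing'."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_provider.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Force the sentinel to see fastembed as unavailable.
            with mock.patch("importlib.util.find_spec", return_value=None):
                sys.modules.pop("demo_provider", None)
                module = importlib.import_module("demo_provider")
            return module._AVAILABLE, module.TextEmbedding
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_provider", None)
```

A test would then assert that the import completed, that the sentinel is False, and that the placeholder is typing.Any.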

Comment on lines +26 to +31
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
+
-    raise ConfigurationError(
-        r"FastEmbed is not installed. Please install it with `pip install code-weaver\[fastembed]` or `codeweaver\[fastembed-gpu]`."
-    ) from e
+if TYPE_CHECKING or _FASTEMBED_AVAILABLE:
+    from fastembed.rerank.cross_encoder import TextCrossEncoder
+else:
+    TextCrossEncoder = Any

Copilot AI Mar 26, 2026


_FASTEMBED_AVAILABLE is computed from has_package(...), but the subsequent from fastembed... import is unconditional when that flag is true. If the distribution is present but import fastembed fails (e.g., missing platform wheels/compiled deps), this will still crash at module import time, which defeats the goal of deferring fastembed import errors to use-time. Consider wrapping the fastembed import in try/except ImportError and, on failure, treating fastembed as unavailable (optionally stash the ImportError) so you can raise a ConfigurationError from a small _require_fastembed() guard when the provider is actually used.

Comment on lines +41 to 46
+_FASTEMBED_AVAILABLE = has_package("fastembed") or has_package("fastembed-gpu")
+
-try:
+if TYPE_CHECKING or _FASTEMBED_AVAILABLE:
     from fastembed.sparse import SparseTextEmbedding
     from fastembed.text import TextEmbedding


Copilot AI Mar 26, 2026


This file now defers the absence of fastembed, but it no longer defers ImportError during fastembed import (the from fastembed... imports run whenever has_package(...) returns true). To fully meet the PR’s goal, the imports should be protected so that a broken/partial fastembed install doesn’t raise at module import time; instead, record the failure and raise ConfigurationError when FastEmbedEmbeddingProvider/FastEmbedSparseProvider functionality is invoked.

Comment on lines 16 to 28
@@ -24,87 +26,104 @@
from fastembed.rerank.cross_encoder import TextCrossEncoder
from fastembed.sparse import SparseTextEmbedding
from fastembed.text import TextEmbedding

Copilot AI Mar 26, 2026


_FASTEMBED_AVAILABLE is based on package presence, but the fastembed imports in the next block are not guarded. If fastembed (or fastembed-gpu) is installed but fails to import (missing binary deps, incompatible Python, etc.), this module will still raise at import time. To actually defer import errors to use-time, wrap these imports in try/except ImportError and fall back to the Any placeholders + _require_fastembed() raising ConfigurationError when the exported helpers are called.

@aiedwardyi
Contributor Author

I read the contributors license agreement and I agree to it.

@aiedwardyi
Contributor Author

recheck

@aiedwardyi
Contributor Author

I read the contributors license agreement and I agree to it.

@aiedwardyi
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@aiedwardyi
Contributor Author

recheck

@aiedwardyi
Contributor Author

The CLA bot doesn't seem to be recognizing my signature. I've commented with both the custom phrase and the default phrase multiple times, but the check keeps failing. Could you take a look at the CLA bot configuration? It may be a permissions issue writing to \ in the \ repo.

Address review feedback from Sourcery and Copilot:

- Wrap fastembed imports in try/except ImportError so broken installs
  (missing binary deps, incompatible Python) don't crash at import time
- Add _require_fastembed() guards in embedding and reranking providers
  so missing fastembed raises a clear ConfigurationError instead of
  opaque AttributeError/TypeError from None placeholders
- Fix install hint typo: codeweaver[fastembed-gpu] -> code-weaver[fastembed-gpu]
@codecov

codecov bot commented Mar 27, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@bashandbone bashandbone merged commit ca4ecac into knitli:main Mar 27, 2026
13 of 16 checks passed
@bashandbone
Contributor

@aiedwardyi thanks for the contribution!

Tracking the CLA bot issue, it wasn't high on my list while it was just me and the robot masses here. You get the distinction of being the first (other) human contributor to CodeWeaver. 🎉

And I appreciate it -- you gave me a reason to stop endlessly tweaking and ship Alpha 6. Should go out tomorrow.

This looks good to me.

Closes #279

@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2026
@aiedwardyi aiedwardyi deleted the fix/defer-fastembed-import-error branch March 27, 2026 02:07


Development

Successfully merging this pull request may close these issues.

fastembed_extensions.py raises at import time instead of deferring gracefully

3 participants