Skip to content

refactor(12A): move logger + monitoring to core/, repoint imports#434

Merged
hedhoud merged 15 commits into
refactor/hexagonalfrom
refactor/phase-12a-logger
May 29, 2026
Merged

refactor(12A): move logger + monitoring to core/, repoint imports#434
hedhoud merged 15 commits into
refactor/hexagonalfrom
refactor/phase-12a-logger

Conversation

@EnjoyBacon7
Copy link
Copy Markdown
Collaborator

@EnjoyBacon7 EnjoyBacon7 commented May 29, 2026

Phase 12A — Migrate logger + monitoring to core/

First step of Phase 12 (internal cleanup). Relocates the cross-cutting observability utilities out of the legacy utils/ tree into the hexagonal core/ layer and repoints every consumer. This is the critical-path task that unblocks 12B–12G (it touches files those tasks also edit, so it lands first to avoid merge conflicts).

Module moves (git-tracked as renames)

  • utils/logger.pycore/utils/logging.py (renamed to the structlog-wrapper convention)
  • utils/monitoring.pycore/observability/monitoring.py
  • utils/test_logger.pycore/utils/test_logging.py

logging.py now reads config via from core.config import load_config. To preserve the cached-singleton behavior the old shim provided, core/config/__init__.py (previously empty) now exposes the canonical load_config / get_settings / Settings public API.

Import re-points (61 files, every layer)

  • from utils.loggerfrom core.utils.logging (all forms: get_logger, bare logger, mask_email)
  • from utils.monitoringfrom core.observability.monitoring
  • from utils.exceptions.base / from utils.exceptions.embeddingsfrom core.utils.exceptions
  • from utils.external_resource_errorsfrom core.utils.external_errors

The exceptions and external_errors modules already lived canonically in core/utils/ (with shims in utils/); this only switches the remaining consumers off the shims. The utils/ directory now holds re-export shims only — 12H deletes it.

Notes

  • Deleted the duplicate utils/test_external_resource_errors.py (kept as core/utils/test_external_errors.py).
  • Updated the test_auth_router.py sys.modules stub to patch core.utils.logging.
  • Replaced a wildcard from core.utils.exceptions import * (widened from 3 to ~38 names by the sweep, since core/utils/exceptions.py has no __all__) with an explicit 3-name import in the embeddings shim.
  • Re-sorted imports in the swept files (utils.* sorted late, core.* sorts early) so ruff check stays clean.
  • Updated stale from utils.logger examples in CLAUDE.md and the Milvus-migration doc.

Out of scope (deferred per the phase plan): the bare-vs-openrag. import-style unification (Phase 14) and deletion of the remaining utils/, config/, components/ shims (12H).

Verification

  • grep "from utils." outside the shim dir → zero results
  • ruff check / ruff format --check on all changed files → clean
  • Full test collection: 1212 tests, no import errors
  • Touched tests: 278 passed

Summary by CodeRabbit

  • New Features

    • Centralized FastAPI exception handling with structured JSON responses
    • Configuration singleton caching (get_settings)
    • Streaming source filtering for LLM outputs
    • Token-counting helper and language detection + runtime concurrency controls
    • Temporal metadata validation and async file upload handling
  • Bug Fixes

    • TLS certificate verification enabled by default for web search
    • Masked usernames in authentication logs
  • Refactoring

    • Reorganized modules and consolidated logging into core utilities

Review Change Stack

Phase 12A. Relocate the cross-cutting observability utilities out of the
old utils/ tree into the hexagonal core layer:

- utils/logger.py            -> core/utils/logging.py (structlog-convention name)
- utils/monitoring.py        -> core/observability/monitoring.py
- utils/test_logger.py       -> core/utils/test_logging.py

logging.py now reads config via `from core.config import load_config`; the
core.config package exposes the cached load_config/get_settings/Settings
public API (previously only the config/ shim did).

Repoint every import site to the new paths:
- from utils.logger            -> from core.utils.logging
- from utils.monitoring        -> from core.observability.monitoring
- from utils.exceptions.base    -> from core.utils.exceptions
- from utils.exceptions.embeddings -> from core.utils.exceptions
- from utils.external_resource_errors -> from core.utils.external_errors

The exceptions and external_errors modules already live canonically in
core.utils; this only switches the remaining consumers off the shims.
Delete the duplicate utils/test_external_resource_errors.py (kept in
core/utils/test_external_errors.py). Update the auth-router test stub to
patch core.utils.logging instead of the moved utils.logger.

No remaining `from utils.` imports outside the (now shim-only) utils/ dir.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR centralizes logging/config/prompts into core.*, adds FastAPI error handlers that map OpenRAG errors to HTTP statuses and include optional request_id, introduces core utilities (source-citation filtering, token counting, temporal-field parsing), runtime inference semaphores, async file-upload saving, and migrates many legacy component re-exports and tests to the services/core packages.

Changes

Core module reorganization and centralized infrastructure

Layer / File(s) Summary
Config API and core logging
openrag/core/config/__init__.py, openrag/core/utils/logging.py, openrag/core/utils/test_logging.py
Adds cached get_settings() and load_config(config_path, overrides) wrapper; updates internal logging imports to core.config.
Centralized FastAPI error handling
openrag/api/error_handlers.py, openrag/api/main.py
Adds openrag_exception_handler, unhandled_exception_handler, _status_for, _get_request_id, and register_error_handlers(app) to standardize JSON error responses and optional request_id.
Source filtering and token utilities
openrag/core/utils/source_filtering.py, openrag/core/utils/test_source_filtering.py, openrag/core/utils/text.py, openrag/core/utils/filename.py
Implements citation-block extraction/stripping, streaming SSE source-filter transformer, get_num_tokens() lazy token counter, and extract_temporal_fields() ISO-8601 parsing/validation.
Inference runtime helpers
openrag/services/inference/runtime.py, openrag/services/inference/test_runtime.py
Adds detect_language() and semaphore factories (get_llm_semaphore, get_vlm_semaphore, get_audio_semaphore) and tests covering detector edge cases.
Async file uploads and file helpers
openrag/api/dependencies/files.py, openrag/api/dependencies/test_files.py
Adds save_file_to_disk() (async chunked writes, sanitization, path-traversal protection) and tests for filename stripping and empty-name validation.
Chunking factory and indexer integration
openrag/core/chunking/factory.py, openrag/services/workers/indexer_pool.py, openrag/services/workers/test_indexer_pool.py
Adds create_chunker(config) that uses chunking_registry and integrates it into _build_chunker(), updating tests to validate factory-based chunkers and error cases.
Middleware, routers, and DI migration
openrag/api/middleware/*, openrag/api/routers/*, openrag/di/container.py
Migrates monitoring/logging imports to core modules, wires save_file_to_disk into admin file endpoints, updates DI container imports (auth/websearch) to services locations.
Auth refresh delegation and tests
openrag/services/orchestrators/auth_service.py, openrag/api/middleware/auth.py, openrag/components/auth/test_middleware.py
Adds AuthService.refresh_session_if_needed wrapper delegating to services.auth.refresh; middleware calls service method; tests updated to mock new seam.
Query & retrieval prompt/template loading
openrag/services/orchestrators/query_service.py, openrag/services/orchestrators/retrieval_service.py
Loads prompt templates at init via load_template_by_key, formats context with length_function=get_num_tokens(), and uses doc.page_content strings for formatting; retriever templates loaded similarly.
Websearch and legacy loader relocations
openrag/services/websearch/*, openrag/services/workers/parsers/legacy_loaders/*, openrag/services/workers/bootstrap.py
Moves websearch and many legacy loader exports to services.*, updates discovery roots and bootstrap wiring.
Logger import migration across codebase
multiple openrag/* modules
Replaces legacy utils.logger/utils.* imports with core.utils.logging and other core.* locations across API, services, workers, scripts, and components.
Removals and compatibility shims
multiple openrag/components/* and openrag/config/* files
Removes many legacy re-export shims and deprecated compatibility modules/tests as the codebase centralizes canonical locations under core and services.

Sequence Diagram(s)

(omitted — change is refactor-heavy, no new multi-component runtime flow requiring a sequence diagram)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • Ahmath-Gadji

"🐰 I hopped through code with nimble feet,
Moved logs to core and made errors neat,
Files now save in chunks, tokens counted too,
Prompts load on demand — a tidy view! 🥕"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/phase-12a-logger

@coderabbitai coderabbitai Bot added breaking-change Change of behavior after upgrade refactor labels May 29, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
openrag/core/config/__init__.py (2)

30-30: 💤 Low value

Consider explicit None checks instead of truthiness for clearer semantics.

The condition if overrides or config_path: treats empty string "" as falsy and falls through to the cached singleton. If an empty string is a valid (though unusual) config path, this could be unexpected. An explicit is not None check makes the intent clearer.

🔍 Alternative: explicit None checks
-    if overrides or config_path:
+    if overrides is not None or config_path is not None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/__init__.py` at line 30, Change the truthy check "if
overrides or config_path:" to explicit None checks so empty strings or empty
dicts don't incorrectly skip the cached singleton; specifically replace that
condition with "if overrides is not None or config_path is not None:"
(referencing the existing overrides and config_path variables/condition in
__init__.py) to make the intent explicit.

22-22: ⚡ Quick win

Add type hints to load_config parameters for consistency.

The get_settings() function has complete type hints, but load_config() parameters lack type annotations. Adding hints improves IDE support and maintains consistency with the rest of the module.

📝 Proposed type hint additions
-def load_config(config_path=None, overrides=None) -> Settings:
+def load_config(config_path: str | None = None, overrides: dict | None = None) -> Settings:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/core/config/__init__.py` at line 22, Update the load_config signature
to include explicit type hints: annotate config_path as Optional[Union[str,
Path]] = None and overrides as Optional[Mapping[str, Any]] = None, returning
Settings (matching get_settings style). Add necessary imports (Optional, Union,
Mapping, Any from typing and Path from pathlib) if not already present, and
ensure the function definition for load_config(config_path, overrides) is
updated accordingly to maintain consistency with get_settings and improve
IDE/type checking.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openrag/app_front.py`:
- Line 12: Import mask_email from core.utils.logging alongside get_logger and
use it to pseudonymize the username variable before any authentication-failure
logging; locate the auth failure block that currently logs username (the
statement around the auth handler that logs on failed login) and replace the raw
username in the processLogger/get_logger call with mask_email(username) so email
addresses are not written to logs.

In `@openrag/components/indexer/loaders/pdf_loaders/dotsocr.py`:
- Line 1: The import currently pulls a pre-configured logger instance; replace
it with the project pattern that calls the factory: import get_logger from
core.utils.logging and then create a logger by assigning logger = get_logger()
in this module (update the import and the logger variable in
openrag/components/indexer/loaders/pdf_loaders/dotsocr.py so the module uses
get_logger() instead of importing logger directly).

In `@openrag/components/indexer/loaders/pdf_loaders/openai.py`:
- Line 10: Replace the direct import of the logger instance with the get_logger
factory: change the import line that currently reads "from core.utils.logging
import logger" to import get_logger, then call get_logger() to assign to a
module-level logger variable (e.g., logger = get_logger()) so this module
follows the same structured Loguru pattern used by other loaders; update any
references to the existing logger variable to use this newly created logger if
needed.

---

Nitpick comments:
In `@openrag/core/config/__init__.py`:
- Line 30: Change the truthy check "if overrides or config_path:" to explicit
None checks so empty strings or empty dicts don't incorrectly skip the cached
singleton; specifically replace that condition with "if overrides is not None or
config_path is not None:" (referencing the existing overrides and config_path
variables/condition in __init__.py) to make the intent explicit.
- Line 22: Update the load_config signature to include explicit type hints:
annotate config_path as Optional[Union[str, Path]] = None and overrides as
Optional[Mapping[str, Any]] = None, returning Settings (matching get_settings
style). Add necessary imports (Optional, Union, Mapping, Any from typing and
Path from pathlib) if not already present, and ensure the function definition
for load_config(config_path, overrides) is updated accordingly to maintain
consistency with get_settings and improve IDE/type checking.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56ce3b25-b2ad-442d-a7fa-1ca9d4d720d9

📥 Commits

Reviewing files that changed from the base of the PR and between 507b3fa and a80b363.

📒 Files selected for processing (80)
  • CLAUDE.md
  • docs/content/docs/documentation/milvus_migration.mdx
  • openrag/api/dependencies/auth.py
  • openrag/api/dependencies/llm.py
  • openrag/api/error_handlers.py
  • openrag/api/main.py
  • openrag/api/mcp/server.py
  • openrag/api/middleware/auth.py
  • openrag/api/middleware/instrumentation.py
  • openrag/api/routers/admin/cluster.py
  • openrag/api/routers/admin/indexing.py
  • openrag/api/routers/admin/monitoring.py
  • openrag/api/routers/admin/partitions.py
  • openrag/api/routers/admin/tools.py
  • openrag/api/routers/admin/users.py
  • openrag/api/routers/admin/workspaces.py
  • openrag/api/routers/auth/oidc.py
  • openrag/api/routers/user/chat.py
  • openrag/api/routers/user/extract.py
  • openrag/api/routers/user/search.py
  • openrag/app_front.py
  • openrag/components/indexer/chunker/chunker.py
  • openrag/components/indexer/embeddings/openai.py
  • openrag/components/indexer/loaders/__init__.py
  • openrag/components/indexer/loaders/audio/local_whisper.py
  • openrag/components/indexer/loaders/audio/openai.py
  • openrag/components/indexer/loaders/base.py
  • openrag/components/indexer/loaders/doc.py
  • openrag/components/indexer/loaders/docx.py
  • openrag/components/indexer/loaders/image.py
  • openrag/components/indexer/loaders/pdf_loaders/docling.py
  • openrag/components/indexer/loaders/pdf_loaders/docling2.py
  • openrag/components/indexer/loaders/pdf_loaders/dotsocr.py
  • openrag/components/indexer/loaders/pdf_loaders/marker.py
  • openrag/components/indexer/loaders/pdf_loaders/openai.py
  • openrag/components/indexer/loaders/pdf_loaders/pymupdf.py
  • openrag/components/indexer/loaders/pptx_loader.py
  • openrag/components/indexer/loaders/txt_loader.py
  • openrag/components/llm.py
  • openrag/components/reranker/infinity.py
  • openrag/components/reranker/openai.py
  • openrag/components/utils.py
  • openrag/components/websearch/content_fetcher.py
  • openrag/components/websearch/providers/staan.py
  • openrag/components/websearch/service.py
  • openrag/core/config/__init__.py
  • openrag/core/observability/monitoring.py
  • openrag/core/utils/logging.py
  • openrag/core/utils/test_logging.py
  • openrag/di/container.py
  • openrag/scripts/backup.py
  • openrag/scripts/restore.py
  • openrag/services/auth/refresh.py
  • openrag/services/inference/_circuit_breaker.py
  • openrag/services/inference/_retry.py
  • openrag/services/inference/healthcheck.py
  • openrag/services/inference/ollama_client.py
  • openrag/services/inference/reranker_clients.py
  • openrag/services/inference/vllm_client.py
  • openrag/services/orchestrators/auth_service.py
  • openrag/services/orchestrators/conversion_service.py
  • openrag/services/orchestrators/indexing_service.py
  • openrag/services/orchestrators/mcp_service.py
  • openrag/services/orchestrators/partition_service.py
  • openrag/services/orchestrators/query_service.py
  • openrag/services/orchestrators/retrieval_service.py
  • openrag/services/orchestrators/user_service.py
  • openrag/services/orchestrators/workspace_service.py
  • openrag/services/persistence/connection.py
  • openrag/services/persistence/migrations/milvus/1.add_created_at_temporal_fields.py
  • openrag/services/persistence/migrations/milvus/migrate.py
  • openrag/services/workers/batch_ingest.py
  • openrag/services/workers/bootstrap.py
  • openrag/services/workers/parsers/doc_serializer.py
  • openrag/services/workers/parsers/docling_workers.py
  • openrag/services/workers/parsers/marker_workers.py
  • openrag/services/workers/parsers/whisper_workers.py
  • openrag/services/workers/ray_utils.py
  • openrag/test_auth_router.py
  • openrag/utils/test_external_resource_errors.py
💤 Files with no reviewable changes (1)
  • openrag/utils/test_external_resource_errors.py

Comment thread openrag/app_front.py Outdated
Comment thread openrag/components/indexer/loaders/pdf_loaders/dotsocr.py Outdated
Comment thread openrag/components/indexer/loaders/pdf_loaders/openai.py Outdated
hedhoud and others added 5 commits May 29, 2026 14:28
…refactor/phase-12-person-b-cleanup

# Conflicts:
#	openrag/services/websearch/content_fetcher.py
#	openrag/services/websearch/providers/staan.py
#	openrag/services/websearch/service.py
Phase 12B. The canonical auth adapters (OIDC client, session tokens,
state cookie, deps) live in services/auth/; components/auth/ is now only
re-export shims. Repoint the remaining consumers off the shims:

- api/middleware/auth.py        components.auth.refresh    -> services.auth.refresh
- api/routers/auth/oidc.py       components.auth            -> services.auth.state_cookie
- di/container.py                components.auth            -> services.auth
- services/orchestrators/auth_service.py      components.auth -> services.auth
- services/orchestrators/test_auth_service.py components.auth -> services.auth

Also refresh stale module references in docstrings that pointed at the
old components.auth paths (auth_service header, oidc_session_repo notes).

No remaining `from components.auth` imports outside components/ itself.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openrag/api/dependencies/files.py`:
- Around line 47-48: Normalize and sanitize the incoming UploadFile.filename
before joining with dest_dir to prevent path traversal: in the code that
computes filename (where make_unique_filename(file.filename) is called and
file_path is computed), strip any path components and leading slashes (e.g., use
the filename's basename), reject or normalize absolute paths, then apply
make_unique_filename to that sanitized basename and join with dest_dir to form
file_path; ensure the final path is contained under dest_dir (e.g., by resolving
and asserting the parent is dest_dir) before writing.

In `@openrag/core/utils/source_filtering.py`:
- Around line 36-39: The debug log currently emits raw LLM output via
tail=repr(tail), which may contain PII; update the block that checks "if not
citations and not saw_none:" to redact/sanitize the preview before logging
(e.g., call a new or existing helper like sanitize_preview(text) or
mask_email(text) to replace email addresses and other PII, then truncate to a
fixed length) and pass that sanitized preview to logger.debug instead of
repr(tail); reference the variables "tail", "citations", "saw_none" and the
logger call in openrag/core/utils/source_filtering.py when making this change.
- Line 18: The regex _SOURCES_NUMS_RE only matches capitalized "Source(s)" so
lower/upper case tags are missed; update its declaration to be case-insensitive
by compiling it with the IGNORECASE flag (e.g., re.compile(r"...",
re.IGNORECASE)) or by adding an inline (?i) modifier to the pattern, keeping the
existing capture group and structure unchanged so numeric parsing still works
for variants like [sources: 1,3] or [SOURCES: 2].

In `@openrag/services/inference/runtime.py`:
- Around line 17-20: Guard detect_language against empty input and failed
detection by validating and normalizing the incoming text (e.g., strip and
return None for blank strings), wrapping the call to _lang_detector.detect(text,
k=1) in a try/except, and verifying outputs is a non-empty list before indexing;
if detection fails or outputs[0] has no "lang" key, return None (or a designated
fallback value) so callers like QueryService.generate_query can fall back
instead of propagating an exception.

In `@openrag/services/workers/indexer_pool.py`:
- Around line 104-106: The current check only verifies the presence of a "chunk"
attribute on the created chunker but not that it's callable; update the
validation in the indexer pool after create_chunker(cfg) to ensure
callable(getattr(chunker, "chunk", None)) and raise a TypeError if it's not
callable, with the same explanatory message referencing the chunk(document,
partition) method so misconfigured chunkers (e.g., chunk = None) are rejected
early.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09d1d8bd-a991-4f90-b669-1146a6220e12

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6c2f7 and c7be638.

📒 Files selected for processing (25)
  • openrag/api/dependencies/files.py
  • openrag/api/dependencies/test_files.py
  • openrag/api/routers/admin/indexing.py
  • openrag/api/routers/admin/tools.py
  • openrag/api/routers/user/chat.py
  • openrag/core/chunking/factory.py
  • openrag/core/config/retrieval.py
  • openrag/core/utils/filename.py
  • openrag/core/utils/source_filtering.py
  • openrag/core/utils/test_source_filtering.py
  • openrag/core/utils/text.py
  • openrag/di/container.py
  • openrag/services/inference/parsers/openai_audio.py
  • openrag/services/inference/runtime.py
  • openrag/services/orchestrators/indexing_service.py
  • openrag/services/orchestrators/query_service.py
  • openrag/services/websearch/__init__.py
  • openrag/services/websearch/base.py
  • openrag/services/websearch/content_fetcher.py
  • openrag/services/websearch/providers/__init__.py
  • openrag/services/websearch/providers/staan.py
  • openrag/services/websearch/service.py
  • openrag/services/websearch/test_content_fetcher.py
  • openrag/services/workers/indexer_pool.py
  • openrag/services/workers/test_indexer_pool.py
✅ Files skipped from review due to trivial changes (6)
  • openrag/services/websearch/content_fetcher.py
  • openrag/services/websearch/providers/staan.py
  • openrag/services/websearch/test_content_fetcher.py
  • openrag/services/orchestrators/indexing_service.py
  • openrag/services/inference/parsers/openai_audio.py
  • openrag/services/websearch/service.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • openrag/di/container.py
  • openrag/api/routers/admin/tools.py
  • openrag/api/routers/user/chat.py

Comment thread openrag/api/dependencies/files.py Outdated
Comment thread openrag/core/utils/source_filtering.py Outdated
Comment thread openrag/core/utils/source_filtering.py
Comment thread openrag/services/inference/runtime.py Outdated
Comment thread openrag/services/workers/indexer_pool.py
EnjoyBacon7 and others added 7 commits May 29, 2026 13:27
…ts.prompts

Phase 12C. The orchestrators imported HYDE/MULTI_QUERY/SYS/CONTEXTUALIZER/
SPOKEN_STYLE prompts from the components.prompts shim. Those are not Python
string constants — they are disk templates the shim eager-loads at import
via load_config(). Distributing them as eager module-level constants into
core/prompts builders would add import-time disk I/O and a config dependency
to the whole core.prompts package (its __init__ imports every builder), so
instead each service loads its templates from the injected Settings using the
existing pure core.prompts.load_template_by_key(prompts_dir, mapping, key) —
exactly the call shape template_loader documents for its callers.

- retrieval_service.py: multiQuery / hyde branches load "multi_query" / "hyde"
  from `config` (already in __init__) instead of importing the shim inline.
- query_service.py: load "query_contextualizer" / "spoken_style_answer" /
  "sys_prompt" once in __init__ from the injected config; store on the
  instance. Per-instance config binding replaces the old global eager load.
- test_query_service.py: extend the fake config with the real paths/prompts
  so QueryService can resolve templates.

No remaining `components.prompts` imports outside components/ itself.
Phase 12F. Two parts.

1. Bootstrap re-points (safe — old modules are re-export shims):
   - components.indexer.loaders.serializer.DocSerializer
       -> services.workers.parsers.doc_serializer
   - components.indexer.loaders.pdf_loaders.docling2.DoclingPool
       -> services.workers.parsers.docling_workers
   - components.indexer.loaders.pdf_loaders.marker.MarkerPool
       -> services.workers.parsers.marker_workers

2. Loader-registry relocation. The plan assumed the loaders were already
   migrated to core/indexing/parsers/ and the old ones were dead. They are
   not: core/indexing/parsers/registry.py is an empty Registry, and the new
   doc_serializer / doc_serializer_bridge still drive the legacy loaders via
   get_loader_classes (which walks the loaders package and discovers ~20
   BaseLoader adapters). So the legacy loader subsystem is the live runtime
   path. To make components/ deletable (12H) it is moved wholesale:

   components/indexer/loaders/  ->  services/workers/parsers/legacy_loaders/

   - get_loader_classes root_pkg updated to the new package path.
   - The two get_loader_classes consumers (doc_serializer, doc_serializer_bridge)
     and internal absolute imports / tests repointed.
   - The orphaned serializer.py re-export shim (no remaining consumers after the
     bootstrap re-point) is deleted rather than carried into services/.
   - Internal `from components.utils` / `from components.prompts` imports inside
     the loaders are left as-is; they remain valid and are 12C/12E/12H's concern.

   Runtime discovery verified: get_loader_classes resolves all 19 extension
   mappings from the new location.

Also fix a latent test-isolation bug this relocation surfaced:
services/inference/parsers/test_openai_audio.py installed a fake non-package
`pydub` into sys.modules at import time and never restored it. It was masked
only by collection order (the real-pydub loader test used to sort before it
under components/). It now tries the real pydub first and only falls back to
the stub when the import genuinely fails — its stated Python-3.13 purpose.

No remaining `components.indexer.loaders` references anywhere.
The 12B import switch repointed two api files at services.auth directly,
which the layer guard forbids (api -> di, core only; never services).
Resolve it properly rather than bridging through di:

- StateCookieSerializer / StateCookiePayload are pure (dataclasses +
  itsdangerous, no I/O), so move them to core/auth/state_cookie.py.
  api/routers/auth/oidc.py now imports from core (api -> core, allowed);
  services/auth re-exports them so existing services consumers are
  unaffected; the components.auth shim repoints to core.

- refresh_session_if_needed is infrastructure and stays in services, but
  the middleware reached it via a direct module import. Expose it as
  AuthService.refresh_session_if_needed (a thin seam over the helper,
  passing self) so the middleware calls it on the AuthService it already
  obtains from di — consistent with every other session op it performs.
  Update the middleware test: the mock auth-service gains a default
  refresh delegating to the real helper, and the two refresh-outcome
  tests override that method instead of patching the module name.

`scripts/check_layer_imports.py` is now clean (0 violations).
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openrag/services/inference/parsers/test_openai_audio.py`:
- Around line 24-27: Replace the broad except Exception in the pydub shim with a
narrow import-related catch so real runtime errors still propagate: when
importing pydub fails in the test setup (the block that creates fake_pydub and
assigns fake_pydub.AudioSegment = MagicMock() and sys.modules["pydub"] =
fake_pydub), catch only ImportError or ModuleNotFoundError (e.g., except
(ImportError, ModuleNotFoundError)) and preserve the existing shim behavior
inside that except block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1131938-9210-45c8-aa83-829df6336df7

📥 Commits

Reviewing files that changed from the base of the PR and between c7be638 and 9553397.

📒 Files selected for processing (45)
  • openrag/api/middleware/auth.py
  • openrag/api/routers/auth/oidc.py
  • openrag/components/auth/state_cookie.py
  • openrag/components/auth/test_middleware.py
  • openrag/components/indexer/loaders/serializer.py
  • openrag/core/auth/__init__.py
  • openrag/core/auth/state_cookie.py
  • openrag/di/container.py
  • openrag/services/auth/__init__.py
  • openrag/services/inference/parsers/test_openai_audio.py
  • openrag/services/orchestrators/auth_service.py
  • openrag/services/orchestrators/query_service.py
  • openrag/services/orchestrators/retrieval_service.py
  • openrag/services/orchestrators/test_auth_service.py
  • openrag/services/orchestrators/test_query_service.py
  • openrag/services/persistence/oidc_session_repo.py
  • openrag/services/workers/bootstrap.py
  • openrag/services/workers/parsers/doc_serializer.py
  • openrag/services/workers/parsers/doc_serializer_bridge.py
  • openrag/services/workers/parsers/legacy_loaders/CustomDocLoader.py
  • openrag/services/workers/parsers/legacy_loaders/CustomHTMLLoader.py
  • openrag/services/workers/parsers/legacy_loaders/__init__.py
  • openrag/services/workers/parsers/legacy_loaders/audio/__init__.py
  • openrag/services/workers/parsers/legacy_loaders/audio/local_whisper.py
  • openrag/services/workers/parsers/legacy_loaders/audio/openai.py
  • openrag/services/workers/parsers/legacy_loaders/audio/test_openai.py
  • openrag/services/workers/parsers/legacy_loaders/base.py
  • openrag/services/workers/parsers/legacy_loaders/doc.py
  • openrag/services/workers/parsers/legacy_loaders/docx.py
  • openrag/services/workers/parsers/legacy_loaders/eml_loader.py
  • openrag/services/workers/parsers/legacy_loaders/image.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/__init__.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/docling.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/docling2.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/dotsocr.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/marker.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/openai.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/pymupdf.py
  • openrag/services/workers/parsers/legacy_loaders/pptx_loader.py
  • openrag/services/workers/parsers/legacy_loaders/test_base_loader.py
  • openrag/services/workers/parsers/legacy_loaders/test_customdocloader.py
  • openrag/services/workers/parsers/legacy_loaders/test_doc_loader.py
  • openrag/services/workers/parsers/legacy_loaders/test_docx_loader.py
  • openrag/services/workers/parsers/legacy_loaders/test_eml_recursion.py
  • openrag/services/workers/parsers/legacy_loaders/txt_loader.py
💤 Files with no reviewable changes (13)
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/dotsocr.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/openai.py
  • openrag/services/workers/parsers/legacy_loaders/base.py
  • openrag/services/workers/parsers/legacy_loaders/audio/openai.py
  • openrag/services/workers/parsers/legacy_loaders/image.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/pymupdf.py
  • openrag/services/workers/parsers/legacy_loaders/docx.py
  • openrag/services/workers/parsers/legacy_loaders/doc.py
  • openrag/components/indexer/loaders/serializer.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/marker.py
  • openrag/services/workers/parsers/legacy_loaders/pptx_loader.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/docling2.py
  • openrag/services/workers/parsers/legacy_loaders/pdf_loaders/docling.py
✅ Files skipped from review due to trivial changes (4)
  • openrag/components/auth/state_cookie.py
  • openrag/services/auth/init.py
  • openrag/core/auth/init.py
  • openrag/services/persistence/oidc_session_repo.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • openrag/api/routers/auth/oidc.py
  • openrag/services/workers/parsers/doc_serializer.py
  • openrag/services/orchestrators/query_service.py

Comment thread openrag/services/inference/parsers/test_openai_audio.py Outdated
@hedhoud hedhoud merged commit a07175e into refactor/hexagonal May 29, 2026
10 checks passed
@hedhoud hedhoud deleted the refactor/phase-12a-logger branch May 29, 2026 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change Change of behavior after upgrade refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants