Skip to content

Conversation

@tisnik
Copy link
Contributor

@tisnik tisnik commented Oct 8, 2025

Description

LCORE-642: BYOK RAG configuration

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #LCORE-642

Summary by CodeRabbit

  • New Features

    • Added BYOK RAG configuration support with fields for RAG type, embedding model, embedding dimension, vector DB identifier, and local DB path.
    • Introduced sensible defaults for RAG type, embedding model, and embedding dimension.
    • Configuration export now includes a top-level byok_rag section (defaults to an empty list).
  • Tests

    • Added unit tests covering default/non-default configurations and validation errors for the new BYOK RAG settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 8, 2025

Walkthrough

Adds BYOK RAG defaults to constants, introduces a ByokRag pydantic model and a top-level byok_rag: list[ByokRag] field on Configuration, and updates unit tests to validate model defaults, validation rules, and serialized output including the new byok_rag key.

Changes

Cohort / File(s) Summary
BYOK RAG constants
src/constants.py
Adds DEFAULT_RAG_TYPE = "inline::faiss", DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2", and DEFAULT_EMBEDDING_DIMENSION = 768.
Configuration models
src/models/config.py
Adds ByokRag(ConfigurationBase) with validated fields (rag_id, rag_type, embedding_model, embedding_dimension, vector_db_id, db_path) using the new constants. Adds byok_rag: list[ByokRag] to Configuration.
Unit tests
tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py
Adds tests for ByokRag defaults and validation errors; updates dumped-configuration tests to assert presence of top-level "byok_rag" (defaults to []).
Docs
docs/config.puml
Adds ByokRag model and documents Configuration.byok_rag: Optional[list[ByokRag]] and inheritance from ConfigurationBase.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App
  participant Configuration
  participant ByokRag
  participant Constants

  User->>App: Load configuration file
  App->>Configuration: parse/instantiate Configuration
  alt byok_rag entries present
    Configuration->>ByokRag: instantiate per entry
    ByokRag->>Constants: apply defaults for missing rag_type / embedding_model / embedding_dimension
    ByokRag-->>Configuration: validated instances
  else none provided
    Configuration->>Configuration: set byok_rag = []
  end
  App->>Configuration: request dump()
  Configuration-->>App: serialized JSON including "byok_rag": [...]
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • are-ces
  • jrobertboos

Poem

I nibble defaults beneath moonlight,
rag ids nest and models alight.
Fields validate with gentle hop,
tests watch over each little stop.
ByokRag hums — ready to write. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly captures the primary change by referencing the ticket and stating the addition of BYOK RAG configuration, which aligns directly with the code, tests, and documentation updates introduced in this pull request.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72e8477 and 643cdd5.

⛔ Files ignored due to path filters (2)
  • docs/config.png is excluded by !**/*.png
  • docs/config.svg is excluded by !**/*.svg
📒 Files selected for processing (2)
  • docs/config.puml (3 hunks)
  • src/constants.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: build-pr
🔇 Additional comments (3)
docs/config.puml (3)

22-29: LGTM!

The ByokRag class documentation correctly lists all six fields with appropriate type annotations. The documentation follows the existing pattern for Annotated types (consistent with other classes like InMemoryCacheConfig at line 83) and properly shows the class as part of the models hierarchy.


40-40: LGTM!

The byok_rag field is correctly added to the Configuration class with the proper type Optional[list[ByokRag]] and is positioned in correct alphabetical order between authorization and conversation_cache.


167-167: LGTM!

The inheritance relationship correctly shows ByokRag extending ConfigurationBase, using proper PlantUML syntax and positioned in correct alphabetical order within the inheritance declarations.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 690a6bc and 31f0bf6.

📒 Files selected for processing (4)
  • src/constants.py (1 hunks)
  • src/models/config.py (4 hunks)
  • tests/unit/models/config/test_byok_rag.py (1 hunks)
  • tests/unit/models/config/test_dump_configuration.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/constants.py
  • src/models/config.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/constants.py
  • tests/unit/models/config/test_dump_configuration.py
  • src/models/config.py
  • tests/unit/models/config/test_byok_rag.py
src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

  • src/constants.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
src/{models/config.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/config.py,configuration.py}: All configuration uses Pydantic models extending ConfigurationBase
Configuration base models must set model_config with extra="forbid" to reject unknown fields

Files:

  • src/models/config.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/config.py
🧬 Code graph analysis (1)
tests/unit/models/config/test_byok_rag.py (1)
src/models/config.py (2)
  • config (139-145)
  • ByokRag (549-559)
🪛 GitHub Actions: Python linter
src/models/config.py

[error] 4-4: Pylint: W0611 Unused Annotated imported from typing (unused-import).

🪛 GitHub Actions: Ruff
src/models/config.py

[error] 4-4: F401: 'typing.Annotated' imported but unused. Remove unused import: 'typing.Annotated'. This can be fixed with 'ruff --fix'.

🪛 GitHub Actions: Unit tests
tests/unit/models/config/test_byok_rag.py

[error] 1-1: Pytest failure: ByokRag validation error. db_path 'tests/configuration/rag.txt' does not point to an existing file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (3)
tests/unit/models/config/test_dump_configuration.py (1)

91-91: LGTM!

The additions correctly validate that the new byok_rag field is present in the serialized configuration and defaults to an empty list, which aligns with the Field(default_factory=list) declaration in Configuration.

Also applies to: 173-173

src/models/config.py (2)

549-560: LGTM! Well-structured configuration model.

The ByokRag class follows Pydantic best practices:

  • Extends ConfigurationBase with extra="forbid"
  • Uses appropriate type constraints (constr(min_length=1), PositiveInt, FilePath)
  • Provides sensible defaults from constants
  • FilePath for db_path ensures file existence validation

The validation rules will properly reject empty strings and negative dimensions, as tested in test_byok_rag.py.


580-580: LGTM! Field properly integrated.

The byok_rag field is correctly added to the Configuration model with an appropriate default factory, ensuring an empty list when not specified.

@tisnik tisnik merged commit dde0743 into lightspeed-core:main Oct 8, 2025
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant