-
Notifications
You must be signed in to change notification settings - Fork 54
Reject unknown fields #463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughIntroduces ConfigurationBase (extra="forbid") and migrates all public config models to inherit from it. Adds TLS key fields, default ports for service and PostgreSQL, JWT username_claim and role_rules, customization prompt fields, per-model validators (including CORS), and a unit test ensuring unknown fields are rejected. CI e2e config toggles feedback/transcripts enabled. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Loader as Config Loader
participant Models as Pydantic Models\n(ConfigurationBase & children)
participant Validators as Model Validators
User->>Loader: Provide configuration data (dict / file)
Loader->>Models: Instantiate models (extra="forbid")
Models->>Validators: Execute @model_validator hooks
alt Unknown fields present
Models-->>User: ValidationError (Extra inputs are not permitted)
else CORS config present
Validators->>Validators: Validate allow_credentials vs allow_origins
alt allow_credentials=true AND "*" in allow_origins
Validators-->>User: ValidationError (invalid CORS combo)
else Valid CORS
Validators-->>Models: Pass
end
end
Models-->>User: Validated Configuration instance
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
eranco74
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/models/config.py (3)
136-147: Avoid shared instances for nested models.Use Field(default_factory=...) for tls_config and cors to prevent shared mutable defaults between Configuration instances.
- tls_config: TLSConfiguration = TLSConfiguration() - cors: CORSConfiguration = CORSConfiguration() + tls_config: TLSConfiguration = Field(default_factory=TLSConfiguration) + cors: CORSConfiguration = Field(default_factory=CORSConfiguration)
387-403: Don’t silently overwrite system_prompt when both are provided.Either prioritize one with a warning or error out. Prefer explicit error.
- if self.system_prompt_path is not None: + if self.system_prompt is not None and self.system_prompt_path is not None: + raise ValueError("Provide either system_prompt or system_prompt_path, not both") + if self.system_prompt_path is not None: checks.file_check(self.system_prompt_path, "system prompt") self.system_prompt = checks.get_attribute_from_file( dict(self), "system_prompt_path" )
425-438: Fix multiple shared mutable defaults in the root Configuration.mcp_servers uses a bare list; nested models use prebuilt instances. Switch to default_factory across the board.
- database: DatabaseConfiguration = DatabaseConfiguration() - mcp_servers: list[ModelContextProtocolServer] = [] - authentication: AuthenticationConfiguration = AuthenticationConfiguration() + database: DatabaseConfiguration = Field(default_factory=DatabaseConfiguration) + mcp_servers: list[ModelContextProtocolServer] = Field(default_factory=list) + authentication: AuthenticationConfiguration = Field(default_factory=AuthenticationConfiguration) @@ - inference: InferenceConfiguration = InferenceConfiguration() + inference: InferenceConfiguration = Field(default_factory=InferenceConfiguration)
🧹 Nitpick comments (9)
src/models/config.py (8)
25-30: Centralize “reject unknown fields” — good; consider enabling populate_by_name for alias back-compat.Add populate_by_name=True to support field aliases without breaking name-based population.
class ConfigurationBase(BaseModel): """Base class for all configuration models that rejects unknown fields.""" - model_config = ConfigDict(extra="forbid") + model_config = ConfigDict(extra="forbid", populate_by_name=True)
34-36: Ambiguous TLS field: treat it as a path or a secret; name accordingly and keep back-compat via alias.If it’s a path, rename to tls_key_password_path with alias "tls_key_password". If it’s the secret value, use SecretStr instead. Below assumes “path”.
- tls_key_path: Optional[FilePath] = None - tls_key_password: Optional[FilePath] = None + tls_key_path: Optional[FilePath] = None + tls_key_password_path: Optional[FilePath] = Field(default=None, alias="tls_key_password")Also relies on ConfigurationBase.populate_by_name=True (see prior comment).
54-65: Tighten CORS error text.Minor grammar/clarity fixes.
- raise ValueError( - "Invalid CORS configuration: allow_credentials can not be set to true when " - "allow origins contains '*' wildcard." - "Use explicit origins or disable credential." - ) + raise ValueError( + "Invalid CORS configuration: allow_credentials cannot be true when " + "allow_origins contains the '*' wildcard. " + "Use explicit origins or disable credentials." + )
74-92: Constrain Postgres port at the type level; drop runtime check.Use Annotated[PositiveInt, Field(le=65535)] and remove the validator branch for the port.
- port: PositiveInt = 5432 + port: PositiveInt = Field(5432, le=65535) @@ - if self.port > 65535: - raise ValueError("Port value should be less than 65536") + # Port bounds enforced by field constraints. return selfOutside-range import to support this (add once near other imports):
from pydantic import Field
148-154: Constrain Service port at the field; drop runtime check.Mirror Postgres port approach.
- port: PositiveInt = 8080 + port: PositiveInt = Field(8080, le=65535) @@ - if self.port > 65535: - raise ValueError("Port value should be less than 65536") + # Port bounds enforced by field constraints. return self
156-162: Validate MCP server URL type.Prefer AnyHttpUrl for url to catch typos early (unless non-HTTP schemes are expected).
- url: str + url: AnyHttpUrl
164-207: Clarify mutually exclusive LlamaStack modes.If both url and use_as_library_client=True are provided, reject to avoid ambiguous config.
- if self.url is None: + if self.url is not None and self.use_as_library_client: + raise ValueError("Specify either url (server mode) or enable library client mode, not both") + if self.url is None: if self.use_as_library_client is None: raise ValueError( "Llama stack URL is not specified and library client mode is not specified" )
439-443: Dump helper is fine; consider Path.write_text and mkdir(parents=True) if needed.Optional ergonomics; current code is OK.
tests/unit/test_configuration_unknown_fields.py (1)
1-12: Good sentinel test; add coverage for nested and top-level models.Also assert rejection for nested extras (e.g., tls_config) and for the root Configuration to guard future regressions.
def test_nested_configuration_rejects_unknown_fields(): with pytest.raises(ValidationError, match="Extra inputs are not permitted"): ServiceConfiguration(tls_config={"tls_certificate_path": "/tmp/cert.pem", "oops": 1}) def test_root_configuration_rejects_unknown_fields(): from models.config import Configuration, ServiceConfiguration with pytest.raises(ValidationError, match="Extra inputs are not permitted"): Configuration( name="x", service=ServiceConfiguration(), llama_stack={"url": "http://localhost", "unknown": True}, user_data_collection={"feedback_enabled": False}, database={}, unknown_top_level=True, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/models/config.py(14 hunks)tests/unit/test_configuration_unknown_fields.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_configuration_unknown_fields.py (1)
src/models/config.py (2)
config(127-133)ServiceConfiguration(136-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (12)
src/models/config.py (12)
9-17: Pydantic v2 imports look correct.Using ConfigDict, model_validator, PositiveInt aligns with v2. No issues.
95-116: Default-to-sqlite behavior is reasonable and backward-compatible.Good single-source-of-truth decision with explicit guard against multiple DB configs.
117-135: Accessor properties are clear and safe.Straightforward and defensive. No changes.
210-227: Storage gating logic reads well.Validations for required *_storage when enabled are correct.
238-257: JSONPath validation is solid.Good early failure with clear message.
259-274: Role list validation is thorough.Covers empty list, duplicates, and disallows wildcard. Nicely done.
316-321: Access rule model is concise and typed.Looks good.
323-329: RBAC config default_factory=list is correct.Consistent with Pydantic best practices.
331-339: JWT config defaults make sense.username_claim addition aligns with typical deployments.
348-374: Auth module validation is clear and explicit.Strong checks and messages.
375-385: Accessor enforces module precondition — good.Defensive and readable.
405-423: Default model/provider coupling is correct.Validation reads cleanly.
| class JwkConfiguration(ConfigurationBase): | ||
| """JWK configuration.""" | ||
|
|
||
| url: AnyHttpUrl | ||
| jwt_configuration: JwtConfiguration = JwtConfiguration() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid nested default instances in JWK config.
Use default_factory to prevent shared state.
- jwt_configuration: JwtConfiguration = JwtConfiguration()
+ jwt_configuration: JwtConfiguration = Field(default_factory=JwtConfiguration)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class JwkConfiguration(ConfigurationBase): | |
| """JWK configuration.""" | |
| url: AnyHttpUrl | |
| jwt_configuration: JwtConfiguration = JwtConfiguration() | |
| class JwkConfiguration(ConfigurationBase): | |
| """JWK configuration.""" | |
| url: AnyHttpUrl | |
| jwt_configuration: JwtConfiguration = Field(default_factory=JwtConfiguration) |
🤖 Prompt for AI Agents
In src/models/config.py around lines 341 to 346, the JwtConfiguration default is
created as a shared instance (jwt_configuration: JwtConfiguration =
JwtConfiguration()), which can cause shared mutable state across model
instances; change this to use pydantic's default_factory (e.g.,
Field(default_factory=JwtConfiguration)) so each JwkConfiguration instance gets
its own JwtConfiguration object and avoid shared state.
tisnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We've had an issue where a YAML field was indented wrong and lightspeed stack accepted it. I think it's a good idea to reject unknown fields, to prevent typos from becoming silent misconfigurations.
|
e2e test failed because it was using non-existent config, forced push to fix it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
.github/workflows/e2e_tests.yaml (2)
4-25: High risk: pull_request_target + job-level secrets + PR-head checkout enables secret exfiltrationThis workflow checks out untrusted PR code while exposing OPENAI_API_KEY (and runs docker/pip). Switch to pull_request and gate secrets to internal PRs, or gate the entire job. Move secrets to step-level with if guards.
- on: [push, pull_request_target] +on: + push: + pull_request: jobs: e2e_tests: + # Run on pushes, and on PRs only when not from a fork (so secrets stay safe). + if: ${{ github.event_name == 'push' || (github.event_name == 'pull_request' && github.event.pull_request.head.repo.fork == false) }} - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}Then add OPENAI_API_KEY only on the steps that truly need it, guarded for internal PRs:
- - name: Run service manually - env: - OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} + - name: Run service manually + if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.fork == false }} + env: + OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} - - name: Run e2e tests - run: | + - name: Run e2e tests + if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.fork == false }} + run: | # ...
60-67: Pin third-party actions to a commit SHA1arp/create-a-file-action is referenced by a tag. Pin to a specific commit SHA to prevent supply-chain attacks.
- - uses: 1arp/create-a-file-action@0.4.5 + - uses: 1arp/create-a-file-action@<pinned-commit-sha> env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}Also applies to: 94-101
src/models/config.py (5)
34-36: tls_key_password should not be a FilePathPasswords aren’t paths. Prefer SecretStr (or rename to tls_key_password_path if file-based). Suggest SecretStr for now.
from pydantic import ( BaseModel, ConfigDict, Field, model_validator, - FilePath, + FilePath, + SecretStr, AnyHttpUrl, PositiveInt, ) @@ - tls_key_password: Optional[FilePath] = None + tls_key_password: Optional[SecretStr] = NoneAlso applies to: 9-17
145-147: Avoid shared default instances for nested modelsUsing TLSConfiguration() and CORSConfiguration() creates shared instances across Configurations. Use default_factory.
- tls_config: TLSConfiguration = TLSConfiguration() - cors: CORSConfiguration = CORSConfiguration() + tls_config: TLSConfiguration = Field(default_factory=TLSConfiguration) + cors: CORSConfiguration = Field(default_factory=CORSConfiguration)
432-438: Fix mutable defaults and nested default instances in top-level Configuration[] and direct instances are shared across model instances; use default_factory.
- database: DatabaseConfiguration = DatabaseConfiguration() - mcp_servers: list[ModelContextProtocolServer] = [] - authentication: AuthenticationConfiguration = AuthenticationConfiguration() + database: DatabaseConfiguration = Field(default_factory=DatabaseConfiguration) + mcp_servers: list[ModelContextProtocolServer] = Field(default_factory=list) + authentication: AuthenticationConfiguration = Field(default_factory=AuthenticationConfiguration) @@ - inference: InferenceConfiguration = InferenceConfiguration() + inference: InferenceConfiguration = Field(default_factory=InferenceConfiguration)
238-246: Validate operator/value shape for JwtRoleRuleEnforce value types per operator to fail fast on misconfig.
class JwtRoleRule(ConfigurationBase): @@ roles: list[str] # Roles to assign if rule matches + @model_validator(mode="after") + def check_operator_value_shape(self) -> Self: + if self.operator == JsonPathOperator.IN and not isinstance(self.value, (list, tuple, set)): + raise ValueError("For 'in' operator, 'value' must be a list/tuple/set.") + if self.operator == JsonPathOperator.CONTAINS and not isinstance(self.value, (str, list, tuple)): + raise ValueError("For 'contains' operator, 'value' must be a string or list/tuple.") + return self
210-227: Add backward-compatibility shim for legacy_disabledkeys
Without mapping,feedback_disabledandtranscripts_disabledwill trigger “extra fields” errors underextra="forbid". Add a@model_validator(mode="before")that, whendatais a dict, translates each*_disabledinto the inverse of the new*_enabledflag (only if the new key isn’t already set), e.g.:class UserDataCollection(ConfigurationBase): @@ transcripts_storage: Optional[str] = None + @model_validator(mode="before") + @classmethod + def _compat_disabled_aliases(cls, data: Any): + if isinstance(data, dict): + d = dict(data) + if "feedback_disabled" in d and "feedback_enabled" not in d: + d["feedback_enabled"] = not bool(d.pop("feedback_disabled")) + if "transcripts_disabled" in d and "transcripts_enabled" not in d: + d["transcripts_enabled"] = not bool(d.pop("transcripts_disabled")) + return d + return dataOptionally emit a deprecation warning when mapping occurs.
♻️ Duplicate comments (1)
src/models/config.py (1)
345-346: Duplicate: use default_factory for JwtConfigurationSame issue as noted previously—avoid shared state.
- jwt_configuration: JwtConfiguration = JwtConfiguration() + jwt_configuration: JwtConfiguration = Field(default_factory=JwtConfiguration)
🧹 Nitpick comments (4)
.github/workflows/e2e_tests.yaml (2)
245-249: Replace fixed sleep with readiness/health checksFixed wait causes flakiness. Poll service health or compose healthchecks.
- - name: Wait for services - run: | - echo "Waiting for services to be healthy..." - sleep 20 # adjust depending on boot time + - name: Wait for services + run: | + echo "Waiting for services to be healthy..." + for i in {1..60}; do + curl -sf http://localhost:8080/v1/models && exit 0 + sleep 2 + done + echo "Service didn't become ready in time"; exit 1
1-1: Nit: header filename mismatchComment says e2e_test.yml but file is e2e_tests.yaml. Align to avoid confusion.
src/models/config.py (2)
59-64: Nit: error message grammarUse plural “credentials” and tighten wording.
- "Invalid CORS configuration: allow_credentials can not be set to true when " - "allow origins contains '*' wildcard." - "Use explicit origins or disable credential." + "Invalid CORS configuration: allow_credentials cannot be true when " + "allow_origins contains the '*' wildcard. Use explicit origins or disable credentials."
101-116: Side effects in validator: document defaulting to SQLite on empty configThis mutates the model to inject a default SQLite DB. Fine for BC, but document prominently and ensure callers don't rely on “no DB configured” semantics.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
.github/workflows/e2e_tests.yaml(1 hunks)src/models/config.py(14 hunks)tests/unit/test_configuration_unknown_fields.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_configuration_unknown_fields.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (2)
.github/workflows/e2e_tests.yaml (1)
86-89: Config key rename aligns with extra='forbid' models—verify downstreamSwitch to feedback_enabled/transcripts_enabled matches src/models/config.py. Ensure any scripts still writing legacy ..._disabled keys are updated; otherwise config will fail under extra="forbid".
src/models/config.py (1)
25-29: Good: centralized extra="forbid" base modelThis cleanly enforces unknown-field rejection across all configs.
Description
We've had an issue where a YAML field was indented wrong and lightspeed stack accepted it. I think it's a good idea to reject unknown fields, to prevent typos from becoming silent misconfigurations.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Unit tests
Summary by CodeRabbit
New Features
Tests
Chores