Skip to content

Conversation

@omertuc
Copy link
Contributor

@omertuc omertuc commented Sep 4, 2025

Description

Right now in assisted-chat we're blocking the config endpoint completely, even for employees, because it leaks secrets like:

  • database password
  • llama stack api key

This change makes sure those secrets are not included in the serialized configuration. This way we can open up the config endpoint to employees again (still not the public, because it contains things like system prompt, which we would like to keep private).

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #
    LCORE-395

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Adjusted unit-tests and checked manually

Summary by CodeRabbit

  • New Features

    • Configuration export now shows a richer PostgreSQL section (host, port, database, user, SSL options, namespace, CA cert) for improved clarity.
  • Bug Fixes / Security

    • Secrets (API keys, DB passwords) are now represented as secure secret types and are redacted in exported configuration views; note logging behavior related to API keys was adjusted—please review logging/config for sensitive data exposure.
  • Tests

    • Unit and integration tests updated to validate the new serialization and secret handling.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 4, 2025

Walkthrough

Pydantic config fields for secrets were changed to SecretStr; code now calls get_secret_value() when constructing DB URLs and service clients. Tests were updated to treat secrets as SecretStr. Some endpoints and streaming handlers were modified to log or expose the unmasked configuration object (including API key).

Changes

Cohort / File(s) Summary
Model types and Configuration additions
src/models/config.py
Changed secret fields to use SecretStr: PostgreSQLDatabaseConfiguration.password: SecretStr and LlamaStackConfiguration.api_key: Optional[SecretStr]. Added database attribute to Configuration (type: DatabaseConfiguration, optional).
DB engine creation
src/app/database.py
In _create_postgres_engine, interpolate password using config.password.get_secret_value() when building the Postgres URL.
Client construction
src/client.py
For service-mode client, unwrap API key with llama_stack_config.api_key.get_secret_value() (if present) so a plain string (or None) is passed to the client.
Endpoints: config, query, streaming_query
src/app/endpoints/config.py, src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
config.py: sample response now masks api_key in the example. query.py and streaming_query.py: removed prior masking logic and now log the Llama Stack configuration directly (unmasked), instead of logging a masked copy.
Unit & integration tests
tests/unit/models/test_config.py, tests/unit/test_configuration.py, tests/integration/test_configuration.py, tests/unit/test_configuration.py
Tests updated to expect secrets wrapped in SecretStr and to assert via get_secret_value(); updated dumped JSON expectations to include PostgreSQL database block with password redaction in dumped output (represented as "**********" in serialized dumps where masking logic applies).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Config as Pydantic Config
  participant DB as EngineCreator/_create_postgres_engine
  participant Client as ServiceClientBuilder
  participant Logger as App Logger

  Caller->>Config: load config (password: SecretStr, api_key: SecretStr)
  Caller->>DB: request engine (uses config.postgres.password)
  DB->>Config: config.postgres.password.get_secret_value()
  DB-->>Caller: engine created with plaintext password in URL

  Caller->>Client: build service client (uses config.llama_stack.api_key)
  Client->>Config: config.llama_stack.api_key.get_secret_value()
  Client-->>Caller: client constructed with plaintext api_key

  Caller->>Logger: log configuration
  Logger-->>Caller: (now logs unmasked config object, includes SecretStr values if not explicitly masked)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit found a secret string,
Wrapped it soft in SecretStr wing.
It hopped to logs with curious peeks,
Sometimes it hides, sometimes it speaks.
Tests nibble bits to make things right—hop hop, delight! 🥕🐇

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/models/config.py (3)

38-38: Exclude tls_key_password from serialization — also hide from repr

Good hardening. Consider additionally setting repr=False to avoid accidental log exposure, and (optional) renaming to tls_key_password_path for clarity.

-    tls_key_password: Optional[FilePath] = Field(default=None, exclude=True)
+    tls_key_password: Optional[FilePath] = Field(default=None, exclude=True, repr=False)

83-83: DB password: prefer SecretStr and hide from repr

Excluding solves the config-leak; using SecretStr adds defense-in-depth for logs/repr and downstream handling.

-from pydantic import (
+from pydantic import (
     BaseModel,
     ConfigDict,
     Field,
     model_validator,
     FilePath,
     AnyHttpUrl,
     PositiveInt,
+    SecretStr,
 )
-    password: str = Field(exclude=True)
+    password: SecretStr = Field(exclude=True, repr=False)

Note: callers need .password.get_secret_value() where a str is required.


170-170: Llama API key: use SecretStr and hide from repr

Same rationale as DB password.

-    api_key: Optional[str] = Field(default=None, exclude=True)
+    api_key: Optional[SecretStr] = Field(default=None, exclude=True, repr=False)
tests/unit/models/test_config.py (1)

540-551: Tests: add explicit “not present” assertions for secrets and fix outdated note

You already validate via full-dict equality. Adding negative asserts makes test intent obvious. Also update the note that redaction is now in place.

-    Please note that redaction process is not in place.
+    Secrets are redacted via Field(exclude=True).

After loading content:

         content = json.load(fin)
         # content should be loaded
         assert content is not None
+        # explicitly verify secrets are not serialized
+        assert "tls_key_password" not in content["service"]["tls_config"]
+        assert "api_key" not in content["llama_stack"]
+        assert "password" not in content["database"]["postgres"]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e42062c and 2aee79e.

📒 Files selected for processing (2)
  • src/models/config.py (3 hunks)
  • tests/unit/models/test_config.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/models/test_config.py (1)
src/models/config.py (2)
  • DatabaseConfiguration (97-135)
  • PostgreSQLDatabaseConfiguration (76-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (1)
tests/unit/models/test_config.py (1)

638-648: LGTM: expected DB JSON matches shape and omits password

The expected structure is comprehensive and correctly redacts the secret.

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please update PR title to contain Jira issue

Right now in assisted-chat we're blocking the config endpoint
completely, even for employees, because it leaks secrets like:

- database password
- llama stack api key
- tls key password

This change makes sure those secrets are not included in the serialized
configuration. This way we can open up the config endpoint to employees
again (still not the public, because it contains things like system
prompt, which we would like to keep private).
@omertuc omertuc changed the title config endpoint should not leak secrets LCORE-395: config endpoint should not leak secrets Sep 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/models/config.py (1)

37-39: tls_key_password type is incorrect; should be a secret string, not a file path.

A password shouldn't be modeled as FilePath. This rejects valid configs (string password) and, if set to a path, would leak filesystem details via the /config endpoint. Use SecretStr to align with the PR goal.

Apply:

 class TLSConfiguration(ConfigurationBase):
@@
-    tls_key_password: Optional[FilePath] = None
+    tls_key_password: Optional[SecretStr] = None

If any consumers expect a raw string, unwrap with .get_secret_value().

🧹 Nitpick comments (8)
src/models/config.py (1)

146-150: Avoid shared mutable defaults for nested models and lists.

Use Field(default_factory=...) to prevent instances from sharing the same default objects.

Apply:

 class ServiceConfiguration(ConfigurationBase):
@@
-    tls_config: TLSConfiguration = TLSConfiguration()
-    cors: CORSConfiguration = CORSConfiguration()
+    tls_config: TLSConfiguration = Field(default_factory=TLSConfiguration)
+    cors: CORSConfiguration = Field(default_factory=CORSConfiguration)
 class Configuration(ConfigurationBase):
@@
-    database: DatabaseConfiguration = DatabaseConfiguration()
-    mcp_servers: list[ModelContextProtocolServer] = []
-    authentication: AuthenticationConfiguration = AuthenticationConfiguration()
+    database: DatabaseConfiguration = Field(default_factory=DatabaseConfiguration)
+    mcp_servers: list[ModelContextProtocolServer] = Field(default_factory=list)
+    authentication: AuthenticationConfiguration = Field(default_factory=AuthenticationConfiguration)
     authorization: Optional[AuthorizationConfiguration] = None
     customization: Optional[Customization] = None
-    inference: InferenceConfiguration = InferenceConfiguration()
+    inference: InferenceConfiguration = Field(default_factory=InferenceConfiguration)

Also applies to: 461-467

src/client.py (1)

41-46: Minor: treat empty api_key as None.

Defensively normalize empty strings to None to avoid sending an empty header.

Apply:

-                api_key=(
-                    llama_stack_config.api_key.get_secret_value()
-                    if llama_stack_config.api_key is not None
-                    else None
-                ),
+                api_key=(lambda v: (v.strip() or None))(
+                    llama_stack_config.api_key.get_secret_value()
+                    if llama_stack_config.api_key is not None else ""
+                ),
src/app/endpoints/config.py (1)

39-41: Example response masks api_key; ensure runtime serialization also masks secrets.

Given SecretStr usage, FastAPI/Pydantic should emit masked values in /config. Consider adding/keeping an integration test asserting redaction for all secret fields (api_key, DB password, TLS key password).

I can add a test hitting GET /config and asserting that secret fields match a redaction pattern (e.g., r'^*+$') and never equal the real values. Do you want me to draft it?

tests/integration/test_configuration.py (1)

61-61: Guard against None before unwrapping SecretStr.

Be explicit to prevent accidental AttributeError if the fixture changes.

-    assert ls_config.api_key.get_secret_value() == "xyzzy"
+    assert ls_config.api_key is not None
+    assert ls_config.api_key.get_secret_value() == "xyzzy"
tests/unit/models/test_config.py (4)

540-551: Add a regression check that no cleartext secret appears in the raw dump

The structure looks good. To harden this test against regressions, also assert the raw file doesn’t contain the actual secret value (e.g., "ls_password"), not just the redacted form.

Add after the equality assertion in this test:

with open(dump_file, "r", encoding="utf-8") as fin:
    raw = fin.read()
assert "ls_password" not in raw

615-619: Avoid brittle equality on the exact redaction mask

Asserting the exact "**********" ties you to the library’s redaction formatting. Prefer asserting “masked and not equal to the real value.”

Append these checks in the test (keep the deep-equality if you like, but these make it robust):

api_key_mask = content["llama_stack"]["api_key"]
assert api_key_mask != "whatever"
assert set(api_key_mask) == {"*"}  # contains only asterisks

640-651: Same brittleness for Postgres password redaction

Mirror the robust checks for the DB password to avoid depending on the exact number of asterisks.

pwd_mask = content["database"]["postgres"]["password"]
assert pwd_mask != "ls_password"
assert set(pwd_mask) == {"*"}

515-516: Update stale note: redaction is now in place

The comment says redaction isn’t in place, but the expectations assert redaction. Update to avoid confusion.

-    Please note that redaction process is not in place.
+    Note: secrets are redacted in dumps; this test asserts the redaction markers.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2aee79e and 626137b.

📒 Files selected for processing (9)
  • src/app/database.py (1 hunks)
  • src/app/endpoints/config.py (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
  • src/app/endpoints/streaming_query.py (1 hunks)
  • src/client.py (1 hunks)
  • src/models/config.py (3 hunks)
  • tests/integration/test_configuration.py (1 hunks)
  • tests/unit/models/test_config.py (4 hunks)
  • tests/unit/test_configuration.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/app/endpoints/query.py (1)
src/configuration.py (2)
  • configuration (60-64)
  • llama_stack_configuration (74-78)
src/app/database.py (1)
src/models/config.py (1)
  • config (130-136)
tests/unit/test_configuration.py (1)
src/configuration.py (1)
  • llama_stack_configuration (74-78)
src/app/endpoints/streaming_query.py (1)
src/configuration.py (2)
  • configuration (60-64)
  • llama_stack_configuration (74-78)
tests/unit/models/test_config.py (1)
src/models/config.py (2)
  • DatabaseConfiguration (98-136)
  • PostgreSQLDatabaseConfiguration (77-95)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (5)
src/models/config.py (2)

84-85: Good: Postgres password switched to SecretStr.

This aligns with secret handling and prevents accidental exposure in serialization.


171-174: Good: Llama Stack api_key now SecretStr.

Unifies secret treatment and keeps OpenAPI/example masking consistent.

src/client.py (1)

41-46: Correct: unwrap api_key with get_secret_value().

This prevents passing a SecretStr object to the client and avoids accidental stringification leaks.

tests/unit/test_configuration.py (1)

86-86: Good: updated test to unwrap SecretStr.

Accurately reflects the new secret-wrapped api_key.

tests/unit/models/test_config.py (1)

1005-1005: LGTM: SecretStr handling validated correctly

Switching to get_secret_value() is the right assertion for SecretStr.

Comment on lines +62 to 65
f"postgresql://{config.user}:{config.password.get_secret_value()}@"
f"{config.host}:{config.port}/{config.db}"
f"?sslmode={config.ssl_mode}&gssencmode={config.gss_encmode}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Build the Postgres URL with SQLAlchemy URL.create to correctly escape credentials.

String interpolation will break with special characters in user/password and risks subtle connection bugs. Use URL.create and fix the options flag spacing.

Apply:

-from sqlalchemy import create_engine, text
+from sqlalchemy import create_engine, text
+from sqlalchemy.engine import URL
@@
-    postgres_url = (
-        f"postgresql://{config.user}:{config.password.get_secret_value()}@"
-        f"{config.host}:{config.port}/{config.db}"
-        f"?sslmode={config.ssl_mode}&gssencmode={config.gss_encmode}"
-    )
+    postgres_url = URL.create(
+        drivername="postgresql",
+        username=config.user,
+        password=config.password.get_secret_value(),
+        host=config.host,
+        port=int(config.port),
+        database=config.db,
+        query={"sslmode": config.ssl_mode, "gssencmode": config.gss_encmode},
+    )
@@
-    if is_custom_schema:
-        connect_args["options"] = f"-csearch_path={config.namespace}"
+    if is_custom_schema:
+        connect_args["options"] = f"-c search_path={config.namespace}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
f"postgresql://{config.user}:{config.password.get_secret_value()}@"
f"{config.host}:{config.port}/{config.db}"
f"?sslmode={config.ssl_mode}&gssencmode={config.gss_encmode}"
)
++ b/src/app/database.py
@@ -1,3 +1,4 @@
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL
# … earlier code …
- postgres_url = (
- f"postgresql://{config.user}:{config.password.get_secret_value()}@"
- f"{config.host}:{config.port}/{config.db}"
- f"?sslmode={config.ssl_mode}&gssencmode={config.gss_encmode}"
postgres_url = URL.create(
drivername="postgresql",
username=config.user,
password=config.password.get_secret_value(),
host=config.host,
port=int(config.port),
database=config.db,
query={
"sslmode": config.ssl_mode,
"gssencmode": config.gss_encmode,
},
)
- if is_custom_schema:
if is_custom_schema:
connect_args["options"] = f"-c search_path={config.namespace}"
# … later code …
🤖 Prompt for AI Agents
In src/app/database.py around lines 62 to 65, the Postgres connection string is
being built via naive f-string interpolation which will mis-handle special
characters in the user/password and has incorrect query option spacing; replace
the f-string construction with SQLAlchemy's URL.create to properly escape
credentials and assemble the components (drivername 'postgresql', username from
config.user, password from config.password.get_secret_value(), host, port,
database name) and pass the query options as a dict (e.g., {'sslmode':
config.ssl_mode, 'gssencmode': config.gss_encmode}) so the URL is correctly
encoded and the options flag spacing is fixed.

Comment on lines +179 to 181
# log Llama Stack configuration
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid logging secret-bearing config at INFO; sanitize and downgrade to DEBUG.

Even with SecretStr masking, logging the full config object at INFO on every request is risky and noisy. Emit only a sanitized JSON view and use DEBUG.

-    # log Llama Stack configuration
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # log Llama Stack configuration (sanitized)
+    logger.debug(
+        "Llama stack config: %s",
+        configuration.llama_stack_configuration.model_dump(
+            mode="json", exclude={"api_key"}, exclude_none=True
+        ),
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# log Llama Stack configuration
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
# log Llama Stack configuration (sanitized)
logger.debug(
"Llama stack config: %s",
configuration.llama_stack_configuration.model_dump(
mode="json", exclude={"api_key"}, exclude_none=True
),
)
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 179-181, the code currently logs the
entire Llama stack configuration at INFO which may expose secrets; change this
to log a sanitized representation at DEBUG instead: build a JSON-safe dict from
configuration that excludes or redacts secret-bearing fields (exclude fields
named like '*secret*' or instances of SecretStr/SecretBytes, or use pydantic's
.dict(exclude=...) if applicable), then logger.debug the sanitized JSON view and
remove or downgrade the existing logger.info call so no full config is logged at
INFO.

Comment on lines +555 to 557
# log Llama Stack configuration
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Same logging concern: sanitize and use DEBUG.

Mirror the sanitized, lower-verbosity logging to avoid leaking sensitive config details.

-    # log Llama Stack configuration
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # log Llama Stack configuration (sanitized)
+    logger.debug(
+        "Llama stack config: %s",
+        configuration.llama_stack_configuration.model_dump(
+            mode="json", exclude={"api_key"}, exclude_none=True
+        ),
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# log Llama Stack configuration
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
# log Llama Stack configuration (sanitized)
logger.debug(
"Llama stack config: %s",
configuration.llama_stack_configuration.model_dump(
mode="json",
exclude={"api_key"},
exclude_none=True,
),
)

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit 3c0c98b into lightspeed-core:main Sep 4, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants