feat(logs): secret-redaction core for the Logs app#1558
Conversation
The security foundation of the Logs app (docs/design/logs-app.md): every log line that leaves the box through the coming system-logs API passes through redact() first, so an operator copying a bug-report bundle into a public issue cannot leak a credential a dependency or stack trace happened to log. Redacts by pattern (key=value, --flag value, Authorization: Bearer, PEM private-key blocks, connection-string passwords, and bare provider-key shapes sk-/ghp_/xoxb-/AKIA) and by known literal secret value (the exact strings from the secrets store). Fails closed: known values under 6 chars are ignored to avoid runaway redaction. Pure functions, 21 tests. Test fixtures are deliberately synthetic (EXAMPLE-shaped) so they exercise the regexes without tripping secret scanning. Routes + frontend build on this next.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
Next review available in: 50 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR introduces a new ChangesLog Redaction Module
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant Caller
participant redact_lines
participant redact
participant RegexRules
Caller->>redact_lines: lines, known_values
loop each line
redact_lines->>redact: text, known_values
redact->>RegexRules: apply PEM, connection string, bearer, flag, key/value, token-shape rules
RegexRules-->>redact: masked text
redact->>redact: mask surviving known_values tokens
redact-->>redact_lines: redacted line
end
redact_lines-->>Caller: redacted lines
Related Issues: None found in the provided context. Related PRs: None found in the provided context. Suggested labels: security, enhancement, tests Suggested reviewers: None specified. 🐰 A whisker twitch, a careful sweep, 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
||
| # Authorization: Bearer <token> (header form; the key-value rule catches the | ||
| # "authorization=" form, this catches the header " Bearer <tok>" shape). | ||
| _BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-]{8,})', re.IGNORECASE) |
There was a problem hiding this comment.
CRITICAL: Bearer-token value class misses characters that commonly appear in real tokens.
The value character class is [A-Za-z0-9._\-]{8,}. It does not allow +, /, or =, which are the trailing three characters of standard base64/base64url encodings. A Bearer header like Authorization: Bearer abc123def+456ghi= would split at the first illegal character: the value class would only see abc123def (and, depending on its length, may not even satisfy {8,}), so the token would either be partially masked or not masked at all.
This is exactly the shape used by AWS session tokens, many OAuth2 access tokens, and other application/x-www-form-urlencoded-friendly credentials. A bypass here defeats the purpose of the header form.
| _BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-]{8,})', re.IGNORECASE) | |
| _BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})', re.IGNORECASE) |
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
|
|
||
| # Provider-key shapes that are secrets on their own with no key= prefix: | ||
| # sk-..., sk-taos-..., ghp_/gho_/ghs_ (GitHub), xoxb-/xoxp- (Slack), AKIA... (AWS). | ||
| _TOKEN_SHAPE_RE = re.compile( |
There was a problem hiding this comment.
WARNING: No detection for JWT-style tokens or other common bare-secret shapes.
_TOKEN_SHAPE_RE only covers sk-, gh[posru]_, xox[baprs]-, and AKIA…. It misses several categories of secrets that frequently end up in logs without a key= prefix:
- JWTs: many services log
Authorization: eyJhbGciOi…(noBearerscheme word) or emit the token as a bare value. The KV rule may catch some of these when a sensitive key precedes them, but a JWT inside a JSON body field that is not in_SENSITIVE_KEYS(e.g.id_token,refresh_token,assertion,saml,idp_token) would slip through. - Stripe live keys (
sk_live_…), Google API keys (AIza…), Slack legacy (xoxs-is covered butxoxa-/xoxp-may over-match short identifiers), npm tokens (npm_…), PyPI tokens (pypi-…), SendGrid (SG.…), Mailgun (key-…).
At minimum, consider adding eyJ[A-Za-z0-9_\-]{10,}\.eyJ[A-Za-z0-9_\-]{10,}\.[A-Za-z0-9_\-]{10,} for JWTs and an AIza[0-9A-Za-z_\-]{35} rule for Google keys, plus expanding _SENSITIVE_KEYS to include id_token, refresh_token, assertion, jwt.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
| for val in sorted((v for v in known_values if v), key=len, reverse=True): | ||
| if len(val) < _MIN_KNOWN_VALUE_LEN: | ||
| continue | ||
| text = text.replace(val, PLACEHOLDER) |
There was a problem hiding this comment.
WARNING: known_values are replaced as raw substrings, which can corrupt unrelated text and cause partial leaks.
text.replace(val, PLACEHOLDER) will mask the secret wherever it appears, including inside longer unrelated strings. Two concrete risks:
- Partial-leak context. If the known secret is
abcdef123456and a log line containsrequest_id=abcdef1234567890, the result becomesrequest_id=[REDACTED]7890, leaking both a 6-char prefix (which the rest of the system may consider identifying) and the trailing digits. - False positives in hashes/paths. A 6-char-or-longer known value that happens to appear in a hex hash, URL path, or UUID fragment will be masked, producing garbled output (
[REDACTED]7890-style artefacts) and degrading the diagnostic value of the logs without actually removing the secret.
Consider matching on word boundaries (re.sub(r'\\b' + re.escape(val) + r'\\b', PLACEHOLDER, text)) or, better, masking the whole surrounding token (e.g. \S* on either side) so partial matches either don't redact or redact the entire context that contains the secret.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
| # whitespace, comma, or line end). | ||
| _KV_RE = re.compile( | ||
| r'(?P<pre>["\']?(?:' + _KEY_ALT + r')["\']?\s*[:=]\s*)' | ||
| r'(?P<quote>["\']?)(?P<val>[^\s,"\'}{]+)(?P=quote)', |
There was a problem hiding this comment.
SUGGESTION: _KV_RE value character class is too narrow for some legitimate secret formats.
[^\s,"'}]{]+ stops at <, >, ), ], and ]. A value such as password=<redacted-by-caller> (e.g. an HTML-escaped or shell-quoted token), a URL like redirect_uri=https://app.example.com/cb?x=1&y=2, or anything containing brackets will be truncated mid-value. While the rule itself isn't a security bypass (the start of the secret still gets masked), it means a non-empty tail of the secret can be left in the log.
Consider broadening to [^\s"'}{]+ (or even \S+) and trusting the closing quote/comma heuristics via the existing (?P=quote) group, with a length cap to avoid runaway matches on pathological input.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
|
|
||
|
|
||
| def redact_lines(lines: "list[str]", known_values: "list[str] | None" = None) -> "list[str]": | ||
| """Redact a list of log lines (convenience for the paged log reader).""" |
There was a problem hiding this comment.
SUGGESTION: redact_lines will crash on None or non-string elements.
The list comprehension [redact(line, known_values) for line in lines] passes each element straight to redact, which does text = _PEM_RE.sub(...). re.Pattern.sub on None raises TypeError. Since the docstring advertises this as the entry point for the paged log reader, a single malformed line (e.g. from a buggy upstream parser emitting None) will take down the whole page response rather than producing a partially-redacted result. Consider coercing or guarding: for line in lines if isinstance(line, str).
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
|
|
||
| # PEM private-key blocks (SSH keys materialized on deploy, TLS keys). | ||
| _PEM_RE = re.compile( | ||
| r'-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----.*?-----END [A-Z0-9 ]*PRIVATE KEY-----', |
There was a problem hiding this comment.
SUGGESTION: PEM rule swallows surrounding log context, not just the key block.
_PEM_RE is replaced with a single PLACEHOLDER, but the rule's match is the entire span from -----BEGIN … to -----END … (with .*? and DOTALL). That is correct for masking, but the loss is greater than necessary when a stack trace logs the key material inline — the exception type, file, line number, and the message framing around the key are all gone. Consider preserving the framing markers and masking only the base64 body, e.g. replace -----BEGIN …-----\n.*?-----END …----- so -----BEGIN OPENSSH PRIVATE KEY----- and -----END …----- remain visible and only the bytes between them are replaced.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
|
|
||
| class TestBearer: | ||
| def test_header(self): | ||
| assert redact("Authorization: Bearer abc123def456ghi") == \ |
There was a problem hiding this comment.
SUGGESTION: Test coverage is light for the documented threat model.
For security-critical code that is described as "nothing in the logs path may bypass this", the suite is missing negative tests that would catch the most likely regressions:
- JWT in an
Authorizationheader without the wordBearer(e.g.Authorization: eyJhbGciOi…). - Base64-shaped bearer token containing
+,/,=(the exact bypass flagged onlog_redaction.py:55). - A known secret value that appears as a substring of a longer unrelated identifier (the partial-leak case flagged on
log_redaction.py:116). - Connection string with a percent-encoded password or with
@in the userinfo. - Multi-line PEM block embedded in a stack trace (the test uses a clean block, but real-world logs interleave it with exception frames).
redact_linescalled with a list containingNoneor empty strings.
Adding these would lock in the intended behaviour against the regression modes that are cheapest to introduce and hardest to spot in code review.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Files Reviewed (2 files)
Previous Review Summaries (2 snapshots, latest commit 06d42a3)Current summary above is authoritative. Previous snapshots are kept for context only. Previous review (commit 06d42a3)Status: 3 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)SUGGESTION
Files Reviewed (2 files)
Fix these issues in Kilo Cloud Previous review (commit 44cfcd5)Status: 7 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)CRITICAL
WARNING
SUGGESTION
Files Reviewed (2 files)
Reviewed by minimax-m3 · Input: 36.8K · Output: 1.9K · Cached: 167K |
…token known-value mask) - CRITICAL: the Bearer value class dropped + / = so base64/base64url tokens (AWS session, OAuth2) were truncated at the first illegal char and leaked their tail; the class now includes the base64 set - add bare-token shapes commonly logged without a key= prefix: JWT, Google AIza, Stripe sk_live/test, npm_, SendGrid SG.; add id_token/refresh_token/ access_token/jwt/assertion to the sensitive-key list - known-value masking now replaces the WHOLE surrounding token, so a secret that is a prefix of a longer identifier cannot leak the trailing part or produce garbled output - PEM rule keeps the BEGIN/END markers, masking only the key body, so a key in a stack trace still shows what was redacted - broaden the key-value value class (keep URLs/brackets, cap length) so a token is not truncated mid-string - redact_lines coerces non-string elements so one bad line cannot crash a page 7 fold-regression tests added; 28 total.
| r'|\bAIza[0-9A-Za-z_\-]{35}\b' # Google API key | ||
| r'|\bnpm_[A-Za-z0-9]{20,}' # npm token | ||
| r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}' # SendGrid | ||
| r'|\beyJ[A-Za-z0-9_\-]{8,}\.eyJ[A-Za-z0-9_\-]{6,}\.[A-Za-z0-9_\-]{6,}' # JWT |
There was a problem hiding this comment.
SUGGESTION: JWT segment character class is missing +, /, and =, which can truncate a real JWT that uses non-strict base64.
The class [A-Za-z0-9_\-] is exactly the base64url alphabet, so a spec-compliant RFC 7519 JWT is matched in full. But a number of real-world issuers (legacy OAuth2 servers, some IdPs, hand-rolled libraries) emit JWTs that use standard base64 with +, /, and = padding in the signature segment. A token like eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.abc+def/ghi= would match the header and payload but stop at the first + in the signature, leaking the tail. The rule above already widened the Bearer class to include +/= for the same reason — apply the same reasoning here.
| r'|\beyJ[A-Za-z0-9_\-]{8,}\.eyJ[A-Za-z0-9_\-]{6,}\.[A-Za-z0-9_\-]{6,}' # JWT | |
| r'|\beyJ[A-Za-z0-9_\-+/=]{8,}\.eyJ[A-Za-z0-9_\-+/=]{6,}\.[A-Za-z0-9_\-+/=]{6,}' # JWT |
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
| r'|\bAKIA[0-9A-Z]{16}\b' # AWS access key id | ||
| r'|\bAIza[0-9A-Za-z_\-]{35}\b' # Google API key | ||
| r'|\bnpm_[A-Za-z0-9]{20,}' # npm token | ||
| r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}' # SendGrid |
There was a problem hiding this comment.
SUGGESTION: SendGrid branch is missing \b anchors and could over-match inside a longer identifier.
Every other branch in _TOKEN_SHAPE_RE is wrapped in \b…\b (or \b… at the start when the suffix already provides a boundary). SG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,} has no \b at either end, so a string like MYAPP_SG.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.bbbbbbbbbbbbbbbbbbbbbbbbbbbb will be redacted even though the surrounding MYAPP_ makes it clearly not a SendGrid key. Add \b at the start (between word boundary and SG) and at the end of the regex.
| r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}' # SendGrid | |
| r'|\bSG\.[A-Za-z0-9_\-]{16,}\.[A-Za-z0-9_\-]{16,}\b' # SendGrid |
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
|
|
||
| def test_bare_jwt_masked(self): | ||
| jwt = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4" | ||
| out = redact(f"id_token={jwt}") |
There was a problem hiding this comment.
SUGGESTION: test_bare_jwt_masked does not actually exercise the new bare-JWT branch of _TOKEN_SHAPE_RE.
The input is f"id_token={jwt}". Because id_token is in _SENSITIVE_KEYS, the key-value rule fires first and the whole JWT is consumed as the KV value — the test would still pass even if the JWT branch of _TOKEN_SHAPE_RE were deleted. The whole point of the new JWT rule is to catch tokens logged without any key= prefix (e.g. a service logging received: eyJhbGciOi…); the current test does not cover that path. Add an assertion with a bare JWT and no sensitive key prefix to lock in the new behavior.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_log_redaction.py (1)
55-65: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winSynthetic fixtures still trip the secret scanner, contradicting the stated test design goal.
Betterleaks flags the JWT (line 117), both PEM blocks (lines 57-59, 137-139), and the Stripe key (line 125) as real secret patterns. This file's own
ghp_/xoxb-/AKIAfixtures already avoid this by using literalEXAMPLEplaceholder text instead of realistic-looking random data — the same approach should be applied here to actually meet the PR's goal of exercising these regexes "without triggering secret scanning."♻️ Example rewrite using placeholder text instead of realistic-looking payloads
- jwt = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.SflKxwRJSMeKKF2QT4" + jwt = "eyEXAMPLE.eyEXAMPLEPAYLOAD.EXAMPLESIGNATURE" ... - def test_stripe_live_key(self): - assert PLACEHOLDER in redact("sk_live_0000000000000000abcdef") + def test_stripe_live_key(self): + assert PLACEHOLDER in redact("sk_live_EXAMPLEEXAMPLEEXAMPLE")The PEM blocks are trickier since the base64 body shape itself is what triggers detection regardless of content realism — worth confirming with the secret-scanning tool's allowlist/
nosec-style annotation support if the body must remain base64-shaped.Also applies to: 116-125, 135-145
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_log_redaction.py` around lines 55 - 65, The synthetic secret fixtures in test_log_redaction still look like real secrets, causing the scanner to flag the JWT, PEM blocks, and Stripe key despite the test’s goal; update the fixtures in test_private_key_block and the related secret test cases to use obvious placeholder text like the existing ghp_/xoxb_/AKIA examples instead of realistic payloads, and for the PEM-style cases either replace the base64-shaped body with non-secret placeholder content or add an appropriate test-only allowlist/no-scan annotation if that shape must be preserved.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tinyagentos/log_redaction.py`:
- Line 87: The connection-string redaction regex in _CONN_STR_RE only matches
userinfo when a username is present, so URLs like redis://:password@host bypass
masking. Update the pattern to make the username portion optional while still
capturing the password and @ delimiter, and verify the redaction path in
log_redaction.py continues to use _CONN_STR_RE for these cases.
- Around line 51-59: The Authorization redaction only handles Bearer tokens, so
schemes in _AUTH_SCHEMES like basic, digest, token, and negotiate can still leak
credentials after _kv_repl leaves the scheme word untouched. Update _BEARER_RE
in tinyagentos/log_redaction.py (and its redaction call site if needed) so the
follow-up mask covers any Authorization scheme listed in _AUTH_SCHEMES, not just
bearer, while keeping the existing scheme-detection logic in _kv_repl aligned
with that behavior.
---
Nitpick comments:
In `@tests/test_log_redaction.py`:
- Around line 55-65: The synthetic secret fixtures in test_log_redaction still
look like real secrets, causing the scanner to flag the JWT, PEM blocks, and
Stripe key despite the test’s goal; update the fixtures in
test_private_key_block and the related secret test cases to use obvious
placeholder text like the existing ghp_/xoxb_/AKIA examples instead of realistic
payloads, and for the PEM-style cases either replace the base64-shaped body with
non-secret placeholder content or add an appropriate test-only allowlist/no-scan
annotation if that shape must be preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ebe56ea0-6063-4a50-89a0-a55492a16fa8
📒 Files selected for processing (2)
tests/test_log_redaction.pytinyagentos/log_redaction.py
| # Auth SCHEME words that legitimately follow "authorization:"; the real secret | ||
| # is the NEXT token (handled by the bearer rule), so the KV rule must not treat | ||
| # the scheme word itself as the value and stop there, leaving the token exposed. | ||
| _AUTH_SCHEMES = {"bearer", "basic", "digest", "token", "negotiate"} | ||
|
|
||
| # Authorization: Bearer <token> (header form). The value class includes the | ||
| # base64/base64url trailing set (+ / =) so AWS session tokens and OAuth2 | ||
| # access tokens are masked whole, not truncated at the first + or /. | ||
| _BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})', re.IGNORECASE) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Non-Bearer auth schemes leak their credential.
_kv_repl deliberately returns the match unchanged when the KV value is an auth scheme word (Lines 113-114), trusting a follow-up rule to mask the real token. But only _BEARER_RE exists, so basic, digest, token, and negotiate have no follow-up. Authorization: Basic dXNlcjpwYXNzd29yZA== therefore survives every rule and the base64 user:password is logged verbatim — a direct redaction bypass for HTTP Basic auth (and GitHub-style Authorization: token …).
Broaden the header rule to cover all schemes in _AUTH_SCHEMES rather than only bearer:
🔒 Proposed fix: mask the token after any auth scheme
-# Authorization: Bearer <token> (header form). The value class includes the
-# base64/base64url trailing set (+ / =) so AWS session tokens and OAuth2
-# access tokens are masked whole, not truncated at the first + or /.
-_BEARER_RE = re.compile(r'(?P<pre>bearer\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})', re.IGNORECASE)
+# Authorization: <scheme> <token> (header form). Covers every scheme in
+# _AUTH_SCHEMES so Basic/Digest/Token/Negotiate credentials are masked too,
+# not just Bearer. The value class includes the base64/base64url trailing set
+# (+ / =) so AWS session tokens and OAuth2 access tokens are masked whole.
+_AUTH_TOKEN_RE = re.compile(
+ r'(?P<pre>(?:bearer|basic|digest|token|negotiate)\s+)(?P<val>[A-Za-z0-9._\-+/=]{8,})',
+ re.IGNORECASE,
+)And update the application site:
- text = _BEARER_RE.sub(lambda m: m.group("pre") + PLACEHOLDER, text)
+ text = _AUTH_TOKEN_RE.sub(lambda m: m.group("pre") + PLACEHOLDER, text)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tinyagentos/log_redaction.py` around lines 51 - 59, The Authorization
redaction only handles Bearer tokens, so schemes in _AUTH_SCHEMES like basic,
digest, token, and negotiate can still leak credentials after _kv_repl leaves
the scheme word untouched. Update _BEARER_RE in tinyagentos/log_redaction.py
(and its redaction call site if needed) so the follow-up mask covers any
Authorization scheme listed in _AUTH_SCHEMES, not just bearer, while keeping the
existing scheme-detection logic in _kv_repl aligned with that behavior.
|
|
||
| # postgres://user:pass@host, mysql://..., redis://..., amqp:// -- mask the | ||
| # password component only. | ||
| _CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]+:)(?P<val>[^@\s]+)(?P<post>@)') |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win
Connection strings with an empty username leak the password.
[^:/\s]+ requires at least one character before the :password@, so URLs of the form redis://:s3cretpass@host (empty user, common for Redis/AMQP) never match and the password survives. Make the userinfo prefix optional.
🔒 Proposed fix
-_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]+:)(?P<val>[^@\s]+)(?P<post>@)')
+_CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]*:)(?P<val>[^@\s]+)(?P<post>@)')📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]+:)(?P<val>[^@\s]+)(?P<post>@)') | |
| _CONN_STR_RE = re.compile(r'(?P<pre>[a-zA-Z][a-zA-Z0-9+.\-]*://[^:/\s]*:)(?P<val>[^@\s]+)(?P<post>@)') |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tinyagentos/log_redaction.py` at line 87, The connection-string redaction
regex in _CONN_STR_RE only matches userinfo when a username is present, so URLs
like redis://:password@host bypass masking. Update the pattern to make the
username portion optional while still capturing the password and @ delimiter,
and verify the redaction path in log_redaction.py continues to use _CONN_STR_RE
for these cases.
- JWT shape now tolerates standard-base64 (+ / =) segments so a non-strict encoder's token is masked whole instead of truncated at the first + or / - SendGrid branch gets a trailing word boundary to match the sibling AWS/Google patterns and avoid over-matching into a longer identifier - test_bare_jwt_masked now feeds a bare JWT in prose (no key= prefix, no Bearer word) so it actually exercises the standalone JWT branch, and asserts the placeholder is present
First piece of the Logs app (#1548 remaining halves; design in docs/design/logs-app.md). A pure-function redaction layer that every log line will pass through before it leaves the box via the coming system-logs API, so a bug-report bundle pasted into a public issue cannot leak a credential.
Covers key=value / --flag value / Authorization: Bearer / PEM private-key blocks / connection-string passwords / bare provider-key shapes (sk-, ghp_, xoxb-, AKIA), plus masking of known literal secret values from the secrets store. Fails closed (ignores <6-char known values to avoid runaway redaction). 21 tests; fixtures are deliberately synthetic so they exercise the regexes without tripping secret scanning.
Security-critical foundation, kept with the orchestrator since nothing in the logs path may bypass it. The system-logs routes (paged read + SSE tail + bug-report bundle) and the LogsApp frontend build on this next.
Summary by CodeRabbit
New Features
Tests