fix(assail): strip C-family line comments before cross-lang URL detection#52
Merged
Merged
Conversation
…etection Self-scan repro: a doc-comment line like /// types: ["http://hyperpolymath.dev/panic-attack/AssailReport"] was flagging InsecureProtocol PA-IP. The string in the comment is an illustrative JSON-LD `@type` namespace URI, not a configured endpoint — it has no runtime effect, but the cross-language detector's regex matched it anyway because `analyze_cross_language` was being passed the raw file content (no comment stripping). Fix: new `strip_c_family_line_comments` helper, applied to the content before the http://-URL regex runs in `analyze_cross_language`. The helper detects `//` (and `///`, `//!` by extension), respects string literals (so `"http://localhost"` is preserved), and handles escape sequences inside strings. Naturally covers Rust, JavaScript, TypeScript, Java, C, C++, Go — every language whose comment syntax is `//`. Python `#`, Lisp `;`, Lua/Idris `--` etc. are not (yet) language-aware; the cross-language detector remains best-effort and could be extended per file_path extension in a follow-up. Limits: - Block comments (/* */) and raw-string literals (r#"..."#) are not consumed here. The detector's existing localhost exemption + this line-comment strip handle the bulk of FPs in practice. - A real string-literal URL like `"http://example.com"` is STILL flagged — the regex sees through the string. That's correct: a hardcoded HTTP endpoint in production code is the signal we want. - JSON-LD `@type` URIs that genuinely live in code (not in comments) remain a TP from the regex's perspective; suppress them via the user-classification registry if they're audited. Regression coverage: 6 new tests in `assail::analyzer::tests` covering basic `//`, doc-comments, string-literal preservation, escape sequences, and the self-scan repro patterns (doc-URL stripped / JSON-LD-string preserved). Test URLs use http://localhost so panic-attack scanning its own source under self-scan doesn't trip the InsecureProtocol detector on the test data. Verification: - cargo test --bin panic-attack --features signing,http — 242 passed (was 236; +6 new tests, no failures) - cargo clippy --all-targets --features signing,http -D warnings — clean - cargo fmt --check — clean - self-scan before: 2 InsecureProtocol findings in storage/mod.rs (1 doc-comment FP, 1 JSON-LD literal — out of scope) - self-scan after: 1 InsecureProtocol finding in storage/mod.rs (the JSON-LD literal remains; the doc-comment FP is gone) - self-scan total findings: 11 (down from 12 yesterday; was already 11 after #51, this PR preserves the count while fixing a categorical FP class) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
hyperpolymath
added a commit
that referenced
this pull request
May 27, 2026
…ureProtocol
The cross-language InsecureProtocol detector was flagging JSON-LD `@type`,
`@id`, `@context` namespace URIs and JSON-Schema `$schema` identifiers
as if they were configured HTTP endpoints. They are not: per spec, those
URIs are namespace identifiers (often historical `http://` even for
schemas served over HTTPS or not at all) and are never dereferenced at
runtime.
Choice rationale (vs verisimdb / user-classification registry):
- VeriSimDB is storage + query, not a classifier — it cannot pre-empt
an FP at detection time; it would just persist the FP and need a
downstream rule.
- The user-classification registry (`audits/assail-classifications.a2ml`)
is the right tool for per-instance audited TPs (`UnsafeCode in
zig_bridge.rs §1` etc.), but JSON-LD identifier URIs are a
CATEGORICAL false-positive class shared by every JSON-LD / JSON-Schema
consumer in the estate. Suppressing categorically in the detector
removes a recurring tax across the whole repo set.
Fix: new `RE_HTTP_JSONLD_IDENTIFIER` regex matches the standard
JSON-LD / JSON-Schema identifier keys (scalar or array form) and
subtracts those hits from the total before reporting. Both shapes
are covered:
{"@type": "http://..."}
{"types": ["http://..."]}
{"$schema": "http://..."}
Exempted keys: @id, @type, @context, @vocab, @graph (JSON-LD);
id, type, types (common shorthands); $schema, $id, $ref (JSON Schema).
Genuine endpoints remain flagged. A field keyed `"url"`, `"endpoint"`,
`"api_url"` etc. is not in the exempt set, so a real config URL like
`{"url": "http://insecure.example.com"}` still produces a finding.
Test fixtures use a runtime-composed URL (`format!("htt{}p://...","")`)
so the test source itself contains no literal `http://[alphanum]`
substring — this prevents a meta-circular finding when panic-attack
scans its own analyzer.rs.
Verification:
- cargo test --bin panic-attack --features signing,http — 249 passed,
0 failed (+7 new tests: 4 JSON-LD exempt cases + JSON Schema + 2
inverse "still-flagged" invariants)
- cargo clippy --all-targets --features signing,http -D warnings — clean
- cargo fmt --check — clean
- Self-scan progression (cumulative across this session):
baseline: 12 findings (1 Critical UnboundedAlloc, 2 InsecureProtocol FPs)
after #51: 11 findings (Critical resolved)
after #52: 11 findings (1 doc-comment InsecureProtocol FP resolved;
1 JSON-LD literal FP remained)
after THIS: 10 findings (last InsecureProtocol FP resolved; all
10 remaining are intentional — test
unwraps, examples/vulnerable_program
unsafe blocks, etc.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath
added a commit
that referenced
this pull request
May 27, 2026
…ureProtocol (#53) ## Summary The cross-language InsecureProtocol detector was flagging JSON-LD `@type`, `@id`, `@context` namespace URIs and JSON-Schema `$schema` identifiers as if they were configured HTTP endpoints. They aren't: per spec, those URIs are namespace identifiers (often historical `http://` even for schemas served over HTTPS or not at all) and are never dereferenced at runtime. **Stacked on #52** (line-comment stripper) — that PR handled doc-comment FPs; this one handles JSON-LD literal FPs that survive into runtime code. ## Why a detector enhancement rather than verisimdb or the user-classification registry | Approach | Verdict | |---|---| | **VeriSimDB** | Storage + query, not a classifier. Cannot pre-empt FP at detection time — would persist the FP and need a downstream rule. Wrong layer. | | **User-classification registry** (`audits/assail-classifications.a2ml`) | Right tool for per-instance audited TPs (\"UnsafeCode in `zig_bridge.rs` §1\"). Wrong tool for a **categorical** FP class shared by every JSON-LD / JSON-Schema consumer in the estate — would require N entries across N repos. | | **Detector enhancement** (this PR) | Removes the recurring tax across every estate repo with a single rule. The FP class is well-defined by spec, low-risk to suppress. | ## Fix New `RE_HTTP_JSONLD_IDENTIFIER` regex that matches the standard JSON-LD / JSON-Schema identifier keys (scalar or array form) and subtracts those hits from the InsecureProtocol total before reporting. | Pattern | Subtracted | |---|---| | `\"@type\": \"http://...\"` | ✓ | | `\"@id\": \"http://...\"` | ✓ | | `\"@context\": \"http://...\"` | ✓ | | `\"types\": [\"http://...\"]` | ✓ | | `\"$schema\": \"http://...\"` | ✓ | | `\"url\": \"http://...\"` | ✗ — not an identifier key | | `client.get(\"http://...\")` | ✗ — bare endpoint | Exempted keys: `@id`, `@type`, `@context`, `@vocab`, `@graph` (JSON-LD); `id`, `type`, `types` (common shorthands); `$schema`, `$id`, `$ref` (JSON Schema). ## Regression coverage 7 new tests in `assail::analyzer::tests` (via a shared `count_http_findings` test helper): | Test | Asserts | |---|---| | `jsonld_at_type_uri_is_exempt` | `{\"@type\": \"http://...\"}` → 0 findings | | `jsonld_at_id_uri_is_exempt` | `{\"@id\": \"http://...\"}` → 0 findings | | `jsonld_at_context_uri_is_exempt` | `{\"@context\": \"http://...\"}` → 0 findings | | `jsonld_types_array_is_exempt` | `{\"types\": [\"http://...\"]}` → 0 findings (exact self-scan repro) | | `json_schema_dollar_schema_is_exempt` | `{\"$schema\": \"http://...\"}` → 0 findings | | `real_endpoint_url_is_still_flagged` | `client.get(\"http://...\")` → >0 findings (invariant) | | `endpoint_key_named_url_is_still_flagged` | `{\"url\": \"http://...\"}` → >0 findings (invariant) | Test URLs are runtime-composed (`format!(\"htt{}p://...\", \"\")`) so the source itself contains no literal `http://[alphanum]` substring — prevents a meta-circular self-scan finding when panic-attack scans its own `analyzer.rs`. ## Verification - [x] `cargo test --bin panic-attack --features signing,http` — **249 passed** (+7 new tests) - [x] `cargo clippy --all-targets --features signing,http -- -D warnings` — clean - [x] `cargo fmt --check` — clean - [x] **Self-scan progression** (cumulative): - baseline: 12 findings (1 Critical UnboundedAlloc, 2 InsecureProtocol FPs) - after #51: 11 findings (Critical resolved) - after #52: 11 findings (doc-comment InsecureProtocol FP resolved; JSON-LD literal FP remained) - **after this PR: 10 findings (last InsecureProtocol FP resolved)** All 10 remaining findings are intentional (test unwraps, `examples/vulnerable_program.rs` unsafe blocks, scaffold `flake.nix`, etc.). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Self-scan repro: a doc-comment line like
/// types: [\"http://hyperpolymath.dev/panic-attack/AssailReport\"]was flagging InsecureProtocol (PA-IP). The string in the comment is an illustrative JSON-LD
@typenamespace URI, not a configured endpoint — it has no runtime effect, but the cross-language detector's regex matched it anyway becauseanalyze_cross_languagewas being passed the raw file content (no comment stripping).Fix
New
strip_c_family_line_commentshelper, applied to the content before thehttp://-URL regex runs inanalyze_cross_language. The helper://(and///,//!by extension)\"http://localhost\"is preserved)\\\") inside stringsNaturally covers Rust, JavaScript, TypeScript, Java, C, C++, Go — every language whose comment syntax is
//. Python#, Lisp;, Lua/Idris--etc. are not (yet) language-aware; cross-language remains best-effort and could be extended per file_path extension in a follow-up.Out of scope
/* */) and raw-string literals (r#\"...\"#) are not consumed here. The existing localhost exemption + this line-comment strip handle the bulk of FPs in practice.\"http://example.com\"is STILL flagged — the regex sees through the string. That's correct: a hardcoded HTTP endpoint in production code is the signal we want.@typeURIs that genuinely live in code (not in comments) remain a TP from the regex's perspective; suppress via the user-classification registry if audited.Regression coverage
6 new tests in
assail::analyzer::tests:strip_c_family_line_comment_handles_basic_double_slash//comment is droppedstrip_c_family_line_comment_handles_doc_comments///and//!lines are droppedstrip_c_family_line_comment_preserves_urls_in_strings\"http://localhost/path\"survives intactstrip_c_family_line_comment_handles_escaped_quote_in_string\\\"hi\\\"doesn't confuse the string trackerstrip_c_family_line_comments_doc_comment_url_fp_gonestrip_c_family_line_comments_keeps_jsonld_type_stringNote: test URLs use
http://localhostso panic-attack scanning its own source doesn't trip the InsecureProtocol detector on the test data itself. Same exemption rule used in production.Verification
cargo test --bin panic-attack --features signing,http— 242 passed (was 236; +6 new tests)cargo clippy --all-targets --features signing,http -- -D warnings— cleancargo fmt --check— cleanstorage/mod.rs(1 doc-comment FP, 1 JSON-LD literal — out of scope)storage/mod.rs(the JSON-LD literal remains; the doc-comment FP is gone)🤖 Generated with Claude Code