fix(assail): strip C-family line comments before cross-lang URL detection by hyperpolymath · Pull Request #52 · hyperpolymath/panic-attack

hyperpolymath · 2026-05-26T10:55:11Z

Summary

Self-scan repro: a doc-comment line like

/// types: [\"http://hyperpolymath.dev/panic-attack/AssailReport\"]

was flagging InsecureProtocol (PA-IP). The string in the comment is an illustrative JSON-LD @type namespace URI, not a configured endpoint — it has no runtime effect, but the cross-language detector's regex matched it anyway because analyze_cross_language was being passed the raw file content (no comment stripping).

Fix

New strip_c_family_line_comments helper, applied to the content before the http://-URL regex runs in analyze_cross_language. The helper:

Detects // (and ///, //! by extension)
Respects string literals (so \"http://localhost\" is preserved)
Handles escape sequences (\\\") inside strings

Naturally covers Rust, JavaScript, TypeScript, Java, C, C++, Go — every language whose comment syntax is //. Python #, Lisp ;, Lua/Idris -- etc. are not (yet) language-aware; cross-language remains best-effort and could be extended per file_path extension in a follow-up.

Out of scope

Block comments (/* */) and raw-string literals (r#\"...\"#) are not consumed here. The existing localhost exemption + this line-comment strip handle the bulk of FPs in practice.
A real string-literal URL like \"http://example.com\" is STILL flagged — the regex sees through the string. That's correct: a hardcoded HTTP endpoint in production code is the signal we want.
JSON-LD @type URIs that genuinely live in code (not in comments) remain a TP from the regex's perspective; suppress via the user-classification registry if audited.

Regression coverage

6 new tests in assail::analyzer::tests:

Test	Asserts
`strip_c_family_line_comment_handles_basic_double_slash`	Trailing `//` comment is dropped
`strip_c_family_line_comment_handles_doc_comments`	`///` and `//!` lines are dropped
`strip_c_family_line_comment_preserves_urls_in_strings`	`\"http://localhost/path\"` survives intact
`strip_c_family_line_comment_handles_escaped_quote_in_string`	`\\\"hi\\\"` doesn't confuse the string tracker
`strip_c_family_line_comments_doc_comment_url_fp_gone`	Self-scan repro: doc-URL → stripped
`strip_c_family_line_comments_keeps_jsonld_type_string`	Self-scan companion: JSON-LD string-literal URL → preserved (out of scope, but the stripper must not over-fire)

Note: test URLs use http://localhost so panic-attack scanning its own source doesn't trip the InsecureProtocol detector on the test data itself. Same exemption rule used in production.

Verification

cargo test --bin panic-attack --features signing,http — 242 passed (was 236; +6 new tests)
cargo clippy --all-targets --features signing,http -- -D warnings — clean
cargo fmt --check — clean
Self-scan before: 2 InsecureProtocol findings in storage/mod.rs (1 doc-comment FP, 1 JSON-LD literal — out of scope)
Self-scan after: 1 InsecureProtocol finding in storage/mod.rs (the JSON-LD literal remains; the doc-comment FP is gone)
No new findings introduced anywhere
GPG-signed commit

🤖 Generated with Claude Code

…etection Self-scan repro: a doc-comment line like /// types: ["http://hyperpolymath.dev/panic-attack/AssailReport"] was flagging InsecureProtocol PA-IP. The string in the comment is an illustrative JSON-LD `@type` namespace URI, not a configured endpoint — it has no runtime effect, but the cross-language detector's regex matched it anyway because `analyze_cross_language` was being passed the raw file content (no comment stripping). Fix: new `strip_c_family_line_comments` helper, applied to the content before the http://-URL regex runs in `analyze_cross_language`. The helper detects `//` (and `///`, `//!` by extension), respects string literals (so `"http://localhost"` is preserved), and handles escape sequences inside strings. Naturally covers Rust, JavaScript, TypeScript, Java, C, C++, Go — every language whose comment syntax is `//`. Python `#`, Lisp `;`, Lua/Idris `--` etc. are not (yet) language-aware; the cross-language detector remains best-effort and could be extended per file_path extension in a follow-up. Limits: - Block comments (/* */) and raw-string literals (r#"..."#) are not consumed here. The detector's existing localhost exemption + this line-comment strip handle the bulk of FPs in practice. - A real string-literal URL like `"http://example.com"` is STILL flagged — the regex sees through the string. That's correct: a hardcoded HTTP endpoint in production code is the signal we want. - JSON-LD `@type` URIs that genuinely live in code (not in comments) remain a TP from the regex's perspective; suppress them via the user-classification registry if they're audited. Regression coverage: 6 new tests in `assail::analyzer::tests` covering basic `//`, doc-comments, string-literal preservation, escape sequences, and the self-scan repro patterns (doc-URL stripped / JSON-LD-string preserved). Test URLs use http://localhost so panic-attack scanning its own source under self-scan doesn't trip the InsecureProtocol detector on the test data. Verification: - cargo test --bin panic-attack --features signing,http — 242 passed (was 236; +6 new tests, no failures) - cargo clippy --all-targets --features signing,http -D warnings — clean - cargo fmt --check — clean - self-scan before: 2 InsecureProtocol findings in storage/mod.rs (1 doc-comment FP, 1 JSON-LD literal — out of scope) - self-scan after: 1 InsecureProtocol finding in storage/mod.rs (the JSON-LD literal remains; the doc-comment FP is gone) - self-scan total findings: 11 (down from 12 yesterday; was already 11 after #51, this PR preserves the count while fixing a categorical FP class) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@type

…ureProtocol The cross-language InsecureProtocol detector was flagging JSON-LD `@type`, `@id`, `@context` namespace URIs and JSON-Schema `$schema` identifiers as if they were configured HTTP endpoints. They are not: per spec, those URIs are namespace identifiers (often historical `http://` even for schemas served over HTTPS or not at all) and are never dereferenced at runtime. Choice rationale (vs verisimdb / user-classification registry): - VeriSimDB is storage + query, not a classifier — it cannot pre-empt an FP at detection time; it would just persist the FP and need a downstream rule. - The user-classification registry (`audits/assail-classifications.a2ml`) is the right tool for per-instance audited TPs (`UnsafeCode in zig_bridge.rs §1` etc.), but JSON-LD identifier URIs are a CATEGORICAL false-positive class shared by every JSON-LD / JSON-Schema consumer in the estate. Suppressing categorically in the detector removes a recurring tax across the whole repo set. Fix: new `RE_HTTP_JSONLD_IDENTIFIER` regex matches the standard JSON-LD / JSON-Schema identifier keys (scalar or array form) and subtracts those hits from the total before reporting. Both shapes are covered: {"@type": "http://..."} {"types": ["http://..."]} {"$schema": "http://..."} Exempted keys: @id, @type, @context, @vocab, @graph (JSON-LD); id, type, types (common shorthands); $schema, $id, $ref (JSON Schema). Genuine endpoints remain flagged. A field keyed `"url"`, `"endpoint"`, `"api_url"` etc. is not in the exempt set, so a real config URL like `{"url": "http://insecure.example.com"}` still produces a finding. Test fixtures use a runtime-composed URL (`format!("htt{}p://...","")`) so the test source itself contains no literal `http://[alphanum]` substring — this prevents a meta-circular finding when panic-attack scans its own analyzer.rs. Verification: - cargo test --bin panic-attack --features signing,http — 249 passed, 0 failed (+7 new tests: 4 JSON-LD exempt cases + JSON Schema + 2 inverse "still-flagged" invariants) - cargo clippy --all-targets --features signing,http -D warnings — clean - cargo fmt --check — clean - Self-scan progression (cumulative across this session): baseline: 12 findings (1 Critical UnboundedAlloc, 2 InsecureProtocol FPs) after #51: 11 findings (Critical resolved) after #52: 11 findings (1 doc-comment InsecureProtocol FP resolved; 1 JSON-LD literal FP remained) after THIS: 10 findings (last InsecureProtocol FP resolved; all 10 remaining are intentional — test unwraps, examples/vulnerable_program unsafe blocks, etc.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@type

…ureProtocol (#53) ## Summary The cross-language InsecureProtocol detector was flagging JSON-LD `@type`, `@id`, `@context` namespace URIs and JSON-Schema `$schema` identifiers as if they were configured HTTP endpoints. They aren't: per spec, those URIs are namespace identifiers (often historical `http://` even for schemas served over HTTPS or not at all) and are never dereferenced at runtime. **Stacked on #52** (line-comment stripper) — that PR handled doc-comment FPs; this one handles JSON-LD literal FPs that survive into runtime code. ## Why a detector enhancement rather than verisimdb or the user-classification registry | Approach | Verdict | |---|---| | **VeriSimDB** | Storage + query, not a classifier. Cannot pre-empt FP at detection time — would persist the FP and need a downstream rule. Wrong layer. | | **User-classification registry** (`audits/assail-classifications.a2ml`) | Right tool for per-instance audited TPs (\"UnsafeCode in `zig_bridge.rs` §1\"). Wrong tool for a **categorical** FP class shared by every JSON-LD / JSON-Schema consumer in the estate — would require N entries across N repos. | | **Detector enhancement** (this PR) | Removes the recurring tax across every estate repo with a single rule. The FP class is well-defined by spec, low-risk to suppress. | ## Fix New `RE_HTTP_JSONLD_IDENTIFIER` regex that matches the standard JSON-LD / JSON-Schema identifier keys (scalar or array form) and subtracts those hits from the InsecureProtocol total before reporting. | Pattern | Subtracted | |---|---| | `\"@type\": \"http://...\"` | ✓ | | `\"@id\": \"http://...\"` | ✓ | | `\"@context\": \"http://...\"` | ✓ | | `\"types\": [\"http://...\"]` | ✓ | | `\"$schema\": \"http://...\"` | ✓ | | `\"url\": \"http://...\"` | ✗ — not an identifier key | | `client.get(\"http://...\")` | ✗ — bare endpoint | Exempted keys: `@id`, `@type`, `@context`, `@vocab`, `@graph` (JSON-LD); `id`, `type`, `types` (common shorthands); `$schema`, `$id`, `$ref` (JSON Schema). ## Regression coverage 7 new tests in `assail::analyzer::tests` (via a shared `count_http_findings` test helper): | Test | Asserts | |---|---| | `jsonld_at_type_uri_is_exempt` | `{\"@type\": \"http://...\"}` → 0 findings | | `jsonld_at_id_uri_is_exempt` | `{\"@id\": \"http://...\"}` → 0 findings | | `jsonld_at_context_uri_is_exempt` | `{\"@context\": \"http://...\"}` → 0 findings | | `jsonld_types_array_is_exempt` | `{\"types\": [\"http://...\"]}` → 0 findings (exact self-scan repro) | | `json_schema_dollar_schema_is_exempt` | `{\"$schema\": \"http://...\"}` → 0 findings | | `real_endpoint_url_is_still_flagged` | `client.get(\"http://...\")` → >0 findings (invariant) | | `endpoint_key_named_url_is_still_flagged` | `{\"url\": \"http://...\"}` → >0 findings (invariant) | Test URLs are runtime-composed (`format!(\"htt{}p://...\", \"\")`) so the source itself contains no literal `http://[alphanum]` substring — prevents a meta-circular self-scan finding when panic-attack scans its own `analyzer.rs`. ## Verification - [x] `cargo test --bin panic-attack --features signing,http` — **249 passed** (+7 new tests) - [x] `cargo clippy --all-targets --features signing,http -- -D warnings` — clean - [x] `cargo fmt --check` — clean - [x] **Self-scan progression** (cumulative): - baseline: 12 findings (1 Critical UnboundedAlloc, 2 InsecureProtocol FPs) - after #51: 11 findings (Critical resolved) - after #52: 11 findings (doc-comment InsecureProtocol FP resolved; JSON-LD literal FP remained) - **after this PR: 10 findings (last InsecureProtocol FP resolved)** All 10 remaining findings are intentional (test unwraps, `examples/vulnerable_program.rs` unsafe blocks, scaffold `flake.nix`, etc.). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hyperpolymath enabled auto-merge (squash) May 26, 2026 10:55

hyperpolymath mentioned this pull request May 26, 2026

feat(assail): exempt JSON-LD / JSON-Schema identifier URIs from InsecureProtocol #53

Merged

4 tasks

hyperpolymath merged commit 434d640 into main May 27, 2026

hyperpolymath deleted the fix/cross-lang-doc-comment-fps branch May 27, 2026 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(assail): strip C-family line comments before cross-lang URL detection#52

fix(assail): strip C-family line comments before cross-lang URL detection#52
hyperpolymath merged 1 commit into
mainfrom
fix/cross-lang-doc-comment-fps

hyperpolymath commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hyperpolymath commented May 26, 2026

Summary

Fix

Out of scope

Regression coverage

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant