feat(audit): hash-chained audit stream for tamper detection#222
feat(audit): hash-chained audit stream for tamper detection#222naveen-kurra wants to merge 1 commit into
Conversation
…nitializ#212) Every AuditEvent now carries a `prev_hash` field pinning it to the sha256 of the previous event's canonical JSON bytes. Together with the per-emit tail-hash update this forms a chain over the audit stream — any post-hoc alteration (changed field, added byte, dropped line) breaks the chain at the point of tampering, and the new `forge audit verify` CLI walks a captured stream to report the break. This satisfies governance requirement R5 (tamper-evident receipts) in the strict reading. R6 (per-event Ed25519 signing) layers on top in issue initializ#213. ## Changes - `forge-core/runtime/audit.go` - New `AuditChainGenesis` constant (32 zero bytes, hex-encoded) - `AuditEvent.PrevHash` field (json:"prev_hash", NOT omitempty — absence is itself a tampering signal) - `AuditLogger.lastHash` state, mu-guarded - `Emit` now serializes chain-mint + marshal + tail-update + sink-write under a single mutex so concurrent emits produce a strictly-ordered chain - `forge-core/runtime/audit_verify.go` (new) - `VerifyAuditLog(io.Reader) (VerifyResult, error)` — walks an NDJSON stream, recomputes each event's canonical hash, reports the first break with expected/actual prev_hash - `forge-cli/cmd/audit.go` (new) — `forge audit verify <file>` subcommand. Exits 0 on clean, non-zero with a report on tampering. Reads from stdin when file is "-". - `docs/security/audit-tamper-evidence.md` — operator guide ## Tests `forge-core/runtime/audit_hash_chain_test.go`: - `TestHashChain_GenesisAndProgression` — first event carries genesis; each subsequent event's prev_hash equals sha256 of the previous event's canonical JSON - `TestHashChain_VerifyWalksCleanly` — 20-event stream round-trips cleanly through the verifier - `TestHashChain_TamperingDetected` — altering event initializ#2's contents makes the verifier flag line 3 (successor whose prev_hash no longer matches the tampered event's recomputed hash) - `TestHashChain_DeletionDetected` — dropping event initializ#2 flags line 2 (the successor now sees line 1's hash as its predecessor's, but its own prev_hash pointed at event initializ#2) - `TestHashChain_ConcurrentEmitsProduceValidChain` — 200 concurrent emitters still produce a chain that verifies (mutex covers the whole chain+write path) - `TestHashChain_GenesisConstantShape` — pins the 64-hex-zero wire constant so a well-meaning refactor can't change it silently - `TestHashChain_VerifyDetectsMalformedLine` — non-JSON garbage is reported cleanly, not a panic - `TestHashChain_PrevHashAlwaysWritten` — pins the "no omitempty" choice `forge-cli/cmd/audit_test.go`: - `TestAuditVerify_CleanStreamExitsZero` — end-to-end OK path - `TestAuditVerify_TamperedStreamReports` — end-to-end fail path; stdout contains "TAMPERING DETECTED" + line number + hashes - `TestAuditVerify_UnreadablePath` — missing file returns OS-shaped error, not a crash ## Compatibility The stream shape is backward-compatible for consumers that ignore unknown fields. `AuditSchemaVersion` is unchanged (this is an additive optional field per the documented schema policy). Existing tests in `forge-core/runtime/` continue to pass without modification. ## Effort 1 day as estimated in the R5 issue.
initializ-mk
left a comment
There was a problem hiding this comment.
Read the full diff and cross-checked it against #220 and the base audit.go. This is a well-tested feature, but it has two blocking problems — one is the same deadlock I flagged on #220, and the other is that #222 and #220 collide and can't both merge as-is.
🔴 Blocking #1 — self-deadlock in Emit (same bug as #220)
Emit now takes a.mu.Lock(); defer a.mu.Unlock() at the top and runs the sink-write loop while holding the lock. On a sink error it calls a.logSinkErrorOnce(...), which itself does a.mu.Lock() (audit.go:584). Go's sync.Mutex isn't reentrant → self-deadlock: the emitting goroutine hangs while holding a.mu, which then hangs every other Emit/config call.
- Trigger: any sink
Writeerror with an ops logger configured (SetOpsLogger). - Why tests miss it: the test sinks (buffer/file) never error and
opsLogis nil.
Here it's trickier than #220 because you have a legitimate reason to hold the lock across the writes — chain order must equal on-disk order — so the fix isn't "move writes outside the lock." Instead, don't call the re-locking logSinkErrorOnce from inside the held section: either add a logSinkErrorOnceLocked variant that assumes the caller holds a.mu, or collect the errored sinks during the locked loop and log them after an explicit Unlock() (not via defer). Ordering is preserved either way; only the error logging moves.
🔴 Blocking #2 — #222 and #220 cannot both merge as-is
#220 (signing, #213) and this PR create the same files with incompatible contents:
forge-core/runtime/audit_verify.go— both defineVerifyAuditLog/VerifyResult, with different signatures: #220 isVerifyAuditLog(r, opts VerifyOptions)withFirstBadLine/SigChecked; this PR isVerifyAuditLog(r)withFirstTamperedLine/GenesisSeen/ExpectedPrevHash. Same symbol, same package — the second to merge won't compile.forge-cli/cmd/audit.go— both defineauditCmd/auditVerifyCmd/auditVerifyRun.forge-core/runtime/audit.go— both add fields toAuditEventand both restructureEmit's locking, in conflicting ways.
The two features are explicitly meant to compose (this PR's docs say "R6 layers Ed25519 signatures on top"; #220's say "combine signing + hash chaining"), but they're built as independent rewrites of the same code. They need to be reconciled into: one AuditEvent (prev_hash + kid + sig), one VerifyAuditLog that checks both chain and signatures, one forge audit verify, and one Emit locking/sequencing strategy. Sequencing matters too — the signature must cover prev_hash, and the tail hash must be computed over the final signed bytes. Please coordinate before either merges (or land one, then rebase the other onto it as a real integration rather than a re-add). Posting a linked note on #220 as well.
🟠 Important — re-marshal hashing has a large-integer precision hole
The verifier deliberately re-marshals the parsed event (json.Marshal(evt)) rather than hashing the raw line, to tolerate benign whitespace. That makes verification depend on unmarshal→marshal being a fixed point — which it isn't for Fields map[string]any numbers: JSON numbers decode to float64, so any field value > 2^53 (or a non-round float) re-marshals to different bytes than the producer emitted from its native int64, so an untampered stream fails verification. Today's audit fields are all small ints, so it doesn't bite yet, but it's a latent landmine (a nanosecond epoch or large ID in a field would trip it).
Worth reconsidering: for a tamper-evidence tool, hashing the raw written bytes (line minus trailing newline) is both simpler and strictly safer — it catches every byte change (including whitespace, which for tamper-evidence you want to catch) and sidesteps the precision issue. The producer already controls the canonical bytes, so the verifier doesn't need to reconstruct them. If you keep re-marshaling, document the "no field integers > 2^53" constraint.
🟡 Minor
- Doc comment contradicts code:
VerifyAuditLog's comment says it "does NOT stop at the first parse error — it keeps reading," but the codereturns immediately on the first malformed line. Fix the comment (or the behavior). - Cross-language portability overstated: the docs invite reimplementing the walk "in any language (sha256 + json.Marshal-equivalent canonical form)." Go's
json.Marshalisn't a standardized canonical form (struct field order, HTML-escaping of< > &, map-key sorting), so a non-Go verifier reproducing it is non-trivial. Same caveat as #220; hashing raw bytes (above) largely removes it. - Head-truncation is only a soft stderr note (
GenesisSeen=falsestill returns OK). That's an inherent hash-chain limitation and it's honestly disclosed — just confirming it's intentional.
What's good
The concurrency design is sound and its rationale (writes must land in chain order) is correct and well-tested with 200 goroutines. prev_hash without omitempty is the right call and is pinned by a test. The "what it does NOT buy you" section (confidentiality, availability, non-repudiation-without-R6) is honest and accurate.
Verdict
Request changes — the deadlock must be fixed, and the #220/#222 collision needs to be resolved before either lands. The precision / raw-bytes question is worth settling now since it's cheap before merge and awkward after. Once those are addressed this is a solid R5 implementation.
|
Ack on all points. Plan: land #220 first (its review is addressed on the -v2 branch), then rebase this PR onto post-#220 main as a real integration — single AuditEvent (prev_hash + kid + sig), single VerifyAuditLog (chain + sig checks), single Fixes I'll roll into the rebase:
Pre-staging on the branch now so the integration is ready when #220 merges. |
|
Pre-staged the integration at What's in it:
Full test sweep + gofmt + golangci-lint clean. Plan: once #220 merges, I'll close this PR (#222) and open the integration branch as the successor R5 PR against post-#220 main. If you'd prefer to review the integration directly against this branch's diff before that, I can open it now as a draft against the r6-v2 branch. |
|
Superseded by the integration branch |
Closes #212. (Re-open of #217 which GitHub auto-closed after main was force-pushed; local branch rebased onto new main.)
Summary
AuditEventcarries aprev_hashfield: sha256 of the previous event's marshaled JSON (genesis = 64 zeros).AuditLogger.Emitso concurrent goroutines cannot interleave and produce an invalid chain.coreruntime.VerifyAuditLogwalks NDJSON and reports the first bad line + reason;forge audit verify <file>is the CLI on top.Test plan
go test ./forge-core/runtime/...(genesis, verify clean, tampering, deletion, 200-goroutine concurrent, genesis constant, malformed, prev_hash always emitted)go test ./forge-cli/cmd/...(CLI OK / failure paths)gofmt -w+golangci-lint runall cleanDocs at
docs/security/audit-tamper-evidence.md.