feat: v1.98.0 Phase 1 — lifecycle bridge + auto-heal wiring + validate re-check by itcmsgr · Pull Request #458 · itcmsgr/nftban

itcmsgr · 2026-04-17T09:29:19Z

Summary

v1.98 Phase 1: Architecture batch (PR-07 through PR-12). No default path change yet.

What this PR adds

PR-07: Lifecycle bridge — observational event emission at all installer phases (INV-I-004)
PR-08/09: Detect + FHS parity evidence (documentation only, no code changes)
PR-10: Health check → health fix auto-trigger wiring (ExecStartPost in nftban-health.service)
PR-11: Validate phase VALIDATE_1 → safe auto-fix → VALIDATE_2 flow (INV-I-010 through INV-I-013)
PR-12: Logging integration (via lifecycle bridge events)
DEB-PERM-001 documentation (existing permissions module already handles the fix)
VERSION bump to 1.98.0

Key invariants enforced

INV-I-004: Lifecycle is observational only — does NOT drive installer decisions
INV-I-010: Post-install safe auto-fix trigger (one-shot after initial validation)
INV-I-011: Safe auto-fix scope is allowlisted only (permissions/ownership, not authority/SSH)
INV-I-012: One-shot only — auto-fix runs at most once per install
INV-I-013: Re-validation mandatory — only post-fix result determines success

Auto-heal gap closed

Before: Health check ran periodically (timer) but health fix was manual-only. Permission drift between installs was never auto-corrected.
After: Health check triggers health fix via ExecStartPost. Installer validate phase runs fix → re-validate flow.
Verified on lab2: Simulated DEB permission drift → auto-heal chain corrected it automatically.

NOT in this PR (Phase 2, after G2 gate)

PR-13: Feature flag (Go installer default)
PR-14: install.sh bootstrap reduction
PR-15: Legacy script deletion

Phase 2 requires G2 parity gate on real hosts before proceeding.

Lab Validation

Test	Host	Result
Installer builds	lab4 (AlmaLinux 9)	PASS
Lifecycle + rebuild tests	lab4	63 tests PASS
Detect parity (SSH, authority, distro)	lab4 + lab2 + monitor	PASS (3 hosts)
Custom SSH port (55000)	monitor	PASS
FHS permissions (11/12 match)	lab4 vs lab2	PASS (DEB drift auto-healed)
UFW conflict simulation	lab2	SSH lockout proved conflict detection is critical
Auto-heal chain (drift → fix → corrected)	lab2	PASS
DEB permission auto-heal (/usr/sbin/nftban)	lab2	PASS (root:root 755 → root:nftban 750)

Contract

V198_INSTALL_CANONIZATION_CONTRACT.md (13 invariants, INV-I-001 through INV-I-013)
V198_PR08_PR09_PARITY_EVIDENCE.md (detect + FHS evidence)

Test plan

Installer binary builds on lab4
50 lifecycle tests + 13 rebuild tests PASS
Lifecycle bridge emits events at all phases
Auto-heal chain works on lab2 (drift → fix → corrected)
Validate re-check flow: VALIDATE_1 → fix → VALIDATE_2
Pre-commit hooks pass
No default behavior change for users

🤖 Generated with Claude Code

Wire lifecycle event emission into existing installer phases:
- lifecycleBridge: observes installer decisions, emits lifecycle events
- observeDetect(): records authority + detection at DETECT completion
- observePlan(): records authority action from installer decision
- observeResult(): maps installer StateFile to lifecycle outcome
- v1.96 recovery marker read for last_operation truth

Integration points in runInstall():
- After phaseDetect: emit detect + plan observations
- On phase failure: emit result with failure outcome
- After phaseValidate: emit final result

INV-I-004 ENFORCED: Lifecycle is OBSERVATIONAL ONLY. Bridge mirrors decisions; it does NOT influence installer execution. Installer logic remains the source of execution truth.

No behavior change. Additive lifecycle logging only.

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md §4.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
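The observational-only rule (INV-I-004) can be illustrated with a minimal Go sketch. All names here (lifecycleBridge, Event, StateFile and its fields, the FAILED outcome label, the "nftables" authority value) are hypothetical stand-ins, not the actual nftban types. The point of the shape is that every observe method only appends to an event log and returns nothing, so the installer has no value it could branch on.

```go
package main

import "fmt"

// Event is a hypothetical lifecycle record; the real nftban schema may differ.
type Event struct {
	Phase  string
	Detail string
}

// StateFile is a hypothetical stand-in for the installer's state file.
type StateFile struct {
	FailedPhase string // empty when every phase succeeded
	Degraded    bool   // true when final validation did not pass
}

// lifecycleBridge mirrors installer decisions into events. Its methods return
// nothing, so installer control flow cannot depend on them (INV-I-004).
type lifecycleBridge struct {
	events []Event
}

func (b *lifecycleBridge) observeDetect(authority string) {
	b.events = append(b.events, Event{"DETECT", "authority=" + authority})
}

func (b *lifecycleBridge) observePlan(action string) {
	b.events = append(b.events, Event{"PLAN", "action=" + action})
}

// observeResult maps the installer's final state to an outcome label.
func (b *lifecycleBridge) observeResult(s StateFile) {
	outcome := "COMMITTED"
	switch {
	case s.FailedPhase != "":
		outcome = "FAILED:" + s.FailedPhase // hypothetical label
	case s.Degraded:
		outcome = "DEGRADED"
	}
	b.events = append(b.events, Event{"RESULT", outcome})
}

func main() {
	b := &lifecycleBridge{}
	b.observeDetect("nftables")
	b.observePlan("install")
	b.observeResult(StateFile{})
	for _, e := range b.events {
		fmt.Printf("%s %s\n", e.Phase, e.Detail)
	}
}
```

Keeping the bridge write-only is what makes it safe to add without a behavior change: removing every observe call would leave the installer's decisions untouched.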

The /usr/sbin/nftban permission drift on DEB (root:root 755 instead of root:nftban 750) is already handled by the existing permissions module: nftban_permissions_enforce_all() → perms_enforce_sbin() uses $PERMS_SBIN from NFTBAN_SBIN_DIR (distro-config based, not hardcoded).

Replace hardcoded fix with comment documenting the existing path.

The permissions module (nftban_permissions.sh:230) already:
- uses distro-aware path ($PERMS_SBIN)
- creates nftban group if missing
- sets root:nftban 0750 on /usr/sbin/nftban*

Verified on lab2 (Ubuntu 24.04): 'nftban health fix permissions' correctly fixes the drift via the existing module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add ExecStartPost to nftban-health.service that triggers nftban-health-fix.service after each health check cycle.

This closes the auto-heal gap:
- Health CHECK runs periodically as User=nftban (timer)
- Health CHECK can fix services/nftables (polkit + CAP_NET_ADMIN)
- Health CHECK cannot fix root-owned file permissions
- Health FIX runs as root and CAN fix permissions/ownership
- Previously: health FIX was manual-only, never auto-triggered
- Now: health CHECK triggers health FIX on every cycle

The fix service is idempotent: if no permission issues exist, it completes instantly with no changes. Uses --no-block to avoid blocking the health check timer. The `-` prefix on ExecStartPost makes it non-fatal if the fix service fails or is already running.

Install/update path already calls RunPermissionsEnforce() in phaseValidate, so this only affects the background periodic path.

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md INV-I-010
Evidence: V198_PR08_PR09_PARITY_EVIDENCE.md (DEB-PERM-001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-validate

Add VALIDATE_1 → safe auto-fix → VALIDATE_2 flow to phaseValidate. If initial assertions fail:
1. Log failed assertions (VALIDATE_1)
2. Run 'nftban health fix all' (one-shot, INV-I-012)
3. Re-run assertions (VALIDATE_2, INV-I-013)
4. Only VALIDATE_2 result determines final outcome

This closes the operational gap where install could leave safe-fixable drift (e.g. DEB /usr/sbin/nftban permissions) that would cause DEGRADED when a single auto-fix pass would have corrected it.

The auto-fix runs at most ONCE per install (INV-I-012). Re-validation is mandatory (INV-I-013). Only allowlisted safe fixes are applied (INV-I-011). If VALIDATE_2 still fails → DEGRADED (INV-I-008, no false success).

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md INV-I-010 through INV-I-013

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
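The VALIDATE_1 → fix → VALIDATE_2 flow maps onto a small function. This is a minimal Go sketch, assuming validation returns the list of failed assertions and the fix is passed in as a callback; runValidate and Outcome are hypothetical names, not the installer's code. It encodes the three rules: the fix runs at most once (INV-I-012), re-validation always follows it (INV-I-013), and only the second result decides the outcome.

```go
package main

import "fmt"

// Outcome of the validate phase.
type Outcome string

const (
	Committed Outcome = "COMMITTED"
	Degraded  Outcome = "DEGRADED"
)

// runValidate models VALIDATE_1 → one-shot safe fix → VALIDATE_2.
// validate returns failed assertions; an empty slice means pass.
func runValidate(validate func() []string, fix func() error) Outcome {
	failed := validate() // VALIDATE_1
	if len(failed) == 0 {
		return Committed // no fix needed, fix is never called
	}
	fmt.Println("VALIDATE_1 failed:", failed)

	// One-shot: the fix is invoked exactly once. Its own error is logged
	// but does not decide success; re-validation does.
	if err := fix(); err != nil {
		fmt.Println("safe auto-fix error:", err)
	}

	// Mandatory re-validation: only VALIDATE_2 determines the outcome.
	if failed = validate(); len(failed) > 0 {
		return Degraded
	}
	return Committed
}

func main() {
	drift := true
	validate := func() []string {
		if drift {
			return []string{"/usr/sbin/nftban mode"}
		}
		return nil
	}
	fix := func() error { drift = false; return nil }
	fmt.Println(runValidate(validate, fix))
}
```

Passing validate and fix as functions keeps the state machine independent of what the fix actually executes, so the allowlist question can be settled separately.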

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-17T09:29:30Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Blocker #1 (phases.go:295): installer validate called 'health fix all', which runs 9 unbounded steps including disabling UFW/firewalld/fail2ban, triggering rebuild, GeoIP download, and panel enable. Violates INV-I-011 (allowlist scope) and INV-I-006 (authority takeover).
Fix: Replace with 'permissions enforce' (bounded, safe, idempotent). Only fixes ownership/mode on NFTBan-managed paths. Does not cross authority boundaries or mutate external firewall state.

Blocker #2 (nftban-health.service ExecStartPost): unconditionally triggered 'nftban-health-fix.service' (which runs fix all) on every health check timer cycle. Violates INV-I-012 (one-shot) and ships unbounded root remediation to every host.
Fix: Remove the ExecStartPost trigger. Root-level permission fixes now run only during install/update (phaseValidate → permissions enforce) or on manual operator invocation. Document the rationale for a future bounded safe-fix target.

Audit: V198_FOUNDATION_BATCH_AUDIT.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
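Blocker #1 comes down to enforcing the INV-I-011 allowlist. A minimal Go sketch of such a gate, assuming the allowlist contains only 'permissions enforce' as the fix above describes; safeFixAllowlist and checkSafeFix are hypothetical names, not functions from the nftban codebase. Anything else, including 'health fix all', is rejected before it can run.

```go
package main

import (
	"fmt"
	"strings"
)

// safeFixAllowlist is hypothetical: only the bounded, idempotent
// permissions fix may run automatically (INV-I-011).
var safeFixAllowlist = map[string]bool{
	"permissions enforce": true,
}

// checkSafeFix rejects any auto-fix command outside the allowlist, such as
// "health fix all", which can disable other firewalls and trigger rebuilds.
func checkSafeFix(args []string) error {
	cmd := strings.Join(args, " ")
	if !safeFixAllowlist[cmd] {
		return fmt.Errorf("auto-fix %q is not allowlisted", cmd)
	}
	return nil
}

func main() {
	for _, args := range [][]string{{"permissions", "enforce"}, {"health", "fix", "all"}} {
		fmt.Println(strings.Join(args, " "), "->", checkSafeFix(args))
	}
}
```

An explicit allowlist fails closed: a new remediation step added to 'health fix all' later cannot silently widen what the installer runs automatically.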

Policy gate requires FHS spec version to match VERSION file. Regenerated via build/generate-fhs-outputs.sh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ate module

Add bounded safe fix state machine as a testable module:
- validate.RunWithBoundedFix(): VALIDATE_1 → permissions enforce → VALIDATE_2
- Only calls 'permissions enforce' (INV-I-011), never 'health fix all'
- Fix runs at most once (INV-I-012)
- Only VALIDATE_2 result determines final outcome (INV-I-013)

NB-6 test cases (from V198_PR13_GO_DECISION.md §11):
- Test 1: V1 passes → no fix called → success
- Test 2: V1 fails → fix runs → V2 passes → COMMITTED
- Test 3: V1 fails → fix runs → V2 still fails → DEGRADED
- Test 4: permissions enforce called at most once
- Test 5: no destructive side-effects (no service disable, no package removal)

MockExecutor enhanced with:
- OnCommand(): register callbacks for simulating side-effects
- CommandCalled(): assert command was/wasn't executed
- CommandCallCount(): assert execution count bounds
- Callback firing in Run() for command simulation

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md INV-I-010 through INV-I-013
Audit closure: NB-6 from V198_FOUNDATION_BATCH_AUDIT.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
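The MockExecutor additions can be pictured with a minimal Go sketch. The method names follow the commit message, but the signatures (a command keyed by its joined string, callbacks as func()) are assumptions, not the real test helper. A callback registered for 'nftban permissions enforce' clears simulated drift, and the call counters support assertions like Test 4 (at most once) and Test 5 (the destructive command is never run).

```go
package main

import (
	"fmt"
	"strings"
)

// MockExecutor records commands and fires registered callbacks to simulate
// side-effects. Method names mirror the commit; signatures are assumptions.
type MockExecutor struct {
	calls     map[string]int
	callbacks map[string]func()
}

func NewMockExecutor() *MockExecutor {
	return &MockExecutor{calls: map[string]int{}, callbacks: map[string]func(){}}
}

// OnCommand registers a callback fired whenever cmd is run.
func (m *MockExecutor) OnCommand(cmd string, fn func()) { m.callbacks[cmd] = fn }

// Run records the invocation and fires the matching callback, if any.
func (m *MockExecutor) Run(name string, args ...string) error {
	cmd := strings.TrimSpace(name + " " + strings.Join(args, " "))
	m.calls[cmd]++
	if fn, ok := m.callbacks[cmd]; ok {
		fn()
	}
	return nil
}

// CommandCalled reports whether cmd was executed at least once.
func (m *MockExecutor) CommandCalled(cmd string) bool { return m.calls[cmd] > 0 }

// CommandCallCount returns how many times cmd was executed.
func (m *MockExecutor) CommandCallCount(cmd string) int { return m.calls[cmd] }

func main() {
	exec := NewMockExecutor()
	drift := true
	exec.OnCommand("nftban permissions enforce", func() { drift = false })

	// Simulated VALIDATE_1 failure, so the bounded fix runs once.
	if drift {
		_ = exec.Run("nftban", "permissions", "enforce")
	}
	fmt.Println("drift after fix:", drift)
	fmt.Println("enforce calls:", exec.CommandCallCount("nftban permissions enforce"))
	fmt.Println("fix all called:", exec.CommandCalled("nftban health fix all"))
}
```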

itcmsgr and others added 5 commits April 17, 2026 11:46

chore: bump VERSION to 1.98.0

7c241c1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

itcmsgr and others added 3 commits April 17, 2026 12:42

fix(ci): regenerate FHS spec for v1.98.0 VERSION bump

f406892


itcmsgr merged commit bba638c into main Apr 17, 2026
48 checks passed

itcmsgr deleted the feat/v1.98-install-canonization branch April 17, 2026 10:34

itcmsgr merged 8 commits into main from feat/v1.98-install-canonization