Feat/ctem prioritization and maturity by 0xmanhnv · Pull Request #51 · openctemio/api

0xmanhnv · 2026-04-15T19:59:19Z

No description provided.

- Add tenant_id verification to ApproveAndMerge and RejectReview - Add tenant_id to all UPDATE statements in merge (findings, relationships, etc.) - Handler passes tenantID from JWT context to repository - Align approve/reject permissions (both require AssetsDelete) - Promote sub_type from properties/TypeAliases during asset merge - Remove unused queryStringPtr helper

… precedence - 137.down: drop all 3 finding triggers (insert/update/delete), not just 1 - 141.up: add parentheses to fix AND/OR precedence in sub_type backfill

- RFC-003: Monorepo consolidation (api + ui + agent) - RFC-004: Priority classes P0-P3 + EPSS/KEV enrichment - RFC-005: CTEM maturity gaps (cycles, metrics, controls, escalation)

- EPSS CSV URL changed: epss.cyentia.com → epss.empiricalsecurity.com - Add User-Agent header to both EPSS and KEV requests (CISA blocks bare requests)

- Migration 000142: enrichment columns on findings, priority_override_rules, priority_class_audit_log, SLA p0-p3 days - Domain: priority.go (ClassifyPriority deterministic rules, P0/P1/P2/P3) - Domain: priority_rule.go (override rules with condition evaluation) - Finding entity: enrichment fields (epss, kev, priority, reachability) - Repository: 11 new columns in SELECT/INSERT/UPSERT/UPDATE/scan - Service: PriorityClassificationService (classify, batch, enrich) - Handler: FindingResponse includes enrichment + priority fields - 4 default override rules seeded per tenant

- GET /api/v1/dashboard/data-quality endpoint - Single CTE query: asset ownership %, finding evidence %, median last-seen age, dedup rate - DataQualityScorecard type + service + handler + route - Mock updated for test compatibility

- Background controller runs every 15 min - Marks overdue findings as sla_status='breached' - Marks approaching-deadline findings as 'warning' (within 3 days) - Migration 000143: partial index for efficient overdue queries

- Migration 000144: finding_verification_checklists table - Domain entity: IsComplete(), IncompleteItems(), UpdateItem() - Required items: exposure_cleared, evidence_attached, register_updated - Optional items: monitoring_added, regression_scheduled (NULL = N/A) - Gates finding verified/closed transition

- Migration 000145: risk_snapshots table (daily per-tenant) - RiskSnapshotController: runs every 6h, computes risk/findings/SLA/MTTR/priority - GET /api/v1/dashboard/risk-trend?days=90 endpoint - RiskTrendPoint type with risk_score, findings_open, SLA compliance, P0-P3 counts

- Migration 000146: compensating_controls, control_assets, control_findings - Domain entity: IsEffective(), RecordTest(), reduction_factor 0.0-1.0 - Control types: segmentation, identity, runtime, detection, other - Auto-deactivate on failed test

- Migration 000147: attacker_profiles table with 4 types - Domain entity: capabilities JSONB, assumptions, is_default - Seed 4 default profiles per tenant: external_unauth, external_stolen_creds, malicious_insider, supplier_compromise

- Migration 000148: ctem_cycles, scope_snapshots, cycle_metrics, cycle_attacker_profiles tables - Domain entity: state machine (planning → active → review → closed) - Charter JSONB (business_priorities, risk_appetite, objectives) - Scope snapshot at activation, metrics at close - Links to attacker profiles for threat model

- Architecture doc: 5-stage CTEM workflow, feature map, database schema overview, background controllers, outcome metrics explanation - RFC-005 marked completed (all 7 gaps implemented)

…y, missing index - SLA escalation: add per-tenant logging via RETURNING tenant_id - Risk snapshot: replace 12 correlated subqueries with 4 CTEs + LEFT JOIN (O(1) table scans regardless of tenant count) - Migration 000149: add tenant_id index on priority_class_audit_log

…ager - Register SLAEscalationController (15 min interval) - Register RiskSnapshotController (6 hour interval) - Both controllers now start automatically with the application

- Create EPSS/KEV adapters wrapping existing threat intel repos - Create PriorityRuleRepository (list active, create, delete) - Create PriorityAuditRepository (log changes) - Register repos in Repositories struct - Register PriorityClassificationService in Services - Interface compliance verified at compile time

…iles, CTEM cycles - CompensatingControlHandler: List, Get, Create, Update, Delete, RecordTest, LinkAssets/Findings → /api/v1/compensating-controls/* - AttackerProfileHandler: List, Get, Create, Update, Delete → /api/v1/attacker-profiles/* - CTEMCycleHandler: List, Get, Create, Update, Activate, StartReview, Close, GetScope, LinkProfile → /api/v1/ctem-cycles/* - All handlers use direct SQL with tenant_id isolation - Routes registered with permission middleware

…lers - Add PriorityClassifier interface to FindingProcessor - Auto-enrich + classify findings on ingest (Step 4c in ProcessBatch) - Fetch assets for classification context (batch) - Persist enriched fields after classification - Wire PriorityClassificationService into ingest via SetPriorityClassifier - Instantiate CompensatingControl, AttackerProfile, CTEMCycle handlers - All routes now fully functional end-to-end

- CompensatingControl LinkAssets: validate asset belongs to tenant before INSERT - CompensatingControl LinkFindings: validate finding belongs to tenant before INSERT - CTEMCycle LinkProfile: validate profile belongs to tenant before INSERT - Ingest enrichment: clarify dedup map comment (1-5 unique assets per batch)

… findings - VerificationChecklistHandler: GET/PUT /findings/{id}/verification-checklist Auto-creates checklist on first access, computes is_complete - Migration 000150: CIA impact fields (confidentiality/integrity/availability) on assets, environment field (production/staging/dev), mitre_technique_id + mitre_tactic on findings for ATT&CK coverage - Route registered, handler instantiated

…reshness SLA - GET /api/v1/dashboard/executive-summary: risk change, P0/P1 resolved, SLA compliance, MTTR, crown jewels at risk, top 5 risks - GET /api/v1/dashboard/mttr-analytics: MTTR by severity + priority class - GET /api/v1/dashboard/process-metrics: approval velocity, stale assets, unassigned findings, time-to-assign - Migration 000151: freshness_status on assets, stale check index, approval velocity index

…xport Phase 3 (100% CTEM completion): - Migration 000152: business_services + business_service_assets tables, regression tracking fields on findings (is_regression, reopen_count, last_reopened_at), finding_regression_events log - BusinessServiceHandler: full CRUD + asset linking, tenant-scoped, validates asset ownership before linking - GET /api/v1/dashboard/executive-summary/export?format=csv|json downloadable executive reports - ExecutiveSummary includes regression_count + regression_rate_pct - Business Services distinct from Business Units — represent business capabilities (Payment Processing, Customer Login) that span multiple assets and BUs

- Use approved_at/rejected_at instead of non-existent updated_at on approvals - Remove unnecessary fmt.Sprintf (staticcheck S1039)

compliance_service + simulation_service → internal/app/compliance/. pentest_service stays in package app because it depends on NotificationService + FindingActivityService (not yet extracted). parseOptionalDate helper duplicated into compliance/ so package is self-contained. pkg/domain/compliance aliased to compliancedom. Build + test pass.

3 files (module_service, dashboard_service, report_schedule_service) → internal/app/module/. pkg/domain/module aliased to moduledom, pkg/domain/reportschedule aliased to reportscheduledom. Shim re-exports 3 services + 26 value types. Build + test pass.

- activity/: FindingActivityService (self-contained, no cross-svc deps) - aitriage/: AITriageService + TriageOutputValidator + PromptSanitizer (imports newly-extracted audit + activity packages; pkg/domain/aitriage aliased to aitriagedom) Shims at finding_activity_service.go + ai_triage_service.go re-export all public symbols. Pentest, vulnerability_service, workflow_action_handlers (which reference aiTriageService + activityService) resolve via shim transparently — zero caller change. Build + test pass.

3 files (scan_scheduler, scansession_service, scanprofile_service) moved into existing internal/app/scan/. scheduler.go's self-import (scansvc alias pointing to same package) removed — was a relic from the earlier monolithic service.go split. ScansScheduled metric reference repointed to internal/metrics directly to avoid pulling in package app. scanner_template_service stays in package app because template/ already imports scan/ (for TemplateSyncer interface) — moving template_service into scan/ would create a cycle. Tracked for follow-up after the template→scan adapter is refactored. Build + test pass.

6 files → internal/app/integration/: - integration_service (4 SCM/notification/OAuth orchestration) - notification_service - webhook_service - attachment_service - credential_import_service - secretstore_service pkg/domain/{integration,notification,webhook,attachment} aliased to *dom to disambiguate. secretstore's AuditService refs repointed to auditapp package. Shim re-exports 52 symbols (6 services + 46 input/ output types) — all existing app.XService callers compile unchanged. Build + test pass.

10 files → internal/app/asset/: - asset_service, asset_group_service, asset_import_service, asset_relationship_service, asset_type_service, branch_service, business_unit_service, component_service, sbom_import_service, relationship_suggestion_service pkg/domain/{asset,assetgroup,assettype,branch,businessunit,component, relationship} each aliased to *dom. Shim re-exports 11 services + 39 value types + 3 batch-status constants. Build + test pass.

13 files → internal/app/finding/: - vulnerability_service, finding_actions_service, finding_comment_service, finding_import_service, finding_lifecycle_scheduler, finding_source_service, finding_source_cache_service, priority_classification_service, priority_flood_guard + test, priority_change_publisher_test, bulk_action_guard + test Cross-package deps repointed: activity.FindingActivityService + activity.RecordActivityInput, integration.NotificationService, aitriage.AITriageService. FindingsExpired metric reference switched to internal/metrics package directly. Shim re-exports 14 services + 43 value types + 4 sentinel errors. Build + test pass.

5 files → internal/app/workflow/: - workflow_service, workflow_action_handlers, workflow_event_dispatcher, workflow_executor, workflow_handlers Cross-package deps repointed: finding.VulnerabilityService + finding.FindingSourceCacheService + finding.UpdateFindingStatusInput, integration.IntegrationService + NotificationService + SendNotificationInput, aitriage.AITriageService + TriageRequest, auditapp.AuditService/Context/Event + Success/Failure constructors. Shim re-exports 43 symbols (5 services + option setters + input types). Build + test pass.

pentest_service had been left in package app during the initial compliance extract because it depended on NotificationService and FindingActivityService which were not yet extracted. Both are now in internal/app/integration + internal/app/activity, so pentest can join its sibling compliance + simulation services in internal/app/compliance/. parseOptionalDate helper unified (was previously duplicated in compliance/service.go during the split); NewSuccessEvent / NewDeniedEvent / AuditService refs repointed to auditapp. Shim re-exports PentestService + 12 pentest input types + WithCachedCampaignRole context helper. Build + test pass.

8 files → internal/app/accesscontrol/: - permission_service, permission_cache_service, permission_version_service - role_service, group_service, group_sync_service (+test) - membership_cache_service, rule_service pkg/domain/{accesscontrol,permission,role,group,rule,permissionset} aliased. Audit + notification deps repointed to extracted packages. Field-name collision 'AuditContext AuditContext' pattern preserved via regex fix. Shim re-exports 11 services + 37 input types + 11 option setters. Build + test pass.

auth/ (8 files): - auth_service, sso_service, oauth_service, session_service, email_service, ws_ticket_service, tenant_smtp_resolver, tenant_storage_resolver - pkg/domain/{session,identityprovider,tenant,user} + pkg/email aliased; accesscontrol.Role/PermissionCache/Version deps repointed. tenant/ (3 files): - tenant_service, user_service, tenant_membership_adapter - Depends on extracted auth.SessionService (authapp alias) and accesscontrol packages. Shims re-export 23 auth services+errors + 18 tenant services/inputs. Build + test pass. All 14 planned clusters now extracted except the cross-cutting leftovers (adapters.go, metrics.go, security_validator, scoring_config_provider, validation_coverage) which stay in package app as thin bridges/helpers.

…ility + rename shims CI was failing because commits 6b79f84 (accesscontrol) and 516bd0b (auth + tenant) pushed source where the inner files still had bare references to AuditService / AuditContext / NewSuccessEvent / PermissionCacheService / SessionService etc. — types that only exist in package app. The in-cluster files needed those refs repointed to auditapp.X / accesscontrol.X / authapp.X but the corrections lived in my working tree and never made it into the commit. This commit carries: - auth/tenant/accesscontrol in-cluster files: all bare audit + accesscontrol + session refs now qualified with their sibling package import (auditapp, accesscontrol, authapp). Field-name collision 'AuditContext *AuditContext' preserved. - capability/: 13th subpackage extracted. capability_service (536 lines) → internal/app/capability/service.go. Shim at internal/app/capability_service.go re-exports 7 symbols. - ws_ticket_service_test moved into auth/ where its impl lives. - 5 shim files renamed for consistency with target cluster name: permission_service.go → accesscontrol_service.go vulnerability_service.go → finding_service.go finding_activity_service.go → activity_service.go ai_triage_service.go → aitriage_service.go scan_scheduler.go → scan_service.go (git detects as rename so history follows via git log --follow.) Build + vet + full test suite pass with GOWORK=off.

…lusters 45 files renamed inside internal/app/<cluster>/ to drop the redundant _service suffix (folder name already says it's a service package). Follows Effective Go convention — file name describes WHAT (concept), not the layer. Example: auth/auth_service.go → auth/service.go, accesscontrol/permission_service.go → accesscontrol/permission.go. Struct names keep the *Service suffix (AuthService, AssetService) — only the filenames change. Also moved 2 cross-cutting helpers out of package app into their rightful clusters: - scoring_config_provider.go → asset/ (it implements asset.ScoringConfigProvider, belongs with asset domain) - validation_coverage.go (+ 2 tests) → validation/ (CTEM coverage SLO checker — belongs with the validation domain that already owns executor/evidence/proof-of-fix) contains() helper in coverage_enforce_test.go renamed to containsStr to avoid collision with existing proof_of_fix.go:contains([]ExecutorKind, ExecutorKind). Shim aliases preserved in root so every external app.X caller still compiles. New shim for ValidationCoverage + CoverageThresholds + Enforce + DefaultThresholds + ErrCoverageBelowSLO. asset_service.go shim gained TenantScoringConfigProvider + MapTenantToAssetScoringConfig. CLAUDE.md updated with two new conventions: 8) File naming inside internal/app/<cluster>/ (drop _service suffix) 9) Layer-mirrored package naming (pkg/domain/<X> ↔ internal/app/<X> with 'dom' alias when both imported in one file) Build + vet + full test suite pass.

Problem 1 — missing from catalogue. The sidebar had ~20 shipped features (attack_surface, scope_config, ctem_cycles, attacker_profiles, compensating_controls, workflows, ctem_maturity, priority_rules, business_services, business_impact, risk_analysis, risk_scoring, executive_summary, mitre_coverage, sbom_export, scanner_templates, template_sources, scan_pipelines, remediation_tasks, relationships) that were never registered in the modules table. Tenant admins could not enable/disable them because the Settings > Modules screen iterates the modules rows. Problem 2 — no dependency graph. 'ai_triage requires findings' lived only in an engineer's head. A tenant could disable findings with ai_triage still on and end up with a triage UI that had nothing to triage. Fix 1 — migration 000161_modules_catalog_sync: back-fills the 20 missing rows. ON CONFLICT DO NOTHING so operators who already customised a name/description aren't overwritten. Fix 2 — pkg/domain/module/dependency.go: platform-wide static graph, same pattern as the existing ModulePermissionMapping + RolePermissions — spec-not-data, lives in code so CI catches typos and cycles. Two edge kinds: DependencyHard — tenant CANNOT disable target while dependent remains enabled (blocker). DependencySoft — tenant CAN disable, UI surfaces a warning because dependent will degrade. Plus helpers CanDisable, RequiredToEnable, TransitiveDependencies, DetectCycle (CI acyclicity gate). Fix 3 — ModuleService.UpdateTenantModules: builds the post-update enabled-set BEFORE applying, runs CanDisable / RequiredToEnable against it, rejects 400 with a human-readable list of dependent names. Deps that reference modules absent from the registry are skipped (cross-edition tolerance). Fix 4 — GET /api/v1/tenants/{tenant}/settings/modules/graph: returns the static dependency graph as flat edge list for UI. Fix 5 — tests pkg/domain/module/dependency_test.go: 10 cases including DetectCycle (CI gate), ReferencedModulesExist (catches typo'd ids), hard-vs-soft blocker semantics, transitive walk that skips soft edges. Fix 6 — constants in pkg/domain/module/module.go extended with the 20 new IDs so typed consts are available everywhere. Build + vet + full test suite pass.

53-file gofmt sweep after the cluster refactor (commits f56f341 → 59bbffa). Pure whitespace normalisation — column alignment inside the alias shims + goimports ordering inside service.go files. Caught by 'gofmt -l internal/app/ pkg/domain/module/ pkg/httpsec/ pkg/jwt/ pkg/crypto/' during the full-tree regression audit. The older files-in-package-app followed stdlib alignment (tabs); after split the aliased fields got sub-tab alignment which gofmt prefers — converged here. No behaviour change. Build + vet + full test suite pass.

…k, data repair PM/TechLead review — the original module-toggle flow returned errors as plain strings, so the UI could not render a dependency-aware confirmation dialog. Admins saw a red banner with a comma-separated list of names and had no handle to cascade-disable. Soft warnings were silently dropped by the service — a tenant disabling threat_intel while priority_rules stayed on had no idea priority_rules would degrade. Security review — validate+upsert was two separate DB calls with no lock between. Two admins flipping related modules concurrently could interleave to produce findings=off + ai_triage=on, violating the invariant the static graph is supposed to enforce (TOCTOU). UX review — the only writable endpoint was PATCH. There was no way to ask 'what would happen if I disabled X?' before committing. Dry-run was implicit inside the PATCH error path. DB review — GetTenantModuleConfig called getTenantDisabledModules once; UpdateTenantModules wrote the upsert, then called buildTenantModuleConfig which called getTenantDisabledModules AGAIN. Duplicate query per toggle request. Fix summary: 1. New structured *module.ToggleError carrying module_id, name, action, blockers[], required[] as data. Handler serialises via writeToggleErrorJSON — 400 with JSON body the UI can parse. 2. New POST /settings/modules/validate — dry-run, returns ValidationIssues { blockers, warnings, required } WITHOUT persisting. Wired to module.Service.ValidateToggle. 3. Soft warnings now surface in the PATCH success path via TenantModuleConfigOutput.Warnings. UI toasts 'FYI: X will degrade'. 4. Per-tenant sync.Mutex map serialises concurrent UpdateTenantModules on the same tenant. Different tenants don't contend. Single-replica safe; multi-replica will need a cross-process lock — documented inline. 5. buildTenantModuleConfigFromMaps accepts the pre-loaded disabled map, skipping the duplicate getTenantDisabledModules query in the update path. 6. Migration 000162 repair-orphan-tenant-modules: walks the hard- edge transitive closure and disables every dependent of an already-disabled module. Uses a recursive CTE so cascade depth is not bounded. Skipped for core modules. Rerun-safe. 7. Tests — 5 new cases: blocker ToggleError shape, missing-required ToggleError shape, soft warning surfaces in response, validate dry-run does not mutate, core disable rejected before dep check. Full regression green — GOWORK=off go build + go vet + go test ./... all pass.

- Add 9 curated presets (Minimal, Asset Inventory, VM Essentials, ASM, Offensive Security, SBOM & Supply Chain, CSPM, Compliance, CTEM Full) in pkg/domain/module/presets.go with target persona, key outcomes, and module bundles. Apply via POST .../modules/presets/{id}/apply. - Mandatory tier (agents, integrations, integrations.notifications, notification_settings, groups, api_keys) auto-included in every preset and rejected on disable — without `agents` no data ingestion is possible, treating it as feature-optional was a DoS vector. - Sub-module inheritance: enabling a parent (assets, ai_triage, integrations) implicitly enables its sub-modules so presets don't have to enumerate 24 assets.* types. - Implicit sub-module → parent cascade in CanDisable so disabling assets blocks while assets.* are still on. - Dependency graph additions: branches→components(soft), business_impact→findings(hard), workflows→findings(soft, was hard), sla→findings, policies→findings, compliance→findings, iocs→threat_intel, scan_profiles→scans. - Module version counter (mod_ver:{tid}) in Redis powers ETag headers on /me/modules and /settings/modules — 304 Not Modified when client cache matches. Bumped via notifyModuleChange on every toggle/apply/reset. - WS broadcaster fans out module.updated events on tenant:{id} channel for cross-tab realtime invalidation. - iocs added to ModulePermissionMapping (was referenced in dep graph but missing from catalogue, breaking TestReferencedModulesExist). - ETag includes tenant prefix so a misconfigured proxy/CDN can't serve tenant A's response to tenant B.

User-supplied BaseURL on GitHub/GitLab/Bitbucket/Azure DevOps integrations was passed directly to a plain http.Client without validation. An attacker creating an integration with BaseURL=http://169.254.169.254/latest/meta-data could exfiltrate AWS IMDSv1 credentials via TestConnection(). Each client now: - Calls httpsec.ValidateURL(config.BaseURL) — rejects loopback, private IPs, link-local, cloud metadata endpoints. - Uses httpsec.SafeHTTPClient(timeout) whose dialer enforces the same blocklist (defense-in-depth in case future code paths bypass the URL validation). Audit finding: CRITICAL.

Findings from the multi-role security audit, low-risk patches batched together. Each is small and orthogonal to the others. - asset_repository.GetByProperty: parameterise the JSONB key (a.properties ->> $3) instead of string-concatenating the whitelisted key into the SQL. Whitelist still constrains the value, but removing the concat pattern entirely means a future "let users add custom correlation keys" feature can't accidentally regress to injectable. - webhook_hmac middleware: replay protection. Senders MUST now include X-OpenCTEM-Timestamp (unix seconds) and the HMAC is computed over "<timestamp>.<body>" (Stripe convention). Requests outside ±5 min are rejected with 401, defeating captured-signature replay attacks. Two new tests: stale timestamp + missing timestamp. - permission_sync middleware: fail-CLOSED on stale permissions for unsafe HTTP methods (POST/PUT/PATCH/DELETE) — return 409 instead of just setting a header that non-browser clients can ignore. Read traffic continues through the cache fetch which uses fresh perms. - crypto.IsNoOp helper + integration.CreateIntegration warning when persisting credentials with the NoOp encryptor active. Boot-time check (initEncryptor) already requires APP_ALLOW_PLAINTEXT_CREDENTIALS to permit NoOp; this adds visibility at each credential write so the plaintext storage can't be silent. - config.EncryptionConfig.AllowPlaintext (env: APP_ALLOW_PLAINTEXT_ CREDENTIALS) — explicit opt-in for non-prod plaintext credential storage. initEncryptor refuses to boot in any env if the key is unset and this flag is also unset (production already refused). - bootstrap_handler.GetTenantModules ETag: include tenant_id prefix so a misconfigured proxy can't serve tenant A's response to tenant B. - migration 000162 recursive CTE: add depth limit (50) to defend against a corrupted edge graph causing infinite recursion + DB connection pool exhaustion. - services.go: wire the per-tenant ModuleVersionService (Redis-backed counter for ETag generation and future Redis payload cache).

Invitation acceptance was creating the membership with only membership.Role (owner/admin/member/viewer) and silently dropping the RBAC role_ids attached to the invitation. The inviter chose specific scoped roles, the invitee never received them — a privilege-escalation INVERSE: legitimate roles never granted. Both acceptance paths now: - Iterate invitation.RoleIDs() after AcceptInvitationTx commits. - Call roleService.AssignRole for each. Best-effort: failure on one role logs but doesn't roll back — the user is already a member, missing roles can be re-granted manually. - Audit metadata records role_ids_count for traceability. TenantService gains SetRoleService (used by services.go bootstrap where the tenant service is constructed before the role service). Audit finding: CRITICAL.

CodeQL flagged hashAgentAPIKey as using SHA-256 directly on sensitive data. The API keys themselves are 32 random bytes from crypto/rand (256 bits of entropy — uncrackable even with plain SHA-256), so the warning is conservative, but switching to a peppered HMAC closes a real defense-in-depth gap: an attacker who exfiltrates the database but NOT the application config can no longer pre-compute candidate hashes for offline brute force. Changes: - Replace plain sha256.Sum256 with crypto.HashTokenPeppered which HMAC-SHA256s the token under a server-side pepper. Empty pepper collapses back to plain SHA-256 for compatibility. - AgentService.SetPepper wires the pepper from the platform encryption key (cfg.Encryption.Key) at boot. - AuthenticateByAPIKey: peppered hash lookup first, fall back to legacy plain SHA-256 lookup so rows written before the pepper was deployed continue to authenticate. Same pattern used by internal/app/apikey/service.go (F-9). - generateAgentAPIKey + hashAgentAPIKey become methods on the service so they have access to s.pepper. No backward-compat break: the legacy fallback runs only when the peppered lookup misses, and only when a pepper is actually configured.

…ain verify Closes the primitive-adoption gaps identified by the 2026-04 security audit. Each fix has: (1) code, (2) tests, (3) Prometheus metric where applicable. A root-level security-lint CI gate prevents regression. Outbound HTTP (audit C1/C2): - Swap bare &http.Client{} for httpsec.SafeHTTPClient at 12 sites: llm/{claude,gemini,openai}, notifier/{slack,teams,telegram,webhook}, jira/client, auth/{oauth,sso}, threat/{intel_service,intel_refresher}, keycloak/validator. Each site has a comment spelling out the SSRF threat the dialer blocks. CSRF (audit C7): - Mount CSRFOptional in buildTokenTenantMiddlewares so every JWT-cookie state-changing route validates the double-submit-cookie. API-key and webhook routes are naturally excluded (different middleware chain). - Emit openctem_security_csrf_rejections_total{reason,method} on every reject branch in both CSRF + CSRFOptional. - 24-case integration test (csrf_integration_test.go) covers strict + optional modes, all 4 state-changing methods, token round-trip, and the HttpOnly=false invariant. Email verification (audit C8): - auth/sso.go parseOktaUserInfo + parseGoogleUserInfo reject responses with email_verified=false. EntraID path documents its Graph-API- directory-verified trust boundary instead. Startup sentinel (audit C11): - config.go refuses to boot non-development environments when the JWT secret or encryption key matches the docker-compose dev default. 7-case unit test asserts both the allow-list (dev accepts) and the reject-list (prod + staging reject with a clear error). AI-triage validator + budget (audit C10, RFC-008): - aitriage/validation.go stops silently coercing invalid severity to "medium". It records a ValidationWarning so the service can flag the result for human review instead of auto-applying. - TriageResult carries ValidationWarnings; NeedsReview() convenience wrapper. logTriageNeedsReview + ActionAITriageNeedsReview surface a distinct audit action. - RFC-008 Phase 1 scaffold: ai_triage_budgets table (migration 000163), BudgetService (Check/Record/Status with strict + non- strict repo-failure modes), Postgres repository with atomic UPDATE RETURNING for multi-pod correctness, wiring in services.go behind AI_TRIAGE_BUDGET_ENABLED feature flag (defaults off — zero behaviour change on this deploy). 16-case unit test pins the budget contract. Audit-chain verify (audit, RFC-007 prelude): - New controller audit_chain_verify walks every active tenant's audit_log_chain hourly. Logs an ERROR-level entry with keyword audit_chain_break and increments openctem_security_audit_chain_breaks_total per break so a SIEM can page without polling the admin verify endpoint. 7-case unit test covers happy path, mid-tenant error, cancelled context, and break-detected-but-still-processed. Rate limit (audit P6-5): - Wire TelemetryRateLimiter onto /api/v1/agent/telemetry-events. Closes the "defined but not mounted" regression class. Metrics package: - Add openctem_security_* series in internal/metrics/security_defenses.go: csrf_rejections_total, audit_chain_breaks_total, audit_chain_verify_runs_total, ai_triage_needs_review_total, ai_triage_budget_used_tokens, ai_triage_budget_exhausted_total.

- sso.go parseOktaUserInfo / parseGoogleUserInfo: start the error string with a lowercase word (staticcheck ST1005). The previous wording led with the brand name ("Okta provider ...", "Google Workspace provider ...") which reads fine but trips ST1005's "no leading capital" rule. Rephrased to start with the lowercase brand + keep the audit trail clear. - webhook_hmac.go: remove the unreferenced computeHMAC helper. It was a left-over from the pre-timestamp signing path and is no longer called by production code or tests. The timestamp variant computeHMACWithTimestamp is the single source of truth. Removing rather than nolint-suppressing because we'd rather not carry unused code.

CodeQL flagged 2 critical email-content-injection alerts on this PR. They pre-date our audit sprint (the PR just got large enough to cross CodeQL's baseline-refresh threshold), but both are genuine so we fix them here rather than defer. pkg/email/email.go: - New exported SanitizeHeaderValue(v): replaces CR and LF with a single space. Applied to every value that ends up on an SMTP header line in buildMessage: FromName, From, each recipient in To, Subject, ReplyTo, plus custom Headers key + value. Without this, an attacker-controlled Subject like "Title\r\nBcc: attacker@evil.com" smuggles an extra header past the SMTP boundary (CWE-93 / CRLF injection, CodeQL go/email-content-injection). - 5-case unit test pins the contract: CR, LF, CRLF, benign content preserved, replacement-with-space semantics. pkg/email/email.go LoggingSender: - Also ran the 4 medium go/log-injection alerts on this file by sanitising the "to" address and subject before passing them to the structured logger. Keeps a malicious address like "victim@example.com\nFAKE-LOG: admin login succeeded" from forging log lines on loggers that don't JSON-escape values. internal/infra/notifier/email.go: - Same treatment for the notifier SMTP path: Title → subject (also when prefixed with severity emoji), FromName, FromEmail, every ToEmails entry, ReplyTo. Imports emailpkg for the sanitiser instead of duplicating the 4 lines locally. internal/infra/postgres/ai_triage_budget_repository.go: - Defensive cleanup (not CodeQL-flagged): UpdateLastWarnSent and UpdateLastBlockSent now carry their own static UPDATE strings rather than sharing a column-name-interpolated Sprintf helper. Two literal SQL strings, zero runtime string formatting in SQL land, no tainted-flow surface. The former helper was safe via a whitelist map, but a future refactor that drops the whitelist would regress silently; explicit SQL avoids that class entirely.

… doc dismissals Follow-up to fix(security): email-header CRLF injection (4ceb3de). Closes the remaining CodeQL alerts that landed on this PR: SQL-query-from-user-input (3 high): - apikey_repository.go ListAPIKeys: replace map-allowlist + concatenation with explicit switch/case that assigns LITERAL column-names on every branch. Map-based allowlists are safe but CodeQL's taint tracker can't trace through them. - template_source_repository.go List: same switch/case refactor for sortBy. Also binds LIMIT/OFFSET via $N parameters instead of fmt.Sprintf %d — kind-checked by Go's typechecker already, but the binding keeps CodeQL's flow-analysis fully clean. - asset_repository.go countByFields: parameterise the JSONB key in the UNION ALL clauses. The regex whitelist is retained as an early reject but no longer feeds into the query string directly. Slice-allocation with user-influenced size (2 high): - finding_approval_repository.go: const maxApprovalsCap = 100; clamp page.Limit() into a separate `capacity` var before make(). - tenant_repository.go SearchMembersWithUserInfo: const maxMembersCap = 1000; same pattern. Defence-in-depth cap even though the HTTP handler already paginates. Log-entry-from-user-input (1 medium): - ingest/service.go Ingest: wrap report.Metadata.ID + SourceType through a new sanitizeIngestLogField (log_sanitize.go) that strips CR/LF and caps length at 512 chars. A compromised agent can no longer forge log lines via CTIS metadata. Useless-assignment (8 warnings): - Remove or annotate the trailing `argIndex++` in 8 repository query-builders (rule_source, rule, rule_bundle, finding, audit, agent, admin×2). Each was the last condition in its chain, so the post-increment had no reader. Wrapper-level false-positives (4 dismissals, documented in code): - pkg/crypto/hash.go HashToken + HashTokenBytes: full doc-comment explaining why SHA-256 is correct for 256-bit random tokens (not passwords), paired with `#nosec G401` and a pointer to dismiss the CodeQL go/weak-cryptographic-algorithm alert as a false positive. - middleware/{admin_audit,metrics,timeout}.go Write wrappers: add doc-comments explaining each is a transparent passthrough; request-layer escaping is the handler's job, not the wrapper's. Dismiss CodeQL go/reflected-xss on these lines. Tests continue to pass (ingest, postgres, middleware, crypto, email).

@@ -135,6 +135,10 @@
 	mrw.ResponseWriter.WriteHeader(code)
 }

+// Write implements http.ResponseWriter. Same CodeQL
+// go/reflected-xss false-positive note as auditResponseWriter.Write:


Splits the SSRF blocklist into two tiers so on-prem CTEM deployments can scan their own RFC1918 / ULA space without losing the cloud- metadata / loopback / CGNAT guard. Design: hardBlockedIPRanges — ALWAYS blocked, no env-var opens these: 127.0.0.0/8 loopback on the API host 169.254.0.0/16 link-local incl. AWS/GCP/Azure IMDS 100.64.0.0/10 carrier-grade NAT 0.0.0.0/8 "this" network 224.0.0.0/4 multicast 240.0.0.0/4 reserved 255.255.255.255/32 broadcast ::1/128 IPv6 loopback fe80::/10 IPv6 link-local privateIPRanges — blocked BY DEFAULT, opt-in via OPENCTEM_HTTPSEC_ALLOW_PRIVATE=1 : 10.0.0.0/8 RFC1918 class A 172.16.0.0/12 RFC1918 class B 192.168.0.0/16 RFC1918 class C fc00::/7 IPv6 ULA Critical invariant (tested): even with OPENCTEM_HTTPSEC_ALLOW_PRIVATE=1 an attacker cannot reach IMDS or loopback. A compromised admin who flips the env var still cannot weaponise the dialer to leak cloud credentials. Why env var, not config file: ops set this per-deployment (Helm values / Docker env) — a cloud-hosted API pod keeps the cloud-safe default; an internal API pod in a corporate DC opens up. Per-tenant policy can be layered later if needed. AllowPrivate() exposed so main.go can log the posture at boot. Test coverage: new cases pin (a) RFC1918 reachable with opt-in, (b) IMDS / loopback / CGNAT stay refused with opt-in, (c) alias hostnames (localhost, metadata.google.internal) stay refused in both modes, (d) default mode still blocks RFC1918. Same change lands simultaneously in sdk-go/pkg/httpsec (for SDK consumers) and agent/internal/executor/target_security.go (for scanner target validation). The three copies stay drift-free via scripts/security-lint.sh Rule 6.

…ed flow CodeQL's go/unsafe-slice-allocation re-flagged these after the clamp-with-separate-var fix: the tool keeps the "user-influenced" taint on any value that originated from an HTTP parameter, even when the value has been bounds-checked afterwards. The only path to a clean flow is to hand make() the literal constant directly. finding_approval_repository.go ListPending: approvals := make([]T, 0, maxApprovalsCap) // const 100 tenant_repository.go SearchMembersWithUserInfo: members := make([]T, 0, maxMembersCap) // const 1000 Memory cost: 100 × 8 B + 1000 × 8 B = ~8.8 KB per request when the page returns zero rows. Acceptable; any growth past the const is still handled by Go's normal slice-grow path. Semantics unchanged — make's capacity is an allocation hint, not a length. Queries returning fewer rows still produce the right result.

0xmanhnv added 25 commits April 15, 2026 16:44

fix: migration 137 down.sql missing triggers + migration 141 operator…

d361000

… precedence - 137.down: drop all 3 finding triggers (insert/update/delete), not just 1 - 141.up: add parentheses to fix AND/OR precedence in sub_type backfill

docs: add RFCs 003-005

172ecca

- RFC-003: Monorepo consolidation (api + ui + agent) - RFC-004: Priority classes P0-P3 + EPSS/KEV enrichment - RFC-005: CTEM maturity gaps (cycles, metrics, controls, escalation)

fix: update EPSS URL + add User-Agent for threat intel sync

9d87517

- EPSS CSV URL changed: epss.cyentia.com → epss.empiricalsecurity.com - Add User-Agent header to both EPSS and KEV requests (CISA blocks bare requests)

feat: RFC-005 Gap 5 — data quality scorecard API

94851e9

- GET /api/v1/dashboard/data-quality endpoint - Single CTE query: asset ownership %, finding evidence %, median last-seen age, dedup rate - DataQualityScorecard type + service + handler + route - Mock updated for test compatibility

feat: RFC-005 Gap 7 — SLA escalation controller

8ebd6ac

- Background controller runs every 15 min - Marks overdue findings as sla_status='breached' - Marks approaching-deadline findings as 'warning' (within 3 days) - Migration 000143: partial index for efficient overdue queries

docs: update RFC-005 status — Phase 1 complete (gaps 5, 7, 8)

c4af5de

docs: CTEM platform architecture + RFC-005 completed

95c80f2

- Architecture doc: 5-stage CTEM workflow, feature map, database schema overview, background controllers, outcome metrics explanation - RFC-005 marked completed (all 7 gaps implemented)

docs: rewrite all RFCs and architecture doc to 100% English

928b895

chore: remove docs moved to root docs/ directory

d8babec

fix: data quality scorecard query — use last_seen not last_seen_at

5fcf7c8

feat: wire SLA escalation + risk snapshot controllers into worker man…

bb5e217

…ager - Register SLAEscalationController (15 min interval) - Register RiskSnapshotController (6 hour interval) - Both controllers now start automatically with the application

fix: verification checklist route conflict — use separate path prefix

a85685b

github-advanced-security AI found potential problems Apr 15, 2026

View reviewed changes

Comment thread internal/infra/http/handler/attacker_profile_handler.go Fixed

Comment thread internal/infra/http/handler/compensating_control_handler.go Fixed

Comment thread internal/infra/http/handler/ctem_cycle_handler.go Fixed

0xmanhnv added 2 commits April 15, 2026 20:06

github-advanced-security AI found potential problems Apr 16, 2026

View reviewed changes

Comment thread internal/infra/http/handler/business_service_handler.go Fixed

fix: process metrics query + staticcheck S1039

fc33f10

- Use approved_at/rejected_at instead of non-existent updated_at on approvals - Remove unnecessary fmt.Sprintf (staticcheck S1039)

0xmanhnv added 12 commits April 21, 2026 02:28

github-advanced-security AI found potential problems Apr 21, 2026

View reviewed changes

Comment thread internal/app/agent/service.go Fixed

0xmanhnv added 13 commits April 21, 2026 04:30

github-advanced-security AI found potential problems Apr 22, 2026

View reviewed changes

0xmanhnv added 2 commits April 22, 2026 12:44

0xmanhnv merged commit e04ef6e into develop Apr 23, 2026
12 of 13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/ctem prioritization and maturity#51

Feat/ctem prioritization and maturity#51
0xmanhnv merged 113 commits into
developfrom
feat/ctem-prioritization-and-maturity

0xmanhnv commented Apr 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

0xmanhnv commented Apr 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants