Feat/ctem prioritization and maturity#51
Merged
Conversation
- Add tenant_id verification to ApproveAndMerge and RejectReview - Add tenant_id to all UPDATE statements in merge (findings, relationships, etc.) - Handler passes tenantID from JWT context to repository - Align approve/reject permissions (both require AssetsDelete) - Promote sub_type from properties/TypeAliases during asset merge - Remove unused queryStringPtr helper
… precedence - 137.down: drop all 3 finding triggers (insert/update/delete), not just 1 - 141.up: add parentheses to fix AND/OR precedence in sub_type backfill
- RFC-003: Monorepo consolidation (api + ui + agent) - RFC-004: Priority classes P0-P3 + EPSS/KEV enrichment - RFC-005: CTEM maturity gaps (cycles, metrics, controls, escalation)
- EPSS CSV URL changed: epss.cyentia.com → epss.empiricalsecurity.com - Add User-Agent header to both EPSS and KEV requests (CISA blocks bare requests)
- Migration 000142: enrichment columns on findings, priority_override_rules, priority_class_audit_log, SLA p0-p3 days - Domain: priority.go (ClassifyPriority deterministic rules, P0/P1/P2/P3) - Domain: priority_rule.go (override rules with condition evaluation) - Finding entity: enrichment fields (epss, kev, priority, reachability) - Repository: 11 new columns in SELECT/INSERT/UPSERT/UPDATE/scan - Service: PriorityClassificationService (classify, batch, enrich) - Handler: FindingResponse includes enrichment + priority fields - 4 default override rules seeded per tenant
- GET /api/v1/dashboard/data-quality endpoint - Single CTE query: asset ownership %, finding evidence %, median last-seen age, dedup rate - DataQualityScorecard type + service + handler + route - Mock updated for test compatibility
- Background controller runs every 15 min - Marks overdue findings as sla_status='breached' - Marks approaching-deadline findings as 'warning' (within 3 days) - Migration 000143: partial index for efficient overdue queries
- Migration 000144: finding_verification_checklists table - Domain entity: IsComplete(), IncompleteItems(), UpdateItem() - Required items: exposure_cleared, evidence_attached, register_updated - Optional items: monitoring_added, regression_scheduled (NULL = N/A) - Gates finding verified/closed transition
- Migration 000145: risk_snapshots table (daily per-tenant) - RiskSnapshotController: runs every 6h, computes risk/findings/SLA/MTTR/priority - GET /api/v1/dashboard/risk-trend?days=90 endpoint - RiskTrendPoint type with risk_score, findings_open, SLA compliance, P0-P3 counts
- Migration 000146: compensating_controls, control_assets, control_findings - Domain entity: IsEffective(), RecordTest(), reduction_factor 0.0-1.0 - Control types: segmentation, identity, runtime, detection, other - Auto-deactivate on failed test
- Migration 000147: attacker_profiles table with 4 types - Domain entity: capabilities JSONB, assumptions, is_default - Seed 4 default profiles per tenant: external_unauth, external_stolen_creds, malicious_insider, supplier_compromise
- Migration 000148: ctem_cycles, scope_snapshots, cycle_metrics, cycle_attacker_profiles tables - Domain entity: state machine (planning → active → review → closed) - Charter JSONB (business_priorities, risk_appetite, objectives) - Scope snapshot at activation, metrics at close - Links to attacker profiles for threat model
- Architecture doc: 5-stage CTEM workflow, feature map, database schema overview, background controllers, outcome metrics explanation - RFC-005 marked completed (all 7 gaps implemented)
…y, missing index - SLA escalation: add per-tenant logging via RETURNING tenant_id - Risk snapshot: replace 12 correlated subqueries with 4 CTEs + LEFT JOIN (O(1) table scans regardless of tenant count) - Migration 000149: add tenant_id index on priority_class_audit_log
…ager - Register SLAEscalationController (15 min interval) - Register RiskSnapshotController (6 hour interval) - Both controllers now start automatically with the application
- Create EPSS/KEV adapters wrapping existing threat intel repos - Create PriorityRuleRepository (list active, create, delete) - Create PriorityAuditRepository (log changes) - Register repos in Repositories struct - Register PriorityClassificationService in Services - Interface compliance verified at compile time
…iles, CTEM cycles - CompensatingControlHandler: List, Get, Create, Update, Delete, RecordTest, LinkAssets/Findings → /api/v1/compensating-controls/* - AttackerProfileHandler: List, Get, Create, Update, Delete → /api/v1/attacker-profiles/* - CTEMCycleHandler: List, Get, Create, Update, Activate, StartReview, Close, GetScope, LinkProfile → /api/v1/ctem-cycles/* - All handlers use direct SQL with tenant_id isolation - Routes registered with permission middleware
…lers - Add PriorityClassifier interface to FindingProcessor - Auto-enrich + classify findings on ingest (Step 4c in ProcessBatch) - Fetch assets for classification context (batch) - Persist enriched fields after classification - Wire PriorityClassificationService into ingest via SetPriorityClassifier - Instantiate CompensatingControl, AttackerProfile, CTEMCycle handlers - All routes now fully functional end-to-end
- CompensatingControl LinkAssets: validate asset belongs to tenant before INSERT - CompensatingControl LinkFindings: validate finding belongs to tenant before INSERT - CTEMCycle LinkProfile: validate profile belongs to tenant before INSERT - Ingest enrichment: clarify dedup map comment (1-5 unique assets per batch)
… findings
- VerificationChecklistHandler: GET/PUT /findings/{id}/verification-checklist
Auto-creates checklist on first access, computes is_complete
- Migration 000150: CIA impact fields (confidentiality/integrity/availability)
on assets, environment field (production/staging/dev), mitre_technique_id
+ mitre_tactic on findings for ATT&CK coverage
- Route registered, handler instantiated
…reshness SLA - GET /api/v1/dashboard/executive-summary: risk change, P0/P1 resolved, SLA compliance, MTTR, crown jewels at risk, top 5 risks - GET /api/v1/dashboard/mttr-analytics: MTTR by severity + priority class - GET /api/v1/dashboard/process-metrics: approval velocity, stale assets, unassigned findings, time-to-assign - Migration 000151: freshness_status on assets, stale check index, approval velocity index
…xport Phase 3 (100% CTEM completion): - Migration 000152: business_services + business_service_assets tables, regression tracking fields on findings (is_regression, reopen_count, last_reopened_at), finding_regression_events log - BusinessServiceHandler: full CRUD + asset linking, tenant-scoped, validates asset ownership before linking - GET /api/v1/dashboard/executive-summary/export?format=csv|json downloadable executive reports - ExecutiveSummary includes regression_count + regression_rate_pct - Business Services distinct from Business Units — represent business capabilities (Payment Processing, Customer Login) that span multiple assets and BUs
- Use approved_at/rejected_at instead of non-existent updated_at on approvals - Remove unnecessary fmt.Sprintf (staticcheck S1039)
compliance_service + simulation_service → internal/app/compliance/. pentest_service stays in package app because it depends on NotificationService + FindingActivityService (not yet extracted). parseOptionalDate helper duplicated into compliance/ so package is self-contained. pkg/domain/compliance aliased to compliancedom. Build + test pass.
3 files (module_service, dashboard_service, report_schedule_service) → internal/app/module/. pkg/domain/module aliased to moduledom, pkg/domain/reportschedule aliased to reportscheduledom. Shim re-exports 3 services + 26 value types. Build + test pass.
- activity/: FindingActivityService (self-contained, no cross-svc deps) - aitriage/: AITriageService + TriageOutputValidator + PromptSanitizer (imports newly-extracted audit + activity packages; pkg/domain/aitriage aliased to aitriagedom) Shims at finding_activity_service.go + ai_triage_service.go re-export all public symbols. Pentest, vulnerability_service, workflow_action_handlers (which reference aiTriageService + activityService) resolve via shim transparently — zero caller change. Build + test pass.
3 files (scan_scheduler, scansession_service, scanprofile_service) moved into existing internal/app/scan/. scheduler.go's self-import (scansvc alias pointing to same package) removed — was a relic from the earlier monolithic service.go split. ScansScheduled metric reference repointed to internal/metrics directly to avoid pulling in package app. scanner_template_service stays in package app because template/ already imports scan/ (for TemplateSyncer interface) — moving template_service into scan/ would create a cycle. Tracked for follow-up after the template→scan adapter is refactored. Build + test pass.
6 files → internal/app/integration/:
- integration_service (4 SCM/notification/OAuth orchestration)
- notification_service
- webhook_service
- attachment_service
- credential_import_service
- secretstore_service
pkg/domain/{integration,notification,webhook,attachment} aliased to
*dom to disambiguate. secretstore's AuditService refs repointed to
auditapp package. Shim re-exports 52 symbols (6 services + 46 input/
output types) — all existing app.XService callers compile unchanged.
Build + test pass.
10 files → internal/app/asset/:
- asset_service, asset_group_service, asset_import_service,
asset_relationship_service, asset_type_service, branch_service,
business_unit_service, component_service, sbom_import_service,
relationship_suggestion_service
pkg/domain/{asset,assetgroup,assettype,branch,businessunit,component,
relationship} each aliased to *dom. Shim re-exports 11 services +
39 value types + 3 batch-status constants.
Build + test pass.
13 files → internal/app/finding/: - vulnerability_service, finding_actions_service, finding_comment_service, finding_import_service, finding_lifecycle_scheduler, finding_source_service, finding_source_cache_service, priority_classification_service, priority_flood_guard + test, priority_change_publisher_test, bulk_action_guard + test Cross-package deps repointed: activity.FindingActivityService + activity.RecordActivityInput, integration.NotificationService, aitriage.AITriageService. FindingsExpired metric reference switched to internal/metrics package directly. Shim re-exports 14 services + 43 value types + 4 sentinel errors. Build + test pass.
5 files → internal/app/workflow/: - workflow_service, workflow_action_handlers, workflow_event_dispatcher, workflow_executor, workflow_handlers Cross-package deps repointed: finding.VulnerabilityService + finding.FindingSourceCacheService + finding.UpdateFindingStatusInput, integration.IntegrationService + NotificationService + SendNotificationInput, aitriage.AITriageService + TriageRequest, auditapp.AuditService/Context/Event + Success/Failure constructors. Shim re-exports 43 symbols (5 services + option setters + input types). Build + test pass.
pentest_service had been left in package app during the initial compliance extract because it depended on NotificationService and FindingActivityService which were not yet extracted. Both are now in internal/app/integration + internal/app/activity, so pentest can join its sibling compliance + simulation services in internal/app/compliance/. parseOptionalDate helper unified (was previously duplicated in compliance/service.go during the split); NewSuccessEvent / NewDeniedEvent / AuditService refs repointed to auditapp. Shim re-exports PentestService + 12 pentest input types + WithCachedCampaignRole context helper. Build + test pass.
8 files → internal/app/accesscontrol/:
- permission_service, permission_cache_service, permission_version_service
- role_service, group_service, group_sync_service (+test)
- membership_cache_service, rule_service
pkg/domain/{accesscontrol,permission,role,group,rule,permissionset} aliased.
Audit + notification deps repointed to extracted packages. Field-name
collision 'AuditContext AuditContext' pattern preserved via regex fix.
Shim re-exports 11 services + 37 input types + 11 option setters.
Build + test pass.
auth/ (8 files):
- auth_service, sso_service, oauth_service, session_service,
email_service, ws_ticket_service, tenant_smtp_resolver,
tenant_storage_resolver
- pkg/domain/{session,identityprovider,tenant,user} + pkg/email
aliased; accesscontrol.Role/PermissionCache/Version deps repointed.
tenant/ (3 files):
- tenant_service, user_service, tenant_membership_adapter
- Depends on extracted auth.SessionService (authapp alias) and
accesscontrol packages.
Shims re-export 23 auth services+errors + 18 tenant services/inputs.
Build + test pass. All 14 planned clusters now extracted except the
cross-cutting leftovers (adapters.go, metrics.go, security_validator,
scoring_config_provider, validation_coverage) which stay in package
app as thin bridges/helpers.
…ility + rename shims CI was failing because commits 6b79f84 (accesscontrol) and 516bd0b (auth + tenant) pushed source where the inner files still had bare references to AuditService / AuditContext / NewSuccessEvent / PermissionCacheService / SessionService etc. — types that only exist in package app. The in-cluster files needed those refs repointed to auditapp.X / accesscontrol.X / authapp.X but the corrections lived in my working tree and never made it into the commit. This commit carries: - auth/tenant/accesscontrol in-cluster files: all bare audit + accesscontrol + session refs now qualified with their sibling package import (auditapp, accesscontrol, authapp). Field-name collision 'AuditContext *AuditContext' preserved. - capability/: 13th subpackage extracted. capability_service (536 lines) → internal/app/capability/service.go. Shim at internal/app/capability_service.go re-exports 7 symbols. - ws_ticket_service_test moved into auth/ where its impl lives. - 5 shim files renamed for consistency with target cluster name: permission_service.go → accesscontrol_service.go vulnerability_service.go → finding_service.go finding_activity_service.go → activity_service.go ai_triage_service.go → aitriage_service.go scan_scheduler.go → scan_service.go (git detects as rename so history follows via git log --follow.) Build + vet + full test suite pass with GOWORK=off.
…lusters
45 files renamed inside internal/app/<cluster>/ to drop the redundant
_service suffix (folder name already says it's a service package).
Follows Effective Go convention — file name describes WHAT (concept),
not the layer. Example: auth/auth_service.go → auth/service.go,
accesscontrol/permission_service.go → accesscontrol/permission.go.
Struct names keep the *Service suffix (AuthService, AssetService) —
only the filenames change.
Also moved 2 cross-cutting helpers out of package app into their
rightful clusters:
- scoring_config_provider.go → asset/ (it implements
asset.ScoringConfigProvider, belongs with asset domain)
- validation_coverage.go (+ 2 tests) → validation/ (CTEM coverage
SLO checker — belongs with the validation domain that already
owns executor/evidence/proof-of-fix)
contains() helper in coverage_enforce_test.go renamed to
containsStr to avoid collision with existing
proof_of_fix.go:contains([]ExecutorKind, ExecutorKind).
Shim aliases preserved in root so every external app.X caller still
compiles. New shim for ValidationCoverage + CoverageThresholds +
Enforce + DefaultThresholds + ErrCoverageBelowSLO. asset_service.go
shim gained TenantScoringConfigProvider + MapTenantToAssetScoringConfig.
CLAUDE.md updated with two new conventions:
8) File naming inside internal/app/<cluster>/ (drop _service suffix)
9) Layer-mirrored package naming (pkg/domain/<X> ↔ internal/app/<X>
with 'dom' alias when both imported in one file)
Build + vet + full test suite pass.
Problem 1 — missing from catalogue.
The sidebar had ~20 shipped features (attack_surface, scope_config,
ctem_cycles, attacker_profiles, compensating_controls, workflows,
ctem_maturity, priority_rules, business_services, business_impact,
risk_analysis, risk_scoring, executive_summary, mitre_coverage,
sbom_export, scanner_templates, template_sources, scan_pipelines,
remediation_tasks, relationships) that were never registered in the
modules table. Tenant admins could not enable/disable them because
the Settings > Modules screen iterates the modules rows.
Problem 2 — no dependency graph.
'ai_triage requires findings' lived only in an engineer's head. A
tenant could disable findings with ai_triage still on and end up
with a triage UI that had nothing to triage.
Fix 1 — migration 000161_modules_catalog_sync:
back-fills the 20 missing rows. ON CONFLICT DO NOTHING so operators
who already customised a name/description aren't overwritten.
Fix 2 — pkg/domain/module/dependency.go:
platform-wide static graph, same pattern as the existing
ModulePermissionMapping + RolePermissions — spec-not-data, lives in
code so CI catches typos and cycles. Two edge kinds:
DependencyHard — tenant CANNOT disable target while dependent
remains enabled (blocker).
DependencySoft — tenant CAN disable, UI surfaces a warning
because dependent will degrade.
Plus helpers CanDisable, RequiredToEnable, TransitiveDependencies,
DetectCycle (CI acyclicity gate).
Fix 3 — ModuleService.UpdateTenantModules:
builds the post-update enabled-set BEFORE applying, runs CanDisable
/ RequiredToEnable against it, rejects 400 with a human-readable
list of dependent names. Deps that reference modules absent from
the registry are skipped (cross-edition tolerance).
Fix 4 — GET /api/v1/tenants/{tenant}/settings/modules/graph:
returns the static dependency graph as flat edge list for UI.
Fix 5 — tests pkg/domain/module/dependency_test.go:
10 cases including DetectCycle (CI gate), ReferencedModulesExist
(catches typo'd ids), hard-vs-soft blocker semantics, transitive
walk that skips soft edges.
Fix 6 — constants in pkg/domain/module/module.go extended with the
20 new IDs so typed consts are available everywhere.
Build + vet + full test suite pass.
53-file gofmt sweep after the cluster refactor (commits f56f341 → 59bbffa). Pure whitespace normalisation — column alignment inside the alias shims + goimports ordering inside service.go files. Caught by 'gofmt -l internal/app/ pkg/domain/module/ pkg/httpsec/ pkg/jwt/ pkg/crypto/' during the full-tree regression audit. The older files-in-package-app followed stdlib alignment (tabs); after split the aliased fields got sub-tab alignment which gofmt prefers — converged here. No behaviour change. Build + vet + full test suite pass.
…k, data repair
PM/TechLead review — the original module-toggle flow returned errors
as plain strings, so the UI could not render a dependency-aware
confirmation dialog. Admins saw a red banner with a comma-separated
list of names and had no handle to cascade-disable. Soft warnings
were silently dropped by the service — a tenant disabling threat_intel
while priority_rules stayed on had no idea priority_rules would
degrade.
Security review — validate+upsert was two separate DB calls with no
lock between. Two admins flipping related modules concurrently could
interleave to produce findings=off + ai_triage=on, violating the
invariant the static graph is supposed to enforce (TOCTOU).
UX review — the only writable endpoint was PATCH. There was no way
to ask 'what would happen if I disabled X?' before committing.
Dry-run was implicit inside the PATCH error path.
DB review — GetTenantModuleConfig called getTenantDisabledModules
once; UpdateTenantModules wrote the upsert, then called
buildTenantModuleConfig which called getTenantDisabledModules AGAIN.
Duplicate query per toggle request.
Fix summary:
1. New structured *module.ToggleError carrying module_id, name,
action, blockers[], required[] as data. Handler serialises via
writeToggleErrorJSON — 400 with JSON body the UI can parse.
2. New POST /settings/modules/validate — dry-run, returns
ValidationIssues { blockers, warnings, required } WITHOUT
persisting. Wired to module.Service.ValidateToggle.
3. Soft warnings now surface in the PATCH success path via
TenantModuleConfigOutput.Warnings. UI toasts 'FYI: X will degrade'.
4. Per-tenant sync.Mutex map serialises concurrent
UpdateTenantModules on the same tenant. Different tenants don't
contend. Single-replica safe; multi-replica will need a
cross-process lock — documented inline.
5. buildTenantModuleConfigFromMaps accepts the pre-loaded disabled
map, skipping the duplicate getTenantDisabledModules query in
the update path.
6. Migration 000162 repair-orphan-tenant-modules: walks the hard-
edge transitive closure and disables every dependent of an
already-disabled module. Uses a recursive CTE so cascade depth
is not bounded. Skipped for core modules. Rerun-safe.
7. Tests — 5 new cases: blocker ToggleError shape, missing-required
ToggleError shape, soft warning surfaces in response, validate
dry-run does not mutate, core disable rejected before dep check.
Full regression green — GOWORK=off go build + go vet + go test
./... all pass.
- Add 9 curated presets (Minimal, Asset Inventory, VM Essentials, ASM,
Offensive Security, SBOM & Supply Chain, CSPM, Compliance, CTEM Full)
in pkg/domain/module/presets.go with target persona, key outcomes,
and module bundles. Apply via POST .../modules/presets/{id}/apply.
- Mandatory tier (agents, integrations, integrations.notifications,
notification_settings, groups, api_keys) auto-included in every
preset and rejected on disable — without `agents` no data ingestion
is possible, treating it as feature-optional was a DoS vector.
- Sub-module inheritance: enabling a parent (assets, ai_triage,
integrations) implicitly enables its sub-modules so presets don't
have to enumerate 24 assets.* types.
- Implicit sub-module → parent cascade in CanDisable so disabling
assets blocks while assets.* are still on.
- Dependency graph additions: branches→components(soft),
business_impact→findings(hard), workflows→findings(soft, was hard),
sla→findings, policies→findings, compliance→findings, iocs→threat_intel,
scan_profiles→scans.
- Module version counter (mod_ver:{tid}) in Redis powers ETag headers
on /me/modules and /settings/modules — 304 Not Modified when client
cache matches. Bumped via notifyModuleChange on every toggle/apply/reset.
- WS broadcaster fans out module.updated events on tenant:{id} channel
for cross-tab realtime invalidation.
- iocs added to ModulePermissionMapping (was referenced in dep graph
but missing from catalogue, breaking TestReferencedModulesExist).
- ETag includes tenant prefix so a misconfigured proxy/CDN can't serve
tenant A's response to tenant B.
User-supplied BaseURL on GitHub/GitLab/Bitbucket/Azure DevOps integrations was passed directly to a plain http.Client without validation. An attacker creating an integration with BaseURL=http://169.254.169.254/latest/meta-data could exfiltrate AWS IMDSv1 credentials via TestConnection(). Each client now: - Calls httpsec.ValidateURL(config.BaseURL) — rejects loopback, private IPs, link-local, cloud metadata endpoints. - Uses httpsec.SafeHTTPClient(timeout) whose dialer enforces the same blocklist (defense-in-depth in case future code paths bypass the URL validation). Audit finding: CRITICAL.
Findings from the multi-role security audit, low-risk patches batched together. Each is small and orthogonal to the others. - asset_repository.GetByProperty: parameterise the JSONB key (a.properties ->> $3) instead of string-concatenating the whitelisted key into the SQL. Whitelist still constrains the value, but removing the concat pattern entirely means a future "let users add custom correlation keys" feature can't accidentally regress to injectable. - webhook_hmac middleware: replay protection. Senders MUST now include X-OpenCTEM-Timestamp (unix seconds) and the HMAC is computed over "<timestamp>.<body>" (Stripe convention). Requests outside ±5 min are rejected with 401, defeating captured-signature replay attacks. Two new tests: stale timestamp + missing timestamp. - permission_sync middleware: fail-CLOSED on stale permissions for unsafe HTTP methods (POST/PUT/PATCH/DELETE) — return 409 instead of just setting a header that non-browser clients can ignore. Read traffic continues through the cache fetch which uses fresh perms. - crypto.IsNoOp helper + integration.CreateIntegration warning when persisting credentials with the NoOp encryptor active. Boot-time check (initEncryptor) already requires APP_ALLOW_PLAINTEXT_CREDENTIALS to permit NoOp; this adds visibility at each credential write so the plaintext storage can't be silent. - config.EncryptionConfig.AllowPlaintext (env: APP_ALLOW_PLAINTEXT_ CREDENTIALS) — explicit opt-in for non-prod plaintext credential storage. initEncryptor refuses to boot in any env if the key is unset and this flag is also unset (production already refused). - bootstrap_handler.GetTenantModules ETag: include tenant_id prefix so a misconfigured proxy can't serve tenant A's response to tenant B. - migration 000162 recursive CTE: add depth limit (50) to defend against a corrupted edge graph causing infinite recursion + DB connection pool exhaustion. - services.go: wire the per-tenant ModuleVersionService (Redis-backed counter for ETag generation and future Redis payload cache).
Invitation acceptance was creating the membership with only membership.Role (owner/admin/member/viewer) and silently dropping the RBAC role_ids attached to the invitation. The inviter chose specific scoped roles, the invitee never received them — a privilege-escalation INVERSE: legitimate roles never granted. Both acceptance paths now: - Iterate invitation.RoleIDs() after AcceptInvitationTx commits. - Call roleService.AssignRole for each. Best-effort: failure on one role logs but doesn't roll back — the user is already a member, missing roles can be re-granted manually. - Audit metadata records role_ids_count for traceability. TenantService gains SetRoleService (used by services.go bootstrap where the tenant service is constructed before the role service). Audit finding: CRITICAL.
CodeQL flagged hashAgentAPIKey as using SHA-256 directly on sensitive data. The API keys themselves are 32 random bytes from crypto/rand (256 bits of entropy — uncrackable even with plain SHA-256), so the warning is conservative, but switching to a peppered HMAC closes a real defense-in-depth gap: an attacker who exfiltrates the database but NOT the application config can no longer pre-compute candidate hashes for offline brute force. Changes: - Replace plain sha256.Sum256 with crypto.HashTokenPeppered which HMAC-SHA256s the token under a server-side pepper. Empty pepper collapses back to plain SHA-256 for compatibility. - AgentService.SetPepper wires the pepper from the platform encryption key (cfg.Encryption.Key) at boot. - AuthenticateByAPIKey: peppered hash lookup first, fall back to legacy plain SHA-256 lookup so rows written before the pepper was deployed continue to authenticate. Same pattern used by internal/app/apikey/service.go (F-9). - generateAgentAPIKey + hashAgentAPIKey become methods on the service so they have access to s.pepper. No backward-compat break: the legacy fallback runs only when the peppered lookup misses, and only when a pepper is actually configured.
…ain verify
Closes the primitive-adoption gaps identified by the 2026-04 security
audit. Each fix has: (1) code, (2) tests, (3) Prometheus metric where
applicable. A root-level security-lint CI gate prevents regression.
Outbound HTTP (audit C1/C2):
- Swap bare &http.Client{} for httpsec.SafeHTTPClient at 12 sites:
llm/{claude,gemini,openai}, notifier/{slack,teams,telegram,webhook},
jira/client, auth/{oauth,sso}, threat/{intel_service,intel_refresher},
keycloak/validator. Each site has a comment spelling out the SSRF
threat the dialer blocks.
CSRF (audit C7):
- Mount CSRFOptional in buildTokenTenantMiddlewares so every JWT-cookie
state-changing route validates the double-submit-cookie. API-key and
webhook routes are naturally excluded (different middleware chain).
- Emit openctem_security_csrf_rejections_total{reason,method} on every
reject branch in both CSRF + CSRFOptional.
- 24-case integration test (csrf_integration_test.go) covers strict +
optional modes, all 4 state-changing methods, token round-trip, and
the HttpOnly=false invariant.
Email verification (audit C8):
- auth/sso.go parseOktaUserInfo + parseGoogleUserInfo reject responses
with email_verified=false. EntraID path documents its Graph-API-
directory-verified trust boundary instead.
Startup sentinel (audit C11):
- config.go refuses to boot non-development environments when the JWT
secret or encryption key matches the docker-compose dev default.
7-case unit test asserts both the allow-list (dev accepts) and the
reject-list (prod + staging reject with a clear error).
AI-triage validator + budget (audit C10, RFC-008):
- aitriage/validation.go stops silently coercing invalid severity to
"medium". It records a ValidationWarning so the service can flag
the result for human review instead of auto-applying.
- TriageResult carries ValidationWarnings; NeedsReview() convenience
wrapper. logTriageNeedsReview + ActionAITriageNeedsReview surface
a distinct audit action.
- RFC-008 Phase 1 scaffold: ai_triage_budgets table (migration
000163), BudgetService (Check/Record/Status with strict + non-
strict repo-failure modes), Postgres repository with atomic UPDATE
RETURNING for multi-pod correctness, wiring in services.go behind
AI_TRIAGE_BUDGET_ENABLED feature flag (defaults off — zero
behaviour change on this deploy). 16-case unit test pins the
budget contract.
Audit-chain verify (audit, RFC-007 prelude):
- New controller audit_chain_verify walks every active tenant's
audit_log_chain hourly. Logs an ERROR-level entry with keyword
audit_chain_break and increments
openctem_security_audit_chain_breaks_total per break so a SIEM
can page without polling the admin verify endpoint. 7-case unit
test covers happy path, mid-tenant error, cancelled context,
and break-detected-but-still-processed.
Rate limit (audit P6-5):
- Wire TelemetryRateLimiter onto /api/v1/agent/telemetry-events.
Closes the "defined but not mounted" regression class.
Metrics package:
- Add openctem_security_* series in internal/metrics/security_defenses.go:
csrf_rejections_total, audit_chain_breaks_total,
audit_chain_verify_runs_total, ai_triage_needs_review_total,
ai_triage_budget_used_tokens, ai_triage_budget_exhausted_total.
- sso.go parseOktaUserInfo / parseGoogleUserInfo: start the error
string with a lowercase word (staticcheck ST1005). The previous
wording led with the brand name ("Okta provider ...", "Google
Workspace provider ...") which reads fine but trips ST1005's
"no leading capital" rule. Rephrased to start with the lowercase
brand + keep the audit trail clear.
- webhook_hmac.go: remove the unreferenced computeHMAC helper.
It was a left-over from the pre-timestamp signing path and is
no longer called by production code or tests. The timestamp
variant computeHMACWithTimestamp is the single source of truth.
Removing rather than nolint-suppressing because we'd rather not
carry unused code.
CodeQL flagged 2 critical email-content-injection alerts on this PR.
They pre-date our audit sprint (the PR just got large enough to cross
CodeQL's baseline-refresh threshold), but both are genuine so we fix
them here rather than defer.
pkg/email/email.go:
- New exported SanitizeHeaderValue(v): replaces CR and LF with a
single space. Applied to every value that ends up on an SMTP
header line in buildMessage: FromName, From, each recipient in
To, Subject, ReplyTo, plus custom Headers key + value. Without
this, an attacker-controlled Subject like
"Title\r\nBcc: attacker@evil.com"
smuggles an extra header past the SMTP boundary
(CWE-93 / CRLF injection, CodeQL go/email-content-injection).
- 5-case unit test pins the contract: CR, LF, CRLF, benign content
preserved, replacement-with-space semantics.
pkg/email/email.go LoggingSender:
- Also ran the 4 medium go/log-injection alerts on this file by
sanitising the "to" address and subject before passing them to
the structured logger. Keeps a malicious address like
"victim@example.com\nFAKE-LOG: admin login succeeded"
from forging log lines on loggers that don't JSON-escape values.
internal/infra/notifier/email.go:
- Same treatment for the notifier SMTP path: Title → subject (also
when prefixed with severity emoji), FromName, FromEmail, every
ToEmails entry, ReplyTo. Imports emailpkg for the sanitiser
instead of duplicating the 4 lines locally.
internal/infra/postgres/ai_triage_budget_repository.go:
- Defensive cleanup (not CodeQL-flagged): UpdateLastWarnSent and
UpdateLastBlockSent now carry their own static UPDATE strings
rather than sharing a column-name-interpolated Sprintf helper.
Two literal SQL strings, zero runtime string formatting in SQL
land, no tainted-flow surface. The former helper was safe via a
whitelist map, but a future refactor that drops the whitelist
would regress silently; explicit SQL avoids that class entirely.
… doc dismissals Follow-up to fix(security): email-header CRLF injection (4ceb3de). Closes the remaining CodeQL alerts that landed on this PR: SQL-query-from-user-input (3 high): - apikey_repository.go ListAPIKeys: replace map-allowlist + concatenation with explicit switch/case that assigns LITERAL column-names on every branch. Map-based allowlists are safe but CodeQL's taint tracker can't trace through them. - template_source_repository.go List: same switch/case refactor for sortBy. Also binds LIMIT/OFFSET via $N parameters instead of fmt.Sprintf %d — kind-checked by Go's typechecker already, but the binding keeps CodeQL's flow-analysis fully clean. - asset_repository.go countByFields: parameterise the JSONB key in the UNION ALL clauses. The regex whitelist is retained as an early reject but no longer feeds into the query string directly. Slice-allocation with user-influenced size (2 high): - finding_approval_repository.go: const maxApprovalsCap = 100; clamp page.Limit() into a separate `capacity` var before make(). - tenant_repository.go SearchMembersWithUserInfo: const maxMembersCap = 1000; same pattern. Defence-in-depth cap even though the HTTP handler already paginates. Log-entry-from-user-input (1 medium): - ingest/service.go Ingest: wrap report.Metadata.ID + SourceType through a new sanitizeIngestLogField (log_sanitize.go) that strips CR/LF and caps length at 512 chars. A compromised agent can no longer forge log lines via CTIS metadata. Useless-assignment (8 warnings): - Remove or annotate the trailing `argIndex++` in 8 repository query-builders (rule_source, rule, rule_bundle, finding, audit, agent, admin×2). Each was the last condition in its chain, so the post-increment had no reader. Wrapper-level false-positives (4 dismissals, documented in code): - pkg/crypto/hash.go HashToken + HashTokenBytes: full doc-comment explaining why SHA-256 is correct for 256-bit random tokens (not passwords), paired with `#nosec G401` and a pointer to dismiss the CodeQL go/weak-cryptographic-algorithm alert as a false positive. - middleware/{admin_audit,metrics,timeout}.go Write wrappers: add doc-comments explaining each is a transparent passthrough; request-layer escaping is the handler's job, not the wrapper's. Dismiss CodeQL go/reflected-xss on these lines. Tests continue to pass (ingest, postgres, middleware, crypto, email).
| @@ -135,6 +135,10 @@ | |||
| mrw.ResponseWriter.WriteHeader(code) | |||
| } | |||
|
|
|||
| // Write implements http.ResponseWriter. Same CodeQL | |||
| // go/reflected-xss false-positive note as auditResponseWriter.Write: | |||
Splits the SSRF blocklist into two tiers so on-prem CTEM deployments
can scan their own RFC1918 / ULA space without losing the cloud-
metadata / loopback / CGNAT guard.
Design:
hardBlockedIPRanges — ALWAYS blocked, no env-var opens these:
127.0.0.0/8 loopback on the API host
169.254.0.0/16 link-local incl. AWS/GCP/Azure IMDS
100.64.0.0/10 carrier-grade NAT
0.0.0.0/8 "this" network
224.0.0.0/4 multicast
240.0.0.0/4 reserved
255.255.255.255/32 broadcast
::1/128 IPv6 loopback
fe80::/10 IPv6 link-local
privateIPRanges — blocked BY DEFAULT, opt-in via
OPENCTEM_HTTPSEC_ALLOW_PRIVATE=1 :
10.0.0.0/8 RFC1918 class A
172.16.0.0/12 RFC1918 class B
192.168.0.0/16 RFC1918 class C
fc00::/7 IPv6 ULA
Critical invariant (tested): even with OPENCTEM_HTTPSEC_ALLOW_PRIVATE=1
an attacker cannot reach IMDS or loopback. A compromised admin who
flips the env var still cannot weaponise the dialer to leak cloud
credentials.
Why env var, not config file: ops set this per-deployment (Helm
values / Docker env) — a cloud-hosted API pod keeps the cloud-safe
default; an internal API pod in a corporate DC opens up. Per-tenant
policy can be layered later if needed.
AllowPrivate() exposed so main.go can log the posture at boot.
Test coverage: new cases pin (a) RFC1918 reachable with opt-in,
(b) IMDS / loopback / CGNAT stay refused with opt-in, (c) alias
hostnames (localhost, metadata.google.internal) stay refused in both
modes, (d) default mode still blocks RFC1918.
Same change lands simultaneously in sdk-go/pkg/httpsec (for SDK
consumers) and agent/internal/executor/target_security.go (for
scanner target validation). The three copies stay drift-free via
scripts/security-lint.sh Rule 6.
…ed flow CodeQL's go/unsafe-slice-allocation re-flagged these after the clamp-with-separate-var fix: the tool keeps the "user-influenced" taint on any value that originated from an HTTP parameter, even when the value has been bounds-checked afterwards. The only path to a clean flow is to hand make() the literal constant directly. finding_approval_repository.go ListPending: approvals := make([]T, 0, maxApprovalsCap) // const 100 tenant_repository.go SearchMembersWithUserInfo: members := make([]T, 0, maxMembersCap) // const 1000 Memory cost: 100 × 8 B + 1000 × 8 B = ~8.8 KB per request when the page returns zero rows. Acceptable; any growth past the const is still handled by Go's normal slice-grow path. Semantics unchanged — make's capacity is an allocation hint, not a length. Queries returning fewer rows still produce the right result.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.