Skip to content

Develop: v0.2.0#54

Merged
0xmanhnv merged 121 commits into
mainfrom
develop
Apr 23, 2026
Merged

Develop: v0.2.0#54
0xmanhnv merged 121 commits into
mainfrom
develop

Conversation

@0xmanhnv
Copy link
Copy Markdown
Collaborator

No description provided.

0xmanhnv added 30 commits April 15, 2026 10:13
BREAKING: API no longer depends on github.com/openctemio/sdk-go.
Instead imports github.com/openctemio/ctis (4K lines, zero deps).

Changes:
- Replace 20 files: sdk-go/pkg/ctis → ctis module
- Replace 2 files: sdk-go/pkg/shared/severity → ctis/severity
- Replace 1 file: sdk-go/pkg/shared/fingerprint → ctis/fingerprint
- Copy adapters (SARIF, Trivy, Nuclei, etc.) into internal/infra/adapters/
- Inline ChunkData struct (1 struct, was only usage of chunk package)
- Remove github.com/openctemio/sdk-go from go.mod entirely

Result:
- API depends on ctis (4K lines, stdlib only) instead of SDK-Go (50K lines, 40 deps)
- SDK-Go updates no longer trigger API rebuilds
- Agent: ZERO changes (still imports SDK-Go normally)
- CLAUDE.md: add Recent Changes section (RFC-001, RFC-002, metadata removal, sub_type, migration count 141)
- asset-ip-hostname-correlation.md: mark as SUPERSEDED by RFC-001
- asset-schema.md: metadata → properties
- asset-properties-schema.md: metadata → properties (single column)
- Remove duplicate RFC-002-remove-sdk-dependency.md
- Add status to RFC-001 appendix
- Add tenant_id verification to ApproveAndMerge and RejectReview
- Add tenant_id to all UPDATE statements in merge (findings, relationships, etc.)
- Handler passes tenantID from JWT context to repository
- Align approve/reject permissions (both require AssetsDelete)
- Promote sub_type from properties/TypeAliases during asset merge
- Remove unused queryStringPtr helper
… precedence

- 137.down: drop all 3 finding triggers (insert/update/delete), not just 1
- 141.up: add parentheses to fix AND/OR precedence in sub_type backfill
- RFC-003: Monorepo consolidation (api + ui + agent)
- RFC-004: Priority classes P0-P3 + EPSS/KEV enrichment
- RFC-005: CTEM maturity gaps (cycles, metrics, controls, escalation)
- EPSS CSV URL changed: epss.cyentia.com → epss.empiricalsecurity.com
- Add User-Agent header to both EPSS and KEV requests (CISA blocks bare requests)
- Migration 000142: enrichment columns on findings, priority_override_rules,
  priority_class_audit_log, SLA p0-p3 days
- Domain: priority.go (ClassifyPriority deterministic rules, P0/P1/P2/P3)
- Domain: priority_rule.go (override rules with condition evaluation)
- Finding entity: enrichment fields (epss, kev, priority, reachability)
- Repository: 11 new columns in SELECT/INSERT/UPSERT/UPDATE/scan
- Service: PriorityClassificationService (classify, batch, enrich)
- Handler: FindingResponse includes enrichment + priority fields
- 4 default override rules seeded per tenant
- GET /api/v1/dashboard/data-quality endpoint
- Single CTE query: asset ownership %, finding evidence %, median
  last-seen age, dedup rate
- DataQualityScorecard type + service + handler + route
- Mock updated for test compatibility
- Background controller runs every 15 min
- Marks overdue findings as sla_status='breached'
- Marks approaching-deadline findings as 'warning' (within 3 days)
- Migration 000143: partial index for efficient overdue queries
- Migration 000144: finding_verification_checklists table
- Domain entity: IsComplete(), IncompleteItems(), UpdateItem()
- Required items: exposure_cleared, evidence_attached, register_updated
- Optional items: monitoring_added, regression_scheduled (NULL = N/A)
- Gates finding verified/closed transition
- Migration 000145: risk_snapshots table (daily per-tenant)
- RiskSnapshotController: runs every 6h, computes risk/findings/SLA/MTTR/priority
- GET /api/v1/dashboard/risk-trend?days=90 endpoint
- RiskTrendPoint type with risk_score, findings_open, SLA compliance, P0-P3 counts
- Migration 000146: compensating_controls, control_assets, control_findings
- Domain entity: IsEffective(), RecordTest(), reduction_factor 0.0-1.0
- Control types: segmentation, identity, runtime, detection, other
- Auto-deactivate on failed test
- Migration 000147: attacker_profiles table with 4 types
- Domain entity: capabilities JSONB, assumptions, is_default
- Seed 4 default profiles per tenant: external_unauth, external_stolen_creds,
  malicious_insider, supplier_compromise
- Migration 000148: ctem_cycles, scope_snapshots, cycle_metrics,
  cycle_attacker_profiles tables
- Domain entity: state machine (planning → active → review → closed)
- Charter JSONB (business_priorities, risk_appetite, objectives)
- Scope snapshot at activation, metrics at close
- Links to attacker profiles for threat model
- Architecture doc: 5-stage CTEM workflow, feature map, database schema overview,
  background controllers, outcome metrics explanation
- RFC-005 marked completed (all 7 gaps implemented)
…y, missing index

- SLA escalation: add per-tenant logging via RETURNING tenant_id
- Risk snapshot: replace 12 correlated subqueries with 4 CTEs + LEFT JOIN
  (O(1) table scans regardless of tenant count)
- Migration 000149: add tenant_id index on priority_class_audit_log
…ager

- Register SLAEscalationController (15 min interval)
- Register RiskSnapshotController (6 hour interval)
- Both controllers now start automatically with the application
- Create EPSS/KEV adapters wrapping existing threat intel repos
- Create PriorityRuleRepository (list active, create, delete)
- Create PriorityAuditRepository (log changes)
- Register repos in Repositories struct
- Register PriorityClassificationService in Services
- Interface compliance verified at compile time
…iles, CTEM cycles

- CompensatingControlHandler: List, Get, Create, Update, Delete, RecordTest, LinkAssets/Findings
  → /api/v1/compensating-controls/*
- AttackerProfileHandler: List, Get, Create, Update, Delete
  → /api/v1/attacker-profiles/*
- CTEMCycleHandler: List, Get, Create, Update, Activate, StartReview, Close, GetScope, LinkProfile
  → /api/v1/ctem-cycles/*
- All handlers use direct SQL with tenant_id isolation
- Routes registered with permission middleware
…lers

- Add PriorityClassifier interface to FindingProcessor
- Auto-enrich + classify findings on ingest (Step 4c in ProcessBatch)
- Fetch assets for classification context (batch)
- Persist enriched fields after classification
- Wire PriorityClassificationService into ingest via SetPriorityClassifier
- Instantiate CompensatingControl, AttackerProfile, CTEMCycle handlers
- All routes now fully functional end-to-end
- CompensatingControl LinkAssets: validate asset belongs to tenant before INSERT
- CompensatingControl LinkFindings: validate finding belongs to tenant before INSERT
- CTEMCycle LinkProfile: validate profile belongs to tenant before INSERT
- Ingest enrichment: clarify dedup map comment (1-5 unique assets per batch)
0xmanhnv added 28 commits April 21, 2026 02:28
compliance_service + simulation_service → internal/app/compliance/.
pentest_service stays in package app because it depends on
NotificationService + FindingActivityService (not yet extracted).
parseOptionalDate helper duplicated into compliance/ so package is
self-contained. pkg/domain/compliance aliased to compliancedom.

Build + test pass.
3 files (module_service, dashboard_service, report_schedule_service)
→ internal/app/module/. pkg/domain/module aliased to moduledom,
pkg/domain/reportschedule aliased to reportscheduledom. Shim re-exports
3 services + 26 value types.

Build + test pass.
- activity/: FindingActivityService (self-contained, no cross-svc deps)
- aitriage/: AITriageService + TriageOutputValidator + PromptSanitizer
  (imports newly-extracted audit + activity packages; pkg/domain/aitriage
  aliased to aitriagedom)

Shims at finding_activity_service.go + ai_triage_service.go re-export
all public symbols. Pentest, vulnerability_service, workflow_action_handlers
(which reference aiTriageService + activityService) resolve via shim
transparently — zero caller change.

Build + test pass.
3 files (scan_scheduler, scansession_service, scanprofile_service)
moved into existing internal/app/scan/. scheduler.go's self-import
(scansvc alias pointing to same package) removed — was a relic from
the earlier monolithic service.go split. ScansScheduled metric
reference repointed to internal/metrics directly to avoid pulling
in package app.

scanner_template_service stays in package app because template/
already imports scan/ (for TemplateSyncer interface) — moving
template_service into scan/ would create a cycle. Tracked for
follow-up after the template→scan adapter is refactored.

Build + test pass.
6 files → internal/app/integration/:
- integration_service (4 SCM/notification/OAuth orchestration)
- notification_service
- webhook_service
- attachment_service
- credential_import_service
- secretstore_service

pkg/domain/{integration,notification,webhook,attachment} aliased to
*dom to disambiguate. secretstore's AuditService refs repointed to
auditapp package. Shim re-exports 52 symbols (6 services + 46 input/
output types) — all existing app.XService callers compile unchanged.

Build + test pass.
10 files → internal/app/asset/:
- asset_service, asset_group_service, asset_import_service,
  asset_relationship_service, asset_type_service, branch_service,
  business_unit_service, component_service, sbom_import_service,
  relationship_suggestion_service

pkg/domain/{asset,assetgroup,assettype,branch,businessunit,component,
relationship} each aliased to *dom. Shim re-exports 11 services +
39 value types + 3 batch-status constants.

Build + test pass.
13 files → internal/app/finding/:
- vulnerability_service, finding_actions_service, finding_comment_service,
  finding_import_service, finding_lifecycle_scheduler, finding_source_service,
  finding_source_cache_service, priority_classification_service,
  priority_flood_guard + test, priority_change_publisher_test,
  bulk_action_guard + test

Cross-package deps repointed: activity.FindingActivityService +
activity.RecordActivityInput, integration.NotificationService,
aitriage.AITriageService. FindingsExpired metric reference switched to
internal/metrics package directly.

Shim re-exports 14 services + 43 value types + 4 sentinel errors.

Build + test pass.
5 files → internal/app/workflow/:
- workflow_service, workflow_action_handlers, workflow_event_dispatcher,
  workflow_executor, workflow_handlers

Cross-package deps repointed: finding.VulnerabilityService +
finding.FindingSourceCacheService + finding.UpdateFindingStatusInput,
integration.IntegrationService + NotificationService +
SendNotificationInput, aitriage.AITriageService + TriageRequest,
auditapp.AuditService/Context/Event + Success/Failure constructors.

Shim re-exports 43 symbols (5 services + option setters + input types).

Build + test pass.
pentest_service had been left in package app during the initial
compliance extract because it depended on NotificationService and
FindingActivityService which were not yet extracted. Both are now
in internal/app/integration + internal/app/activity, so pentest
can join its sibling compliance + simulation services in
internal/app/compliance/.

parseOptionalDate helper unified (was previously duplicated in
compliance/service.go during the split); NewSuccessEvent /
NewDeniedEvent / AuditService refs repointed to auditapp.

Shim re-exports PentestService + 12 pentest input types +
WithCachedCampaignRole context helper.

Build + test pass.
8 files → internal/app/accesscontrol/:
- permission_service, permission_cache_service, permission_version_service
- role_service, group_service, group_sync_service (+test)
- membership_cache_service, rule_service

pkg/domain/{accesscontrol,permission,role,group,rule,permissionset} aliased.
Audit + notification deps repointed to extracted packages. Field-name
collision 'AuditContext AuditContext' pattern preserved via regex fix.

Shim re-exports 11 services + 37 input types + 11 option setters.

Build + test pass.
auth/ (8 files):
- auth_service, sso_service, oauth_service, session_service,
  email_service, ws_ticket_service, tenant_smtp_resolver,
  tenant_storage_resolver
- pkg/domain/{session,identityprovider,tenant,user} + pkg/email
  aliased; accesscontrol.Role/PermissionCache/Version deps repointed.

tenant/ (3 files):
- tenant_service, user_service, tenant_membership_adapter
- Depends on extracted auth.SessionService (authapp alias) and
  accesscontrol packages.

Shims re-export 23 auth services+errors + 18 tenant services/inputs.

Build + test pass. All 14 planned clusters now extracted except the
cross-cutting leftovers (adapters.go, metrics.go, security_validator,
scoring_config_provider, validation_coverage) which stay in package
app as thin bridges/helpers.
…ility + rename shims

CI was failing because commits 6b79f84 (accesscontrol) and 516bd0b
(auth + tenant) pushed source where the inner files still had bare
references to AuditService / AuditContext / NewSuccessEvent /
PermissionCacheService / SessionService etc. — types that only exist
in package app. The in-cluster files needed those refs repointed to
auditapp.X / accesscontrol.X / authapp.X but the corrections lived in
my working tree and never made it into the commit.

This commit carries:

- auth/tenant/accesscontrol in-cluster files: all bare audit +
  accesscontrol + session refs now qualified with their sibling
  package import (auditapp, accesscontrol, authapp). Field-name
  collision 'AuditContext *AuditContext' preserved.

- capability/: 13th subpackage extracted. capability_service (536
  lines) → internal/app/capability/service.go. Shim at
  internal/app/capability_service.go re-exports 7 symbols.

- ws_ticket_service_test moved into auth/ where its impl lives.

- 5 shim files renamed for consistency with target cluster name:
    permission_service.go      → accesscontrol_service.go
    vulnerability_service.go   → finding_service.go
    finding_activity_service.go → activity_service.go
    ai_triage_service.go       → aitriage_service.go
    scan_scheduler.go          → scan_service.go
  (git detects as rename so history follows via git log --follow.)

Build + vet + full test suite pass with GOWORK=off.
…lusters

45 files renamed inside internal/app/<cluster>/ to drop the redundant
_service suffix (folder name already says it's a service package).
Follows Effective Go convention — file name describes WHAT (concept),
not the layer. Example: auth/auth_service.go → auth/service.go,
accesscontrol/permission_service.go → accesscontrol/permission.go.

Struct names keep the *Service suffix (AuthService, AssetService) —
only the filenames change.

Also moved 2 cross-cutting helpers out of package app into their
rightful clusters:
  - scoring_config_provider.go → asset/ (it implements
    asset.ScoringConfigProvider, belongs with asset domain)
  - validation_coverage.go (+ 2 tests) → validation/ (CTEM coverage
    SLO checker — belongs with the validation domain that already
    owns executor/evidence/proof-of-fix)

contains() helper in coverage_enforce_test.go renamed to
containsStr to avoid collision with existing
proof_of_fix.go:contains([]ExecutorKind, ExecutorKind).

Shim aliases preserved in root so every external app.X caller still
compiles. New shim for ValidationCoverage + CoverageThresholds +
Enforce + DefaultThresholds + ErrCoverageBelowSLO. asset_service.go
shim gained TenantScoringConfigProvider + MapTenantToAssetScoringConfig.

CLAUDE.md updated with two new conventions:
  8) File naming inside internal/app/<cluster>/ (drop _service suffix)
  9) Layer-mirrored package naming (pkg/domain/<X> ↔ internal/app/<X>
     with 'dom' alias when both imported in one file)

Build + vet + full test suite pass.
Problem 1 — missing from catalogue.
The sidebar had ~20 shipped features (attack_surface, scope_config,
ctem_cycles, attacker_profiles, compensating_controls, workflows,
ctem_maturity, priority_rules, business_services, business_impact,
risk_analysis, risk_scoring, executive_summary, mitre_coverage,
sbom_export, scanner_templates, template_sources, scan_pipelines,
remediation_tasks, relationships) that were never registered in the
modules table. Tenant admins could not enable/disable them because
the Settings > Modules screen iterates the modules rows.

Problem 2 — no dependency graph.
'ai_triage requires findings' lived only in an engineer's head. A
tenant could disable findings with ai_triage still on and end up
with a triage UI that had nothing to triage.

Fix 1 — migration 000161_modules_catalog_sync:
  back-fills the 20 missing rows. ON CONFLICT DO NOTHING so operators
  who already customised a name/description aren't overwritten.

Fix 2 — pkg/domain/module/dependency.go:
  platform-wide static graph, same pattern as the existing
  ModulePermissionMapping + RolePermissions — spec-not-data, lives in
  code so CI catches typos and cycles. Two edge kinds:
    DependencyHard — tenant CANNOT disable target while dependent
                     remains enabled (blocker).
    DependencySoft — tenant CAN disable, UI surfaces a warning
                     because dependent will degrade.
  Plus helpers CanDisable, RequiredToEnable, TransitiveDependencies,
  DetectCycle (CI acyclicity gate).

Fix 3 — ModuleService.UpdateTenantModules:
  builds the post-update enabled-set BEFORE applying, runs CanDisable
  / RequiredToEnable against it, rejects 400 with a human-readable
  list of dependent names. Deps that reference modules absent from
  the registry are skipped (cross-edition tolerance).

Fix 4 — GET /api/v1/tenants/{tenant}/settings/modules/graph:
  returns the static dependency graph as flat edge list for UI.

Fix 5 — tests pkg/domain/module/dependency_test.go:
  10 cases including DetectCycle (CI gate), ReferencedModulesExist
  (catches typo'd ids), hard-vs-soft blocker semantics, transitive
  walk that skips soft edges.

Fix 6 — constants in pkg/domain/module/module.go extended with the
  20 new IDs so typed consts are available everywhere.

Build + vet + full test suite pass.
53-file gofmt sweep after the cluster refactor (commits f56f34159bbffa). Pure whitespace normalisation — column alignment inside
the alias shims + goimports ordering inside service.go files.

Caught by 'gofmt -l internal/app/ pkg/domain/module/ pkg/httpsec/
pkg/jwt/ pkg/crypto/' during the full-tree regression audit. The
older files-in-package-app followed stdlib alignment (tabs); after
split the aliased fields got sub-tab alignment which gofmt prefers
— converged here.

No behaviour change. Build + vet + full test suite pass.
…k, data repair

PM/TechLead review — the original module-toggle flow returned errors
as plain strings, so the UI could not render a dependency-aware
confirmation dialog. Admins saw a red banner with a comma-separated
list of names and had no handle to cascade-disable. Soft warnings
were silently dropped by the service — a tenant disabling threat_intel
while priority_rules stayed on had no idea priority_rules would
degrade.

Security review — validate+upsert was two separate DB calls with no
lock between. Two admins flipping related modules concurrently could
interleave to produce findings=off + ai_triage=on, violating the
invariant the static graph is supposed to enforce (TOCTOU).

UX review — the only writable endpoint was PATCH. There was no way
to ask 'what would happen if I disabled X?' before committing.
Dry-run was implicit inside the PATCH error path.

DB review — GetTenantModuleConfig called getTenantDisabledModules
once; UpdateTenantModules wrote the upsert, then called
buildTenantModuleConfig which called getTenantDisabledModules AGAIN.
Duplicate query per toggle request.

Fix summary:

1. New structured *module.ToggleError carrying module_id, name,
   action, blockers[], required[] as data. Handler serialises via
   writeToggleErrorJSON — 400 with JSON body the UI can parse.

2. New POST /settings/modules/validate — dry-run, returns
   ValidationIssues { blockers, warnings, required } WITHOUT
   persisting. Wired to module.Service.ValidateToggle.

3. Soft warnings now surface in the PATCH success path via
   TenantModuleConfigOutput.Warnings. UI toasts 'FYI: X will degrade'.

4. Per-tenant sync.Mutex map serialises concurrent
   UpdateTenantModules on the same tenant. Different tenants don't
   contend. Single-replica safe; multi-replica will need a
   cross-process lock — documented inline.

5. buildTenantModuleConfigFromMaps accepts the pre-loaded disabled
   map, skipping the duplicate getTenantDisabledModules query in
   the update path.

6. Migration 000162 repair-orphan-tenant-modules: walks the hard-
   edge transitive closure and disables every dependent of an
   already-disabled module. Uses a recursive CTE so cascade depth
   is not bounded. Skipped for core modules. Rerun-safe.

7. Tests — 5 new cases: blocker ToggleError shape, missing-required
   ToggleError shape, soft warning surfaces in response, validate
   dry-run does not mutate, core disable rejected before dep check.

Full regression green — GOWORK=off go build + go vet + go test
./... all pass.
- Add 9 curated presets (Minimal, Asset Inventory, VM Essentials, ASM,
  Offensive Security, SBOM & Supply Chain, CSPM, Compliance, CTEM Full)
  in pkg/domain/module/presets.go with target persona, key outcomes,
  and module bundles. Apply via POST .../modules/presets/{id}/apply.
- Mandatory tier (agents, integrations, integrations.notifications,
  notification_settings, groups, api_keys) auto-included in every
  preset and rejected on disable — without `agents` no data ingestion
  is possible, treating it as feature-optional was a DoS vector.
- Sub-module inheritance: enabling a parent (assets, ai_triage,
  integrations) implicitly enables its sub-modules so presets don't
  have to enumerate 24 assets.* types.
- Implicit sub-module → parent cascade in CanDisable so disabling
  assets blocks while assets.* are still on.
- Dependency graph additions: branches→components(soft),
  business_impact→findings(hard), workflows→findings(soft, was hard),
  sla→findings, policies→findings, compliance→findings, iocs→threat_intel,
  scan_profiles→scans.
- Module version counter (mod_ver:{tid}) in Redis powers ETag headers
  on /me/modules and /settings/modules — 304 Not Modified when client
  cache matches. Bumped via notifyModuleChange on every toggle/apply/reset.
- WS broadcaster fans out module.updated events on tenant:{id} channel
  for cross-tab realtime invalidation.
- iocs added to ModulePermissionMapping (was referenced in dep graph
  but missing from catalogue, breaking TestReferencedModulesExist).
- ETag includes tenant prefix so a misconfigured proxy/CDN can't serve
  tenant A's response to tenant B.
User-supplied BaseURL on GitHub/GitLab/Bitbucket/Azure DevOps
integrations was passed directly to a plain http.Client without
validation. An attacker creating an integration with
BaseURL=http://169.254.169.254/latest/meta-data could exfiltrate
AWS IMDSv1 credentials via TestConnection().

Each client now:
- Calls httpsec.ValidateURL(config.BaseURL) — rejects loopback,
  private IPs, link-local, cloud metadata endpoints.
- Uses httpsec.SafeHTTPClient(timeout) whose dialer enforces the
  same blocklist (defense-in-depth in case future code paths bypass
  the URL validation).

Audit finding: CRITICAL.
Findings from the multi-role security audit, low-risk patches batched
together. Each is small and orthogonal to the others.

- asset_repository.GetByProperty: parameterise the JSONB key
  (a.properties ->> $3) instead of string-concatenating the whitelisted
  key into the SQL. Whitelist still constrains the value, but removing
  the concat pattern entirely means a future "let users add custom
  correlation keys" feature can't accidentally regress to injectable.

- webhook_hmac middleware: replay protection. Senders MUST now include
  X-OpenCTEM-Timestamp (unix seconds) and the HMAC is computed over
  "<timestamp>.<body>" (Stripe convention). Requests outside ±5 min are
  rejected with 401, defeating captured-signature replay attacks. Two
  new tests: stale timestamp + missing timestamp.

- permission_sync middleware: fail-CLOSED on stale permissions for
  unsafe HTTP methods (POST/PUT/PATCH/DELETE) — return 409 instead of
  just setting a header that non-browser clients can ignore. Read
  traffic continues through the cache fetch which uses fresh perms.

- crypto.IsNoOp helper + integration.CreateIntegration warning when
  persisting credentials with the NoOp encryptor active. Boot-time
  check (initEncryptor) already requires APP_ALLOW_PLAINTEXT_CREDENTIALS
  to permit NoOp; this adds visibility at each credential write so the
  plaintext storage can't be silent.

- config.EncryptionConfig.AllowPlaintext (env: APP_ALLOW_PLAINTEXT_
  CREDENTIALS) — explicit opt-in for non-prod plaintext credential
  storage. initEncryptor refuses to boot in any env if the key is
  unset and this flag is also unset (production already refused).

- bootstrap_handler.GetTenantModules ETag: include tenant_id prefix so
  a misconfigured proxy can't serve tenant A's response to tenant B.

- migration 000162 recursive CTE: add depth limit (50) to defend
  against a corrupted edge graph causing infinite recursion + DB
  connection pool exhaustion.

- services.go: wire the per-tenant ModuleVersionService (Redis-backed
  counter for ETag generation and future Redis payload cache).
Invitation acceptance was creating the membership with only
membership.Role (owner/admin/member/viewer) and silently dropping
the RBAC role_ids attached to the invitation. The inviter chose
specific scoped roles, the invitee never received them — a
privilege-escalation INVERSE: legitimate roles never granted.

Both acceptance paths now:
- Iterate invitation.RoleIDs() after AcceptInvitationTx commits.
- Call roleService.AssignRole for each. Best-effort: failure on
  one role logs but doesn't roll back — the user is already a
  member, missing roles can be re-granted manually.
- Audit metadata records role_ids_count for traceability.

TenantService gains SetRoleService (used by services.go bootstrap
where the tenant service is constructed before the role service).

Audit finding: CRITICAL.
CodeQL flagged hashAgentAPIKey as using SHA-256 directly on
sensitive data. The API keys themselves are 32 random bytes from
crypto/rand (256 bits of entropy — uncrackable even with plain
SHA-256), so the warning is conservative, but switching to a
peppered HMAC closes a real defense-in-depth gap: an attacker who
exfiltrates the database but NOT the application config can no
longer pre-compute candidate hashes for offline brute force.

Changes:
- Replace plain sha256.Sum256 with crypto.HashTokenPeppered which
  HMAC-SHA256s the token under a server-side pepper. Empty pepper
  collapses back to plain SHA-256 for compatibility.
- AgentService.SetPepper wires the pepper from the platform
  encryption key (cfg.Encryption.Key) at boot.
- AuthenticateByAPIKey: peppered hash lookup first, fall back to
  legacy plain SHA-256 lookup so rows written before the pepper
  was deployed continue to authenticate. Same pattern used by
  internal/app/apikey/service.go (F-9).
- generateAgentAPIKey + hashAgentAPIKey become methods on the
  service so they have access to s.pepper.

No backward-compat break: the legacy fallback runs only when the
peppered lookup misses, and only when a pepper is actually
configured.
…ain verify

Closes the primitive-adoption gaps identified by the 2026-04 security
audit. Each fix has: (1) code, (2) tests, (3) Prometheus metric where
applicable. A root-level security-lint CI gate prevents regression.

Outbound HTTP (audit C1/C2):
- Swap bare &http.Client{} for httpsec.SafeHTTPClient at 12 sites:
  llm/{claude,gemini,openai}, notifier/{slack,teams,telegram,webhook},
  jira/client, auth/{oauth,sso}, threat/{intel_service,intel_refresher},
  keycloak/validator. Each site has a comment spelling out the SSRF
  threat the dialer blocks.

CSRF (audit C7):
- Mount CSRFOptional in buildTokenTenantMiddlewares so every JWT-cookie
  state-changing route validates the double-submit-cookie. API-key and
  webhook routes are naturally excluded (different middleware chain).
- Emit openctem_security_csrf_rejections_total{reason,method} on every
  reject branch in both CSRF + CSRFOptional.
- 24-case integration test (csrf_integration_test.go) covers strict +
  optional modes, all 4 state-changing methods, token round-trip, and
  the HttpOnly=false invariant.

Email verification (audit C8):
- auth/sso.go parseOktaUserInfo + parseGoogleUserInfo reject responses
  with email_verified=false. EntraID path documents its Graph-API-
  directory-verified trust boundary instead.

Startup sentinel (audit C11):
- config.go refuses to boot non-development environments when the JWT
  secret or encryption key matches the docker-compose dev default.
  7-case unit test asserts both the allow-list (dev accepts) and the
  reject-list (prod + staging reject with a clear error).

AI-triage validator + budget (audit C10, RFC-008):
- aitriage/validation.go stops silently coercing invalid severity to
  "medium". It records a ValidationWarning so the service can flag
  the result for human review instead of auto-applying.
- TriageResult carries ValidationWarnings; NeedsReview() convenience
  wrapper. logTriageNeedsReview + ActionAITriageNeedsReview surface
  a distinct audit action.
- RFC-008 Phase 1 scaffold: ai_triage_budgets table (migration
  000163), BudgetService (Check/Record/Status with strict + non-
  strict repo-failure modes), Postgres repository with atomic UPDATE
  RETURNING for multi-pod correctness, wiring in services.go behind
  AI_TRIAGE_BUDGET_ENABLED feature flag (defaults off — zero
  behaviour change on this deploy). 16-case unit test pins the
  budget contract.

Audit-chain verify (audit, RFC-007 prelude):
- New controller audit_chain_verify walks every active tenant's
  audit_log_chain hourly. Logs an ERROR-level entry with keyword
  audit_chain_break and increments
  openctem_security_audit_chain_breaks_total per break so a SIEM
  can page without polling the admin verify endpoint. 7-case unit
  test covers happy path, mid-tenant error, cancelled context,
  and break-detected-but-still-processed.

Rate limit (audit P6-5):
- Wire TelemetryRateLimiter onto /api/v1/agent/telemetry-events.
  Closes the "defined but not mounted" regression class.

Metrics package:
- Add openctem_security_* series in internal/metrics/security_defenses.go:
  csrf_rejections_total, audit_chain_breaks_total,
  audit_chain_verify_runs_total, ai_triage_needs_review_total,
  ai_triage_budget_used_tokens, ai_triage_budget_exhausted_total.
- sso.go parseOktaUserInfo / parseGoogleUserInfo: start the error
  string with a lowercase word (staticcheck ST1005). The previous
  wording led with the brand name ("Okta provider ...", "Google
  Workspace provider ...") which reads fine but trips ST1005's
  "no leading capital" rule. Rephrased to start with the lowercase
  brand + keep the audit trail clear.

- webhook_hmac.go: remove the unreferenced computeHMAC helper.
  It was a left-over from the pre-timestamp signing path and is
  no longer called by production code or tests. The timestamp
  variant computeHMACWithTimestamp is the single source of truth.
  Removing rather than nolint-suppressing because we'd rather not
  carry unused code.
CodeQL flagged 2 critical email-content-injection alerts on this PR.
They pre-date our audit sprint (the PR just got large enough to cross
CodeQL's baseline-refresh threshold), but both are genuine so we fix
them here rather than defer.

pkg/email/email.go:
- New exported SanitizeHeaderValue(v): replaces CR and LF with a
  single space. Applied to every value that ends up on an SMTP
  header line in buildMessage: FromName, From, each recipient in
  To, Subject, ReplyTo, plus custom Headers key + value. Without
  this, an attacker-controlled Subject like
    "Title\r\nBcc: attacker@evil.com"
  smuggles an extra header past the SMTP boundary
  (CWE-93 / CRLF injection, CodeQL go/email-content-injection).
- 5-case unit test pins the contract: CR, LF, CRLF, benign content
  preserved, replacement-with-space semantics.

pkg/email/email.go LoggingSender:
- Also ran the 4 medium go/log-injection alerts on this file by
  sanitising the "to" address and subject before passing them to
  the structured logger. Keeps a malicious address like
    "victim@example.com\nFAKE-LOG: admin login succeeded"
  from forging log lines on loggers that don't JSON-escape values.

internal/infra/notifier/email.go:
- Same treatment for the notifier SMTP path: Title → subject (also
  when prefixed with severity emoji), FromName, FromEmail, every
  ToEmails entry, ReplyTo. Imports emailpkg for the sanitiser
  instead of duplicating the 4 lines locally.

internal/infra/postgres/ai_triage_budget_repository.go:
- Defensive cleanup (not CodeQL-flagged): UpdateLastWarnSent and
  UpdateLastBlockSent now carry their own static UPDATE strings
  rather than sharing a column-name-interpolated Sprintf helper.
  Two literal SQL strings, zero runtime string formatting in SQL
  land, no tainted-flow surface. The former helper was safe via a
  whitelist map, but a future refactor that drops the whitelist
  would regress silently; explicit SQL avoids that class entirely.
… doc dismissals

Follow-up to fix(security): email-header CRLF injection (4ceb3de). Closes
the remaining CodeQL alerts that landed on this PR:

SQL-query-from-user-input (3 high):
- apikey_repository.go ListAPIKeys: replace map-allowlist +
  concatenation with explicit switch/case that assigns LITERAL
  column-names on every branch. Map-based allowlists are safe but
  CodeQL's taint tracker can't trace through them.
- template_source_repository.go List: same switch/case refactor for
  sortBy. Also binds LIMIT/OFFSET via $N parameters instead of
  fmt.Sprintf %d — kind-checked by Go's typechecker already, but the
  binding keeps CodeQL's flow-analysis fully clean.
- asset_repository.go countByFields: parameterise the JSONB key in
  the UNION ALL clauses. The regex whitelist is retained as an early
  reject but no longer feeds into the query string directly.

Slice-allocation with user-influenced size (2 high):
- finding_approval_repository.go: const maxApprovalsCap = 100;
  clamp page.Limit() into a separate `capacity` var before make().
- tenant_repository.go SearchMembersWithUserInfo: const
  maxMembersCap = 1000; same pattern. Defence-in-depth cap even
  though the HTTP handler already paginates.

Log-entry-from-user-input (1 medium):
- ingest/service.go Ingest: wrap report.Metadata.ID +
  SourceType through a new sanitizeIngestLogField (log_sanitize.go)
  that strips CR/LF and caps length at 512 chars. A compromised
  agent can no longer forge log lines via CTIS metadata.

Useless-assignment (8 warnings):
- Remove or annotate the trailing `argIndex++` in 8 repository
  query-builders (rule_source, rule, rule_bundle, finding,
  audit, agent, admin×2). Each was the last condition in its chain,
  so the post-increment had no reader.

Wrapper-level false-positives (4 dismissals, documented in code):
- pkg/crypto/hash.go HashToken + HashTokenBytes: full doc-comment
  explaining why SHA-256 is correct for 256-bit random tokens (not
  passwords), paired with `#nosec G401` and a pointer to dismiss the
  CodeQL go/weak-cryptographic-algorithm alert as a false positive.
- middleware/{admin_audit,metrics,timeout}.go Write wrappers: add
  doc-comments explaining each is a transparent passthrough;
  request-layer escaping is the handler's job, not the wrapper's.
  Dismiss CodeQL go/reflected-xss on these lines.

Tests continue to pass (ingest, postgres, middleware, crypto, email).
Splits the SSRF blocklist into two tiers so on-prem CTEM deployments
can scan their own RFC1918 / ULA space without losing the cloud-
metadata / loopback / CGNAT guard.

Design:

  hardBlockedIPRanges — ALWAYS blocked, no env-var opens these:
    127.0.0.0/8        loopback on the API host
    169.254.0.0/16     link-local incl. AWS/GCP/Azure IMDS
    100.64.0.0/10      carrier-grade NAT
    0.0.0.0/8          "this" network
    224.0.0.0/4        multicast
    240.0.0.0/4        reserved
    255.255.255.255/32 broadcast
    ::1/128            IPv6 loopback
    fe80::/10          IPv6 link-local

  privateIPRanges — blocked BY DEFAULT, opt-in via
  OPENCTEM_HTTPSEC_ALLOW_PRIVATE=1 :
    10.0.0.0/8         RFC1918 class A
    172.16.0.0/12      RFC1918 class B
    192.168.0.0/16     RFC1918 class C
    fc00::/7           IPv6 ULA

Critical invariant (tested): even with OPENCTEM_HTTPSEC_ALLOW_PRIVATE=1
an attacker cannot reach IMDS or loopback. A compromised admin who
flips the env var still cannot weaponise the dialer to leak cloud
credentials.

Why env var, not config file: ops set this per-deployment (Helm
values / Docker env) — a cloud-hosted API pod keeps the cloud-safe
default; an internal API pod in a corporate DC opens up. Per-tenant
policy can be layered later if needed.

AllowPrivate() exposed so main.go can log the posture at boot.

Test coverage: new cases pin (a) RFC1918 reachable with opt-in,
(b) IMDS / loopback / CGNAT stay refused with opt-in, (c) alias
hostnames (localhost, metadata.google.internal) stay refused in both
modes, (d) default mode still blocks RFC1918.

Same change lands simultaneously in sdk-go/pkg/httpsec (for SDK
consumers) and agent/internal/executor/target_security.go (for
scanner target validation). The three copies stay drift-free via
scripts/security-lint.sh Rule 6.
…ed flow

CodeQL's go/unsafe-slice-allocation re-flagged these after the
clamp-with-separate-var fix: the tool keeps the "user-influenced"
taint on any value that originated from an HTTP parameter, even
when the value has been bounds-checked afterwards. The only path
to a clean flow is to hand make() the literal constant directly.

finding_approval_repository.go ListPending:
  approvals := make([]T, 0, maxApprovalsCap)  // const 100
tenant_repository.go SearchMembersWithUserInfo:
  members   := make([]T, 0, maxMembersCap)     // const 1000

Memory cost: 100 × 8 B + 1000 × 8 B = ~8.8 KB per request when the
page returns zero rows. Acceptable; any growth past the const is
still handled by Go's normal slice-grow path.

Semantics unchanged — make's capacity is an allocation hint, not a
length. Queries returning fewer rows still produce the right result.
…aturity

Feat/ctem prioritization and maturity
@@ -135,6 +135,10 @@
mrw.ResponseWriter.WriteHeader(code)
}

// Write implements http.ResponseWriter. Same CodeQL
// go/reflected-xss false-positive note as auditResponseWriter.Write:
@0xmanhnv 0xmanhnv merged commit 2f42f23 into main Apr 23, 2026
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants