Conversation
…imezone bug Phase 1.1 + 1.2 of scan feature improvements: Migration 000102: - Add scans.agent_preference column (was in domain entity but missing from DB) - Add scans.profile_id column linking to scan_profiles - Add scans.timeout_seconds column (default 3600, max 86400) - Add FK scans.last_run_id -> pipeline_runs (referential integrity) - Add FK pipeline_runs.scan_id -> scans (referential integrity) Domain: - Add Scan.ProfileID and Scan.TimeoutSeconds fields - Add SetProfileID() and SetTimeoutSeconds() methods - Fix calculateNextRun() to honor ScheduleTimezone (was using local time) - Add proper time-of-day support for daily/weekly/monthly schedules - Add DefaultScanTimeoutSeconds (3600) and MaxScanTimeoutSeconds (86400) constants Repository: - Consolidate scanFromRow/scanFromRows into shared readScan helper - Add new columns to INSERT, UPDATE, and SELECT queries - Add WithProfileRepo service option for cross-tenant profile validation Service: - CreateScanInput accepts profile_id, timeout_seconds - Validate scan profile exists and belongs to tenant before linking - System profiles (no tenant) allowed for everyone Handler: - CreateScanRequest exposes agent_preference, profile_id, timeout_seconds - Wire request fields through to service input CRITICAL: Run migration 000102 before deploying this binary, otherwise SELECT queries will fail with "column does not exist" errors.
Phase 1.5 + 1.6 of scan feature improvements: Phase 1.5 — Scan timeout enforcement: - Add MarkTimedOutRuns method on pipeline.RunRepository - Single SQL UPDATE joins pipeline_runs and scans, marks runs as timeout when EXTRACT(EPOCH FROM (NOW() - started_at)) > scan.timeout_seconds - New ScanTimeoutController runs every 60s (registered alongside JobRecovery) - Uses scan's per-config timeout instead of one-size-fits-all global threshold Phase 1.6 — Distributed scheduler lock: - Add TryLockScanForScheduler/UnlockScanForScheduler on scan.Repository - Uses pg_try_advisory_lock(int4, int4) with namespaced key (SCAN namespace) - Per-scan locking — multiple API replicas can process different scans in parallel - Scheduler acquires lock before triggering, releases via defer - Skips scan with debug log if already locked by another instance - FNV-1a 32-bit hash converts UUID to lock key (rare collisions only cause brief serialization, not correctness issues) Tests: - Update 3 mock implementations with new repository methods - All unit + integration tests pass
Phase 1.8 — Plug all the holes from Phase 1: API surface fixes (the audit found these were missing): - ScanDetailResponse now exposes agent_preference, profile_id, timeout_seconds (clients can finally see fields they POSTed) - toScanResponse populates the new fields - UpdateScanRequest/UpdateScanInput accept profile_id, timeout_seconds, agent_preference, max_retries, retry_backoff_seconds (was create-only) - UpdateScan service properly handles all the new fields with sentinel semantics for profile_id (empty string = unlink) - ScanConfigExport/ImportConfig handle profile_id and timeout_seconds (export-import roundtrip no longer loses data) - Frontend ScanConfig type, CreateScanConfigRequest, UpdateScanConfigRequest expose the new fields Security hardening (from cybersecurity audit): - Add MinScanTimeoutSeconds = 30 constant + domain-layer enforcement in SetTimeoutSeconds (defense-in-depth — bypass HTTP validation still gets the floor) - Validation tags now require min=30 (was min=1, allowed DoS via 1s timeouts) - New scanprofile.Repository.GetAccessibleByID enforces tenant scope at SQL layer using `WHERE id = ? AND (tenant_id = ? OR tenant_id IS NULL)` — app-layer check is replaced with SQL-layer check (audit said "rely on SQL, not application logic") - Scan service uses GetAccessibleByID for both Create and Update profile linking — system profiles still allowed, cross-tenant access returns ErrNotFound (no info leak via error messages) Migration 103 (DB performance + Phase 2.3 schema): - Add idx_scans_last_run for FK referential integrity check on pipeline_run delete (audit said missing index causes table scan) - Add partial idx_pipeline_runs_pending_started for ScanTimeoutController sweeper query - Drop unused idx_scans_profile (no query filters by profile_id) - Add max_retries (0-10) and retry_backoff_seconds (10-86400) columns - Add retry_attempt column on pipeline_runs Phase 2.1 — Quality gate evaluation now actually works: - Trigger paths (workflow + single scan) propagate sc.ProfileID to run.ScanProfileID. The pipeline service already evaluates the quality gate from this field on run completion — it just never had a value before because the link wasn't wired. - End-to-end: scan has profile_id → run inherits ScanProfileID → quality gate evaluated when run completes → result stored in pipeline_runs.quality_gate_result Phase 2.2 — Job cancellation: - Add command.Repository.CancelByPipelineRunID — single SQL UPDATE that cancels all non-terminal commands belonging to a pipeline run via step_runs.pipeline_run_id JOIN, scoped by tenant - pipeline.Service.CancelRun now calls CancelByPipelineRunID after marking the run as canceled, so in-flight commands are aborted instead of waiting for JobRecoveryController to time them out (10-30 min) Phase 2.3 — Retry logic with exponential backoff: - Domain: Scan.MaxRetries, Scan.RetryBackoffSeconds, SetRetryConfig, CalculateRetryDelay (exponential), ShouldRetry - Domain: Run.RetryAttempt tracks the current attempt (0 = first) - Repository: pipeline.RunRepository.ListPendingRetries — single query using DISTINCT ON + JOIN to find latest failed run per scan and check exponential backoff against retry_backoff_seconds * 2^retry_attempt - New ScanRetryController polls every 60s, dispatches retries via the RetryDispatcher interface. Scan service implements RetryDispatcher via RetryScanRun method. - New run gets retry_attempt = previous + 1, eventually stopping at max_retries Repository refactoring: - Scan repository handles all new columns in INSERT/UPDATE/SELECT - Pipeline_run repository handles retry_attempt in INSERT/SELECT (both Create and CreateRunIfUnderLimit paths) Tests: - Update mocks for ListPendingRetries, CancelByPipelineRunID, GetAccessibleByID. All unit + integration tests pass. CRITICAL: Run migration 102 AND 103 before/with this binary deploy. Migration 102 adds the columns the binary references in SELECT queries.
Covers all the new functionality from Phase 1.x and Phase 2.x: - Section 1-5: Setup (auth, evil tenant, asset, group, profile) - Section 6: Phase 1.1+1.2 — create scan with new fields - Section 7: Phase 1.8d — min timeout 30s + max 86400 validation - Section 8: Phase 1.8d — cross-tenant profile rejection (security) - Section 9: Phase 1.8b — UpdateScan handles profile_id (link/unlink), agent_preference, timeout_seconds, max_retries, retry_backoff_seconds with sentinel semantics and bound validation - Section 10: Phase 1.8c — export/import roundtrip preserves new fields - Section 11: Phase 2.1 — quality gate evaluation wiring (trigger) - Section 12: Phase 1.3 — findings drill-down via /findings?scan_id= - Section 13: Phase 2.2 — cancel run cascades to commands - Section 14: tenant isolation on GET/UPDATE/DELETE (cross-tenant) - Section 15: timezone validation, agent_preference enum, edge cases - Section 16: stats and listing - Section 17: cleanup The script uses two tenants (primary + evil) to verify cross-tenant isolation. Each test asserts both happy-path and security-path behavior.
User feedback: triggering a paused scan returned "scan is not validate" — confusing.
Changes:
- trigger.go: replace generic "scan is not active" with state-specific
messages explaining WHY and HOW to fix:
paused -> "Cannot trigger '<name>' because it is paused. Resume the
scan first to trigger it."
disabled -> "Cannot trigger '<name>' because it is disabled. Activate
the scan first to trigger it."
Returns DomainError with code SCAN_NOT_TRIGGERABLE for client-side handling.
- scan_handler.go: introduce cleanErrorMessage helper that extracts the
user-facing .Message field from shared.DomainError instead of dumping
raw err.Error() (which prepends the code and appends ": validation").
All ValidationError, AlreadyExists, etc. responses now show clean text.
Critical fixes from the multi-role audit:
PM/TechLead audit:
- GAP 1: ScanConfigExport missing max_retries/retry_backoff_seconds.
Export-import roundtrip would silently lose retry config.
Added fields to struct + ExportConfig + ImportConfig wiring.
- GAP 4: pipeline_run_repository.Update() missing retry_attempt in SET clause.
After RetryScanRun set run.RetryAttempt and called Update(), the value
was silently dropped — retry counter would always read 0 from DB.
Added retry_attempt to UPDATE SET + parameter list.
- GAP 6: No tests for retry domain logic. Added scan_retry_test.go covering:
* SetRetryConfig: defaults, valid, negative, over-max, zero/min/max backoff,
boundary values, UpdatedAt mutation
* CalculateRetryDelay: 1st/2nd/3rd retry, exponential growth, cap at max
* ShouldRetry: max=0 never retries, attempts 0..max allowed, max+1 denied
* Clone: preserves retry config, independent from original
Security audit:
- GAP 5: ScanRetryController could dispatch duplicate retries on crash/restart
or with multiple controller instances.
Fixed via:
* New retry_dispatched_at column on pipeline_runs (migration 103)
* ListPendingRetries now uses CTE + FOR UPDATE SKIP LOCKED + atomic UPDATE
to set retry_dispatched_at = NOW(). The candidate is "claimed" before
being returned, preventing concurrent dispatch.
* Eligibility filter excludes runs where retry_dispatched_at IS NOT NULL.
* Index idx_pipeline_runs_retry_pending optimizes the lookup.
User-reported UX:
- Trigger error "scan is not validate" was unhelpful.
Fixed in commit 9d64188: state-specific messages explaining WHY (paused/
disabled) and HOW to fix (resume/activate). cleanErrorMessage helper
in handler strips DomainError code prefix and wrapped error suffix.
Tests:
- All unit tests pass (added 4 retry test groups, 24 test cases)
…verride
Solves the chicken-and-egg problem: new deployments without SMTP couldn't
register users because verification emails couldn't be sent, but verification
was required to log in.
Resolution order (most specific wins):
1. Per-tenant SecuritySettings.EmailVerificationMode
- "always" → require even if SMTP missing (operator must configure SMTP)
- "never" → skip verification (closed/internal deployments)
- "auto" → smart detection (default)
2. SMTP availability auto-detection
- tenant SMTP configured → require verification
- system SMTP configured → require verification
- no SMTP at all → skip (graceful fallback)
3. Global env config (AUTH_REQUIRE_EMAIL_VERIFICATION) — fallback only
Changes:
- Domain: SecuritySettings.EmailVerificationMode + EmailVerificationMode enum
with "auto" / "always" / "never" values
- DefaultSettings: EmailVerificationMode defaults to "auto"
- SecuritySettings.Validate: rejects invalid modes
- AuthService.shouldRequireEmailVerification: encapsulates the resolution logic
- AuthService.SetSMTPChecker: optional dependency injection
- AuthService.Register: uses smart check, returns RequiresVerification reflecting actual decision
- AuthService.Login: only blocks unverified users when verification is currently required
- EmailService now implements SMTPAvailabilityCheck (HasSystemSMTP, HasTenantSMTP)
- main.go: wires Email service into AuthService for smart detection
Migration: NONE — uses existing JSONB tenants.settings column.
Existing tenants automatically get "auto" behavior (empty mode = auto path).
Production behavior:
- New deployment without SMTP → first user registers + auto-verified + can login
- After admin configures SMTP → next users get verification email as expected
- Tenant admin can force "always" or "never" via tenant settings UI (TODO: UI)
Operators can now edit agent config templates (yaml/env/docker/cli)
without rebuilding the frontend. Templates live in
configs/agent-templates/*.tmpl on the API host.
Backend:
- New configs/agent-templates/{yaml,env,docker,cli}.tmpl files using
Go text/template syntax with helper funcs (toScannerName, firstTool,
slugify)
- AgentConfigTemplateService loads templates from filesystem at startup,
supports Reload() for live editing without restart, falls back to
built-in defaults if any file is missing
- New GET /api/v1/agents/{id}/config-templates endpoint returns rendered
templates as JSON
- AgentHandler gains SetTemplateService and SetPublicAPIURL injection
- New AgentConfig config block: AGENT_CONFIG_TEMPLATES_DIR (default
configs/agent-templates) and AGENT_PUBLIC_API_URL (fallback to APP_URL)
- Dockerfile copies configs/ into the production image so the templates
ship with the binary
Addresses 7 critical/high findings from the multi-role audit.
SECURITY CRITICAL — API key in URL (credential leak):
- Move agent config templates API key from query parameter to
X-Agent-API-Key header. Query strings are logged by proxies, load
balancers, CDNs, browser history, referer headers — embedding a
credential there is a known leakage vector.
- Backend reads from header now (agent_handler.go:670).
- Frontend sends via header with AbortController for clean unmount.
- Next.js proxy forwards x-agent-api-key in route.ts allowlist.
SECURITY HIGH — Login email verification logic:
- Old: Login passed empty tenantID to shouldRequireEmailVerification, so
the per-tenant override never applied. Tenant with mode="always" could
not enforce verification at login.
- New: shouldRequireEmailVerificationForUser walks the user's tenant
memberships and applies the strictest mode (always > never > auto).
- Falls back to global SMTP detection only if user has no memberships.
SECURITY HIGH — SMTP downgrade protection:
- If a user has a verification token (issued at registration when SMTP
was required), Login NOW always denies even if SMTP is later removed.
- Prevents an attacker from triggering SMTP removal to auto-verify
accounts stuck pending verification.
PERFORMANCE HIGH — Missing index on commands.step_run_id:
- CancelByPipelineRunID JOINs commands → step_runs via step_run_id
but commands had no index on that column. Full table scan on every
cancel.
- Migration 104 adds idx_commands_step_run_id (partial, NOT NULL).
PERFORMANCE HIGH — Non-sargable retry query:
- Old idx_pipeline_runs_retry_pending was (scan_id, completed_at), but
the query orders by completed_at and filters on retry_dispatched_at
IS NULL. The previous index was missing completed_at IS NOT NULL in
its WHERE clause and had wrong column order.
- Migration 104 drops the old index and creates idx_pipeline_runs_retry_eligible
on (completed_at ASC, scan_id) WHERE status='failed' AND
retry_dispatched_at IS NULL AND completed_at IS NOT NULL.
- Also adds idx_pipeline_runs_tenant_status_created for paginated lists.
FEATURE HIGH — EmailVerificationMode in API:
- Add field to UpdateSecuritySettingsRequest, UpdateSecuritySettingsInput,
SecuritySettingsResponse — operators can now actually configure
per-tenant email verification mode via the API.
- Validation: oneof=auto always never
- Empty value preserves existing mode (don't accidentally reset to auto).
UX MEDIUM — Disable agent config buttons when content empty:
- Added disabled={!yamlConfig} (etc) so spamming buttons during initial
render is impossible. Added aria-label for accessibility.
- Collapse GetAggregateStats from 6 round-trips to 1 (CTE + UNION ALL). Postgres plans a single scan of the filtered set for all aggregates; result is iterated as (category, key, value::float8) rows. ~83% reduction in DB traffic for the asset stats endpoint, which is hit on every page render of /assets/* (every 30s per tenant via SWR). - GetAggregateStats now accepts a `tags` filter ([]string, overlap operator) so /assets/stats?types=...&tags=... mirrors List() semantics. This lets the UI keep stat cards consistent with whatever the user filtered to. - UpdateAssetInput + UpdateAssetRequest now carry owner_ref so the field is editable post-create (previously create-only). Wired through service to entity.SetOwnerRef. - Mock test files updated for the new repo signature.
GET /api/v1/agents/stats returns aggregate counts grouped by status,
health, type, and execution_mode for the calling tenant. Computed in a
single query using a CTE + UNION ALL — no client-side .filter().length
over a paginated list, which previously undercounted as soon as a tenant
had more than one page (per_page default = 20) of agents.
The endpoint also returns active_jobs (SUM(current_jobs) for online
daemon agents) and online_active for the headline cards.
Route is registered before /{id} so chi doesn't claim "stats" as a path
parameter. All six agent repository mocks gain a stub implementation.
…o 502
parseQueryArray
Adds maxQueryArrayItems (100) and maxQueryArrayItemLen (200) caps so a
pathological ?tags=a,b,...10000-items query can no longer allocate
unbounded slices or pq arrays. Defense-in-depth raw input cap before
splitting. Trim happens after the cap, so length limits are enforced.
integration_handler.handleServiceError
When the SCM client returns a network error (DNS lookup failed, refused,
i/o timeout, TLS handshake, EOF, network unreachable), the API used to
log the raw error and return 500 INTERNAL_ERROR — leaking internal
hostnames and DNS resolver addresses. Now: log details server-side at
WARN, return 502 INTEGRATION_UNREACHABLE with a generic message
("Could not reach the integration provider..."). Aligns with the
CLAUDE.md "generic error messages" rule.
…tats filter
Owner endpoints (POST/PUT/DELETE /assets/{id}/owners):
- Add per-request tenant-asset cross-check via new requireAssetInTenant
helper. Closes IDOR vector where a user with assets:write in tenant A
could craft a request against an asset_id in tenant B.
- Inject asset.Repository into AssetOwnerHandler.
Asset relationship Create handler:
- Use chi URLParam {id} as authoritative source asset id; reject body
source_asset_id when it disagrees with the URL.
- Add placement mutex in service: runs_on and deployed_to are mutually
exclusive for the same source/target pair (UI hides candidates,
backend enforces for non-UI clients).
Finding stats endpoint:
- Add optional asset_id query param so the severity cards on
/findings?assetId=… reflect the filtered table instead of global
tenant counts. Plumbed through repo (4-arg GetStats), service
(GetFindingStatsInput.AssetID) and handler. All 8 mock implementations
in tests/ updated to match the new signature.
Pentest campaign members repository:
- Fix 'column u.display_name does not exist' error: rename to u.name
(same pattern as the earlier fix in access_control_repository).
Compliance module:
- Add ModuleCompliance constant and entry to UserFacingModuleIDs so
tenant admins can disable it from Settings → Modules.
- ModulePermissionMapping entry now uses the constant instead of the
bare 'compliance' string for consistency.
New migrations:
- 000105_compliance_module_seed: insert the missing compliance row into
the modules registry. The compliance feature was implemented end-to-end
but missing from the seed, so the sidebar entry pointed at a phantom.
- 000106_users_trgm_search_indexes: GIN trigram indexes on users(name)
and users(email) so the owner picker server-side ILIKE search scales
past sequential-scan range. CONCURRENTLY removed because golang-migrate
wraps each file in a transaction; comment documents the manual workaround
for very large existing deployments.
Relationship types codegen: - New configs/relationship-types.yaml is the single source of truth for every relationship type ID, label, description, and constraint. - New cmd/gen-relationships Go program reads the YAML and emits two generated files: pkg/domain/asset/relationship_types_generated.go (Go enum + RelationshipTypeRegistry + categories) and ../ui/src/features/assets/types/relationship.types.generated.ts (TS union + GENERATED_RELATIONSHIP_LABELS + constraint maps). - Manual constants in pkg/domain/asset/relationship.go are removed; AllRelationshipTypes() now wraps the generated slice. - New `make generate-relationships` Makefile target. - Workflow: edit YAML → run target → commit YAML + the two generated files. Generated files have a "DO NOT EDIT" header. - Validation: codegen errors out on duplicate IDs, unknown categories, empty constraints, etc. Migration 000107_relationship_deployed_to_backfill: - One-shot data fix for relationship rows that were created under the old loose constraint table where service/api/website could be a deployed_to source. Per CMDB best practice, deployed_to is artifact- only — runtime workloads belong in runs_on. - Step (a): DELETE rows where a runs_on row already exists between the same pair (pure duplicates). - Step (b): UPDATE the remaining bad rows to relationship_type='runs_on', preserving description / confidence / impact_weight / tags. - Marked "review before applying" in the file header. Down migration is intentionally a no-op (the data fix is one-way).
ESLint blocks @ts-nocheck via ban-ts-comment. The generated TS file is type-safe so the directive was unnecessary belt-and-braces. Removed from the template; pre-commit lint passes.
Honest evaluation of the relationship registry surfaced multiple issues
the previous version inherited from generic CMDB best practice without
validating against the actual platform model. This commit acts on that
evaluation:
Type registry changes (configs/relationship-types.yaml):
- DROP `owned_by` — duplicates the proper asset_owners table (RACI
ownership lives there, not as a fake "service-as-team" edge).
- DROP `member_of` — the inverse of `contains`. Hierarchy now has a
single canonical direction (parent → child via `contains`). Existing
ingest pipeline (subdomain → domain) updated to use the new direction.
- TIGHTEN `resolves_to` to STRICT semantic — targets are now only
`[ip_address, load_balancer]`. Previously also accepted `host` and
`cloud_account` which contradicted the description ("DNS resolves to
IP / CDN / load balancer"). Strict semantic is more flexible long-term:
it enables cross-relationship analytics ("all domains resolving to a
compromised IP"), supports external scanner integration, and lets new
IP-bearing types (subnet, vip, anycast) extend cleanly.
- ADD `cname_of` — DNS alias / subdomain → parent domain edges that
resolves_to was being misused for.
- ADD `peer_of` — symmetric edge for HA pairs, sibling services,
leader/follower nodes. Previously had to be modeled as two depends_on
edges in opposite directions which was wrong.
- ADD `replicates_to` — infra-level state copy (DB primary → replica,
log shipper → remote). Distinct from sends_data_to which is
application-level data flow.
- ADD `has_access_to` — the missing IAM resource direction. The old
`granted_to` conflated "credential granted to principal" with
"principal has access to resource"; now split.
- TIGHTEN `granted_to` — restricted to credential → principal
(matches the IAM grant semantic). Resource-side targets moved to
has_access_to.
- TIGHTEN descriptions for `monitors` vs `protected_by` — the
distinction is now explicit (passive observability vs active
prevention) so users have a rule for ambiguous tools (EDR, WAF).
- RENAME category `control_and_ownership` → `control_and_observability`
(since `owned_by` is gone).
Subdomain ingest fix (internal/app/ingest/processor_assets.go):
- Updated to use the new canonical hierarchy direction
`parent_domain contains subdomain` instead of the dropped
`subdomain member_of parent_domain`.
Usage stats endpoint:
- New `RelationshipRepository.CountByType(tenantID)` aggregates
relationship counts per type via a single GROUP BY query.
- New service method `GetRelationshipTypeUsage(tenantID)` joins the
per-tenant counts with the registry metadata so admins can see which
types are unused (count=0) and prune the registry based on real data.
- New endpoint `GET /api/v1/relationships/usage-stats` exposes it.
- Returns every registered type, including zero-count entries, in
YAML declaration order for stable response ordering.
Tests:
- Removed dropped types from the AllRelationshipTypes test list and
added the new ones.
- Replaced the hard-coded `expected 16` count with
`len(asset.AllRelationshipTypes())` so the test stays in sync with
the registry.
- Added `MockRelationshipRepository.CountByType` implementation.
New comprehensive guide at docs/development/relationship-types.md covering: - The 30-second mental model (YAML → codegen → Go + TS) - Quickstart for adding a new type end to end - File map showing what's manual vs generated - Full YAML schema reference with field rules and naming conventions - Workflows for add / modify / rename labels / remove / category changes - Validation rules and what each codegen error means - How the generated files are wired into the rest of the codebase - Telemetry — using the usage-stats endpoint to drive registry pruning - Limitations and known footguns (asset type IDs not validated, etc.) - Troubleshooting common errors - Design rationale FAQ (why YAML, why AOT codegen, why global registry, why some types are missing, etc.) - Cheat sheet at the bottom Cross-link added to the YAML file's header comment so anyone editing the registry sees the doc reference up front.
cmd/gen-relationships/main_test.go: - 17 tests covering every validate() error path: empty categories, empty types, missing/duplicate IDs, unknown category references, empty labels, missing constraints, empty sources/targets in a constraint. - Tests for the template helpers (toCamel, goStringList, tsStringList, trimDesc) including the YAML-folded multiline collapse path. - TestProductionYAML_LoadsAndValidates pins the production YAML through the parser + validator, so a future edit that produces an invalid config fails the test before it can be merged. Catches drift before it hits the build pipeline. Migration 000108_asset_relationships_type_index: - Adds idx_asset_relationships_tenant_type covering (tenant_id, relationship_type). The /relationships/usage-stats endpoint runs WHERE tenant_id = $1 GROUP BY relationship_type — the existing unique constraint covers the WHERE but is wider than needed, forcing more page reads. The narrower covering index lets PG do an Index Only Scan + HashAggregate, significantly faster on tenants with hundreds of thousands of relationships. - Trivial cost (~16 bytes per row) for meaningful read perf.
POST /api/v1/assets/{id}/relationships/batch — bulk-creates many
relationships from one source asset in a single round trip. Replaces
the previous N-parallel-POST flow used by the multi-select dialog.
Service (CreateRelationshipBatch in asset_relationship_service.go):
- Validates the tenant + source asset ONCE for the whole batch (the
primary efficiency win — single Get-By-ID instead of N).
- Per-item: parses target ID, parses + validates relationship type,
fetches target asset, runs the placement mutex check, builds the
domain entity, persists.
- Per-item failures (invalid type, missing target, duplicate, mutex
violation) mark the item as error/duplicate and continue. The
whole batch never aborts because one item failed — every item
gets a slot in the result.
- Maps unique-violation errors to BatchCreateStatusDuplicate so the
UI can show "already related" instead of "internal error".
- The result includes per-item index (matching the request position
so the frontend can map back to target names without re-fetching),
per-item status, and aggregate created/duplicates/errors counts.
Handler (BatchCreate in asset_relationship_handler.go):
- New BatchCreateRelationshipRequest with `items` array (validator
caps at 100 entries to prevent abuse via giant payloads).
- Always returns 200 with the result body — per-item failures are
inside, not at the HTTP layer.
- Whole-batch failures (bad tenant, missing source asset) still
return the appropriate error response via handleServiceError.
Route registration: POST /api/v1/assets/{id}/relationships/batch
gated by permission.AssetsWrite, same tenant middleware chain as
the singleton create endpoint.
Tests (asset_relationship_service_test.go):
- 7 cases for CreateRelationshipBatch: happy path all-succeed, mixed
one-invalid-target, invalid-type-marks-as-error, missing source
asset, malformed tenant ID, empty input, and a contract test that
pins the result Index matching input position (the frontend depends
on this).
- 4 cases for GetRelationshipTypeUsage: returns every registered type
with count (including unused types as count=0), includes metadata
from the registry, validates tenant ID format, empty tenant returns
all-zero entries.
11 new tests, all passing.
…n member removal
Session / refresh_token table growth:
- The CleanupExpiredSessions() method existed in session_service.go
(DELETE FROM sessions WHERE expires_at < NOW() OR status IN
('expired', 'revoked') + DELETE FROM refresh_tokens WHERE
expires_at < NOW()) but was NEVER wired into any background worker.
Every login created rows, every logout marked them 'revoked' (not
deleted), every refresh-token rotation added new rows with old ones
marked as 'used'. All accumulated forever.
- New SessionCleanupTicker in workers.go: runs every 1 hour, calls
CleanupExpiredSessions. Also fires once at startup to clear any
backlog from when this worker didn't exist. Timeout 5 minutes per
run. Logs counts when rows are actually deleted.
- Same pattern as the existing NotificationCleanupTicker (24h daily)
but on a tighter schedule because sessions expire faster than
notifications' 90-day retention.
Invitation re-accept after member removal (privilege escalation fix):
- When an admin removes a user from a tenant, any PENDING (unaccepted)
invitations for that user's email in that tenant are now auto-deleted.
Without this, a removed user could click their original invitation
link and rejoin the tenant, bypassing the admin's revocation intent.
- New repo method: DeletePendingInvitationsByUserID(tenantID, userID).
Uses a JOIN with the users table so the service doesn't need to
fetch the user's email separately — single SQL statement, case-
insensitive email match.
- Wired into TenantService.RemoveMember() as best-effort: failure is
logged but doesn't roll back the membership delete (the primary
security goal — revoking access — has already succeeded).
- Already-accepted invitations are NOT deleted (they're a historical
audit record with no active token).
- Updated 4 test mock files to satisfy the new interface method.
POST /api/v1/tenants/{tenant}/invitations/{invitationId}/resend — admin-only
endpoint that re-sends the invitation email for a pending invitation
without changing the token, expiry, or any other metadata. Idempotent:
calling it 3 times sends 3 emails, all containing the same token link.
Used when the original invitation email was lost to spam filters or the
recipient's email changed. The previous workaround was delete + recreate,
which invalidated the old token — if the user had already clicked the
link once but didn't finish signing up, their link would be dead.
Service (ResendInvitation in tenant_service.go):
- Validates tenant isolation, checks invitation is pending + not expired
- Looks up inviter name + tenant name for the email template (same logic
as CreateInvitation)
- Re-enqueues via emailEnqueuer.EnqueueTeamInvitation with the remaining
TTL (not the original 7 days) so the email reflects the actual expiry
- Audit logged as ActionInvitationCreated with "resent" message
Handler (ResendInvitation in tenant_handler.go):
- Requires RequireTeamAdmin() middleware
- Returns 200 with JSON message on success
- Returns 400 if invitation is already accepted or expired
- Returns 404 if invitation doesn't exist or belongs to another tenant
Route: POST /tenants/{tenant}/invitations/{invitationId}/resend
registered before the DELETE /{invitationId} route so chi matches the
literal /resend path before the param pattern.
Three-state membership lifecycle: active → suspended → reactivate / remove
Migration 000109_tenant_member_suspension:
- Adds status VARCHAR(20) DEFAULT 'active', suspended_at, suspended_by
to tenant_members. Existing rows default to 'active' (no data change).
- CHECK constraint: status IN ('active', 'suspended')
- Partial index on (tenant_id) WHERE status='suspended' for admin queries.
Domain (pkg/domain/tenant/membership.go):
- New MemberStatus type (active | suspended)
- Suspend(by) — sets status='suspended', records who and when. Rejects
owner suspensions.
- Reactivate() — sets status='active', clears suspended_at/suspended_by.
- IsSuspended(), Status(), SuspendedAt(), SuspendedBy() getters.
- ReconstituteMembershipWithStatus() for loading from DB.
Service (internal/app/tenant_service.go):
- SuspendMember(membershipID, actx) — calls domain Suspend, persists,
invalidates permissions + sessions, wipes pending invitations (same
cleanup as RemoveMember). Audit logged with severity=high.
- ReactivateMember(membershipID, actx) — calls domain Reactivate,
persists. Permissions load lazily on next request. Audit logged.
Repository (internal/infra/postgres/tenant_repository.go):
- UpdateMembershipStatus(membership) — single UPDATE touching status,
suspended_at, suspended_by columns.
Handler (internal/infra/http/handler/tenant_handler.go):
- POST /tenants/{t}/members/{id}/suspend → SuspendMember
- POST /tenants/{t}/members/{id}/reactivate → ReactivateMember
- Both require RequireTeamAdmin() middleware.
Routes (internal/infra/http/routes/tenant.go):
- Registered before DELETE /members/{memberId} so chi matches literal
/suspend and /reactivate paths before the param pattern.
Tests: 4 mock files updated with UpdateMembershipStatus stub.
- Filter out suspended memberships from GetUserMemberships at the SQL level so they cannot be used to mint tenant-scoped JWTs at login, refresh, or token-exchange. - Load membership status in GetMembership / GetMembershipByID and reject suspended members from RequireMembership middleware on every tenant-scoped request. This closes the gap where a suspended user's still-valid JWT could keep working until expiry. - Add unit tests covering the active / suspended / not-a-member paths.
The member suspension feature shipped with a backend gap: ListMembers / SearchMembers / GetMemberByEmail / GetMemberStats all selected u.status (the user-level status field) instead of m.status (the membership-level field added in migration 109). After an admin clicked Suspend, the API correctly wrote tenant_members.status='suspended' and middleware correctly rejected the suspended user — but the response payload still reported status='active', so the UI badge, the "Suspended" filter tab, the per-row Suspend/Reactivate toggle, and the "Active members" stats card all silently lied to the operator. This commit switches all four queries to COALESCE(m.status, 'active'), making the membership lifecycle the single source of truth for "is this person currently allowed in this tenant". User-level status is left in place for the login-time check (defense-in-depth against direct DB manipulation) but is no longer surfaced to tenant admins. Also drop a stray ineffectual idx++ in asset_repository.GetAggregateStats that the linter flagged — it was the trailing increment after the last filter branch and added no value.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.