Develop: v0.1.5 by 0xmanhnv · Pull Request #38 · openctemio/api

0xmanhnv · 2026-04-09T06:13:41Z

No description provided.

…imezone bug Phase 1.1 + 1.2 of scan feature improvements: Migration 000102: - Add scans.agent_preference column (was in domain entity but missing from DB) - Add scans.profile_id column linking to scan_profiles - Add scans.timeout_seconds column (default 3600, max 86400) - Add FK scans.last_run_id -> pipeline_runs (referential integrity) - Add FK pipeline_runs.scan_id -> scans (referential integrity) Domain: - Add Scan.ProfileID and Scan.TimeoutSeconds fields - Add SetProfileID() and SetTimeoutSeconds() methods - Fix calculateNextRun() to honor ScheduleTimezone (was using local time) - Add proper time-of-day support for daily/weekly/monthly schedules - Add DefaultScanTimeoutSeconds (3600) and MaxScanTimeoutSeconds (86400) constants Repository: - Consolidate scanFromRow/scanFromRows into shared readScan helper - Add new columns to INSERT, UPDATE, and SELECT queries - Add WithProfileRepo service option for cross-tenant profile validation Service: - CreateScanInput accepts profile_id, timeout_seconds - Validate scan profile exists and belongs to tenant before linking - System profiles (no tenant) allowed for everyone Handler: - CreateScanRequest exposes agent_preference, profile_id, timeout_seconds - Wire request fields through to service input CRITICAL: Run migration 000102 before deploying this binary, otherwise SELECT queries will fail with "column does not exist" errors.

Phase 1.5 + 1.6 of scan feature improvements: Phase 1.5 — Scan timeout enforcement: - Add MarkTimedOutRuns method on pipeline.RunRepository - Single SQL UPDATE joins pipeline_runs and scans, marks runs as timeout when EXTRACT(EPOCH FROM (NOW() - started_at)) > scan.timeout_seconds - New ScanTimeoutController runs every 60s (registered alongside JobRecovery) - Uses scan's per-config timeout instead of one-size-fits-all global threshold Phase 1.6 — Distributed scheduler lock: - Add TryLockScanForScheduler/UnlockScanForScheduler on scan.Repository - Uses pg_try_advisory_lock(int4, int4) with namespaced key (SCAN namespace) - Per-scan locking — multiple API replicas can process different scans in parallel - Scheduler acquires lock before triggering, releases via defer - Skips scan with debug log if already locked by another instance - FNV-1a 32-bit hash converts UUID to lock key (rare collisions only cause brief serialization, not correctness issues) Tests: - Update 3 mock implementations with new repository methods - All unit + integration tests pass

Phase 1.8 — Plug all the holes from Phase 1: API surface fixes (the audit found these were missing): - ScanDetailResponse now exposes agent_preference, profile_id, timeout_seconds (clients can finally see fields they POSTed) - toScanResponse populates the new fields - UpdateScanRequest/UpdateScanInput accept profile_id, timeout_seconds, agent_preference, max_retries, retry_backoff_seconds (was create-only) - UpdateScan service properly handles all the new fields with sentinel semantics for profile_id (empty string = unlink) - ScanConfigExport/ImportConfig handle profile_id and timeout_seconds (export-import roundtrip no longer loses data) - Frontend ScanConfig type, CreateScanConfigRequest, UpdateScanConfigRequest expose the new fields Security hardening (from cybersecurity audit): - Add MinScanTimeoutSeconds = 30 constant + domain-layer enforcement in SetTimeoutSeconds (defense-in-depth — bypass HTTP validation still gets the floor) - Validation tags now require min=30 (was min=1, allowed DoS via 1s timeouts) - New scanprofile.Repository.GetAccessibleByID enforces tenant scope at SQL layer using `WHERE id = ? AND (tenant_id = ? OR tenant_id IS NULL)` — app-layer check is replaced with SQL-layer check (audit said "rely on SQL, not application logic") - Scan service uses GetAccessibleByID for both Create and Update profile linking — system profiles still allowed, cross-tenant access returns ErrNotFound (no info leak via error messages) Migration 103 (DB performance + Phase 2.3 schema): - Add idx_scans_last_run for FK referential integrity check on pipeline_run delete (audit said missing index causes table scan) - Add partial idx_pipeline_runs_pending_started for ScanTimeoutController sweeper query - Drop unused idx_scans_profile (no query filters by profile_id) - Add max_retries (0-10) and retry_backoff_seconds (10-86400) columns - Add retry_attempt column on pipeline_runs Phase 2.1 — Quality gate evaluation now actually works: - Trigger paths (workflow + single scan) propagate sc.ProfileID to run.ScanProfileID. The pipeline service already evaluates the quality gate from this field on run completion — it just never had a value before because the link wasn't wired. - End-to-end: scan has profile_id → run inherits ScanProfileID → quality gate evaluated when run completes → result stored in pipeline_runs.quality_gate_result Phase 2.2 — Job cancellation: - Add command.Repository.CancelByPipelineRunID — single SQL UPDATE that cancels all non-terminal commands belonging to a pipeline run via step_runs.pipeline_run_id JOIN, scoped by tenant - pipeline.Service.CancelRun now calls CancelByPipelineRunID after marking the run as canceled, so in-flight commands are aborted instead of waiting for JobRecoveryController to time them out (10-30 min) Phase 2.3 — Retry logic with exponential backoff: - Domain: Scan.MaxRetries, Scan.RetryBackoffSeconds, SetRetryConfig, CalculateRetryDelay (exponential), ShouldRetry - Domain: Run.RetryAttempt tracks the current attempt (0 = first) - Repository: pipeline.RunRepository.ListPendingRetries — single query using DISTINCT ON + JOIN to find latest failed run per scan and check exponential backoff against retry_backoff_seconds * 2^retry_attempt - New ScanRetryController polls every 60s, dispatches retries via the RetryDispatcher interface. Scan service implements RetryDispatcher via RetryScanRun method. - New run gets retry_attempt = previous + 1, eventually stopping at max_retries Repository refactoring: - Scan repository handles all new columns in INSERT/UPDATE/SELECT - Pipeline_run repository handles retry_attempt in INSERT/SELECT (both Create and CreateRunIfUnderLimit paths) Tests: - Update mocks for ListPendingRetries, CancelByPipelineRunID, GetAccessibleByID. All unit + integration tests pass. CRITICAL: Run migration 102 AND 103 before/with this binary deploy. Migration 102 adds the columns the binary references in SELECT queries.

Covers all the new functionality from Phase 1.x and Phase 2.x: - Section 1-5: Setup (auth, evil tenant, asset, group, profile) - Section 6: Phase 1.1+1.2 — create scan with new fields - Section 7: Phase 1.8d — min timeout 30s + max 86400 validation - Section 8: Phase 1.8d — cross-tenant profile rejection (security) - Section 9: Phase 1.8b — UpdateScan handles profile_id (link/unlink), agent_preference, timeout_seconds, max_retries, retry_backoff_seconds with sentinel semantics and bound validation - Section 10: Phase 1.8c — export/import roundtrip preserves new fields - Section 11: Phase 2.1 — quality gate evaluation wiring (trigger) - Section 12: Phase 1.3 — findings drill-down via /findings?scan_id= - Section 13: Phase 2.2 — cancel run cascades to commands - Section 14: tenant isolation on GET/UPDATE/DELETE (cross-tenant) - Section 15: timezone validation, agent_preference enum, edge cases - Section 16: stats and listing - Section 17: cleanup The script uses two tenants (primary + evil) to verify cross-tenant isolation. Each test asserts both happy-path and security-path behavior.

User feedback: triggering a paused scan returned "scan is not validate" — confusing. Changes: - trigger.go: replace generic "scan is not active" with state-specific messages explaining WHY and HOW to fix: paused -> "Cannot trigger '<name>' because it is paused. Resume the scan first to trigger it." disabled -> "Cannot trigger '<name>' because it is disabled. Activate the scan first to trigger it." Returns DomainError with code SCAN_NOT_TRIGGERABLE for client-side handling. - scan_handler.go: introduce cleanErrorMessage helper that extracts the user-facing .Message field from shared.DomainError instead of dumping raw err.Error() (which prepends the code and appends ": validation"). All ValidationError, AlreadyExists, etc. responses now show clean text.

Critical fixes from the multi-role audit: PM/TechLead audit: - GAP 1: ScanConfigExport missing max_retries/retry_backoff_seconds. Export-import roundtrip would silently lose retry config. Added fields to struct + ExportConfig + ImportConfig wiring. - GAP 4: pipeline_run_repository.Update() missing retry_attempt in SET clause. After RetryScanRun set run.RetryAttempt and called Update(), the value was silently dropped — retry counter would always read 0 from DB. Added retry_attempt to UPDATE SET + parameter list. - GAP 6: No tests for retry domain logic. Added scan_retry_test.go covering: * SetRetryConfig: defaults, valid, negative, over-max, zero/min/max backoff, boundary values, UpdatedAt mutation * CalculateRetryDelay: 1st/2nd/3rd retry, exponential growth, cap at max * ShouldRetry: max=0 never retries, attempts 0..max allowed, max+1 denied * Clone: preserves retry config, independent from original Security audit: - GAP 5: ScanRetryController could dispatch duplicate retries on crash/restart or with multiple controller instances. Fixed via: * New retry_dispatched_at column on pipeline_runs (migration 103) * ListPendingRetries now uses CTE + FOR UPDATE SKIP LOCKED + atomic UPDATE to set retry_dispatched_at = NOW(). The candidate is "claimed" before being returned, preventing concurrent dispatch. * Eligibility filter excludes runs where retry_dispatched_at IS NOT NULL. * Index idx_pipeline_runs_retry_pending optimizes the lookup. User-reported UX: - Trigger error "scan is not validate" was unhelpful. Fixed in commit 9d64188: state-specific messages explaining WHY (paused/ disabled) and HOW to fix (resume/activate). cleanErrorMessage helper in handler strips DomainError code prefix and wrapped error suffix. Tests: - All unit tests pass (added 4 retry test groups, 24 test cases)

…verride Solves the chicken-and-egg problem: new deployments without SMTP couldn't register users because verification emails couldn't be sent, but verification was required to log in. Resolution order (most specific wins): 1. Per-tenant SecuritySettings.EmailVerificationMode - "always" → require even if SMTP missing (operator must configure SMTP) - "never" → skip verification (closed/internal deployments) - "auto" → smart detection (default) 2. SMTP availability auto-detection - tenant SMTP configured → require verification - system SMTP configured → require verification - no SMTP at all → skip (graceful fallback) 3. Global env config (AUTH_REQUIRE_EMAIL_VERIFICATION) — fallback only Changes: - Domain: SecuritySettings.EmailVerificationMode + EmailVerificationMode enum with "auto" / "always" / "never" values - DefaultSettings: EmailVerificationMode defaults to "auto" - SecuritySettings.Validate: rejects invalid modes - AuthService.shouldRequireEmailVerification: encapsulates the resolution logic - AuthService.SetSMTPChecker: optional dependency injection - AuthService.Register: uses smart check, returns RequiresVerification reflecting actual decision - AuthService.Login: only blocks unverified users when verification is currently required - EmailService now implements SMTPAvailabilityCheck (HasSystemSMTP, HasTenantSMTP) - main.go: wires Email service into AuthService for smart detection Migration: NONE — uses existing JSONB tenants.settings column. Existing tenants automatically get "auto" behavior (empty mode = auto path). Production behavior: - New deployment without SMTP → first user registers + auto-verified + can login - After admin configures SMTP → next users get verification email as expected - Tenant admin can force "always" or "never" via tenant settings UI (TODO: UI)

Operators can now edit agent config templates (yaml/env/docker/cli) without rebuilding the frontend. Templates live in configs/agent-templates/*.tmpl on the API host. Backend: - New configs/agent-templates/{yaml,env,docker,cli}.tmpl files using Go text/template syntax with helper funcs (toScannerName, firstTool, slugify) - AgentConfigTemplateService loads templates from filesystem at startup, supports Reload() for live editing without restart, falls back to built-in defaults if any file is missing - New GET /api/v1/agents/{id}/config-templates endpoint returns rendered templates as JSON - AgentHandler gains SetTemplateService and SetPublicAPIURL injection - New AgentConfig config block: AGENT_CONFIG_TEMPLATES_DIR (default configs/agent-templates) and AGENT_PUBLIC_API_URL (fallback to APP_URL) - Dockerfile copies configs/ into the production image so the templates ship with the binary

Addresses 7 critical/high findings from the multi-role audit. SECURITY CRITICAL — API key in URL (credential leak): - Move agent config templates API key from query parameter to X-Agent-API-Key header. Query strings are logged by proxies, load balancers, CDNs, browser history, referer headers — embedding a credential there is a known leakage vector. - Backend reads from header now (agent_handler.go:670). - Frontend sends via header with AbortController for clean unmount. - Next.js proxy forwards x-agent-api-key in route.ts allowlist. SECURITY HIGH — Login email verification logic: - Old: Login passed empty tenantID to shouldRequireEmailVerification, so the per-tenant override never applied. Tenant with mode="always" could not enforce verification at login. - New: shouldRequireEmailVerificationForUser walks the user's tenant memberships and applies the strictest mode (always > never > auto). - Falls back to global SMTP detection only if user has no memberships. SECURITY HIGH — SMTP downgrade protection: - If a user has a verification token (issued at registration when SMTP was required), Login NOW always denies even if SMTP is later removed. - Prevents an attacker from triggering SMTP removal to auto-verify accounts stuck pending verification. PERFORMANCE HIGH — Missing index on commands.step_run_id: - CancelByPipelineRunID JOINs commands → step_runs via step_run_id but commands had no index on that column. Full table scan on every cancel. - Migration 104 adds idx_commands_step_run_id (partial, NOT NULL). PERFORMANCE HIGH — Non-sargable retry query: - Old idx_pipeline_runs_retry_pending was (scan_id, completed_at), but the query orders by completed_at and filters on retry_dispatched_at IS NULL. The previous index was missing completed_at IS NOT NULL in its WHERE clause and had wrong column order. - Migration 104 drops the old index and creates idx_pipeline_runs_retry_eligible on (completed_at ASC, scan_id) WHERE status='failed' AND retry_dispatched_at IS NULL AND completed_at IS NOT NULL. - Also adds idx_pipeline_runs_tenant_status_created for paginated lists. FEATURE HIGH — EmailVerificationMode in API: - Add field to UpdateSecuritySettingsRequest, UpdateSecuritySettingsInput, SecuritySettingsResponse — operators can now actually configure per-tenant email verification mode via the API. - Validation: oneof=auto always never - Empty value preserves existing mode (don't accidentally reset to auto). UX MEDIUM — Disable agent config buttons when content empty: - Added disabled={!yamlConfig} (etc) so spamming buttons during initial render is impossible. Added aria-label for accessibility.

- Collapse GetAggregateStats from 6 round-trips to 1 (CTE + UNION ALL). Postgres plans a single scan of the filtered set for all aggregates; result is iterated as (category, key, value::float8) rows. ~83% reduction in DB traffic for the asset stats endpoint, which is hit on every page render of /assets/* (every 30s per tenant via SWR). - GetAggregateStats now accepts a `tags` filter ([]string, overlap operator) so /assets/stats?types=...&tags=... mirrors List() semantics. This lets the UI keep stat cards consistent with whatever the user filtered to. - UpdateAssetInput + UpdateAssetRequest now carry owner_ref so the field is editable post-create (previously create-only). Wired through service to entity.SetOwnerRef. - Mock test files updated for the new repo signature.

GET /api/v1/agents/stats returns aggregate counts grouped by status, health, type, and execution_mode for the calling tenant. Computed in a single query using a CTE + UNION ALL — no client-side .filter().length over a paginated list, which previously undercounted as soon as a tenant had more than one page (per_page default = 20) of agents. The endpoint also returns active_jobs (SUM(current_jobs) for online daemon agents) and online_active for the headline cards. Route is registered before /{id} so chi doesn't claim "stats" as a path parameter. All six agent repository mocks gain a stub implementation.

…o 502 parseQueryArray Adds maxQueryArrayItems (100) and maxQueryArrayItemLen (200) caps so a pathological ?tags=a,b,...10000-items query can no longer allocate unbounded slices or pq arrays. Defense-in-depth raw input cap before splitting. Trim happens after the cap, so length limits are enforced. integration_handler.handleServiceError When the SCM client returns a network error (DNS lookup failed, refused, i/o timeout, TLS handshake, EOF, network unreachable), the API used to log the raw error and return 500 INTERNAL_ERROR — leaking internal hostnames and DNS resolver addresses. Now: log details server-side at WARN, return 502 INTEGRATION_UNREACHABLE with a generic message ("Could not reach the integration provider..."). Aligns with the CLAUDE.md "generic error messages" rule.

…tats filter Owner endpoints (POST/PUT/DELETE /assets/{id}/owners): - Add per-request tenant-asset cross-check via new requireAssetInTenant helper. Closes IDOR vector where a user with assets:write in tenant A could craft a request against an asset_id in tenant B. - Inject asset.Repository into AssetOwnerHandler. Asset relationship Create handler: - Use chi URLParam {id} as authoritative source asset id; reject body source_asset_id when it disagrees with the URL. - Add placement mutex in service: runs_on and deployed_to are mutually exclusive for the same source/target pair (UI hides candidates, backend enforces for non-UI clients). Finding stats endpoint: - Add optional asset_id query param so the severity cards on /findings?assetId=… reflect the filtered table instead of global tenant counts. Plumbed through repo (4-arg GetStats), service (GetFindingStatsInput.AssetID) and handler. All 8 mock implementations in tests/ updated to match the new signature. Pentest campaign members repository: - Fix 'column u.display_name does not exist' error: rename to u.name (same pattern as the earlier fix in access_control_repository). Compliance module: - Add ModuleCompliance constant and entry to UserFacingModuleIDs so tenant admins can disable it from Settings → Modules. - ModulePermissionMapping entry now uses the constant instead of the bare 'compliance' string for consistency. New migrations: - 000105_compliance_module_seed: insert the missing compliance row into the modules registry. The compliance feature was implemented end-to-end but missing from the seed, so the sidebar entry pointed at a phantom. - 000106_users_trgm_search_indexes: GIN trigram indexes on users(name) and users(email) so the owner picker server-side ILIKE search scales past sequential-scan range. CONCURRENTLY removed because golang-migrate wraps each file in a transaction; comment documents the manual workaround for very large existing deployments.

Relationship types codegen: - New configs/relationship-types.yaml is the single source of truth for every relationship type ID, label, description, and constraint. - New cmd/gen-relationships Go program reads the YAML and emits two generated files: pkg/domain/asset/relationship_types_generated.go (Go enum + RelationshipTypeRegistry + categories) and ../ui/src/features/assets/types/relationship.types.generated.ts (TS union + GENERATED_RELATIONSHIP_LABELS + constraint maps). - Manual constants in pkg/domain/asset/relationship.go are removed; AllRelationshipTypes() now wraps the generated slice. - New `make generate-relationships` Makefile target. - Workflow: edit YAML → run target → commit YAML + the two generated files. Generated files have a "DO NOT EDIT" header. - Validation: codegen errors out on duplicate IDs, unknown categories, empty constraints, etc. Migration 000107_relationship_deployed_to_backfill: - One-shot data fix for relationship rows that were created under the old loose constraint table where service/api/website could be a deployed_to source. Per CMDB best practice, deployed_to is artifact- only — runtime workloads belong in runs_on. - Step (a): DELETE rows where a runs_on row already exists between the same pair (pure duplicates). - Step (b): UPDATE the remaining bad rows to relationship_type='runs_on', preserving description / confidence / impact_weight / tags. - Marked "review before applying" in the file header. Down migration is intentionally a no-op (the data fix is one-way).

@ts-nocheck

ESLint blocks @ts-nocheck via ban-ts-comment. The generated TS file is type-safe so the directive was unnecessary belt-and-braces. Removed from the template; pre-commit lint passes.

Honest evaluation of the relationship registry surfaced multiple issues the previous version inherited from generic CMDB best practice without validating against the actual platform model. This commit acts on that evaluation: Type registry changes (configs/relationship-types.yaml): - DROP `owned_by` — duplicates the proper asset_owners table (RACI ownership lives there, not as a fake "service-as-team" edge). - DROP `member_of` — the inverse of `contains`. Hierarchy now has a single canonical direction (parent → child via `contains`). Existing ingest pipeline (subdomain → domain) updated to use the new direction. - TIGHTEN `resolves_to` to STRICT semantic — targets are now only `[ip_address, load_balancer]`. Previously also accepted `host` and `cloud_account` which contradicted the description ("DNS resolves to IP / CDN / load balancer"). Strict semantic is more flexible long-term: it enables cross-relationship analytics ("all domains resolving to a compromised IP"), supports external scanner integration, and lets new IP-bearing types (subnet, vip, anycast) extend cleanly. - ADD `cname_of` — DNS alias / subdomain → parent domain edges that resolves_to was being misused for. - ADD `peer_of` — symmetric edge for HA pairs, sibling services, leader/follower nodes. Previously had to be modeled as two depends_on edges in opposite directions which was wrong. - ADD `replicates_to` — infra-level state copy (DB primary → replica, log shipper → remote). Distinct from sends_data_to which is application-level data flow. - ADD `has_access_to` — the missing IAM resource direction. The old `granted_to` conflated "credential granted to principal" with "principal has access to resource"; now split. - TIGHTEN `granted_to` — restricted to credential → principal (matches the IAM grant semantic). Resource-side targets moved to has_access_to. - TIGHTEN descriptions for `monitors` vs `protected_by` — the distinction is now explicit (passive observability vs active prevention) so users have a rule for ambiguous tools (EDR, WAF). - RENAME category `control_and_ownership` → `control_and_observability` (since `owned_by` is gone). Subdomain ingest fix (internal/app/ingest/processor_assets.go): - Updated to use the new canonical hierarchy direction `parent_domain contains subdomain` instead of the dropped `subdomain member_of parent_domain`. Usage stats endpoint: - New `RelationshipRepository.CountByType(tenantID)` aggregates relationship counts per type via a single GROUP BY query. - New service method `GetRelationshipTypeUsage(tenantID)` joins the per-tenant counts with the registry metadata so admins can see which types are unused (count=0) and prune the registry based on real data. - New endpoint `GET /api/v1/relationships/usage-stats` exposes it. - Returns every registered type, including zero-count entries, in YAML declaration order for stable response ordering. Tests: - Removed dropped types from the AllRelationshipTypes test list and added the new ones. - Replaced the hard-coded `expected 16` count with `len(asset.AllRelationshipTypes())` so the test stays in sync with the registry. - Added `MockRelationshipRepository.CountByType` implementation.

New comprehensive guide at docs/development/relationship-types.md covering: - The 30-second mental model (YAML → codegen → Go + TS) - Quickstart for adding a new type end to end - File map showing what's manual vs generated - Full YAML schema reference with field rules and naming conventions - Workflows for add / modify / rename labels / remove / category changes - Validation rules and what each codegen error means - How the generated files are wired into the rest of the codebase - Telemetry — using the usage-stats endpoint to drive registry pruning - Limitations and known footguns (asset type IDs not validated, etc.) - Troubleshooting common errors - Design rationale FAQ (why YAML, why AOT codegen, why global registry, why some types are missing, etc.) - Cheat sheet at the bottom Cross-link added to the YAML file's header comment so anyone editing the registry sees the doc reference up front.

cmd/gen-relationships/main_test.go: - 17 tests covering every validate() error path: empty categories, empty types, missing/duplicate IDs, unknown category references, empty labels, missing constraints, empty sources/targets in a constraint. - Tests for the template helpers (toCamel, goStringList, tsStringList, trimDesc) including the YAML-folded multiline collapse path. - TestProductionYAML_LoadsAndValidates pins the production YAML through the parser + validator, so a future edit that produces an invalid config fails the test before it can be merged. Catches drift before it hits the build pipeline. Migration 000108_asset_relationships_type_index: - Adds idx_asset_relationships_tenant_type covering (tenant_id, relationship_type). The /relationships/usage-stats endpoint runs WHERE tenant_id = $1 GROUP BY relationship_type — the existing unique constraint covers the WHERE but is wider than needed, forcing more page reads. The narrower covering index lets PG do an Index Only Scan + HashAggregate, significantly faster on tenants with hundreds of thousands of relationships. - Trivial cost (~16 bytes per row) for meaningful read perf.

POST /api/v1/assets/{id}/relationships/batch — bulk-creates many relationships from one source asset in a single round trip. Replaces the previous N-parallel-POST flow used by the multi-select dialog. Service (CreateRelationshipBatch in asset_relationship_service.go): - Validates the tenant + source asset ONCE for the whole batch (the primary efficiency win — single Get-By-ID instead of N). - Per-item: parses target ID, parses + validates relationship type, fetches target asset, runs the placement mutex check, builds the domain entity, persists. - Per-item failures (invalid type, missing target, duplicate, mutex violation) mark the item as error/duplicate and continue. The whole batch never aborts because one item failed — every item gets a slot in the result. - Maps unique-violation errors to BatchCreateStatusDuplicate so the UI can show "already related" instead of "internal error". - The result includes per-item index (matching the request position so the frontend can map back to target names without re-fetching), per-item status, and aggregate created/duplicates/errors counts. Handler (BatchCreate in asset_relationship_handler.go): - New BatchCreateRelationshipRequest with `items` array (validator caps at 100 entries to prevent abuse via giant payloads). - Always returns 200 with the result body — per-item failures are inside, not at the HTTP layer. - Whole-batch failures (bad tenant, missing source asset) still return the appropriate error response via handleServiceError. Route registration: POST /api/v1/assets/{id}/relationships/batch gated by permission.AssetsWrite, same tenant middleware chain as the singleton create endpoint. Tests (asset_relationship_service_test.go): - 7 cases for CreateRelationshipBatch: happy path all-succeed, mixed one-invalid-target, invalid-type-marks-as-error, missing source asset, malformed tenant ID, empty input, and a contract test that pins the result Index matching input position (the frontend depends on this). - 4 cases for GetRelationshipTypeUsage: returns every registered type with count (including unused types as count=0), includes metadata from the registry, validates tenant ID format, empty tenant returns all-zero entries. 11 new tests, all passing.

…n member removal Session / refresh_token table growth: - The CleanupExpiredSessions() method existed in session_service.go (DELETE FROM sessions WHERE expires_at < NOW() OR status IN ('expired', 'revoked') + DELETE FROM refresh_tokens WHERE expires_at < NOW()) but was NEVER wired into any background worker. Every login created rows, every logout marked them 'revoked' (not deleted), every refresh-token rotation added new rows with old ones marked as 'used'. All accumulated forever. - New SessionCleanupTicker in workers.go: runs every 1 hour, calls CleanupExpiredSessions. Also fires once at startup to clear any backlog from when this worker didn't exist. Timeout 5 minutes per run. Logs counts when rows are actually deleted. - Same pattern as the existing NotificationCleanupTicker (24h daily) but on a tighter schedule because sessions expire faster than notifications' 90-day retention. Invitation re-accept after member removal (privilege escalation fix): - When an admin removes a user from a tenant, any PENDING (unaccepted) invitations for that user's email in that tenant are now auto-deleted. Without this, a removed user could click their original invitation link and rejoin the tenant, bypassing the admin's revocation intent. - New repo method: DeletePendingInvitationsByUserID(tenantID, userID). Uses a JOIN with the users table so the service doesn't need to fetch the user's email separately — single SQL statement, case- insensitive email match. - Wired into TenantService.RemoveMember() as best-effort: failure is logged but doesn't roll back the membership delete (the primary security goal — revoking access — has already succeeded). - Already-accepted invitations are NOT deleted (they're a historical audit record with no active token). - Updated 4 test mock files to satisfy the new interface method.

POST /api/v1/tenants/{tenant}/invitations/{invitationId}/resend — admin-only endpoint that re-sends the invitation email for a pending invitation without changing the token, expiry, or any other metadata. Idempotent: calling it 3 times sends 3 emails, all containing the same token link. Used when the original invitation email was lost to spam filters or the recipient's email changed. The previous workaround was delete + recreate, which invalidated the old token — if the user had already clicked the link once but didn't finish signing up, their link would be dead. Service (ResendInvitation in tenant_service.go): - Validates tenant isolation, checks invitation is pending + not expired - Looks up inviter name + tenant name for the email template (same logic as CreateInvitation) - Re-enqueues via emailEnqueuer.EnqueueTeamInvitation with the remaining TTL (not the original 7 days) so the email reflects the actual expiry - Audit logged as ActionInvitationCreated with "resent" message Handler (ResendInvitation in tenant_handler.go): - Requires RequireTeamAdmin() middleware - Returns 200 with JSON message on success - Returns 400 if invitation is already accepted or expired - Returns 404 if invitation doesn't exist or belongs to another tenant Route: POST /tenants/{tenant}/invitations/{invitationId}/resend registered before the DELETE /{invitationId} route so chi matches the literal /resend path before the param pattern.

Three-state membership lifecycle: active → suspended → reactivate / remove Migration 000109_tenant_member_suspension: - Adds status VARCHAR(20) DEFAULT 'active', suspended_at, suspended_by to tenant_members. Existing rows default to 'active' (no data change). - CHECK constraint: status IN ('active', 'suspended') - Partial index on (tenant_id) WHERE status='suspended' for admin queries. Domain (pkg/domain/tenant/membership.go): - New MemberStatus type (active | suspended) - Suspend(by) — sets status='suspended', records who and when. Rejects owner suspensions. - Reactivate() — sets status='active', clears suspended_at/suspended_by. - IsSuspended(), Status(), SuspendedAt(), SuspendedBy() getters. - ReconstituteMembershipWithStatus() for loading from DB. Service (internal/app/tenant_service.go): - SuspendMember(membershipID, actx) — calls domain Suspend, persists, invalidates permissions + sessions, wipes pending invitations (same cleanup as RemoveMember). Audit logged with severity=high. - ReactivateMember(membershipID, actx) — calls domain Reactivate, persists. Permissions load lazily on next request. Audit logged. Repository (internal/infra/postgres/tenant_repository.go): - UpdateMembershipStatus(membership) — single UPDATE touching status, suspended_at, suspended_by columns. Handler (internal/infra/http/handler/tenant_handler.go): - POST /tenants/{t}/members/{id}/suspend → SuspendMember - POST /tenants/{t}/members/{id}/reactivate → ReactivateMember - Both require RequireTeamAdmin() middleware. Routes (internal/infra/http/routes/tenant.go): - Registered before DELETE /members/{memberId} so chi matches literal /suspend and /reactivate paths before the param pattern. Tests: 4 mock files updated with UpdateMembershipStatus stub.

- Filter out suspended memberships from GetUserMemberships at the SQL level so they cannot be used to mint tenant-scoped JWTs at login, refresh, or token-exchange. - Load membership status in GetMembership / GetMembershipByID and reject suspended members from RequireMembership middleware on every tenant-scoped request. This closes the gap where a suspended user's still-valid JWT could keep working until expiry. - Add unit tests covering the active / suspended / not-a-member paths.

The member suspension feature shipped with a backend gap: ListMembers / SearchMembers / GetMemberByEmail / GetMemberStats all selected u.status (the user-level status field) instead of m.status (the membership-level field added in migration 109). After an admin clicked Suspend, the API correctly wrote tenant_members.status='suspended' and middleware correctly rejected the suspended user — but the response payload still reported status='active', so the UI badge, the "Suspended" filter tab, the per-row Suspend/Reactivate toggle, and the "Active members" stats card all silently lied to the operator. This commit switches all four queries to COALESCE(m.status, 'active'), making the membership lifecycle the single source of truth for "is this person currently allowed in this tenant". User-level status is left in place for the login-time check (defense-in-depth against direct DB manipulation) but is no longer surfaced to tenant admins. Also drop a stray ineffectual idx++ in asset_repository.GetAggregateStats that the linter flagged — it was the trailing increment after the last filter branch and added no value.

0xmanhnv added 22 commits April 8, 2026 07:30

fix(api): drop @ts-nocheck from relationship types codegen template

74c788c

ESLint blocks @ts-nocheck via ban-ts-comment. The generated TS file is type-safe so the directive was unnecessary belt-and-braces. Removed from the template; pre-commit lint passes.

0xmanhnv changed the title ~~Develop~~ Develop: v0.1.5 Apr 9, 2026

github-advanced-security AI found potential problems Apr 9, 2026

View reviewed changes

Comment thread internal/infra/postgres/asset_repository.go Fixed

github-code-quality Bot found potential problems Apr 9, 2026

View reviewed changes

Comment thread internal/infra/postgres/asset_repository.go Fixed

0xmanhnv merged commit b78800c into main Apr 9, 2026
27 checks passed

0xmanhnv added 2 commits April 9, 2026 13:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Develop: v0.1.5#38

Develop: v0.1.5#38
0xmanhnv merged 24 commits into
mainfrom
develop

0xmanhnv commented Apr 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

0xmanhnv commented Apr 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants