fix(asset-lifecycle): wire scheduler + dry-run worker + bound worker …#60

Merged
0xmanhnv merged 1 commit into develop from feat/asset-lifecycle-wiring on Apr 23, 2026
@0xmanhnv
Collaborator

…batches

Phase 0 shipped the worker, cron controller, and HTTP handlers, but none of them were connected in cmd/server. Three separate dead-code paths meant the feature compiled and passed tests yet produced no behavior in production.

Wiring

  • cmd/server/workers.go now constructs the AssetLifecycleWorker and registers AssetLifecycleController with the ControllerManager so the daily cron tick actually fires. Safe for existing tenants: the worker's first check skips any tenant that has not opted in.
  • cmd/server/services.go wires AssetService.SetLifecycleRepository against the Postgres asset repo so the snooze endpoint can write lifecycle_paused_until.
  • cmd/server/handlers.go + main.go back-wire the worker instance into TenantHandler after workers.Start. Handlers are built before workers, so we expose a package-level WireAssetLifecycleWorker hook rather than re-ordering initialisation, which would risk breaking other services that depend on handler construction.
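
The late-binding hook from the last bullet can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `LifecycleWorker` interface, the `DryRun` method, `SetLifecycleWorker`, `errWorkerNotWired`, and the package-level `tenantHandler` variable are all hypothetical names; only `TenantHandler` and `WireAssetLifecycleWorker` come from the description above. It assumes the handler needs the worker for a dry-run preview, as the PR title suggests.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// LifecycleWorker is a hypothetical stand-in for the part of
// AssetLifecycleWorker that the handler needs.
type LifecycleWorker interface {
	DryRun(tenantID string) (candidates int, err error)
}

// errWorkerNotWired is returned when a request arrives before wiring.
var errWorkerNotWired = errors.New("asset lifecycle worker not wired yet")

// TenantHandler is constructed before the workers start, so its worker
// field stays nil until SetLifecycleWorker is called.
type TenantHandler struct {
	mu     sync.RWMutex
	worker LifecycleWorker
}

// SetLifecycleWorker late-binds the worker once it exists.
func (h *TenantHandler) SetLifecycleWorker(w LifecycleWorker) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.worker = w
}

// DryRun returns an error instead of panicking if the hook has not run.
func (h *TenantHandler) DryRun(tenantID string) (int, error) {
	h.mu.RLock()
	w := h.worker
	h.mu.RUnlock()
	if w == nil {
		return 0, errWorkerNotWired
	}
	return w.DryRun(tenantID)
}

// tenantHandler is a hypothetical package-level handle kept by the server.
var tenantHandler = &TenantHandler{}

// WireAssetLifecycleWorker is the package-level hook: main calls it after
// the workers have started, leaving the initialisation order untouched.
func WireAssetLifecycleWorker(w LifecycleWorker) {
	tenantHandler.SetLifecycleWorker(w)
}

// fakeWorker is a test double used only for this demonstration.
type fakeWorker struct{}

func (fakeWorker) DryRun(string) (int, error) { return 42, nil }

func main() {
	_, err := tenantHandler.DryRun("tenant-a")
	fmt.Println("before wiring:", err)

	WireAssetLifecycleWorker(fakeWorker{})
	n, err := tenantHandler.DryRun("tenant-a")
	fmt.Println("after wiring:", n, err)
}
```

The nil check matters because a request could reach the handler in the window between handler construction and the hook call; failing with an explicit error is one reasonable choice there.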

Bounded batches

  • applyTransitions now runs UPDATE against a CTE-scoped LIMIT so a tenant with 10M stale candidates doesn't lock the table in one transaction. Each batch updates 5K rows, and the loop ends when a batch returns fewer rows than the limit. A hard cap of 50K transitions per tenant per cron tick keeps a mis-configured threshold from saturating the database; the remainder rolls into the next day's run.
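
The batch loop described above can be sketched like this. It is an illustration under assumptions, not the PR's implementation: the SQL text, the `assets` table and its column names, the `batchFunc` type, and `simulatedTable` are hypothetical, and the database call is replaced by an in-memory counter so the loop logic can run on its own. Only the 5K batch size, the 50K cap, and the "stop when a batch comes back short" rule are taken from the description.

```go
package main

import "fmt"

const (
	batchSize    = 5000  // rows per UPDATE, per the description
	maxPerTenant = 50000 // hard cap per tenant per cron tick
)

// markStaleBatchSQL shows one possible shape of a CTE-scoped LIMIT update.
// Table and column names are assumptions made for illustration.
const markStaleBatchSQL = `
WITH candidates AS (
    SELECT id
    FROM assets
    WHERE tenant_id = $1 AND status = 'active' AND last_seen_at < $2
    ORDER BY id
    LIMIT $3
)
UPDATE assets a
SET status = 'stale'
FROM candidates c
WHERE a.id = c.id
RETURNING a.id`

// batchFunc applies one bounded UPDATE and reports how many rows changed.
type batchFunc func(limit int) (int, error)

// applyTransitions loops over bounded batches until a batch returns fewer
// rows than requested or the per-tenant cap is reached.
func applyTransitions(run batchFunc) (int, error) {
	total := 0
	for total < maxPerTenant {
		limit := batchSize
		if remaining := maxPerTenant - total; remaining < limit {
			limit = remaining
		}
		n, err := run(limit)
		if err != nil {
			return total, err
		}
		total += n
		if n < limit {
			break // short batch: no candidates left for this tenant
		}
	}
	return total, nil
}

// simulatedTable stands in for the database: it holds a count of stale
// candidates and hands out at most limit of them per call.
func simulatedTable(stale int) batchFunc {
	return func(limit int) (int, error) {
		n := limit
		if stale < n {
			n = stale
		}
		stale -= n
		return n, nil
	}
}

func main() {
	small, _ := applyTransitions(simulatedTable(12345))
	fmt.Println(small) // 12345: batches of 5000, 5000, 2345

	huge, _ := applyTransitions(simulatedTable(10000000))
	fmt.Println(huge) // 50000: cap reached, the rest waits for the next tick
}
```

Running each batch as its own statement keeps every transaction short, which is the point of the change; the cap bounds total work per tick even when nearly every row qualifies.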

Audit

  • Worker now emits one asset.lifecycle_run audit event per non-empty tenant run with transitioned_to_stale count, threshold, grace period, excluded source types, and a sample of up to 100 affected asset IDs. Zero-transition runs stay in structured logs only so the audit table doesn't grow daily rows of noise.
  • Worker keeps running when audit.LogEvent fails: an audit outage should not block the security-relevant lifecycle transitions.
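
The audit behaviour in the two bullets above could look roughly like this. The sketch is hedged: `LifecycleRunEvent`, its field names, the `AuditLogger` interface, `reportRun`, and the two test doubles are hypothetical, and the real `audit.LogEvent` signature is not shown in this PR description. The threshold, grace period, and excluded source types are left out of the struct for brevity. What it does take from the text is the `asset.lifecycle_run` action name, the 100-ID sample cap, the log-only path for zero-transition runs, and the non-fatal handling of audit failures.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

const maxSampleIDs = 100 // sample cap stated in the description

// LifecycleRunEvent is a hypothetical payload for asset.lifecycle_run.
type LifecycleRunEvent struct {
	Action              string
	TenantID            string
	TransitionedToStale int
	SampleAssetIDs      []string
}

// AuditLogger is an assumed interface; the real audit API may differ.
type AuditLogger interface {
	LogEvent(ev LifecycleRunEvent) error
}

// reportRun emits one audit event for a non-empty run and reports whether
// the event was written. It never returns an error, so an audit failure
// cannot stop the worker from moving on to the next tenant.
func reportRun(audit AuditLogger, tenantID string, staleIDs []string) bool {
	if len(staleIDs) == 0 {
		// Zero-transition runs go to structured logs only.
		log.Printf("asset lifecycle: tenant=%s transitioned=0", tenantID)
		return false
	}
	n := len(staleIDs)
	if n > maxSampleIDs {
		n = maxSampleIDs
	}
	ev := LifecycleRunEvent{
		Action:              "asset.lifecycle_run",
		TenantID:            tenantID,
		TransitionedToStale: len(staleIDs),
		SampleAssetIDs:      append([]string(nil), staleIDs[:n]...),
	}
	if err := audit.LogEvent(ev); err != nil {
		log.Printf("asset lifecycle: audit write failed tenant=%s: %v", tenantID, err)
		return false
	}
	return true
}

// recordingAudit keeps events in memory for the demonstration.
type recordingAudit struct{ events []LifecycleRunEvent }

func (r *recordingAudit) LogEvent(ev LifecycleRunEvent) error {
	r.events = append(r.events, ev)
	return nil
}

// failingAudit simulates an audit outage.
type failingAudit struct{}

func (failingAudit) LogEvent(LifecycleRunEvent) error {
	return errors.New("audit store unavailable")
}

// makeIDs builds n placeholder asset IDs.
func makeIDs(n int) []string {
	ids := make([]string, n)
	for i := range ids {
		ids[i] = fmt.Sprintf("asset-%d", i)
	}
	return ids
}

func main() {
	rec := &recordingAudit{}
	reportRun(rec, "tenant-a", makeIDs(250))
	reportRun(rec, "tenant-b", nil)
	ev := rec.events[0]
	fmt.Println(len(rec.events), ev.TransitionedToStale, len(ev.SampleAssetIDs))
	fmt.Println(reportRun(failingAudit{}, "tenant-c", makeIDs(3)))
}
```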

Intentionally deferred

  • Per-snooze audit event: the snooze endpoint still emits no audit entry because AssetHandler does not yet hold an AuditService reference. The batch worker audit covers the primary "what got demoted and when" question; a follow-up will thread AuditService into AssetHandler for per-snooze traceability.
  • Rate limiting on snooze endpoint: admin-only action, low blast radius, low priority. Revisit if we see abuse.
  • Reconstitute wiring of lifecycle columns: still omitted from the main asset SELECT pipeline. The UI can call the snooze endpoint but cannot yet display "currently snoozed until X" from a GET. This is the next follow-up, to be done when we surface lifecycle state in asset responses.

@0xmanhnv 0xmanhnv merged commit d2cbf02 into develop Apr 23, 2026
9 checks passed