
fix(helm): make hatchet token job a regular resource#14

Merged
charlie83Gs merged 1 commit into main from fix/hatchet-token-regular-job
Mar 23, 2026

@charlie83Gs
Contributor

Problem

The hatchet token job was a post-install Helm hook, creating a deadlock:

  1. With --wait, Helm waits for all pods to be ready before running post-install hooks
  2. Workers wait for a valid hatchet token (init container)
  3. Token job (post-install hook) never runs because workers aren't ready
  4. Timeout → install fails
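
The cycle in the list above can be checked mechanically. The following is a minimal sketch and not part of this PR: it models the wait-for relationships as a graph and uses Python's standard `graphlib` to show that no start order exists. The node names are illustrative labels, not resource names from the chart.

```python
from graphlib import CycleError, TopologicalSorter

# Each key waits for everything in its value set (labels are illustrative).
hook_based = {
    "post-install-hooks": {"workers-ready"},  # hooks run only after pods are ready
    "token-job": {"post-install-hooks"},      # the token job is one of those hooks
    "valid-token": {"token-job"},             # only the job produces the token
    "workers-ready": {"valid-token"},         # worker init containers wait for it
}


def start_order(waits_for):
    """Return a valid start order, or None if the dependencies form a cycle."""
    try:
        return list(TopologicalSorter(waits_for).static_order())
    except CycleError:
        return None


print(start_order(hook_based))  # prints None: the four steps wait on each other
```

Removing the edge from the token job to the post-install hooks is what breaks the cycle, which is the change this PR makes.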

Fix

Remove all helm.sh/hook annotations. The token job and its RBAC are now regular resources deployed alongside everything else.

Startup sequence:

Helm deploys all resources simultaneously
  → DBs start (CNPG)
  → Hatchet starts (waits for hatchet-db)
  → Token job starts (init container waits for Hatchet)
  → Token job generates JWT, patches secret, restarts workers
  → Workers' init containers detect valid JWT → start

No deadlock: each component waits in its init container until its dependency is ready, so the startup order resolves itself.
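
For illustration, the "detect valid JWT" step could look like the polling loop below. This is a sketch under assumptions, not the chart's actual init container (which is not shown in this PR): `looks_like_jwt` and `wait_until` are hypothetical helpers, and the check is structural only, with no signature verification.

```python
import base64
import json
import time


def looks_like_jwt(token: str) -> bool:
    """Structural check: three non-empty segments and a JSON-object header."""
    parts = token.strip().split(".")
    if len(parts) != 3 or not all(parts):
        return False
    try:
        # JWT segments are base64url without padding, so restore it first.
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return False
    return isinstance(header, dict)


def wait_until(read_token, timeout_s=300.0, interval_s=2.0, sleep=time.sleep):
    """Poll read_token() until it returns a JWT-shaped value or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if looks_like_jwt(read_token()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

In a real init container, `read_token` would read whatever the secret exposes (for example a mounted file or an environment variable); that wiring is left out here because the PR does not show it.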

🤖 Generated with Claude Code

Remove all helm.sh/hook annotations from the token job and RBAC.
The job is now a regular Kubernetes Job deployed alongside everything
else. Its init container waits for Hatchet, then generates the token
and patches the secret. Workers' init containers wait for the token.

This avoids the deadlock where post-install hooks can't run because
workers are waiting for the token that the hook would generate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charlie83Gs charlie83Gs merged commit d924791 into main Mar 23, 2026
3 checks passed
@charlie83Gs charlie83Gs deleted the fix/hatchet-token-regular-job branch March 23, 2026 19:06
charlie83Gs added a commit that referenced this pull request Mar 27, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 5, 2026
Critical:
- #1: Validate schema names with strict ^[a-z0-9_]+$ regex before DDL
- #2: Escape ILIKE special chars (%, _, \) in graph_nodes search
- #3: Replace cached Graph ORM instances with frozen GraphInfo dataclass
  to prevent DetachedInstanceError
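
Items #1 and #2 can be sketched as follows. This is an illustration under assumptions, not the code from the referenced commit: the function names are hypothetical, and only the regex and the list of escaped characters come from the commit message.

```python
import re

# Pattern quoted in the commit message. fullmatch() is used because "$" alone
# would also accept a trailing newline.
_SCHEMA_NAME_RE = re.compile(r"^[a-z0-9_]+$")


def validate_schema_name(name: str) -> str:
    """Return name unchanged if it is safe to interpolate into DDL, else raise."""
    if not _SCHEMA_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name


def escape_ilike(term: str, escape: str = "\\") -> str:
    """Escape the escape character, percent, and underscore so they match literally.

    The escape character is handled first so the ones added for percent and
    underscore are not doubled afterwards.
    """
    return (
        term.replace(escape, escape * 2)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
```

The escaped term would still be passed as a bound parameter, typically wrapped as `%term%`, with the query declaring the same escape character.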

High:
- #4: Reuse system session factories for default graph (no duplicate pools)
  via default_graph_session_factory/default_write_session_factory params
- #5: Add 23 unit tests — GraphInfo, GraphSessions, GraphSessionResolver,
  slug/schema validation, CreateGraphRequest, role validation
- #6: Scope sync watermarks by graph_slug — SyncEngine now passes
  graph_slug to _get_watermark/_set_watermark, composite PK on
  (table_name, graph_slug)
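
The effect of item #6 can be pictured with an in-memory stand-in for the watermark table. This is a hypothetical sketch, not the SyncEngine code: the `WatermarkStore` class is invented for illustration, and only the composite key (table_name, graph_slug) comes from the commit message.

```python
from datetime import datetime, timezone


class WatermarkStore:
    """In-memory stand-in for a table whose primary key is (table_name, graph_slug)."""

    def __init__(self):
        self._rows = {}

    def get(self, table_name, graph_slug):
        # Returns None when this graph has never synced this table.
        return self._rows.get((table_name, graph_slug))

    def set(self, table_name, graph_slug, value):
        self._rows[(table_name, graph_slug)] = value


store = WatermarkStore()
store.set("nodes", "default", datetime(2026, 4, 1, tzinfo=timezone.utc))
store.set("nodes", "research", datetime(2026, 4, 5, tzinfo=timezone.utc))
```

With table_name as the only key, the second `set` would overwrite the first and one graph's sync progress would leak into the other.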

Medium:
- #7: Replace N+1 member count queries with batch GROUP BY
- #8: Replace catch { // ignore } with console.error in frontend
- #9: Engine pool disposal on GraphSessionResolver.invalidate()
- #10: Run Alembic migrations during graph provisioning
- #11: (node_count in list deferred — requires cross-schema queries)
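
Item #7 replaces one count query per graph with a single grouped query. The sketch below shows the general pattern using the standard-library `sqlite3` module so it runs on its own; the table and column names (`graph_members`, `graph_id`) are hypothetical, and the commit's actual query is not shown in this thread.

```python
import sqlite3


def member_counts(conn, graph_ids):
    """Return {graph_id: member_count} for all requested graphs in one query."""
    graph_ids = list(graph_ids)
    if not graph_ids:
        return {}
    placeholders = ",".join("?" * len(graph_ids))
    rows = conn.execute(
        "SELECT graph_id, COUNT(*) FROM graph_members "
        f"WHERE graph_id IN ({placeholders}) GROUP BY graph_id",
        graph_ids,
    ).fetchall()
    counts = {gid: 0 for gid in graph_ids}  # graphs with no members stay at 0
    counts.update(dict(rows))
    return counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph_members (graph_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO graph_members VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)
```

Pre-filling the result with zeros matters because GROUP BY returns no row for a graph that has no members.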

Low:
- #13: Replace "Cycle Role" button with role dropdown
- #14: require_writer/require_graph_admin kept for future endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charlie83Gs added a commit that referenced this pull request Apr 5, 2026
- #2/#14: Fix MCP scope check — empty scopes = unrestricted access
  (graph_slugs=null tokens and SKIP_AUTH both work again). Only tokens
  with explicit graph:* scopes are restricted to those graphs.
- #3: Block reserved PG schema names (public, pg_catalog, pg_toast,
  pg_temp, information_schema, pg_*) in validate_schema_name()
- #4: Fix scalar_one() → scalar_one_or_none() in resolve_by_slug
  (introduced by session-release refactor in prior commit)
- #10: Sync raises RuntimeError instead of returning error dict when
  graph_resolver is unavailable
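
The scope rule in the first item can be sketched as below. This is an assumption-laden illustration, not the MCP code from the commit: the function names are hypothetical, and the scope format `graph:<slug>` is inferred from the "graph:*" wording in the commit message.

```python
def allowed_graphs(scopes):
    """Return the set of graph slugs a token is limited to, or None if unrestricted."""
    slugs = {s.split(":", 1)[1] for s in (scopes or []) if s.startswith("graph:")}
    # No explicit graph scopes (including empty or missing scopes) means no restriction.
    return slugs or None


def can_access(scopes, graph_slug):
    """True if a token with these scopes may use the given graph."""
    allowed = allowed_graphs(scopes)
    return allowed is None or graph_slug in allowed
```

Treating "no graph scopes" as unrestricted is what lets tokens with graph_slugs=null and SKIP_AUTH work, as the commit message describes.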

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>