fix(hasura): bump fetch_meta action timeout to 120s#54

Closed
acmeguy wants to merge 1 commit into main from fix/hasura-fetch-meta-timeout

@acmeguy acmeguy commented May 10, 2026

Symptom

Saving cubes on dbx.fraios.dev appears to time out from the user's POV. Logs show the actual save mutation succeeds (the `versions` row gets inserted) but the editor's immediate `fetch_meta` re-query hits Hasura's default 30s webhook timeout:

```
"http exception when calling webhook"
"Response timeout"
"path": "/rpc/fetch_meta"
"responseTimeout": "ResponseTimeoutMicro 30000000"
"query_execution_time": 30.001728027
```

Pattern in actions logs (single user session, branch `f553ff44…`):

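| time     | duration | result            |
|----------|----------|-------------------|
| 20:21:25 | 13934ms  | 200               |
| 20:25:31 | 21999ms  | 200               |
| 20:24:49 | 30000ms  | timeout (hasura)  |
| 20:24:50 | 30000ms  | timeout (hasura)  |
| 20:25:53 | 27855ms  | 200               |
| 20:26:26 | 146ms    | 200 (warm cache)  |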

Cold compile of a schema with many cubes consistently lands in the 14-30s window, racing the 30s timeout. Save invalidates the compiler cache (new version → new content_version), so the very next meta fetch is always cold.

Fix

Add `timeout: 120` to the `fetch_meta` action, in the same range as `fetch_tables` (180s) and `gen_schemas_docs` (300s). Other actions in this file already set explicit timeouts; `fetch_meta` had none, so it fell back to Hasura's 30s default.
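
For reference, a minimal sketch of what the change looks like in Hasura's actions metadata. Only the action name, the `/rpc/fetch_meta` path, and `timeout: 120` come from this PR; the `ACTIONS_URL` variable in the handler template is an assumption for illustration, and any other fields the real entry carries are omitted.

```yaml
actions:
  - name: fetch_meta
    definition:
      # Handler URL template: the env var name is an assumption.
      handler: '{{ACTIONS_URL}}/rpc/fetch_meta'
      # Seconds Hasura waits for the webhook; it falls back to 30 when omitted.
      timeout: 120
```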

Why this works

  • Saves themselves were never the problem — they're a Hasura mutation, no webhook involved.
  • The editor refetches `meta` immediately after save to refresh the field tree. With cold cache, that webhook call needs ~25-30s to complete a full Cube.js compile pass. 120s gives headroom even for unusually large schemas.
  • Cache warm-up is unchanged; once the compile finishes once, subsequent fetches stay sub-200ms until the next save.

Follow-up

Longer-term improvement (separate change): the `smartGenerate` route already calls `compilerCache.purgeStale()` after persisting a new version. It could additionally pre-warm the cache by triggering a meta build of the new version on a worker, so the next user-facing fetch is warm. Out of scope for this fix.
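
A rough sketch of what that pre-warm could look like, under stated assumptions: every name below except `purgeStale()` is hypothetical (`prewarmMeta`, `afterSave`, `MetaBuilder`), and the build runs as a background promise in the same process rather than on a separate worker. It only illustrates the idea of starting the cold compile right after the save and de-duplicating concurrent warm-ups per version.

```typescript
// Hypothetical stand-ins, not the project's real API.
type VersionId = string;

interface CompilerCache {
  purgeStale(): void;
}

// Whatever performs the cold Cube.js compile for a given version.
type MetaBuilder = (version: VersionId) => Promise<unknown>;

// In-flight warm-ups, keyed by version, so concurrent callers share one compile.
const inFlight = new Map<VersionId, Promise<unknown>>();

function prewarmMeta(version: VersionId, build: MetaBuilder): Promise<unknown> {
  const existing = inFlight.get(version);
  if (existing) return existing;

  // Remove the entry once the build settles, whether it succeeded or failed.
  const pending = build(version).then(
    (result) => {
      inFlight.delete(version);
      return result;
    },
    (err) => {
      inFlight.delete(version);
      throw err;
    },
  );
  inFlight.set(version, pending);
  return pending;
}

// Called once the new version row has been persisted.
function afterSave(version: VersionId, cache: CompilerCache, build: MetaBuilder): void {
  cache.purgeStale();
  // Fire and forget: the save response does not wait on the compile.
  prewarmMeta(version, build).catch((err) => {
    console.warn(`meta pre-warm failed for ${version}:`, err);
  });
}
```

If the editor's `fetch_meta` arrives while the warm-up is still running, it would still wait for the remainder of the compile, so this shortens the wait rather than replacing the 120s timeout.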

Test plan

  • After deploy: save a cube, watch network tab — `fetch_meta` either completes in <30s or runs to 60-90s without a Hasura-side timeout
  • Confirm via `kubectl logs synmetrix-actions-* -n synmetrix` that requests still come through
  • Verify the editor doesn't surface a "save timed out" toast on a real save

🤖 Generated with Claude Code

The fetch_meta action defaulted to Hasura's 30s webhook timeout, which
was racing Cube.js's compile pass on cold cache. Schemas with many
cubes routinely take 25-30s to compile from scratch, so the editor's
post-save meta refetch hit a cold cache and timed out, surfacing in
the UI as "save timed out".

Saves themselves succeed (the version row is created); it's the
follow-on fetch_meta that fails. Bumping to 120s — same order as
fetch_tables (180s) — gives the compile enough headroom on real
workloads. Longer-term we should pre-warm the compiler cache after
a save so fetch_meta hits a warm cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

acmeguy commented May 10, 2026

Superseded by #56, which lands the same fetch_meta timeout bump alongside an explicit-timeout audit of every other action in actions.yaml. Closing without merge to avoid the duplicate edit.

@acmeguy acmeguy closed this May 10, 2026
acmeguy added a commit that referenced this pull request May 10, 2026
Every action in metadata/actions.yaml now has an explicit timeout. Bands
documented in a comment at the top of the file:

  10s   — local-only sync mutations
  30s   — DB-bound mutations / lookups against well-warmed tables
  60s   — read paths that may touch ClickHouse system tables or generate
          SQL across the whole cube graph
  120s  — Cube.js compile-bound queries (cold cache hits 14-30s p99)
  180s  — first-time table introspection / multi-step gen flows
  300s  — long-running profiling, LLM-driven generation, big query runs

Notable bumps, mostly from the previously implicit 30s default:
  fetch_meta             30 → 120 (supersedes #54 — Cube.js compile)
  run_query              30 → 300 (analytical queries on cold caches)
  profile_table         180 → 300 (LC probe alone takes minutes on big tables)
  fetch_dataset, gen_sql, pre_aggregation_preview, pre_aggregations,
  copy_datasource, export_data_models    30 → 60

Left at the existing value where the action is actually trivial:
  list_all_teams, manage_query_rewrite_rule, update_member_properties,
  update_team_properties, invite_team_member, check_connection,
  send_test_alert, create_team   stays at 30s but now explicit
  create_events stays at 10s

Closes #54 (fetch_meta timeout) — that PR can be closed in favor of
this one which lands the same fix plus the rest of the audit.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
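
A minimal sketch of how the "every action has an explicit timeout" rule could be checked mechanically, assuming `metadata/actions.yaml` has already been parsed into a plain object (the YAML parsing step is not shown). The types, the function name, and the `hypothetical_action` entry are illustrative and not taken from the repo.

```typescript
// Only the fields this check needs; the real entries carry more.
interface ActionEntry {
  name: string;
  definition: { timeout?: number; [key: string]: unknown };
}

interface ActionsMetadata {
  actions: ActionEntry[];
}

// Returns the names of actions that would fall back to Hasura's implicit default.
function actionsWithoutTimeout(meta: ActionsMetadata): string[] {
  return meta.actions
    .filter((action) => typeof action.definition.timeout !== "number")
    .map((action) => action.name);
}

// Illustrative input: two actions named in this thread plus one made-up offender.
const sample: ActionsMetadata = {
  actions: [
    { name: "fetch_meta", definition: { timeout: 120 } },
    { name: "fetch_tables", definition: { timeout: 180 } },
    { name: "hypothetical_action", definition: {} },
  ],
};

const missing = actionsWithoutTimeout(sample);
console.log(missing);
```

Run in CI, a non-empty result could fail the build so that new actions cannot silently inherit the 30s default.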