Re-arm Elixir test gate: root-cause mix test --no-start full-app-boot failure (burble#35 item 2) #39

@hyperpolymath

Sub-issue of #35 (item 2). Tracks the blocked work: root-cause and fix the Test (OTP 27 / Elixir 1.17) failure so the gate can be re-armed. Do not remove continue-on-error until the suite is green on main.

Root-cause lead (evidence gathered 2026-05-17)

The failure is almost certainly a full-app-boot failure under mix test --no-start, not a set of individual assertion failures:

  1. Justfile:245 runs cd server && mix test --no-start.
  2. server/test/test_helper.exs calls Application.ensure_all_started(:burble) (manually boots the whole app because --no-start suppressed the automatic start).
  3. server/lib/burble/application.ex start/2 has no test-env guard — it unconditionally boots the full production supervision tree: Burble.Store (VeriSimDB @ http://localhost:8081), Burble.Transport.RTSP (TCP 8554), Burble.Bolt.Listener (UDP 7373), Burble.LLM.Supervisor (QUIC 8503 via :quicer/msquic NIF), Burble.Media.Engine, Zig coprocessor NIF (server/priv/libburble_coprocessor.so).
  4. config/test.exs only sets Endpoint server: false; it does not disable Burble.Store or the network/NIF children.
  5. In CI none of these externals exist: VeriSimDB is not running on :8081, the Zig FFI .so may be absent (the "Build Zig FFI" step is itself continue-on-error, so a failed build silently yields no .so), and the quicer NIF/msquic may be unavailable.

⇒ The first hard-crashing child collapses Burble.Supervisor; ensure_all_started(:burble) returns {:error, …}; every test fails at startup — exactly the predicted mode.

(Local repro on this machine is non-authoritative: toolchain is Elixir 1.18.4/OTP 25 vs CI 1.17/OTP 27, and a separate Guardian.Plug.Pipeline compile error blocks local boot. Confirming which child crashes first still needs the blocking prerequisite below.)
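Once CI logs are reachable, a small helper in test_helper.exs could name the first crashing child directly instead of leaving a wall of identical startup failures. The sketch below is hypothetical: BootReport is not a module in the repo, and the matched error tuple is the shape OTP usually reports when a supervisor child fails inside start/2, which is an assumption until checked against the real CI output. Any other shape is printed untouched.

```elixir
defmodule BootReport do
  @moduledoc "Hypothetical helper: summarises the result of Application.ensure_all_started/1."

  # Successful boot: list the applications that were started.
  def describe({:ok, started}), do: "started: #{inspect(started)}"

  # Assumed shape when a supervisor child fails during the app's start/2 callback.
  def describe({:error, {app, {{:shutdown, {:failed_to_start_child, child, reason}}, _mfa}}}) do
    "#{inspect(app)}: child #{inspect(child)} failed to start: #{inspect(reason)}"
  end

  # Any other error shape: print it as-is rather than guess at its meaning.
  def describe({:error, other}), do: "boot failed: #{inspect(other)}"
end

# Assumed usage near the top of server/test/test_helper.exs:
result = Application.ensure_all_started(:burble)
IO.puts(:stderr, BootReport.describe(result))
```

With this in place, the CI log should carry one line naming the child (for example Burble.Store) and its reason, which is the evidence the remediation choice depends on.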

Blocking prerequisite (unchanged from #35)

ONE of: widen the cloud-sandbox allowlist to include repo.hex.pm + builds.hex.pm; or provide an actions:read token / Actions-logs MCP tool; or paste the failing ** ( / N) test … failed block.

Remediation directions (decide once root cause confirmed)

  • A. Test-env start path: filter Burble.Application children via Application.compile_env(:burble, :start_children) so test boots only the minimal set (PubSub/registries/store-stub), excluding network listeners + external-dep children.
  • B. Drop --no-start; disable heavy children via config/test.exs flags the supervisor honours.
  • C. Build the Zig coprocessor .so before tests and re-arm the FFI gate (the FFI continue-on-error is part of the failure surface).
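Direction A could look roughly like the sketch below. It is an illustration under assumptions, not the current application.ex: ChildFilterSketch, the :all default, and the {Phoenix.PubSub, name: Burble.PubSub} child are hypothetical, while the other Burble.* names are the children listed in the root-cause lead. The idea is that start/2 builds the full child list as it does today and passes it through an allowlist read at compile time.

```elixir
defmodule ChildFilterSketch do
  @moduledoc "Sketch of direction A: filter supervisor children by a compile-time allowlist."

  # :all (hypothetical default) keeps the current behaviour; a test config
  # could set a short list of modules instead.
  @start_children Application.compile_env(:burble, :start_children, :all)

  def configured, do: @start_children

  # No allowlist configured: keep every child.
  def select(children, :all), do: children

  # Allowlist configured: keep only children whose module (or id) is on it.
  def select(children, allowed) when is_list(allowed) do
    Enum.filter(children, fn child -> child_module(child) in allowed end)
  end

  defp child_module({module, _arg}), do: module
  defp child_module(%{id: id}), do: id
  defp child_module(module) when is_atom(module), do: module
end

# Child list modelled on the issue; the PubSub entry is an assumption.
full_tree = [
  {Phoenix.PubSub, name: Burble.PubSub},
  Burble.Store,
  Burble.Transport.RTSP,
  Burble.Bolt.Listener,
  Burble.LLM.Supervisor,
  Burble.Media.Engine
]

# What a test boot would keep with an allowlist of [Phoenix.PubSub].
test_tree = ChildFilterSketch.select(full_tree, [Phoenix.PubSub])
IO.inspect(test_tree, label: "children started in test")
```

In the real module, start/2 would call select(children, @start_children) before Supervisor.start_link/2, and config/test.exs would set config :burble, start_children: [...] to the minimal set. Direction B would use the same kind of flag, just read by a supervisor that no longer runs under --no-start.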

Acceptance criteria (from #35)

  • just test-server passes on main for OTP 27 / Elixir 1.17
  • continue-on-error: true removed from "Run server tests"; a planted failing test turns CI red
  • "GATE DEACTIVATED" comment removed with it
  • Consider gating dialyzer (it needs: test) and re-arming the "Build Zig FFI" gate

Refs #35

Labels: enhancement
