Refactor/signals rewrite 50 0051 #65

Merged
atulmgupta merged 92 commits into main from refactor/signals-rewrite-50-0051 on May 17, 2026

Conversation

@atulmgupta
Contributor

Description

Closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would break existing functionality)
  • Documentation update
  • Infrastructure / CI change

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective or my feature works
  • New and existing tests pass locally
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Screenshots (if applicable)

atulmgupta and others added 30 commits May 13, 2026 23:20
…-slice plan

Adds the Phase-50 AI adoption planning artifacts on feat/ai-adoption:

- ADR-015 (AI-Off Contract): codifies the binding constraint that AI is
  strictly additive. ai_mode defaults to off, every feature has a non-AI
  baseline that ships and stays maintained, off mode performs zero
  outbound provider calls and writes no ai_call_log rows, AI surfaces
  are absent (not greyed out), backend AI routes return 404 in off mode,
  per-feature opt-in inside non-off modes, AI-authored data survives a
  downgrade, provider keys never leak in off mode, the contract is
  enforced by the type system (HOC + middleware + ESLint + Go vet), and
  the final gate proves all 12 invariants end-to-end.

- 0000 methodology: vertical slice plan, P1-P10 design patterns
  (hexagonal port-adapter, tool-use over typed DTOs, SSE streaming,
  strategy + decorator chain, compile-time gates, single retrieval API,
  data-driven eval, single feature registry, baseline coexistence via
  interface), locked decisions D1-D15, provisional defaults PD1-PD8,
  rubber-duck-confirmed risks R1-R10, slice ordering rationale, and
  mandatory per-slice metadata contribution rules.

- 64 slice prompts (0001-0064) plus 9999 final gate, organised into
  16 tiers:
    F0-F9   foundation (ai-off contract, provider abstraction, settings
            UI, ai_call_log, tool-use framework, SSE streaming, eval
            harness, embeddings + pgvector, redaction, rate limit /
            cost cap)
    U1-U4   upgrade existing surfaces (chatbot, weekly digest, YIR,
            anomaly explanations)
    N1-N6   new conversational + builders (NL alert builder, NL
            automation builder, NL search, drive coaching, charging
            diagnosis, RAG help)
    D1-D5   driving (NL drive search/replay, speed-profile insights,
            route-efficiency, auto trip naming, trip planner LLM agent)
    C1-C5   charging (smart-charge schedule, battery health forecast,
            charging-curve fingerprint, cost forecast, vampire-drain)
    T1-T3   climate / tires
    A1-A3   alerts continued
    G1-G3   geofences / locations
    X1-X2   analytics narration
    S1-S7   diagnostics / system
    M1-M3   maintenance
    P1-P3   privacy / safety
    V1-V2   voice / watch
    PU1-PU3 power-user (NL SQL, NL Grafana, NL dashboard composer)
    GEN1-GEN2 generative (share-card image, paint preview)
    ML1-ML3 ML non-LLM (learned anomaly baselines, range prediction,
            charging-curve clustering)
    9999    final gate with ADR-015 invariant suite

  Every feature slice (0011-0064) follows the methodology per-slice
  template: artifact metadata, honesty covenant, logging requirements,
  problem statement, evidence, design, baseline coexistence (P10),
  redaction policy (F8), off-mode contract impact, registry metadata
  contribution (Backend / Frontend / UITestIDs / JobNames / PushKinds),
  action steps, allowed files, verification, gate criteria, commit
  format, blocked-path procedure, deliverable with ADR-015 footer, and
  forward dependency.

- .gitignore: whitelist Phase-50 planning artifacts under
  .github/prompts/db-refactor/phase-50-ai-adoption/** and ADR-015 so
  these branch inputs are tracked while keeping other prompt artifacts
  local-only.

No production code changes in this commit. The slice prompts are the
input contracts for the actual implementation work that will follow on
this branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Foundation slices 0001-0010, the 0000 methodology, and 9999 final gate
now share the same standard envelope as the feature slices 0011-0064:

- Front-matter description block
- Artifact Metadata table (log path, depends-on, allowed files)
- Honesty Covenant (10 rules)
- Logging Requirements (8 mandatory log sections)
- Problem statement scoped to ADR-015 preservation
- Action Steps preflight checklist
- Gate criteria with EXIT/STATUS markers
- Commit format including Copilot Co-authored-by trailer
- Blocked Path procedure

The original deeply-detailed Why / Evidence / Design / Tasks /
Verification / Forward-dependency content is preserved verbatim below
the standard header in each file. No semantic content was removed; the
diff is line-for-line equal in count (3611 insertions, 3611 deletions)
because every previously-existing line moved or was wrapped in the
new envelope.

This makes the slice prompts mechanically uniform so the per-slice
checklist (predecessor logs, gate transcripts, ADR-015 footer) is
enforceable across all 65 prompts without per-tier exceptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Lint rule

Phase-50 / 0001 — BLOCKING foundation slice. Implements ADR-015 ("AI is
strictly additive, default-off") via end-to-end type-system enforcement
that no later AI feature slice can bypass.

What lands:

  - Migration 000201 extends settings (typed K/V per ADR-011) with a
    value_jsonb column and seeds four AI keys at default-off:
      ai_mode='off', ai_features='{}', ai_provider_config='{}',
      ai_cost_cap_cents=0
  - internal/ai/features/registry.go is the single source of truth for
    every AI surface (Routes, UI test IDs, capabilities). Seeded with
    chatbot-llm. CoverageOK rejects entries with no surface metadata
    or DefaultOn=true.
  - internal/ai/guard wraps every AI handler. Returns 404 (not 403/503,
    per ADR-015 §I6 — the route is functionally non-existent in off
    mode) on any of: settings-read error, ai_mode='off', or per-feature
    flag false. Panics at boot on unknown feature IDs so misspellings
    fail fast.
  - tools/aivet statically vets internal/api/*.go: every /api/v1/ai/*
    route must be a guard.Wrap call AND every Routes.Backend in the
    registry must appear in the router AND CoverageOK must pass.
  - tools/aigen generates web/src/ai/features.ts from the Go registry
    so backend and frontend cannot drift; --check mode fails CI on
    drift. Wired into Makefile as make generate / generate-check.
  - web/src/hooks/useAiEnabled.ts is the SPA-side gate, fail-closed
    on every error path.
  - web/src/components/ai/withAiFeature.tsx HOC renders null in off
    mode and tags rendered output with data-ai-feature for the
    invariant suite to assert against.
  - web/eslint-rules/ai-component-must-be-wrapped.js custom ESLint rule
    rejects raw default exports of AI-prefixed components or any
    component under web/src/features/<x>/ai/**.tsx that is not the
    return value of withAiFeature(...). Registered in eslint.config.js.
  - tests/ai-off-mode.spec.ts: Playwright skeleton, gated behind
    RUN_PLAYWRIGHT=1 for the 9999 final-gate.
  - settings_handler.go redacts ai_provider_config from GET responses
    when ai_mode='off' (ADR-015 §I9) and preserves it across off-mode
    SPA round-trips (incoming nil = use stored value).
  - One stub route mounted: POST /api/v1/ai/chatbot returns 501 when
    reached, so the off-mode 404 assertion is provably the guard's
    work and not chi's default no-match. Slice U1 (0011) replaces it.
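
The guard behaviour described above can be sketched as follows. This is a minimal illustration, not the code in internal/ai/guard: the Settings interface shape, the fakeSettings type, the known map standing in for the registry, and the statusFor helper are all assumptions made for the example. Only the 404-on-off, 404-on-error, panic-on-unknown-ID, and 501 stub behaviour come from the commit message.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// Settings is a hypothetical read-only view of the AI settings.
type Settings interface {
	AIMode() (string, error)
	FeatureEnabled(featureID string) (bool, error)
}

// known stands in for the feature registry (assumption for this sketch).
var known = map[string]bool{"chatbot-llm": true}

// Wrap answers 404 unless AI is on and the feature is enabled.
// It panics at construction time on an unknown feature ID.
func Wrap(s Settings, featureID string, next http.Handler) http.Handler {
	if !known[featureID] {
		panic("guard: unknown AI feature ID " + featureID)
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mode, err := s.AIMode()
		if err != nil || mode == "off" {
			http.NotFound(w, r) // fail closed: the route looks non-existent
			return
		}
		on, err := s.FeatureEnabled(featureID)
		if err != nil || !on {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// fakeSettings is a test double for Settings.
type fakeSettings struct {
	mode    string
	enabled bool
	err     error
}

func (f fakeSettings) AIMode() (string, error)             { return f.mode, f.err }
func (f fakeSettings) FeatureEnabled(string) (bool, error) { return f.enabled, f.err }

// statusFor sends one POST through a guarded 501 stub and returns the status.
func statusFor(s Settings, featureID string) int {
	stub := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotImplemented)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/api/v1/ai/chatbot", nil)
	Wrap(s, featureID, stub).ServeHTTP(rec, req)
	return rec.Code
}
```

Because the stub answers 501 when reached, a 404 in off mode can only have come from the guard, which is the property the commit message relies on.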

Adapted decisions vs. the prompt as written:

  - Migration number 000196 in the prompt is taken (alert_rules_escalation);
    used 000201 (next available after 000200).
  - settings is a typed K/V store (ADR-011), not the wide-column shape
    the prompt's ALTER TABLE assumed. Schema extends K/V with value_jsonb
    + extends data_kind CHECK; INSERT 4 AI keys with defaults. Honors
    ADR-011 facade; the Settings struct shape and DTO are unchanged.
  - TeslaSync is single-tenant; guard.Settings interface drops the
    userID parameter the prompt assumed.

Verification (full transcript in slice log):

  go vet ./...                                           EXIT=0
  go test -race ./internal/ai/...   (9 tests pass)       EXIT=0
  go test -race ./internal/database/...                  EXIT=0
  go run ./tools/aivet                                   EXIT=0
  go run ./tools/aigen --check                           EXIT=0
  cd web && npx tsc --noEmit                             EXIT=0
  cd web && npx vitest run useAiEnabled withAiFeature
                            offMode.invariant eslintRule  21 PASS  EXIT=0
  cd web && npx eslint (AI scope, --max-warnings 0)      EXIT=0

The 15 ESLint errors that remain on `npx eslint .` are pre-existing
baseline on feat/ai-adoption (verified by stashing this slice and
re-running). All are in files this slice does not touch.

ADR-015 invariants:

  I1 default-off:    PASS  (migration default + Settings defaults)
  I5 hidden UI:      PASS  (offMode.invariant suite walks AI_FEATURE_IDS)
  I6 404 routes:     PASS  (TestGuard_OffModeReturns404)
  I7 type system:    PASS  (aivet + ESLint rule + aigen --check)
  I9 no leak:        PASS  (settings_handler.Get redacts in off mode)

Slice log: .github/prompts/db-refactor/logs/phase-50-0001-F0-ai-off-contract.log

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…corator chain, health endpoint

Phase-50 / 0002 - establishes the hexagonal Provider port plus
Ollama / OpenAI / Anthropic / mock adapters, the RFC1918+DNS-rebinding
local-mode validator (R3), the decorator chain seeded with WithTrace,
the Registry that resolves provider from settings, and the sudo+guard
gated /api/v1/ai/_internal/health diagnostic route.

ADR-015 invariants verified: I1, I3, I4, I5, I6, I7, I9, I10, I11, I12.
aivet PASS - 2 AI route(s), 2 feature(s) in registry, TS mirror in sync.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…chive policy

Phase-50 / 0003 - delivers the only opt-in surface for AI per ADR-015
sect.I7 (per-feature opt-in, no silent restore) and sect.I9 (key never
displayed in off mode). The Settings -> AI panel mounts a 3-mode
picker (off/local/cloud, default off), generates per-feature toggles
from the canonical AI registry (never hand-listed), and exposes a
"Restore previous selection?" panel with explicit Confirm/Decline
when the server has an archived selection from a prior mode->off
transition.

Backend:
  - migrations/000202 adds the ai_features_archived JSONB row.
  - models.Settings.AIFeaturesArchived round-trips through the typed
    settings repo.
  - settings_handler.Get redacts AIFeaturesArchived in off mode (same
    rationale as AIProviderConfig).
  - settings_handler.Update preserves both fields across SPA
    round-trips and calls applyAIArchiveOnModeFlip on every PUT - a
    pure helper that nil-safely clears AIFeatures and snapshots the
    prior selection on local/cloud->off transitions.
  - ai_settings_validate_handler mounts POST
    /api/v1/settings/ai/validate-config (settings sub-resource, not
    /api/v1/ai/* - reachable in OFF mode by design so users can opt
    in). Local mode runs ValidateLocalCtx with a 5s timeout; cloud is
    a no-op OK; off/unknown/malformed return 400; rejections return
    422 with structured {error,code} via writeErrorCode.
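
A rough sketch of the archive-on-mode-flip helper described above, assuming a trimmed-down Settings struct. The field names AIFeatures and AIFeaturesArchived come from the commit message; the struct layout, the map type, and the function signature are guesses for illustration only.

```go
package main

// Settings is a trimmed, hypothetical stand-in for models.Settings.
type Settings struct {
	AIMode             string
	AIFeatures         map[string]bool
	AIFeaturesArchived map[string]bool
}

// applyArchiveOnModeFlip snapshots the prior feature selection and clears the
// live one when the mode goes from local/cloud to off. Every other transition
// (and any nil argument) is a no-op.
func applyArchiveOnModeFlip(prev, next *Settings) {
	if prev == nil || next == nil {
		return
	}
	wasOn := prev.AIMode == "local" || prev.AIMode == "cloud"
	if !wasOn || next.AIMode != "off" {
		return
	}
	archived := make(map[string]bool, len(prev.AIFeatures))
	for id, on := range prev.AIFeatures { // ranging over a nil map is safe
		archived[id] = on
	}
	next.AIFeaturesArchived = archived
	next.AIFeatures = map[string]bool{}
}
```

Keeping the helper pure (no I/O, no settings reads) is what lets the handler call it on every PUT without extra cost.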

Frontend:
  - useSaveAiSettings: partial-merge wrapper around PUT /settings.
  - useValidateAiProvider: POSTs to the validate endpoint and shapes
    422 responses into a discriminated failure variant for inline
    feedback.
  - AISettings + 4 sub-components (AIProviderSection,
    AIFeatureToggleList, AIRestorePanel, AIUsageCard).
  - SettingsPage mounts <section id="ai"> between appearance and
    advanced.
  - i18n: top-level ai.settings.* namespace + toast keys.

Tests:
  - 16 Go tests (9 validate handler + 7 archive helper) - all pass.
  - 11 React component tests covering default-off rendering, sect.I9
    key redaction, registry-driven toggle generation, mode-flip
    clearing, archive restore panel visibility, validate happy + 422
    paths.

ADR-015 verification:
  - I1, I3, I4, I6, I7, I9, I10 PASS with evidence in slice log.
  - aivet PASS (2 AI routes, no new /api/v1/ai/* mounts).
  - aigen --check PASS (no registry changes; auto-generation in sync).
  - tsc --noEmit PASS.
  - Slice contribution to web vitest: +11 passing, 0 new failures
    (the pre-existing 77 failures are unrelated charts/signals/page
    container tests, verified by stash+rerun baseline diff).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase-50 / 0004 — adds the per-call AI audit log (TimescaleDB hypertable),
cost calculator, async Audit provider decorator (drop-oldest with metric),
three /ai/usage/* read endpoints, and a shared <UsageCard> primitive
that both TeslaApiUsageCard (refactored) and the new AiUsageCard consume.

Adaptations from prompt (documented in slice log):
- Migration slot 000203 (000198 was taken)
- user_subject TEXT instead of user_id BIGINT (no users table — single-tenant)
- Decorator wired in router.go (the prompt's app/new.go has no provider plumbing)
- AiUsageCard uses an inline ai_mode != off gate instead of withAiFeature
  (because __usage__ is a server-side meta-feature with no per-feature toggle)

Gates: aigen --check, aivet, go build, go test ./internal/ai/...,
./internal/database/... -run AICallLog, ./internal/api/... -run AIUsage,
tsc --noEmit, vitest (27/27 F3 tests pass).

Refs: ADR-015 (AI-off contract). All slice gates green; see
.github/prompts/db-refactor/logs/phase-50-0004-F3-ai-call-log.log

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…her, continuations

Phase-50 / Prompt 0005 — F4 ships the canonical AI tool-use surface:

- internal/ai/tools: Tool interface, Registry, JSON-Schema generator that reflects from validate:"..." struct tags (R2 mitigation: schema and runtime validator share one source of truth, pinned by TestEverySchemaMatchesHandlerValidation), 12 read-only starter tools wrapping existing repos.

- internal/ai/strategy: Strategy interface (interface-only) with placeholder RedactionPolicy/EvalGolden marker types that F8/F6 will widen.

- internal/ai/dispatch: Dispatcher chat loop with tool validation, mutating-tool confirm gate via ConfirmFn, max-iteration cutoff, ContinuationState round-trip, StreamWriter + CaptureWriter for tests.

- internal/database: ai_chat_continuations_repo with Save/Load/Delete/CleanupExpired, 24h DefaultContinuationTTL pinned by test, subject-scoped Load returns ErrContinuationNotFound for wrong subjects.

- migrations/000204: ai_chat_continuations table with JSONB state, expires_at index, partial user index, CHECK(expires_at>created_at). Slot 000204 (not the prompt's 000199; F0..F3 used 000201..000203 — slot variance documented in log).

- web/src/components/ai/ConfirmDialog: AiConfirmDialog Modal+Button (distinct from generic ui/ConfirmDialog) renders tool name + JSON args verbatim so user sees exactly what is about to happen — 8 vitest cases.

- docs/architecture/ai-tool-use.md: architecture overview, 5 design rules, 12-tool table, SSE protocol contract.

Mutating tools NOT shipped here per the prompt; they ship with the features that use them (N1, N2, ...). All 12 builtins are read-only and pinned by TestBuiltinsHaveNoMutators.

ADR-015 invariants preserved: zero new feature toggles (3 features pre/post), zero new HTTP routes (5 routes pre/post per aivet), zero outbound egress, zero non-AI files modified. Audit decorator chain unchanged.

Gates green: build=0, race tests=0 (tools/dispatch/strategy), continuations live DB=0 (7/7), tsc --noEmit=0, vitest=0 (8/8), aigen --check=0, aivet=0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase-50 / Prompt 0006. Ships the canonical SSE streaming primitive
(Pattern P3) for all conversational AI features.

Backend (internal/ai/stream/):
  - Writer implements dispatch.StreamWriter with bounded chan(64) +
    consumer goroutine. Send blocks the producer (R4: drops
    forbidden); on stall (default 5s, tunable) cancels upstream
    context and emits a terminal stream_stalled error event.
  - 5 Prom metrics (open/chunk/stall/cancel/duration), all labeled
    by feature_id. No drop counter by design.
  - 15 -race tests including stall determinism via a pinned
    httptest.ResponseRecorder.
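
The blocking-send-with-stall idea can be sketched as below. This is a simplified stand-in for internal/ai/stream, under the assumption that a stall is surfaced to the producer as an error; the real Writer emits a terminal stream_stalled event and records metrics, neither of which is modelled here. All names in the block are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// ErrStreamStalled is returned when the consumer stops draining the buffer.
var ErrStreamStalled = errors.New("stream_stalled")

// Writer forwards events through a bounded channel and never drops them.
type Writer struct {
	events         chan string
	stall          time.Duration
	cancelUpstream context.CancelFunc
}

func NewWriter(buffer int, stall time.Duration, cancel context.CancelFunc) *Writer {
	return &Writer{events: make(chan string, buffer), stall: stall, cancelUpstream: cancel}
}

// Events exposes the receive side for the consumer goroutine.
func (w *Writer) Events() <-chan string { return w.events }

// Send blocks until the buffer accepts the event. If it stays blocked for the
// stall window, it cancels the upstream context and reports the stall.
func (w *Writer) Send(event string) error {
	timer := time.NewTimer(w.stall)
	defer timer.Stop()
	select {
	case w.events <- event:
		return nil
	case <-timer.C:
		w.cancelUpstream()
		return ErrStreamStalled
	}
}

// demoStall fills a one-slot buffer with no consumer attached and reports
// whether the second Send stalled and whether upstream was cancelled.
func demoStall() (stalled, cancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	w := NewWriter(1, 20*time.Millisecond, cancel)
	_ = w.Send("delta") // fits in the buffer
	err := w.Send("delta")
	return errors.Is(err, ErrStreamStalled), ctx.Err() != nil
}
```

Blocking the producer instead of dropping matches the "no drop counter by design" note: backpressure either clears or ends the stream explicitly.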

Frontend (web/src/hooks/useAiStream.ts):
  - fetch + ReadableStream + TextDecoder consumer with 5-state
    machine (idle/streaming/paused-confirm/done/error).
  - paused-confirm survives stream close so the SPA dialog can wait
    for the user decision before opening a fresh continuation
    stream.
  - 19 vitest cases covering parse, accumulation, confirm pause,
    cancel propagation, 404/network/error surfaces, unmount cleanup.

Contract test (tools/aistream-contract/):
  - Text-level scan asserts every event-type literal and every JSON
    field name appears on BOTH sides. Catches schema drift between
    Go writer and TS hook before merge.

ADR-015: I1/I3/I4/I6/I12 invariants verified. Zero new feature
toggles, zero new HTTP routes — primitive is unreachable until a
future U-slice mounts a route under guard.Wrap. Stall observability
(I12) introduced by this slice.

Predecessor: F4 (0005) DONE.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ge + CI gate

Phase-50 / Prompt 0007 — adds the deterministic, offline LLM eval harness:

- internal/ai/provider/mock/canned.go: SequencedMock wrapper around mock.Mock + canned-file YAML loader. Mock.go itself is unchanged.

- internal/ai/eval/: GoldenSet/Validate, GenericStrategy adapter, stub tool registry, runner (RunSet/RunGolden, applyExpectations), judge invoker (seed=42, temperature=0), text + JUnit reporters.

- cmd/ai-eval: CLI with --feature/--all/--judge/--judge-model/--output/--record.

- tools/eval-schema-check: walks goldens.yaml files, validates schema.

- internal/ai/strategies/chatbot-llm/{goldens.yaml,canned/*.yaml}: 5 starter cases (range_question, tool_call_battery, tool_call_then_answer, refusal, ambiguous).

- .github/workflows/ai-eval.yml: fast on PR (advisory), full on push to main (blocking + JUnit), judged nightly (gated on JUDGE_PROVIDER+JUDGE_API_KEY).

- Makefile: 3 targets (ai-eval-fast, ai-eval-full, ai-eval-judged).

ADR-015 invariants touched: I3 (baseline intact), I4 (zero default egress), I10 (per-feature isolation). go.mod / go.sum updates are mechanical: yaml.v3 promoted to direct dep + its test-graph entries written by `go mod tidy`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…riever, PgvectorRetriever, TTL cron

Phase-50 / Prompt 0008 — single canonical retrieval surface (P7) for
AI consumers (N3, N6, D2/D5/C4 in subsequent slices).

Migrations:
  000205_enable_vector  — CREATE EXTENSION vector + version assert
  000206_embeddings     — embeddings_768 + embeddings_1536 with
                          HNSW (cosine), dedupe unique, expiry btree

Library (internal/ai/rag):
  Retriever interface + NoopRetriever (off-mode, ADR-015 I4 type
  proof) + PgvectorRetriever (audit-decorated via ProviderResolver,
  hash-deduped Index, transactional UPSERT/DELETE-stale, MaxK=100).
  Helpers: ChunkText (rune-safe word-boundary), encode/validateVector
  (reject NaN/Inf, dim assert), TTLPolicy (per source_type, year-9999
  sentinel for docs).
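
As an illustration of the vector checks mentioned above (dimension assert plus NaN/Inf rejection), a minimal sketch might look like this. The function name follows the commit message, but its signature and error wording are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// nan32 is a float32 NaN, handy for exercising the rejection path.
var nan32 = float32(math.NaN())

// validateVector checks that v has exactly dim components and that every
// component is finite. The signature is an assumption for this sketch.
func validateVector(v []float32, dim int) error {
	if len(v) != dim {
		return fmt.Errorf("embedding has %d dims, want %d", len(v), dim)
	}
	for i, x := range v {
		f := float64(x)
		if math.IsNaN(f) || math.IsInf(f, 0) {
			return fmt.Errorf("embedding component %d is not finite", i)
		}
	}
	return nil
}
```

Rejecting non-finite values before the write keeps them out of the cosine index, where they would otherwise poison distance comparisons.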

Background job (internal/jobs):
  RunEmbeddingsTTL — re-reads AIMode per tick (I12), DELETEs expired
  rows from both tables. Scheduled by app.New every hour.

Constructor wiring (internal/app/new.go):
  initAIBackgroundJobs runs the TTL cron unconditionally; the
  per-tick AIMode re-check is what enforces off-mode silence
  (handles runtime flips without server restart).

ADR-015 invariants preserved: I1 (mode-off Noop), I3 (audit chain via
ProviderResolver), I4 (zero embed/SQL/network in off-mode — proven by
spy test in factory_test.go), I7 (single P7 entry — every Embed flows
through resolver), I8 (factory fail-closed on settings error), I12
(cron re-checks AIMode per tick).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ypass report

Phase-50 / Prompt 0009 — F8 Redaction Layer (P5 decorator chain).

Adds:

- internal/ai/redact: 11 PIIClass detectors (VIN ISO 3779, email, phone E.164+intl, lat/long, address scanner, IPv4+IPv6 with RFC1918 exclusion, plate opt-in, CC Luhn, SSN, vehname, userid)

- Apply/Manifest/Mode (RedactedTags default, round-trippable via Restore)

- Process-local meta sink with 60s TTL sweep, deny-all DefaultPolicy

- WithRedaction provider decorator (innermost in chain; deep-copies req)

- Strategy hook + redactadapter bridge (breaks provider→redact→strategy cycle)

- Dispatcher installs per-request policy in ctx (default deny-all)

- Migration 000207 extends ai_call_log with redacted_classes[] + redaction_bypass

- Repo Insert consumes meta + RedactionBypassByFeature query

- /api/v1/ai/admin/redaction-bypass endpoint (gates on ai_mode != 'off')

- __redaction_bypass__ meta-feature (mirrors __usage__ pattern)

Slot variance: prompt says 000202 (taken by ai_features_archive); used 000207 (next free post-F7).
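
For the credit-card detector, the Luhn check named in the list above is a standard checksum; a compact sketch follows. The function name, the tolerated separators, and the 13 to 19 digit length window are assumptions for illustration, not details taken from internal/ai/redact.

```go
package main

// luhnValid reports whether s is a plausible card number: 13 to 19 digits
// (spaces and hyphens ignored) whose Luhn checksum is divisible by 10.
func luhnValid(s string) bool {
	sum, digits := 0, 0
	double := false // the rightmost digit is not doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c == ' ' || c == '-' {
			continue
		}
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
		digits++
	}
	return digits >= 13 && digits <= 19 && sum%10 == 0
}
```

Running the checksum after a digit-pattern match cuts false positives: most random 16-digit strings fail it.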

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds:

- internal/ai/limit: token-bucket Limiter (per (subject,featureID)), 30s-cached CostCap with strict per-subject reservation, 80% warn threshold, fail-closed on infra error, MapTier/MapQuotaResolver helpers, FakeClock for deterministic tests

- internal/ai/provider/{ratelimit,cost}_decorator.go: Chat/Stream/Embed wrappers with two-arm select on stream forwarding + ctx-cancel slot release. Decorators ship as building blocks; chain wiring deferred to first consuming feature slice (router.go not in allowed-files list).

- internal/ai/dispatch/dispatch.go: errors.As(*limit.LimitError) detection in Chat loop -> structured SSE error frame via optional LimitErrorEmitter interface (5-scalar adapter to keep packages decoupled).

- internal/ai/stream/writer.go: idempotent WriteDoneFull (fixes deferred-overwrites-error bug); LimitDecisionPayload + WriteLimitError + EmitLimitError adapter.

- internal/ai/health/ollama_poll.go: poller probes /api/tags; suspends provider on 3 consecutive failures for 60s. Decoupled via Suspender/Doer/Clock interfaces (no cycle into limit package).

- web/src/hooks/useAiStream.ts: widened error event with reason/retry_after_s/banner_level/baseline_available; new AiLimitInfo + limit field on result.

- web/src/components/ai/AiLimitBanner.tsx: presentational banner with live retry countdown, baseline-available gating, full reason taxonomy (i18n + English fallbacks).

- web/src/features/settings/components/AISettings.tsx: live cost-cap spend bar (cloud-mode only, gated on cap>0); 80% amber / 100% rose; ARIA progressbar.
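
The per-(subject, featureID) token bucket with an injectable clock can be sketched like this. It is a simplified model of the idea only: the type and method names, the refill arithmetic, and the constructor arguments are assumptions, and the cost cap, quota tiers, and LimitError are left out.

```go
package main

import (
	"sync"
	"time"
)

// Clock lets tests control time.
type Clock interface{ Now() time.Time }

// FakeClock is a manually advanced clock for deterministic tests.
type FakeClock struct{ now time.Time }

func (c *FakeClock) Now() time.Time          { return c.now }
func (c *FakeClock) Advance(d time.Duration) { c.now = c.now.Add(d) }

type bucketKey struct{ subject, featureID string }

type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter keeps one token bucket per (subject, featureID) pair.
type Limiter struct {
	mu      sync.Mutex
	clock   Clock
	rate    float64 // tokens refilled per second
	burst   float64 // bucket capacity
	buckets map[bucketKey]*bucket
}

func NewLimiter(clock Clock, ratePerSec, burst float64) *Limiter {
	return &Limiter{clock: clock, rate: ratePerSec, burst: burst,
		buckets: make(map[bucketKey]*bucket)}
}

// Allow refills the bucket for the elapsed time, then spends one token if
// at least one is available.
func (l *Limiter) Allow(subject, featureID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := l.clock.Now()
	k := bucketKey{subject, featureID}
	b, ok := l.buckets[k]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[k] = b
	}
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```

With a fake clock the refill path is testable without sleeping: advance the clock by one second and exactly one more call is admitted at a rate of one token per second.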

All gates green: go test -race -count=1 ./internal/ai/limit/... ./internal/ai/provider/... ./internal/ai/dispatch/... ./internal/ai/stream/... ./internal/ai/health/... = EXIT 0; go build ./... = EXIT 0; npm test --run AiLimitBanner = 18/18 EXIT 0; npx tsc --noEmit = EXIT 0; adjacent useAiStream + AISettings tests = 19+11 EXIT 0.

Per ADR-015: I1 default-off (no goroutines started by constructors), I3 baseline intact (limit error -> structured SSE -> baseline_available:true), I4 zero outbound egress (decorators do no IO; poller probes user-configured local URL only), I7 fail-loud on missing/unknown feature ID, R8 graceful fallback, R9 cost cap with banner. Decorators-as-building-blocks rationale: router.go + registry.go are NOT in this slice's allowed-files list per Honesty Covenant rule 9; wiring deferred to first consuming feature slice (e.g. U1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pre-flight check fails: only 20 of 64 phase-50 slice logs exist
(slices 0001-F0 through 0020-N6 are DONE; slices 0021-D1 through
0064-ML3 have not been executed yet). Per the slice's Honesty
Covenant rules #3 and #7 and the explicit Blocked Path, this
verification-only terminal slice stops and commits only the
blocked log so the next operator resumes at slice 0021.

No production source changed. No tests added (would be vacuous
against an incomplete features.Registry). See log for full
preflight transcript and reasoning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase-50 F2 (settings UI) was writing the provider config in a
flat shape:

  {"provider":"ollama","base_url":"...","model":"...","api_key":"..."}

while F1's ParseProviderConfig (internal/ai/provider/config.go)
expects the namespaced shape that the multi-provider design
mandates:

  {
    "default":   "ollama",
    "ollama":    {"base_url":"...","model":"...","api_key":"..."},
    "openai":    {"base_url":"...","model":"..."},
    ...
  }

When the flat shape was stored the backend couldn't find
raw["ollama"], fell through to applyDefaults, and substituted
DefaultLocalBaseURL = http://localhost:11434 (unreachable from
inside the API container). Every AI call failed with
"dial tcp [::1]:11434: connect: connection refused" no matter
what the user typed in Settings.

Changes
- AISettings.tsx
  - reads cfg[default] then drills into cfg[providerName]; falls
    back to legacy flat keys for unmigrated rows (defensive)
  - writes the namespaced shape, spreads existing
    ai_provider_config so other providers' entries survive,
    strips legacy top-level keys on save
  - new handleProviderChange callback re-loads the form fields
    from the new provider's stored entry when the dropdown
    switches (proper multi-provider UX)
- AISettings.test.tsx
  - 4 new tests pinning the canonical contract:
    namespaced read, legacy-flat read (backward-compat),
    namespaced write with multi-provider preservation,
    legacy-top-level-key stripping on re-save
- migrations/000208_ai_provider_config_renest.up.sql
  - idempotent in-place conversion of any legacy flat row to
    the namespaced shape on next API boot
  - .down.sql is intentionally a no-op (round-trip would lose
    non-default providers' configs)
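
The flat-to-namespaced conversion can be illustrated in Go over a decoded JSON object. The actual fix lives in AISettings.tsx and a SQL migration; this sketch only mirrors the shape change described above, and the function name renest and its handling of a pre-existing default key are assumptions.

```go
package main

// legacyKeys are the top-level keys used by the flat shape.
var legacyKeys = []string{"provider", "base_url", "model", "api_key"}

// renest converts a flat provider config into the namespaced shape and leaves
// an already namespaced config unchanged, so applying it twice is harmless.
// The input map is not modified.
func renest(cfg map[string]any) map[string]any {
	out := make(map[string]any, len(cfg))
	for k, v := range cfg {
		out[k] = v
	}
	name, ok := out["provider"].(string)
	if !ok || name == "" {
		return out // no legacy marker: treat as namespaced
	}
	// Start from any existing entry for this provider, then overlay flat keys.
	entry := map[string]any{}
	if existing, ok := out[name].(map[string]any); ok {
		for k, v := range existing {
			entry[k] = v
		}
	}
	for _, k := range legacyKeys[1:] {
		if v, ok := out[k]; ok {
			entry[k] = v
		}
	}
	for _, k := range legacyKeys {
		delete(out, k) // strip legacy top-level keys
	}
	out[name] = entry
	if _, ok := out["default"]; !ok {
		out["default"] = name
	}
	return out
}
```

Other providers' entries pass through untouched, which is the same preservation property the new AISettings tests pin on the write path.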

Verification
- npx tsc --noEmit: clean
- AISettings.test.tsx: 15/15 pass
- offMode.invariant.test.tsx: 18/18 pass
- migration applied to local Postgres; legacy flat -> namespaced
  conversion verified; second run is a no-op (idempotent)
- end-to-end smoke: POST /api/v1/ai/chatbot returned real SSE
  delta+done events in 6.96s against the user's local Ollama
  at http://192.168.68.218:11434

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
atulmgupta and others added 24 commits May 15, 2026 17:04
Introduce an opt-in AI advisor that proposes a single quiet-hours / Do-Not-Disturb window from a user's recent notification history. Register the feature in the AI features registry and add a full strategy implementation, read-only tools (draft_quiet_hours_window, validate_quiet_hours_window), goldens/canned examples, and unit tests. Add an API handler and routes, a frontend AI panel and tests, and small SPA/UI wiring updates. Tools are read-only (no DB writes), use aggregated per-hour counts (no raw titles/messages), enforce the same validation rules as the canonical POST /api/v1/notifications/quiet-hours handler, and apply a strict redaction policy (PolicyAlertBuilder) and per-request scope checks. The advisor never performs saves; users must click "Apply to form" and then use the existing Save flow to persist changes.
paho.mqtt.golang v1.5.0 with SetCleanSession(false) + SetAutoReconnect(true)
+ ResumeSubs=false (default) does NOT re-issue SUBSCRIBE on reconnect; it
relies on the broker remembering the persistent session. When EMQX's
session_expiry_interval (7200s) elapses while disconnected OR an EMQX node
restart wipes session state on a non-replicated cluster, the broker creates
a fresh empty session on the next reconnect. Paho silently stays
`connected=true` with zero subscriptions forever, the telemetry stream
goes dark, and no new drives/charges are captured.

Reproduced in prod via `emqx ctl clients list` showing
`Client(teslasync-pipeline ... clean_start=false subscriptions=0
delivered_msgs=0 connected=true)` and `emqx ctl subscriptions list`
confirming no `telemetry/+/v/+` subscription for the pipeline client.

Fix: wire an OnConnect callback through `NewProductionPipelineMQTT` that
invokes a new `PipelineSubscriber.OnBrokerReconnect` method. The method:

- Guards against the first OnConnect (which paho fires during the initial
  blocking Connect, possibly on a goroutine that races with Start) by
  requiring `started==true && stopped==false`. The initial Subscribe is
  still owned by Start.
- Re-issues `client.Subscribe(topic, qos, onPipelineMessage)` with the
  configured timeout.
- Resets the local `RedeliveryTracker` because the broker's in-flight
  bookkeeping is gone after a session-expired reconnect; keeping stale
  counts would skew the MaxRedeliveries DLQ threshold. (This finally
  fulfills the existing intent comment on `RedeliveryTracker.Reset`.)
- Logs success/failure clearly so operators can spot a stuck stream.

Construction in internal/app/new.go uses an `atomic.Pointer` to bridge
the chicken-and-egg between paho client construction (must happen before
PipelineSubscriber) and the OnConnect closure (which needs the
subscriber). The pointer is published BEFORE Start so the goroutine-
scheduled OnConnect cannot observe a torn state.

Why not `SetResumeSubs(true)`? paho v1.5.0 persists SUBSCRIBE packets
when ResumeSubs is true but does NOT delete completed entries after
SUBACK (client.go:854-872, net.go:205-217). Combining ResumeSubs with
manual re-Subscribe in OnConnect causes accumulating duplicate persisted
SUBSCRIBE packets across reconnects. We chose explicit re-Subscribe and
left ResumeSubs at its default false; the docstring on
`productionPipelineOptions` records this trade-off.

Tests in mqtt_test.go pin the contract:
- pre-Start invocations are no-ops (initial Subscribe is owned by Start)
- post-Start invocations re-issue Subscribe + reset the tracker
- post-Stop invocations are no-ops
- nil client argument falls back to the embedded client
- Subscribe error / timeout does NOT reset the tracker (subscription is
  not actually live)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Append the missing === STATUS === footer (EXIT=0, STATUS=DONE) and re-verify gates.

Production code shipped in ae32a68 (Add quiet-hours suggestion AI feature).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the build-time contract that pairs every guarded AI feature ID with its SPA component file and canonical /api/v1/ai/* endpoint (internal/ai/features/spa_wiring.go + generated web/src/ai/spaWiring.ts mirror), codifies methodology principles P11 (Wired-or-absent) and P12 (No placeholder buttons), and adds two `aivet` static checks that enforce them across the web/src/components/ai/ tree:

- W1-A rejects placeholder substrings (future slice, coming soon, wiring lands, would call POST) and literal-disabled Buttons.
- W1-B requires every SPAWiringTable Component file to import useAiStream AND reference its canonical endpoint path (either directly or via SPA_WIRING_BY_ID).

SURVEY confirmed 57/57 wireable components already import useAiStream from predecessor slices (F5 through ML tier 0064); W1's role is contract codification + static enforcement, not bulk component rewrites. The single pre-edit placeholder hit (AIChatbotIndicator.tsx file-header comment) is rewritten to historicize the wiring; the indicator file is allowlisted in SPAWiringIndicatorOnly because the chatbot call path lives on ChatbotPage.tsx.

No baseline handler, runtime path, route, UI test ID, background job, or client storage key is modified. ADR-015 invariants I3, I5, I6, I7, I8 untouched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eb-lint, and web-test (drift from predecessor AI slices)

Phase-50 / Prompt 9999 - Final Gate. Predecessor coverage now satisfied
(64 / 64 slices in 0001..0064 plus the 0065 W1 SPA wiring slice all
STATUS=DONE), so the previous BLOCKED-on-coverage failure mode is
resolved. The HX (Helix UX) project-wide invariants all PASS.

However, the slice's prompt-defined Section 2 build matrix is RED on
three of its nine command groups, blocking the final gate for a
different reason:

  - go test -race ./...   FAIL
      internal/arch tests (TestBaselineHonoured,
      TestEveryInternalPackageHasDocGoWithLayer,
      TestFrozenPackagesNoNewFiles): 67 unauthored AI handler files
      under the ADR-009-frozen internal/api package; 75 packages
      missing the required doc.go layer declaration; baseline
      doc.go coverage dropped from 100.0% to 58.3%.

  - npm run lint   FAIL  (24 errors, 2 warnings)
      jsx-a11y label-has-associated-control x2,
      no-empty-object-type x1, no-unused-vars x2,
      unused eslint-disable directive x4.

  - npm test -- --run   FAIL  (64 tests in 11 test files)
      AISettings.test.tsx unhandled rejection at
      AIProviderSection.tsx:128 (validate-config response shape
      regression), plus 10 other pre-existing failing test files.

These red signals are NOT introduced by this slice. They are drift
created by predecessor AI feature slices that recorded
STATUS=DONE under their narrower per-slice gates while deferring
the global cleanup. The pattern was first disclosed by slice 0008-F7
("pre-existing failure disclosure") and has compounded across every
subsequent feature slice.

This slice's allowed-files list cannot include any of the files
required to fix the blockers (tools/archmetrics/baseline.json, the
internal/api/ai_*_handler.go relocations to internal/handler/v1, the
24 lint sites, the AIProviderSection response-shape regression, etc.),
and the prompt explicitly forbids production-source changes from this
slice.

Per Honesty Covenant rules #1, #2, #3, and #8, the slice STOPS at
EXIT=1 / STATUS=BLOCKED and commits only the log. The phase-50-final-gate
tag is NOT created and CHANGELOG.md is NOT modified. AI-Off Contract
invariants I5, I6, I7 remain proven by existing infrastructure
(internal/ai/guard/off_mode_test.go and
web/src/ai/__tests__/offMode.invariant.test.tsx); I4 and I12 remain
partially proven by the per-job tests under internal/jobs.

Forward path is documented in the log's REASONING section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three coordinated layers fix the long-standing "need to clear cookies to
see data" bug that surfaced as an infinite refresh loop on installed PWAs
and mobile devices where users can't easily clear site data.

**Layer 1: stop precaching the SPA shell (web/vite.config.ts, web/src/sw/sw.ts)**

Workbox 7.4.0's precacheAndRoute() with directoryIndex default 'index.html'
rewrites GET / to /index.html and serves it from cache. Behind a ForwardAuth
proxy (Authentik) this swallows the 302 to /login on session expiry: the
SPA boots, fetches /api/v1/* which 401s, calls window.location.reload(), the
SW serves cached index.html again, loop. Manifest's start_url '/' makes
every PWA cold launch enter the loop.

Drop 'html' from globPatterns + register a NavigationRoute(NetworkFirst,
networkTimeoutSeconds: 3). Navigations now hit the network (where the proxy
can redirect), with the last successful navigation HTML cached as the
offline fallback. Also switch registerType from 'prompt' to 'autoUpdate' so
buggy SWs don't strand users who dismissed the update toast; this requires
manual self.skipWaiting() + self.clients.claim() listeners in sw.ts because
injectManifest does NOT auto-inject them like generateSW.

**Layer 2: explicit IdP handoff (web/src/lib/resilience.ts and modals)**

Replace window.location.reload() with navigateToReauth() that navigates the
top-level window to Authentik's documented entry point
/outpost.goauthentik.io/start?rd=<href> (verified against authentik upstream:
internal/outpost/proxyv2/application/application.go + oauth_state.go
redirectParam='rd'). The rd param deep-links the user back after sign-in;
sessionStorage write is kept as belt-and-suspenders fallback. Reauth URL is
configurable per-deployment via window.__TESLASYNC_REAUTH_URL__ (matches the
existing nginx sub_filter pattern used for __TESLASYNC_API_BASE__).

30s latch + window 'focus' listener gate against parallel queries each
firing their own navigation in the same tick. No per-response reset: the
session endpoint always returns 200 even when unauthenticated, so resetting
on success would race and churn Authentik's state-JWT cookie.

Updated SessionExpiredModal.handleSignIn and SessionExpiringModal.handleSignOut
to call navigateToReauth() for consistency and rd= preservation across all
paths. Removed the now-unreachable AuthExpiredOverlay (event dispatcher
deleted) and the dead offline.html file (never referenced from sw.ts).

**Layer 3: tighten session-expiry polling near expiry (useSessionMonitor)**

TanStack Query refetchInterval is now a callback that returns 30s when
expires_in < 5min, else 5min. Without this the SessionExpiringModal
countdown could be up to 4m59s stale relative to the actual cookie lifetime.

**Test infrastructure**

test-setup.ts beforeEach resets the auth-expired latch since vitest's
per-file isolation is insufficient within a file. Updated SW test mocks for
the new NavigationRoute + NetworkFirst imports. Updated modal tests to
assert the outpost URL shape instead of the old reload-current-path target.

**Verification**

- tsc --noEmit: clean
- npm run build: succeeds; built sw.js confirmed NOT containing index.html
  in the precache manifest; NavigationRoute + NetworkFirst + networkTimeoutSeconds
  + 'navigations' cache name all present in the bundled output
- vitest run: 4054 pass / 64 fail, exact parity with the pre-change baseline
  (the 11 failing files are pre-existing QueryClient/leaflet/useMatch issues
  unrelated to this change)
- audit-violations skill: 0 violations in changed files

Companion change: gitops repo sets config.forwardAuthHeader to
'X-authentik-username' so the backend leaves 'open mode' and the modal UI
is actually reachable in production.

Architect-validated against Authentik upstream source (outpost route registration,
rd= param validation rules, Traefik middleware header emission casing).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…m schema migration

Three waves of hotfixes for the same root-cause class: the strict
JSON decoder in internal/tesla/codec rejects payloads whose on-wire
shape diverges from the declared ValueKind in
cmd/protogen-tesla/emit.go::classifyExplicit. After the
signals-rewrite cutover this dropped payloads in production for
DriverSeatBelt, PassengerSeatBelt, GpsState, RearSeatHeaters,
HvacAutoMode, HvacPower, HvacFanStatus, and CabinOverheatProtectionTemperatureLimit.

Wave 1 (rubber-duck session 'codec-rewrite-review'):
  DriverSeatBelt / PassengerSeatBelt enum->bool, GpsState->TEXT,
  RearSeatHeaters Float->TEXT. Per-field tolerant override sits
  outside classifyExplicit so legacy firmware on the proto-batch
  path is unaffected.

Wave 2 (rubber-duck session 'codec-audit-findings'): audit of all
260 signals via tmp/audit_signal_types found 3 more lurking bugs:
  HvacAutoMode enum->BOOLEAN (On=>true, Override=>false per architect
    -- Override means user has taken manual control, not auto-active)
  HvacPower enum->BOOLEAN (Off=>false, On/Precondition/OverheatProtect=>true
    -- column means 'HVAC powered/running', not 'user-requested')
  HvacFanStatus Float->TEXT (string passthrough + number->decimal string;
    bool deliberately not supported)
Same audit confirmed 4 TPMS timestamps are false positives
(tire_pressure_writer.writeTimestamp handles float epoch ->
TIMESTAMPTZ) and CabinOverheatProtectionTemperatureLimit was a
genuine deferred mismatch needing a schema change.
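
A hedged sketch of what the Wave 2 per-field coercions could look like, using only encoding/json. The function names `coerceBool` and `coerceText`, the `enumToBool` table layout, and the error wording are assumptions; only the label-to-value mappings are taken from the commit message, and labels it does not list are treated as unmapped here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// enumToBool holds only the label mappings stated in the commit message.
var enumToBool = map[string]map[string]bool{
	"HvacAutoMode": {"On": true, "Override": false},
	"HvacPower":    {"Off": false, "On": true, "Precondition": true, "OverheatProtect": true},
}

// decodeScalar decodes one JSON value, keeping numbers as json.Number so
// their decimal text survives unchanged.
func decodeScalar(raw []byte) (any, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	return v, nil
}

// coerceBool passes booleans through and maps known enum labels to booleans.
// coerced is true only when a label was translated.
func coerceBool(field string, raw []byte) (value, coerced bool, err error) {
	v, err := decodeScalar(raw)
	if err != nil {
		return false, false, err
	}
	switch t := v.(type) {
	case bool:
		return t, false, nil
	case string:
		if b, ok := enumToBool[field][t]; ok {
			return b, true, nil
		}
		return false, false, fmt.Errorf("%s: unmapped enum label %q", field, t)
	default:
		return false, false, fmt.Errorf("%s: unsupported wire type %T", field, v)
	}
}

// coerceText passes strings through and renders numbers as decimal strings.
// Booleans are deliberately rejected.
func coerceText(field string, raw []byte) (value string, coerced bool, err error) {
	v, err := decodeScalar(raw)
	if err != nil {
		return "", false, err
	}
	switch t := v.(type) {
	case string:
		return t, false, nil
	case json.Number:
		return t.String(), true, nil
	default:
		return "", false, fmt.Errorf("%s: unsupported wire type %T", field, v)
	}
}

func main() {
	b, coerced, err := coerceBool("HvacPower", []byte(`"Precondition"`))
	fmt.Println(b, coerced, err)

	s, coerced, err := coerceText("HvacFanStatus", []byte(`3`))
	fmt.Println(s, coerced, err)

	_, _, err = coerceText("HvacFanStatus", []byte(`true`))
	fmt.Println(err)
}
```

Returning a separate `coerced` flag keeps passthrough and translation distinguishable, which is what a drift counter would need.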

Wave 3 (rubber-duck session 'cabin-overheat-migration-design'):
migration 000210 renames climate_snapshots.cabin_overheat_protection_temperature_limit_c
DOUBLE PRECISION to ..._limit TEXT. The _c suffix is dropped per
ADR-004 (SI-unit suffixes reserved for unit-bearing numeric columns).
Codec now canonicalises the proto enum label (Low/Medium/High);
Unknown, numeric, and bool wire shapes drop loudly. Routing.yaml
and climate_writer.go updated to match the new column name.
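
The Wave 3 canonicalisation might look roughly like the sketch below. The commit message does not give the exact wire spellings, so the case-insensitive, whitespace-trimming match is an assumption, as are the function name `canonicalLimit` and the error text.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalLimits are the labels the commit message says are stored.
var canonicalLimits = []string{"Low", "Medium", "High"}

// canonicalLimit returns the canonical label for a decoded wire value.
// Non-string values and unrecognised labels (including "Unknown") are
// rejected so the caller can drop the payload loudly.
func canonicalLimit(wire any) (string, error) {
	s, ok := wire.(string)
	if !ok {
		return "", fmt.Errorf("cabin overheat limit: unsupported wire type %T", wire)
	}
	// Assumption: matching ignores case and surrounding whitespace.
	trimmed := strings.TrimSpace(s)
	for _, c := range canonicalLimits {
		if strings.EqualFold(trimmed, c) {
			return c, nil
		}
	}
	return "", fmt.Errorf("cabin overheat limit: unrecognised label %q", s)
}

func main() {
	for _, wire := range []any{"medium", "Unknown", 40.0, true} {
		label, err := canonicalLimit(wire)
		fmt.Printf("%v -> %q, err=%v\n", wire, label, err)
	}
}
```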

Counter teslasync_codec_json_coercion_total{field,from} fires only
on successful coercion, never on passthrough, so a sustained non-zero
rate per (field,from) means Tesla's wire shape has drifted and
classifyExplicit needs a refresh. Per architect: do NOT also
increment jsonDecodeErrorsTotal on successful coercion (that would
conflate drift with errors and page on normal traffic).
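
The counting rule can be illustrated with an in-memory stand-in for the metrics. This is a sketch only: the `metrics` type, the `outcome` enum, and the label values passed as `from` are hypothetical, and incrementing the error counter on dropped payloads is an assumption based on the counter's name. The real counters presumably live in the repository's metrics library.

```go
package main

import (
	"fmt"
	"sync"
)

// outcome classifies what the decoder did with one field value.
type outcome int

const (
	passthrough outcome = iota // wire shape already matched the declared kind
	coerced                    // wire shape was translated successfully
	dropped                    // wire shape could not be decoded
)

// labels mirrors the {field,from} label pair on the coercion counter.
type labels struct{ field, from string }

// metrics is an in-memory stand-in for the two counters.
type metrics struct {
	mu                sync.Mutex
	coercionTotal     map[labels]int
	decodeErrorsTotal int
}

func newMetrics() *metrics {
	return &metrics{coercionTotal: make(map[labels]int)}
}

// observe applies the rule: coercion increments only the coercion counter,
// a drop increments only the error counter, passthrough increments nothing.
func (m *metrics) observe(field, from string, o outcome) {
	m.mu.Lock()
	defer m.mu.Unlock()
	switch o {
	case coerced:
		m.coercionTotal[labels{field, from}]++
	case dropped:
		m.decodeErrorsTotal++
	}
}

func main() {
	m := newMetrics()
	m.observe("HvacPower", "string", coerced)
	m.observe("HvacPower", "bool", passthrough)
	m.observe("HvacFanStatus", "bool", dropped)
	fmt.Println("coercions:", m.coercionTotal[labels{"HvacPower", "string"}])
	fmt.Println("decode errors:", m.decodeErrorsTotal)
}
```

Keeping the two counters disjoint means an alert on decode errors stays quiet while the coercion counter still exposes wire-shape drift.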

Audit tool tmp/audit_signal_types/main.go is left in tmp/ as a
discovery tool, not a CI gate (architect: 'don't make it brittle').
Re-run with 'go run ./tmp/audit_signal_types' from repo root after
any signals-rewrite work to verify no new mismatches.

Post-migration audit state:
  total signals:                       260
  routed:                              286
  mismatch candidates:                 10
    fixed (codec coercion):             6
    false positives (writer-handled):   4
    deferred (schema migration):        0
    *** NEW (action required):          0

Tests: go test ./internal/tesla/... ./internal/api/... -- 11 packages green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce TelemetryErrorsPanel to render the four UI states for fleet telemetry errors (idle, loading, error, empty/data) and prevent the previous silent-empty-table behavior. Add extractTelemetryErrors and pickString helpers to normalize various Tesla response shapes into a stable UI-friendly TelemetryError shape, and add a TelemetryError type. Refactor FleetTelemetryConfigTool to use the new panel, disable actions when VIN is not selected, and adjust columns/keys accordingly. Export the new panel and type from the devtools index.
Delete ADR-015 and the Phase-50 AI adoption prompt and log artifacts under .github/prompts/db-refactor (adrs/, logs/, phase-50-ai-adoption/ and related helper prompt files). Cleanup of obsolete/refactored prompt/log files to reduce repo clutter and remove deprecated Phase-50 AI-adoption docs.
@atulmgupta atulmgupta merged commit 596950c into main May 17, 2026
4 of 6 checks passed
@atulmgupta atulmgupta deleted the refactor/signals-rewrite-50-0051 branch May 17, 2026 17:22