fix(auto-updater): suppress transient network errors on background polls #223
Merged
ghinkle merged 2 commits on May 9, 2026
Reporter on Windows is behind a network that cannot resolve the GitHub release feed. The auto-updater polls every hour, every poll fails with `net::ERR_NAME_NOT_RESOLVED`, every failure fires `update-toast:error`, and the user has no way to opt out short of disabling network access.

`AutoUpdaterService.error` handler now uses the existing `classifyUpdateError` helper (matches `enotfound`, `timeout`, `econnrefused` etc.) and suppresses the user-facing toast when ALL three apply:
- error stage is `check` (not download)
- error type is `network`
- check was the hourly auto-poll, not a manual Check for Updates click

Error is still logged via `log.error`, still sent to analytics under `update_error`, and still relayed to renderer subscribers via the `update-error` channel. Manual checks still surface so user clicks get feedback. Download errors still surface so mid-download failures are not swallowed.

Fixes #56.
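The three-condition gate described above can be sketched as a pure predicate. This is a minimal illustration, not the code in `autoUpdater.ts`: the `ErrorContext` shape, the `UpdateStage` values, and the name `shouldSuppressErrorToast` are hypothetical; only the classifier's return values and the `wasManualCheck` flag come from the PR text.

```typescript
// Return values of classifyUpdateError, as listed in the PR description.
type UpdateErrorType = 'network' | 'permission' | 'disk_space' | 'signature' | 'unknown';

// Assumption: the handler distinguishes at least these two stages.
type UpdateStage = 'check' | 'download';

// Hypothetical container for what the error handler knows when an error arrives.
interface ErrorContext {
  stage: UpdateStage;
  errorType: UpdateErrorType;
  // Captured BEFORE the handler resets its manual-check flag.
  wasManualCheck: boolean;
}

// Suppress the toast only when all three conditions hold.
// Logging, analytics, and the `update-error` relay are not affected by this.
function shouldSuppressErrorToast(ctx: ErrorContext): boolean {
  return ctx.stage === 'check' && ctx.errorType === 'network' && !ctx.wasManualCheck;
}

// Background poll that failed to resolve the feed: toast is suppressed.
const backgroundDnsFailure = shouldSuppressErrorToast({
  stage: 'check',
  errorType: 'network',
  wasManualCheck: false,
});

// Same failure after the user clicked Check for Updates: toast still shows.
const manualDnsFailure = shouldSuppressErrorToast({
  stage: 'check',
  errorType: 'network',
  wasManualCheck: true,
});
```

Keeping the rule as a predicate separate from the side effects makes it easy to unit-test each of the three conditions independently.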
ademczuk added a commit to ademczuk/nimbalyst-1 that referenced this pull request on May 20, 2026
…s network

electron-updater runs through Electron's Chromium net stack, which reports connectivity failures as `net::ERR_*` strings. `classifyUpdateError` only matched Node-style errors (enotfound / econnrefused / timeout / "network"), so `net::ERR_NAME_NOT_RESOLVED` and friends fell through to 'unknown'. The background-poll toast suppression added in nimbalyst#223 is gated on `errorType === 'network'`, so it never fired for these: a single failed hourly check (a DNS blip while the machine wakes) left an "Update Error: net::ERR_NAME_NOT_RESOLVED" toast on screen.

Match the net::ERR_* connectivity family (name-not-resolved, internet-disconnected, network-changed, connection-*, proxy/address failures) in the network branch. net::ERR_CERT_* and net::ERR_SSL_* are deliberately excluded so TLS/certificate failures keep falling through to the signature bucket rather than being silenced as benign network blips.

Adds 2 unit tests: the connectivity family classifies as network, and net::ERR_CERT_* stays signature.

Follow-up to nimbalyst#223 / nimbalyst#56.
ghinkle pushed a commit that referenced this pull request on May 21, 2026
…s network (#387)

* fix(auto-updater): classify Chromium net::ERR_* connectivity errors as network

  electron-updater runs through Electron's Chromium net stack, which reports connectivity failures as `net::ERR_*` strings. `classifyUpdateError` only matched Node-style errors (enotfound / econnrefused / timeout / "network"), so `net::ERR_NAME_NOT_RESOLVED` and friends fell through to 'unknown'. The background-poll toast suppression added in #223 is gated on `errorType === 'network'`, so it never fired for these: a single failed hourly check (a DNS blip while the machine wakes) left an "Update Error: net::ERR_NAME_NOT_RESOLVED" toast on screen.

  Match the net::ERR_* connectivity family (name-not-resolved, internet-disconnected, network-changed, connection-*, proxy/address failures) in the network branch. net::ERR_CERT_* and net::ERR_SSL_* are deliberately excluded so TLS/certificate failures keep falling through to the signature bucket rather than being silenced as benign network blips.

  Adds 2 unit tests: the connectivity family classifies as network, and net::ERR_CERT_* stays signature.

  Follow-up to #223 / #56.

* fix(auto-updater): broaden net:: match to the full DNS + connectivity family

  The first pass enumerated only NAME_NOT_RESOLVED / CONNECTION_* and missed the entire net::ERR_DNS_* family (DNS_TIMED_OUT, DNS_SERVER_FAILED, DNS_MALFORMED_RESPONSE), ICANN_NAME_COLLISION, NETWORK_IO_SUSPENDED, and SOCKET_NOT_CONNECTED, all transient connectivity failures that should get the same background-poll suppression. Switch to dns_* / network_* / connection_* prefix groups so the whole connectivity surface is covered.

  net::ERR_CERT_* and net::ERR_SSL_* remain excluded (verified by the existing guard test) so TLS/certificate failures still classify as signature rather than being silenced as benign network blips. Extends the DNS + connectivity assertions in the test.

* refactor(auto-updater): keep classifyUpdateError JSDoc attached to the function

  The CHROMIUM_NETWORK_ERROR const was inserted between the function's JSDoc block and the function, orphaning the doc. Move the const above the JSDoc so the `/** Classify update errors */` block stays adjacent to classifyUpdateError. No behaviour change.

---------

Co-authored-by: Andrew Demczuk <5212682+ademczuk@users.noreply.github.com>
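The prefix-group matching described in these commits can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the regular expression, the `NODE_NETWORK_MARKERS` list, the signature pattern, and the `...Sketch` names are hypothetical, and the `permission` / `disk_space` branches are left out because the thread does not describe them.

```typescript
// Return values of classifyUpdateError, as listed in the PR description.
type UpdateErrorType = 'network' | 'permission' | 'disk_space' | 'signature' | 'unknown';

// Assumed pattern: prefix groups (dns_, network_, connection_, proxy_, address_)
// plus a few individually named connectivity codes.
// net::ERR_CERT_* and net::ERR_SSL_* are deliberately NOT matched here.
const CHROMIUM_NETWORK_ERROR_SKETCH =
  /net::err_(?:dns_|network_|connection_|proxy_|address_|name_not_resolved|internet_disconnected|icann_name_collision|socket_not_connected|timed_out)/;

// Node-style markers named in the commit message.
const NODE_NETWORK_MARKERS = ['enotfound', 'econnrefused', 'timeout', 'network'];

function classifyUpdateErrorSketch(err: Error): UpdateErrorType {
  const message = err.message.toLowerCase();

  // Network branch: Chromium connectivity codes or Node-style markers.
  if (
    CHROMIUM_NETWORK_ERROR_SKETCH.test(message) ||
    NODE_NETWORK_MARKERS.some((marker) => message.includes(marker))
  ) {
    return 'network';
  }

  // Assumption: the signature bucket matches certificate/TLS wording, so
  // net::ERR_CERT_* and net::ERR_SSL_* fall through to it.
  if (/cert|ssl|signature/.test(message)) {
    return 'signature';
  }

  // The permission and disk_space branches are omitted from this sketch.
  return 'unknown';
}

const dnsFailure = classifyUpdateErrorSketch(new Error('net::ERR_NAME_NOT_RESOLVED'));
const certFailure = classifyUpdateErrorSketch(new Error('net::ERR_CERT_DATE_INVALID'));
```

Ordering matters in a classifier of this shape: because the network pattern never matches the `cert_` or `ssl_` prefixes, TLS failures reach the signature branch instead of being treated as transient connectivity noise.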
ghinkle added a commit that referenced this pull request on May 21, 2026
- Shareable deep links for tracker items. "Copy Link" in the tracker item detail header and in the context menus on the kanban board and the tracker table produces a `nimbalyst://tracker/{id}?orgId={orgId}` URL. The main process resolves the orgId against open and recent workspaces, focuses or creates a window, queues the payload for the renderer; the renderer switches workspace, switches to tracker mode, and opens the detail panel via `setTrackerModeLayoutAtom` (selectedType `all` so the item is visible regardless of its primaryType). Menu entries are single-selection only and hidden when the workspace has no team configured. Fallback notification distinguishes "not signed in" from "no matching workspace".
- Shareable deep links for team documents. "Copy Link" in the shared-document context menu produces a `nimbalyst://doc/{id}?orgId={orgId}` URL that can be pasted in Slack/email and opens the doc in the recipient's app. Main process resolves orgId against open windows (active and rail-warm) and recent workspaces, queues the link payload per workspace so a freshly opened renderer can drain it on listener init, focuses an existing window or creates one for the matching workspace. Renderer switches to collab mode, switches active workspace if needed, and opens the doc by id without waiting for the team's shared-docs list to sync (title backfills automatically).
- Programmable action prompts can now launch a new sibling session in the current workstream instead of prefilling the current input. Per-action config in `ai-actions.md` picks the model, foreground/background, and auto-submit vs prefill behavior. Parser extracts an optional `key:value` config block under each `##` heading (`launch`, `model`, `foreground`, `autoSubmit`, `worktree`); actions without a known leading key keep current behavior. Model field validates via `ModelIdentifier.tryParse` so every model the app accepts is accepted here too, including slash-bearing IDs like `opencode:vendor/name`. Dropdown shows an open-in-new icon next to launcher entries and branches on config; SessionTranscript builds the prompt with the originating-session reference appended via the full UUID. IPC + `MetaAgentService.launchActionSession` spawn the sibling, queuing the prompt when `autoSubmit` is true or leaving the new session's draft prefilled when false. Central listener focuses the new session by driving `selectedWorkstreamAtom` with worktree- and workstream-aware branches.
- File-based tracker items unified with database tracker items. Frontmatter-backed plan files (`design/trackers/*.md`) now show up through the same tracker read model as DB-backed items; plan status and kanban ordering persist through canonical file IDs. MCP visibility and plan mutation regressions are covered by tests.
- Agent transcript flat-list code path removed. Both desktop and iOS now run VList unconditionally; `LazyMount`, `wrapHeavy`, `flatListRef`, `flatBottomSentinelRef`, the flat-list scroll handler and ResizeObserver effects, and the `.rich-transcript-flat-list` / `.scroll-container.flat` CSS are gone. The mobile-vs-desktop VList `bufferSize` difference is preserved.
- `nimbalyst-session-naming` MCP server set to `alwaysLoad: true` so `update_session_meta` stays in the prompt instead of being deferred behind `ToolSearch`. Audit of recent sessions showed ~56% burned their first turn on a `ToolSearch` lookup for the naming tool before they could set the session title and phase.
- Three layered defenses against file-watcher attribution-queue overflows that froze the renderer under multi-session load: hardcoded build-artifact directories (`.build`, `dist`, `node_modules`, etc.) filtered at `WorkspaceEventBus`, `ingestWatcherEvent`, and the bash file-path extractor; per-session cap (`SessionEditQuota`) of 500 distinct edited files hydrated from `session_files` so it survives restart; per-workspace burst throttle (`WorkspaceAttributionThrottle`) token bucket of 20 events with 20/sec refill applied at `ingestWatcherEvent` so codegen/build dumps never reach the queue. Works in pre-git-init workspaces because the throttle is gitignore-independent. (#352, #365)
- Stytch B2B auth recovery made resilient to JWKS key rotation and stale tokens. JWT refresh now uses the actual `exp` claim instead of a 60s "last refresh" throttle, and already-expired JWTs are rejected at the `getJwt` boundary so the WebSocket reconnect loop stops feeding stale tokens to the worker. WebSocket close code/reason/wasClean logged on both index and session rooms so server-side auth rejections stop logging as `[object Object]`. Auth-callback URL params logged from the deep-link handler with sensitive tokens redacted; worker-reported `error` / `error_description` / `stytch_error_type` params surface verbatim instead of silently falling through the missing-`session_token` branch. Stytch auth state centralized in a single atom + IPC listener so consumers stop re-subscribing to `onAuthStateChange`; gutter user icon flips to `no_accounts` in warning color when sync is enabled but the user is signed out. iOS Settings danger zone gets a Sign Out action plus banner/escalation hooks for sustained auth-failure recovery.
- High-volume low-value PostHog events cut to remove ~970k events/week of retry-storm noise. `file_save_blocked_after_delete` and `file_conflict_detected` analytics emits removed (save-block / conflict-detect logic unchanged; ~563k/wk, 1500+ per affected user). `ai_response_streamed` merged into `ai_response_received` and the 1:1 duplicate emit deleted (~106k/wk). `update_toast_shown` deduped to once per distinct `newVersion` per app run (prior guard only suppressed while state stayed `available`). `update_error` background path deduped on `(stage, error_type)` per app run, resetting on a successful check so a recurring error after the network heals is still captured; manual download-failure branch unchanged. `POSTHOG_EVENTS.md`, `FILE_WATCHING_AND_CHANGE_TRACKING.md`, and `WEEKLY_DASHBOARD.md` updated to match.
- System addendum's Interactive User Input section rewritten to nudge agents toward the `PromptForUserInput` widget over chat-based questions: trigger on "about to write a question, list, or draft", inline field-type cheatsheet, and a directive to combine multi-turn questions into one multi-field prompt.
- CLAUDE.md refactored into path-scoped rules and dedicated docs. Repeated guidance from root, electron, and runtime CLAUDE.md extracted into focused docs under `docs/` and `packages/electron/`; path-scoped `.claude/rules/` entries added so each rule loads only where it applies (error handling, naming, database, main process init, end-to-end verification). The three CLAUDE.md files trimmed to high-signal critical rules plus a documentation reference table.
- PGLite lock-staleness check now surfaces an "ambiguous" branch and asks the user via dialog instead of guessing. Follow-up to the closed PR #316: previously the EPERM-with-fresh-timestamp case (lock < 60s old, lock holder PID unsignalable from us) was treated as a live sibling and the launch was refused. On a slow-disk machine where the original Nimbalyst wrote the lock less than 60s before crashing, that produced a false-positive DATABASE_LOCKED that the user could only clear by deleting the lock file from the filesystem. `decideLockIsRunning` now returns a ternary `decision: 'running' | 'stale' | 'ambiguous'` and exposes `lockPid` / `lockAgeMs` for dialog rendering. The 'running' and 'stale' branches behave as before. The 'ambiguous' branch raises a distinct `DATABASE_LOCKED_AMBIGUOUS` error code; `PGLiteDatabaseWorker` catches it and shows a dialog explaining the two scenarios (live sibling vs fast PID reuse) with "Cancel" and "Open Anyway (clear lock)" buttons. The legacy `isRunning` boolean stays on the return shape for backwards compatibility (true for both 'running' and 'ambiguous'). Per @ghinkle's review on the closed PR #316. (#272 follow-up)
- Ambiguous database lock recovery restored end-to-end as a follow-up to PR #325. Worker error metadata is now preserved across the thread boundary so the ambiguous-lock dialog and force-unlock path actually fire instead of being silently downgraded to a generic `DATABASE_LOCKED`. Focused coverage added for worker error round-tripping.
- AI edits to a file open in a custom diff-mode editor (csv-spreadsheet) no longer skip the red/green pending-review diff. Custom editors get file changes through `EditorHost.subscribeToFileChanges`, and that path did not carry the in-flight-diff guard the built-in Lexical/Monaco file-change handler already applies. So the AI edit's own file-watcher echo reached the editor's external-change handler and discarded the pending diff before it could render, and the pre-edit review tag flipped to reviewed within milliseconds. The custom-editor file-change subscription now mirrors the built-in guard and suppresses the raw change while a diff is being applied or a pending AI-edit tag is tracked; the modified content still arrives via the diff request path and the final content via diff resolution. (#328)
- Auto-update error toast no longer fires on transient DNS failures during background polls. electron-updater runs through Electron's Chromium net stack, which reports connectivity failures as `net::ERR_*` strings (most commonly `net::ERR_NAME_NOT_RESOLVED`) that `classifyUpdateError` did not recognise as network errors. Because the #223 background-poll suppression is gated on `errorType === 'network'`, it never fired for these, so a single failed hourly check (e.g. a DNS blip while the machine wakes) left an "Update Error" toast on screen all day. The classifier now matches the `net::ERR_*` connectivity family (DNS resolution, connection, network-state, proxy, address, and timeout failures) while keeping `net::ERR_CERT_*` / `net::ERR_SSL_*` in the signature bucket. Follow-up to #223 / #56.
- Tracker transcript "Tracker Updated" widget no longer crashes on legacy tag value shapes, and now renders all field changes (not just `status` / `priority` / `title` / `owner` / `archived` / `progress` / `tags`). Updates to custom tracker types (incident severity, vendor, description, etc.) previously showed a header with an empty body; the widget now renders a generic from-to row for any unhandled change key. `description` is special-cased to a chars-only summary so entire documents don't render in the transcript. String-shaped tag diffs are covered by a transcript widget test.
- Dragging a file from the file tree or files-edited sidebar into the AI input now inserts `[name](/absolute/path)` instead of `@<workspace-relative-path>`. Codex previously had to guess sibling directories before falling back to cwd-relative, costing multiple shell commands per dropped reference; absolute-path links resolve on first try for any agent.
- Shared tracker bodies now sync end-to-end through the collab Y.Doc, not just local PGLite cache. End-to-end test added covering the shared tracker body sync path against a live collab room; collab sync URL construction centralized in a shared utility to keep tracker and document sync paths aligned.
- Pasted HTML clipboard images stored as assets. Inline `data:image` HTML paste content now rewrites to asset refs so Google Docs-style image pastes don't bloat the document with base64 payloads. Regression coverage added.
- iOS Codex sessions on the app-server transport now display messages. Mobile transcript projection now routes through `CodexRawParserDispatcher` so newer (app-server) sessions get the right parser; previously the SDK parser threw on every output message and the catch swallowed it, leaving only user prompts visible on iOS. Regression test added covering `agentMessage` + `mcpToolCall`. The iOS pre-build script also stops checking the package-local `node_modules/.bin/vite` (npm workspaces hoist it to the root), invokes the workspace-root binary directly, and fails loudly instead of silently shipping a stale transcript bundle.
- Transcript rerenders from other sessions reduced. `SessionTranscript` now subscribes only to its own phase; registry reads remain on-demand for session mention expansion; selection churn from unrelated session activity no longer triggers redraws.
- AIService shutdown no longer dereferences null on late-arriving IPC. `AIService.destroy` stops nulling `sessionManager` / `settingsStore`, which were being read by late IPC invocations during quit.
- E2E selectors unstuck: dropped the dead `.agentic-panel--agent` worktree locator (developer worktrees + blitz alpha enabled in the first describe block so the gated buttons render); datamodellm spec targets `[data-testid="agent-mode-chat-input"]` instead of a multi-match selector that hit both mode textareas; core spec matches `test.md` exactly so the file-tree click no longer collides with `files-mode-test.md` / `agent-mode-test.md`.
- Typecheck fixed: bogus `Promise<TrackerItem[]>` cast on an already-awaited IPC result in `trackerSyncListeners` dropped; `TrackerItem -> Record<string, unknown>` coercion routed through `unknown` so tsc accepts the field-lookup helper; tracker `DocumentService` mocks widened to `Promise<any>` so `mockResolvedValueOnce` accepts TrackerItem fixtures; stale `prompt.test` assertions updated to match the rewritten Interactive User Input section.
- `buildSdkOptions` env tests extended to assert the base flags composed onto every spawned session env (tool-search mode and the client entrypoint label); `CLAUDE_CODE_ENTRYPOINT` restored in `afterEach` alongside the API-key cleanup so the suite leaves the environment as it found it. Test-only.
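One of the release-note items above says the background `update_error` analytics path is deduped on `(stage, error_type)` per app run and resets on a successful check. A minimal sketch of that bookkeeping, assuming an in-memory set keyed on the pair, might look like this; the class name `UpdateErrorDeduper` and its method names are hypothetical and not taken from the codebase.

```typescript
// Hypothetical helper: remembers which (stage, errorType) pairs have already
// been reported during this app run.
class UpdateErrorDeduper {
  private readonly seen = new Set<string>();

  // Returns true the first time a pair is seen, false for repeats.
  shouldEmit(stage: string, errorType: string): boolean {
    const key = `${stage}:${errorType}`;
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    return true;
  }

  // Called after a successful check, so that an error recurring after the
  // network heals is captured again.
  onSuccessfulCheck(): void {
    this.seen.clear();
  }
}

const deduper = new UpdateErrorDeduper();
const firstEmit = deduper.shouldEmit('check', 'network'); // first occurrence
const repeatEmit = deduper.shouldEmit('check', 'network'); // same pair again
deduper.onSuccessfulCheck();
const afterReset = deduper.shouldEmit('check', 'network'); // after a healthy check
```

Because the state lives only in memory, it naturally resets on each app launch, which matches the "per app run" wording.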
Summary
Fixes #56. Reporter on Windows is behind a network that cannot resolve the GitHub release feed. The auto-updater polls every hour, every poll fails with `net::ERR_NAME_NOT_RESOLVED`, every failure fires `update-toast:error`, and there is no setting to opt out. The same pattern hits anyone on a captive portal, restrictive corporate firewall, or LAN-only network.
Root cause
`AutoUpdaterService.setupEventHandlers()` at `packages/electron/src/main/services/autoUpdater.ts:158-196` fires `update-toast:error` to the frontmost window for every `error` event from `electron-updater`, regardless of the error stage, the error type, or whether the check was the hourly auto-poll or a manual one.
The codebase already has a `classifyUpdateError(err)` helper at line 33 that returns `'network' | 'permission' | 'disk_space' | 'signature' | 'unknown'`. It is currently only used for analytics. This PR uses the same classifier to gate the toast.
Fix
In the existing `error` handler, capture `wasManualCheck` before the reset (the existing code resets the flag without remembering its prior value), then suppress `update-toast:error` when all three are true:
- the error stage is `check` (not download)
- the error type is `network`
- the check was the hourly auto-poll, not a manual Check for Updates click
What is preserved exactly as before:
- `log.error('Update error:', err)` still fires
- `AnalyticsService.sendEvent('update_error', ...)` still fires with the same shape
- `this.sendToAllWindows('update-error', err.message)` still fires (any renderer subscriber listening on `update-error` continues to receive it)
- Manual checks via `checkForUpdatesWithUI` set `isManualCheck = true` and still see the toast on failure
- Download errors (stage other than `'check'`) still see the toast
- The code at `autoUpdater.ts:167` is reached before the toast logic and is unchanged
- The development-mode toast (`update-toast:error` with a literal "Update checking is not available in development mode") is in a different code path (`checkForUpdatesWithUI`, manual flow), unchanged
Why not add a hide-update-errors setting
Considered. The reporter explicitly asked "Can the update check be disabled somehow?" - meaning they want to opt out. A setting toggle would address that ask but introduces a real footgun: a user toggling it off would also miss download failures, signature failures, and disk-space failures, which are real problems they should hear about. Suppressing only the transient-network-check class targets the actual pain (recurring noise on a network that cannot reach the feed) without hiding errors that warrant action. Open to adding a setting if you'd prefer that direction.
Class-of-bug check
`grep -rn "update-toast:error" packages/electron/src` finds three sites:
- `autoUpdater.ts:191` - the auto-poll error handler (this PR's site)
- `autoUpdater.ts:434` - download retry failure (different stage, intentionally surfaces)
- `autoUpdater.ts:572,592` - manual-check-flow errors (intentionally surfaces, user clicked)
Only the first one was the recurring-noise source.
Test plan
- `npm run typecheck --workspace=packages/electron` passes
- … `Update error: ...` and analytics still records `update_error`
- … `Check for Updates` while offline, error toast appears (manual check flow preserved)
Closes
Fixes #56.