Skip to content

fix(update): make in-app Update reliable — bounded stop hook + untrack build artifact#517

Merged
jaylfc merged 1 commit into
masterfrom
fix/update-reliability
Jun 2, 2026
Merged

fix(update): make in-app Update reliable — bounded stop hook + untrack build artifact#517
jaylfc merged 1 commit into
masterfrom
fix/update-reliability

Conversation

@jaylfc
Copy link
Copy Markdown
Owner

@jaylfc jaylfc commented Jun 2, 2026

Why

Updating taOS via the in-app Update button was failing and leaving the box unreachable. Root-caused on the Pi during a live update. Two independent issues:

1. The graceful-stop hook could stall every restart for ~5 minutes

scripts/taos-graceful-stop.sh POSTed to /api/system/prepare-shutdown with --max-time 320, and the unit allowed TimeoutStopSec=360. When that endpoint hangs (observed: curl (28) timed out after 320001 ms), the service sits in deactivating with port 6969 dead. Since the in-app Update applies the pull then restarts the service, the restart stalls and the update looks like it failed / the box looks down.

Fix: bound the curl to 25s and TimeoutStopSec to 45s. Draining is best-effort; anything slower belongs in an async task, never inline in shutdown.

2. A tracked build artifact dirtied the tree and could block git pull

desktop/tsconfig.tsbuildinfo was tracked but is regenerated on every npm run build, so it perpetually showed as modified and could block the fast-forward pull the update performs.

Fix: git rm --cached it and add it to .gitignore (consistent with static/desktop/, the built bundle, already being ignored).

Notes

  • The quick fix is already applied on the live Pi (script --max-time 25 + a TimeoutStopSec=45 drop-in); this lands it at the source so a reinstall doesn't regress.
  • Follow-up worth filing separately: investigate why /api/system/prepare-shutdown can hang at all.

Summary by CodeRabbit

  • Bug Fixes

    • Optimized service shutdown timeout configuration by reducing timeout durations. Changes prevent the service from remaining stuck during shutdown operations and address potential update and restart issues.
  • Chores

    • Updated version control configuration to exclude incremental build artifacts from tracking.

…k build artifact

Two issues made the in-app Update / restart appear to fail and leave the
box unreachable:

1. taos-graceful-stop.sh curled /api/system/prepare-shutdown with
   --max-time 320 and the unit allowed TimeoutStopSec=360. When that
   endpoint hung, every restart sat in 'deactivating' for ~5min with the
   port dead. Bound the curl to 25s and TimeoutStopSec to 45s so a hung
   drain can never stall a restart.

2. desktop/tsconfig.tsbuildinfo was tracked but is regenerated on every
   SPA build, so it perpetually dirtied the working tree and could block
   'git pull' during an update. Untrack it and add it to .gitignore.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR bounds service shutdown time to prevent update/restart failures by reducing timeouts across systemd and graceful stop, and excludes regenerated TypeScript build metadata from version control.

Changes

Build Artifact Exclusion

Layer / File(s) Summary
Ignore TypeScript build artifact
.gitignore
Adds rule to exclude desktop/tsconfig.tsbuildinfo, a regenerated Vite/TypeScript incremental build artifact that must not be tracked.

Service Shutdown Timeout Bounding

Layer / File(s) Summary
Systemd service shutdown timeout
scripts/systemd/tinyagentos.service
TimeoutStopSec reduced from 360 to 45 seconds to bound service deactivation and prevent hangs during updates. Comment documents the intent and coordination with graceful stop timeout.
Graceful stop script timeout
scripts/taos-graceful-stop.sh
HTTP --max-time reduced from 320 to 25 seconds for POST /api/system/prepare-shutdown to ensure shutdown request cannot outlive systemd timeout window. Best-effort behavior preserved by continuing to ignore failures.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Hops of joy! A graceful stop, so neat,
We bound the shutdown, no more wait,
Timeouts tight, the dance complete,
Build artifacts fade through Git's gate. 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: fixing in-app Update reliability through a bounded stop hook and untracking a build artifact.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/update-reliability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/taos-graceful-stop.sh`:
- Line 10: The curl call in scripts/taos-graceful-stop.sh is hardcoded to
localhost:6969 so it ignores custom TAOS_PORT used by tinyagentos.service;
update that line to build the URL from environment variables (use TAOS_PORT with
a default of 6969 and optionally TAOS_HOST with a default of localhost) so
prepare-shutdown targets the actual controller address used at runtime (keep
--max-time and other flags intact and ensure proper quoting/expansion of
${TAOS_PORT:-6969} and ${TAOS_HOST:-localhost} in the curl invocation).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f4a29111-33b8-459a-b273-6e98f9599173

📥 Commits

Reviewing files that changed from the base of the PR and between 3f04596 and 9dcdc49.

📒 Files selected for processing (4)
  • .gitignore
  • desktop/tsconfig.tsbuildinfo
  • scripts/systemd/tinyagentos.service
  • scripts/taos-graceful-stop.sh
💤 Files with no reviewable changes (1)
  • desktop/tsconfig.tsbuildinfo

# service in `deactivating` with the port dead for minutes — which also makes the
# in-app Update appear to fail, since it restarts the service. Draining must be
# best-effort and quick; anything slower belongs in an async background task.
curl -fsS -X POST --max-time 25 http://localhost:6969/api/system/prepare-shutdown || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop hook still ignores custom controller ports.

scripts/systemd/tinyagentos.service starts uvicorn on TAOS_PORT (Line 33), but this hook is still pinned to localhost:6969. Since scripts/install-server.sh copies this script verbatim, any install with a non-default TAOS_PORT will skip prepare-shutdown entirely, so updates/restarts lose the graceful-drain path this PR is trying to harden.

Suggested fix
# scripts/systemd/tinyagentos.service
@@
-Environment=PYTHONUNBUFFERED=1
+Environment=PYTHONUNBUFFERED=1
+Environment=TAOS_PORT=TAOS_PORT
# scripts/taos-graceful-stop.sh
-curl -fsS -X POST --max-time 25 http://localhost:6969/api/system/prepare-shutdown || true
+curl -fsS -X POST --max-time 25 \
+  "http://localhost:${TAOS_PORT:-6969}/api/system/prepare-shutdown" || true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
curl -fsS -X POST --max-time 25 http://localhost:6969/api/system/prepare-shutdown || true
curl -fsS -X POST --max-time 25 \
"http://localhost:${TAOS_PORT:-6969}/api/system/prepare-shutdown" || true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/taos-graceful-stop.sh` at line 10, The curl call in
scripts/taos-graceful-stop.sh is hardcoded to localhost:6969 so it ignores
custom TAOS_PORT used by tinyagentos.service; update that line to build the URL
from environment variables (use TAOS_PORT with a default of 6969 and optionally
TAOS_HOST with a default of localhost) so prepare-shutdown targets the actual
controller address used at runtime (keep --max-time and other flags intact and
ensure proper quoting/expansion of ${TAOS_PORT:-6969} and
${TAOS_HOST:-localhost} in the curl invocation).

# `deactivating` (which also breaks the in-app Update). Paired with the short
# --max-time in taos-graceful-stop.sh.
TimeoutStopSec=45
Environment=PYTHONUNBUFFERED=1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Stop timeout may be too low for graceful shutdown

Reducing TimeoutStopSec from 360s to 45s may not give the service enough time to shut down gracefully under load, potentially leading to forced termination and data loss. Ensure the application can stop within 45 seconds after the graceful-stop script exits.

@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented Jun 2, 2026

Code Review Summary

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0
Issue Details (click to expand)

WARNING

File Line Issue
scripts/systemd/tinyagentos.service 46 Stop timeout may be too low for graceful shutdown
Other Observations (not in diff)

No other observations.

Files Reviewed (4 files)
  • .gitignore
  • desktop/tsconfig.tsbuildinfo
  • scripts/systemd/tinyagentos.service
  • scripts/taos-graceful-stop.sh

Reviewed by nemotron-3-super-120b-a12b-20230311:free · 432,824 tokens

@jaylfc jaylfc merged commit 3a155b6 into master Jun 2, 2026
8 checks passed
@jaylfc jaylfc deleted the fix/update-reliability branch June 2, 2026 17:55
@github-project-automation github-project-automation Bot moved this from Todo to Done in TinyAgentOS Roadmap Jun 2, 2026
jaylfc added a commit that referenced this pull request Jun 2, 2026
jaylfc added a commit that referenced this pull request Jun 2, 2026
…ng 6969

Addresses CodeRabbit on #517: a custom-port install would drain a
hardcoded localhost:6969 (silently no-op via '|| true'). systemd already
passes TAOS_PORT into the unit Environment (this PR), so the hook inherits
it and falls back to 6969 only when unset.
jaylfc added a commit that referenced this pull request Jun 2, 2026
… cross-origin CSP) (#518)

* fix(browser): make the proxy work on real deploys (dual-port launch + cross-origin CSP)

The Browser app (#511) never worked on a systemd deploy. Two bugs:

1. The unit's ExecStart ran 'uvicorn tinyagentos.app:create_app' directly,
   which never starts the second browser-proxy origin (that only launches
   via __main__.main() -> _serve_dual_port). So nothing listened on the
   proxy port and pages failed to load. Switch ExecStart to
   'python -m tinyagentos' and inject TAOS_HOST/TAOS_PORT into the unit
   Environment (install-server.sh) so the module entrypoint binds the same
   host/port and runs both origins.

2. Proxied responses set 'frame-ancestors self', which blocks the taOS
   shell (main origin) from embedding the proxy origin (different port =
   different origin) — so even with both origins up, the iframe was CSP-
   blocked. Compute frame-ancestors per-request as "'self' <shell-origin>"
   (same host, main port), so only the shell can frame it (clickjacking
   defence preserved). Requires main_port on app.state and main_port/
   browser_proxy_port added to the proxy origin's shared-state allowlist.

Verified live: example.org renders through the proxy and links are
rewritten to flow back through it. Adds CSP unit tests for both the
single-port ('self' only) and dual-origin (shell origin allowed) cases.

* fix(deploy): graceful-stop hook honours TAOS_PORT instead of hardcoding 6969

Addresses CodeRabbit on #517: a custom-port install would drain a
hardcoded localhost:6969 (silently no-op via '|| true'). systemd already
passes TAOS_PORT into the unit Environment (this PR), so the hook inherits
it and falls back to 6969 only when unset.

* fix(browser): robust host/scheme parsing in _shell_origin (CodeRabbit #518)

- Handle IPv6 literal Host headers: a bare [::1] would be mangled by
  rsplit(':',1) into '[', producing a malformed frame-ancestors CSP that
  breaks framing for IPv6 access. Extract via the closing bracket instead.
- Clamp x-forwarded-proto to http/https (take the first value of a comma
  list) so a hostile/odd proto can't deform the CSP.
- Unit tests for _strip_port covering host:port, bare host, IPv6 ±port, empty.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

1 participant