feat(ops): daily DB backup schedule, idempotent install via session-start hook #39
Merged
Wires the existing scripts/backup.sh into a macOS launchd job that fires daily at 03:14 local time. Retention: 3 most recent daily_*.sql.gz files (rotation already in backup.sh). Manual snapshots with other prefixes (e.g. pre_v2_backfill_*) are preserved.

What this adds
--------------
* scripts/com.metazen.agent-memory-backup.plist: launchd template with __PROJECT_DIR__ and __HOME__ placeholders.
* scripts/install_backup_schedule.sh: renders the template, copies it to ~/Library/LaunchAgents/, bootstraps the job. Supports --check and --uninstall. Falls back to crontab on non-macOS.
* hooks/ensure-services.js: new ensureBackupSchedule() called at the end of Main after services come up. Idempotent: re-installs only when the template is newer than the installed plist or the target is missing. macOS only. Never fails session start (debug-log + swallow).

scripts/backup.sh: bug fixes
----------------------------
Pre-existing script hard-coded user=agentmem with no password support, which fails against the actual dev setup that uses DATABASE_URL. Now:
- Prefers DATABASE_URL (matches the FastAPI server's DSN).
- Falls back to POSTGRES_USER + PGPASSWORD env when DATABASE_URL absent.
- Refuses to keep a backup file < 1 KB (catches silent auth failures that would otherwise produce a near-empty .gz).

Verified
--------
* install_backup_schedule.sh --check reports plist installed + job loaded.
* backup.sh produced a 319 MB gzipped dump and rotated correctly.
* launchctl list confirms com.metazen.agent-memory-backup is scheduled.

Docs
----
* docs/backups.md (new): operator reference covering setup, verification, restore, manual run, disabling, non-macOS fallback.
* README.md, handoff.md: short pointers to docs/backups.md.
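The credential fallback and the size guard described under the backup.sh bug fixes can be sketched roughly as follows. This is an illustration, not the contents of scripts/backup.sh: the function names `resolve_auth_source` and `guard_min_size`, the `MIN_BYTES` variable, and the choice to treat a set `POSTGRES_USER` as sufficient for the second branch are all assumptions.

```shell
# Sketch only: names below are illustrative, not copied from scripts/backup.sh.
MIN_BYTES=1024   # the "< 1 KB" threshold from the commit message

# Print which credential source applies, in the documented order.
resolve_auth_source() {
  if [ -n "${DATABASE_URL:-}" ]; then
    echo "database_url"     # preferred: same DSN the FastAPI server uses
  elif [ -n "${POSTGRES_USER:-}" ]; then
    echo "postgres_env"     # PGPASSWORD, if set, is read by libpq itself
  else
    echo "defaults"         # last resort: built-in values
  fi
}

# Keep a dump only if it is at least MIN_BYTES; otherwise delete it and fail.
guard_min_size() {
  local file="$1" size
  [ -f "$file" ] || return 1
  # Arithmetic expansion strips the padding some wc implementations emit.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -lt "$MIN_BYTES" ]; then
    rm -f "$file"
    return 1
  fi
  return 0
}
```

A caller would typically run `guard_min_size "$dump_path" || exit 1` right after the dump is written, so a failed authentication cannot leave a near-empty file behind (`dump_path` is a placeholder name).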
metazen11 added a commit that referenced this pull request on May 14, 2026
…omplete (#43)

* HANDOFF.md: replaces stale 'next session = backfill' section with current state: all 7 v2 data-pipeline sub-issues closed (PRs #34/#35/#36/#37/#38/#39/#40/#41/#42 merged). Sole remaining v2 work is #33 (retrain). Includes the actual training procedure to run. Live DB stats (28,599 backfilled tool_calls, 100% linked) and v2 dataset stats (23,983 train rows) captured for the next session.
* README.md: adds 'V2 Tool-Call Dataset' subsection under Fine-Tuning Dataset Exports. Documents data/processed/qwen25_tools/v2/ shape, build command, source-of-truth tables, and link to the plan doc.
* docs/fine_tune/V2_DATA_PIPELINE_PLAN.md: per-step checklist now reflects merged PR status. #31 explicitly marked deferred (not on training critical path). #36 (project consolidation) and the daily backup work flagged as bonus items from the data audit.

Co-authored-by: MZ <mz@wfca.com>
Daily `pg_dump` schedule. Pre-existing `scripts/backup.sh` (with rotation logic) was already in the repo; this PR wires it into a macOS launchd job that fires at 03:14 daily, and adds idempotent auto-install via the session-start hook.

Not tied to a particular issue: surfaced while taking a manual safety backup before #28 work and noticing that the backup script existed but had never been scheduled.
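The idempotent auto-install comes down to a freshness test: install when the target plist is missing or older than the template. The PR implements that in `hooks/ensure-services.js`; the shell function below is only a sketch of the same condition. The name `needs_install` and the installed plist path in the usage comment are assumptions, not names taken from the repo.

```shell
# Hypothetical helper: succeeds (exit 0) when an install is needed.
needs_install() {
  local template="$1" installed="$2"
  # Target missing: always install.
  [ -e "$installed" ] || return 0
  # Template modified more recently than the installed copy: re-install.
  [ "$template" -nt "$installed" ]
}

# Example call from the repo root (the installed path is an assumption):
# if needs_install scripts/com.metazen.agent-memory-backup.plist \
#      "$HOME/Library/LaunchAgents/com.metazen.agent-memory-backup.plist"; then
#   bash scripts/install_backup_schedule.sh
# fi
```

Comparing modification times keeps the check cheap enough to run on every session start, at the cost of re-installing whenever the template is merely touched.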
Summary
-------
* `scripts/install_backup_schedule.sh`: installer with `--check` + `--uninstall`. Renders the launchd plist with absolute paths; bootstraps the job.
* `scripts/com.metazen.agent-memory-backup.plist`: launchd template (`StartCalendarInterval` = 03:14 daily; `Hour: 3` / `Minute: 14`, off the :00 mark per scheduling guidance).
* `hooks/ensure-services.js`: new `ensureBackupSchedule()` runs at end of Main. Idempotent: re-installs only when the template is newer than the installed plist OR the target is missing. macOS-only. Failures are debug-logged and swallowed, so it never blocks session start.
* `scripts/backup.sh`: bug fixes (see below).
* `docs/backups.md`: operator reference.

scripts/backup.sh bug fixes
---------------------------
The pre-existing script hard-coded `user=agentmem` and didn't carry a password, which silently failed against the real dev setup (which uses `DATABASE_URL` from `.env`).

Auth resolution order is now:

1. `DATABASE_URL` (preferred; matches the running FastAPI server's DSN).
2. `POSTGRES_USER` + `PGPASSWORD` env var fallback.
3. Defaults: `agentmem` / `agent_memory` / `5432`.

Plus a min-size guard: if the produced file is < 1 KB the script deletes it and exits 1. Catches silent auth-failure cases that would otherwise leave a useless 20-byte `.gz` in `data/backups/`.

Test plan
---------
* `bash scripts/install_backup_schedule.sh` installs the plist and bootstraps the job.
* `bash scripts/install_backup_schedule.sh --check` reports plist installed + job loaded.
* `launchctl list | grep com.metazen.agent-memory-backup` shows the loaded job.
* `bash scripts/backup.sh` produces a valid 319 MB gzipped dump.
* Rotation keeps the 3 most recent `daily_*.sql.gz` files; manual snapshots with other prefixes are preserved.
* `hooks/ensure-services.js` runs through `node -c` parse (already in production every session).
* Check `data/backups/` after tomorrow's run.

Verification commands
---------------------
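A minimal sketch of the checks, assembled from the test plan items rather than quoted from the PR. The wrapper name `verify_backup_schedule` is hypothetical, and the final `ls` of `data/backups/` is an assumption about how one would confirm that new dumps are arriving.

```shell
# Label as given in the PR text.
LABEL="com.metazen.agent-memory-backup"

# Hypothetical wrapper; run it from the repository root on macOS.
verify_backup_schedule() {
  # Installer's own status report (plist installed + job loaded).
  bash scripts/install_backup_schedule.sh --check || return 1
  # launchd should list the job under its label.
  launchctl list | grep "$LABEL" || return 1
  # Assumption: listing the directory is enough to eyeball recent dumps.
  ls -lh data/backups/
}
```

The function only defines the checks; nothing runs until `verify_backup_schedule` is called, and it stops at the first check that fails.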
Rollback
--------
Reverting the commit also reverts the `ensure-services.js` hook, so subsequent sessions won't re-install the job. Existing dumps in `data/backups/` stay; data is never destroyed.

Out of scope
------------
* Non-macOS: the installer has a crontab path (`crontab -e` fallback), but it was not exercised on a real Linux host. Needs verification when the second-Mac / Linux deploy lands (issue #14, "ops: move agentMemory off Dropbox to local SSD").

Why not in #28 instead
----------------------
This is operational infra, independent of the v2 fine-tune pipeline. Wanted a safety backup before #28's bulk import (which it now has; see `data/backups/pre_v2_backfill_20260513_211653.sql.gz`), and discovered the schedule was missing. Decoupling so the backup fix can ship independently.