merge: integrate sunc behind default-off feature flags #12

Merged
marcus merged 185 commits into main from codex/sync-flag-framework on Feb 5, 2026

marcus commented on Feb 5, 2026

Summary

This PR brings the sunc sync implementation into main while keeping end-user sync behavior disabled by default.

It introduces a feature-flag framework and wires sync surfaces to explicit gates so we can:

  • ship DB/schema migrations early,
  • continue internal sync testing on mainline,
  • avoid exposing auth/sync CLI and background network behavior to all users yet.

What Changed

1) Merged sync stack from sunc

  • Sync engine, API/server, client commands, migrations, e2e/syncharness coverage, docs, and deployment assets.

2) Added feature-flag framework

  • New registry + resolution logic:
    • internal/features/features.go
    • internal/features/features_test.go
  • Gate map for sync touchpoints:
    • internal/features/sync_gate_map.go
  • Local config persistence for flags:
    • internal/models/models.go (feature_flags)
    • internal/config/config.go
    • internal/config/config_test.go
  • CLI for management:
    • cmd/feature.go
    • cmd/feature_test.go

3) Gated sync command registration (sync_cli, default: off)

  • cmd/auth.go
  • cmd/sync.go
  • cmd/project.go (sync-project)
  • cmd/config.go (sync config surface)
  • cmd/doctor.go

4) Gated autosync hooks (sync_autosync, default: off)

  • Root lifecycle hook framework:
    • cmd/feature_gate.go
    • cmd/root.go
  • Autosync hooks registered via:
    • cmd/autosync.go
  • Monitor periodic autosync path gated in:
    • cmd/monitor.go

5) Gated monitor sync prompt (sync_monitor_prompt, default: off)

  • pkg/monitor/commands.go (checkSyncPrompt now gated with BaseDir context)

6) Test harness compatibility

  • test/e2e/harness.go enables sync flags in test subprocess env so e2e coverage remains valid with production defaults off.

Feature Flags

  • sync_cli (default false)
  • sync_autosync (default false)
  • sync_monitor_prompt (default false)

Resolution priority:

  1. Env overrides (TD_FEATURE_*, TD_ENABLE_FEATURE, TD_DISABLE_FEATURE)
  2. Local project config (.todos/config.json -> feature_flags)
  3. Code default

Emergency kill switch:

  • TD_DISABLE_EXPERIMENTAL=1
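
The resolution order above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/features/features.go: the function name `Resolve`, the `TD_FEATURE_<UPPERCASED_NAME>` variable format, boolean parsing via `strconv.ParseBool`, and the kill switch taking precedence over every other source are all assumptions. `TD_ENABLE_FEATURE` and `TD_DISABLE_FEATURE` are left out because their value format is not described here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaults mirrors the three flags listed in this PR; all ship default-off.
var defaults = map[string]bool{
	"sync_cli":            false,
	"sync_autosync":       false,
	"sync_monitor_prompt": false,
}

// Resolve returns the effective value of a flag (hypothetical helper).
// getenv is injected (os.Getenv in real use) so the logic is easy to test.
// Assumed order: kill switch, per-flag env override, local config, code default.
func Resolve(name string, getenv func(string) string, config map[string]bool) bool {
	// Assumption: the kill switch beats every other source.
	if on, err := strconv.ParseBool(getenv("TD_DISABLE_EXPERIMENTAL")); err == nil && on {
		return false
	}
	// Assumption: TD_FEATURE_<UPPERCASED_NAME> holds a boolean-like string.
	if raw := getenv("TD_FEATURE_" + strings.ToUpper(name)); raw != "" {
		if v, err := strconv.ParseBool(raw); err == nil {
			return v
		}
	}
	// Local project config (feature_flags in .todos/config.json).
	if v, ok := config[name]; ok {
		return v
	}
	return defaults[name]
}

func main() {
	env := map[string]string{"TD_FEATURE_SYNC_CLI": "true"}
	getenv := func(k string) string { return env[k] }
	config := map[string]bool{"sync_cli": false, "sync_autosync": true}

	fmt.Println(Resolve("sync_cli", getenv, config))            // true: env override wins
	fmt.Println(Resolve("sync_autosync", getenv, config))       // true: config beats default
	fmt.Println(Resolve("sync_monitor_prompt", getenv, config)) // false: code default
}
```

Injecting `getenv` keeps the resolution logic free of process-global state, which is one way a test harness could flip flags per subprocess.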

Validation

Executed:

  • go test ./internal/features ./cmd ./pkg/monitor
  • go test ./test/e2e
  • go test ./...

All passed.

Rollout Plan

  1. Release with all sync flags default-off.
  2. Monitor migration outcomes and support signals.
  3. Enable flags for internal canary cohorts only.
  4. Roll out progressively (sync_cli -> sync_monitor_prompt -> sync_autosync).

Rollback Plan

If needed, disable all experimental behavior immediately with:

  • TD_DISABLE_EXPERIMENTAL=1

Notes

  • Existing local CLAUDE.md changes were intentionally left untouched.

marcus and others added 30 commits January 30, 2026 21:43
Sync engine foundation: event application, server event log,
client sync helpers, test harness, and 49 tests covering
edge cases, conflicts, idempotency, and convergence.

- internal/sync/types.go: Core types (Event, PushResult, PullResult, etc.)
- internal/sync/events.go: ApplyEvent with SQL injection prevention
- internal/sync/engine.go: Server event log (insert, query, dedup)
- internal/sync/client.go: Client helpers (pending events, apply remote, mark synced)
- test/syncharness/: Multi-client test harness with convergence checks
- 31 unit tests + 18 integration tests (all passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Server-side data access for users, API keys, projects, memberships,
sync cursors, and role-based authorization.

- internal/serverdb/ package with schema v1
- API key gen: td_live_ prefix, SHA-256 hash storage, timing-safe verification
- CreateProject atomically creates project + owner membership
- RemoveMember uses transaction to prevent last-owner race condition
- Authorization matrix: owner > writer > reader hierarchy
- 36 tests covering all CRUD, auth matrix, and edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
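
The API key scheme in the commit above (td_live_ prefix, SHA-256 hash storage, timing-safe verification) could look roughly like this. It is a sketch under assumptions: the function names, the 32-byte random length, and the hex encoding are illustrative and not taken from internal/serverdb.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

const keyPrefix = "td_live_"

// HashAPIKey returns the hex-encoded SHA-256 digest of a key.
// Only this digest would be stored server-side.
func HashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// GenerateAPIKey returns the plaintext key (shown to the user once)
// and the digest to store. The 32-byte length is an assumption.
func GenerateAPIKey() (plaintext, storedHash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = keyPrefix + hex.EncodeToString(buf)
	return plaintext, HashAPIKey(plaintext), nil
}

// VerifyAPIKey hashes the presented key and compares digests in
// constant time, so the comparison does not leak where they differ.
func VerifyAPIKey(presented, storedHash string) bool {
	got := HashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	key, stored, err := GenerateAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(key[:len(keyPrefix)])          // td_live_
	fmt.Println(VerifyAPIKey(key, stored))     // true
	fmt.Println(VerifyAPIKey(key+"x", stored)) // false
}
```
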
td-sync server with stdlib net/http, per-project SQLite pool,
Bearer token auth, role-based authorization, and sync endpoints.

- cmd/td-sync/main.go: Server entry point with graceful shutdown
- internal/api/: Server, config, middleware (recovery, request ID,
  logging, auth, max body size), sync push/pull/status handlers,
  project CRUD, membership management
- Per-project database pool with lazy init and WAL mode
- 10 integration tests covering auth, sync, projects, members
- HTTP timeouts, request size limits, error info sanitization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Device authorization flow for CLI authentication without SMTP.
Client starts login, user enters code in browser, client polls
until verified, receives API key.

- internal/serverdb/device_auth.go: AuthRequest CRUD, code generation
- internal/serverdb/schema.go: Migration v2 with auth_requests table
- internal/api/auth.go: login/start, login/poll, verify page/submit
- internal/api/templates/verify.html: Self-contained verification page
- Auto-create users on verification (AllowSignup config)
- 19 tests (12 DB + 7 integration) covering full flow and edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add sync commands (auth, sync-project, sync) with config management,
HTTP client, and local sync state tracking. Includes device auth flow,
project linking, and push/pull with atomic sync state updates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Multi-stage Dockerfile, docker-compose with healthcheck, and
Litestream continuous replication for server.db. File-based
replica by default with optional S3 config.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…Phase 1f)

In-memory metrics (/metricz), context-propagated slog logger with
request ID/user ID/project ID, configurable log format/level via
SYNC_LOG_FORMAT and SYNC_LOG_LEVEL, DB ping health check.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e 2)

Fixed-window rate limiter (10/min auth, 60/min push, 120/min pull,
300/min other). The exclude_client parameter on pull skips the requesting device's own events.
td doctor runs connectivity and sync health diagnostics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
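
A fixed-window limiter like the one described above can be sketched as follows. The type and method names are hypothetical and the per-key map is an assumed design; only the limits (for example 60/min for push) come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type window struct {
	start time.Time
	count int
}

// FixedWindowLimiter allows up to limit calls per key in each window.
type FixedWindowLimiter struct {
	mu      sync.Mutex
	limit   int
	size    time.Duration
	windows map[string]*window
}

func NewFixedWindowLimiter(limit int, size time.Duration) *FixedWindowLimiter {
	return &FixedWindowLimiter{limit: limit, size: size, windows: make(map[string]*window)}
}

// Allow reports whether a request for key at time now fits in the
// current window. A new window starts once the old one has expired.
func (l *FixedWindowLimiter) Allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	w, ok := l.windows[key]
	if !ok || now.Sub(w.start) >= l.size {
		w = &window{start: now}
		l.windows[key] = w
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

// AllowNow is a convenience wrapper that uses the wall clock.
func (l *FixedWindowLimiter) AllowNow(key string) bool {
	return l.Allow(key, time.Now())
}

func main() {
	push := NewFixedWindowLimiter(60, time.Minute)
	now := time.Now()
	allowed := 0
	for i := 0; i < 61; i++ {
		if push.Allow("api-key-1", now) {
			allowed++
		}
	}
	fmt.Println(allowed) // 60: the 61st request in the window is rejected
}
```

A fixed window is simple and cheap, at the cost of allowing a burst of up to twice the limit across a window boundary.
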
…nagement, tests

- Add sync_conflicts table with overwrite tracking (entity type/ID, local/remote data snapshots)
- Capture old row data in upsertEntity before INSERT OR REPLACE overwrites
- Collect ConflictRecord entries in ApplyRemoteEvents for caller storage
- Add `td sync conflicts` subcommand with --limit and --since filtering
- Enhance td sync output with per-entity overwrite details
- Add member management CLI: td sync-project members/invite/kick/role
- Add member management API client methods (AddMember, ListMembers, UpdateMemberRole, RemoveMember)
- Add 15 integration tests: serverdb membership (7), sync conflict tracking (3), API collaboration (5)
- Add collaboration guide documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e limit, pagination, exclude_client

- TestConcurrentPush: two goroutines push simultaneously, verify sequential server_seqs and convergence
- TestCrashRecovery: simulate crash by skipping MarkEventsSynced, verify server dedup on re-push
- TestPushRateLimit: 61 pushes with same API key, 61st returns 429
- TestLongSessionPagination: 5000 events pushed in batches, paginated pull verifies completeness
- TestPullExcludeClient: HTTP-layer exclude_client filtering with two users
- Fix harness server DB to use shared cache + WAL for concurrent test support
- Add PushWithoutMark helper to harness for crash simulation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Normalize action_log entity types before push
Add tests for normalization and skip unsupported types
Align sync entity allowlist with spec
Add review report
…ndencies, file links

Board positions, dependencies, and file links now sync across devices using
deterministic SHA-256 IDs derived from composite keys. File paths are stored
repo-relative with normalized separators. Entity types are normalized from
short names to canonical table names before push.

- Add deterministic ID generation (bip_, dep_, ifl_ prefixes)
- Migrations v18 (deterministic IDs) and v19 (repo-relative paths)
- Normalize entity types in sync push (board_position → board_issue_positions)
- Update action_log entries with full row data for new entity types
- Add undo support for board position actions
- 10 new sync harness tests covering all composite entities
- Update sync-client-guide.md with new entity documentation

Closes: td-10f2bb

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
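
The deterministic IDs described above can be illustrated with a short sketch. The prefixes (bip_, dep_, ifl_) come from the commit message; the NUL separator, the 16-character truncation, and both function names are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// deterministicID hashes the composite key parts and prepends a type
// prefix, so every device derives the same ID for the same row.
// Joining on NUL avoids ambiguity such as ("ab","c") vs ("a","bc").
func deterministicID(prefix string, parts ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return prefix + hex.EncodeToString(sum[:])[:16]
}

// normalizeRepoPath converts backslashes to forward slashes. It uses
// strings.ReplaceAll instead of filepath.ToSlash so that a path written
// on Windows normalizes the same way when processed on any platform.
func normalizeRepoPath(p string) string {
	return strings.ReplaceAll(p, "\\", "/")
}

func main() {
	a := deterministicID("dep_", "td-10f2bb", "td-b55f9e")
	b := deterministicID("dep_", "td-10f2bb", "td-b55f9e")
	fmt.Println(a == b) // true: same composite key, same ID

	path := normalizeRepoPath(`cmd\sync.go`)
	fmt.Println(path) // cmd/sync.go
	fmt.Println(deterministicID("ifl_", "td-10f2bb", path))
}
```
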
AddLog, AddComment, CreateWorkSession, and UpdateWorkSession now insert
action_log entries so these entities are included in sync push/pull.
Undo filters out non-undoable entity types (logs, comments, work_sessions)
to avoid regressions.

- Raw action_log INSERT inside withWriteLock to avoid deadlock
- GetLastAction/GetRecentActions exclude non-undoable types
- Safety-net case in performUndo for sync-only entity types
- 4 new sync harness tests for activity entity round-trips
- Updated sync-client-guide: work_sessions now synced

Closes: td-b55f9e

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sync test harness now uses db.BaseSchema() instead of a duplicated
schema constant. This prevents schema drift between the harness and the
real application. Sync-specific tables (action_log extensions, sync_state,
sync_conflicts) are applied on top.

- Export BaseSchema() from internal/db/schema.go
- Replace clientSchema with initClientSchema() using real schema
- Add missing sync_conflicts table to harness
- All existing harness tests pass

Closes: td-183738

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TestConcurrentPush was flaky because two goroutines would simultaneously
call Begin() on the shared-cache in-memory SQLite server DB, causing
"database table is locked" errors. Added sync.Mutex to serialize server
writes in Push and PushWithoutMark, matching SQLite's single-writer model.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New internal/crypto package with X25519 keypair generation, AES-256-GCM
encrypt/decrypt, ECDH+HKDF key wrapping, Argon2id passphrase derivation,
and DEK generation. Foundation for Phase 8 sync encryption.

- 8 round-trip tests covering all operations
- Depends on golang.org/x/crypto (argon2, hkdf)

Closes: td-7dfd23

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
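
For readers unfamiliar with the AES-256-GCM primitive mentioned above, here is a standard-library round trip. It is only a sketch: the function names and the nonce-prepended layout are assumptions, and it does not cover the X25519, HKDF, or Argon2id parts of internal/crypto.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// encrypt seals plaintext with AES-256-GCM and prepends the random nonce.
func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and authenticates and opens the rest.
func decrypt(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("ciphertext shorter than nonce")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := encrypt(key, []byte("event payload"))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err) // event payload <nil>
}
```
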
Removal actions (remove_dependency, unlink_file, board_unposition,
board_delete, board_remove_issue) were mapped to "update" by default,
preventing deletes from propagating to other clients.

Refs: td-1f6faf

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Server schema v3: user_public_keys, encrypted_private_keys,
project_key_epochs, wrapped_project_keys. Add key_id to events table.

Refs: td-5b4fcc

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Before this fix, board unposition logged action data without the
previous position value, making undo a no-op. Now captures position
via GetIssuePosition before deletion and includes it in action log.

Refs: td-03ee72

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e3a)

Server now looks up existing server_seq for duplicate events and returns
it in the rejection response. Client treats duplicate rejections as acks,
marking those action_log rows synced. Prevents infinite re-push loop
after crash/network loss during push.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
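
The duplicate-as-ack behavior described above can be sketched like this. The struct shapes, field names, and the "duplicate" reason string are hypothetical; the real types live in internal/sync/types.go and may differ.

```go
package main

import "fmt"

// Ack and Rejection are assumed shapes for a push response.
type Ack struct {
	ClientActionID string
	ServerSeq      int64
}

type Rejection struct {
	ClientActionID string
	Reason         string
	ServerSeq      int64 // set by the server when Reason is "duplicate"
}

type PushResult struct {
	Accepted []Ack
	Rejected []Rejection
}

// syncedSeqs maps each action ID that can be marked synced to its
// server_seq. Duplicate rejections count as acks, because the server
// already holds the event from an earlier, interrupted push.
func syncedSeqs(res PushResult) map[string]int64 {
	out := make(map[string]int64)
	for _, a := range res.Accepted {
		out[a.ClientActionID] = a.ServerSeq
	}
	for _, r := range res.Rejected {
		if r.Reason == "duplicate" && r.ServerSeq > 0 {
			out[r.ClientActionID] = r.ServerSeq
		}
	}
	return out
}

func main() {
	res := PushResult{
		Accepted: []Ack{{"al-1", 10}},
		Rejected: []Rejection{
			{"al-2", "duplicate", 7},
			{"al-3", "invalid_entity", 0},
		},
	}
	fmt.Println(len(syncedSeqs(res))) // 2: al-1 and al-2 are marked synced
}
```

Without this rule, a row whose push succeeded but whose ack was lost would be rejected as a duplicate on every retry and never leave the pending set.
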
… (td-c995ba)

Added normalizeFieldsForDB() in upsertEntity to convert non-scalar JSON
values to DB-compatible strings. Labels arrays become comma-separated text,
all other arrays/objects become JSON strings. Prevents SQLite driver errors
on pull for issues with labels or handoff payloads.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
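
A rough sketch of the normalization described above, assuming the event payload has been decoded with encoding/json into a map. The function name `normalizeFields` and the "labels" key check are illustrative; the actual normalizeFieldsForDB() may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// normalizeFields converts non-scalar JSON values into strings that a
// SQLite driver can bind: labels arrays become comma-separated text,
// every other array or object becomes its JSON encoding.
func normalizeFields(fields map[string]any) (map[string]any, error) {
	out := make(map[string]any, len(fields))
	for k, v := range fields {
		switch val := v.(type) {
		case []any:
			if k == "labels" {
				parts := make([]string, 0, len(val))
				for _, item := range val {
					parts = append(parts, fmt.Sprint(item))
				}
				out[k] = strings.Join(parts, ",")
				continue
			}
			b, err := json.Marshal(val)
			if err != nil {
				return nil, err
			}
			out[k] = string(b)
		case map[string]any:
			b, err := json.Marshal(val)
			if err != nil {
				return nil, err
			}
			out[k] = string(b)
		default:
			out[k] = v // scalars pass through unchanged
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"title":"Fix sync","labels":["bug","sync"],"payload":{"done":["a"]}}`)
	var fields map[string]any
	if err := json.Unmarshal(raw, &fields); err != nil {
		panic(err)
	}
	out, err := normalizeFields(fields)
	if err != nil {
		panic(err)
	}
	fmt.Println(out["labels"])  // bug,sync
	fmt.Println(out["payload"]) // {"done":["a"]}
}
```
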
AddHandoff now creates action_log rows with full handoff data within
the same write lock. Removed redundant manual LogAction for cascaded
handoffs. Added 'handoff' → 'create' mapping in sync mapActionType
so handoffs sync as creates instead of updates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nc (td-681716)

ApplyRemoteEvents now accepts lastSyncAt parameter and uses
localModifiedSinceSync() to check if the old row's timestamp
post-dates last sync. Overwrites of unchanged data no longer
pollute sync_conflicts. nil lastSyncAt (bootstrap) skips all
conflict recording.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…6f29)

Server: GET /v1/projects/{id}/sync/snapshot replays all events into a
temp SQLite DB with full td schema, streams it as application/x-sqlite3
with X-Snapshot-Event-Id header for cursor positioning.

Client: GetSnapshot() downloads snapshot and returns data + sequence.
Enables fast bootstrap without replaying entire event log.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…eriodic monitor sync (td-88b016)

Adds automatic background sync: PersistentPostRun hook pushes pending events
after mutating commands, and monitor TUI syncs every 30s via callback.
Controlled by TD_AUTO_SYNC env var (default: enabled).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nt wiring (td-816643)

Build snapshot DB with all migrations (not just BaseSchema), add server-side
snapshot caching, and wire client bootstrap path that auto-downloads snapshots
when first syncing projects with many events.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
marcus and others added 27 commits February 2, 2026 21:55
Adds deterministic regression testing infrastructure:
- regression_seeds.json: stores known seeds with metadata (test, args, fixed status)
- run_regression_seeds.sh: runs stored seeds with filtering (--fixed-only, --unfixed-only)

Supports CI integration with JSON output and proper exit codes (1 on regression).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…d-92176a)

- ClearActionLogSyncState: clears synced_at and server_seq, returns rows affected
- CountSyncedEvents: counts action_log entries with synced_at set

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When linking to a different project, prompts user if synced events exist.
Adds --force flag to skip confirmation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds --force flag and interactive prompt to optionally clear action_log
sync state when unlinking, allowing events to be re-pushed to a new project.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests the scenario where a client must re-sync all events after
migrating to a new server. Includes tests for:
- Single client, single event
- Single client, multiple events
- Two clients with convergence verification

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add deploy.sh script for unified deployments (dev/staging/prod)
- Add environment templates in deploy/envs/ with .example files
- Add docker-compose overrides per environment in deploy/compose/
- Add litestream configs per environment in deploy/litestream/
- Update docs to use new deploy system instead of manual rsync
- Add gitignore for .env files and personal runbooks (*.local.md)

Usage:
  ./deploy/deploy.sh dev        # Local development
  ./deploy/deploy.sh staging    # Deploy to staging VPS
  ./deploy/deploy.sh prod       # Deploy to production VPS

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use sql.NullString for id scan and skip rows with NULL/empty id,
logging a warning with rowid for debugging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Migration 26 fixes existing NULL/empty action_log.id values by generating
random 'al-' prefixed IDs, then recreates the table with NOT NULL constraint.
Uses standard table recreation pattern (CREATE _new, INSERT, DROP, RENAME).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TestGetPendingEvents_NullID that verifies action_log rows with NULL id
are skipped without error or panic while valid rows are still processed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change TestInterleavedSync to use soft_delete instead of hard delete
- Update dumpTable to filter soft-deleted rows for issues and board_issue_positions
- Update test assertion to check deleted_at is set rather than row missing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add case for "board" and "boards" entity types in performUndo switch
- Create undoBoardAction function to handle board create/delete/update undo
- Add RestoreBoard function in db/boards.go for restoring deleted boards
- Handle both logged ("board_create") and backfill ("create") action types
- Exclude builtin boards from backfill to prevent undo errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
td create requires titles to be 15+ characters. Fixed several e2e test
titles that were too short:
- rapid-cd-N (10 chars) -> rapid-create-delete-N (21 chars)
- herd-alice-N (12 chars) -> thundering-herd-alice-N (23 chars)
- herd-bob-N (10 chars) -> thundering-herd-bob-N (21 chars)
- cascade-parent (14 chars) -> cascade-conflict-parent-issue (29 chars)
- cascade-child-N (15 chars) -> cascade-conflict-child-N (24 chars)
- dep-cycle-A/B/C (11 chars) -> dependency-cycle-issue-A/B/C (24 chars)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Undo operations now use *Logged DB methods so changes propagate to other
clients via sync. Previously undo performed local DB changes without
creating action_log entries.

Changes:
- undoIssueAction: use DeleteIssueLogged, UpdateIssueLogged, LogAction(restore)
- undoDependencyAction: use RemoveDependencyLogged, AddDependencyLogged
- undoFileLinkAction: use UnlinkFileLogged, LinkFileLogged
- undoBoardPositionAction: use RemoveIssuePositionLogged, SetIssuePositionLogged
- undoHandoffAction: LogAction(delete) after DeleteHandoff
- undoBoardAction: use DeleteBoardLogged, UpdateBoardLogged, LogAction(restore)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add RestoreIssueLogged() to log restore actions
- Map "restore" action type in sync client
- Add restoreEntity() handler in ApplyRemoteEvents

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When two clients add conflicting dependencies simultaneously (e.g., A->B
on client-1 and B->A on client-2), the sync would previously create a
cycle since local validation only sees partial state.

This fix adds cycle detection in applyEventWithPrevious for
issue_dependencies create events. Uses a deterministic rule for
convergence: the edge with lexicographically smaller (issue_id,
depends_on_id) wins. If incoming edge should win, the conflicting
edge is removed.

- Added wouldCreateCycleTx and related helpers for tx-based cycle check
- Added checkAndResolveCyclicDependency with deterministic conflict resolution
- Updated ScenarioDependencyCycle e2e test to cover distributed scenario
- Added unit tests for cycle detection and resolution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
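
The deterministic tie-break above can be illustrated with an in-memory sketch. All names here are hypothetical and this is not the wouldCreateCycleTx implementation. It also makes two assumptions the commit message does not state: on a cycle longer than two edges, the lexicographically largest existing edge on the cycle is the one compared against the incoming edge, and only one cycle path is resolved per call.

```go
package main

import "fmt"

// Edge means IssueID depends on DependsOnID.
type Edge struct{ IssueID, DependsOnID string }

// less orders edges lexicographically by (IssueID, DependsOnID).
func less(a, b Edge) bool {
	if a.IssueID != b.IssueID {
		return a.IssueID < b.IssueID
	}
	return a.DependsOnID < b.DependsOnID
}

type Graph struct{ deps map[string]map[string]bool }

func NewGraph() *Graph { return &Graph{deps: map[string]map[string]bool{}} }

func (g *Graph) add(e Edge) {
	if g.deps[e.IssueID] == nil {
		g.deps[e.IssueID] = map[string]bool{}
	}
	g.deps[e.IssueID][e.DependsOnID] = true
}

func (g *Graph) remove(e Edge)   { delete(g.deps[e.IssueID], e.DependsOnID) }
func (g *Graph) Has(e Edge) bool { return g.deps[e.IssueID][e.DependsOnID] }

// pathEdges returns the edges of one dependency path from -> to,
// or nil when no such path exists.
func (g *Graph) pathEdges(from, to string, seen map[string]bool) []Edge {
	if from == to {
		return []Edge{}
	}
	seen[from] = true
	for next := range g.deps[from] {
		if seen[next] {
			continue
		}
		if rest := g.pathEdges(next, to, seen); rest != nil {
			return append([]Edge{{from, next}}, rest...)
		}
	}
	return nil
}

// ApplyRemoteEdge adds an incoming edge unless it would close a cycle
// and loses the tie-break. It reports whether the edge was applied.
func (g *Graph) ApplyRemoteEdge(in Edge) bool {
	if in.IssueID == in.DependsOnID {
		return false // self-dependency is always a cycle
	}
	path := g.pathEdges(in.DependsOnID, in.IssueID, map[string]bool{})
	if path == nil {
		g.add(in)
		return true
	}
	loser := path[0]
	for _, e := range path[1:] {
		if less(loser, e) {
			loser = e
		}
	}
	if less(in, loser) {
		g.remove(loser) // incoming edge wins, conflicting edge is removed
		g.add(in)
		return true
	}
	return false
}

func main() {
	c1, c2 := NewGraph(), NewGraph()
	c1.add(Edge{"A", "B"}) // client-1 added A->B locally
	c2.add(Edge{"B", "A"}) // client-2 added B->A locally

	c1.ApplyRemoteEdge(Edge{"B", "A"})
	c2.ApplyRemoteEdge(Edge{"A", "B"})

	fmt.Println(c1.Has(Edge{"A", "B"}), c1.Has(Edge{"B", "A"})) // true false
	fmt.Println(c2.Has(Edge{"A", "B"}), c2.Has(Edge{"B", "A"})) // true false
}
```

Both clients end with only A->B, regardless of which edge each one saw first.
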
Tests sync convergence when client clocks differ significantly by
directly manipulating action_log timestamps. Covers:
- Forward skew: client A's clock 5 minutes ahead
- Backward skew: client A's clock 5 minutes behind
- Symmetric skew: A ahead, B behind (6 minute total drift)
- Soft-delete/restore ordering with skewed timestamps

All test scenarios verify convergence passes despite clock drift,
demonstrating the sync protocol handles timestamp-based ordering
correctly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests concurrent sync operations from the same client:
- Scenario 1: Parallel sync commands (multiple td sync in parallel)
- Scenario 2: Rapid-fire syncs (10+ syncs in quick succession)
- Scenario 3: Server consistency and convergence verification

Verifies no duplicate events pushed, sync_state consistency, and
graceful handling of DB lock contention.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add UndoLastAction() helper to syncharness and 7 test scenarios:
1. Undo BEFORE sync - undone event never sent
2. Undo AFTER sync - compensating soft_delete propagates
3. Undo update AFTER sync - reverts to previous state
4. Remote modification then undo - LWW determines outcome
5. Undo restore (re-delete) - restore event propagates
6. Undo already-synced - compensating event propagates
7. Undo then redo - toggles between states

The UndoLastAction helper:
- Finds last non-undone action by rowid (excludes backfill entries)
- Marks action as undone=1
- Inserts compensating event (soft_delete/restore/update)
- Mirrors cmd/undo.go behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --soak flag to test_chaos_sync.sh for extended duration testing
that detects resource leaks. Collects metrics every 30s:
- Memory (Go runtime MemStats)
- Goroutines
- File descriptors (server process)
- SQLite WAL sizes
- Directory growth

New components:
- cmd/debug_stats.go: td debug-stats outputs runtime stats as JSON
- harness.sh: init_soak_metrics(), collect_soak_metrics()
- chaos_lib.sh: verify_soak_metrics(), soak_metrics_summary()
- test_chaos_sync.sh: --soak [duration] flag (default 30m)

Configurable thresholds via env vars (SOAK_MEM_GROWTH_PERCENT, etc).
Writes soak-metrics.jsonl for post-hoc analysis.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add soak/endurance test (--soak mode in chaos_sync)
- Add undo-after-sync tests (7 scenarios in syncharness)
- Add Operations & Lifecycle section
- Update syncharness scenario count (18 -> 25)
- Add verify_soak_metrics() and UndoLastAction() to verification functions
- Move completed items out of gaps section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…d-1979a8)

Replace process substitution with temp files in chaos_lib.sh to avoid
bash FIFO hangs when calling comm from deeply nested loops. The process
substitution pattern <(echo "$var" | sort) can create stale FIFOs
when called repeatedly in tight loops, causing indefinite hangs.

Fixed in:
- verify_convergence_quick() - 1 occurrence
- verify_convergence() - 2 occurrences

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Go timestamps can be either:
- "2006-01-02 15:04:05.999999 -0700 MST" (with microseconds)
- "2006-01-02 15:04:05 -0700 MST" (no microseconds at exact second)

substr(1, 26) failed on the second format, extracting partial timezone
and causing SQLite datetime() to return NULL, violating NOT NULL constraint.

Changed to substr(1, 19) to extract only "YYYY-MM-DD HH:MM:SS" which works
consistently for both formats.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
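
The reasoning behind the substr(1, 19) fix can be checked outside SQL. The Go sketch below is an analogue, not the migration itself: it keeps only the leading "YYYY-MM-DD HH:MM:SS" and shows that both stored formats then parse identically. The helper name and sample values are made up, and like the SQL fix it discards fractional seconds and the zone offset.

```go
package main

import (
	"fmt"
	"time"
)

// sqliteLayout is 19 characters long: "YYYY-MM-DD HH:MM:SS".
const sqliteLayout = "2006-01-02 15:04:05"

// parseLeading19 parses only the first 19 characters of a stored
// timestamp, ignoring any fractional seconds and zone suffix.
func parseLeading19(s string) (time.Time, error) {
	if len(s) < len(sqliteLayout) {
		return time.Time{}, fmt.Errorf("timestamp too short: %q", s)
	}
	return time.Parse(sqliteLayout, s[:len(sqliteLayout)])
}

func main() {
	samples := []string{
		"2026-02-05 22:03:00.123456 +0000 UTC", // with microseconds
		"2026-02-05 22:03:00 +0000 UTC",        // exact second, no fraction
	}
	for _, s := range samples {
		t, err := parseLeading19(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(t.Format(sqliteLayout)) // 2026-02-05 22:03:00
	}
}
```
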
marcus merged commit 8dc5fed into main on Feb 5, 2026
marcus deleted the codex/sync-flag-framework branch on February 5, 2026 22:03