Pipeline Design 22
Shipwright's fleet orchestrator (scripts/sw-fleet.sh) currently requires manual configuration of repos in .claude/fleet-config.json. Users managing GitHub organizations with many repos must hand-edit this config for each repo, which is error-prone and doesn't adapt as repos are created, archived, or change activity levels.
Constraints from the codebase:
- All scripts must be Bash 3.2 compatible (no associative arrays, no readarray, no ${var,,})
- Scripts use set -euo pipefail, atomic file writes (tmp + mv), and jq --arg for JSON
- GitHub API calls must respect the $NO_GITHUB env var (existing pattern across all modules)
- The fleet already has a background loop pattern: fleet_rebalance() runs on an interval via sleep in a backgrounded subshell; the rediscovery loop should follow the same pattern
- gh api is the standard GitHub client (not raw curl), used throughout sw-github-graphql.sh and sw-fleet.sh
- Fleet config lives at .claude/fleet-config.json with a known schema (repos[], worker_pool, etc.)
Extend sw-fleet.sh with a discover subcommand and fleet_rediscover_loop() background process.
gh api /orgs/{org}/repos --paginate
→ JSON array of repos
→ Filter: language, pushed_at > activity_days, topics, has_open_issues, !archived, !disabled, !fork (unless --include-forks)
→ Opt-out check: skip repos with "shipwright-ignore" topic
→ Opt-out check: skip repos where gh api /repos/{owner}/{repo}/contents/.shipwright-ignore returns 200
→ Generate fleet-config.json entries (or merge with existing)
→ Output summary / dry-run report
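The filter stage of the flow above could be sketched as a single jq pass. This is a minimal illustration, not the actual implementation: the function name filter_repos and its argument order are hypothetical, the field names (archived, disabled, fork, language, pushed_at, topics, full_name) are those of the GitHub REST repository object, and only a subset of the listed filters is shown. The cutoff timestamp is passed in by the caller because date arithmetic differs between GNU and BSD date.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (name and arguments are illustrative, not from sw-fleet.sh).
# Reads a JSON array of repo objects on stdin and prints the full_name of each
# repo that passes the filters, one per line.
#   $1 cutoff         ISO 8601 UTC timestamp; repos pushed before it are dropped
#   $2 language       primary language to require ("" = any)
#   $3 include_forks  "true" to keep forks, anything else to drop them
filter_repos() {
  local cutoff="$1" language="$2" include_forks="$3"
  jq -r --arg cutoff "$cutoff" --arg lang "$language" --arg forks "$include_forks" '
    .[]
    | select(.archived != true and .disabled != true)
    | select($forks == "true" or .fork != true)
    | select($lang == "" or .language == $lang)
    # ISO 8601 UTC timestamps in the same format compare correctly as strings
    | select(.pushed_at >= $cutoff)
    # topic-based opt-out: free, because topics are part of the listing response
    | select(((.topics // []) | index("shipwright-ignore")) == null)
    | .full_name
  '
}
```

A caller would pipe the repo listing into it, for example `filter_repos "2024-01-01T00:00:00Z" "Go" false`. Checking the topic here means the per-repo .shipwright-ignore file lookup only has to run for repos that survive every other filter.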
shipwright fleet discover --org <org> [flags]
Flags:
- --org <name>: GitHub org (required)
- --language <lang>: filter by primary language
- --activity-days <N>: only repos pushed within N days (default: 90)
- --topic <topic>: require this topic (multiple topics may be given as a comma-separated list)
- --has-issues: only repos with open issues
- --include-forks: include forked repos (excluded by default)
- --merge: merge discovered repos into existing config rather than overwriting
- --dry-run: print what would be added, don't write config
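One way to parse these flags without Bash 4 features is a plain while/case loop. The sketch below is illustrative only: the function name parse_discover_args and the DISCOVER_* variable names are assumptions, not names taken from sw-fleet.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parser for the discover flags; all names here are illustrative.
# Results are left in global DISCOVER_* variables (Bash 3.2 has no associative arrays).
parse_discover_args() {
  DISCOVER_ORG=""
  DISCOVER_LANGUAGE=""
  DISCOVER_ACTIVITY_DAYS=90
  DISCOVER_TOPICS=""
  DISCOVER_HAS_ISSUES=false
  DISCOVER_INCLUDE_FORKS=false
  DISCOVER_MERGE=false
  DISCOVER_DRY_RUN=false

  while [ $# -gt 0 ]; do
    case "$1" in
      --org|--language|--activity-days|--topic)
        # Flags that take a value need a second argument
        if [ $# -lt 2 ]; then
          echo "missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --org)           DISCOVER_ORG="$2" ;;
          --language)      DISCOVER_LANGUAGE="$2" ;;
          --activity-days) DISCOVER_ACTIVITY_DAYS="$2" ;;
          --topic)         DISCOVER_TOPICS="$2" ;;  # comma-separated list
        esac
        shift 2
        ;;
      --has-issues)    DISCOVER_HAS_ISSUES=true; shift ;;
      --include-forks) DISCOVER_INCLUDE_FORKS=true; shift ;;
      --merge)         DISCOVER_MERGE=true; shift ;;
      --dry-run)       DISCOVER_DRY_RUN=true; shift ;;
      *)
        echo "unknown flag: $1" >&2
        return 1
        ;;
    esac
  done

  # --org is the only required flag
  if [ -z "$DISCOVER_ORG" ]; then
    echo "--org is required" >&2
    return 1
  fi
}
```

Calling `parse_discover_args --org acme --topic infra,backend --dry-run` leaves the defaults in place for everything not mentioned, so the activity window stays at 90 days.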
{
"repos": [...],
"worker_pool": {...},
"auto_discover": {
"enabled": false,
"org": "my-org",
"interval_seconds": 3600,
"filters": {
"language": null,
"activity_days": 90,
"has_issues": false,
"topics": [],
"include_forks": false
}
}
}
fleet_rediscover_loop() follows the same pattern as fleet_rebalance():
- Spawned as a backgrounded subshell from fleet_start()
- Sleeps for interval_seconds, then calls fleet_discover --org "$org" --merge
- On new repos found, writes a fleet-rediscover.flag file
- Main fleet loop checks for this flag file and calls fleet_add_repo() to hot-add repos to running daemons
- Flag file removed after processing
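The loop steps above might look roughly like the following. This is a sketch under stated assumptions: discover_new_repos is a stand-in stub for the real discovery call (the source does not say how fleet_discover reports newly found repos), and rediscover_once and rediscover_loop are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the real discovery call. Assumption: it prints newly found
# repos one per line and prints nothing when there is nothing new.
discover_new_repos() { :; }

# One iteration: run discovery and, if anything new turned up, write the flag
# file via tmp + mv so the main loop never observes a half-written file.
rediscover_once() {
  local flag_file="$1" new_repos tmp
  new_repos="$(discover_new_repos)" || {
    echo "WARN: discovery failed; will retry at the next interval" >&2
    return 0
  }
  [ -n "$new_repos" ] || return 0
  tmp="${flag_file}.tmp.$$"
  printf '%s\n' "$new_repos" > "$tmp"
  mv "$tmp" "$flag_file"
}

# Sleep-then-scan loop, intended to run in a backgrounded subshell.
rediscover_loop() {
  local interval="$1" flag_file="$2"
  while true; do
    sleep "$interval"
    rediscover_once "$flag_file"
  done
}

# Spawned from the fleet start path, for example:
#   ( rediscover_loop 3600 "$state_dir/fleet-rediscover.flag" ) &
```

Keeping a single iteration in its own function makes the behaviour testable without waiting on sleep, and a failed discovery only logs a warning so the loop survives until the next interval.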
fleet_add_repo() adds a repo entry to the in-memory config, starts a daemon for the new repo (following existing fleet_start_repo() patterns), and updates the fleet status file. This avoids restarting the entire fleet for newly discovered repos.
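On the consuming side, the main loop's flag handling could be sketched as below. The fleet_add_repo here is only a recording stub standing in for the real function, and process_rediscover_flag and the one-repo-per-line flag file format are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub standing in for the real fleet_add_repo(); it only records what it was given.
ADDED_REPOS=""
fleet_add_repo() { ADDED_REPOS="${ADDED_REPOS}${ADDED_REPOS:+ }$1"; }

# Hypothetical main-loop step: claim the flag file, hot-add each listed repo,
# then remove the claimed copy.
process_rediscover_flag() {
  local flag_file="$1" claimed repo
  [ -f "$flag_file" ] || return 0
  claimed="${flag_file}.processing"
  # Rename first so a flag written during processing is not lost or half-read
  mv "$flag_file" "$claimed"
  while IFS= read -r repo; do
    if [ -n "$repo" ]; then
      fleet_add_repo "$repo"
    fi
  done < "$claimed"
  rm -f "$claimed"
}
```

Renaming the flag before reading it is a design choice rather than something the source specifies: it lets the background loop write a fresh flag while the current one is still being processed.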
Extend the existing status output with a topology section:
- Repos grouped by machine (local vs. each remote)
- Workers allocated per repo
- Active/queued job counts
- Auto-discover: enabled/disabled, last scan timestamp, next scan ETA
- $NO_GITHUB set → fleet_discover() prints warning and exits 0 (no-op, consistent with other modules)
- gh api failures → logged via warn(), discovery aborted for that run, next interval retries
- Invalid org / 404 → error() + exit 1 for CLI, warn() + continue for background loop
- Rate limiting → gh api handles retry headers natively; if pagination fails mid-stream, partial results are discarded (no partial writes)
- .shipwright-ignore file check failure (network error) → repo is included (fail-open, user can always add shipwright-ignore topic as the reliable opt-out)
- Atomic config writes: write to fleet-config.json.tmp, then mv into place
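Two of these rules, the $NO_GITHUB no-op and the "no partial writes" guarantee, can be illustrated with a short sketch. The function names github_enabled and write_config_atomically are hypothetical, and validating the temp file with jq before the rename is an added assumption, not something the source states.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns non-zero (after a warning) when GitHub access is disabled, so a
# caller can treat discovery as a no-op:  github_enabled || exit 0
github_enabled() {
  if [ -n "${NO_GITHUB:-}" ]; then
    echo "WARN: NO_GITHUB is set; skipping GitHub discovery" >&2
    return 1
  fi
}

# Reads the new config from stdin, writes it to <target>.tmp, and renames it
# into place only if it is non-empty, parseable JSON. On failure the existing
# config is left untouched and the temp file is removed.
write_config_atomically() {
  local target="$1" tmp="${1}.tmp"
  cat > "$tmp"
  if [ -s "$tmp" ] && jq empty "$tmp" >/dev/null 2>&1; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Because mv within one filesystem is a rename, readers see either the old config or the new one, never a truncated file from an interrupted pagination run.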
gh api --paginate handles GitHub's Link-header pagination automatically. For orgs with 1000+ repos, this produces a single concatenated JSON array. We pipe through jq filters in a single pass.
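As a defensive sketch, the paginated output can be normalised before filtering so the jq pass does not depend on exactly how the pages arrive. The helper name merge_pages is hypothetical; slurping with jq -s and concatenating yields the same flat array whether the input is one JSON array or several arrays back to back.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: collapse one or more JSON arrays on stdin into a single
# flat array. Empty input yields [] rather than null.
merge_pages() {
  jq -s 'add // []'
}

# Intended use (requires network access and gh authentication, so not run here):
#   gh api "/orgs/$org/repos" --paginate | merge_pages > "$tmp_listing"
```

Writing the merged listing to a temp file first, and only filtering it once the gh call has exited successfully, is one way to honour the rule that a mid-stream pagination failure discards partial results.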
- GitHub GraphQL API via sw-github-graphql.sh. Pros: single request for all data including topics, richer filtering server-side, lower API call count. Cons: sw-github-graphql.sh is designed for per-repo queries within a known repo context, not org-wide scans; would require new query templates and caching logic; REST gh api --paginate is simpler and already used in fleet for health checks; GraphQL org queries require different auth scopes. Rejected: unnecessary complexity for the use case.
- Separate discovery script (sw-fleet-discover.sh). Pros: smaller files, clear separation. Cons: discovery is tightly coupled to fleet config schema and hot-add; a separate file would need to import fleet internals or duplicate them; the existing fleet script already handles config loading, status, and rebalancing, so discovery is a natural extension. Rejected: would create coupling issues without meaningful separation benefit.
- GitHub App / webhook-based discovery. Pros: real-time repo creation events, no polling. Cons: requires a running server or Lambda, GitHub App setup, dramatically increases infrastructure complexity; polling at 1-hour intervals is sufficient for fleet management where daemon startup itself takes seconds. Rejected: over-engineered for the use case.
- Files to create: None
- Files to modify:
  - scripts/sw-fleet.sh: Add fleet_discover(), fleet_rediscover_loop(), fleet_add_repo(), topology in fleet_status(), CLI parsing for discover subcommand, load_fleet_config() updates for auto_discover block
  - scripts/sw-fleet-test.sh: 13 new test cases covering discover, filters, opt-out, merge, dry-run, rediscovery loop, hot-add, topology display, NO_GITHUB handling
  - .claude/CLAUDE.md: Document fleet discover command, auto_discover config keys, topology status output
- Dependencies: None new. Uses existing gh, jq, standard POSIX tools.
- Risk areas:
  - Pagination memory for large orgs: gh api --paginate concatenates all pages into memory. For orgs with 5000+ repos, this could be several MB of JSON. Acceptable for bash fleet management; not a realistic bottleneck.
  - .shipwright-ignore file checks: One API call per discovered repo to check for the file. For 100 repos, that's 100 sequential API calls. Mitigate by checking the shipwright-ignore topic first (free, already in the repo listing response) and only checking the file for repos that pass all other filters. Consider caching results in the rediscovery loop.
  - Race condition on hot-add: Rediscovery and the rebalancer may both modify the config at the same time. Mitigate with atomic writes and flag-file signaling (rebalancer processes flag after its current cycle).
  - --merge correctness: Must match repos by path field (local repos) or remote URL. Repos already in config should not be duplicated. Use jq to deduplicate by a canonical key.
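The --merge deduplication could be sketched with jq as follows. This is illustrative only: merge_discovered_repos is a hypothetical name, and the sketch assumes every repo entry carries a path field that serves as the canonical key; matching by remote URL would need a different key expression.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the existing config with newly discovered repos
# appended. Existing entries are kept as-is; discovered entries whose path is
# already present (or repeated within the discovered list) are dropped.
#   $1 existing fleet config file (object with a repos[] array)
#   $2 file holding a JSON array of discovered repo entries
merge_discovered_repos() {
  local config_file="$1" discovered_file="$2"
  jq -s '
    .[0] as $cfg
    | .[1] as $found
    | ($cfg.repos // []) as $cur
    | ($cur | map(.path)) as $known
    | $cfg + {repos: ($cur + ($found
        | unique_by(.path)
        | map(select(.path as $p | ($known | index($p)) == null))))}
  ' "$config_file" "$discovered_file"
}
```

Other top-level keys such as worker_pool pass through unchanged, and the output is meant to go through the same tmp + mv write as any other config update.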
- shipwright fleet discover --org test-org --dry-run lists repos without modifying config
- shipwright fleet discover --org test-org generates valid fleet-config.json with discovered repos
- --language, --activity-days, --topic, --has-issues, --include-forks filters reduce the repo list correctly
- Repos with shipwright-ignore topic are excluded from discovery
- Repos with .shipwright-ignore file are excluded from discovery
- --merge adds new repos to existing config without duplicating or removing existing entries
- auto_discover config block is parsed by load_fleet_config() and drives fleet_rediscover_loop()
- Background rediscovery loop fires at configured interval and hot-adds new repos via flag file
- fleet_status() displays topology with repos grouped by machine
- NO_GITHUB=1 causes discover to no-op with a warning
- All 13 new test cases pass in sw-fleet-test.sh
- All 22 existing test suites continue to pass (npm test)
- No Bash 3.2 incompatibilities (no associative arrays, no readarray, no ${var,,})