Skip to content

Substitution safety, portability, deploy refactor, tests#6

Merged
fichte merged 1 commit into
mainfrom
agentic-update
Apr 27, 2026
Merged

Substitution safety, portability, deploy refactor, tests#6
fichte merged 1 commit into
mainfrom
agentic-update

Conversation

@fichte

@fichte fichte commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Summary

  • Substitution safety: new safe_replace_token (perl s///ge) replaces every sed -i 's;__VAR__;…;g' call site that silently corrupted compose/config files when a value contained ;, &, \, $1, or $2y$10$… — bcrypt hashes were the most-affected. Adds array_contains to fix the substring regex that let "rod" validate against "prod". Fixes the gpd.sh fall-through on invalid --deploy-type.
  • Portability (Linux + macOS + Git-Bash/WSL): new _compat.sh with curl→wget, htpasswd→python3+bcrypt, htpasswd→openssl-apr1 fallbacks. Inline gpd_realpath replaces GNU-only readlink -e. Bash 4+ asserted on startup. _check_openssl.sh swaps GNU-only grep -P for perl. _config_files.sh swaps seq 0 N-1 for array iteration (BSD seq 0 -1 counts down).
  • Deploy refactor: _gpd.sh gets run_in_target/compose_in_target that quote args via printf %q and route locally or via SSH. Every dual-branch in _deploy.sh collapses onto them (333 → 232 lines). Fixes the if ! \$(cmd &>/dev/null) pattern, the CI_REGISTRY_USER checked-twice typo, and adds exponential-backoff retry with fatal-pattern detection in deploy_dry_run.
  • Docs + cleanup: README rewritten with the parent-project layout contract, file formats, three substitution passes, full flag reference, and runtime requirements. CHANGELOG seeded. Dropped the dead -l/--docker-login and -k/--docker-logout flags. Removed htpasswd and wget from required binaries.
  • Tests + CI: 40-case bats suite covering substitution (every sed-killer + perl gotcha), array_contains, every _compat.sh fallback (with mask_command helper), and end-to-end generate against tests/fixtures/parent-stack/. GitHub Actions matrix runs on Ubuntu + macOS (with brew bash + bats-core + zstd + gnu-getopt) plus a shellcheck pass at severity: error.

Test plan

  • bats tests/bats/ locally on macOS — 40/40 green
  • make gpd-generate against the xyxyx/cloud parent stack — passes end-to-end (substitution, all four bcrypt + apr1 password flows, GeoIP download, checksum)
  • CI on this PR: ubuntu bats, macOS bats, shellcheck — all green
  • Manual deploy run against a non-local* SSH target — out of scope here, follow-up before merge

…add tests

Substitution safety
- New functions/_substitute.sh exposing safe_replace_token, a perl
  s///ge helper that treats values as opaque strings. Replaces every
  sed -i 's;__VAR__;...;g' call site that silently corrupted compose
  and config files when a value contained ;, &, \, $1, or $2y$10$.
  The bug was most visible with bcrypt hashes (Portainer, OpenSearch,
  STACK_ADMIN), which sometimes shipped truncated or empty.
- New functions/_array_contains.sh: exact-match argument check. The
  prior `[[ "${arr[@]}" =~ "${needle}" ]]` was a substring regex, so
  `rod` silently validated against an environments file containing
  `prod`.
- gpd.sh: the elif branches for invalid --deploy-type printed an
  error and fell through to the workflow; now they exit 1.
- Drop several useless `$(echo VAR)` subshells that built a name
  string already present in plain text.

Portability (Linux + macOS + Git-Bash/WSL)
- New functions/_compat.sh with gpd_http_get (curl -> wget),
  gpd_bcrypt_hash (htpasswd -> python3+bcrypt), gpd_apr1_hash
  (htpasswd -> openssl passwd -apr1), gpd_htpasswd_create.
- gpd.sh: assert Bash 4+ on startup and inline a gpd_realpath shim
  (readlink -e -> readlink -f -> realpath -> perl Cwd::realpath) so
  the script no longer depends on GNU readlink.
- functions/include.sh: BASH_SOURCE[0] instead of readlink -e $0.
- functions/_check_openssl.sh: replace grep -P (GNU-only) with perl.
- functions/_config_files.sh: replace `for u in $(seq 0 N-1)` with
  `for x in "${arr[@]}"`. BSD seq counts down through -1 instead of
  producing nothing, which made the empty-asset-template path call
  `cp /path/to/asset/ /dest` and explode.
- Drop htpasswd and wget from the required-binary list in
  _generate.sh; both stay preferred when available.

Documentation
- README rewrite: parent project layout contract, file formats, the
  three substitution passes, password-token table, full flag
  reference, runtime requirements (Bash 4+, GNU getopt for macOS),
  platform notes.
- CHANGELOG seeded in keepachangelog format.
- functions/_usage.sh: drop -l/--docker-login and -k/--docker-logout.
  They were parsed into LOGIN_DOCKER/LOGOUT_DOCKER but never read;
  login/logout already happens inline in _deploy.sh.

Deploy refactor (functions/_deploy.sh, functions/_gpd.sh)
- Add run_in_target / compose_in_target to _gpd.sh, routing a command
  to local docker for local* environments or via gpd ssh otherwise.
  Args are shell-quoted with printf %q for the SSH path.
- Collapse every dual-branch local-vs-remote in _deploy.sh onto these
  helpers (333 -> 232 lines, no logic forks).
- Replace `if ! $(cmd &>/dev/null)` with `if ! cmd >/dev/null 2>&1`.
  The old form preserved exit status only because stdout happened to
  be empty under &>/dev/null.
- deploy_login_to_registry: fix copy-paste bug where CI_REGISTRY_USER
  was checked twice instead of CI_REGISTRY_USER + CI_REGISTRY_PASSWORD.
- deploy_logout_from_registry: skip when CI_REGISTRY=null.
- deploy_dry_run: 3 retries with exponential backoff (1/2/4 s) and a
  fatal-pattern grep over stderr (access denied, manifest unknown,
  unauthorized, undefined service, ...) so config errors fail fast
  instead of burning every retry.
- functions/_variables_env.sh: change ${arr[@]} -> ${arr[*]} inside a
  double-quoted string so shellcheck SC2145 stops complaining.

Retries on flaky-network operations
- New -r/--retries=<N> flag (default 3, validated as a positive
  integer) wrapping rsync push, registry login, image pull, and the
  two GeoIP downloads with a new gpd_retry helper in _compat.sh.
  Backoff between attempts is exponential (1s, 2s, 4s, ...).
- gpd_silent companion suppresses stdout/stderr of noisy commands
  (docker login, docker compose pull) so retry warnings stay visible
  while command spam is hidden.
- functions/_gpd.sh rsync mode: switch from `exit 1` to `return 1`
  on transfer failure so callers can re-invoke through gpd_retry.
  Only one caller (_push.sh) and the change is internal.
- The obvious `if "$@"; then return 0; fi; RC=$?` is wrong - bash
  spec: an `if` block with no executed branch exits 0, so RC always
  saw 0. The implementation uses `"$@" || RC=$?` instead and the
  comments call out the gotcha so the next reader doesn't reintroduce
  it.

Tests + CI
- tests/bats: 46 cases covering safe_replace_token (every sed-killer
  + every perl replacement gotcha), array_contains, every _compat.sh
  fallback path (with mask_command in helpers.bash to force the
  fallback), gpd_retry (success first-try, inner exit-status
  preservation, success on attempt N, gives-up-after-MAX, COUNT < 1
  still runs once - sleep stubbed out so tests don't actually wait),
  gpd_silent, and end-to-end generate against tests/fixtures/parent-
  stack/. Fixture is intentionally minimal: one local environment,
  one deploy-type, no nginx/opensearch.
- .github/workflows/test.yml: bats matrix on Ubuntu and macOS (with
  homebrew bash + bats-core + zstd + gnu-getopt installed and
  prepended to GITHUB_PATH) plus an Ubuntu shellcheck pass at
  severity: error. SC2148/SC2034/SC1090/SC1091 excluded.
@fichte fichte merged commit ec476af into main Apr 27, 2026
6 checks passed
@fichte fichte deleted the agentic-update branch April 27, 2026 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant