refactor(briklab/scripts): notion-based lib, Makefile infra entrypoint, E2E hardening#18

Merged
jeanjerome merged 18 commits into main from refactor/briklab-scripts on Jun 5, 2026

@jeanjerome jeanjerome commented Jun 5, 2026

Summary

Structural refactor of briklab/scripts/ (workstream #30) aligning it with brik's
notion doctrine (<notion>.<submodule>.<verb>, single source of truth, small
focused files), plus a clean split of infra lifecycle from the E2E test CLI and
two E2E reliability fixes found while validating end to end.

No behavior change for the lab; the same cmd_* flows now sit behind a shared
scripts/lib/ backbone with two thin dispatchers.

What changed

Transverse + lib restructure

  • Extract transverse helpers into briklab.{log,env,http,wait}.* notions; route
    E2E API clients and polling/container probes through the briklab.http.* and
    briklab.wait.* SoT.
  • Unify auth and verify helpers under briklab.*; fold infra-refresh into the
    recovery layer; split the E2E assertion library into core + report.

Setup / infra decomposition

  • Split ArgoCD Application provisioning out of k3d setup (setup/argocd-apps.sh).
  • Drive the GitLab Nexus CI variables from a table; extract the GIT_ASKPASS
    helper; replace eval with "$@" in verify.cmd.

E2E dedup (scenario layer)

  • scenario.sh: hoist the aggregate-report validation tail and the rollback
    release-chain / Jenkins job-wait helpers, centralize deploy/gitops scenario
    classification, and unify the gitops post-run sync check across suites.

Makefile infra entrypoint (lifecycle split)

  • New root Makefile -> scripts/infra.sh owns the lab lifecycle
    (init/start/stop/restart/clean/k3d-*/versions); briklab.sh keeps
    test/config/ops and redirects the moved commands.
  • Move the version generator into the briklab.versions.* notion
    (scripts/lib/versions.sh); make versions / make versions-check.
  • Shared check_prereqs + load_env bootstrap extracted to lib/cli/prereqs.sh.

E2E fixes

  • Authenticate k3d image pulls against the Nexus registry.
  • Jenkins push scenarios: scan the Multibranch job after each push so branch/tag
    pushes get indexed (no webhook for user-owned Gitea repos, no periodic scan).

Docs

  • docs/architecture.md: Entrypoints section, versions workflow, infra-refresh
    guidance, refreshed directory tree, Known Gotchas.
  • New docs/e2e-known-issues.md (Jenkins multibranch scan + GitLab ArgoCD-token
    reset cases).

Test plan

  • shellcheck --severity=error --external-sources over scripts/ (CI parity): clean.
  • make versions-check: artifacts in sync with versions.yml.
  • make help / make versions; briklab.sh redirects moved lifecycle commands.
  • GitLab E2E: 7/7 (node-full, node-deploy-gitops, node-plan-tag,
    node-full-cve, workflow-trunk-main/tag, node-deploy-rollback) after
    infra-refresh.
  • Jenkins E2E: 8/8 including workflow-trunk-main 9/9 and
    workflow-trunk-tag 10/10 (the scan-after-push fix).

jeanjerome added 18 commits June 5, 2026 14:24
…ttp,wait}

Add lib/transverse/{log,env,http,wait}.sh as single-source-of-truth notion
modules: briklab.log.* (colors + leveled logging), briklab.env.* (.env /
versions.env helpers), briklab.http.* (one curl transport, auth passed by the
caller), and briklab.wait.until (one poll-until-ready loop, no eval).

common.sh becomes a backward-compat facade that sources the four modules and
re-exposes the legacy names (log_*, save_to_env, reload_env, load_env,
load_versions, check_http) plus the colors and root-path vars, so every
existing caller keeps working unchanged. Purely additive: no behavior change.
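
The facade idea can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the two briklab.log.* functions are inlined stand-ins for what would normally be sourced from lib/transverse/log.sh, and the legacy names log_info / log_error are assumed members of the log_* family mentioned above.

```shell
#!/usr/bin/env bash
# Stand-ins for the notion module (assumed names; the real module is sourced).
briklab.log.info()  { printf '[INFO] %s\n' "$*"; }
briklab.log.error() { printf '[ERROR] %s\n' "$*" >&2; }

# Facade: each legacy name forwards to its notion function, so existing
# callers keep working without edits.
log_info()  { briklab.log.info "$@"; }
log_error() { briklab.log.error "$@"; }

log_info "lab ready"   # prints: [INFO] lab ready
```

Dotted function names are a bash feature, so a facade of this shape only works in scripts run by bash rather than a strict POSIX sh.
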
Replace the per-service wait_for_* poll loops (gitlab, gitea, jenkins, nexus),
the ArgoCD port-forward wait, the cli _wait_for_http helper and recovery's
_recover_wait with briklab.wait.until. Route container-state checks through
briklab.check.container_running, fixing infra-refresh's divergent .State.Running
probe. Replace the unsafe kill $(lsof ...) with pkill -f, and give infra-refresh
and infra-verify proper source guards.
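
A single poll-until-ready loop without eval could look something like the sketch below. The signature (timeout, interval, then the probe command and its arguments) is an assumption for illustration; the PR does not show the real one.

```shell
#!/usr/bin/env bash
# Assumed signature: briklab.wait.until <timeout_s> <interval_s> <command> [args...]
briklab.wait.until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  # The probe runs as an argument vector ("$@"), never through eval.
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1            # probe never succeeded within the timeout
    fi
    sleep "$interval"
  done
  return 0
}

# Demo probe: a marker file that appears after about one second.
tmpdir="$(mktemp -d)"
( sleep 1; : > "$tmpdir/ready" ) &
if briklab.wait.until 5 1 test -e "$tmpdir/ready"; then
  echo "ready"
else
  echo "timed out"
fi
wait
rm -rf "$tmpdir"
```

Because the probe is passed as separate words, a per-service wait reduces to one call with a different command, which is what makes the wait_for_* loops collapsible.
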
…port

Add briklab.http.request (response body + trailing status line, never-fail) and
migrate all five E2E API clients (gitlab, jenkins, gitea, nexus, argocd) off raw
curl onto the shared briklab.http.{get,post_json,delete,code,request} transport.
Auth (PRIVATE-TOKEN / token / Bearer / basic) and per-call options (-X, -L, -o,
-g, cookie jars, --max-time) are passed as arguments. Collapses gitlab-api's
three divergent curl shapes into one transport and removes ~30 duplicated curl
invocations.
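
The "body plus trailing status line, never-fail" contract can be illustrated with a small sketch. The argument order and the briklab.http.split helper are hypothetical; only curl's --write-out %{http_code} behavior is relied on.

```shell
#!/usr/bin/env bash
# Assumed shape: briklab.http.request <url> [curl options...]
# Prints the response body, then one final line with the HTTP status code.
# Always returns 0 so the caller decides what counts as a failure.
briklab.http.request() {
  local url="$1"
  shift
  # curl writes 000 as the status when no HTTP response was received.
  curl -s -w '\n%{http_code}' "$@" "$url" || true
}

# Hypothetical helper: split the combined output into body and status.
briklab.http.split() {
  local out="$1"
  HTTP_STATUS="${out##*$'\n'}"   # text after the last newline
  HTTP_BODY="${out%$'\n'*}"      # everything before that newline
}

# Offline demo on a canned response.
briklab.http.split $'{"ok":true}\n200'
echo "status=${HTTP_STATUS} body=${HTTP_BODY}"
```

A caller would pass auth and per-call options as extra arguments, for example a header option for a token, which keeps the transport itself free of service-specific logic.
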

Rename the credential helpers (ensure_*) to briklab.auth.* and the
infra-verify / smoke-test helpers (verify_*, check/skip/is_running) to
briklab.verify.*, collapsing the orchestration layer's divergent naming
conventions into the <notion>.<submodule>.<verb> scheme. Definitions and all
call sites move together; no behavior change.

Move token propagation (GitLab CI variables, Jenkins token-staleness restart)
into recovery.sh as briklab.recover.gitlab_ci_vars / jenkins_token, routing
their HTTP through briklab.http.*. infra-refresh.sh now sources recovery.sh
directly instead of reaching down through e2e/lib/auth.sh, and composes the
existing briklab.check.* / briklab.recover.* heals instead of duplicating
per-token check wrappers. No behavior change to the infra-refresh command.

Break the 832-line assert.sh into assert/core.sh (domain-agnostic engine:
counters, lifecycle, generic + JSON + status + job-log assertions) and
assert/report.sh (Brik aggregate-report business outcomes, package/promote,
and deploy-state delegations). assert.sh becomes a thin facade sourcing both.

Drop the "is the lib loaded?" type guards on the delegating assertions: a test
that calls them has already sourced the domain lib, so an undefined call now
fails loudly instead of producing a soft "not loaded" assertion failure.
Normalize the private helpers to the assert._verb convention. No change to the
public assertion API.

The Nexus docker repo enforces basic auth, but the k3d registries.yaml only
declared the mirror endpoint with no credentials, so containerd pulled
anonymously, got 401, and deployed pods were stuck in ImagePullBackOff (the
brik-deploy job then failed). Add a configs: auth block for both the registry
host (nexus.briklab.test:8082) and the mirror endpoint host (brik-nexus:8082,
the host containerd actually contacts), reusing the admin credentials the CI
already uses to push.
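
For reference, a registries.yaml with credentials for both hosts might be rendered like this. It is a sketch under assumptions: the helper name, the variable names, the placeholder credentials, and the http scheme on the endpoint are all illustrative; only the two host names come from the commit message.

```shell
#!/usr/bin/env bash
# Placeholder credentials; variable names are assumed, not the lab's.
NEXUS_USER="${NEXUS_USER:-admin}"
NEXUS_PASSWORD="${NEXUS_PASSWORD:-changeme}"

# Hypothetical helper: print a registries.yaml with an auth entry for the
# registry host and for the mirror endpoint host containerd contacts.
render_registries_yaml() {
  cat <<EOF
mirrors:
  "nexus.briklab.test:8082":
    endpoint:
      - "http://brik-nexus:8082"
configs:
  "nexus.briklab.test:8082":
    auth:
      username: "${NEXUS_USER}"
      password: "${NEXUS_PASSWORD}"
  "brik-nexus:8082":
    auth:
      username: "${NEXUS_USER}"
      password: "${NEXUS_PASSWORD}"
EOF
}

render_registries_yaml
```

Declaring auth only under the registry host would leave the mirror endpoint unauthenticated, which matches the 401 described above.
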
…etup

Move the two ArgoCD Application manifests (brik-e2e-gitops, brik-e2e-rollback)
into a standalone setup/argocd-apps.sh that k3d.sh invokes at the end of a
fresh cluster bring-up. The apps can now be re-applied against an existing
cluster without recreating it. The two near-identical manifests are folded
into a single parameterized helper.
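
Folding two near-identical manifests into one parameterized helper could look roughly like this. The helper name, repository URL, path, project, and namespaces are placeholders rather than the lab's actual values; only the two application names come from the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render one ArgoCD Application from four parameters.
render_argocd_app() {
  local name="$1" repo_url="$2" path="$3" dest_ns="$4"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${name}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${repo_url}
    path: ${path}
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: ${dest_ns}
EOF
}

# Render both apps; the repo URLs are invented for the example.
for app in brik-e2e-gitops brik-e2e-rollback; do
  render_argocd_app "$app" "https://git.example.invalid/brik/${app}.git" "." "$app"
  echo "---"
done
```

In a real setup script the rendered output would typically be piped to kubectl apply -f -, which is also what makes re-applying against an existing cluster cheap.
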
Replace the ~17 inline _set_group_variable calls in setup_nexus_ci_variables
with a key|value|masked table iterated in a single loop, and extract the SSH
file-type variable into a _set_group_file_variable helper. The produced set of
group CI/CD variables is unchanged; the count/total accounting is now exact.
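
The table-driven shape might be sketched as below. The rows, the array name, and apply_ci_variables are invented for illustration, and _set_group_variable is stubbed so the example runs offline.

```shell
#!/usr/bin/env bash
# Stub for the real API call; it only reports what it would set.
_set_group_variable() {
  local key="$1" value="$2" masked="$3"
  printf 'set %s masked=%s\n' "$key" "$masked"
}

# key|value|masked rows (illustrative values, not the lab's variables).
NEXUS_CI_VARIABLES=(
  "NEXUS_URL|http://nexus.example.invalid:8081|false"
  "NEXUS_USER|ci-bot|false"
  "NEXUS_PASSWORD|s3cret|true"
)

# Hypothetical loop: one pass over the table with exact ok/total accounting.
apply_ci_variables() {
  local row key value masked ok=0 total=0
  for row in "${NEXUS_CI_VARIABLES[@]}"; do
    IFS='|' read -r key value masked <<<"$row"
    total=$(( total + 1 ))
    if _set_group_variable "$key" "$value" "$masked"; then
      ok=$(( ok + 1 ))
    fi
  done
  printf '%d/%d variables set\n' "$ok" "$total"
}

apply_ci_variables
```

One caveat of this layout is that a value containing the | delimiter would be split incorrectly, so such values need a different separator or a dedicated helper.
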
The mktemp + printf + chmod block that builds a throwaway GIT_ASKPASS script
was repeated verbatim in four git functions (push, push_tag, push_branch,
trigger_via_push). Fold it into a single e2e.git._askpass_file helper; the
public function signatures are unchanged.
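
A minimal sketch of such a helper, assuming it takes the secret as its only argument and prints the path of the generated script (the real signature is not shown in the PR):

```shell
#!/usr/bin/env bash
# Builds a throwaway GIT_ASKPASS script: mktemp + printf + chmod in one place.
e2e.git._askpass_file() {
  local secret="$1" file
  file="$(mktemp)" || return 1
  # %q shell-quotes the secret so spaces and special characters survive.
  printf '#!/usr/bin/env bash\nprintf "%%s\\n" %q\n' "$secret" > "$file"
  chmod 700 "$file"
  printf '%s\n' "$file"
}

askpass="$(e2e.git._askpass_file 'p@ss word')"
"$askpass"        # prints the secret, as git would read it
rm -f "$askpass"
```

A caller would export the returned path as GIT_ASKPASS for the duration of the git command and remove the file afterwards.
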
briklab.verify.cmd took a command string and ran it through eval. Take the
command and its arguments directly and run "$@", dropping the eval. The sole
caller (cli/setup.sh) passes its command as separate words.
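
The eval-free form can be sketched as follows. The leading label argument and the ok/FAIL output format are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Assumed signature: briklab.verify.cmd <label> <command> [args...]
briklab.verify.cmd() {
  local label="$1"
  shift
  # "$@" keeps every argument as a single word; nothing is re-parsed.
  if "$@" >/dev/null 2>&1; then
    printf 'ok   %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label"
    return 1
  fi
}

briklab.verify.cmd "bash is on PATH" command -v bash
briklab.verify.cmd "spaces stay in one word" test "a b" = "a b"
```

With eval, the second call would have needed nested quoting to keep "a b" intact; with "$@" the caller's word boundaries are preserved as given.
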
…ait helpers

The v0.1.0 -> v0.2.0 git chain built verbatim by both rollback callbacks
becomes e2e.git.build_release_chain. The 'wait for a Jenkins job to exist'
poll duplicated across jenkins-test.sh and jenkins-rollback.sh becomes
e2e.jenkins.wait_job_exists, built on briklab.wait.until. No behavior change
beyond the now-silent poll.
…helper

The download + assert.aggregate_v1 + conditional image_tag + conditional
promote_succeeded tail was duplicated near-verbatim in gitlab-test.sh and
jenkins-test.sh. Extract it into e2e.scenario.assert_aggregate (new
lib/scenario.sh); the platform-specific run-id discovery and download command
stay in the test scripts and are passed in. Behavior unchanged.

The deploy taxonomy was globbed in three places: cli/test.sh decided
--with-deploy with *deploy*/*gitops*/*rollback*, and both suites matched
*-deploy-gitops|*-deploy-rollback for the ArgoCD precheck and post-run sync
assertion. Replace them with e2e.scenario.needs_deploy and
e2e.scenario.is_gitops in lib/scenario.sh, so the CLI no longer hardcodes the
E2E naming. Behavior is identical across the current scenario set.
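
Centralizing the globs might look like the sketch below. The patterns are the ones quoted above; which pattern backs which function is an assumption, since the commit does not show the bodies.

```shell
#!/usr/bin/env bash
# Assumed mapping: the CLI's --with-deploy globs back needs_deploy.
e2e.scenario.needs_deploy() {
  case "$1" in
    *deploy*|*gitops*|*rollback*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed mapping: the suites' ArgoCD precheck glob backs is_gitops.
e2e.scenario.is_gitops() {
  case "$1" in
    *-deploy-gitops|*-deploy-rollback) return 0 ;;
    *) return 1 ;;
  esac
}

for s in node-full node-deploy-gitops node-deploy-rollback workflow-trunk-tag; do
  deploy=no; gitops=no
  e2e.scenario.needs_deploy "$s" && deploy=yes
  e2e.scenario.is_gitops "$s" && gitops=yes
  printf '%-22s needs_deploy=%s is_gitops=%s\n' "$s" "$deploy" "$gitops"
done
```

With the predicates in one file, a renamed scenario only requires touching the case patterns rather than every call site.
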

Both suites asserted the ArgoCD sync after a green *-deploy-gitops run -- the
GitLab suite through a _suite_assert_gitops_sync wrapper, Jenkins inline.
Replace both with e2e.scenario.gitops_postcheck, removing the wrapper and the
duplicated is_gitops guard.

Separate infra lifecycle from the E2E test CLI. A root Makefile and a new
scripts/infra.sh thin dispatcher own create/start/stop/restart/clean/k3d and
version-artifact generation; briklab.sh keeps test/setup/status/logs/reset/
preflight and redirects the moved commands.

- Add Makefile (init/start/stop/restart/clean/clean-force/k3d-start/k3d-stop/
  versions/versions-check) delegating to scripts/infra.sh.
- Add scripts/infra.sh lifecycle dispatcher.
- Move generate-versions.sh to scripts/lib/versions.sh as the briklab.versions
  notion (generate/check); resolve its root independently of the dispatcher.
- Extract check_prereqs + the rich load_env into scripts/lib/cli/prereqs.sh,
  shared by both dispatchers.
- Slim briklab.sh: drop lifecycle dispatch, point help at the Makefile.
- Repoint all generate-versions references (versions.yml, docker-compose.yml,
  env.sh, runner-images.sh, README, generated artifact headers) to make versions.
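
A thin lifecycle dispatcher of the kind described could be sketched like this. The cmd_* bodies are stubs, infra_main is a hypothetical entry function, and only three of the commands are shown.

```shell
#!/usr/bin/env bash
# Stub command implementations; the real ones live behind scripts/lib/.
cmd_start()   { echo "starting lab"; }
cmd_stop()    { echo "stopping lab"; }
cmd_restart() { cmd_stop && cmd_start; }

# Hypothetical entry point: map the first argument to a cmd_* function.
infra_main() {
  local cmd="${1:-help}"
  [ "$#" -gt 0 ] && shift
  case "$cmd" in
    start|stop|restart) "cmd_${cmd}" "$@" ;;
    help) echo "usage: infra.sh {start|stop|restart}" ;;
    *) echo "unknown command: $cmd" >&2; return 2 ;;
  esac
}

infra_main restart
```

A Makefile target then only needs to invoke the script with the matching subcommand, which keeps the Makefile free of logic and leaves the dispatcher callable on its own.
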
…agation

Reflect the lifecycle split (Makefile/infra.sh vs briklab.sh) and the generated
versions workflow in the architecture guide and README.

- Add an Entrypoints section: Make/infra.sh own the lifecycle, briklab.sh owns
  test/config/ops; both thin over lib/.
- Document make versions / versions-check and the briklab.versions notion.
- Explain when to run infra-refresh: a lab reset rotates the ArgoCD signing key,
  and only infra-refresh propagates a fresh ARGOCD_AUTH_TOKEN to the GitLab CI
  variables; a stale CI token makes brik-deploy fail on argocd app sync.
- Refresh the directory structure (Makefile, infra.sh, lib/versions.sh,
  lib/cli/prereqs.sh, versions.yml/env, generated config artifacts).
- Add Known Gotchas / Troubleshooting entries for the stale ArgoCD CI token.
…index

Push-driven Jenkins scenarios (workflow-trunk-main, workflow-trunk-tag) timed
out with "No build found for SHA" after a lab reset. The multibranch job is
never re-indexed: there is no PeriodicFolderTrigger, and the gitea-plugin only
manages webhooks at org level while brik is a Gitea user, so a freshly recreated
repo has no webhook to notify Jenkins.

Trigger an explicit Multibranch scan right after the push. A branch push then
auto-builds via BranchDiscoveryTrait; a tag push makes the tag sub-job appear so
the existing explicit /build step runs.

- Add e2e.jenkins.scan_multibranch (POST /job/<job>/build).
- Call it after the push in jenkins-test.sh (push mode).
- Document the issue and the GitLab ArgoCD-token reset case in
  docs/e2e-known-issues.md, cross-linked from the architecture gotchas.

Validated live: workflow-trunk-main 9/9, workflow-trunk-tag 10/10.
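
The scan trigger can be sketched as a single POST. This version calls curl directly for brevity, whereas the PR's helper presumably goes through briklab.http.*; the JENKINS_URL, JENKINS_USER, and JENKINS_TOKEN variable names are assumptions.

```shell
#!/usr/bin/env bash
# Assumed environment: JENKINS_URL, JENKINS_USER, JENKINS_TOKEN.
# Prints the HTTP status code returned by Jenkins.
e2e.jenkins.scan_multibranch() {
  local job="$1"
  # POST /job/<job>/build on the Multibranch job requests a re-index.
  curl -s -o /dev/null -w '%{http_code}' -X POST \
    -u "${JENKINS_USER}:${JENKINS_TOKEN}" \
    "${JENKINS_URL}/job/${job}/build"
}

# Typical call order in push mode (commented out; needs a live Jenkins):
#   git push origin main
#   e2e.jenkins.scan_multibranch "my-multibranch-job"
```

Calling the scan after the push, not before, matters: the scan only discovers refs that already exist in the repository at the time it runs.
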
@jeanjerome jeanjerome self-assigned this Jun 5, 2026
@jeanjerome jeanjerome merged commit 0785ecd into main Jun 5, 2026
2 checks passed
@jeanjerome jeanjerome deleted the refactor/briklab-scripts branch June 5, 2026 22:11