pi-pack is a small collection of workflow skills for agents. It was originally developed as a set of skills for the pi harness, but it has since been updated to work with any harness. I kept the name because I liked it.
The goal of this repository is to make human/agent collaboration more reliable by giving agents a shared set of operating procedures for common engineering work: bootstrapping a repo, planning changes, validating behavior, refining and solidifying implementations, tightening architectural boundaries, and capturing durable notes.
Most of the repository lives under .agents/skills/. Each skill is a focused Markdown guide that tells an agent when to use it, what to do, and what to validate before handoff.
Use pi-pack as a reusable toolkit when you want agents to work in a more disciplined, engineering-friendly way.
It helps with things like:
- setting up a repo in a deterministic way
- checking that the baseline environment is healthy before making changes
- publishing the repo's workflow into CI for repeatable remote validation
- planning work before editing code
- following a test-first implementation workflow
- validating browser-visible changes with Playwright
- pressure-testing test coverage with manual mutation checks
- recording discoveries and reusable plans for future sessions
The skills below are grouped by the phase of work they support.
An advanced orchestration skill for autonomous, end-to-end implementation from a standing start. It coordinates the other skills in a full workflow: behavior specification, planning, saving the plan, implementing, setting up the repo, refining the result, checking boundaries, reviewing the diff, reworking feedback, refreshing docs, and capturing durable notes.
Use it when:
- you want the agent to carry a new task from start to finish without asking clarifying questions
- the change is new or non-trivial and benefits from a full workflow
- you want one skill to orchestrate the whole pi-pack toolchain
- you want to demonstrate how the other skills can combine to produce a higher-quality result
Usually prefer other skills when:
- you are working iteratively and want small, local changes
- you want frequent human review and course correction
- the task is simple enough that a full autonomous workflow would be unnecessary overhead
- you are following a normal day-to-day development loop rather than an all-in-one demonstration
Rebuilds deterministic bootstrap scripts and configuration so a repository can be brought into a known-good state for development or testing.
Use it when:
- a repo needs repeatable setup commands
- setup logic has drifted or become inconsistent
- you want a canonical bootstrap, dev, test, and e2e entrypoint
Prepares and starts the local development environment using the repo's generated bootstrap script.
Use it when:
- you want to bootstrap the repo
- you want to start the dev server
- you want a standard way to run tests or browser tests from a clean setup
Checks whether the repository and local environment are ready before implementation starts.
Use it when:
- you want a baseline readiness check
- you need to confirm the repo can bootstrap and pass its standard checks
- you want to catch environment problems before editing code
Captures the current as-is behavior of risky, inherited, undocumented, weakly tested, or messy code before changing it.
Use it when:
- you need to observe and document what the system currently does
- you want to create as-is specs or tests before refactoring or fixing behavior
- you want to protect legacy behavior while you investigate a risky area
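As a rough illustration of the idea, an as-is check records what the code does today, quirks included, rather than what it ought to do. The sketch below assumes a hypothetical legacy function, `formatPrice`, and a hypothetical helper, `expectEqual`; neither is part of pi-pack.

```typescript
// Hypothetical legacy function, used only for illustration.
// It has a quirk: negative amounts lose their sign.
function formatPrice(cents: number): string {
  const abs = Math.abs(cents);
  return "$" + (abs / 100).toFixed(2);
}

// Hypothetical helper: fail loudly if the observed behavior drifts.
function expectEqual(actual: string, expected: string): void {
  if (actual !== expected) {
    throw new Error(`as-is behavior changed: expected ${expected}, got ${actual}`);
  }
}

// As-is checks pin down current behavior before any refactoring starts.
expectEqual(formatPrice(1999), "$19.99");
expectEqual(formatPrice(5), "$0.05");
// Surprising, but this is what the code does today, so the check records it.
expectEqual(formatPrice(-250), "$2.50");
```

Once the quirk is pinned, a later fix becomes a deliberate, visible change to the check rather than an accidental side effect of cleanup.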
Creates a durable behavior spec in docs/specs/ using Given / When / Then scenarios.
Use it when:
- a change needs a clear behavior contract before implementation
- you want to define observable outcomes for a new feature or request
- you want a spec that can inform later planning and testing
Reconciles docs/specs/ by refreshing active behavior specs and archiving superseded ones.
Use it when:
- specs have drifted from current implementation or tests
- multiple specs overlap and should become one canonical contract
- older follow-up specs should be archived in favor of a cleaner current spec set
- you want one active spec per coherent behavior slice where possible
Produces a concrete implementation plan in read-only mode.
Use it when:
- the request is new or non-trivial
- you want to inspect the repo first and map out the work
- you need a clear sequence of steps before making changes
Guides careful, minimal implementation work with an emphasis on validation and local repo conventions.
Use it when:
- you are ready to edit code
- you want a disciplined implementation workflow
- you need a senior-engineer-style approach with tests and checks
Forces a strict red-green loop: write the failing test first, then make the smallest production change needed to pass.
Use it when:
- the change affects behavior
- you want regression coverage before editing production code
- you are implementing a bug fix or feature that should be test-driven
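The red-green loop can be sketched in a few lines. This is only an illustration under assumptions: `slugify` and its test are hypothetical names, and a real repo would use its own test runner rather than a hand-rolled check.

```typescript
// Step 1 (red): write the test first. Until slugify handles spaces,
// this check fails.
function testSlugifyReplacesSpaces(): void {
  const actual = slugify("Hello World");
  if (actual !== "hello-world") {
    throw new Error(`expected "hello-world", got "${actual}"`);
  }
}

// Step 2 (green): the smallest production change that makes the test pass.
// Nothing here handles punctuation or repeated spaces, because no test
// asks for that yet.
function slugify(title: string): string {
  return title.toLowerCase().replace(/ /g, "-");
}

testSlugifyReplacesSpaces();
```

Keeping the production change this small means each new behavior arrives with a test that demanded it.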
Scaffolds or repairs a minimal CI workflow using the repo's canonical bootstrap, test, build, and e2e commands.
Use it when:
- you need GitHub Actions or similar CI for the repo
- you want pull-request and push checks wired to the repo's existing commands
- you need to repair or simplify a failing or drifting CI workflow
Runs the project's test suite with npm test and reports a concise summary.
Use it when:
- you need the standard test result for the repository
- you want a quick pass/fail signal from the main test command
Runs the repository's end-to-end browser tests and validates browser-visible flows with Playwright.
Use it when:
- a change affects routes, forms, navigation, or UI state
- you need real browser validation rather than just unit tests
- you want the repo's own e2e suite to confirm the flow
Provides a browser-validation workflow using the Playwright CLI for browser-visible changes.
Use it when:
- you need to validate a UI change manually in the browser
- the problem needs snapshot, console, network, or tracing investigation
- you want the lightest useful browser check before handoff
Applies manual mutation testing to check whether the current tests would fail if a targeted behavior were broken.
Use it when:
- you want to verify that tests really protect a recent change
- you are reviewing a risky area and want to find weak spots in coverage
- you need a small, manual check that simulates likely bugs one at a time
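A small sketch of what one manual mutation looks like, assuming a hypothetical `shipsFree` rule. In practice the mutation is usually made by temporarily editing the production code and rerunning the tests; the mutant is written as a separate function here only so the example fits in one file.

```typescript
// Hypothetical production rule: orders of 100 or more ship free.
function shipsFree(total: number): boolean {
  return total >= 100;
}

// Hand-written mutant: the boundary operator is changed from >= to >.
function shipsFreeMutant(total: number): boolean {
  return total > 100;
}

// The existing checks, parameterized so they can run against either version.
function suitePasses(impl: (total: number) => boolean): boolean {
  return impl(150) === true && impl(50) === false && impl(100) === true;
}

console.log(suitePasses(shipsFree));       // true: the original passes
console.log(suitePasses(shipsFreeMutant)); // false: the mutant is caught
```

If the `impl(100)` boundary check were missing, both versions would pass, which is exactly the kind of coverage gap this technique is meant to expose.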
Performs a careful code review of the current changes and reports findings by severity.
Use it when:
- you want a second set of eyes on a change before merging
- you want risks, bugs, and design issues called out clearly
- you want review feedback grouped by severity
- you want a concise final summary of issues to address
Updates an existing implementation in response to review feedback, CI failures, static analysis findings, security advisories, or similar external feedback.
Use it when:
- you need the smallest correct change that resolves feedback
- you want to preserve the existing implementation intent
- you are responding to reviewer comments or failing checks
- you want a narrow fix rather than a broad redesign
Refactors existing code to read clearly, cleanly, and almost narratively without changing behavior.
Use it when:
- the code works, but the main story is buried in the details
- you want to improve names, reduce nesting, and simplify control flow
- you need behavior-preserving cleanup focused on readability and intent
- you want to make a module easier to read and trust
Refactors an existing implementation toward SOLID design and smaller single-responsibility files without changing behavior.
Use it when:
- the code works, but one or more files are doing too much
- responsibilities are mixed together and the boundaries are unclear
- you want to move toward SOLID design in a pragmatic way
- you want smaller files, narrower modules, and cleaner dependency direction
Checks an existing codebase and refactors it toward clearer architectural boundaries, feature-first layered folders, and boundary validation without changing behavior.
Use it when:
- transport, application, domain, and persistence logic are mixed together
- folder structure does not match the architecture the code is trying to express
- boundary validation is happening too late or too deep
- raw input or infrastructure types leak into core logic
- you want to split a codebase into clearer layers and modules
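To make "boundary validation" concrete, here is a minimal sketch under stated assumptions: `NewUser`, `parseNewUser`, and `isAdult` are hypothetical names, not part of pi-pack. Raw input is parsed once at the edge, and core logic only ever receives the validated type.

```typescript
// Domain type: core logic only ever sees this validated shape.
type NewUser = { email: string; age: number };

// Boundary: parse raw, untrusted input once, at the edge.
function parseNewUser(raw: unknown): NewUser {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("body must be an object");
  }
  const { email, age } = raw as Record<string, unknown>;
  if (typeof email !== "string" || email.indexOf("@") === -1) {
    throw new Error("email is invalid");
  }
  if (typeof age !== "number" || Math.floor(age) !== age || age < 0) {
    throw new Error("age must be a non-negative integer");
  }
  return { email, age };
}

// Core logic: no raw input, no transport types, no re-validation.
function isAdult(user: NewUser): boolean {
  return user.age >= 18;
}

const user = parseNewUser(JSON.parse('{"email":"a@example.com","age":30}'));
console.log(isAdult(user)); // true
```

Because `isAdult` accepts only `NewUser`, unvalidated data cannot reach it without passing through the boundary first.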
Reworks an existing bug fix into a cleaner, smaller, more maintainable solution.
Use it when:
- a previous fix works but feels clunky or workaround-heavy
- you want to simplify a draft solution after the first pass
- you are improving an existing patch rather than fixing the bug for the first time
Creates durable journal entries for important learnings, traps, workarounds, and discoveries.
Use it when:
- you uncover a non-obvious issue worth saving
- you learn a workflow trap future agents should avoid
- you want to preserve a practical lesson from the current session
Saves a plan verbatim into docs/plans/.
Use it when:
- you have already written a good plan
- you want to persist it exactly as written for later reference
- you need a dated plan file to keep work organized
Creates or updates an Architecture Decision Record for a significant technical decision.
Use it when:
- an implementation introduces or confirms an important architectural choice
- you need to capture the context, decision, and consequences for future maintainers
- the change affects boundaries, dependencies, runtime choices, or other long-term trade-offs
Refreshes the repository's top-level documentation from the current codebase and supporting context.
Use it when:
- the README, architecture docs, repo map, AGENTS guidance, or ADRs are stale
- the repo's current shape has changed and the docs need to catch up
- you want the documentation rewritten from visible sources of truth
A common workflow looks like this:
- yolo: optional advanced orchestration for end-to-end autonomous delivery when that makes sense; includes BDD, planning, saving the plan, and implementation
- init: make setup deterministic
- env or preflight: prepare and check the repo
- asis + bdd + plan: characterize current behavior, define behavior, and map out a path
- build + tdd: implement the change with tests
- ci, test, e2e, playwright, or mutate: validate the result and pressure-test coverage when needed
- respec: reconcile drifting or overlapping specs into a clean current set when needed
- cleaner: improve readability and narrative flow after the first implementation lands
- solidify: simplify and split responsibilities after the first implementation lands
- review: inspect the current diff and report issues before merging
- rework: apply feedback, CI results, mutation findings, or analysis findings with the smallest safe change
- boundit: reshape the code into clearer layers and boundary-aware folders
- refix: simplify if the first fix was rough
- adr, journal, saveplan, or redoc: preserve useful context
That sequence is not mandatory, but it reflects the way these skills are meant to fit together.
- These skills are intentionally opinionated.
- They are written to help agents act more like careful engineers and less like generic chatbots.
- The repository is meant to be lightweight: the value is in the workflow guidance, not in a large application codebase.
This project is licensed under the MIT License. See the LICENSE file for details.