-
Notifications
You must be signed in to change notification settings - Fork 0
capstone
📖 This page is generated from
capstone/README.md. Edit the source, not the wiki; edits here are overwritten on the next sync. Run the hands-on labs from the repo, linked inline.
⬅ Previous: Module 27. Evals: Trusting an Agent That Acts Without You
One feature, taken end to end, with every module doing its job in sequence. This is the finale: not new material, but proof that the twenty-seven pieces you learned separately are actually one motion. By the end you'll have shipped a real change to
tasks-app, from prompt to running container. The model did the typing. The workflow is what made that safe and repeatable, and the workflow is the part you built.
There's nothing to learn here that the modules didn't already teach. The capstone exists to wire it together. Every step below names the module it comes from, so you can see the dependency chain you climbed now collapse into a single fluent pass. If a step feels unfamiliar, that's a pointer back to the module to re-read, not new content to absorb.
You'll do it twice:
- The main loop. You direct, the AI executes. You file the issue and make the calls; the AI does the git and the edits; you verify each result. The full pipeline, once.
- The stretch variant (optional). The same feature run the Unit 5 way, with autonomous agents inside the pipeline, so you watch the workflow start to run itself.
All of it. Concretely, you need the tasks-app repo in the state the course left it:
- A Git repo (Module 2) with a committed AI instructions file at the root (Module 5), a remote on
your forge (Module 8), and a protected
mainthat requires a PR to merge (Module 11). -
test_tasks.pyand a green test suite (Module 13). - A CI workflow that lints and tests on every push and PR (Module 14), with a security-scan step wired in (Module 15), running on a runner you understand (Module 19).
- A
Dockerfileand.dockerignore(Module 16),serve.pyexposing/healthand/tasks(Module 18),.env/.env.examplefor config (Module 17), and adeploy.shthat tags by commit SHA, injects env, health-checks, and rolls back (Module 18).
If any of those is missing, build it from its module first. The capstone assumes the machine is already standing; it doesn't re-pour the foundation.
Pick something small enough to finish in one sitting and real enough to touch the whole stack. We'll add due dates:
- A task can carry an optional due date:
python3 cli.py add "file taxes" --due <YYYY-MM-DD>. - A new
overduecommand lists pending tasks whose due date has already passed. - The deployed service grows a matching
GET /overdueendpoint, so the change is visible in the running container, not just the CLI.
This deliberately spans the core (tasks.py), the CLI (cli.py), and the deployable service
(serve.py): one feature, three surfaces, exactly the kind of change that used to mean three
copy-paste sessions and a prayer (Module 1). And it has a built-in trap for the review step: "is a
task due today overdue?" is the kind of off-by-one an AI will answer confidently and wrongly.
Read this once as a map before you touch the keyboard. Each arrow is a module.
Prompt → issue (M9). Don't start in your editor. Start with the work written down. File an issue:
"Add optional due dates to tasks, an overdue command, and a /overdue endpoint." Acceptance
criteria in the body. Label it. The issue is the contract the rest of the loop closes against.
Issue → branch (M6/M11). Never work on main. Have the AI branch off main, named for the issue
(something like 47-due-dates). The branch is a sandbox you can throw away wholesale (M6); that
disposability is what lets you turn the AI loose on three files at once without risking main.
Branch → AI implementation (M4), config already in place (M5). Now the AI edits the files
directly in your editor or CLI, with no browser and no paste. It already knows your conventions because the
committed instructions file has been in the repo since the first commit (M5): core logic in
tasks.py, CLI wiring in cli.py, standard library only, run the tests before claiming done. You
didn't re-explain any of that. That's the file earning its keep.
Implementation → tests (M13). The feature isn't done when it runs; it's done when it's pinned.
Have the AI extend test_tasks.py with cases for the new logic, and name the boundary cases
yourself, because the boundary is exactly where the AI guesses: due yesterday (overdue), due tomorrow
(not), due today (not yet), no due date at all (never overdue, never crashes).
Secrets stay clean (M17). This feature needs no new secret; it reads the system clock. The
discipline is that nothing got hardcoded anyway: the service still reads its config from the
environment via .env, and .env.example documents any new keys. The win here is a non-event, and
that is the point. The failure mode (M17: AI hardcodes a value) simply didn't happen, because the
pattern was already there.
Tests → PR (M10/M11). Have the AI push the branch and open the PR, with Closes #47 in the
description so the merge closes the issue automatically (M11). The PR is the review gate even though
it's your own code, and especially because an AI wrote most of it.
PR → CI → security scan (M14/M15/M19). Opening the PR triggers the pipeline on your runner (M19): lint, build, tests (M14), then the security gate (M15): dependency audit, secret scan, SAST. The feature added no dependencies, so SCA should be quiet, and the secret scan confirms you didn't smuggle a key into a fixture. CI catches code that looks right (M14); the security scan catches the failure classes a build check never would (M15).
Review (M10). Green CI is necessary, not sufficient. Read the diff like you didn't write it
(M10). Go straight for the plausibility trap: open overdue() and check the comparison. Did it use
< or <=? Does a task due today show up as overdue? Does a task with no due date crash the
comparison or get silently treated as overdue? This is the single least-automatable skill in the
course, and the capstone is where you prove you have it.
Merge (M11). Once CI is green and the diff is honest, squash-merge. Issue #47 closes itself. main
is now ahead by one clean, tested, scanned commit.
Merge → containerized deploy (M16/M18). The merge to main triggers delivery (M18): CI builds the
image from your Dockerfile (M16), tags it with the new commit SHA (immutable, not latest), runs
deploy.sh to start the container with env injected (M17), polls /health, and rolls back to the
previous SHA if health fails. Hit GET /overdue on the running container. The feature is live, in a
reproducible artifact, behind a health check that can undo itself.
If it goes wrong (M12). Something slips past every gate eventually. Because you squash-merged, the
bad change is one ordinary commit on main, so you direct the AI to revert it and verify the revert
lands as a clean new commit on shared history, without needing the -m 1 flag (M12). A bad deploy is
already handled by deploy.sh's rollback to the last good SHA. Recovery is a move you rehearsed.
That's the whole motion. Notice what carried it: not the model. The model wrote the diff; the workflow is everything that made the diff safe to merge and trivial to undo. Swap the model next quarter and every arrow above is unchanged. That's the Module 1 thesis (the model is the cheap, swappable part; the workflow is the durable skill), and you just lived it instead of reading it.
Starting point (this lab is skip-friendly). The capstone runs the whole loop on one feature. To begin from a clean app, copy the snapshot into a fresh
tasks-appand make the first commit:mkdir -p ~/ai-workflow-course/tasks-app cp -r ~/ai-workflow-course/capstone/lab/start/. ~/ai-workflow-course/tasks-app/ cd ~/ai-workflow-course/tasks-app && git init -b main && git add -A && git commit -m "start: capstone"Already carrying your
tasks-appfrom earlier modules? Keep using it and ignore this box. Lab language: shell + Python, on thetasks-apprepo. You'll direct Claude Code (claude; sub your own agent) to do the git and the edits (M4); you make the calls and verify each result.
You'll need: the tasks-app repo in the prerequisite state above, Claude Code (or your own
agent), your forge account, and a working Docker install.
-
File the issue on your forge. Title: "Task due dates +
overduecommand +/overdueendpoint." In the body, write the acceptance criteria as you'd hand them to a contributor you don't trust to guess:-
addtakes an optional--due YYYY-MM-DD. -
overduelists pending tasks with a due date strictly before today. - A task due today is not overdue. A task with no due date is never overdue.
-
serve.pyexposesGET /overduereturning the same set as the CLI.
-
-
Point Claude Code at the repo and tell it to sync
mainand cut the branch:"Sync
mainwith the remote, then create a branch named47-due-datesfor issue #47." (Use your real issue number.)Then verify it did what you asked:
cd ~/ai-workflow-course/tasks-app git status # on 47-due-dates, clean, up to date with main git branch # the new branch exists and is checked out
-
Give Claude Code the issue, not a vague wish:
"Implement issue #47. Add an optional due date to tasks (core in
tasks.py), wire--dueinto theaddcommand and a newoverduecommand incli.py, and add aGET /overdueendpoint toserve.py. Follow the acceptance criteria exactly. Run the tests before you tell me it's done."You should not have to specify "stdlib only" or "don't touch
tasks.json"; that's in the committed instructions file (M5). If the agent reaches for a date library or hand-edits the JSON, your file is missing a line, and that gap is the useful signal. -
Run it yourself to confirm it's real. Choose the two dates relative to your today (one comfortably in the future, one safely in the past) so the assertion below holds whenever you run this:
python3 cli.py add "file taxes" --due <a date a few months out> # future → NOT overdue python3 cli.py add "renew domain" --due 2020-01-01 # past → overdue python3 cli.py overdue # should list "renew domain", not "file taxes"
Verify-before-publish: refresh the example due dates so the "future" one is still in the future at publish time; a hardcoded near-future date silently inverts this assertion once it passes.
-
Have the AI extend
test_tasks.py, then read the test names and confirm the boundaries are actually covered. If "due today" and "no due date" aren't each their own test, tell the AI to add them by name. Confirm the suite is green:pytest # or: python3 -m unittestOnce it's green, tell the AI to commit the change. Then verify what it actually staged and wrote:
git show --stat HEAD # the right files, with a sensible message git status # nothing stray left uncommitted
-
Tell the AI to push the branch and open the PR, with
Closes #47in the description. Then verify on the forge that the PR exists, targetsmain, and carries the closing keyword:git log --oneline origin/47-due-dates -1 # the branch is on the remote # then open the PR in the forge UI and confirm "Closes #47" is in the description
-
Watch the pipeline run on your runner (M19): lint + tests (M14), then the security scan (M15). Don't proceed until it's green.
-
Review the diff as if a stranger wrote it (M10). Open
overdue()and answer, from the code:- Is the comparison strict (
<today) or inclusive (<=)? A task due today must not appear. - What happens for a task with
due == None? It must be skipped, not crash, not counted.
If either is wrong (and an AI gets at least one of these wrong more often than you'd like), have the AI fix it on the branch, let CI re-run, and review again. Catching this here, before merge, is the entire point of the gate.
- Is the comparison strict (
-
With CI green and the diff honest, squash-merge. Issue #47 closes itself.
-
Let delivery run, or run it locally if that's your setup (M18):
./deploy.sh # builds image tagged by commit SHA, injects env, health-checks, can roll back curl localhost:8000/overdueYou should see your overdue task served from the running container: the feature live in a reproducible artifact (M16), configured from the environment (M17), behind a self-rolling-back health check (M18).
-
Have the AI sync local
mainfirst. The squash-merge in step 9 happened on the forge, so the new commit lives only on the remote and your localmainis one behind. Tell the AI to pullmainand report the SHA of the squash commit you're about to rehearse undoing. Verify:git log --oneline -1 # the top line is your squash commit; note its SHA -
Prove you can undo it, without typing the git yourself. Direct the AI:
"Cut a throwaway branch off
main, revert the squash commit<sha>, run the tests, then delete the branch. The squash merge is a single-parent commit, so confirm a plain revert is correct and that you do not need-m 1."The
-m 1check is the teaching point you carried from Module 12: that flag is only for the two-parent merge commitsgit merge --no-ffmakes, and a squash merge isn't one. Have the AI say which it used and why. Then verify the rehearsal landed and left no mess:git branch # throwaway-revert-test is gone; you're back on main git status # clean
You just confirmed the escape hatch is real before you need it.
The main loop kept you in the driver's seat, directing each step. Now run the identical feature with autonomous agents inside the pipeline and watch how much of the loop keeps running when you step back. Do this only after the main loop succeeded; you can't supervise a pipeline you haven't driven yourself once.
The feature, the branch flow, the gates, and the deploy are unchanged. What changes is who does each step:
-
Issue-to-PR agent does the first pass (M25). Assign the issue to an autonomous agent instead of driving the work step by step yourself. It reads issue #47, creates the branch, implements across
tasks.py,cli.py, andserve.py, writes tests, and opens the PR, all landing as a reviewable PR behind CI, exactly like a human contributor's. It is allowed to propose, never to merge. The supervision is structural: the same CI (M14) and security (M15) gates stand whether the author is a human or an agent. -
An assistive reviewer comments first (M24). Before you look, an AI reviewer reads the diff against your committed rubric and posts comments on the PR, flagging, ideally, the very
overdue()boundary you hunted yourself. It comments; it does not approve and does not merge (M24). A human still decides. You read its comments, then read the diff yourself, and notice the reviewer caught the off-by-one, or notice it missed it, which is its own lesson about not trusting the assistant blindly. -
Evals tell you whether to trust any of it (M27). Turn the boundary cases from Part C into an eval set (due yesterday, due today, due tomorrow, no due date) and score the agent's implementation against it. Now do the thing the whole course was building to: swap the model behind the agent and re-run the same eval. If the new model's
overdue()regresses on the "due today" case, the eval catches it before the PR ever merges. That closes the thesis: evals are how you judge a model swap, so the swap you will make stays safe (M27).
When this runs, look at what's left for you: filing a crisp issue, reading a diff the assistant already annotated, and reading an eval score. The agent drafted, the gates held, the eval judged. The workflow didn't just make AI safe to use; it started running itself, with you supervising. That only works because every catch-net from Units 2–3 was already in place. Take those away and "let an agent open a PR" is reckless; with them, it's just another contributor (M11).
-
A finale is not a shortcut. The loop is fluent because you climbed the modules. Running the
capstone without the foundation (no protected
main, no CI, no tests) isn't "the full loop," it's the copy-paste problem with extra steps. The pipeline's value is entirely in the gates; skip them and you've kept the ceremony and thrown away the safety. -
Green CI is not correctness. Every gate in this loop is a filter, not a guarantee. CI proves the
tests pass; it can't prove the tests test the right thing. The
overdue()boundary trap passes a weak test suite happily. The human review step (M10) is load-bearing and stays load-bearing; the automation raises the floor, it doesn't remove the ceiling. - The stretch variant moves the work, it doesn't delete it. An issue-to-PR agent doesn't reduce the importance of a well-written issue; it raises it, because a vague issue now produces a vague PR with no human in the authoring loop to course-correct. The work shifts from typing toward specifying and judging. That shift is a good one, but it isn't free.
- Evals are only as honest as their cases. An eval set that omits the "due today" boundary will bless a broken model swap. The eval doesn't know what you forgot to test (M27); it can only scale the judgment you already bring to the cases you write.
You're done when:
- You shipped the due-dates feature from a filed issue to a running container, and
curl .../overduereturns the right tasks from the deployed artifact. - Issue #47 closed itself on merge,
mainis one clean commit ahead, and you caught (or consciously verified) theoverdue()boundary in review rather than in production. - You can point at each step and name the module it came from without looking, and explain why the order is the dependency chain, not an arbitrary checklist.
- You can state, from what you just did rather than from the syllabus, why the model is the swappable part: every step would survive replacing the model, and the stretch variant's eval is exactly how you'd prove a swap was safe.
If you ran the stretch variant, add one more: you watched an agent author the PR and an assistant review it, and you can name precisely which catch-nets from earlier units made it reasonable to hand that work to an agent at all.
That's the course. The model wrote the code. You built the workflow that made the code matter, and that's the part that's still yours when the next model ships.
Generated from the ai-workflow-course repo • the model is the cheap, swappable part; the workflow is the durable skill.
Unit 1: Get out of the chat window
- 1 · The Copy-Paste Problem
- 2 · Version Control as a Safety Net
- 3 · Version Control for Words, Not Just Code
- 4 · Getting the AI Out of the Browser
- 5 · Commit the AI's Config, Not Just the Code
- 6 · Branches as Sandboxes for Experiments
- 7 · Worktrees for Running Agents in Parallel
Unit 2: Make it shareable, reviewable, recoverable
- 8 · Remotes and Hosting (GitHub, the Alternatives, and Owning Your Repo)
- 9 · Issues and the Task Layer
- 10 · Reviewing Code You Didn't Write
- 11 · Collaboration: Humans and Agents on One Repo
- 12 · When It Goes Wrong: Revert, Reset, and Recovery
Unit 3: Automate the checking and shipping
- 13 · Testing in the AI Era
- 14 · Continuous Integration
- 15 · Security Scanning for AI-Generated Code
- 16 · Containers and Reproducible Environments
- 17 · Secrets, Config, and Environments
- 18 · Continuous Delivery and Deployment
- 19 · Runners, the Compute Behind the Automation
Unit 4: Extend the AI into your systems
- 20 · MCP Servers, Giving the AI Hands
- 21 · Skills: Teaching the AI Your Playbook
- 22 · Securing Third-Party MCP Servers and Skills
- 23 · Working with Existing Codebases
Unit 5: AI in the Loop
- 24 · Assistive Agents (AI Review and Issue Triage)
- 25 · Module 25. Autonomous Agents: Issue-to-PR and Self-Healing CI
- 26 · Orchestrating Multiple Agents
- 27 · Module 27. Evals: Trusting an Agent That Acts Without You
Finale