This repository stores prompt cases, expected outputs, and recorded runs for testing the agent_scaffold skills bundled with the public mainsequence library.
The structure separates three concerns:
- cases/: prompt sets, seeded from the installed SDK version
- runs/: model outputs grouped by SDK version, agent, and model
- reports/: comparisons and summaries derived from runs

SdkAgentTraining/
├── cases/
│   ├── general/
│   ├── skills/
│   └── sdk/
│       └── <sdk-version>/
│           ├── manifest.json
│           ├── agent_scaffold/
│           │   └── AGENTS.md
│           └── skills/
│               └── <skill-path>/
│                   ├── README.md
│                   ├── skill.yaml
│                   ├── source/
│                   │   └── SKILL.md
│                   └── cases/
├── docs/
├── reports/
├── runs/
│   └── sdk/<sdk-version>/<agent>/<model>/<timestamp>/
└── scripts/

- cases/general/: Reserved for prompts that are not owned by one specific skill. This folder is optional and can stay empty.
- cases/skills/: General skill cases that are not tied to one installed SDK version. Use this when you want to test a skill conceptually across versions or keep reusable prompts outside a specific version snapshot.
- cases/sdk/<sdk-version>/: The main training corpus for one installed SDK version.
- cases/sdk/<sdk-version>/manifest.json: Index of the copied skill bundle for that SDK version.
- cases/sdk/<sdk-version>/agent_scaffold/AGENTS.md: Snapshot of the installed top-level agent_scaffold instructions for that version.
- cases/sdk/<sdk-version>/skills/<skill-path>/source/SKILL.md: Exact installed skill text copied from the library.
- cases/sdk/<sdk-version>/skills/<skill-path>/skill.yaml: Metadata for that skill in that SDK version.
- cases/sdk/<sdk-version>/skills/<skill-path>/cases/: Actual prompt cases for evaluating that specific skill.
- runs/sdk/<sdk-version>/<agent>/<model>/<timestamp>/: One concrete execution run for one SDK version, agent, and model.
- reports/: Summaries, comparisons, and leaderboards generated from run data.
- scripts/: Local helper scripts to populate versioned cases and create run folders.
- docs/: Repository documentation, conventions, and structure notes.

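
The run folder layout described above can be sketched as a small path helper. This is an illustration only, not the code in scripts/create_run.py: the function name `run_dir`, the timestamp format, and the sample version string are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def run_dir(root: Path, sdk_version: str, agent: str, model: str,
            when: Optional[datetime] = None) -> Path:
    """Build runs/sdk/<sdk-version>/<agent>/<model>/<timestamp>/ under root.

    Hypothetical helper: the real repository may name or format things differently.
    """
    # Normalize to UTC so run folders sort consistently across machines.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumed timestamp format; the source does not specify one.
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return root / "runs" / "sdk" / sdk_version / agent / model / stamp


if __name__ == "__main__":
    # "1.2.3" is a placeholder SDK version, not a real release.
    print(run_dir(Path("."), "1.2.3", "codex", "gpt-5.4").as_posix())
```

Keeping the SDK version, agent, and model as separate path segments means a report can compare runs by globbing a single level, for example every model under one agent.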
uv sync
uv run python scripts/populate_training_skills.py

The population script takes no arguments. It reads the installed mainsequence and agent_scaffold packages and seeds cases/sdk/<installed-version>/.
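
As a rough sketch of how a seed directory could be derived from the installed package, the snippet below looks up the installed version with the standard library. It is not the actual populate_training_skills.py: the helper name `seed_dir` is hypothetical, and it assumes the distribution is published under the name `mainsequence`.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Optional


def seed_dir(root: Path, package: str = "mainsequence") -> Optional[Path]:
    """Return cases/sdk/<installed-version>/ for `package`, or None if it is not installed.

    Hypothetical helper; the distribution name "mainsequence" is an assumption.
    """
    try:
        installed = version(package)  # reads the installed distribution's metadata
    except PackageNotFoundError:
        return None
    return root / "cases" / "sdk" / installed


if __name__ == "__main__":
    target = seed_dir(Path("."))
    print(target if target is not None else "mainsequence is not installed")
```

Returning None instead of raising lets a caller decide whether a missing SDK is an error or simply means there is nothing to seed.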
- Populate or refresh the versioned skill seed from the installed SDK bundle.
- Add reusable non-version-specific skill cases under cases/skills/<skill-path>/.
- Add version-specific skill cases under cases/sdk/<sdk-version>/skills/<skill-path>/cases/.
- Create a versioned run folder before executing agents.
- Store outputs and evaluation artifacts inside the run folder.
uv run python scripts/populate_training_skills.py
uv run python scripts/create_run.py --agent codex --model gpt-5.4

See docs/conventions.md for the case and run format, and docs/sdk-cli-notes.md for the CLI findings that drove this scaffold. See docs/structure.md for the folder-by-folder explanation. See docs/ollama-workflow.md for the local model testing workflow.