CodexGAS is a manual, LLM-driven model governance workflow.
modelgas/ is not executable code. It is a prompt + schema library:
• modelgas/skills//skill.md: what to ask the LLM to do (rules + intent)
• modelgas/skills//schema.json: the JSON shape the LLM must output (when required)
• modelgas/skills//data/: supporting templates and rubric YAMLs
You (or an agent) run the “skills” by feeding the relevant project context to an LLM and saving the resulting artifacts to disk.
⸻
Critical rules for agents and reviewers
- Ignore out/
Do not read out/ for evidence, analysis, or inputs.
out/ contains generated artifacts and may be stale, partial, or from a different run. Treat it as output-only.
Only use input// (and other source folders if explicitly instructed) as inputs/evidence.
- Evidence-only claims
Any assertion, conclusion, or control must be backed by evidence from input//.
• Cite the filename (and line range where practical).
• If something cannot be supported by evidence, explicitly write “Not evidenced”.
- No fabrication
Do not invent:
• files
• interfaces
• behaviors
• controls
If required information is missing, stop and list exactly what is missing.
- Existing vs proposed controls
When discussing controls:
• Clearly label existing controls (evidenced in inputs)
• Clearly label proposed controls (recommendations only)
Do not blur the two.
⸻
How to run CodexGAS (manual)
- Pick an input project (the user should have told you which one)
Choose an input// directory (example: input/FXChat/).
- Build a single evidence pack
Create out//aggregate.md by concatenating (copy/paste is sufficient):
• input//model_description.md (or equivalent)
• Relevant source files (.py, .md, configs) from input//
• Example scripts that demonstrate expected usage
Notes:
• Do not include anything from out/.
• Keep the evidence pack focused on purpose, interfaces, behavior, constraints, and failure modes.
• Observed outputs may be included as examples only; do not infer undocumented behavior from them.
Evidence pack checklist
• Model purpose and non-goals
• Entry points / APIs / scripts
• Input and output schemas
• Determinism controls (temperature, seed, retries)
• Error handling and fallback behavior
• Example inputs and expected outputs
• Production assumptions (auth, rate limits, logging, storage)
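If you prefer a script to copy/paste, the concatenation step above can be sketched as follows. This is a minimal sketch, not part of CodexGAS: the function name `build_evidence_pack`, its parameters, and the per-file `## Source:` header are all assumptions made for illustration. The caller supplies the project directory, the files to include, and the path of the pack to write.

```python
from pathlib import Path


def build_evidence_pack(project_dir: Path, relative_files: list[str], pack_path: Path) -> Path:
    """Concatenate selected input files into one Markdown evidence pack.

    Hypothetical helper: the name, signature, and header format are
    illustrative assumptions, not something CodexGAS defines.
    """
    sections = []
    for rel in relative_files:
        source = project_dir / rel
        # Generated artifacts are not evidence: refuse anything under an out/ directory.
        if "out" in source.parts:
            raise ValueError(f"Refusing to read generated artifact: {source}")
        # Fail loudly on a missing file instead of silently skipping it.
        if not source.is_file():
            raise FileNotFoundError(f"Missing evidence file: {source}")
        text = source.read_text(encoding="utf-8")
        # A header per file keeps filenames citable from within the pack.
        sections.append(f"## Source: {source.as_posix()}\n\n{text.rstrip()}\n")
    pack_path.parent.mkdir(parents=True, exist_ok=True)
    pack_path.write_text("\n".join(sections), encoding="utf-8")
    return pack_path
```

Keeping the filename header above each file's content makes it easier to cite filenames and line ranges later, as the evidence-only rule requires.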
⸻
- Run skills (ask the LLM)
Skills must be run in the following fixed order:
1. risk_tiering
2. methodology
3. alw
4. tests
5. opm
6. prod_controls
7. documentation
8. remediation_pack
9. iteration
For each skill in modelgas/skills/<skill_name>/:
• Read skill.md for purpose and rules.
• Provide the evidence pack plus any relevant data/* files (rubrics/templates).
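The per-skill loop above can be sketched as a small prompt assembler. This is a sketch under stated assumptions: `SKILL_ORDER` copies the fixed order from this section, while `build_skill_prompt`, `build_all_prompts`, and the `---` section separators are hypothetical names and formatting. It includes every file in data/ for simplicity, whereas in practice you would pass only the relevant ones. Sending the prompt to an LLM is deliberately left out.

```python
from pathlib import Path

# Fixed order copied from the list in this section.
SKILL_ORDER = [
    "risk_tiering", "methodology", "alw", "tests", "opm",
    "prod_controls", "documentation", "remediation_pack", "iteration",
]


def build_skill_prompt(skills_root: Path, skill_name: str, evidence_pack: str) -> str:
    """Combine skill.md, the skill's data files, and the evidence pack (hypothetical helper)."""
    skill_dir = skills_root / skill_name
    parts = [(skill_dir / "skill.md").read_text(encoding="utf-8")]
    data_dir = skill_dir / "data"
    if data_dir.is_dir():
        # Simplification: include every data file, sorted for a stable prompt.
        for data_file in sorted(p for p in data_dir.iterdir() if p.is_file()):
            parts.append(f"--- {data_file.name} ---\n" + data_file.read_text(encoding="utf-8"))
    parts.append("--- evidence pack ---\n" + evidence_pack)
    return "\n\n".join(parts)


def build_all_prompts(skills_root: Path, evidence_pack: str) -> list[tuple[str, str]]:
    """Return (skill_name, prompt) pairs in the fixed order."""
    return [(name, build_skill_prompt(skills_root, name, evidence_pack)) for name in SKILL_ORDER]
```

Returning the prompts as an ordered list, rather than calling a model inside the loop, keeps the workflow manual: a person or agent still decides how each prompt is run and where the result is saved.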
⸻
JSON outputs (schema-bound)
For the following skills, output JSON that matches schema.json exactly:
• risk_tiering
• methodology
• alw
• tests
• opm
• prod_controls
Rules:
• No extra keys
• No missing required keys
• If a field cannot be populated, use an explicit null or sentinel value as allowed by the schema
Naming convention: out//skills__.json
Example: out/FXChat/skills_FXChat_risk_tiering.json
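The "no extra keys, no missing required keys" rules can be spot-checked before an artifact is saved. The sketch below assumes each schema.json follows JSON Schema conventions with top-level `properties` and `required` entries, which this document does not confirm. The helper name `check_top_level_keys` is hypothetical. It inspects top-level keys only and is not a substitute for full schema validation.

```python
def check_top_level_keys(output: dict, schema: dict) -> list[str]:
    """List top-level key problems in an LLM output (hypothetical helper).

    Assumes a JSON Schema style layout with "properties" and "required";
    nested objects and value types are not checked.
    """
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    present = set(output)
    problems = []
    # Required keys that the output does not contain.
    for key in sorted(required - present):
        problems.append(f"missing required key: {key}")
    # Keys in the output that the schema does not declare.
    for key in sorted(present - allowed):
        problems.append(f"extra key: {key}")
    return problems
```

An empty list means the top-level keys are consistent with the schema. A key whose value is null counts as present, which matches the rule that unpopulated fields carry an explicit null or sentinel rather than being dropped.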
⸻
Final report skills (Markdown)
These skills produce Markdown artifacts; JSON output is not required.
documentation
• Write out//docs/model_doc.md
• Use modelgas/skills/documentation/data/doc_template.md
• Do not change headings or structure; only fill placeholders
remediation_pack
• Write artifacts under out//remediation/
• Include patch instructions, test recommendations, and acceptance criteria
iteration
• Write out//human/human_responsibility.md using responsibility_template.md
• Write out//plan/plan_next.md using planning_template.md
If optional JSON summaries are produced for these skills, keep them clearly labeled and separate. Markdown artifacts are the required deliverables.
⸻
Document versioning and diffs
If a previous version of any output document exists:
• Do not overwrite or destroy it
• Create a new version alongside it
• Produce a Markdown diff file: out//diffs/diff_to.md
The diff must:
• Clearly show additions, removals, and changes
• Preserve the original document intact
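One way to produce such a diff is with Python's standard `difflib` module. This is a minimal sketch: the helper name `write_markdown_diff` and the layout of the diff file (a heading plus a fenced unified diff) are assumptions, and the caller chooses the diff file's path. Both input documents are only read, never modified.

```python
import difflib
from pathlib import Path


def write_markdown_diff(old_path: Path, new_path: Path, diff_path: Path) -> Path:
    """Write a unified diff of two document versions as Markdown (hypothetical helper)."""
    old_lines = old_path.read_text(encoding="utf-8").splitlines()
    new_lines = new_path.read_text(encoding="utf-8").splitlines()
    # unified_diff marks removals with "-" and additions with "+".
    diff_lines = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_path.name, tofile=new_path.name, lineterm="",
    )
    fence = "`" * 3  # built at runtime so this example does not nest literal fences
    body = "\n".join(diff_lines)
    diff_path.parent.mkdir(parents=True, exist_ok=True)
    diff_path.write_text(
        f"# Diff: {old_path.name} to {new_path.name}\n\n{fence}diff\n{body}\n{fence}\n",
        encoding="utf-8",
    )
    return diff_path
```

A changed line appears as a removal followed by an addition, so additions, removals, and changes are all visible in one listing.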
⸻
FXChat example (input/FXChat)
Evidence pack contents
Include at least:
• input/FXChat/model_description.md
• input/FXChat/FXChatProcessor.py
• input/FXChat/batch_run_50.py
• input/FXChat/example_openai.py
• (Optional) input/FXChat/batch_run_50_results.json as observed example outputs
Outputs to produce
JSON (per skill, per schema):
• out/FXChat/skills_FXChat_risk_tiering.json
• out/FXChat/skills_FXChat_methodology.json
• out/FXChat/skills_FXChat_alw.json
• out/FXChat/skills_FXChat_tests.json
• out/FXChat/skills_FXChat_opm.json
• out/FXChat/skills_FXChat_prod_controls.json
Markdown (final reports):
• out/FXChat/docs/model_doc.md
• out/FXChat/remediation/
• out/FXChat/human/human_responsibility.md
• out/FXChat/plan/plan_next.md