A demonstration of the SpecOps methodology applied to the IRS Direct File project, showing how AI agent instruction sets can be used to extract and document institutional knowledge from government tax systems.
We've completed an initial demonstration of the SpecOps methodology using three different AI models (GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5) to generate specifications from IRS Direct File code samples.
| Specification | Model | Grade |
|---|---|---|
| Standard Deduction | GPT-5 | A |
| Dependent Qualification | Gemini 2.5 Pro | B |
| Tax Bracket Calculation | Claude Sonnet 4.5 | A- |
All three specifications successfully extracted business logic in plain language suitable for domain expert review. See FINDINGS.md for a detailed analysis against the success criteria.
This repository is designed for you to replicate and extend the demonstration. Use the skills and examples provided to test the SpecOps methodology with your own AI models and evaluate the results.
This repository demonstrates how to create reusable AI agent instruction sets (skills) for analyzing tax system code and generating human-verifiable specifications. Rather than directly transpiling code, SpecOps focuses on preserving institutional knowledge in specifications that domain experts can review.
The IRS Direct File project is an excellent demonstration case because:
- Complex business logic: Interprets the Internal Revenue Code (26 USC)
- Domain expert verification: Tax policy experts can verify specifications
- Multi-technology stack: TypeScript, Scala, Java, JavaScript
- Public visibility: Well-known government project (4.5k+ GitHub stars)
- Institutional knowledge: Tax calculation rules that need preservation
spec-ops-demo/
├── FINDINGS.md # Evaluation of generated specifications
├── skills/ # AI agent instruction sets
│ ├── tax-logic-comprehension.md # Understanding tax code patterns
│ ├── scala-fact-graph-comprehension.md # Analyzing Fact Graph logic
│ ├── standard-deduction-calculation.md # Standard deduction skill
│ └── dependent-qualification-comprehension.md # Dependent rules skill
│
├── examples/ # Code samples from Direct File
│ ├── standard-deduction/ # Standard deduction logic
│ ├── dependent-qualification/ # Dependent rules
│ └── fact-graph-sample/ # Knowledge graph examples
│
├── specifications/ # Generated specifications
│ ├── standard-deduction-spec.md # Generated by GPT-5
│ ├── qualifying-dependent.md # Generated by Gemini 2.5 Pro
│ └── fact-graph-tax-bracket-spec.md # Generated by Claude Sonnet 4.5
│
└── README.md # This file
Review the instruction sets in skills/ to see how AI agents are guided to:
- Analyze tax system code
- Extract business logic
- Generate plain-language specifications
- Document institutional knowledge
Start with tax-logic-comprehension.md, the foundation skill that the other specialized skills extend.
Look at the code samples in examples/ - these are real excerpts from IRS Direct File showing:
- Tax calculation logic
- Fact Graph reasoning patterns
- Test-driven business rules
See the specifications/ directory for examples of specifications generated using the skills, showing how complex tax logic is translated into human-readable documentation.
To replicate this demonstration:
- Load the parent skill (skills/tax-logic-comprehension.md) into your AI agent
- Load the specialized skill for your target domain
- Point the agent at the example code in examples/
- Generate a specification following the skill templates
- Evaluate against the success criteria
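The first four steps amount to assembling one prompt from the parent skill, a specialized skill, and the code samples. The following is a minimal Python sketch of that assembly, not the tooling used for the original demonstration. `build_prompt` and `load_samples` are hypothetical helper names, the section labels are illustrative, and sending the prompt to a model is left to whatever client your agent uses.

```python
from pathlib import Path


def load_samples(directory: Path) -> dict[str, str]:
    """Read every file under `directory`, keyed by its relative path.

    Assumes the samples are UTF-8 text files.
    """
    return {
        str(path.relative_to(directory)): path.read_text(encoding="utf-8")
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def build_prompt(
    parent_skill: str,
    specialized_skill: str,
    code_samples: dict[str, str],
) -> str:
    """Combine the skills and code samples into a single prompt string.

    The parent skill comes first so that the specialized skill can
    build on it, followed by the code samples in sorted order.
    """
    sections = [
        "# Parent skill\n" + parent_skill.strip(),
        "# Specialized skill\n" + specialized_skill.strip(),
    ]
    for name in sorted(code_samples):
        sections.append(f"# Code sample: {name}\n{code_samples[name].strip()}")
    # Illustrative task wording; adjust to match the skill templates.
    sections.append(
        "# Task\nGenerate a plain-language specification "
        "following the skill templates."
    )
    return "\n\n".join(sections)
```

From the repository root, one way to use it would be to read skills/tax-logic-comprehension.md and skills/standard-deduction-calculation.md with `Path.read_text`, pass them to `build_prompt` along with `load_samples(Path("examples/standard-deduction"))`, and hand the resulting string to your model.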
Use the evaluation framework in FINDINGS.md to assess your generated specifications against the same criteria.
This demo illustrates key SpecOps phases:
Phase 1: Discovery - Identify target system components (e.g., standard deduction logic)
Phase 2: Specification Generation - Use AI with custom instruction sets to analyze code and generate initial specifications
Phase 3: Verification - Domain experts (tax policy professionals) review and validate specifications
Phase 4: Implementation - Use verified specifications to guide modern implementations
- The Specification is the Source of Truth - Specifications capture what the system does, independent of implementation
- Domain Experts Are the Arbiters - Tax experts verify specs, not code
- AI Assists, Humans Verify - AI analyzes code; humans validate accuracy
- Reusable Skills - Instruction sets work across different tax systems
- Tax Logic Comprehension: Understanding IRC references, tax calculations, and form dependencies
- Standard Deduction Calculation: Documenting standard vs. itemized deduction logic
- Scala Fact Graph Analysis: Analyzing declarative knowledge graph structures and XML-based business rules
- Dependent Qualification Rules: Capturing the five tests for qualifying child and four tests for qualifying relative
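A specification for the dependent qualification rules can be organized as a checklist of named tests, each with a plain-language reason a domain expert can review. The sketch below is one hypothetical way to record that in Python. The test names follow IRS Publication 501 as commonly summarized and are an assumption here; the skill file may name or group them differently. `CheckOutcome` and `evaluate` are illustrative names, and the code records outcomes only; it does not implement the rules themselves.

```python
from dataclasses import dataclass

# Assumed test names, following IRS Publication 501 as commonly summarized.
QUALIFYING_CHILD_TESTS = (
    "relationship",
    "age",
    "residency",
    "support",
    "joint_return",
)
QUALIFYING_RELATIVE_TESTS = (
    "not_a_qualifying_child",
    "member_of_household_or_relationship",
    "gross_income",
    "support",
)


@dataclass(frozen=True)
class CheckOutcome:
    """Result of one test, with a reason a reviewer can verify."""
    name: str
    passed: bool
    reason: str


def evaluate(required_tests, outcomes):
    """Return (all_passed, failed_outcomes) for the required tests.

    Raises ValueError if any required test has no recorded outcome,
    so that a missing test is never silently treated as a pass.
    """
    by_name = {outcome.name: outcome for outcome in outcomes}
    missing = [name for name in required_tests if name not in by_name]
    if missing:
        raise ValueError(f"No outcome recorded for: {', '.join(missing)}")
    failed = [by_name[name] for name in required_tests if not by_name[name].passed]
    return (not failed, failed)
```

Keeping the reason alongside each outcome means a tax policy expert can check the explanation without reading the code that produced it.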
- SpecOps Methodology: https://spec-ops.ai
- SpecOps Repository: https://github.com/mheadd/spec-ops
- IRS Direct File: https://github.com/IRS-Public/direct-file
- GitHub spec-kit: https://github.com/github/spec-kit
This is a demonstration repository showing how SpecOps can be applied. The skills developed here are intended to be:
- Portable: Usable across different tax systems
- Shareable: Applicable to state and federal tax modernization
- Extensible: Templates for developing additional skills
- Run your own evaluation: Use different AI models and compare results
- Improve the skills: Suggest enhancements to instruction sets based on your findings
- Add new examples: Extract additional code samples from IRS Direct File
- Develop new skills: Create instruction sets for other tax domains (credits, schedules, etc.)
For questions about the SpecOps methodology, see:
- GitHub Discussions: https://github.com/mheadd/spec-ops/discussions
- Website: https://spec-ops.ai
This project is licensed under the MIT License - see the LICENSE file for details.