
Spec-Kit Clean Architecture: A deterministic evolution for AI-assisted development #298

@thiagobutignon

Hi everyone,

I've been working on an extension of spec-kit that addresses some challenges we've faced with AI-powered code generation in production environments. The project is still in active development, but I wanted to share it with the community to get feedback and see whether others are facing similar issues.

Repository: https://github.com/thiagobutignon/spec-kit-clean-archicteture

The Problem We're Solving

Current AI coding assistants work as black boxes: you send a prompt and get a huge chunk of code back. This creates several issues in real projects:

  • Same prompt generates different code each time (non-deterministic)
  • No way to review changes atomically
  • Can't rollback specific parts when something breaks
  • AI often ignores existing architecture patterns
  • No quality gates between generation and execution

Our Approach

We built Spec-Kit Clean Architecture on top of the original spec-kit concepts, adding a deterministic workflow that treats code generation like a Traveling Salesman Problem: finding an optimal path through a set of implementation steps.

The key insight: instead of generating all code at once, we create a YAML execution plan first, then execute it step by step with validation at each checkpoint.
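
To make that concrete, here's a minimal sketch of what such a plan might look like. The field names (`steps`, `id`, `type`, `path`) are illustrative, not the project's actual schema:

```yaml
# Hypothetical execution plan - field names are illustrative,
# not the actual spec-kit-clean-architecture schema.
plan:
  feature: create-user
  layer: domain
  steps:
    - id: 1
      type: create_file
      path: src/domain/usecases/create-user.ts
      description: Define the CreateUser use case interface
    - id: 2
      type: create_file
      path: src/domain/errors/user-already-exists-error.ts
      description: Add the domain error for duplicate users
    - id: 3
      type: create_test
      path: tests/domain/usecases/create-user.spec.ts
      description: Unit tests for the use case contract
```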

Core Workflow

User Concept → Planning Phase → Human Review → Execution Phase → Git Commits

The planning phase generates a YAML file with discrete, atomic steps. Each step is scored by an RLHF system (-2 to +2) based on architecture compliance. If execution fails at step 7 of 20, you can fix just that step and resume; the first six steps are already committed.
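
As a sketch of those resume semantics (again with invented field names), the state file might track per-step status and score, so a re-run can skip committed steps and retry from the failure:

```yaml
# Hypothetical state after a failure at step 7 of 20.
# Statuses and scores are illustrative.
steps:
  - id: 6
    status: committed   # already in git; skipped on resume
    rlhf_score: 2
    commit: a1b2c3d
  - id: 7
    status: failed      # fix this step, then resume here
    rlhf_score: -2
    error: "use case imports from infrastructure layer"
  - id: 8
    status: pending
```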

Technical Implementation

The system uses a Hierarchical Reasoning Model adapted from recent AI research. Think of it this way:

  • Strategic Level: Understands domain boundaries and architecture patterns
  • Tactical Level: Plans specific implementations and dependencies
  • Execution Level: Generates code and runs validations

Each level operates independently but shares context through the YAML state file. This separation lets us maintain Clean Architecture, DDD principles, and TDD practices automatically.
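
One way to picture that shared YAML state is with each level writing to its own section (a hypothetical layout, not the actual file format):

```yaml
# Hypothetical shared state - one section per reasoning level.
strategic:
  bounded_context: accounts
  architecture: clean        # layering rules the lower levels must respect
tactical:
  step_order: [entities, usecases, errors, tests]
  dependencies:
    usecases: [entities]     # use cases may depend on entities, never vice versa
execution:
  current_step: 3
  last_validation: passed
```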

What Makes This Different

  1. Deterministic execution: the same YAML always produces the same code
  2. Resumable on failure: state persists in the YAML, not in chat context
  3. Atomic commits: each step is a separate git commit with clear scope
  4. Architecture enforcement: automatic validation against Clean Architecture rules
  5. Progressive rollback: undo specific steps without losing everything (see the sketch after this list)
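
For instance, points 3 and 5 could be realized by recording the commit for each step, so rolling back a step is a revert of exactly one commit (a hypothetical record, not the project's actual format):

```yaml
# Hypothetical commit ledger enabling per-step rollback.
commits:
  - step: 11
    sha: 4f9e2b1
    message: "feat(domain): add CreateUser use case"
  - step: 12
    sha: 8c3d7a0
    message: "test(domain): cover CreateUser error paths"
# Rolling back step 12 = `git revert 8c3d7a0`; steps 1-11 stay intact.
```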

Current State

The framework is functional and we're using it internally for both greenfield and brownfield projects. It currently supports:

  • Domain layer generation with full test coverage
  • Automatic branch creation and PR workflows
  • Pattern learning from successful executions
  • Rollback management for failed steps
  • Architecture violation detection

Looking for Feedback

Particularly interested in thoughts on:

  • Is the TSP approach to code generation overengineering or does it make sense?
  • How are others handling non-deterministic AI outputs in CI/CD pipelines?
  • Would a plugin system for different architecture patterns be useful?
  • Any concerns about the YAML-as-state approach vs other alternatives?

The goal isn't to replace spec-kit but to offer an alternative approach for teams that need more predictability and control in their AI-assisted development workflow. We think both approaches have their place depending on project requirements.

Would love to hear if anyone else has tackled similar challenges or has suggestions for improvement. The project is MIT licensed and we're open to contributions.

Technical Details for the Curious

The RLHF scoring system guards against hallucinations by assigning a score of 0 (low confidence) when the AI is uncertain, forcing human review. Scores range from -2 (architecture violations) through +2 (a perfect implementation with domain documentation). This creates a feedback loop that improves generation quality over time.
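
Written out as a rubric, that might look like the following. The -2, 0, and +2 meanings come from the description above; the -1 and +1 tiers are my illustrative interpolation:

```yaml
# RLHF score rubric. -2, 0, and +2 as described above;
# -1 and +1 are illustrative guesses at the intermediate tiers.
rlhf_scores:
  "-2": architecture violation - reject and regenerate
  "-1": compiles but breaks a convention - needs rework   # assumed tier
  "0": low confidence - force human review
  "+1": correct implementation, docs incomplete           # assumed tier
  "+2": perfect implementation with domain documentation
```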

The system learns from each execution, building a pattern database specific to your project. Unlike a general-purpose LLM, it gets better at following your team's conventions the more you use it.
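
A learned pattern could be as simple as a recorded convention whose confidence grows with successful reuse (again, a hypothetical entry):

```yaml
# Hypothetical pattern database entry, learned from past executions.
pattern:
  id: usecase-naming
  rule: "use cases are verbs in PascalCase, e.g. CreateUser, not UserCreator"
  learned_from: [plan-014, plan-021]
  successful_uses: 12   # confidence grows as the pattern keeps working
```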

Thanks for reading. Happy to answer questions or dive deeper into specific aspects if there's interest.
