A Framework Based on Fixability and Semantic Transparency
This repository provides a practical diagnostic prompt for identifying the causes of AI mis-inference during code generation and for evaluating countermeasures (prompt improvements, additional metadata, explicit validation instructions, etc.) to give to the AI. Developers can use this prompt to obtain a diagnosis and countermeasures tailored to their project's specific context.
🚀 Want to diagnose your project now? → Quick Start Guide
In addition, to understand theoretically why AI mis-infers, we devised a language ecosystem evaluation model as a thinking framework. It offers a novel approach to evaluating programming language ecosystems in the AI coding era: unlike traditional metrics focused on syntax or performance, our model prioritizes "fixability", the ability of an ecosystem to support the AI's correction loop through rich semantic information.
Key Insight:
In AI-assisted development, language strength is determined not by initial code generation quality, but by the ecosystem's capacity to provide semantic transparency for iterative correction.
This framework was developed as a thought experiment based on practical experience and observations in AI-assisted coding.
Current Status:
- ✅ Structured as a theoretical framework
- 🔄 Empirical validation in progress
- 💬 Community-driven extensions welcome
Design Philosophy:
- Tool-agnostic: Stable across changing AI coding agents and inference models
- Language-agnostic: Applicable regardless of programming language evolution or type (static/dynamic)
- Focus on structure: Identifies mis-inference-prone structures, not specific implementations
- Long-term stability: Principles over concrete tools that rapidly evolve
Scope:
- ✅ Structural vulnerabilities that cause AI mis-inference
- ⚠️ Non-functional requirements (security, performance) are distributed across the 9 axes insofar as they relate to mis-inference
- ℹ️ Project-specific non-functional requirements may require separate evaluation frameworks
Future Direction:
This framework's structural approach can be adapted to identify mis-inference patterns in other domains (security, performance, etc.) as domain-specific evaluation frameworks.
How to Use:
- As a starting point for language selection discussions
- For experimental evaluation in actual projects
- As inspiration for developing your own evaluation criteria
If you want to diagnose where AI mis-infers during code generation in your actual project, use these two resources: diagnostic_prompt.md and the samples/ directory.
diagnostic_prompt.md can be used with any AI model to diagnose your project's language ecosystem.
How to use:
- Copy the prompt
- Add your project context (language, version, major frameworks, etc.)
- Run it through an AI model
- Review the 9-axis diagnostic results
Use the results to:
- Identify points where AI is likely to stumble
- Understand structural vulnerabilities to watch for during code generation
- Inform your AGENT.md or prompt design
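For example, a minimal sketch like the following (the file names and context values are illustrative, not part of the framework) assembles diagnostic_prompt.md and a project context block into a single paste-ready request:

```python
from pathlib import Path

# Illustrative project context - replace with your own language, version,
# frameworks, and constraints.
PROJECT_CONTEXT = """\
Language: Python 3.12
Frameworks: FastAPI 0.115, SQLAlchemy 2.0
Constraints: deployed on AWS Lambda, strict mypy configuration
"""

# Combine the diagnostic prompt with the context so the result can be pasted
# into any AI model's chat interface (no vendor-specific API is assumed here).
prompt = Path("diagnostic_prompt.md").read_text(encoding="utf-8")
combined = f"{prompt}\n\n## Project Context\n{PROJECT_CONTEXT}"
Path("diagnostic_request.md").write_text(combined, encoding="utf-8")
print("Wrote diagnostic_request.md - paste it into the AI model of your choice.")
```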
samples/ contains examples of diagnostic results for actual language configurations.
Available samples:
- C# + .NET
- Java + Spring Boot
- Python + Django/FastAPI
Referencing samples similar to your project configuration helps you understand what diagnostic results look like.
💡 Tip: For theoretical background, see the "📚 Documentation" section below.
This repository contains six interconnected document types:
1. Theoretical Framework (model.md)
- Purpose: Theoretical foundation with 4-layer architecture and 9-axis evaluation
- Structure: 4 Semantic Layers × 9 Evaluation Axes × Verification Loop
- Audience: Researchers, framework designers, those seeking deep understanding
2. Usage Guide (model-usage-guide.md)
- Purpose: Explains how to use and customize the framework
- Structure: How to use the 3 perspectives (4 Layers, 9 Axes, 7 Phases) and how to convert them into project-specific evaluation axes
- Audience: Framework users, those considering customization
- Key Content:
- Independence and usage of the model's three perspectives
- Recommended usage by user type
- Conversion process to project-specific evaluation axes (7 steps)
- Customization principles
3. Philosophical Principles (philosophy.md)
- Purpose: Summarizes the framework's design philosophy and principles for long-term stability
- Structure: Principles such as neutrality in AI-assisted coding, respect for values, structural focus, etc.
- Audience: Those who want to understand the philosophical background of the framework, designers, researchers
4. Diagnostic Tool (diagnostic_prompt.md)
- Purpose: Practical diagnostic tool that transforms the 4-layer model into 9 diagnostic axes
- Structure: 9 diagnostic axes for identifying AI mis-inference points
- Audience: AI coding assistant users, project teams, developers writing AGENT.md
- Note: Focuses on the 9 axes; the 4-layer representation is abstracted away for practical use
- Important:
- ⚠️ Any AI model can be used for diagnostics (not limited to specific vendors)
- 💡 Use with appropriate context and project-specific information for best results
- 📊 Diagnostic quality depends on the AI model's training data and capabilities
- 🔄 Results may vary between different AI models or versions
5. Diagnostic Results (samples/)
- Purpose: Real-world examples of applying the diagnostic prompt to specific language configurations
- Structure: Language + version + context → 9-axis diagnostic results
- Audience: Users seeking concrete examples, teams evaluating their tech stack
6. Thought Process (model-thought-process.md)
- Purpose: Chronicles the development journey behind this framework
- Structure: A narrative essay tracing the evolution from initial questions to the final 9-axis model
- Audience: Those interested in the reasoning process, framework designers, researchers
- Note: Written in an accessible essay style rather than technical documentation
- Highlights:
- The paradigm shift from "generation quality" to "fixability"
- The TCP/IP analogy for understanding correction loops
- Discovery of the dual perspective (human vs. AI)
- Recognition that syntax is UI for AI
Relationship:
```
Theoretical Model (model.md)
    ↓ Explains usage and customization
Usage Guide (model-usage-guide.md)
    ↓ Explains design philosophy
Philosophical Principles (philosophy.md)
    ↓ Practical transformation
Diagnostic Prompt (diagnostic_prompt.md)
    ↓ Applied to real projects
Diagnostic Samples (samples/)

Thought Process (model-thought-process.md)
    ↓ Explains the reasoning behind
Theoretical Model (model.md)
```
- English Version - Full framework documentation
- Japanese Version (日本語版) - Full framework documentation in Japanese
AI coding is not about "producing correct code in one shot."
It's about maintaining a stable correction loop:
Generation → Verification → Feedback → Regeneration
What matters is:
- ✅ The ability to fix errors when they occur
- ✅ The ability to provide AI with information needed for fixes
This requires not just language specifications, but rich semantic information from the entire ecosystem: language, runtime, toolchain, and community.
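As a rough sketch of that loop (the generate, static_check, and run_tests hooks are hypothetical placeholders for your AI agent, compiler or linter, and test runner), the cycle looks something like this:

```python
from typing import Callable, Optional

def correction_loop(
    generate: Callable[[str, Optional[str]], str],    # AI: task + feedback -> code
    static_check: Callable[[str], Optional[str]],     # compiler/linter: code -> error report or None
    run_tests: Callable[[str], Optional[str]],        # test runner: code -> failure report or None
    task: str,
    max_iterations: int = 5,
) -> Optional[str]:
    """Drive the Generation -> Verification -> Feedback -> Regeneration cycle.

    The richer the semantic information returned by static_check and run_tests
    (types, stack traces, failing assertions), the better the next generation step.
    """
    feedback = None
    for _ in range(max_iterations):
        code = generate(task, feedback)      # Phase 2 / Phase 7: (re)generation
        feedback = static_check(code)        # Phase 3: static semantic verification
        if feedback is not None:
            continue                         # feed errors back and regenerate
        feedback = run_tests(code)           # Phases 4-6: startup, tests, feedback
        if feedback is None:
            return code                      # loop converged: verified code
    return None                              # did not converge within the budget
```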
This framework was developed from 40+ years of programming experience across multiple languages and ecosystems.
The author has witnessed and participated in:
- The evolution of language design philosophies
- Trade-offs made for backward compatibility
- Cultural differences in how communities prioritize features
- The historical context behind major language decisions
Programming languages differ not because one is "better," but because they embody different values and priorities.
Java's Type Erasure
The decision to use type erasure for generics reflects Java's commitment to "write once, run anywhere" — maintaining backward compatibility with existing JVMs was more important than runtime type information. This is Java's pride in ecosystem stability.
C#'s MSIL Changes for Generics
C# chose to modify the runtime (MSIL) to support full generic type information, reflecting a pragmatic approach: preserve backward compatibility when possible, but evolve the platform when necessary for better developer experience.
Python's Breaking Changes
Python's willingness to introduce breaking changes (e.g., Python 2 → 3) prioritizes language evolution and improvement over absolute stability. Progress over compatibility is a cultural value.
- Java protects its ecosystem's long-term stability
- C# balances evolution with practical constraints
- Python embraces progress and modernization
These are value choices, not quality rankings.
This framework identifies how these choices affect AI's correction loop:
- Type erasure → AI needs runtime tests for type validation
- MSIL generics → AI can rely on rich compile-time metadata
- Breaking changes → AI needs version-aware knowledge
These are structural observations, not judgments.
The goal is to help AI work better with each ecosystem's unique characteristics, not to declare one superior to another.
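As one concrete instance of the "breaking changes → version-aware knowledge" observation, the same expression changed meaning across the Python 2 → 3 boundary:

```python
# Integer division is a classic 2 -> 3 breaking change:
#   Python 2: 7 / 2 == 3    (floor division)
#   Python 3: 7 / 2 == 3.5  (true division)
print(7 / 2)   # 3.5 on Python 3
print(7 // 2)  # 3, explicit floor division behaves the same on both versions
```

Without version context, an AI can mis-infer exactly this kind of semantics.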
Without understanding the historical context and cultural values:
- We risk misinterpreting design decisions as "mistakes"
- We might try to "fix" what is actually a deliberate trade-off
- We lose the richness that comes from diverse approaches
This framework respects that diversity while helping AI navigate it effectively.
- Layer1 Core Layer - Type system, scope rules, memory model
- Layer2 Service Layer - AST, LSP, static analysis APIs
- Layer3 Dependency Layer - Package management, standard library
- Layer4 Community Layer - OSS, Q&A, best practices
- Axis1 Public Knowledge Availability - Quality and quantity as training data
- Axis2 Static Semantic Consistency - Consistency of types, AST, scopes
- Axis3 Semantic Metadata Richness - Type annotations, LSP, contracts
- Axis4 Semantic Access & Automation - Roslyn, tsserver, Symbol API
- Axis5 Runtime Semantic Continuity - Consistency between implementation and runtime
- Axis6 Dependency Stability - Version management, ABI compatibility
- Axis7 Runtime Specification Conformance - API specs, RFC compliance
- Axis8 Compatibility Culture - Backward compatibility, LTS
- Axis9 Semantic Extensibility - Interfaces, plugins, modules
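As a small, hedged illustration of Axis 2 and Axis 3 in Python terms: explicit type annotations are semantic metadata that both the AI and static tooling (for example, a checker such as mypy) can verify against before anything runs. The names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    total_cents: int   # annotated units make intent machine-readable
    currency: str

def apply_discount(invoice: Invoice, percent: float) -> Invoice:
    """Return a copy of the invoice with a percentage discount applied.

    Because the signature is annotated, a static checker can reject a call like
    apply_discount(invoice, "10%") during Phase 3, before tests ever run.
    """
    discounted = round(invoice.total_cents * (1 - percent / 100))
    return Invoice(total_cents=discounted, currency=invoice.currency)
```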
- Phase1 Static Knowledge - AI training data, language specs, common patterns
- Phase2 Generation - Initial code generation by AI
- Phase3 Static Semantic Verification - Type checking, syntax validation, linter
- Phase4 Startup Check - Build, dependency resolution, startup confirmation
- Phase5 Test Execution - Unit tests, integration tests
- Phase6 Test Feedback - Test results, error messages, traces
- Phase7 Regeneration - Fix based on feedback and loop back to Phase3
```mermaid
flowchart TD
    A[Phase1 Static Knowledge]
    B[Phase2 Generation]
    C[Phase3 Static Semantic Verification]
    D[Phase4 Startup Check]
    E[Phase5 Test Execution]
    F[Phase6 Test Feedback]
    G[Phase7 Regeneration]
    A --> B
    B --> C
    C -->|Failed| G
    C -->|Passed| D
    D -->|Failed| G
    D -->|Passed| E
    E --> F
    F -->|Passed| H[End]
    F -->|Failed| G
    G --> C
```
Traditional metrics focus on syntax or performance.
This model focuses on how well an ecosystem supports AI's correction loop.
Compilers are redefined as semantic verification engines, enabling unified evaluation of static and dynamic languages.
No ranking or scoring.
Different use cases require different trade-offs.
We're honest: this is a hypothesis based on practical observations, not empirical research.
Use as a starting point for discussions:
- Which axes matter for your project?
- What trade-offs are acceptable?
- How does your ecosystem support AI coding?
Identify gaps in semantic transparency:
- Missing API documentation?
- Inconsistent runtime behavior?
- Poor LSP support?
Use diagnostic_prompt.md with any AI model:
Best Practices:
- Provide context: Include language version, framework, and project-specific details
- Use multiple models: Different AI models may highlight different aspects
- Focus on patterns: Look for structural observations, not rankings
- Update regularly: Re-run diagnostics when dependencies change
Recommended Usage:
Trigger diagnostics:
- At project start
- When dependencies are updated
- When adopting new language features
- During architecture reviews
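One hedged way to automate the "when dependencies are updated" trigger is to fingerprint the dependency manifests and prompt a re-run of the diagnostic when the hash changes; the manifest names and stamp file below are illustrative:

```python
import hashlib
from pathlib import Path

# Illustrative manifests - substitute your ecosystem's equivalents
# (requirements.txt, pyproject.toml, packages.config, build.gradle, ...).
MANIFESTS = ["pyproject.toml", "requirements.txt"]
STAMP = Path(".diagnostic_stamp")

def manifest_fingerprint() -> str:
    """Hash the dependency manifests that exist in the current directory."""
    digest = hashlib.sha256()
    for name in MANIFESTS:
        path = Path(name)
        if path.exists():
            digest.update(path.read_bytes())
    return digest.hexdigest()

current = manifest_fingerprint()
previous = STAMP.read_text().strip() if STAMP.exists() else ""
if current != previous:
    print("Dependencies changed - re-run diagnostic_prompt.md with updated context.")
    STAMP.write_text(current)
else:
    print("No dependency changes since the last diagnostic run.")
```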
Test the hypothesis:
- Does "fixability" correlate with AI coding success?
- Which axes have the strongest impact?
- How do different AI tools behave?
This is a snapshot of a thought experiment — published as-is for the community to use and extend.
Feel free to:
- Fork it
- Extend it
- Apply it to your projects
- Share your results
Note:
This repository may not be actively maintained. If you find value in the framework, consider creating your own version or contributing to community discussions.
This is a starting point, not a finished product.
The community is encouraged to build upon it independently.
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Masaki Honda
This thought experiment emerged from years of observing various language ecosystems in real-world projects.
Special thanks to:
- The open-source communities whose work made this analysis possible
- Colleagues and clients who provided the practical contexts for these observations
- The broader programming language community for creating diverse ecosystems worth studying
This framework stands on the shoulders of decades of language design, tooling development, and community building across all the ecosystems mentioned.
Read the full framework:
Keywords:
AI-assisted coding, semantic information, language ecosystems, fixability, verification loop, semantic transparency