A skills-based coding agent for flexible BIM information retrieval from IFC models using natural language queries.
This project implements a coding agent that uses IfcOpenShell domain knowledge encapsulated as skills to dynamically generate Python code for answering natural language queries about Building Information Models (BIM) encoded in Industry Foundation Classes (IFC) format.
Unlike traditional approaches that rely on predefined domain tools, this agent consults skill documentation to generate appropriate code on-the-fly, enabling flexible responses to diverse and unanticipated queries.
This implementation accompanies the following research paper:
A Skills-Based Coding Agent for Flexible BIM Information Retrieval
2026 European Conference on Computing in Construction (EC3)
Corfu, Greece, July 12–15, 2026
The agent achieves 84.7% overall accuracy on the FNDE-BIM-Bench benchmark (85 queries across 5 IFC disciplines) using GPT-5-mini as the LLM backbone.
- Python 3.10+
- OpenRouter API key (or OpenAI API key)
- IfcOpenShell
```bash
# Clone the repository
git clone https://github.com/rrdls/ifc-coding-agent.git
cd ifc-coding-agent

# Create a virtual environment
python -m venv env
source env/bin/activate   # Linux/Mac
# env\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
OPENROUTER_API_KEY=your_openrouter_api_key_here
# Or for OpenAI direct:
# OPENAI_API_KEY=your_openai_api_key_here
```

Run the agent interactively:

```bash
python main.py --model gpt-5-mini
```

Or answer a single query directly:

```bash
python main.py --model gpt-5-mini --query "How many walls are in the model?"
```

```bash
# Run all queries with a specific model
python benchmark_runner.py --model gpt-5-mini

# Run a single query
python benchmark_runner.py --model gpt-5-mini --query ARQ_E01

# Run a range of queries (0-indexed)
python benchmark_runner.py --model gpt-5-mini --start 0 --end 9

# List available models
python benchmark_runner.py --list-models
```

Note: Skill extraction is always enabled. Reusable functions are automatically extracted from successful queries.
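The skill-extraction step can be pictured as pulling reusable top-level functions out of a script that answered a query successfully, so later queries can call them directly. A minimal sketch using Python's `ast` module (the real logic lives in `skill_builder.py`; the sample script and function names here are illustrative):

```python
import ast

def extract_functions(source: str) -> dict[str, str]:
    """Map each top-level function in a script to its source code."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

# A toy "successful query script" containing one reusable function
script = '''
def count_walls(model):
    return len(model.by_type("IfcWall"))

print(count_walls(model))
'''

skills = extract_functions(script)
print(list(skills))  # ['count_walls']
```

Only the function definitions are kept; top-level statements tied to a single query are discarded.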
```bash
# Interactive mode (asks for confirmation)
python reset_experiment.py

# Silent mode (no confirmation)
python reset_experiment.py --yes

# Dry-run (show what would be removed)
python reset_experiment.py --dry-run

# Create backup before reset
python reset_experiment.py --backup
```

| Key | Provider | Model |
|---|---|---|
| gpt-5-mini | OpenRouter | openai/gpt-5-mini |
| gpt-4o | OpenRouter | openai/gpt-4o |
| claude-haiku-4.5 | OpenRouter | anthropic/claude-haiku-4.5 |
| claude-sonnet-4.5 | OpenRouter | anthropic/claude-sonnet-4.5 |
| gemini-2.5-pro-high | OpenRouter | google/gemini-2.5-pro-preview |
| gpt-4o-openai | OpenAI | gpt-4o |
| gpt-5-mini-openai | OpenAI | gpt-5-mini |
Run `python benchmark_runner.py --list-models` for the complete list.
```
ifc-coding-agent/
├── main.py                     # Agent initialization and interactive mode
├── benchmark_runner.py         # Benchmark execution runner
├── skill_builder.py            # Skill extraction from successful queries
├── skill_tracking_backend.py   # Tracks skill access during execution
├── planning_middleware.py      # Enforces planning before code execution
├── reset_experiment.py         # Reset experiment environment
├── dataset/
│   └── benchmark_dataset.json  # FNDE-BIM-Bench (85 queries)
├── projects/
│   └── fnde/                   # IFC models (ARQ, ELE, EST, HAF, HEP)
├── skills/
│   ├── ifcopenshell-*/         # Base IfcOpenShell skills
│   └── learned/                # Generated skills library
├── sandbox/                    # Generated scripts and plans
└── results/                    # Benchmark results
```
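Benchmark query IDs such as `ARQ_E01` encode the discipline as a prefix (ARQ, ELE, EST, HAF, HEP), which makes it easy to slice results by discipline. A hedged sketch — the real data lives in `dataset/benchmark_dataset.json`, whose exact field names are not shown here, so the entries below are invented for illustration:

```python
from collections import Counter

# Hypothetical entries; the real schema in benchmark_dataset.json may differ.
queries = [
    {"id": "ARQ_E01", "query": "How many walls are in the model?"},
    {"id": "ARQ_E02", "query": "List all door types."},
    {"id": "ELE_E01", "query": "How many light fixtures are installed?"},
    {"id": "EST_E01", "query": "What is the total beam count?"},
]

# Discipline is the portion of the ID before the underscore
by_discipline = Counter(q["id"].split("_")[0] for q in queries)
print(by_discipline)  # Counter({'ARQ': 2, 'ELE': 1, 'EST': 1})
```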
The agent follows the canonical model-tools-instructions pattern:
- LLM Backbone: GPT-5-mini (configurable)
- Generic Tools: Shell command execution, file operations
- Skills Module: IfcOpenShell documentation via progressive disclosure
- Skill Generation: Extracts reusable functions from successful executions
- Code Validation: AST-based analyzer with auto-correction subagent
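The AST-based validation step can be sketched as a static pass over generated code before it runs. This illustrative version (not the project's actual analyzer; the disallowed-call list is an assumption) walks the syntax tree and flags calls that should never appear in sandboxed scripts:

```python
import ast

# Illustrative denylist; the real analyzer's rules may differ.
DISALLOWED = {"eval", "exec", "__import__"}

def find_violations(source: str) -> list[str]:
    """Report disallowed function calls found in generated code."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DISALLOWED):
            violations.append(f"line {node.lineno}: {node.func.id}()")
    return violations

print(find_violations("x = eval('1+1')"))   # ['line 1: eval()']
print(find_violations("print(len([]))"))    # []
```

When violations are found, an auto-correction subagent can be handed the report and asked to regenerate the offending code.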
A custom benchmark with 85 queries across:
- 3 Query Types: DirectLookup (20), FilteredAggregation (36), MultiStep (29)
- 5 IFC Disciplines: Architecture, Electrical, Structural, Cold Water, Sewage
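As a quick sanity check on the figures above, the three query-type counts sum to the benchmark's 85 queries, and the reported 84.7% overall accuracy corresponds to roughly 72 correct answers:

```python
# Query-type counts from the benchmark description
counts = {"DirectLookup": 20, "FilteredAggregation": 36, "MultiStep": 29}
total = sum(counts.values())
print(total)  # 85

# 84.7% of 85 queries, rounded to the nearest whole query
correct = round(0.847 * total)
print(correct, f"{correct / total:.1%}")  # 72 84.7%
```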
MIT License