Design Concepts

This document explains the architecture and design principles behind Skillful-Alhazen.

Core Philosophy: Curation Over Collection

The system exists to help you make sense of material, not just store it. This distinction is crucial:

Collection = passively accumulating information
Curation = actively interrogating, extracting meaning, building understanding

Every component serves the curation mission. We embody Alhazen's philosophy: be an enemy of all you read.

The Curation Workflow

All skills follow a five-stage workflow:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         CURATION WORKFLOW                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. FORAGING          2. INGESTION         3. SENSEMAKING                   │
│  ┌──────────┐        ┌──────────┐         ┌──────────────┐                  │
│  │ Discover │───────▶│  Capture │────────▶│ Agent reads  │                  │
│  │ sources  │        │   raw    │         │ & extracts   │                  │
│  └──────────┘        └──────────┘         └──────────────┘                  │
│       │                   │                      │                          │
│       ▼                   ▼                      ▼                          │
│  - URLs             - Artifacts            - Fragments                      │
│  - APIs             - Provenance           - Notes                          │
│  - Feeds            - Timestamps           - Relations                      │
│                                                                             │
│                              │                                              │
│                              ▼                                              │
│               4. ANALYZE/SUMMARIZE        5. REPORT                         │
│               ┌──────────────────┐       ┌──────────────┐                   │
│               │ Reason over many │──────▶│  Dashboard   │                   │
│               │ notes over time  │       │  & answers   │                   │
│               └──────────────────┘       └──────────────┘                   │
│                        │                        │                           │
│                        ▼                        ▼                           │
│                   - Synthesis notes        - Pipeline views                 │
│                   - Trend analysis         - Skills matrix                  │
│                   - Recommendations        - Strategic reports              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Stage Descriptions

| Stage | Purpose | Outputs |
| --- | --- | --- |
| Foraging | Discover sources of information | URLs, API endpoints, feeds |
| Ingestion | Capture raw content with provenance | Artifacts with timestamps and sources |
| Sensemaking | Agent reads and extracts meaning | Fragments, notes, relationships |
| Analysis | Reason over accumulated knowledge | Synthesis notes, gap analysis |
| Reporting | Present actionable insights | Dashboards, answers, recommendations |
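
The five stages can be read as an ordered pipeline in which each stage consumes the previous stage's output. The sketch below is only an illustration of that ordering: the `Stage` enum, the handler functions, and the sample URL are all hypothetical, and in the real system the sensemaking and analysis steps are performed by the agent rather than by code.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    FORAGING = 1
    INGESTION = 2
    SENSEMAKING = 3
    ANALYSIS = 4
    REPORTING = 5


# Hypothetical stand-ins for real skill logic; each consumes the previous output.
def forage(_: List) -> List[str]:
    return ["https://example.org/posting/1"]


def ingest(urls: List[str]) -> List[dict]:
    # Capture raw content together with where it came from.
    return [{"source": u, "raw": "Requires Kubernetes experience."} for u in urls]


def make_sense(artifacts: List[dict]) -> List[dict]:
    # In the real system the agent does this step by reading the artifact.
    return [{"fragment": a["raw"], "source": a["source"]} for a in artifacts]


def analyze(fragments: List[dict]) -> List[str]:
    return [f"{len(fragments)} requirement(s) extracted"]


def report(findings: List[str]) -> str:
    return "; ".join(findings)


HANDLERS: Dict[Stage, Callable] = {
    Stage.FORAGING: forage,
    Stage.INGESTION: ingest,
    Stage.SENSEMAKING: make_sense,
    Stage.ANALYSIS: analyze,
    Stage.REPORTING: report,
}


def run_workflow(handlers: Dict[Stage, Callable]) -> Dict[Stage, object]:
    outputs: Dict[Stage, object] = {}
    data: object = []
    for stage in Stage:  # Enum members iterate in definition order
        data = handlers[stage](data)
        outputs[stage] = data
    return outputs


results = run_workflow(HANDLERS)
print(results[Stage.REPORTING])
```

Keeping every stage's output in `results` mirrors the idea that earlier stages (artifacts, fragments) remain available after later ones run.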

Separation of Concerns

A key architectural principle: scripts handle I/O, the agent handles thinking.

Python Scripts Handle:

Fetching from APIs (pagination, rate limits, bulk operations)
Storing raw artifacts with provenance
TypeDB transactions
Returning structured data to the agent

Agent Handles:

Reading and comprehending content
Extracting entities and relationships
Creating structured notes
Synthesizing across sources
Recommending actions

This separation:

Minimizes token usage (scripts do the heavy lifting)
Maximizes comprehension (agent focuses on understanding)
Enables scaling (scripts handle pagination, bulk operations)
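
As a rough sketch of the script side of this split, the example below pages through a source and hands the agent one compact JSON result. The paged "API" is an in-memory stand-in, and the names `fake_fetch_page`, `fetch_all`, and the JSON keys are assumptions made for illustration, not the project's actual script interface.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]

# Hypothetical in-memory "API": maps a page cursor to (items, next_cursor).
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: (["paper-1", "paper-2"], "p2"),
    "p2": (["paper-3"], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> List[str]:
    """Follow cursors until the source reports no further page."""
    items: List[str] = []
    cursor: Optional[str] = None
    while True:
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if cursor is None:
            return items


def main() -> str:
    items = fetch_all(fake_fetch_page)
    # The agent only sees this structured summary, never the paging loop.
    return json.dumps({"success": True, "count": len(items), "items": items})


print(main())
```

The agent never spends tokens on pagination; it receives a single structured payload and can focus on reading the content.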

Claude Code as the Interface

You interact through natural conversation. You never:

Write TypeQL queries
Call APIs directly
Manage database transactions
Navigate file structures

The agent translates your intent into actions, choosing the right skill and executing the appropriate operations.

You: "What are my skill gaps across high-priority positions?"

Agent: [Internally]
  1. Query positions with priority:high tag
  2. For each, fetch requirements
  3. Compare against your skill profile
  4. Aggregate gaps by frequency
  5. Link to relevant learning resources

Agent: [Response]
  "Across 3 high-priority positions, the most common gaps are:
   1. Distributed Systems (required in 3/3)
   2. Kubernetes (required in 2/3)
   ..."
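
Steps 3 and 4 of the internal plan above (compare against the skill profile, then aggregate gaps by frequency) can be sketched in plain Python. The position names, requirement sets, and `aggregate_gaps` helper are made up for illustration; in the real system the data would come from TypeDB via a skill script.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical requirements for three high-priority positions.
positions: Dict[str, Set[str]] = {
    "Position A": {"Distributed Systems", "Kubernetes", "Python"},
    "Position B": {"Distributed Systems", "Kubernetes"},
    "Position C": {"Distributed Systems", "Python"},
}
my_skills: Set[str] = {"Python"}


def aggregate_gaps(
    positions: Dict[str, Set[str]], skills: Set[str]
) -> List[Tuple[str, int, int]]:
    """Return (skill, positions requiring it, total positions), most common first."""
    counts: Counter = Counter()
    for requirements in positions.values():
        counts.update(requirements - skills)  # requirements not yet covered
    total = len(positions)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(skill, n, total) for skill, n in ordered]


for skill, n, total in aggregate_gaps(positions, my_skills):
    print(f"{skill} (required in {n}/{total})")
```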

TypeDB as Ontological Memory

TypeDB provides a logic-driven knowledge graph. Key properties:

Schema as Conceptual Vocabulary

The schema defines the concepts the agent thinks with. The type hierarchy separates domain objects from information content:

identifiable-entity (abstract root)
├── domain-thing              # Real-world objects: papers, genes, jobs
├── collection                # Typed sets: corpora, search campaigns, case files
└── information-content-entity (abstract)
    ├── artifact              # Raw captured content (PDFs, HTML, API responses)
    ├── fragment              # Extracted pieces (sections, requirements, claims)
    └── note                  # Agent's analysis (fit scores, summaries, syntheses)

Key design principle: a gene or a job posting is not an information content entity (ICE). Only artifacts, fragments, and notes carry content. This separation keeps the ontology clean: domain objects are what you reason about, ICEs are what you reason with.

These types aren't just storage tables; they're the vocabulary for reasoning about knowledge.
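
To make the hierarchy concrete, here is a minimal Python mirror of it. The class names simply transliterate the TypeDB types; `Gene`, `JobPosting`, and `carries_content` are hypothetical additions for the example, and the real schema lives in TypeQL, not Python.

```python
class IdentifiableEntity:
    """Abstract root of the hierarchy."""


class DomainThing(IdentifiableEntity):
    """Real-world objects: papers, genes, jobs."""


class Collection(IdentifiableEntity):
    """Typed sets: corpora, search campaigns, case files."""


class InformationContentEntity(IdentifiableEntity):
    """Abstract parent of everything that carries content."""


class Artifact(InformationContentEntity):
    """Raw captured content."""


class Fragment(InformationContentEntity):
    """Extracted piece of an artifact."""


class Note(InformationContentEntity):
    """The agent's analysis."""


# Hypothetical domain subtypes, for illustration only.
class Gene(DomainThing):
    pass


class JobPosting(DomainThing):
    pass


def carries_content(entity: IdentifiableEntity) -> bool:
    # Only the information-content branch carries content.
    return isinstance(entity, InformationContentEntity)


print(carries_content(Gene()), carries_content(Note()))
```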

Provenance Preservation

Every piece of knowledge traces back to its source:

Artifact (raw job description)
    ↓
Fragment (extracted requirement)
    ↓
Note (your fit analysis)

You can always ask: "Where did this come from?"
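
One way to picture the provenance chain is as a linked structure where each item points at what it was derived from. This is a sketch under assumptions: the `Node` dataclass, its `derived_from` field, and the `trace` helper are hypothetical and do not reflect the actual relation names in the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    kind: str                      # "artifact", "fragment", or "note"
    label: str
    derived_from: Optional["Node"] = None


def trace(node: Node) -> List[str]:
    """Walk from a note (or fragment) back to the raw artifact."""
    chain: List[str] = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(f"{current.kind}: {current.label}")
        current = current.derived_from
    return chain


artifact = Node("artifact", "raw job description")
fragment = Node("fragment", "extracted requirement", derived_from=artifact)
note = Node("note", "fit analysis", derived_from=fragment)

for step in trace(note):
    print(step)
```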

Logical Queries

TypeDB uses pattern matching:

match
  $p isa jobhunt-position, has priority "high";
  $r isa jobhunt-requirement;
  (position: $p, requirement: $r) isa requires;
  $r has your-level "none";
fetch
  $p: title;
  $r: skill-name;

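This returns the position title and skill name for every requirement of a high-priority position where your level is "none", that is, every skill gap in your high-priority positions.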
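
For readers less familiar with pattern matching, the query above behaves like a join with two filters. The following sketch reproduces that logic over in-memory data; it does not use the TypeDB driver, and the sample records and the `skill_gaps` function are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data standing in for TypeDB instances.
positions = [
    {"id": "p1", "title": "Staff Engineer", "priority": "high"},
    {"id": "p2", "title": "Data Analyst", "priority": "low"},
]
requirements = [
    {"id": "r1", "skill-name": "Kubernetes", "your-level": "none"},
    {"id": "r2", "skill-name": "Python", "your-level": "strong"},
    {"id": "r3", "skill-name": "SQL", "your-level": "none"},
]
# (position id, requirement id) pairs standing in for the `requires` relation.
requires: List[Tuple[str, str]] = [("p1", "r1"), ("p1", "r2"), ("p2", "r3")]


def skill_gaps(
    positions: List[dict], requirements: List[dict], requires: List[Tuple[str, str]]
) -> List[Dict[str, str]]:
    # $p isa jobhunt-position, has priority "high"
    high = {p["id"]: p for p in positions if p["priority"] == "high"}
    # $r has your-level "none"
    missing = {r["id"]: r for r in requirements if r["your-level"] == "none"}
    # (position: $p, requirement: $r) isa requires
    return [
        {"title": high[pid]["title"], "skill-name": missing[rid]["skill-name"]}
        for pid, rid in requires
        if pid in high and rid in missing
    ]


print(skill_gaps(positions, requirements, requires))
```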

Embrace the Bitter Lesson

Following Richard Sutton's insight: general methods that leverage computation win in the long run.

We don't:

Over-engineer extraction pipelines
Hand-code entity recognizers
Build brittle rule systems
Require perfect structured input

Instead:

Let the agent read and comprehend
Store what the agent extracts
Query and synthesize

The system improves as the underlying model improves, without requiring code changes.

Skills Architecture

Skills are modular domain capabilities. Each skill has:

.claude/skills/<skill-name>/
├── SKILL.md     # Trigger conditions and brief command overview
├── USAGE.md     # Full step-by-step instructions for the agent
└── *.py         # TypeDB transaction scripts

SKILL.md vs USAGE.md

Skills split their instructions across two files for a specific reason: OpenClaw loads all SKILL.md files into the agent's system context at startup.

SKILL.md — Loaded at startup. Kept short. Tells the agent when to use this skill (triggers) and gives a brief command overview. If SKILL.md were long, every skill's full instructions would bloat the context on every message.
USAGE.md — Loaded on demand, when the agent is actually executing the skill. Contains the full step-by-step workflow, detailed command reference, sensemaking instructions, and TypeQL examples. This is what the agent reads to know how to do the work.

This split is what makes OpenClaw practical: dozens of skills can be registered without filling the context window with instructions that aren't relevant to the current task.
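
The loading pattern can be sketched as two functions: one that reads every SKILL.md eagerly, and one that reads a single USAGE.md on demand. This is not the actual loader; `load_skill_summaries`, `load_usage`, and the demo directory built in a temporary folder are assumptions used only to illustrate the idea.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_skill_summaries(skills_root: Path) -> Dict[str, str]:
    """Eagerly read every SKILL.md (short trigger descriptions)."""
    return {
        path.parent.name: path.read_text(encoding="utf-8")
        for path in sorted(skills_root.glob("*/SKILL.md"))
    }


def load_usage(skills_root: Path, skill_name: str) -> str:
    """Read one skill's full instructions only when that skill is executed."""
    return (skills_root / skill_name / "USAGE.md").read_text(encoding="utf-8")


def build_demo_tree(root: Path) -> None:
    # Hypothetical skills, created only so the sketch is runnable.
    for name in ("jobhunt", "scilit"):
        skill_dir = root / name
        skill_dir.mkdir()
        (skill_dir / "SKILL.md").write_text(f"Use {name} when ...", encoding="utf-8")
        (skill_dir / "USAGE.md").write_text(
            f"Full step-by-step workflow for {name} ...", encoding="utf-8"
        )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_tree(root)
    startup_context = load_skill_summaries(root)  # small, always in context
    usage = load_usage(root, "jobhunt")           # larger, loaded on demand

print(sorted(startup_context))
```

With this shape, the startup cost grows with the size of the short SKILL.md files only, regardless of how long each USAGE.md is.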

Python Scripts

Handle the mechanics:

API calls
TypeDB transactions
Data transformations
Bulk operations

Creating New Skills

Use the /domain-modeling meta-skill to design new domains following the curation pattern. It will help you:

Define the entity types
Design the schema
Write the SKILL.md
Create the transaction scripts

TypeDB as Domain Modeling Tool

TypeDB is not a document store or a traditional graph database — it's a conceptual modeling tool that enforces ontological rigor. The schema defines what concepts exist and how they relate before any data is stored.

Schema-First Design

Every skill starts with schema design, not data. Define your concepts in TypeDB's schema language (TypeQL), then write Python scripts that operate on those concepts. This forces clarity about the domain before implementation begins:

# A gene is a domain-thing, not information content
rd-gene sub domain-thing,
    owns rd-gene-symbol,
    owns rd-hgnc-id;

# A causal relationship between a gene and a disease
rd-gene-causes-disease sub relation,
    relates gene,
    relates disease,
    owns confidence;

This is different from JSON storage or a property graph: the schema is a logical contract about what is true in the domain.
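
To illustrate what "logical contract" means in practice, the sketch below rejects data that the schema does not declare, which a plain JSON store would accept silently. It is a toy model, not how TypeDB enforces schemas: the `SCHEMA` dictionary, the `validate` function, and the sample values are hypothetical.

```python
from typing import Dict, Set

# Toy mirror of the rd-gene definition above: which attributes the type owns.
SCHEMA: Dict[str, Dict[str, object]] = {
    "rd-gene": {
        "sub": "domain-thing",
        "owns": {"rd-gene-symbol", "rd-hgnc-id"},
    },
}


def validate(type_name: str, attributes: Dict[str, str]) -> None:
    """Raise ValueError if the type or any attribute is not declared."""
    if type_name not in SCHEMA:
        raise ValueError(f"unknown type: {type_name}")
    owned: Set[str] = SCHEMA[type_name]["owns"]  # type: ignore[assignment]
    undeclared = set(attributes) - owned
    if undeclared:
        raise ValueError(f"{type_name} does not own: {sorted(undeclared)}")


# Accepted: both attributes are declared for rd-gene (sample values are made up).
validate("rd-gene", {"rd-gene-symbol": "GENE1", "rd-hgnc-id": "HGNC:0000"})

# Rejected: a gene is a domain-thing and carries no free-text content.
try:
    validate("rd-gene", {"summary": "some text"})
except ValueError as err:
    print(err)
```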

The 3-Branch Hierarchy as a Skill Design Guide

The core hierarchy tells you where to place new concepts:

| Type | Ask Yourself | Examples |
| --- | --- | --- |
| `domain-thing` | Is this a real-world entity you reason about? | disease, gene, company, job posting |
| `collection` | Is this a typed set of things with shared context? | investigation, search campaign, corpus |
| `artifact` | Is this raw captured content with a source? | API response, job description HTML, PDF |
| `fragment` | Is this an extracted piece of an artifact? | phenotype association, requirement, claim |
| `note` | Is this the agent's interpretation or analysis? | fit score, mechanism note, synthesis |

The key rule: domain objects are not information content. A gene, a disease, a job posting — these are domain things. Only artifacts, fragments, and notes carry content. Conflating them creates ontological confusion.

Cross-Namespace Queries

TypeDB's pattern matching enables reasoning across skill boundaries using shared concepts. One concrete example: the addresses-requirement relation bridges the jobhunt and scilit namespaces:

# A scilit paper collection addressing a jobhunt skill gap
match
    $c isa collection;           # from scilit namespace
    $r isa jobhunt-requirement;  # from jobhunt namespace
    (resource: $c, requirement: $r) isa addresses-requirement;
fetch { "collection": $c.name, "skill": $r.skill-name };

This query works because the schema explicitly declares:

collection plays addresses-requirement:resource;

Another example: the rare-disease skill defines a shared mechanism model (total-loss, partial-loss, gain-of-function, etc.) in the schema. Any future skill dealing with genetic disease can reuse this vocabulary directly — the schema encodes it as a first-class concept rather than a free-text convention.

Data Model

The core schema is built on a three-branch hierarchy rooted at identifiable-entity:

| Branch | Base Type | Purpose | Examples |
| --- | --- | --- | --- |
| Domain Objects | `domain-thing` | Real-world entities you track | Paper, company, position, gene, disease |
| Collections | `collection` | Typed sets of things | Corpus, job search, case file, disease family |
| Content | `information-content-entity` | Content-bearing entities | Artifacts, fragments, notes |

The content branch has three concrete types:

| ICE Type | Description | Examples |
| --- | --- | --- |
| Artifact | Raw captured content with provenance | Job description HTML, paper PDF, API response |
| Fragment | Extracted portion of an artifact | Requirement, section, claim, figure |
| Note | Agent's analysis or annotation | Fit analysis, summary, synthesis, skill-gap note |

Collections are typed per domain namespace rather than being generic containers. A disease family (e.g., "lysosomal storage diseases") is a collection of diseases, not itself a disease — this keeps domain-thing subtypes clean.

Domain-specific extensions (in namespaces/) add specialized types:

scilit.tql - Papers, datasets, preprints; scilit-corpus collections
jobhunt.tql - Positions, companies, skills, learning resources; jobhunt-search collections
apm.tql - Genes, variants, diseases, phenotypes, pathways; apm-case-file, apm-disease-family, apm-patient-cohort collections

See the Schema Reference for full details.
