Design Concepts
This document explains the architecture and design principles behind Skillful-Alhazen.
The system exists to help you make sense of material, not just store it. This distinction is crucial:
- Collection = passively accumulating information
- Curation = actively interrogating, extracting meaning, building understanding
Every component serves the curation mission. We embody Alhazen's philosophy: be an enemy of all you read.
All skills follow a five-stage workflow:
┌─────────────────────────────────────────────────────────────────────────────┐
│ CURATION WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. FORAGING 2. INGESTION 3. SENSEMAKING │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Discover │───────▶│ Capture │────────▶│ Agent reads │ │
│ │ sources │ │ raw │ │ & extracts │ │
│ └──────────┘ └──────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ - URLs - Artifacts - Fragments │
│ - APIs - Provenance - Notes │
│ - Feeds - Timestamps - Relations │
│ │
│ │ │
│ ▼ │
│ 4. ANALYZE/SUMMARIZE 5. REPORT │
│ ┌──────────────────┐ ┌──────────────┐ │
│ │ Reason over many │──────▶│ Dashboard │ │
│ │ notes over time │ │ & answers │ │
│ └──────────────────┘ └──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ - Synthesis notes - Pipeline views │
│ - Trend analysis - Skills matrix │
│ - Recommendations - Strategic reports │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Stage | Purpose | Outputs |
|---|---|---|
| Foraging | Discover sources of information | URLs, API endpoints, feeds |
| Ingestion | Capture raw content with provenance | Artifacts with timestamps and sources |
| Sensemaking | Agent reads and extracts meaning | Fragments, notes, relationships |
| Analysis | Reason over accumulated knowledge | Synthesis notes, gap analysis |
| Reporting | Present actionable insights | Dashboards, answers, recommendations |
A key architectural principle: scripts handle I/O, the agent handles thinking.
Scripts handle:
- Fetching from APIs (pagination, rate limits, bulk operations)
- Storing raw artifacts with provenance
- TypeDB transactions
- Returning structured data to the agent
The agent handles:
- Reading and comprehending content
- Extracting entities and relationships
- Creating structured notes
- Synthesizing across sources
- Recommending actions
This separation:
- Minimizes token usage (scripts do the heavy lifting)
- Maximizes comprehension (agent focuses on understanding)
- Enables scaling (scripts handle pagination, bulk operations)
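The division of labor described above can be sketched as a script that pages through a source, stores each raw item with provenance, and hands the agent only a compact structured summary. This is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical stand-in for a real API client, the in-memory `store` list stands in for TypeDB transactions, and the field names are illustrative rather than taken from the project.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical paginated source: page number -> (items, next page or None).
FAKE_PAGES = {
    1: (["posting A", "posting B"], 2),
    2: (["posting C"], None),
}


def fetch_page(page: int) -> tuple[list[str], int | None]:
    """Stand-in for a real API call (assumption, not a project function)."""
    return FAKE_PAGES[page]


def ingest(source: str,
           fetch: Callable[[int], tuple[list[str], int | None]],
           store: list[dict]) -> str:
    """Script side: pagination, provenance, storage. No interpretation."""
    page: int | None = 1
    while page is not None:
        items, page = fetch(page)
        for raw in items:
            store.append({
                "id": f"artifact-{len(store) + 1}",
                "content": raw,
                "source": source,
                "captured_at": datetime.now(timezone.utc).isoformat(),
            })
    # Return a compact summary so the agent spends tokens on reading, not I/O.
    return json.dumps({"source": source,
                       "artifact_ids": [a["id"] for a in store]})


store: list[dict] = []
summary = json.loads(ingest("https://example.org/jobs", fetch_page, store))
print(summary["artifact_ids"])
```

The agent would receive only the JSON summary and then request individual artifacts to read, which is where comprehension begins.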
You interact through natural conversation. You never:
- Write TypeQL queries
- Call APIs directly
- Manage database transactions
- Navigate file structures
The agent translates your intent into actions, choosing the right skill and executing the appropriate operations.
You: "What are my skill gaps across high-priority positions?"
Agent: [Internally]
1. Query positions with priority:high tag
2. For each, fetch requirements
3. Compare against your skill profile
4. Aggregate gaps by frequency
5. Link to relevant learning resources
Agent: [Response]
"Across 3 high-priority positions, the most common gaps are:
1. Distributed Systems (required in 3/3)
2. Kubernetes (required in 2/3)
..."
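Step 4 of the internal plan above (aggregating gaps by frequency) can be sketched in a few lines. The position names, skill lists, and profile are hypothetical sample data, and `aggregate_gaps` is an illustrative helper rather than a function from the project.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical query results: each high-priority position with its
# required skills, as a script might return them to the agent.
positions = {
    "Position A": ["Distributed Systems", "Kubernetes", "Python"],
    "Position B": ["Distributed Systems", "Kubernetes"],
    "Position C": ["Distributed Systems", "Python"],
}
# Hypothetical skill profile: skills the user already has.
profile = {"Python"}


def aggregate_gaps(positions: dict[str, list[str]],
                   profile: set[str]) -> list[tuple[str, int]]:
    """Count how many positions require each skill missing from the profile."""
    counts = Counter(
        skill
        for required in positions.values()
        for skill in set(required)      # count each skill once per position
        if skill not in profile
    )
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for skill, n in aggregate_gaps(positions, profile):
    print(f"{skill} (required in {n}/{len(positions)})")
```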
TypeDB provides a logic-driven knowledge graph. Key properties:
The schema defines the concepts the agent thinks with. The type hierarchy separates domain objects from information content:
identifiable-entity (abstract root)
├── domain-thing # Real-world objects: papers, genes, jobs
├── collection # Typed sets: corpora, search campaigns, case files
└── information-content-entity (abstract)
├── artifact # Raw captured content (PDFs, HTML, API responses)
├── fragment # Extracted pieces (sections, requirements, claims)
└── note # Agent's analysis (fit scores, summaries, syntheses)
Key design principle: a gene or a job posting is not an information content entity. Only artifacts, fragments, and notes carry content. This separation keeps the ontology clean — domain objects are what you reason about, ICEs are what you reason with.
These aren't just storage tables—they're the vocabulary for reasoning about knowledge.
Every piece of knowledge traces back to its source:
Artifact (raw job description)
↓
Fragment (extracted requirement)
↓
Note (your fit analysis)
You can always ask: "Where did this come from?"
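The provenance chain above can be illustrated with three small record types, each holding a reference to the thing it was derived from. This is a sketch only: the class and field names are assumptions for illustration, and in the real system these links would live in TypeDB rather than in Python objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    id: str
    source_url: str      # where the raw content was captured from


@dataclass(frozen=True)
class Fragment:
    id: str
    text: str
    artifact: Artifact   # the artifact this piece was extracted from


@dataclass(frozen=True)
class Note:
    id: str
    text: str
    about: Fragment      # the fragment this analysis refers to


def trace(note: Note) -> list[str]:
    """Walk from a note back to its raw source."""
    fragment = note.about
    artifact = fragment.artifact
    return [note.id, fragment.id, artifact.id, artifact.source_url]


jd = Artifact("artifact-1", "https://example.org/jobs/123")
req = Fragment("fragment-1", "Requires Kubernetes experience", jd)
fit = Note("note-1", "Gap: no production Kubernetes experience", req)
print(" <- ".join(trace(fit)))
```

Because every note must be constructed from a fragment, and every fragment from an artifact, a note without a traceable source cannot be created in this model.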
TypeDB uses pattern matching:
match
$p isa jobhunt-position, has priority "high";
$r isa jobhunt-requirement;
(position: $p, requirement: $r) isa requires;
$r has your-level "none";
fetch
$p: title;
$r: skill-name;
This finds all skill gaps in high-priority positions.
This follows Richard Sutton's insight in "The Bitter Lesson": general methods that leverage computation win in the long run.
We don't:
- Over-engineer extraction pipelines
- Hand-code entity recognizers
- Build brittle rule systems
- Require perfect structured input
Instead:
- Let the agent read and comprehend
- Store what the agent extracts
- Query and synthesize
The system improves as the underlying model improves, without requiring code changes.
Skills are modular domain capabilities. Each skill has:
.claude/skills/<skill-name>/
├── SKILL.md # Trigger conditions and brief command overview
├── USAGE.md # Full step-by-step instructions for the agent
└── *.py # TypeDB transaction scripts
Skills split their instructions across two files for a specific reason: OpenClaw loads all SKILL.md files into the agent's system context at startup.
- SKILL.md: Loaded at startup. Kept short. Tells the agent when to use this skill (triggers) and gives a brief command overview. If SKILL.md were long, every skill's full instructions would bloat the context on every message.
- USAGE.md: Loaded on demand, when the agent is actually executing the skill. Contains the full step-by-step workflow, detailed command reference, sensemaking instructions, and TypeQL examples. This is what the agent reads to know how to do the work.
This split is what makes OpenClaw practical: dozens of skills can be registered without filling the context window with instructions that aren't relevant to the current task.
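The effect of the two-file split can be sketched as eager loading of the short files and lazy loading of the long ones. This is not OpenClaw's actual loader; `load_skill_index` and `load_usage` are hypothetical helpers, and the temporary directory with placeholder file contents exists only so the sketch runs anywhere.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def load_skill_index(skills_dir: Path) -> dict[str, str]:
    """Startup: read every skill's short SKILL.md into the context index."""
    return {
        p.parent.name: p.read_text(encoding="utf-8")
        for p in sorted(skills_dir.glob("*/SKILL.md"))
    }


def load_usage(skills_dir: Path, skill: str) -> str:
    """On demand: read the full USAGE.md only for the skill being executed."""
    return (skills_dir / skill / "USAGE.md").read_text(encoding="utf-8")


# Build a throwaway skills directory with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("jobhunt", "scilit"):
        (root / name).mkdir()
        (root / name / "SKILL.md").write_text(
            f"Use {name} when ...", encoding="utf-8")
        (root / name / "USAGE.md").write_text(
            f"Full {name} workflow " * 50, encoding="utf-8")

    index = load_skill_index(root)         # always in context
    usage = load_usage(root, "jobhunt")    # only when jobhunt is triggered
    startup_chars = sum(len(text) for text in index.values())
    print(sorted(index), startup_chars, len(usage))
```

The startup cost grows with the number of skills times the size of a short SKILL.md, while the long USAGE.md is paid for only by the one skill in use.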
Scripts handle the mechanics:
- API calls
- TypeDB transactions
- Data transformations
- Bulk operations
Use the /domain-modeling meta-skill to design new domains following the curation pattern. It will help you:
- Define the entity types
- Design the schema
- Write the SKILL.md
- Create the transaction scripts
TypeDB is not a document store or a traditional graph database — it's a conceptual modeling tool that enforces ontological rigor. The schema defines what concepts exist and how they relate before any data is stored.
Every skill starts with schema design, not data. Define your concepts in TypeDB's schema language (TypeQL), then write Python scripts that operate on those concepts. This forces clarity about the domain before implementation begins:
# A gene is a domain-thing, not information content
rd-gene sub domain-thing,
owns rd-gene-symbol,
owns rd-hgnc-id;
# A causal relationship between a gene and a disease
rd-gene-causes-disease sub relation,
relates gene,
relates disease,
owns confidence;
This is different from JSON storage or a property graph: the schema is a logical contract about what is true in the domain.
The core hierarchy tells you where to place new concepts:
| Branch | Ask Yourself | Examples |
|---|---|---|
| domain-thing | Is this a real-world entity you reason about? | disease, gene, company, job posting |
| collection | Is this a typed set of things with shared context? | investigation, search campaign, corpus |
| artifact | Is this raw captured content with a source? | API response, job description HTML, PDF |
| fragment | Is this an extracted piece of an artifact? | phenotype association, requirement, claim |
| note | Is this the agent's interpretation or analysis? | fit score, mechanism note, synthesis |
The key rule: domain objects are not information content. A gene, a disease, a job posting — these are domain things. Only artifacts, fragments, and notes carry content. Conflating them creates ontological confusion.
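The same placement rule can be mirrored in a small Python class hierarchy, which makes the "domain objects are not information content" check mechanical. This is an analogy for illustration, not how the TypeDB schema is implemented; the concrete types at the bottom (`Gene`, `JobPosting`, `JobDescriptionHtml`, `FitScore`) are hypothetical examples.

```python
from abc import ABC


class IdentifiableEntity(ABC):
    """Abstract root of the hierarchy."""


class DomainThing(IdentifiableEntity):
    """Real-world entity you reason about."""


class Collection(IdentifiableEntity):
    """Typed set of things with shared context."""


class InformationContentEntity(IdentifiableEntity, ABC):
    """Abstract parent of everything that carries content."""


class Artifact(InformationContentEntity): ...
class Fragment(InformationContentEntity): ...
class Note(InformationContentEntity): ...


# Hypothetical domain types, placed by asking the questions in the table.
class Gene(DomainThing): ...
class JobPosting(DomainThing): ...
class JobDescriptionHtml(Artifact): ...
class FitScore(Note): ...


def carries_content(entity_type: type) -> bool:
    """Only subtypes of information-content-entity carry content."""
    return issubclass(entity_type, InformationContentEntity)


print(carries_content(Gene), carries_content(JobDescriptionHtml))
```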
TypeDB's pattern matching enables reasoning across skill boundaries using shared concepts. One concrete example: the addresses-requirement relation bridges the jobhunt and scilit namespaces:
# A scilit paper collection addressing a jobhunt skill gap
match
$c isa collection; # from scilit namespace
$r isa jobhunt-requirement; # from jobhunt namespace
(resource: $c, requirement: $r) isa addresses-requirement;
fetch { "collection": $c.name, "skill": $r.skill-name };
This query works because the schema explicitly declares:
collection plays addresses-requirement:resource;
Another example: the rare-disease skill defines a shared mechanism model (total-loss, partial-loss, gain-of-function, etc.) in the schema. Any future skill dealing with genetic disease can reuse this vocabulary directly — the schema encodes it as a first-class concept rather than a free-text convention.
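Why the explicit `plays` declaration matters can be sketched with a toy role check: a type may fill a role only if it, or one of its supertypes, declares that role. The tables below are assumptions for illustration. Only the `collection` entry in `PLAYS` comes from the text; the `jobhunt-requirement` entry and the `scilit-corpus sub collection` link are assumed.

```python
from __future__ import annotations

# Subtype -> supertype. The scilit-corpus link is an assumption based on
# the namespace description; the others follow the core hierarchy.
SUPERTYPE = {
    "scilit-corpus": "collection",
    "collection": "identifiable-entity",
    "rd-gene": "domain-thing",
    "domain-thing": "identifiable-entity",
}

# (type, relation, role) declarations.
PLAYS = {
    ("collection", "addresses-requirement", "resource"),              # from the text
    ("jobhunt-requirement", "addresses-requirement", "requirement"),  # assumed
}


def can_play(type_name: str, relation: str, role: str) -> bool:
    """True if the type, or any of its supertypes, declares the role."""
    current: str | None = type_name
    while current is not None:
        if (current, relation, role) in PLAYS:
            return True
        current = SUPERTYPE.get(current)
    return False


print(can_play("scilit-corpus", "addresses-requirement", "resource"))
print(can_play("rd-gene", "addresses-requirement", "resource"))
```

In this toy model a scilit corpus can address a jobhunt requirement because it inherits the role from `collection`, while a gene cannot, since nothing in its ancestry declares it.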
The core schema is built on a three-branch hierarchy rooted at identifiable-entity:
| Branch | Base Type | Purpose | Examples |
|---|---|---|---|
| Domain Objects | domain-thing | Real-world entities you track | Paper, company, position, gene, disease |
| Collections | collection | Typed sets of things | Corpus, job search, case file, disease family |
| Content | information-content-entity | Content-bearing entities | Artifacts, fragments, notes |
The content branch has three concrete types:
| ICE Type | Description | Examples |
|---|---|---|
| Artifact | Raw captured content with provenance | Job description HTML, paper PDF, API response |
| Fragment | Extracted portion of an artifact | Requirement, section, claim, figure |
| Note | Agent's analysis or annotation | Fit analysis, summary, synthesis, skill-gap note |
Collections are typed per domain namespace rather than being generic containers. A disease family (e.g., "lysosomal storage diseases") is a collection of diseases, not itself a disease — this keeps domain-thing subtypes clean.
Domain-specific extensions (in namespaces/) add specialized types:
- scilit.tql - Papers, datasets, preprints; scilit-corpus collections
- jobhunt.tql - Positions, companies, skills, learning resources; jobhunt-search collections
- apm.tql - Genes, variants, diseases, phenotypes, pathways; apm-case-file, apm-disease-family, apm-patient-cohort collections
See the Schema Reference for full details.