Skills: Domain Modeling (archived)

Domain modeling is the process of designing a new skill from scratch — defining the TypeDB schema namespace, writing the skill files, and building the Python scripts that give an agent a coherent new domain to work in.

This is a meta-skill: rather than operating on a domain (job hunting, rare diseases), it helps you create one.

When You Need Domain Modeling

You need domain modeling when you want the agent to:

Track a new category of things over time (papers, experiments, competitors, grants, proteins…)
Build structured understanding from unstructured sources in a new area
Answer recurring questions that require accumulated, queryable knowledge

If a conversation-scoped answer is enough, you don't need a new skill. Domain modeling is justified when the knowledge should persist across sessions and accumulate into something more than the sum of individual notes.

The Central Design Question

Before writing a single line of schema, answer this:

What questions do you want to ask six months from now, once the knowledge graph has been accumulating data?

The answer determines what entities you need, what relationships matter, and what level of granularity is worth capturing. Work backwards from the queries you want to run.

Example (jobhunt):

"What skill gaps appear most often across my high-priority positions?" → need position, requirement, your-skill
"Which companies have I interviewed at?" → need company, position, status tracking
"What learning resources address my top gap?" → need learning-resource linked to requirement

Example (rare-disease):

"What genes cause this disease?" → need gene, disease, causal relation
"Which diseases are phenotypically similar?" → need phenotype, similarity score
"What drugs target those genes?" → need drug, gene, targeting relation
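One way to make the working-backwards step concrete is to write each target question next to the types it needs, then tally the types as a first draft of the schema. The sketch below does this for the rare-disease questions above. The dictionary layout and the function name `required_types` are illustrative assumptions, not part of any existing skill.

```python
from collections import Counter

# Target questions mapped to the types each one needs (taken from the
# rare-disease example above; the data structure itself is only illustrative).
QUESTIONS = {
    "What genes cause this disease?": ["gene", "disease", "causal relation"],
    "Which diseases are phenotypically similar?": ["phenotype", "similarity score"],
    "What drugs target those genes?": ["drug", "gene", "targeting relation"],
}


def required_types(questions):
    """Count how many questions need each type, most-needed first."""
    counts = Counter(t for types in questions.values() for t in types)
    return counts.most_common()


if __name__ == "__main__":
    for type_name, n in required_types(QUESTIONS):
        print(f"{type_name}: needed by {n} question(s)")
```

Types needed by several questions (here `gene`) are good candidates to model first, since the most queries depend on them.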

Placing Concepts in the 3-Branch Hierarchy

Every type you add to the schema belongs in one of three branches. Getting this right is the most important design decision.

| Branch | Base Type | The Question to Ask |
| --- | --- | --- |
| Domain objects | `domain-thing` | Is this a real-world thing I reason about? |
| Collections | `collection` | Is this a typed set of things with shared context? |
| Content | `information-content-entity` | Is this content I capture, extract, or annotate? |

The content branch has three concrete subtypes:

| ICE Subtype | The Question to Ask |
| --- | --- |
| `artifact` | Is this raw captured content with a source URL or provenance? |
| `fragment` | Is this an extracted piece of an artifact? |
| `note` | Is this the agent's interpretation, analysis, or annotation? |

The key rule: domain objects are not information content. A gene, a job posting, a company, a disease — these are domain things. The HTML page describing the job posting is an artifact. The extracted skill requirement is a fragment. The fit analysis is a note.

Conflating these creates ontological confusion that makes queries brittle. A common mistake is making a "paper" both a domain thing (the intellectual object) and an artifact (the captured PDF). Keep them separate.

Worked Classification Exercise

New domain: tracking grant funding opportunities.

| Concept | Branch | Reasoning |
| --- | --- | --- |
| Funding agency (NIH, NSF, Wellcome) | `domain-thing` | Real-world organisation you reason about |
| Grant opportunity (RFA-CA-25-001) | `domain-thing` | Real-world object with a persistent identity |
| Your research group | `domain-thing` | Actor in the domain |
| Funding portfolio | `collection` | A typed set of grant opportunities being tracked |
| Grant opportunity announcement (HTML) | `artifact` | Raw captured content with a source URL |
| Extracted eligibility criterion | `fragment` | Extracted piece of the announcement artifact |
| Fit assessment | `note` | Agent's analysis of group fit against the opportunity |
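The classification can also be written down as a small lookup, which makes it easy to check that every concept lands in exactly one branch. This is a minimal sketch: the `CLASSIFICATION` table copies rows from the grant example above (the research group is left out because this page gives it no type name), while `BASE_TYPES` and `schema_stubs` are hypothetical helpers for illustration.

```python
# Base types from the 3-branch hierarchy; artifact, fragment and note are the
# concrete subtypes of information-content-entity.
BASE_TYPES = {"domain-thing", "collection", "artifact", "fragment", "note"}

# Concept -> (type name, base type), following the classification table.
CLASSIFICATION = {
    "Funding agency": ("grant-agency", "domain-thing"),
    "Grant opportunity": ("grant-opportunity", "domain-thing"),
    "Funding portfolio": ("grant-portfolio", "collection"),
    "Grant opportunity announcement (HTML)": ("grant-announcement", "artifact"),
    "Extracted eligibility criterion": ("grant-eligibility-criterion", "fragment"),
    "Fit assessment": ("grant-fit-note", "note"),
}


def schema_stubs(classification):
    """Return one '<type> sub <base>;' line per concept.

    Raises ValueError for an unknown base type or a type name that is
    assigned to more than one concept.
    """
    seen = set()
    lines = []
    for concept, (type_name, base) in classification.items():
        if base not in BASE_TYPES:
            raise ValueError(f"{concept!r}: unknown base type {base!r}")
        if type_name in seen:
            raise ValueError(f"{type_name!r} is assigned to more than one concept")
        seen.add(type_name)
        lines.append(f"{type_name} sub {base};")
    return lines


if __name__ == "__main__":
    print("\n".join(schema_stubs(CLASSIFICATION)))
```

Writing the table this way forces the "domain thing or artifact?" decision to be made once per concept, before any attributes are attached.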

Schema Design

Namespace Prefix

All type names must use a consistent <domain>- prefix to avoid clashes with other namespaces. Choose a short, unambiguous prefix:

jobhunt-*    rd-*    techrecon-*    grant-*
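A prefix rule like this is easy to check mechanically. The sketch below scans TypeQL text for `<name> sub <parent>` definitions and reports names that lack the chosen prefix. It assumes each definition starts on its own line, in the style used on this page, and the function name `unprefixed_types` is hypothetical. It is a lint sketch, not a TypeQL parser.

```python
import re

# Matches "<type-name> sub <parent>" at the start of a line.
# This is a heuristic for simple schemas, not a full TypeQL grammar.
DEFINITION = re.compile(r"^\s*([A-Za-z][\w-]*)\s+sub\s+[\w-]+", re.MULTILINE)


def unprefixed_types(schema_text, prefix):
    """Return defined type names that do not start with '<prefix>-'."""
    names = DEFINITION.findall(schema_text)
    return [n for n in names if not n.startswith(prefix + "-")]


if __name__ == "__main__":
    sample = """
    grant-url sub attribute, value string;
    grant-opportunity sub domain-thing,
        owns grant-url;
    url sub attribute, value string;
    """
    print(unprefixed_types(sample, "grant"))  # prints ['url']
```

Running a check like this before loading a schema catches stray unprefixed names early, although names deliberately reused from the core schema would need to be allow-listed.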

Schema Structure

A minimal schema has five sections:

define

# 1. Attributes
grant-url sub attribute, value string;
grant-deadline sub attribute, value datetime;
grant-amount sub attribute, value long;
grant-fit-score sub attribute, value double;
grant-criterion-text sub attribute, value string;
grant-criterion-type sub attribute, value string;
grant-fit-summary sub attribute, value string;

# 2. Domain Things
grant-opportunity sub domain-thing,
    owns grant-url,
    owns grant-deadline,
    owns grant-amount,
    plays grant-opportunity-at-agency:opportunity;

grant-agency sub domain-thing,
    plays grant-opportunity-at-agency:agency;

# 3. Collections
grant-portfolio sub collection;

# 4. ICE subtypes
grant-announcement sub artifact;

grant-eligibility-criterion sub fragment,
    owns grant-criterion-text,
    owns grant-criterion-type;     # required / preferred / excluded

grant-fit-note sub note,
    owns grant-fit-score,
    owns grant-fit-summary;

# 5. Relations
grant-opportunity-at-agency sub relation,
    relates opportunity,
    relates agency;

Design Principles

Prefix everything. url is fine for a single-domain system; for a multi-skill knowledge graph it will collide. Use grant-url.

Attributes are reusable. If name already exists in the core schema, extend it rather than defining grant-name. Check the core schema first.

Keep domain-things clean. Don't put content attributes (summaries, extracted text) on domain things. Those belong on ICEs.

Relations are explicit. A job posting "requires" a skill. A gene "causes" a disease. A drug "targets" a gene. Make the relationship a named type, not just a pointer. This enables pattern-matching queries across the graph.

Collections are typed per domain. Don't use a generic collection — define grant-portfolio sub collection so queries can target it specifically.

The Sensemaking Chain

Every skill follows the same provenance chain. Design it explicitly:

Source URL / API
    ↓
Artifact (raw content + provenance)
    ↓
Fragments (extracted structured pieces)
    ↓
Notes (agent analysis, scoring, synthesis)

For each domain object ask:

What artifact captures the raw evidence about it?
What fragments are worth extracting from that artifact?
What notes will the agent write after reading?

Example (grant skill):

grants.nih.gov/grants/guide/rfa-files/RFA-CA-25-001.html
    ↓
grant-announcement artifact (HTML + URL + fetch timestamp)
    ↓
grant-eligibility-criterion fragments (one per eligibility rule)
    ↓
grant-fit-note (fit score + narrative + identified gaps)

This chain means you can always trace a fit score back to the exact text it was derived from.
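The traceability claim can be illustrated with plain data structures. In this sketch each fragment keeps the ID of its artifact and each note keeps the IDs of the fragments it was derived from, so a score can be walked back to its source text. The class names, field names, and the `trace` helper are assumptions for illustration. In a real skill these links would be stored in TypeDB rather than in Python objects, and the criterion text here is made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    id: str
    source_url: str
    content: str  # raw captured content


@dataclass
class Fragment:
    id: str
    artifact_id: str  # provenance link back to the artifact
    text: str


@dataclass
class Note:
    id: str
    fit_score: float
    fragment_ids: list = field(default_factory=list)  # fragments the note is based on


def trace(note, fragments, artifacts):
    """Return (source_url, fragment_text) pairs supporting a note."""
    evidence = []
    for fragment_id in note.fragment_ids:
        fragment = fragments[fragment_id]
        artifact = artifacts[fragment.artifact_id]
        evidence.append((artifact.source_url, fragment.text))
    return evidence


if __name__ == "__main__":
    url = "https://grants.nih.gov/grants/guide/rfa-files/RFA-CA-25-001.html"
    artifacts = {"a1": Artifact("a1", url, "<html>...</html>")}
    fragments = {"f1": Fragment("f1", "a1", "Sample criterion text (made up)")}
    note = Note("n1", 0.75, ["f1"])
    for source, text in trace(note, fragments, artifacts):
        print(source, "->", text)
```

If a fragment has no artifact, or a note has no fragments, a lookup like this has nothing to return, which is the practical cost of skipping a link in the chain.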

Writing SKILL.md

SKILL.md is loaded at startup on every session. Keep it under one page. It contains only:

Frontmatter (name, description)
Trigger phrases — what the user might say that should invoke this skill
Prerequisites (TypeDB running, env vars, any auth)
Quick-start command examples
A pointer to USAGE.md for the full workflow

---
name: grant
description: Track grant opportunities, analyse eligibility, identify fit gaps
---

# Grant Tracking Skill

Use this skill to manage grant opportunities as a knowledge graph.

**When to use:** "add grant", "new funding opportunity", "analyze this RFA",
"show my grant pipeline", "funding gaps", "submission deadlines"

## Prerequisites
- TypeDB running: `make db-start`
- `uv sync --all-extras`

## Quick Start

```bash
uv run python .claude/skills/grant/grant.py ingest-opportunity \
    --url "https://grants.nih.gov/grants/guide/rfa-files/RFA-CA-25-001.html"

uv run python .claude/skills/grant/grant.py list-pipeline
```

Before executing any commands, read USAGE.md for the complete workflow, command reference, and sensemaking steps.

Writing USAGE.md

USAGE.md is loaded on demand when executing the skill. It can be as long as needed. Structure it around the five curation phases:

```markdown
# Grant Tracking Usage

## 5-Phase Workflow

### Phase 1: Foraging
[how to discover grant opportunities — search, feeds, agency pages]

### Phase 2: Ingestion
[ingest-opportunity command, what it fetches and stores]

### Phase 3: Sensemaking
[step-by-step: read artifact → extract eligibility criteria as fragments
 → write fit-note with score]

### Phase 4: Analysis
[query commands: show-gaps, show-pipeline, show-deadlines]

### Phase 5: Reporting
[report commands, dashboard views]

## Sensemaking Workflow (Agent reasoning steps)

1. Run `show-artifact --id <id>` to get announcement text
2. Read the full announcement, identify eligibility criteria
3. For each criterion: run `add-criterion` to store as fragment
4. Assess group fit against each criterion
5. Run `add-note --type fit-analysis --fit-score 0.75 ...`
6. Report: score breakdown, key gaps, recommended actions

## Command Reference
[full argparse command table]

## TypeQL Examples
[representative queries]
```

Python Script Conventions

The script is the skill's I/O layer. The agent handles reasoning; the script handles data.

Required Structure

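#!/usr/bin/env python3
"""Grant tracking CLI."""

import argparse
import json
import os
import sys

from typedb.driver import TypeDB, SessionType, TransactionType  # used by the elided TypeDB calls

TYPEDB_HOST = os.getenv("TYPEDB_HOST", "localhost")
TYPEDB_PORT = int(os.getenv("TYPEDB_PORT", "1729"))
TYPEDB_DATABASE = os.getenv("TYPEDB_DATABASE", "alhazen_notebook")


def cmd_ingest(args):
    print(f"Fetching {args.url}...", file=sys.stderr)  # progress goes to stderr
    # ... fetch + store as artifact ...
    artifact_id = None  # placeholder: set to the stored artifact's ID
    print(json.dumps({"success": True, "id": artifact_id}))  # stdout = structured result


def cmd_list(args):
    # ... query TypeDB ...
    results = []  # placeholder: fill with the queried items
    print(json.dumps({"success": True, "items": results}))


def main():
    parser = argparse.ArgumentParser(description="Grant tracking CLI")
    sub = parser.add_subparsers(dest="command", required=True)

    ingest = sub.add_parser("ingest-opportunity")
    ingest.add_argument("--url", required=True)
    sub.add_parser("list-pipeline")
    # ... register remaining subcommands ...

    # Map each subcommand name to its handler function.
    commands = {"ingest-opportunity": cmd_ingest, "list-pipeline": cmd_list}
    args = parser.parse_args()
    commands[args.command](args)


if __name__ == "__main__":
    main()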

Key Rules

JSON to stdout only. The agent parses stdout.
Progress and errors to stderr. print("Fetching...", file=sys.stderr) never pollutes the JSON.
One subcommand per operation. ingest-opportunity, list-pipeline, add-note — not a single run command with flags controlling everything.
No reasoning in the script. The script fetches, stores, queries. It does not summarise, score, or interpret. That is the agent's job.
Fail loudly. If TypeDB is unavailable or a required ID is missing, raise and let stderr carry the message. Don't silently return empty results.
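The stdout/stderr split and the fail-loudly rule might look like this in practice. This is a sketch under assumptions: `emit`, `progress`, and `run` are hypothetical helpers, not part of the reference skills. Here a failure becomes a stderr message plus a non-zero return value; letting the exception propagate with its traceback is an equally valid reading of the rule.

```python
import json
import sys


def emit(payload, stream=None):
    """Write exactly one JSON document to stdout (the agent parses this)."""
    out = sys.stdout if stream is None else stream
    print(json.dumps(payload), file=out)


def progress(message):
    """Send human-readable progress to stderr so it never mixes with the JSON."""
    print(message, file=sys.stderr)


def run(handler, args):
    """Run a subcommand handler and return a process exit code.

    The handler returns a dict of result fields. Any exception is reported
    on stderr and turned into exit code 1 instead of an empty result.
    """
    try:
        result = handler(args)
    except Exception as exc:
        progress(f"error: {exc}")
        return 1
    emit({"success": True, **result})
    return 0


if __name__ == "__main__":
    def list_pipeline(args):
        progress("Querying TypeDB...")
        return {"items": []}  # placeholder result

    sys.exit(run(list_pipeline, None))
```

Because nothing but the final JSON document reaches stdout, the agent can pass the output straight to a JSON parser and treat a non-zero exit code as the failure signal.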

Standard Command Categories

Most skills need commands in these four categories:

| Category | Examples |
| --- | --- |
| Ingestion | `ingest-*`, `init-*`: fetch from external source, store as artifact |
| Sensemaking support | `show-artifact`, `list-artifacts`: feed content to the agent for reading |
| Annotation | `add-note`, `add-fragment`, `tag`: store what the agent extracts |
| Query / Report | `list-*`, `show-*`, `report-*`: retrieve structured results |
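A naming convention like this lends itself to a quick coverage check: given a script's subcommand names, which of the four categories are represented? The sketch below is a hypothetical helper (`EXACT`, `PREFIXES`, `categorize`, and `missing_categories` are not part of any skill). It treats `add-` as a general annotation prefix, which is an assumption extrapolated from `add-note` and `add-fragment`, and it checks exact names first so that `show-artifact` is not swallowed by the `show-` prefix.

```python
# Exact names are checked before prefixes because show-artifact and
# list-artifacts would otherwise match the Query / Report prefixes.
EXACT = {
    "show-artifact": "Sensemaking support",
    "list-artifacts": "Sensemaking support",
    "tag": "Annotation",
}
PREFIXES = [
    ("ingest-", "Ingestion"),
    ("init-", "Ingestion"),
    ("add-", "Annotation"),  # assumption: generalised from add-note / add-fragment
    ("list-", "Query / Report"),
    ("show-", "Query / Report"),
    ("report-", "Query / Report"),
]
ALL_CATEGORIES = {"Ingestion", "Sensemaking support", "Annotation", "Query / Report"}


def categorize(command):
    """Return the category for a subcommand name, or None if no rule fits."""
    if command in EXACT:
        return EXACT[command]
    for prefix, category in PREFIXES:
        if command.startswith(prefix):
            return category
    return None


def missing_categories(commands):
    """Return the categories that none of the given commands covers, sorted."""
    covered = {categorize(c) for c in commands}
    return sorted(ALL_CATEGORIES - covered)


if __name__ == "__main__":
    print(missing_categories(["ingest-opportunity", "list-pipeline"]))
```

For the two quick-start commands used on this page, the check reports that annotation and sensemaking-support commands are still missing.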

Common Pitfalls

Putting everything in notes. Notes are analysis; fragments are extracted structure. If you're storing a list of 50 eligibility criteria as a single note blob, they should be fragments — one per criterion, queryable individually.

Skipping the artifact. It's tempting to skip storing the raw HTML and go straight to extracted entities. Don't. The artifact preserves provenance and lets you re-read the source if your extraction was incomplete.

Generic collection types. collection sub collection is useless. grant-portfolio sub collection lets you write $p isa grant-portfolio in queries. Always define a named subtype.

Forgetting namespace isolation. Two skills both defining url will conflict at schema load time. Always prefix.

Making the script do sensemaking. If your ingest command is computing scores, summarising text, or making classification decisions — stop. Return the raw data and let the agent do that work.

Reference Implementations

Both demonstration skills are complete working examples:

| Skill | What to study |
| --- | --- |
| Skills: Jobhunt | Full vertical: schema + SKILL.md + USAGE.md + Python script + Next.js dashboard. The forager pattern (automated discovery → candidates → promote). |
| Skills: Rare Disease | Multi-source ingestion (Monarch, ClinicalTrials, ChEMBL). Structured mechanism model as a schema first-class concept. Cross-namespace queries. |

Read their schema.tql files alongside Schema Reference to see how the 3-branch hierarchy is applied in practice.
