This issue bundles five concerns that surfaced during a real
project's use of `pk-team-create` and the related team-management tooling. They
are related — each makes it harder to reason about a project's team
as the resource model that the skill seems to be aiming at — but they
could be addressed individually. Filed as one issue because all five
showed up in a single ~3-hour session and together they shape one
recommendation; happy to split if you'd prefer.

Context: aibox project, processkit v0.25.6, Claude Code session.
The (human) operator's pre-existing team was Cora (PM/senior, AI agent) +
Bernhard (CEO/principal, human). During v0.25.6 implementation the
operator said "use the team for token and limit efficiency"; the
agent first misread this as "use the harness's generic Agent tool"
and only recovered after a clarifying round. Subsequent recovery
exposed five distinct rough edges in the team-creator model.

Gap 1 — `pk-team-create` does not consume the rich `context/roles/` catalog

`list_roles` returns ~50 catalog roles (account-executive,
ai-research-scientist, technical-writer, security-architect,
devops-engineer, software-engineer, …). `pk-team-create` derives a
team from 8 abstract archetypes (project-manager, senior-architect,
senior-researcher, junior-architect, developer, junior-researcher,
junior-developer, assistant). The 8 archetypes are written as new
Role entities into `context/roles/` by Step 6 of the skill —
parallel to, not selected from, the catalog.
Result: an operator looking at `context/roles/` sees a rich
vocabulary (`ROLE-cto`, `ROLE-technical-writer`,
`ROLE-security-architect`, `ROLE-qa-engineer`, …) and reasonably
expects `pk-team-create` to use those names. It doesn't. The
OpenWeave layer-4 `role-archetypes.yaml` lets you re-pin which
archetype maps to which tier, but it doesn't let you swap the
archetype role-IDs for catalog role-IDs (e.g. pin senior-architect to
`ROLE-cto` rather than invent a new `ROLE-senior-architect`).
Suggested fix: either consume the catalog directly in
`pk-team-create`, or add a layer-5 override that maps each
archetype to a catalog role-ID. The current behavior duplicates the
catalog and hides the duplication.
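
A minimal sketch of what such an archetype-to-catalog override could look like. The `resolve_role_ids` helper and the override dict keyed by archetype name are assumptions for illustration; neither exists in processkit today. The `ROLE-<archetype>` fallback mirrors the `ROLE-senior-architect` naming described above.

```python
# Hypothetical sketch: resolve_role_ids and the override shape are
# assumptions for illustration, not processkit APIs.
ARCHETYPES = [
    "project-manager", "senior-architect", "senior-researcher",
    "junior-architect", "developer", "junior-researcher",
    "junior-developer", "assistant",
]


def resolve_role_ids(catalog, overrides):
    """Return {archetype: role_id}, preferring pinned catalog roles."""
    unknown = set(overrides) - set(ARCHETYPES)
    if unknown:
        raise ValueError(f"unknown archetypes in override: {sorted(unknown)}")
    resolved = {}
    for archetype in ARCHETYPES:
        pinned = overrides.get(archetype)
        if pinned is None:
            # No pin: fall back to an archetype-named role (current behavior).
            resolved[archetype] = f"ROLE-{archetype}"
        elif pinned in catalog:
            # Pin points at an existing catalog role: reuse it, mint nothing.
            resolved[archetype] = pinned
        else:
            raise ValueError(f"{archetype}: {pinned} is not a catalog role")
    return resolved


catalog = {"ROLE-cto", "ROLE-technical-writer", "ROLE-security-architect"}
roles = resolve_role_ids(catalog, {"senior-architect": "ROLE-cto"})
```

With this shape, a pin that points outside the catalog fails loudly instead of silently creating a parallel role, which is the duplication this gap is about.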

Gap 2 — Skill description uses internal codenames (`OpenWeave`)

The team-creator skill body and CHANGELOG refer to "OpenWeave layer 1
/ 2 / 3 / 4" extensively. There is no inline definition; the agent
had to grep the changelog to learn that `OpenWeave` is the codename
for the 4-layer override surface (`FEAT-OpenWeave`).
Codenames are fine for implementation projects but actively unhelpful
in a skill description that an LLM agent reads to decide whether and
how to call the skill. The codename adds zero discoverability and
forces the agent (or human) to look up the mapping every time.
Suggested fix: rename in user-facing surfaces to "the 4-layer
override stack" or similar self-explanatory phrasing. Keep the
`FEAT-OpenWeave` codename internal to your project tracking. Apply
the same rule to other codenames in skill bodies (`TeamWeaver Phase 3
dogfood` etc.).

Gap 3 — No first-class concept of consultants / ephemeral team members

The team-creator's model assumes a stable employee roster: 8 fixed
archetypes, mapped onto cost tiers, with a chartering
DecisionRecord. There is no first-class way to express "for this
specific task I'm temporarily engaging an ML expert / a security
auditor / a release engineer" — i.e. a consultant who is part of
the team for a window and then leaves.
In practice, real teams routinely engage consultants. The current
workarounds are awkward: either (a) permanently expand the team
(over-staffs the project, conflates one-off cost with recurring
cost), or (b) ad-hoc dispatch via Claude Code's generic Agent tool
(loses processkit attribution and budget tracking). Neither is what
the operator means by "consultant".
Suggested fix: add a `type=consultant` or `employment=ephemeral`
TeamMember kind with required `engaged_for` (workitem ID or
free-text scope) and `engagement_window` (start/end dates or
"until-done"). The resolver picks the binding for the engagement
window only; a future `pk-team-review` flags consultants whose
window has expired but who haven't been deactivated. This gives the
operator a vocabulary for "fixed team I have, consultants I can hire
for specific tasks", which is how humans actually plan resource use.
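
To make the proposal concrete, here is a rough sketch of the consultant record and the expiry check a future `pk-team-review` could run. The `Consultant` class, the `is_engaged` and `overdue_consultants` helpers, the split of `engagement_window` into two date fields, and the member names are all hypothetical; only `engaged_for` and `engagement_window` come from the suggestion above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Consultant:
    """Hypothetical ephemeral TeamMember; not an existing processkit entity."""
    name: str
    engaged_for: str                   # workitem ID or free-text scope
    window_start: date
    window_end: Optional[date] = None  # None stands in for "until-done"
    active: bool = True


def is_engaged(member: Consultant, today: date) -> bool:
    """True if the resolver may bind this consultant on the given day."""
    if not member.active or today < member.window_start:
        return False
    return member.window_end is None or today <= member.window_end


def overdue_consultants(members: List[Consultant], today: date) -> List[str]:
    """Names whose window has expired but who are still marked active."""
    return [
        m.name
        for m in members
        if m.active and m.window_end is not None and m.window_end < today
    ]


# Illustrative data only: names, scopes, and dates are made up.
members = [
    Consultant("Quinn", "security audit", date(2026, 5, 1), date(2026, 5, 8)),
    Consultant("Morgan", "ML evaluation", date(2026, 5, 1)),  # until-done
]
today = date(2026, 5, 10)
overdue = overdue_consultants(members, today)
```

An "until-done" engagement never shows up as overdue in this sketch; closing it would have to be tied to the workitem's completion instead of a date.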

Gap 4 — Team-member-as-person vs. team-member-as-cloneable-role

The TeamMember entity has personality, relationships, memory tiers,
joined_at, etc. — all attributes that imply a person. Yet
`pk-team-create`'s tier mapping has `clone_cap: 5` for most
archetypes — implying TeamMembers are spawnable role
instances, not people. The skill's "PM clone cap is hard-coded 1"
note even codifies that the PM is "the only person" while everyone
else is N parallel clones.
This conflict shows up in practice. In our session the agent created
a single TEAMMEMBER-avery, then dispatched two parallel Claude Code
subagents under Avery's identity — implicitly cloning Avery for
parallelism. The user reasonably asked "wait, you have one Avery,
but you cloned it like it was just a richer role definition?" The
answer is yes, that's what the model invites, but it leaves humans
unable to reason about the team as a set of resources.
Suggested fix: pick one model and lean into it.
- Person model: `clone_cap` always 1; parallelism requires
spawning additional named TeamMembers (`Avery-2`, `Avery-3`,
or better: `Robin`, `Jordan`, `Sage` etc.). Each member has
its own identity, memory, relationships. Token budget = sum of
named members.
- Role-instance model: TeamMember is a class; bindings produce N
instances with deterministic identifiers
(`TEAMMEMBER-software-engineer-1`, `-2`). Personality,
memory, relationships live on the role rather than the instance.
- Hybrid: keep TeamMember as a person but introduce a separate
`RoleSlot` entity that `pk-team-create` writes (8 slots, each
with `clone_cap`, tier, archetype). TeamMembers are then assigned
to slots: a single person (Avery) can occupy the developer slot;
a parallel pair (Avery + Robin) can fill a developer slot of
capacity 2. This is the cleanest version because it lets humans
reason about people (5 in our team) and capacity (3 developer-tier
slots, 2 of which are filled) separately.
The hybrid model also naturally accommodates Gap 3 (consultants are
RoleSlot-only, no permanent TeamMember; they get instantiated for
the workitem and disposed at completion).
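
The hybrid option can be sketched in a few lines. This is an illustration under assumptions, not a proposed schema: the `RoleSlot` fields follow the bullet above, while the `assign` method, the `free` property, and the `"medium"` tier for the developer slot are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleSlot:
    """Hypothetical capacity record written by the team-creation step."""
    archetype: str
    tier: str
    clone_cap: int
    occupants: List[str] = field(default_factory=list)

    def assign(self, member: str) -> None:
        # A person occupies a slot at most once; parallelism needs a
        # second named person, not a second copy of the same one.
        if member in self.occupants:
            raise ValueError(f"{member} already occupies {self.archetype}")
        if len(self.occupants) >= self.clone_cap:
            raise ValueError(f"{self.archetype} slot is at capacity")
        self.occupants.append(member)

    @property
    def free(self) -> int:
        """Unfilled capacity in this slot."""
        return self.clone_cap - len(self.occupants)


# Capacity 3, two of which are filled, as in the example above.
developer = RoleSlot(archetype="developer", tier="medium", clone_cap=3)
developer.assign("Avery")
developer.assign("Robin")
people = set(developer.occupants)
```

Rejecting a second assignment of the same name is what would have surfaced the implicit Avery cloning from our session as an explicit error.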

Gap 5 — Resource / budget model is implicit, not first-class

Once you fix gaps 3 and 4, the next missing piece is making budget
visible. Today an operator cannot easily answer:
- "How much of my Anthropic Pro/Max subscription is this team
burning per week?"
- "If I add a junior engineer (Haiku-tier), how much does that
reduce next week's burn rate vs. what it costs me?"
- "If I engage a security auditor consultant for one task, what is
the expected token cost vs. doing it in-house with Avery?"
- "Who on my team is currently over capacity (resolver pointed
away)?"
The TierScore formula tracks K (cost efficiency) per model, but the
chartering DecisionRecord doesn't compose those into a team-level
budget projection. `pk-team-review`'s tier-shift table is closest,
but it's read-only and doesn't surface dollar/token cost.
Suggested fix: add a `budget_projection` block to the chartering
DecisionRecord's `inputs_snapshot`: estimated weekly token
consumption per TeamMember, mapped through cost models for each
binding. `pk-team-review` then surfaces drift (e.g. "Avery is
running 2× projected; consider rebalancing or adding capacity"). This gives
operators the language they actually use to think about team
composition: not "do I have an engineer?" but "can I afford to add
this engineer given my subscription tier?"
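
As a rough illustration of the projection and drift check, assuming per-member weekly token estimates and a flat per-million-token rate for each binding. The function names, the 1.5× drift threshold, and every number below are invented for the example; a subscription plan would need a different cost model than a flat rate.

```python
def project_budget(weekly_tokens, usd_per_million_tokens):
    """Return {member: projected weekly cost in USD} (hypothetical helper)."""
    projection = {}
    for member, tokens in weekly_tokens.items():
        rate = usd_per_million_tokens[member]
        projection[member] = tokens / 1_000_000 * rate
    return projection


def find_drift(projected_tokens, actual_tokens, threshold=1.5):
    """Return {member: actual/projected ratio} for members at or over threshold."""
    flagged = {}
    for member, projected in projected_tokens.items():
        if projected <= 0:
            continue  # nothing meaningful to compare against
        ratio = actual_tokens.get(member, 0) / projected
        if ratio >= threshold:
            flagged[member] = ratio
    return flagged


# Illustrative numbers only; not measurements from the session.
projected_tokens = {"Avery": 4_000_000, "Cora": 1_000_000}
rates = {"Avery": 3.0, "Cora": 15.0}  # assumed USD per million tokens
actual_tokens = {"Avery": 8_000_000, "Cora": 1_100_000}

costs = project_budget(projected_tokens, rates)
drift = find_drift(projected_tokens, actual_tokens)
```

Here `drift` flags only Avery, at twice the projection, which is the kind of line a review pass could turn into a rebalance suggestion.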

Why these matter together

Each gap is independently fixable, but the underlying issue is the
same: `pk-team-create` was built as a model-tier-allocator (given a
subscription, classify each archetype onto heavy/medium/light) and
not as a team-resource-planner (given the work I want to do, who do I
have, who do I hire, what does it cost). A human operator naturally
reaches for the second mental model. The current skill nudges them
toward the first, which is why "use the team" was so easy to misread,
and why the catalog/archetype duplication and the
person/clone-instance ambiguity feel like rough edges rather than
design choices.

Companion to #18 / #19

#18 (signal sentiment) and #19 (Claude Code-specific knobs) are about
how the contract lands in an agent's attention. This issue is
about what the underlying model represents. The three together
roughly describe what we'd want from team-creator on a v2 pass.

Reproduction notes

Same session as #18 and #19. Concrete artifacts on the aibox side
(I can share if helpful):
- `DEC-20260508_2043-TidyFern` — added Avery as a one-off TeamMember,
documenting why `pk-team-create` was the wrong shape.
- Session transcript showing the "use the team" misread + recovery.
- A second DEC (this session) noting the team expansion to 5 AI
agents (Robin, Jordan, Sage join Avery + Cora) — all added via
catalog-role direct-creation, not `pk-team-create`.