Wiki: forbid name fabrication; librarian fixes wrong user names on existing articles #9

Merged
sysread merged 1 commit into main from claude/wiki-name-hallucination-fix
May 10, 2026

@sysread commented May 10, 2026

=head1 SYNOPSIS

The wiki agent invented a name for the user: it called the user
"Elliot" in articles when the configured name was "Jeff", because
a friend named Elliot was mentioned in the source conversation.

Three fixes: a stronger profile block, an explicit
no-name-fabrication rule in the autonomous body, and a librarian
recovery pass for articles already on disk with the wrong name.

=head1 PURPOSE

A user reported that Settings says "Jeff" while multiple wiki
articles call the user "Elliot" (a friend the user discussed in
the conversation that triggered the article). The model conflated
the user with someone else from the conversation context and
applied that other name to the user as the article's subject.

=head1 DESCRIPTION

=head2 Layer 1: how renderUserProfileBlock looked before

The helper folded the configured name + location into a soft
preference: "When an article refers to the user themselves,
prefer their name (or a natural pronoun if their name is a
single first name) over the generic phrase 'the user'."

The autonomous prompt's "Do not fabricate" section covered facts
("Only assert facts that appear in the conversation...") but said
nothing about names specifically. The librarian had no path to
clean up articles that already had the wrong name baked in.

=head2 Layer 2: what this PR changes

Both renderUserProfileBlock helpers (wiki and librarian) are
rewritten with HARD wording:

"The user's name is Jeff.
When an article refers to the user themselves, the user's
name is Jeff and ONLY Jeff. NEVER invent another name
for the user, even if other names appear in the
conversation - those other names belong to other people the
user knows. If the conversation mentions a friend named
Maya, an article about the user does not call the user
Maya; it calls the user Jeff. If you are uncertain whether
the article subject IS the user, default to using the
literal name from context (Maya, Elliot, etc.) for that
subject and reserve 'Jeff' for explicit references to the
user. ..."

The unknown-name path (no name in Settings) is split out so
the model isn't told to "use their name" when no name exists;
in that case it falls back to natural pronouns + "the user".
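The following is a minimal TypeScript sketch of the split described above. It assumes a profile object with optional `name` and `location` fields. The `UserProfile` shape and the exact sentences are illustrative and are not the repository's code. The sketch covers three cases:

- A configured name is stated as a hard constraint.
- A missing name falls back to pronouns and "the user".
- Nothing configured produces an empty block. This last case is an assumption.

```typescript
// Hypothetical profile shape; the real Settings fields may differ.
interface UserProfile {
  name?: string;
  location?: string;
}

// Sketch of a profile block with a hard name rule and a separate
// unknown-name path. Sentences are abbreviated from the PR description.
function renderUserProfileBlock(profile: UserProfile): string {
  const name = profile.name?.trim() ?? "";
  const location = profile.location?.trim() ?? "";

  // Assumption: with nothing configured, no block is rendered at all.
  if (!name && !location) return "";

  const lines: string[] = [];
  if (name) {
    // Known name: a binding constraint, not a preference.
    lines.push(
      `The user's name is ${name}.`,
      `When an article refers to the user themselves, the user's name is ${name} and ONLY ${name}.`,
      "NEVER invent another name for the user, even if other names appear in the conversation;",
      "those other names belong to other people the user knows.",
    );
  } else {
    // Unknown name: never tell the model to "use their name";
    // fall back to natural pronouns and the literal phrase "the user".
    lines.push(
      "The user's name is not known.",
      'Refer to the user with natural pronouns or the phrase "the user". Do not take a name from the conversation.',
    );
  }

  if (location) {
    lines.push(`The user's location is ${location}.`);
  }
  return lines.join("\n");
}

console.log(renderUserProfileBlock({ name: "Jeff" }));
```

Keeping the unknown-name case in its own branch means the model never sees an instruction like "use their name" without a name attached.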

The autonomous body gets an explicit "Do not fabricate names"
section that points back to the profile block as the single
source of truth, and tells the model what to do when uncertain
("use the literal name as it appears in the conversation rather
than inventing one").
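Here is one way to picture the companion section, as a hypothetical sketch. The prompt sections are held as an array of strings, and the names rule is inserted directly after the existing "Do not fabricate" section. The `##` heading, the section text, and the `addNameRule` helper are all assumptions. The PR describes the intent of the section, not its exact wording or how the prompt is assembled.

```typescript
// Hypothetical wording; the PR states the section's intent, not its text.
const DO_NOT_FABRICATE_NAMES = [
  "## Do not fabricate names",
  "The user profile block is the single source of truth for the user's name.",
  "Do not invent names for the user or anyone else.",
  "If you are uncertain who a passage refers to, use the literal name as it appears in the conversation rather than inventing one.",
].join("\n");

// Insert the names rule directly after the existing facts section so the
// two anti-fabrication rules sit together. Safe to call more than once.
function addNameRule(sections: string[], factsHeading: string): string[] {
  if (sections.indexOf(DO_NOT_FABRICATE_NAMES) !== -1) return sections;
  const i = sections.findIndex((s) => s.startsWith(factsHeading));
  // No facts section found: append the names rule at the end instead.
  if (i === -1) return [...sections, DO_NOT_FABRICATE_NAMES];
  return [...sections.slice(0, i + 1), DO_NOT_FABRICATE_NAMES, ...sections.slice(i + 1)];
}

const body = addNameRule(
  ["# Task", "## Do not fabricate\nOnly assert facts that appear in the conversation.", "## Output"],
  "## Do not fabricate",
);
console.log(body.join("\n\n"));
```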

The librarian gains a new workflow step (positioned between
scope-cleanup and duplicate-consolidation) that scans for
articles using a wrong name for the user and wiki_updates
them. It uses memory_search + conversation_search to
disambiguate: an article mis-naming the user "Elliot" gets
fixed to use Jeff; a separate "Elliot" article about the
actual friend is left for the per-conversation agent to land
on its next cycle (the librarian has no wiki_create).
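The decision this new librarian step is asked to make can be sketched as a pure function. The sketch assumes that the evidence from memory_search and conversation_search has already been reduced to a yes/no "is this article about the user?" judgment. The `ArticleEvidence` and `NameFixAction` types and the step identifiers are hypothetical. In the PR, this logic lives in the librarian's prompt, not in code.

```typescript
// Hypothetical step ids; only the relative order comes from the PR
// (other librarian steps are omitted).
const LIBRARIAN_STEPS = ["scope-cleanup", "fix-user-name", "duplicate-consolidation"] as const;

// Illustrative summary of what memory_search and conversation_search
// told the librarian about one article.
interface ArticleEvidence {
  title: string;
  nameUsedForSubject: string;
  subjectIsUser: boolean;
}

type NameFixAction =
  | { kind: "update"; from: string; to: string } // would become a wiki_update call
  | { kind: "leave"; reason: string };

function planNameFix(article: ArticleEvidence, configuredName: string): NameFixAction {
  if (configuredName === "") {
    // Assumption: with no name in Settings there is nothing to enforce.
    return { kind: "leave", reason: "no configured name" };
  }
  if (!article.subjectIsUser) {
    // An article about the real friend stays; the librarian has no
    // wiki_create, so new articles are left to the per-conversation agent.
    return { kind: "leave", reason: "subject is someone else" };
  }
  if (article.nameUsedForSubject === configuredName) {
    return { kind: "leave", reason: "already correct" };
  }
  // The article is about the user but uses another name: fix it in place.
  return { kind: "update", from: article.nameUsedForSubject, to: configuredName };
}

console.log(LIBRARIAN_STEPS.join(" -> "));
```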

=head2 Layer 3: how that resolves PURPOSE

The strengthened profile block prevents future hallucinations
on every per-conversation cycle and on every manual-update flow.
The librarian's new pass cleans up articles already on disk
on its next 12-hour cycle. Rationale comments at the top of both prompts
record the failure mode so a future revisit doesn't quietly
relax the rule.

=head1 Notes for AI reviewers

  • The strict wording is intentional. A reviewer suggesting
    softer language ("prefer", "consider") to "let the model use
    judgment" would re-introduce the exact failure mode this PR
    fixes. The configured name is the binding constraint; it is
    not a hint.
  • The librarian's name-fix pass intentionally does NOT call
    wiki_create to spawn separate articles for the friends
    whose names were misappropriated. That responsibility stays
    with the per-conversation agent; the librarian only fixes
    what's already there.
  • The unknown-name path (no Settings name) is intentionally
    conservative - pronouns + "the user" rather than asking the
    model to extract a name from context, which is exactly the
    failure mode we're defending against.
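The PR guards against softening with rationale comments. A possible complement, not part of this change, is a small check that a rendered profile block still contains the hard rule and none of the soft verbs named in the first note. `checkHardNameRule` and the word list are illustrative assumptions.

```typescript
// Soft verbs that, per the note above, would reintroduce the failure mode.
const SOFT_WORDS = ["prefer", "consider"];

// Hypothetical regression check: returns a list of problems (empty when OK).
function checkHardNameRule(block: string, name: string): string[] {
  const problems: string[] = [];
  if (block.indexOf(`ONLY ${name}`) === -1) {
    problems.push(`missing "ONLY ${name}"`);
  }
  for (const word of SOFT_WORDS) {
    // Whole-word, case-insensitive match ("prefer" but not "preferred").
    if (new RegExp(`\\b${word}\\b`, "i").test(block)) {
      problems.push(`soft wording: "${word}"`);
    }
  }
  return problems;
}

const softened = "When an article refers to the user themselves, prefer their name.";
console.log(checkHardNameRule(softened, "Jeff"));
```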

https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH


Generated by Claude Code

…isting articles

Production traffic: an article about the user (Jeff in Settings) was
rendered with the name "Elliot" - a friend the user had mentioned in
the conversation that triggered the article. The model conflated the
user with someone else in the conversation context.

Root cause: the `renderUserProfileBlock` helper used soft wording
("prefer their name") rather than a hard rule, and the autonomous
prompt's "Do not fabricate" line covered facts but said nothing
about names specifically. With those gaps, the model treated the
configured name as a suggestion rather than a constraint.

**Stronger profile block.** Both wiki agents' renderUserProfileBlock
helpers now use HARD anti-fabrication wording:

  "The user's name is **Jeff**.
   When an article refers to the user themselves, the user's name
   is **Jeff** and ONLY Jeff. NEVER invent another name for the
   user, even if other names appear in the conversation - those
   other names belong to other people the user knows. If the
   conversation mentions a friend named Maya, an article about the
   user does not call the user Maya; it calls the user Jeff. ..."

The unknown-name path (location set, name not) is split out so we
don't tell the model to "use their name" when none was supplied;
in that case it falls back to natural pronouns + the literal
phrase "the user".

**Autonomous prompt anti-fabrication.** The body's "Do not
fabricate" section gains a "Do not fabricate names" companion
that points back to the profile block as the single source of
truth.

**Librarian recovery pass.** The librarian gains a new workflow
step (positioned between scope-cleanup and duplicate-consolidation)
that scans for articles about the user using a wrong name and
wiki_updates them to the configured name. It uses memory_search +
conversation_search to disambiguate (an article mis-naming the
user "Elliot" is fixed to use the right name; a separate "Elliot"
article about the actual friend is left for the per-conversation
agent to land). On its next 12h cycle this pass will sweep up
existing hallucinations.

Rationale comments at the top of both prompts record the failure
mode so a future revisit doesn't quietly relax the rule.

https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH
@sysread merged commit f8d5168 into main May 10, 2026
1 check passed
@sysread deleted the claude/wiki-name-hallucination-fix branch May 10, 2026 14:26