…isting articles
Production traffic: an article about the user (Jeff in Settings) was
rendered with the name "Elliot" - a friend the user had mentioned in
the conversation that triggered the article. The model conflated the
user with someone else in the conversation context.
Root cause: the `renderUserProfileBlock` helper used soft wording
("prefer their name") rather than a hard rule, and the autonomous
prompt's "Do not fabricate" line covered facts but said nothing
about names specifically. With those gaps, the model treated the
configured name as a suggestion rather than a constraint.
**Stronger profile block.** Both wiki agents' `renderUserProfileBlock`
helpers now use HARD anti-fabrication wording:
"The user's name is **Jeff**.
When an article refers to the user themselves, the user's name
is **Jeff** and ONLY Jeff. NEVER invent another name for the
user, even if other names appear in the conversation - those
other names belong to other people the user knows. If the
conversation mentions a friend named Maya, an article about the
user does not call the user Maya; it calls the user Jeff. ..."
The unknown-name path (location set, name not set) is split out so we
don't tell the model to "use their name" when none was supplied;
in that case it falls back to natural pronouns and the literal
phrase "the user".
**Autonomous prompt anti-fabrication.** The body's "Do not
fabricate" section gains a "Do not fabricate names" companion
that points back to the profile block as the single source of
truth.
**Librarian recovery pass.** The librarian gains a new workflow
step (positioned between scope-cleanup and duplicate-consolidation)
that scans for articles about the user that use a wrong name and
runs `wiki_update` on them to apply the configured name. It uses
`memory_search` + `conversation_search` to disambiguate: an article
mis-naming the user "Elliot" is fixed to use the right name, while a
separate "Elliot" article about the actual friend is left for the
per-conversation agent to land. On its next 12h cycle, this pass will
sweep up existing hallucinations.
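The per-article decision the librarian makes can be sketched as a small triage function. This is a minimal sketch under assumptions: `Article`, `SubjectReading`, `triageNameFixes`, and the `read` callback are hypothetical names, and `read` stands in for the model's own judgment backed by `memory_search` and `conversation_search`. The sketch only decides which articles get a `wiki_update`.

```typescript
// Hypothetical article shape; the real wiki store may carry more fields.
interface Article {
  id: string;
  title: string;
}

// What the librarian concludes about an article after checking memory_search and
// conversation_search. Producing this reading is model judgment, not string matching.
interface SubjectReading {
  aboutUser: boolean; // is the article's subject the user themselves?
  subjectName: string | null; // the name the article uses for its subject, if any
}

type Action =
  | { kind: "wiki_update"; articleId: string; wrongName: string; rightName: string }
  | { kind: "leave"; articleId: string; reason: string };

// Decide, per article, whether the recovery pass should issue a wiki_update.
function triageNameFixes(
  articles: readonly Article[],
  configuredName: string,
  read: (article: Article) => SubjectReading,
): Action[] {
  const target = configuredName.trim().toLowerCase();
  return articles.map((article): Action => {
    const { aboutUser, subjectName } = read(article);
    if (!aboutUser) {
      // An article about the real friend stays untouched; creating missing
      // friend articles is the per-conversation agent's job.
      return { kind: "leave", articleId: article.id, reason: "about someone else" };
    }
    if (subjectName === null || subjectName.trim().toLowerCase() === target) {
      return { kind: "leave", articleId: article.id, reason: "no wrong name for the user" };
    }
    // About the user but named after someone else: fix it to the configured name.
    return { kind: "wiki_update", articleId: article.id, wrongName: subjectName, rightName: configuredName };
  });
}
```

The rewrite itself is deliberately left to the model rather than done with a find-and-replace: an article about the user may also mention the real Elliot, and a blind substitution would rename those mentions too.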
Rationale comments at the top of both prompts record the failure
mode so a future revisit doesn't quietly relax the rule.
https://claude.ai/code/session_015XcR7xzLdij66ZbYERUdLH
=head1 SYNOPSIS
The wiki agent invented a name for the user (called the user "Elliot" in
articles when the configured name was "Jeff", because a friend
named Elliot was mentioned in the source conversation).
Three fixes: stronger profile block, explicit no-name-fabrication
rule in the autonomous body, and a librarian recovery pass for
articles already on disk with the wrong name.
=head1 PURPOSE
User reported: Settings says "Jeff", but multiple wiki articles call
the user "Elliot" (a friend the user discussed in the
conversation that triggered the article). The model conflated
the user with someone else from the conversation context and
applied that other name to the user-subject of the article.
=head1 DESCRIPTION
=head2 Layer 1: how renderUserProfileBlock looked before
The helper folded the configured name + location into a soft
preference: "When an article refers to the user themselves,
prefer their name (or a natural pronoun if their name is a
single first name) over the generic phrase 'the user'."
The autonomous prompt's "Do not fabricate" section covered facts
("Only assert facts that appear in the conversation...") but said
nothing about names specifically. The librarian had no path to
clean up articles that already had the wrong name baked in.
=head2 Layer 2: what this PR changes
Both `renderUserProfileBlock` helpers (wiki and librarian) are
rewritten with HARD wording:
"The user's name is Jeff.
When an article refers to the user themselves, the user's
name is Jeff and ONLY Jeff. NEVER invent another name
for the user, even if other names appear in the
conversation - those other names belong to other people the
user knows. If the conversation mentions a friend named
Maya, an article about the user does not call the user
Maya; it calls the user Jeff. If you are uncertain whether
the article subject IS the user, default to using the
literal name from context (Maya, Elliot, etc.) for that
subject and reserve 'Jeff' for explicit references to the
user. ..."
The unknown-name path (no name in Settings) is split out so
the model isn't told to "use their name" when no name exists;
in that case it falls back to natural pronouns + "the user".
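A minimal TypeScript sketch of the two paths, assuming a hypothetical `UserProfile` shape with optional `name` and `location` fields. The known-name wording is abbreviated from the quote above; the unknown-name and location sentences are placeholder wording, not the PR's actual text.

```typescript
// Hypothetical profile shape; the real Settings type may differ.
interface UserProfile {
  name?: string;
  location?: string;
}

// Sketch of renderUserProfileBlock with separate known-name and unknown-name paths.
function renderUserProfileBlock(profile: UserProfile): string {
  const name = profile.name?.trim();
  const location = profile.location?.trim();
  const lines: string[] = [];

  if (name) {
    // Known-name path: the configured name is a binding constraint, not a hint.
    lines.push(
      `The user's name is **${name}**.`,
      `When an article refers to the user themselves, the user's name is **${name}** and ONLY ${name}. ` +
        "NEVER invent another name for the user, even if other names appear in the conversation - " +
        "those other names belong to other people the user knows. " +
        `If the conversation mentions a friend named Maya, an article about the user does not call the user Maya; it calls the user ${name}.`,
    );
  } else if (location) {
    // Unknown-name path (location set, name not set): never tell the model to "use their name".
    // Placeholder wording, not the PR's actual text.
    lines.push(
      'No name is configured for the user. Refer to the user with natural pronouns or the literal phrase "the user"; ' +
        "do not take a name for the user from the conversation.",
    );
  }

  if (location) {
    // Placeholder wording for the location line.
    lines.push(`The user's location is ${location}.`);
  }
  return lines.join("\n");
}
```

A blank or whitespace-only name is treated as missing, so it falls through to the unknown-name path instead of producing "the user's name is ****".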
The autonomous body gets an explicit "Do not fabricate names"
section that points back to the profile block as the single
source of truth, and tells the model what to do when uncertain
("use the literal name as it appears in the conversation rather
than inventing one").
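One way to keep the names rule next to the facts rule when assembling the autonomous body is sketched here, assuming the body is built from a list of section strings. `DO_NOT_FABRICATE_FACTS`, `DO_NOT_FABRICATE_NAMES`, and `withNameRule` are hypothetical, and the section texts are paraphrases rather than the real prompt.

```typescript
// Hypothetical section texts; the real autonomous prompt may phrase and delimit these differently.
const DO_NOT_FABRICATE_FACTS =
  "## Do not fabricate\nOnly assert facts that appear in the conversation.";

const DO_NOT_FABRICATE_NAMES = [
  "## Do not fabricate names",
  "The user profile block is the single source of truth for the user's name.",
  "Never give the user a name that belongs to someone else in the conversation.",
  "If you are uncertain, use the literal name as it appears in the conversation rather than inventing one.",
].join("\n");

// Returns a copy of the body sections with the names rule placed right after the facts rule.
function withNameRule(sections: readonly string[]): string[] {
  const out = [...sections];
  if (out.includes(DO_NOT_FABRICATE_NAMES)) return out; // already present: idempotent
  const facts = out.indexOf(DO_NOT_FABRICATE_FACTS);
  // Fall back to appending if the facts section is missing.
  out.splice(facts === -1 ? out.length : facts + 1, 0, DO_NOT_FABRICATE_NAMES);
  return out;
}
```

Checking for an exact match first keeps the helper idempotent, so re-running prompt assembly never duplicates the section.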
The librarian gains a new workflow step (positioned between
scope-cleanup and duplicate-consolidation) that scans for
articles using a wrong name for the user and runs
`wiki_update` on them. It uses `memory_search` +
`conversation_search` to disambiguate: an article mis-naming the user "Elliot" gets
fixed to use Jeff; a separate "Elliot" article about the
actual friend is left for the per-conversation agent to land
on its next cycle (the librarian has no `wiki_create`).
=head2 Layer 3: how that resolves PURPOSE
The strengthened profile block prevents future hallucinations
on every per-conversation cycle and on every manual-update flow.
The librarian's new pass cleans up articles already on disk
within ~12 hours. Rationale comments at the top of both prompts
record the failure mode so a future revisit doesn't quietly
relax the rule.
=head1 Notes for AI reviewers
Reverting the profile block to softer language ("prefer", "consider")
to "let the model use judgment" would re-introduce the exact failure
mode this PR fixes. The configured name is the binding constraint; it
is not a hint.
The librarian intentionally does not get `wiki_create` to spawn
separate articles for the friends whose names were misappropriated.
That responsibility stays with the per-conversation agent; the
librarian only fixes what's already there.
The unknown-name path is deliberately conservative: pronouns and
"the user" rather than asking the model to extract a name from
context, which is exactly the failure mode we're defending against.
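Because the configured name is meant to stay a binding constraint, a unit test could fail the build if someone reintroduces soft wording. A minimal sketch, assuming a hypothetical `nameRuleProblems` helper and an illustrative list of soft words; this check is a suggestion, not something the PR ships.

```typescript
// Illustrative list of hedging words that would turn the name rule back into a hint.
const SOFT_WORDS = ["prefer", "consider", "if possible", "ideally"];

// Returns a list of problems; an empty list means the block still states a hard rule.
function nameRuleProblems(profileBlock: string, name: string): string[] {
  const problems: string[] = [];
  const lower = profileBlock.toLowerCase();
  for (const word of SOFT_WORDS) {
    if (lower.includes(word)) problems.push(`soft wording: "${word}"`);
  }
  if (!profileBlock.includes(`ONLY ${name}`)) problems.push(`missing "ONLY ${name}"`);
  if (!profileBlock.includes("NEVER invent another name")) {
    problems.push('missing "NEVER invent another name"');
  }
  return problems;
}
```

In a test, the input would be the real helper's output for a sample profile, and the assertion is that the returned list is empty.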
Generated by Claude Code