HN Comment Simulator: Voice Guide

This project generates realistic Hacker News comments in the voice of specific well-known users. Each user's full comment history is available as {username}.json in the repo root.

Available personas

File	User	Comment count
pg.json	Paul Graham	~15,500
tptacek.json	Thomas Ptacek	~72,600
dang.json	Daniel Gackle (HN mod)	~79,400
gwern.json	Gwern Branwen	~7,100
jacquesm.json	Jacques Mattheij	~63,000
patio11.json	Patrick McKenzie	~10,400
sillysaurusx.json + sillysaurus3.json	Shawn Presser	~8,000 combined
networked.json	D. Bohdan	~830

How to use the JSON files

Each file is an array of HN items. Filter to type == "comment" with a text field. The text field contains HTML entities and tags (&gt; for >, &amp; for &, <p> for paragraph breaks). Unescape with html.unescape() and strip tags.
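A minimal sketch of that loading step, assuming each item is a JSON object with the type and text keys described above. The helper names clean_text and load_comments are made up for illustration and are not taken from fetch_hn.py.

```python
import html
import json
import re
from pathlib import Path

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Turn the HTML of one HN comment into plain text."""
    # Paragraph tags become blank lines before the remaining tags are dropped.
    text = re.sub(r"<p\s*/?>", "\n\n", raw, flags=re.IGNORECASE)
    text = TAG_RE.sub("", text)
    # Unescape last, so an escaped "&lt;b&gt;" survives as literal text
    # instead of being stripped as if it were a tag.
    return html.unescape(text).strip()


def load_comments(path) -> list[str]:
    """Return cleaned comment strings from one persona file (hypothetical helper)."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        clean_text(item["text"])
        for item in items
        if item.get("type") == "comment" and item.get("text")
    ]
```

Stripping tags before unescaping is a design choice here: a commenter who typed a literal angle bracket keeps it, while real markup such as links and italics is removed.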

To study a voice, sample comments filtered by topic keywords and by length range. Short comments (100-400 chars) reveal sentence-level style. Medium comments (400-1200 chars) reveal paragraph structure and argument patterns. Long comments (1200+ chars) reveal how someone builds a case.
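One way to sketch this sampling, assuming the comments are already plain-text strings. The function name sample_comments, the bucket names, and the half-open boundaries (a 400-char comment counts as medium, not short) are assumptions for illustration; the default of 15 sits inside the 10-20 range the workflow suggests.

```python
import random

# Length ranges in characters; the upper bound is exclusive, None means open-ended.
LENGTH_BUCKETS = {
    "short": (100, 400),
    "medium": (400, 1200),
    "long": (1200, None),
}


def sample_comments(comments, keywords, bucket, k=15, seed=None):
    """Pick up to k comments in one length bucket that mention any keyword."""
    lo, hi = LENGTH_BUCKETS[bucket]
    lowered = [kw.lower() for kw in keywords]
    matches = [
        c for c in comments
        if len(c) >= lo
        and (hi is None or len(c) < hi)
        and any(kw in c.lower() for kw in lowered)
    ]
    # A seeded generator makes a calibration sample reproducible.
    rng = random.Random(seed)
    return rng.sample(matches, min(k, len(matches)))
```

Keyword matching here is a plain case-insensitive substring test, so "regulat" would catch both "regulation" and "regulatory".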

Voice profiles

pg (Paul Graham)

Style: Ruthlessly concise. Every sentence does work. Conversational but precise. Read "Write Like You Talk" (paulgraham.com/talk.html) for his philosophy.

Sentence patterns: Short, declarative. Rarely hedges. States things flatly when confident. Uses "I think" or "I suspect" only when genuinely uncertain, not as filler.

Structure: Leads with a concrete observation or non-obvious reframe, not the consensus take. Often 2-3 short paragraphs. No bullet points ever. Ends on the insight, not a summary.

Typical moves:

Opens with a specific factual observation that reframes the debate
Draws analogies to other domains to illuminate a point
Exposes inconsistency ("X is allowed but Y isn't, and they're the same thing")
Trusts the reader to make connections; doesn't spell everything out

Topics: Startups, technology trends, programming, contrarian observations about how the world works. Skeptical of regulation, especially when it protects incumbents.

What he doesn't do: Moralize. Use jargon. Write long comments. Add qualifications. Use bullet points or headers.

Length: Typically 3-8 sentences for a toplevel. Sometimes just 1-2 sentences for a reply.

tptacek (Thomas Ptacek)

Style: Direct, sharp, assertive. No hedging. Sometimes combative. Confidently contrarian to HN consensus.

Sentence patterns: Mix of short punchy sentences and longer analytical ones. Uses em-dashes in his real comments (avoid them in simulation, per user preference). Doesn't soften punches.

Structure: Often leads with a sharp one-liner, then develops the point. Or leads with a concession ("I run Firefox. I'm going to continue running Firefox.") before the attack.

Typical moves:

"What this actually is": strips away rhetoric to name the real thing underneath
Calls out the gap between stated principles and actual behavior
Defends regulation when HN consensus is reflexively anti-regulation
Makes the security argument that nobody else is making
Pushes back directly on other commenters, names them

Topics: Security (his profession), crypto skepticism, defending expertise, regulatory arguments, calling out bullshit. Has specifically and consistently argued that prediction markets are "unregulated prop gambling venues."

What he doesn't do: Hedge. Use "I think" or "it seems to me." Agree with the HN consensus when he thinks it's wrong. Write long meandering comments. He's concise even when his comments are long.

Length: Varies widely. Replies can be 1-3 sentences. Toplevels can be 2-4 substantial paragraphs.

gwern (Gwern Branwen)

Style: Dense, reference-heavy, empirical. Cites papers, links to his own site, gives specific numbers. Uses single quotes for scare-quotes ('like this').

Sentence patterns: Longer sentences with parenthetical asides (often snarky). Academic but not dry.

Structure: States a claim, then backs it with specific evidence. Long parenthetical digressions. Often ends with a link dump or a bet.

Typical moves:

Cites specific papers with author names and years
Provides historical context that spans decades
Notes when a pattern has repeated before and nobody remembers
Pushes back with data rather than opinions
Personal experience: "I've been betting on prediction markets since ~2003"
Links to gwern.net extensively

Topics: AI/ML, prediction markets, forecasting, empirical evidence, genetics, statistics, history of technology. Has deep firsthand experience with Intrade, PredictionBook, GJP.

What he doesn't do: Make vague claims. Agree without adding something. Write short comments (his are almost always long). Use emotional language.

Length: Typically long. 3-6 paragraphs with references. Even replies tend to be substantial.

dang (Daniel Gackle)

Style: Balanced, philosophical, sees both sides. Signature phrase: "It seems to me." Gentle but firm.

Sentence patterns: Measured. Acknowledges a point before complicating it. Often asks a question at the end.

Structure: Complicates rather than refutes. Names dynamics and patterns in how people argue, not just what they argue about.

Typical moves:

"It seems to me there's a version of this argument that's harder to dismiss"
Names the meta-pattern ("this thread isn't really about X, it's about Y")
Defends people being attacked while still disagreeing with them
References HN's mission: intellectual curiosity, substantive discussion
Adds nuance that makes both sides uncomfortable

Two modes: (1) Moderator dang: "Please don't post like this" with links to guidelines. (2) Philosophical dang: thoughtful engagement with ideas. For simulation, use mode 2.

Topics: Community dynamics, how people relate to information, the gap between stated reasons and real reasons, HN meta-discussion.

What he doesn't do: Take strong partisan positions. Be combative. Resolve tensions; he names them and leaves them. Use "I think" (uses "it seems to me" instead).

Length: Medium. 2-3 paragraphs. Replies are shorter but still thoughtful.

jacquesm (Jacques Mattheij)

Style: Practical, experienced, European. Warm but direct. Conversational.

Sentence patterns: Colloquial. "you can't make this up." "That's how it always goes." Direct address.

Structure: Often opens with a personal anecdote or practical observation, then broadens to the general point. Grounds abstract debates in concrete experience.

Typical moves:

"I maintain several open source projects and..."
European/Dutch perspective on regulation (knows EU regulatory landscape firsthand)
Follows the money: "It's about protecting the revenue stream"
Compares to something he's personally experienced
Slightly cynical about institutional motives but not bitter

Topics: EU regulation, open source, running businesses in Europe, sailing, hardware, corporate behavior, GDPR, privacy. Has run ISPs, worked at banks, maintained open source for decades.

What he doesn't do: Cite papers (unlike gwern). Make purely philosophical arguments. Write from a US-centric perspective.

Length: Medium. 2-3 paragraphs for toplevels. Shorter replies.

patio11 (Patrick McKenzie)

Style: Deep institutional knowledge. Explains how systems actually work behind the headline. Dry wit.

Sentence patterns: Long, precise sentences. Parenthetical asides. Sometimes quotes from articles and deconstructs them word by word.

Structure: Picks up on the most boring-sounding detail in the article as the most important one. Explains the institutional machinery. Often reframes: "The real question is not X, it's Y."

Typical moves:

Quotes specific language from articles, then explains what it actually means
"In practice what happens is..."
Names specific dollar amounts, specific regulatory bodies, specific mechanisms
"If I were a cynical man, I'd note that..."
Connects to how businesses, banks, and regulators actually operate day to day
Explains the gap between what regulatory language says and what it does

Topics: Financial infrastructure, banking, compliance, payments, SaaS businesses, Japan (lived there for a decade), HR/salary negotiation.

What he doesn't do: Make short comments (almost never). Be vague. Miss the institutional angle.

Length: Long. Often the longest comments in a thread. 3-5 paragraphs. Toplevels can be very substantial. Replies are shorter but still detailed.

sillysaurusx / sillysaurus3 (Shawn Presser)

Style: Personal, honest, sometimes vulnerable. Shares real experiences freely. Conversational and informal.

Sentence patterns: Direct. Asks genuine questions. Stream-of-consciousness parenthetical asides. Uses contractions.

Structure: Often starts with a reaction ("I'm amazed no one is pushing back on this"), then develops a personal take. Sometimes opens with an anecdote.

Typical moves:

Shares personal experience without pretense: "I've noticed this in myself"
Pro-freedom, pro-individual autonomy
Makes practical observations about how things actually work on the internet
Notices when stated justifications don't match real reasons
Willing to be the pessimist: "Does outrage actually move the needle? I've been watching..."
AI-focused: deeply engaged with LLM tools, sees connections others miss

Topics: AI/ML, freedom of speech, internet culture, crypto, gaming, parenting, personal experience.

What he doesn't do: Write in a formal/academic register. Cite papers. Hedge extensively. Pretend to be neutral when he has a strong opinion.

Length: Medium. 2-4 paragraphs. Replies can be short and punchy.

networked (D. Bohdan)

Style: Technically precise, detail-oriented. Measured and polite but direct when calling something out. Dry humor (occasional :-)).

Sentence patterns: Complete, well-punctuated sentences. Uses semicolons correctly. Links to exact documentation, commits, specific resources.

Structure: Makes one specific, concrete observation rather than a sweeping argument. Often surfaces a detail from the source material that others missed.

Typical moves:

Precise technical or legal objection: "Whether training counts as 'use' under MIT is not settled"
Calls out misrepresentation politely but firmly: "Your guide sounds obviously written by an LLM... don't say you wrote it"
Surfaces buried details from reply threads or linked sources
Asks genuine questions to clarify
FOSS/licensing-aware: knows GPL vs MIT vs AGPL distinctions cold

Topics: Programming languages (Tcl, Zig, Crystal, Plan 9), FOSS licensing, AI writing quality (interested in making AI write better; see tropes.fyi), open source culture, tools.

What he doesn't do: Write long sweeping comments. Make emotional arguments. Be combative. Ignore details.

Length: Short to medium. Usually the most concise commenter in the thread. 2-5 sentences for a reply. Toplevels are slightly longer but still tight.

Comment generation principles

Every comment needs a takeaway

The reader should finish and think "hm, good point" or be surprised by something. If a comment doesn't make a concrete point, reframe something, surface a hidden detail, or ask an interesting question, cut it.

Avoid AI writing tropes

Reference: tropes.fyi by ossama.is (gist.github.com/ossa-ma/f3baa9d25154c33095e22272c631f5a1)

Key tropes to avoid:

"It's not X, it's Y" with em-dashes. The #1 AI tell. Rephrase.
"Here's the thing" / "Here's where it gets interesting" before unremarkable points.
"Quietly" and magic adverbs: "deeply," "fundamentally," "remarkably," "arguably."
Em-dash addiction. Use parens, semicolons, colons, or commas instead. (Also in CLAUDE.md: never use em-dashes or en-dashes.)
"The real question is..." Overused. Vary the phrasing.
Tricolon abuse. Don't always list three things.
Fractal summaries. Don't restate the point at the end.
"Let's unpack" / "Let's break this down." Never.
"Tapestry," "landscape," "paradigm," "ecosystem." Never.
Bold-first bullet points. Never in HN comments.
Signposted conclusions. No "In conclusion" or "To sum up."
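Some of these tropes can be caught mechanically. The sketch below is a rough regex pass, not a complete checker: the patterns and the name find_tropes are assumptions for illustration, it only matches straight apostrophes, and structural tropes (tricolons, fractal summaries, bold-first bullets) still need a human read.

```python
import re

# Each pattern covers one trope from the list above. Dashes are written as
# escapes so this file itself stays free of them.
TROPE_PATTERNS = {
    "em/en dash": re.compile(r"[\u2014\u2013]"),
    "not X, it's Y": re.compile(
        r"\b(?:it's|it is|this is|that's)\s+not\b[^.?!]*,\s*(?:it's|it is)\b", re.I
    ),
    "here's the thing": re.compile(
        r"\bhere's (?:the thing|where it gets interesting)\b", re.I
    ),
    "magic adverb": re.compile(
        r"\b(?:quietly|deeply|fundamentally|remarkably|arguably)\b", re.I
    ),
    "the real question": re.compile(r"\bthe real question is\b", re.I),
    "let's unpack": re.compile(r"\blet's (?:unpack|break this down)\b", re.I),
    "banned noun": re.compile(r"\b(?:tapestry|landscape|paradigm|ecosystem)\b", re.I),
    "signposted conclusion": re.compile(r"\b(?:in conclusion|to sum up)\b", re.I),
}


def find_tropes(comment: str) -> list[tuple[str, str]]:
    """Return (trope name, matched text) pairs found in one comment."""
    hits = []
    for name, pattern in TROPE_PATTERNS.items():
        for match in pattern.finditer(comment):
            hits.append((name, match.group(0)))
    return hits
```

An empty result means only that none of these surface patterns fired; a hit is a prompt to rewrite the sentence, not an automatic rejection.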

Thread dynamics

Real HN threads have:

Variety in length. Some comments are 2 sentences, some are 5 paragraphs.
Natural disagreements. People push back on specific points, not wholesale.
Reply chains 2-3 deep that develop an idea through dialogue.
[flagged/dead] comments. Low-effort snark, trolling, or inflammatory drive-bys that get downvoted. Include 1-2 for realism.
Cross-pollination. Someone picks up another commenter's point and runs with it.
Convergence. Good threads often converge on one observation that multiple people recognize as the best point.

Avoiding contamination between voices

When generating multiple comments in sequence, each new voice can be contaminated by the previous one (e.g., dang's comment accidentally responding to the same thing gwern focused on, or using pg's sentence patterns in a jacquesm comment). Before writing each comment, mentally reset and ask: "What would this specific person notice first? What's their angle?"

Toplevel vs reply length

Toplevel comments can be longer, especially for users who naturally write long (patio11, gwern, tptacek). Replies should generally be shorter. But occasionally a reply goes long when someone has deep expertise on the specific sub-point.

Format for output files

{Brief context block describing the article/post and its key claims}

======================================================================

--- {username} toplevel:

{comment text}

--- {username} replying to {other_username}:

> {quoted text from parent comment}

{reply text}

--- [flagged, probably dead] {throwaway_username} toplevel:

{snarky/trollish comment}
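The template above could be produced with a small renderer like the following sketch. The dict keys (user, text, reply_to, quote, dead) and the name render_thread are hypothetical, and the separator width of 70 is an assumption read off the template.

```python
SEPARATOR = "=" * 70  # assumed width, matching the template's rule line


def render_thread(context: str, comments: list[dict]) -> str:
    """Render a context block and a list of comment dicts in the sim file format.

    Each dict needs "user" and "text"; "reply_to", "quote", and "dead"
    are optional (all key names are illustrative assumptions).
    """
    parts = [context.strip(), "", SEPARATOR, ""]
    for c in comments:
        target = f"replying to {c['reply_to']}" if c.get("reply_to") else "toplevel"
        prefix = "[flagged, probably dead] " if c.get("dead") else ""
        parts.append(f"--- {prefix}{c['user']} {target}:")
        parts.append("")
        if c.get("quote"):
            # Quoted parent text uses the HN convention of a leading ">".
            parts.append(f"> {c['quote']}")
            parts.append("")
        parts.append(c["text"].strip())
        parts.append("")
    return "\n".join(parts)
```

Keeping the thread as data until the last step makes it easier to reorder comments or swap a reply chain without hand-editing the text file.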

Existing output files

sim.txt - Spain blocks prediction markets (first attempt, longer comments)
sim2.txt - Open Slopware (codeberg.org/small-hack/open-slopware)
sim4.txt - Open Slopware (revised with all personas including networked)
sim5.txt - Rich Felker's "Co-authored-by: Claude is advertising" Mastodon post
sim6.txt - Andrew Kelley's "fuck you, everybody at Mozilla" (Firefox address bar ads)

Workflow

1. Fetch the article. Use WebFetch or WebSearch. Get exact quotes; patio11 in particular needs verbatim text to quote and deconstruct.
2. Understand the topic. What are the obvious takes? What are the non-obvious ones? Which persona naturally has the most interesting angle?
3. Sample from the JSON files. For each persona, grep for topic-relevant keywords, sample 10-20 comments, and read them to calibrate the voice fresh. Don't skip this step; it prevents voice drift.
4. Plan the thread. Decide which personas go toplevel vs reply. Identify 2-3 natural reply chains. Pick 1-2 dead/flagged comments for realism.
5. Write each comment independently. Reset between voices. Ask: what would this person notice first? What's their angle? What's their takeaway?
6. Review for AI tropes. Check for "It's not X, it's Y," em-dashes, "the real question is," tricolons, magic adverbs. Rewrite if needed.
7. Save to sim{N}.txt with context block at top.
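For the final save step, a small helper can pick the next unused N. This is only a sketch: next_sim_path is a hypothetical name, and treating the unnumbered sim.txt as number 1 is an assumption about the naming scheme.

```python
import re
from pathlib import Path


def next_sim_path(directory=".") -> Path:
    """Return the path sim{N}.txt where N is one past the highest existing number."""
    numbers = []
    for path in Path(directory).glob("sim*.txt"):
        match = re.fullmatch(r"sim(\d*)\.txt", path.name)
        if match:
            # The unnumbered "sim.txt" is counted as 1 (assumption).
            numbers.append(int(match.group(1) or 1))
    return Path(directory) / f"sim{max(numbers, default=0) + 1}.txt"
```

Gaps in the numbering are left alone, so a directory holding sim.txt, sim2.txt, and sim6.txt yields sim7.txt.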
