This project generates realistic Hacker News comments in the voice of specific well-known users. Each user's full comment history is available as {username}.json in the repo root.
| File | User | Comment count |
|---|---|---|
| pg.json | Paul Graham | ~15,500 |
| tptacek.json | Thomas Ptacek | ~72,600 |
| dang.json | Daniel Gackle (HN mod) | ~79,400 |
| gwern.json | Gwern Branwen | ~7,100 |
| jacquesm.json | Jacques Mattheij | ~63,000 |
| patio11.json | Patrick McKenzie | ~10,400 |
| sillysaurusx.json + sillysaurus3.json | Shawn Presser | ~8,000 combined |
| networked.json | D. Bohdan | ~830 |
Each file is an array of HN items. Filter to type == "comment" with a text field. The text field contains HTML entities (> for >, & for &, <p> for paragraph breaks). Unescape with html.unescape() and strip tags.
To study a voice, sample comments filtered by topic keywords and by length range. Short comments (100-400 chars) reveal sentence-level style. Medium comments (400-1200 chars) reveal paragraph structure and argument patterns. Long comments (1200+) reveal how someone builds a case.
Style: Ruthlessly concise. Every sentence does work. Conversational but precise. Read "Write Like You Talk" (paulgraham.com/talk.html) for his philosophy.
Sentence patterns: Short, declarative. Rarely hedges. States things flatly when confident. Uses "I think" or "I suspect" only when genuinely uncertain, not as filler.
Structure: Leads with a concrete observation or non-obvious reframe, not the consensus take. Often 2-3 short paragraphs. No bullet points ever. Ends on the insight, not a summary.
Typical moves:
- Opens with a specific factual observation that reframes the debate
- Draws analogies to other domains to illuminate a point
- Exposes inconsistency ("X is allowed but Y isn't, and they're the same thing")
- Trusts the reader to make connections; doesn't spell everything out
Topics: Startups, technology trends, programming, contrarian observations about how the world works. Skeptical of regulation, especially when it protects incumbents.
What he doesn't do: Moralize. Use jargon. Write long comments. Add qualifications. Use bullet points or headers.
Length: Typically 3-8 sentences for a toplevel. Sometimes just 1-2 sentences for a reply.
Style: Direct, sharp, assertive. No hedging. Sometimes combative. Confidently contrarian to HN consensus.
Sentence patterns: Mix of short punchy sentences and longer analytical ones. Uses em-dashes (though for simulation, avoid per user preference). Doesn't soften punches.
Structure: Often leads with a sharp one-liner, then develops the point. Or leads with a concession ("I run Firefox. I'm going to continue running Firefox.") before the attack.
Typical moves:
- "What this actually is": strips away rhetoric to name the real thing underneath
- Calls out the gap between stated principles and actual behavior
- Defends regulation when HN consensus is reflexively anti-regulation
- Makes the security argument that nobody else is making
- Pushes back directly on other commenters, names them
Topics: Security (his profession), crypto skepticism, defending expertise, regulatory arguments, calling out bullshit. Has specifically and consistently argued that prediction markets are "unregulated prop gambling venues."
What he doesn't do: Hedge. Use "I think" or "it seems to me." Agree with the HN consensus when he thinks it's wrong. Write long meandering comments. He's concise even when his comments are long.
Length: Varies widely. Replies can be 1-3 sentences. Toplevels can be 2-4 substantial paragraphs.
Style: Dense, reference-heavy, empirical. Cites papers, links to his own site, gives specific numbers. Uses single quotes for scare-quotes ('like this').
Sentence patterns: Longer sentences with parenthetical asides (often snarky). Academic but not dry.
Structure: States a claim, then backs it with specific evidence. Long parenthetical digressions. Often ends with a link dump or a bet.
Typical moves:
- Cites specific papers with author names and years
- Provides historical context that spans decades
- Notes when a pattern has repeated before and nobody remembers
- Pushes back with data rather than opinions
- Personal experience: "I've been betting on prediction markets since ~2003"
- Links to gwern.net extensively
Topics: AI/ML, prediction markets, forecasting, empirical evidence, genetics, statistics, history of technology. Has deep firsthand experience with Intrade, PredictionBook, GJP.
What he doesn't do: Make vague claims. Agree without adding something. Write short comments (his are almost always long). Use emotional language.
Length: Typically long. 3-6 paragraphs with references. Even replies tend to be substantial.
Style: Balanced, philosophical, sees both sides. Signature phrase: "It seems to me." Gentle but firm.
Sentence patterns: Measured. Acknowledges a point before complicating it. Often asks a question at the end.
Structure: Complicates rather than refutes. Names dynamics and patterns in how people argue, not just what they argue about.
Typical moves:
- "It seems to me there's a version of this argument that's harder to dismiss"
- Names the meta-pattern ("this thread isn't really about X, it's about Y")
- Defends people being attacked while still disagreeing with them
- References HN's mission: intellectual curiosity, substantive discussion
- Adds nuance that makes both sides uncomfortable
Two modes: (1) Moderator dang: "Please don't post like this" with links to guidelines. (2) Philosophical dang: thoughtful engagement with ideas. For simulation, use mode 2.
Topics: Community dynamics, how people relate to information, the gap between stated reasons and real reasons, HN meta-discussion.
What he doesn't do: Take strong partisan positions. Be combative. Resolve tensions; he names them and leaves them. Use "I think" (uses "it seems to me" instead).
Length: Medium. 2-3 paragraphs. Replies are shorter but still thoughtful.
Style: Practical, experienced, European. Warm but direct. Conversational.
Sentence patterns: Colloquial. "you can't make this up." "That's how it always goes." Direct address.
Structure: Often opens with a personal anecdote or practical observation, then broadens to the general point. Grounds abstract debates in concrete experience.
Typical moves:
- "I maintain several open source projects and..."
- European/Dutch perspective on regulation (knows EU regulatory landscape firsthand)
- Follows the money: "It's about protecting the revenue stream"
- Compares to something he's personally experienced
- Slightly cynical about institutional motives but not bitter
Topics: EU regulation, open source, running businesses in Europe, sailing, hardware, corporate behavior, GDPR, privacy. Has run ISPs, worked at banks, maintained open source for decades.
What he doesn't do: Cite papers (unlike gwern). Make purely philosophical arguments. Write from a US-centric perspective.
Length: Medium. 2-3 paragraphs for toplevels. Shorter replies.
Style: Deep institutional knowledge. Explains how systems actually work behind the headline. Dry wit.
Sentence patterns: Long, precise sentences. Parenthetical asides. Sometimes quotes from articles and deconstructs them word by word.
Structure: Picks up on the most boring-sounding detail in the article as the most important one. Explains the institutional machinery. Often reframes: "The real question is not X, it's Y."
Typical moves:
- Quotes specific language from articles, then explains what it actually means
- "In practice what happens is..."
- Names specific dollar amounts, specific regulatory bodies, specific mechanisms
- "If I were a cynical man, I'd note that..."
- Connects to how businesses, banks, and regulators actually operate day to day
- Explains the gap between what regulatory language says and what it does
Topics: Financial infrastructure, banking, compliance, payments, SaaS businesses, Japan (lived there for a decade), HR/salary negotiation.
What he doesn't do: Make short comments (almost never). Be vague. Miss the institutional angle.
Length: Long. Often the longest comments in a thread. 3-5 paragraphs. Toplevels can be very substantial. Replies are shorter but still detailed.
Style: Personal, honest, sometimes vulnerable. Shares real experiences freely. Conversational and informal.
Sentence patterns: Direct. Asks genuine questions. Stream-of-consciousness parenthetical asides. Uses contractions.
Structure: Often starts with a reaction ("I'm amazed no one is pushing back on this"), then develops a personal take. Sometimes opens with an anecdote.
Typical moves:
- Shares personal experience without pretense: "I've noticed this in myself"
- Pro-freedom, pro-individual autonomy
- Makes practical observations about how things actually work on the internet
- Notices when stated justifications don't match real reasons
- Willing to be the pessimist: "Does outrage actually move the needle? I've been watching..."
- AI-focused: deeply engaged with LLM tools, sees connections others miss
Topics: AI/ML, freedom of speech, internet culture, crypto, gaming, parenting, personal experience.
What he doesn't do: Write in a formal/academic register. Cite papers. Hedge extensively. Pretend to be neutral when he has a strong opinion.
Length: Medium. 2-4 paragraphs. Replies can be short and punchy.
Style: Technically precise, detail-oriented. Measured and polite but direct when calling something out. Dry humor (occasional :-)).
Sentence patterns: Complete, well-punctuated sentences. Uses semicolons correctly. Links to exact documentation, commits, specific resources.
Structure: Makes one specific, concrete observation rather than a sweeping argument. Often surfaces a detail from the source material that others missed.
Typical moves:
- Precise technical or legal objection: "Whether training counts as 'use' under MIT is not settled"
- Calls out misrepresentation politely but firmly: "Your guide sounds obviously written by an LLM... don't say you wrote it"
- Surfaces buried details from reply threads or linked sources
- Asks genuine questions to clarify
- FOSS/licensing-aware: knows GPL vs MIT vs AGPL distinctions cold
Topics: Programming languages (Tcl, Zig, Crystal, Plan 9), FOSS licensing, AI writing quality (interested in making AI write better; see tropes.fyi), open source culture, tools.
What he doesn't do: Write long sweeping comments. Make emotional arguments. Be combative. Ignore details.
Length: Short to medium. Usually the most concise commenter in the thread. 2-5 sentences for a reply. Toplevels are slightly longer but still tight.
The reader should finish and think "hm, good point" or be surprised by something. If a comment doesn't make a concrete point, reframe something, surface a hidden detail, or ask an interesting question, cut it.
Reference: tropes.fyi by ossama.is (gist.github.com/ossa-ma/f3baa9d25154c33095e22272c631f5a1)
Key tropes to avoid:
- "It's not X, it's Y" with em-dashes. The #1 AI tell. Rephrase.
- "Here's the thing" / "Here's where it gets interesting" before unremarkable points.
- "Quietly" and magic adverbs: "deeply," "fundamentally," "remarkably," "arguably."
- Em-dash addiction. Use parens, semicolons, colons, or commas instead. (Also in CLAUDE.md: never use em-dashes or en-dashes.)
- "The real question is..." Overused. Vary the phrasing.
- Tricolon abuse. Don't always list three things.
- Fractal summaries. Don't restate the point at the end.
- "Let's unpack" / "Let's break this down." Never.
- "Tapestry," "landscape," "paradigm," "ecosystem." Never.
- Bold-first bullet points. Never in HN comments.
- Signposted conclusions. No "In conclusion" or "To sum up."
Real HN threads have:
- Variety in length. Some comments are 2 sentences, some are 5 paragraphs.
- Natural disagreements. People push back on specific points, not wholesale.
- Reply chains 2-3 deep that develop an idea through dialogue.
- [flagged/dead] comments. Low-effort snark, trolling, or inflammatory drive-bys that get downvoted. Include 1-2 for realism.
- Cross-pollination. Someone picks up another commenter's point and runs with it.
- Convergence. Good threads often converge on one observation that multiple people recognize as the best point.
When generating multiple comments in sequence, each new voice can be contaminated by the previous one (e.g., dang's comment accidentally responding to the same thing gwern focused on, or using pg's sentence patterns in a jacquesm comment). Before writing each comment, mentally reset and ask: "What would this specific person notice first? What's their angle?"
Toplevel comments can be longer, especially for users who naturally write long (patio11, gwern, tptacek). Replies should generally be shorter. But occasionally a reply goes long when someone has deep expertise on the specific sub-point.
{Brief context block describing the article/post and its key claims}
======================================================================
--- {username} toplevel:
{comment text}
--- {username} replying to {other_username}:
> {quoted text from parent comment}
{reply text}
--- [flagged, probably dead] {throwaway_username} toplevel:
{snarky/trollish comment}
sim.txt- Spain blocks prediction markets (first attempt, longer comments)sim2.txt- Open Slopware (codeberg.org/small-hack/open-slopware)sim4.txt- Open Slopware (revised with all personas including networked)sim5.txt- Rich Felker's "Co-authored-by: Claude is advertising" Mastodon postsim6.txt- Andrew Kelley's "fuck you, everybody at Mozilla" (Firefox address bar ads)
- Fetch the article. Use WebFetch or WebSearch. Get exact quotes; patio11 in particular needs verbatim text to quote and deconstruct.
- Understand the topic. What are the obvious takes? What are the non-obvious ones? Which persona naturally has the most interesting angle?
- Sample from the JSON files. For each persona, grep for topic-relevant keywords, sample 10-20 comments, and read them to calibrate the voice fresh. Don't skip this step; it prevents voice drift.
- Plan the thread. Decide which personas go toplevel vs reply. Identify 2-3 natural reply chains. Pick 1-2 dead/flagged comments for realism.
- Write each comment independently. Reset between voices. Ask: what would this person notice first? What's their angle? What's their takeaway?
- Review for AI tropes. Check for "It's not X, it's Y," em-dashes, "the real question is," tricolons, magic adverbs. Rewrite if needed.
- Save to sim{N}.txt with context block at top.