Tools for coding, teaching, and presentations with AI assistance.
This is a collection of tools, templates, and philosophies I've developed while using Claude Code for:
- Coding (data analysis scripts, replication code, automation)
- Teaching (course materials, lecture decks, pedagogical tools)
- Presentations (Beamer decks, slides for talks and seminars)
As I develop new approaches, I'll add them here. Anyone is free to use them.
Take everything with a grain of salt. These are workflows that work for me. Your mileage may vary.
Scott Cunningham — Professor of Economics at Baylor University
- Website: www.scunning.com
- Substack: causalinf.substack.com — I write regularly about causal inference, Claude Code, and random things
- Free book: Causal Inference: The Mixtape — available online
Location: workflow.md | Deck: presentations/examples/workflow_deck/
Before diving into specific tools, read my workflow document. It explains how I think about using Claude Code for empirical research—not just the tools, but the philosophy behind them.
Key concepts:
| Concept | What It Means |
|---|---|
| Thinking partner, not code monkey | Claude is a collaborator who reasons about problems, not just a code generator |
| External memory via markdown | Claude has amnesia between sessions; markdown files provide institutional memory |
| Cross-software replication | R = Stata = Python to 6 decimal places, or something is wrong |
| Adversarial review (Referee 2) | Fresh Claude instance audits your work; you can't grade your own homework |
| Verification through visualization | Trust pictures over numbers; errors become visible |
| Documentation as first-class output | If it's not documented, it didn't happen |
Everything else in this repo implements these principles.
Location: skills/referee2/ | personas/referee2.md (full protocol)
Referee 2 is a health inspector for empirical research — a systematic five-audit protocol with cross-language replication, formal referee reports, and a revise & resubmit process. It runs after a project is complete, in a fresh terminal, by a Claude instance that has never seen the work. The separation is what makes it independent: the Claude that built the pipeline cannot objectively audit it.
The Five Audits:
| Audit | What It Does |
|---|---|
| Code Audit | Coding errors, missing value handling, merge diagnostics, variable construction |
| Cross-Language Replication | Replication scripts in 2 other languages (R/Stata/Python), results compared to 6 decimal places |
| Directory Audit | Folder structure, relative paths, naming conventions — replication-package ready? |
| Output Automation Audit | Are tables and figures programmatically generated or manually created? |
| Econometrics Audit | Identification strategy, standard errors, fixed effects, parallel trends, first stage |
Critical Rule: Referee 2 NEVER modifies author code. It only creates its own replication scripts. Only the author modifies the author's code.
Referee 2 is one of two complementary audit tools. See below.
Location: skills/blindspot/ | .claude/skills/blindspot/SKILL.md (actual skill)
Blindspot is a peripheral vision audit for empirical output — a structured protocol for finding what the author cannot see. Problems hiding in plain sight (vices) and opportunities being overlooked (virtues).
The Shklovsky principle: Viktor Shklovsky, the Soviet literary theorist, argued that art exists to restore perception. A man who walks barefoot up a mountain eventually cannot feel his feet. Art exists to make the stone stony again. Research has the same problem: by the time you've spent months on a paper, the main finding has collapsed your attention, and everything else — the spike at t=1, the missing subgroup, the heterogeneity richer than the average — has become invisible.
Blindspot makes the stone stony again.
The Blindspot Grid — four quadrants, two vices and two virtues:
| | What's there but unseen | What's absent but unnoticed |
|---|---|---|
| Problems | Vice 1: The Unexplained Feature — a spike, a sign flip, a sample-size drop nobody asked about | Vice 2: The Convenient Absence — a robustness check never run, a subgroup never examined, a dog that didn't bark |
| Opportunities | Virtue 1: The Unasked Question — heterogeneity richer than the average, a mechanism visible in the data but absent from the hypothesis | Virtue 2: The Unexploited Strength — an identification argument stronger than the paper claims, a falsification test that would crush the main objection |
Ruling: CLEAR / CONDITIONAL / HOLD
Usage: /blindspot path/to/figure-or-table "what I think the main finding is"
Read the full documentation: skills/blindspot/README.md
These two tools address different failure modes at different stages of the research process. Both should be run. Neither replaces the other.
| | Referee 2 | Blindspot |
|---|---|---|
| Core question | Is this implemented correctly? | Can you see what's in front of you? |
| Failure mode it catches | Coding errors, bad merges, wrong SEs, non-replicating results | Overlooked problems (vices) and overlooked opportunities (virtues) |
| When it runs | After the project is complete | When output first appears, before writing begins |
| Session | Fresh terminal — independence is structural | Same session — you need the person closest to the work |
| Persona | Health inspector with a checklist | Shklovsky — restoring perception |
| Would have caught a merge error? | Yes | Maybe |
| Would have caught the t=1 spike? | No | Yes |
Why separate sessions for Referee 2 but not Blindspot?
Referee 2 needs a fresh session because it's auditing implementation — the Claude that built the code will rationalize its own choices. True independence requires structural separation.
Blindspot doesn't need separation because it's auditing perception — your own understanding of output you produced. You're the right person to do that, with a structured forcing function to look past what you expect to see.
The workflow:
- Produce output → /blindspot → interpret and write
- Complete project → open fresh terminal → /referee2
Location: presentations/
My philosophy of slide design, plus a tested prompt for generating Beamer presentations. The key insight: aim for MB/MC (marginal benefit/marginal cost) equivalence across slides (smoothness), not maximum density.
Core principles:
- Beauty earns attention; attention enables communication
- Titles are assertions, not labels
- One idea per slide
- Bullets are defeat—find the structure hiding in your list
Location: skills/split-pdf/ (human-readable guide) | .claude/skills/split-pdf/SKILL.md (actual skill)
A Claude Code skill — an invocable /split-pdf command that automates the full pipeline for reading academic papers:
- Acquire the PDF (web search + download, or use a local file in place)
- Check for an existing _text.md extract or existing splits, and offer to reuse them
- Split into 4-page chunks via PyPDF2, stored in a _build/ directory
- Read 3 chunks at a time (~12 pages), pausing between batches
- Extract structured reading notes across 8 dimensions into notes.md
- Persist the final extraction as <basename>_text.md alongside the source PDF
Why not just read the full PDF? Long PDFs either crash the session ("prompt too long" — unrecoverable) or produce shallow, hallucinated output. Splitting forces Claude to attend carefully to every section and externalizes understanding into markdown notes incrementally.
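The chunking and batching arithmetic described above can be sketched in a few lines. This is an illustration only, not the skill's actual code: the function names `chunk_page_ranges` and `batch_chunks` are hypothetical, and the sketch computes page ranges without touching PyPDF2 or any PDF file.

```python
from typing import List, Tuple

PageRange = Tuple[int, int]  # 1-based, inclusive (first_page, last_page)


def chunk_page_ranges(num_pages: int, chunk_size: int = 4) -> List[PageRange]:
    """Split pages 1..num_pages into consecutive ranges of at most chunk_size pages."""
    if num_pages < 0 or chunk_size < 1:
        raise ValueError("num_pages must be >= 0 and chunk_size >= 1")
    return [
        (start, min(start + chunk_size - 1, num_pages))
        for start in range(1, num_pages + 1, chunk_size)
    ]


def batch_chunks(chunks: List[PageRange], batch_size: int = 3) -> List[List[PageRange]]:
    """Group chunks into reading batches of at most batch_size chunks each."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


# Hypothetical 30-page paper: 4-page chunks, read 3 chunks (about 12 pages) at a time.
chunks = chunk_page_ranges(30)
batches = batch_chunks(chunks)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

Under these assumptions, a 30-page paper yields eight chunks (the last one covers only pages 29 to 30) read in three batches, with a pause after each batch.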
Key features: In-place PDF handling (no centralized articles/ folder), persistent _text.md extracts (skip re-reading on future invocations), split reuse, and an agent isolation protocol that prevents context bloat when other skills call /split-pdf.
Usage: Type /split-pdf path/to/paper.pdf or /split-pdf "search query for paper"
Read the full documentation: skills/split-pdf/README.md
Location: skills/beautiful_deck/ | .claude/skills/beautiful_deck/SKILL.md (actual skill)
A Claude Code skill — invoke with /beautiful_deck — that runs the full deck-generation pipeline. This is the operational version of the prompt that used to live at presentations/deck_generation_prompt.md.
What the skill enforces:
- Audience triage before any slide is written: commits to a rhetorical balance (ethos / pathos / logos) that fits the audience (academic seminar, teaching lecture, conference talk, working deck, external non-academic)
- Original theme, never boilerplate: a custom .sty tuned to the audience. May build on metropolis, moloch, focus, etc. as a foundation, but a reader should not be able to tell what theme package is underneath
- Pedagogical movement: Narrative → Application → Picture → Codeblock → Technical. Intuition first, technical last. The anti-pattern is the lecture that opens with definitions and ends with an example "for intuition"
- Format flexibility: Beamer by default. Accepts Quarto, Typst, reveal.js, Marp on explicit user request
- Code-first figure generation: standalone scripts run before \includegraphics{} is written
- Zero-warning compile loop: Overfull / Underfull / font / reference warnings must all be at zero at every checkpoint, not just the final compile
- /tikz cleanup: invoked automatically to catch label collisions and coordinate drift
- Rhetoric audit (sub-agent): checks titles-as-assertions, one-idea-per-slide, MB/MC balance, narrative arc, Devil's Advocate presence
- Graphics audit (sub-agent): checks numerical accuracy, label positioning, axis coherence, color consistency, font sizing
Usage: /beautiful_deck [optional content path or description]
Location: .claude/commands/compiledeck.md
A Claude Code command — invoke with /compiledeck — that embeds the full Rhetoric of Decks philosophy so you don't have to explain it each time.
The command asks two questions:
- Who is the audience?
  - External (seminar, conference, teaching): sparse, performative, one idea per slide
  - Working (coauthors, yourself): can be more detailed, documents reasoning
- What's the tone?
  - Professional/Academic: your consistent "house style" for outward-facing work
  - Colorful/Expressive: unique, creative design each time
Why separate these? External presentations need polish and restraint. Working decks can be messier—they're thinking tools. Some people want the same style for both; others want creative freedom internally while maintaining a professional brand externally.
House style: Define your preferred "Professional/Academic" palette in your CLAUDE.md. The command checks for it. If none is defined, it uses a sensible default.
What's embedded:
- The Three Laws (Beauty is Function, Cognitive Load is Enemy, Slide Serves Spoken Word)
- Titles as assertions, not labels
- MB/MC equivalence across slides
- The compile loop (compile → fix errors → fix warnings → visual check → repeat)
- TikZ coordinate checking and figure label verification
Usage: Type /compiledeck when creating or editing a Beamer deck.
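The "fix warnings" step of the compile loop comes down to counting warnings in the LaTeX log and refusing to move on until the count is zero. The sketch below is one way to do that check, not the command's actual implementation: `count_warnings`, `is_clean`, and the sample log lines are hypothetical, and the patterns only cover the common message formats that LaTeX writes to its .log file.

```python
import re
from collections import Counter

# Patterns for the warning families named in the compile loop.
# These match the usual wording of LaTeX log messages; unusual packages may differ.
WARNING_PATTERNS = {
    "overfull": re.compile(r"^Overfull \\[hv]box", re.MULTILINE),
    "underfull": re.compile(r"^Underfull \\[hv]box", re.MULTILINE),
    "font": re.compile(r"^LaTeX Font Warning:", re.MULTILINE),
    "reference": re.compile(
        r"^LaTeX Warning: (?:Reference|Citation) .* undefined", re.MULTILINE
    ),
}


def count_warnings(log_text: str) -> Counter:
    """Count each warning family in the text of a LaTeX .log file."""
    return Counter(
        {name: len(pattern.findall(log_text)) for name, pattern in WARNING_PATTERNS.items()}
    )


def is_clean(log_text: str) -> bool:
    """True only when every warning family is at zero."""
    return sum(count_warnings(log_text).values()) == 0


# Hypothetical log excerpt, for illustration only.
sample_log = "\n".join([
    r"Overfull \hbox (12.3pt too wide) in paragraph at lines 10--12",
    r"LaTeX Warning: Reference `fig:did' on page 3 undefined on input line 42.",
    "Output written on deck.pdf (12 pages).",
])

print(dict(count_warnings(sample_log)))
print("clean:", is_clean(sample_log))
```

In practice the log text would come from reading the .log file written by the compiler after each pass.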
Location: skills/tikz/ | .claude/skills/tikz/SKILL.md (actual skill)
A Claude Code skill — invoke with /tikz path/to/file.tex — that systematically audits and fixes every visual collision in every TikZ figure in a LaTeX file. Labels sitting on arrows, text inside boxes, arrows crossing each other — found and fixed using measurement, not intuition.
The problem it solves: TikZ compiles silently even when labels overlap arrows or text bleeds into box edges. The compiler catches nothing. This skill catches everything.
How it works: Six ordered passes, each targeting a specific class of collision:
| Pass | What it checks |
|---|---|
| Pass 0 | Cross-slide consistency — same diagram on multiple slides must be identical except for deliberate changes |
| Pass 1 | Bézier curves first — computes max curve depth using (chord/2) × tan(bend/2), checks every label against the danger zone |
| Pass 2 | Gap calculations — estimates label width in cm, compares against usable space between nodes |
| Pass 3 | Arrow label keywords — every label must have above, below, left, or right |
| Pass 4 | Boundary rule — labels within 0.4cm of any circle, rectangle, or filled shape are a collision |
| Pass 5 | Margin check — minimum clearances between all object pairs |
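As a worked example of the Pass 1 formula, the sketch below computes the maximum curve depth from the chord length and bend angle, then checks a label against it. The depth formula is the one quoted in the table. Everything else is an assumption made for illustration: the function names, measuring the label's offset perpendicular to the chord at its midpoint, and borrowing the 0.4cm figure from Pass 4 as the clearance.

```python
import math


def max_curve_depth(chord_cm: float, bend_deg: float) -> float:
    """Pass 1 formula: (chord / 2) * tan(bend / 2), with the bend angle in degrees."""
    return (chord_cm / 2.0) * math.tan(math.radians(bend_deg) / 2.0)


def label_clears_curve(
    label_offset_cm: float,
    chord_cm: float,
    bend_deg: float,
    clearance_cm: float = 0.4,  # assumption: reuse the Pass 4 distance as clearance
) -> bool:
    """Check a label placed at the chord midpoint, on the side the curve bulges toward.

    label_offset_cm is the label's perpendicular distance from the straight chord.
    The label clears the curve when it sits at least clearance_cm from the apex.
    """
    depth = max_curve_depth(chord_cm, bend_deg)
    return abs(label_offset_cm - depth) >= clearance_cm


# Hypothetical arrow: nodes 4 cm apart, drawn with a 30 degree bend.
depth = max_curve_depth(4.0, 30.0)
print(f"max depth: {depth:.3f} cm")
print("label at 0.6 cm clears:", label_clears_curve(0.6, 4.0, 30.0))
print("label at 1.0 cm clears:", label_clears_curve(1.0, 4.0, 30.0))
```

For this arrow the depth is roughly 0.54 cm, so a label 0.6 cm off the chord sits almost on top of the curve, while one at 1.0 cm clears it under the assumed clearance.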
Most common pattern it catches: Step labels on flow diagrams that are wider than the arrow between boxes — they look right in code but overlap box text when rendered.
Full formulas and reference tables: compiledeck/tikz_rules.md
Usage: /tikz path/to/deck.tex
Location: .claude/commands/
| Command | Description |
|---|---|
| /compiletex [file.tex] | Compile any LaTeX file and report errors/warnings. Aims for zero warnings. |
| /newproject [name] | Scaffold a new research project with standard folder structure and CLAUDE.md. Also available as a skill. |
Location: claude/CLAUDE.md
A template for giving Claude persistent memory within a project. Copy it to your project root and fill in the specifics. Claude Code will automatically read it every session.
MixtapeTools/
├── README.md  # You are here
├── workflow.md  # How I use Claude Code for research (START HERE)
├── skills/  # Human-readable guides to Claude Code skills
│   ├── README.md  # What skills are, how to use them, how to install
│   ├── blindspot/  # Blindspot: peripheral vision audit for output
│   │   └── README.md  # Full essay, origin story, six steps
│   ├── split-pdf/  # Documentation and examples for the split-pdf skill
│   │   └── README.md  # Detailed guide with methodology and examples
│   ├── newproject/  # Documentation for the new-project scaffold skill
│   └── tikz/  # Documentation for the TikZ collision audit skill
│       └── README.md  # Philosophy, folder purposes, installation
├── .claude/
│   ├── commands/  # Slash commands (invoke with /command-name)
│   │   ├── compiledeck.md  # /compiledeck: Beamer presentations with Rhetoric of Decks
│   │   ├── compiletex.md  # /compiletex: Compile LaTeX, report errors/warnings
│   │   └── newproject.md  # /newproject: Scaffold new research project
│   └── skills/
│       ├── blindspot/  # Skill: make the stone stony again
│       │   └── SKILL.md  # Instructions Claude follows (invoke with /blindspot)
│       ├── tikz/  # Skill: audit and fix TikZ visual collisions
│       │   └── SKILL.md  # Instructions Claude follows (invoke with /tikz)
│       ├── split-pdf/  # Skill: download, split, and deep-read PDFs
│       │   ├── SKILL.md  # Instructions Claude follows
│       │   └── methodology.md  # Why this method works (for humans)
│       └── newproject/  # Skill: scaffold new research projects
│           └── SKILL.md  # Instructions Claude follows
├── claude/  # Templates for working with Claude
│   ├── CLAUDE.md  # Project context template (copy to your projects)
│   └── README.md
├── personas/  # Systematic audit & replication protocols
│   ├── referee2.md  # The 5-audit protocol for empirical research
│   └── README.md
└── presentations/  # Everything about slide decks
    ├── rhetoric_of_decks.md  # Practical principles (condensed)
    ├── rhetoric_of_decks_full_essay.md  # Full intellectual framework (600+ lines)
    ├── deck_generation_prompt.md  # The prompt + iterative workflow
    ├── README.md
    └── examples/
        ├── workflow_deck/  # Visual presentation of the workflow
        ├── rhetoric_of_decks/  # The philosophy deck (45 slides)
        └── gov2001_probability/  # A lecture deck
During estimation and analysis, focus entirely on whether the specification is correct. Results are meaningless until the "experiment" is designed on purpose. Don't get excited or worried about point estimates until the design is intentional.
AI makes confident mistakes. Cross-software replication (R = Stata = Python) catches bugs that single-language analysis misses. If results aren't identical to 6+ decimal places across implementations, something is wrong.
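A comparison like this is easy to automate once each implementation has exported its estimates. The sketch below is illustrative only: the function name `find_mismatches`, the coefficient names, and all the numbers are hypothetical, and it reads "identical to 6 decimal places" as an absolute difference of at most half a unit in the sixth decimal.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def find_mismatches(
    estimates: Dict[str, Dict[str, float]], decimals: int = 6
) -> List[Tuple[str, str, str]]:
    """Return (coefficient, impl_a, impl_b) for every pair that disagrees.

    Two values agree when they differ by at most 0.5 * 10**-decimals.
    A coefficient missing from one implementation counts as a disagreement.
    """
    tolerance = 0.5 * 10 ** (-decimals)
    names = sorted(set().union(*(coefs.keys() for coefs in estimates.values())))
    mismatches = []
    for name in names:
        for impl_a, impl_b in combinations(estimates, 2):
            a = estimates[impl_a].get(name)
            b = estimates[impl_b].get(name)
            if a is None or b is None or abs(a - b) > tolerance:
                mismatches.append((name, impl_a, impl_b))
    return mismatches


# Made-up estimates: the Python "age" coefficient is off in the fourth decimal.
results = {
    "R": {"treat": 0.1234567, "age": -0.0456789},
    "Stata": {"treat": 0.1234568, "age": -0.0456789},
    "Python": {"treat": 0.1234571, "age": -0.0451789},
}

for name, impl_a, impl_b in find_mismatches(results):
    print(f"{name}: {impl_a} and {impl_b} disagree")
```

Here the "treat" coefficient passes across all three implementations, while "age" is flagged for both pairs involving Python, which points at the Python script as the place to look.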
If you ask the same Claude that wrote code to review it, you're asking a student to grade their own exam. True adversarial review requires a new terminal with fresh context and no prior commitments.
The audit must be independent. Referee 2 creates its own replication scripts but never touches the author's code. Only the author modifies the author's code. This separation ensures the audit is truly external.
Checklists beat intuition. The Referee 2 protocol works because it specifies exactly what to check, requires concrete deliverables (replication scripts, comparison tables, referee reports), and creates a paper trail.
If it's not documented, it didn't happen. Every audit produces a dated referee report filed in correspondence/. Every response is documented. Replication scripts are permanent artifacts. Future you (or your collaborators) can reconstruct exactly what happened.
Start with workflow.md to understand the philosophy.
Copy claude/CLAUDE.md to your project root. Fill in your project specifics.
Work with Claude as a thinking partner, not a code generator. Ask it to explain its understanding. Verify outputs visually. Document as you go.
When you have results worth checking:
- Open a new terminal (fresh context is essential)
- Paste the contents of personas/referee2.md
- Say: "Please audit and replicate the project at [path]. Primary language is [R/Stata/Python]."
- Respond to the referee report (fix or justify each concern)
- Iterate until verdict is Accept
For the Referee 2 workflow to function properly, your research projects should include:
your_project/
├── CLAUDE.md  # Project context for Claude
├── correspondence/
│   └── referee2/
│       ├── 2026-02-01_round1_report.md  # Detailed written report
│       ├── 2026-02-01_round1_deck.pdf  # Visual presentation of findings
│       ├── 2026-02-02_round1_response.md  # Author response
│       └── ...
├── code/
│   ├── R/  # Author's code (ONLY author modifies)
│   ├── stata/
│   ├── python/
│   └── replication/  # Referee 2's replication scripts
├── data/
│   ├── raw/
│   └── clean/
└── output/
    ├── tables/
    └── figures/
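If you want to create this layout by hand rather than through /newproject, a short script is enough. The sketch below is not what /newproject does internally. It mirrors the tree above, and the names `scaffold` and `report_path`, plus the placeholder CLAUDE.md text, are hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path

# Directories taken from the layout above, relative to the project root.
PROJECT_DIRS = [
    "correspondence/referee2",
    "code/R",
    "code/stata",
    "code/python",
    "code/replication",
    "data/raw",
    "data/clean",
    "output/tables",
    "output/figures",
]


def scaffold(root: Path) -> Path:
    """Create the folder layout under root; existing folders are left untouched."""
    for relative in PROJECT_DIRS:
        (root / relative).mkdir(parents=True, exist_ok=True)
    claude_md = root / "CLAUDE.md"
    if not claude_md.exists():
        # Placeholder only; in practice copy claude/CLAUDE.md from this repo.
        claude_md.write_text("# Project context for Claude\n", encoding="utf-8")
    return root


def report_path(root: Path, round_number: int, when: date) -> Path:
    """Build a dated report filename in the style shown in the tree."""
    name = f"{when.isoformat()}_round{round_number}_report.md"
    return root / "correspondence" / "referee2" / name


# Demonstrate in a throwaway directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    root = scaffold(Path(tmp) / "your_project")
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())
    example_report = report_path(root, 1, date(2026, 2, 1)).name

print(created)
print(example_report)
```

Keeping code/replication/ separate from the author's language folders is what lets Referee 2 write its own scripts without touching the author's code.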
Have improvements or additions? PRs welcome. I'm particularly interested in:
- Additional audit protocols (security reviewer, pedagogy reviewer, etc.)
- Examples showing the Referee 2 workflow catching real bugs
- Tools for other aspects of coding and teaching
Inspired by Boris Cherny's ChernyCode template for AI coding best practices.
Use freely. Attribution appreciated but not required.
Last updated: March 2026