CalledThird Research

Open-source baseball analytics research from CalledThird.com.

This repository contains the data pipelines, analysis scripts, and methodology documents behind CalledThird's published research. Each folder corresponds to one flagship article or analysis project.

Projects

📍 ABS Walk Spike — Round 1

Did the new 2026 ABS-defined zone cause the walk-rate spike? How much of it?

Article: ABS Took the High Strike — and That's Roughly 40-50% of the Walk Spike. Pitchers Own the Rest.
Prior position (now updated): The Walk Rate Spike: Umpires or Pitchers?, which concluded "pitchers, not umpires." This Round 1 piece revises that conclusion.
Data: 2025 vs 2026 Statcast pitch-by-pitch (Mar 27 – Apr 22 matched window), plus 2018–2025 April aggregates for the Z-score baseline
Approach: Dual-agent (Claude interpretability vs Codex ML) with adjudication round to resolve a 96-point counterfactual sign disagreement; third independent implementation as triangulation; both agents issued written publish-readiness reviews on the resolved synthesis
Key finding: The 2026 zone shifted vertically: the called-strike rate fell ~7-8pp at the top edge and rose ~5-6pp at the bottom edge. Roughly 40-50% of the +0.82pp walk spike is attributable to the zone change; pitcher behavior accounts for the rest. The walk spike sits 4.4σ above the 2018-2025 April distribution, so it is not seasonality. The 3-2 walk-rate delta is essentially zero (Cochran's Q p=0.67), so there is no count-leverage concentration.
Methodology note: The original Codex normalized-coordinate counterfactual returned −56.17% attribution. Claude's cross-review identified a Statcast schema artifact (in 2026, sz_top/sz_bot switched to deterministic per-batter ABS values), and the absolute-coordinate rerun flipped the estimate to +40.46%. The buggy first orchestrator attempt is included in scripts/ for transparency.
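
The schema artifact in the methodology note can be illustrated with a small sketch. It assumes normalized height is computed as (plate_z - sz_bot) / (sz_top - sz_bot), a common convention that may differ from the formula in this project's scripts. The function name and all numeric values below are hypothetical and chosen only to show the effect: when the recorded sz_top/sz_bot definition changes between seasons, the same physical pitch lands at a different normalized height, while its absolute height is unchanged.

```python
def normalized_height(plate_z: float, sz_bot: float, sz_top: float) -> float:
    """Map an absolute pitch height (feet) to zone units: 0 = bottom edge, 1 = top edge."""
    if sz_top <= sz_bot:
        raise ValueError("sz_top must be greater than sz_bot")
    return (plate_z - sz_bot) / (sz_top - sz_bot)


# Illustrative values only (not taken from the dataset): one physical pitch height,
# evaluated against two different recorded zone definitions for the same batter.
plate_z = 3.40
old_style = normalized_height(plate_z, sz_bot=1.60, sz_top=3.50)
new_style = normalized_height(plate_z, sz_bot=1.55, sz_top=3.30)

print(f"absolute height: {plate_z:.2f} ft under both definitions")
print(f"normalized height, first zone definition:  {old_style:.3f}")
print(f"normalized height, second zone definition: {new_style:.3f}")
```

In this toy case the pitch sits inside the zone under one definition and above it under the other, which is the kind of shift that can distort a cross-season comparison built on normalized coordinates. Comparing in absolute plate_z avoids that dependence.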

🧭 The Coaching Gap

Does pitcher predictability translate into hitter wOBA — and if so, which hitters actually extract the edge?

Article: The Coaching Gap That Lives Where Hitters Don't Chase
Live tracker: Explore → Coaching Gap (chase trajectories, live 2025→2026 scatter)
Data: ~2.9M pitches across 5 seasons (2022–2026); 371-pitcher pre-registered cohort; 659 completed hitter-season transitions
Approach: Six rounds of dual-agent (Claude + Codex) independent analysis with cross-review at every round; 17 hypotheses pre-registered with kill criteria
Key finding: 16 of 17 hypotheses died. The one survivor: low-chase hitters extract ~0.04 more wOBA on predictable pitches than chasers do (pooled/FE converged across both agents; replicates every season 2022–2026 including strict 2026 holdout). Mechanism validated at the per-hitter level — reducing overall chase by 1pp cuts chase on predictable bait by ~1pp too (Spearman +0.53 across 659 transitions).
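
For readers unfamiliar with the Spearman statistic quoted above, here is a minimal stdlib-only sketch of a rank correlation with average ranks for ties. It is not the project's implementation (the analysis scripts may well use scipy), and the function names and toy inputs are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy inputs (hypothetical): year-over-year change in overall chase rate and in
# chase rate on predictable pitches for five hitters, in percentage points.
delta_overall_chase = [-2.0, -0.5, 0.0, 1.0, 3.0]
delta_bait_chase = [-0.5, -2.5, 0.5, 0.0, 2.0]
print(round(spearman(delta_overall_chase, delta_bait_chase), 3))
```

In the project's framing, each of the 659 hitter-season transitions would contribute one pair of deltas.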

🎯 Pitch Tunneling Atlas

League-wide pitch tunneling model measuring deception via trajectory physics.

Article: The Pitch Tunneling Atlas
Physics companion: The Physics Behind the Tunneling Atlas
Data: Full 2025 Statcast (739,820 pitches, 654 pitchers with 200+ pitches)
Approach: Dual-agent independent analysis with cross-review
Key finding: Plate separation adds +9.0% R² to whiff prediction beyond stuff; decision-point tightness adds +0.8% more (p=0.016). Both matter, but plate diversity dominates.

🥊 Bench-Clearing Incidents & Umpire Behavior

Do umpires change the zone after bench-clearing incidents?

Article: After a Fight, the Zone Gets Cleaner
Data: 7 Statcast-era incidents (2019–2026), umpire-specific follow-up games
Key finding: No significant zone-size change (p=0.302), but accuracy improves unanimously (+2.0pp, p=0.001). Umpires get more precise, not more aggressive.

🔥 The Fireman's Dilemma

How much of a reliever's inherited-runner outcome is entry situation vs individual skill?

Article: The Fireman's Dilemma
Data: 4,044 reliever entries, 6,516 inherited runners (2025), MLB play-by-play responsiblePitcher attribution
Key finding: Outs gradient dominates — 44% strand at 0 outs, 61% at 1 out, 82% at 2 outs. League strand rate 68.3%. Cross-season skill persistence is near-zero (r=0.098) but 2026 samples are thin.
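
The outs gradient is a grouped proportion. The sketch below assumes each inherited runner is recorded as a pair (outs when the reliever entered, whether the runner later scored) and counts a runner as stranded if he did not score. The record layout, function name, and toy records are hypothetical, not the project's schema or data.

```python
from collections import defaultdict


def strand_rate_by_outs(runners):
    """Share of inherited runners who did not score, grouped by outs at entry.

    `runners` is an iterable of (outs_at_entry, scored) pairs, one per runner.
    """
    totals = defaultdict(int)
    stranded = defaultdict(int)
    for outs, scored in runners:
        totals[outs] += 1
        if not scored:
            stranded[outs] += 1
    return {outs: stranded[outs] / totals[outs] for outs in sorted(totals)}


# Toy records only; the real analysis covers 6,516 inherited runners.
toy_runners = [
    (0, True), (0, False),
    (1, False), (1, True), (1, False),
    (2, False), (2, False), (2, False), (2, True),
]
for outs, rate in strand_rate_by_outs(toy_runners).items():
    print(f"{outs} outs: {rate:.0%} stranded")
```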

⚾ The Schlittler Three-Fastball Blueprint

How a rookie's three distinct fastballs (sinker / four-seamer / cutter) complement each other.

Article: The Schlittler Three-Fastball Blueprint
Data: Schlittler's 2026 pitches via Statcast + Baseball Savant arsenal context
Key finding: The three fastballs occupy distinct horizontal movement bands, creating a deception grid where hitters can't sit on one shape.

🧠 The Count That Matters Most

Which counts deliver the highest RE288 value (run expectancy across the 288 base-out-count states) per ABS challenge, and how do pitchers vs hitters differ?

Article: The Count That Matters Most
Data: All 2026 ABS challenges with RE288 count-state linear weights
Key finding: Value per challenge swings by 3×+ across counts. Hitters and pitchers have different optimal challenge counts.

🥎 Catchers Are Better Challengers Than Hitters

Why catchers succeed more often on ABS challenges than hitters do.

Article: Catchers Are Better Challengers
Data: 2026 ABS challenges split by challenger role (pitcher / catcher / hitter)
Key finding: Catchers lead in success rate by a significant margin, consistent with framing-era ball/strike perception advantage.

🧮 Team Challenge IQ

Which teams challenge smartly (high success, normalized per game) vs which over-challenge?

Article: Twins vs Reds: A Tale of Two Challenge Strategies
Live tool: Explore → Team Strategy
Data: Every 2026 ABS challenge, normalized per game, joined with outcome
Key finding: Minnesota and Cincinnati anchor opposite ends of the challenge-efficiency spectrum with similar volume.

🎯 Do Pitchers Lose Their Command?

Within-outing plate-location scatter change from the first third to the last third of starts.

Article: Do Pitchers Lose Their Command?
Data: 4,892 true starts in 2025 (30+ pitches), 729,827 pitches total
Key finding: Population mean scatter is flat across pitch counts (r=0.007), but distribution is asymmetric — 14.0% blow up vs 5.2% tighten (2.7:1 ratio).
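
One way to operationalize "scatter change" is sketched below. This README does not state the exact scatter metric or the cutoff for "blow up" versus "tighten", so both are assumptions here: scatter is taken as the RMS distance of pitch locations from their centroid, and the classification threshold is a hypothetical parameter.

```python
from math import hypot, sqrt
from statistics import mean


def scatter(locations):
    """RMS distance of (plate_x, plate_z) locations from their centroid."""
    cx = mean(x for x, _ in locations)
    cz = mean(z for _, z in locations)
    return sqrt(mean(hypot(x - cx, z - cz) ** 2 for x, z in locations))


def scatter_change(locations):
    """Last-third scatter minus first-third scatter for one start, in pitch order."""
    third = len(locations) // 3
    if third < 2:
        raise ValueError("need at least 6 pitches to compare thirds")
    return scatter(locations[-third:]) - scatter(locations[:third])


def classify(change, threshold):
    """Label a start by its scatter change; `threshold` is a hypothetical cutoff."""
    if change > threshold:
        return "blow up"
    if change < -threshold:
        return "tighten"
    return "stable"


# Toy start (hypothetical locations in feet): tight early, spread out late.
toy_start = [
    (0.0, 2.0), (0.0, 2.0),
    (0.5, 2.5), (0.5, 2.5),
    (-1.0, 2.0), (1.0, 2.0),
]
change = scatter_change(toy_start)
print(f"{change:+.2f} ft -> {classify(change, threshold=0.25)}")
```

A flat population mean alongside an asymmetric split between the two tails is possible because the mean averages over many stable starts, while the classification counts only those that cross the cutoff in either direction.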

👨‍⚖️ CB Bucknor By The Numbers

A data profile of one of MLB's most-derided umpires vs the 82-umpire field.

Article: CB Bucknor By The Numbers
Data: 2025 umpire personality dataset (83 qualified umps) + 2026 nightly cache
Key finding: 3rd-worst accuracy (91.02%, p=0.0002 vs league) and #1 worst miss distance (1.34"). The miss magnitude, not the zone shape, is what makes Bucknor an outlier.

How This Works

CalledThird runs two independent AI research agents (Claude + Codex) on the same hypothesis and data for flagship projects. Each produces an analysis script, a report, and charts. Agents cross-review each other's work. A comparison memo synthesizes the results. The final published article uses the stronger methodology on each dimension.

This repo publishes the research scripts and methodology documents — not the Statcast data itself (available from pybaseball or Baseball Savant).

Reproducing a Project

Each project folder includes a README.md describing the question, data, and findings. Most folders also include:

A research brief or proposal (RESEARCH_BRIEF.md, RESEARCH_PROPOSAL.md)
One or more analysis scripts (analyze_*.py, compute_*.py, pull_*.py)
A findings memo (memo.md, findings.md, COMPARISON_MEMO.md)

To reproduce:

1. Install dependencies: pip install pybaseball pandas numpy scipy statsmodels matplotlib
2. Pull Statcast data as specified in the project's brief
3. Run the analysis script; it writes reports and charts to subdirectories
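
As a rough sketch of the data-pull step, the helper below builds the same calendar window for several seasons and hands each window to a fetch function. The helper names are hypothetical and are not scripts from this repo. The default fetcher is pybaseball's statcast function, imported lazily so the date logic can be checked without network access. The example window is the Mar 27 to Apr 22 matched window used by the ABS Walk Spike project; each project's brief remains the authority on what to pull.

```python
from datetime import date


def matched_windows(start_month_day, end_month_day, seasons):
    """Build one (start, end) ISO-date pair per season for the same calendar window."""
    (start_month, start_day), (end_month, end_day) = start_month_day, end_month_day
    return {
        season: (
            date(season, start_month, start_day).isoformat(),
            date(season, end_month, end_day).isoformat(),
        )
        for season in seasons
    }


def pull_windows(windows, fetch=None):
    """Fetch pitch-level data for each season's window.

    `fetch` is any callable taking (start_dt, end_dt). When omitted, this falls
    back to pybaseball.statcast, which downloads from Baseball Savant.
    """
    if fetch is None:
        from pybaseball import statcast  # lazy import: requires network access
        fetch = statcast
    return {season: fetch(start, end) for season, (start, end) in windows.items()}


windows = matched_windows((3, 27), (4, 22), seasons=[2025, 2026])
for season, (start, end) in windows.items():
    print(season, start, end)
```

Calling pull_windows(windows) with no fetch argument would download each window through pybaseball. Passing a stub fetcher keeps the helper testable offline.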

Methodology Notes

Statistical rigor: All claims include p-values, confidence intervals, and sample sizes
Physics transparency: Trajectory models validated against Statcast ground truth; limitations documented
Kill criteria: Every project specifies what result would NOT be publishable before analysis begins
Dual validation: Flagship findings require agreement across two independent agents

License

All research content is licensed under CC BY 4.0. You're welcome to reproduce, extend, or critique these analyses. Please cite:

CalledThird (2026). "[Article Title]." CalledThird.com. https://calledthird.com/analysis/[slug]

Contact

Site: calledthird.com
Twitter/X: @CalledThirdMLB
Bluesky: @calledthird.com
Data inquiries: hello@calledthird.com
