GitHub for the Mathematical Sciences — a collaboration platform that eliminates researcher friction.
Researchers upload notes in LaTeX. ResearchHub automatically translates theorems, lemmas, and definitions into Lean 4, verifies proofs against Mathlib, and organizes everything into version-controlled repositories with inline annotations linking formal code back to its source, allowing research insights to be captured in git history.
Publishing a mathematical result today means writing LaTeX in isolation, emailing PDFs, and hoping reviewers catch errors. Formal verification exists but demands months of Lean expertise most researchers don't have. The gap between "paper accepted" and "proof machine-checked" is measured in years — if it happens at all.
ResearchHub closes that gap to minutes.
There's no more faffing around reading irrelevant arXiv paper, getting aired by collorators, and struggling finding the source material for a research area.
Mathematical progress stalls when collaboration has too much overhead. ResearchHub reduces the cost of contributing a proof, a lemma, or even a single definition to a drag-and-drop upload. Lower friction means more frequent contributions. More contributions means faster iteration on open problems, shared proof libraries that compound over time, and a living record of how results evolve.
The structured commit history of a ResearchHub repository — LaTeX source paired with its Lean 4 formalization, verification logs, inline annotations, and collaborative discussion threads — is exactly the kind of data that doesn't exist at scale today. Each contribution produces:
- Paired natural-formal language data: LaTeX theorems aligned with their Lean 4 translations, ideal for training models that bridge informal and formal mathematics
- Verification signals: Machine-checked proof outcomes (pass/fail, error diagnostics) that teach models what correct formalization looks like
- Research reasoning traces: Chat discussions, annotation context, and iterative refinements captured in commit diffs — the "thinking" behind the math, not just the final result
This is not scraped web data. It is expert-produced, semantically rich with research intuition and insights, and structured by design.
Upload LaTeX — Drop a .tex file into any repository. The parser extracts every theorem, lemma, definition, corollary, and proposition automatically.
Auto-translate to Lean 4 — Each extracted block is translated to Lean 4 with proper Mathlib imports, tactic-style proofs, and namespace isolation. Uses GPT for intelligent translation with fallback to deterministic scaffolding.
Verify with Lean 4 + Mathlib — Generated code is compiled by lake env lean against a full Mathlib installation. Real type-checking, real error diagnostics, real verification status.
Annotated code view — Monaco editor with Lean 4 syntax highlighting. Hover over any theorem to see the original LaTeX source, the contributor who uploaded it, and the file it came from.
Collaborative repos — Repositories with file trees, auto-generated READMEs, contribution tracking, and per-repo chat channels for discussing proofs.
Real-time messaging — Slack-style channels per repo and direct messages between researchers. Reference specific lines of code in discussions.
LaTeX Upload
│
▼
┌─────────────────┐
│ latex-parser │ Extract theorem/lemma/definition/proof environments
└────────┬────────┘
▼
┌─────────────────┐
│ lean-translator │ GPT-powered LaTeX → Lean 4 with Mathlib conventions
└────────┬────────┘
▼
┌─────────────────┐
│ lean-verifier │ lake env lean (Lean 4 + Mathlib type-checking)
└────────┬────────┘
▼
┌─────────────────┐
│ pipeline │ Save files, create annotations, generate README
└────────┬────────┘
▼
Verified Repo
| Layer | Technology |
|---|---|
| Framework | Next.js 14 (App Router) + TypeScript |
| UI | Tailwind CSS + shadcn/ui |
| Code Viewer | Monaco Editor (Lean 4 Monarch tokenizer) |
| Database | Prisma + SQLite |
| Auth | Auth.js v5 (GitHub OAuth) |
| AI Translation | OpenAI API |
| Proof Verification | Lean 4 + Mathlib via Lake |
| File Upload | react-dropzone |
| Chat | Cursor-based polling (2s interval) |
- Node.js 18+
- elan (Lean 4 toolchain manager)
# Install dependencies
npm install
# Set up environment variables
cp .env .env.local
# Fill in AUTH_GITHUB_ID, AUTH_GITHUB_SECRET, OPENAI_API_KEY
# Initialize database
npx prisma db push
npm run db:seed
# Build Mathlib (first time only — downloads prebuilt oleans)
cd lean-verify && lake build && cd ..
# Start development server
npm run devOpen http://localhost:3000.
- Sign in with GitHub
- Browse seeded repositories (Prime Gaps, Spectral Graph Theory)
- Open a repo → click Lean files → hover over theorems to see LaTeX annotations
- Upload a
.texfile → watch it parse, translate, and verify in real time - Open repo chat → discuss proofs with collaborators
src/
├── app/ # Next.js App Router pages + API routes
│ ├── api/
│ │ ├── upload/ # File upload endpoint
│ │ ├── channels/ # Chat channels + messages
│ │ ├── dm/ # Direct messaging
│ │ ├── annotations/ # Code annotations
│ │ └── repos/ # Repository CRUD
│ ├── dashboard/ # Repo listing
│ ├── repo/[slug]/ # Repo view + chat
│ ├── upload/ # Upload UI
│ └── messages/ # DM inbox
├── components/
│ ├── repo/ # CodeViewer, FileTree, RepoHeader
│ └── chat/ # MessageList, ChannelList, MessageInput
└── lib/
├── pipeline/
│ ├── latex-parser.ts # LaTeX environment extraction
│ ├── lean-translator.ts # LaTeX → Lean 4 translation
│ ├── lean-verifier.ts # Real Lean 4 verification
│ ├── text-cleaner.ts # LaTeX → Markdown
│ ├── readme-generator.ts # Auto-generate README
│ └── pipeline.ts # Orchestrator
├── prisma.ts
└── auth.ts
lean-verify/ # Lake project with Mathlib dependency
prisma/
├── schema.prisma # 10 models
└── seed.ts # Demo data
MIT