Skip to content

ryanlin10/ResearchHub

Repository files navigation

ResearchHub

GitHub for the Mathematical Sciences — a collaboration platform that eliminates researcher friction.

Researchers upload notes in LaTeX. ResearchHub automatically translates theorems, lemmas, and definitions into Lean 4, verifies proofs against Mathlib, and organizes everything into version-controlled repositories with inline annotations linking formal code back to its source, allowing research insights to be captured in git history.

Why This Matters

The researcher workflow is broken

Publishing a mathematical result today means writing LaTeX in isolation, emailing PDFs, and hoping reviewers catch errors. Formal verification exists but demands months of Lean expertise most researchers don't have. The gap between "paper accepted" and "proof machine-checked" is measured in years — if it happens at all.

ResearchHub closes that gap to minutes.

There's no more faffing around reading irrelevant arXiv paper, getting aired by collorators, and struggling finding the source material for a research area.

Contribution frequency drives discovery

Mathematical progress stalls when collaboration has too much overhead. ResearchHub reduces the cost of contributing a proof, a lemma, or even a single definition to a drag-and-drop upload. Lower friction means more frequent contributions. More contributions means faster iteration on open problems, shared proof libraries that compound over time, and a living record of how results evolve.

Every commit is high-value AI training data

The structured commit history of a ResearchHub repository — LaTeX source paired with its Lean 4 formalization, verification logs, inline annotations, and collaborative discussion threads — is exactly the kind of data that doesn't exist at scale today. Each contribution produces:

  • Paired natural-formal language data: LaTeX theorems aligned with their Lean 4 translations, ideal for training models that bridge informal and formal mathematics
  • Verification signals: Machine-checked proof outcomes (pass/fail, error diagnostics) that teach models what correct formalization looks like
  • Research reasoning traces: Chat discussions, annotation context, and iterative refinements captured in commit diffs — the "thinking" behind the math, not just the final result

This is not scraped web data. It is expert-produced, semantically rich with research intuition and insights, and structured by design.

What It Does

Upload LaTeX — Drop a .tex file into any repository. The parser extracts every theorem, lemma, definition, corollary, and proposition automatically.

Auto-translate to Lean 4 — Each extracted block is translated to Lean 4 with proper Mathlib imports, tactic-style proofs, and namespace isolation. Uses GPT for intelligent translation with fallback to deterministic scaffolding.

Verify with Lean 4 + Mathlib — Generated code is compiled by lake env lean against a full Mathlib installation. Real type-checking, real error diagnostics, real verification status.

Annotated code view — Monaco editor with Lean 4 syntax highlighting. Hover over any theorem to see the original LaTeX source, the contributor who uploaded it, and the file it came from.

Collaborative repos — Repositories with file trees, auto-generated READMEs, contribution tracking, and per-repo chat channels for discussing proofs.

Real-time messaging — Slack-style channels per repo and direct messages between researchers. Reference specific lines of code in discussions.

Architecture

LaTeX Upload
    │
    ▼
┌─────────────────┐
│  latex-parser    │  Extract theorem/lemma/definition/proof environments
└────────┬────────┘
         ▼
┌─────────────────┐
│ lean-translator  │  GPT-powered LaTeX → Lean 4 with Mathlib conventions
└────────┬────────┘
         ▼
┌─────────────────┐
│  lean-verifier   │  lake env lean (Lean 4 + Mathlib type-checking)
└────────┬────────┘
         ▼
┌─────────────────┐
│    pipeline      │  Save files, create annotations, generate README
└────────┬────────┘
         ▼
   Verified Repo

Tech Stack

Layer Technology
Framework Next.js 14 (App Router) + TypeScript
UI Tailwind CSS + shadcn/ui
Code Viewer Monaco Editor (Lean 4 Monarch tokenizer)
Database Prisma + SQLite
Auth Auth.js v5 (GitHub OAuth)
AI Translation OpenAI API
Proof Verification Lean 4 + Mathlib via Lake
File Upload react-dropzone
Chat Cursor-based polling (2s interval)

Getting Started

Prerequisites

  • Node.js 18+
  • elan (Lean 4 toolchain manager)

Setup

# Install dependencies
npm install

# Set up environment variables
cp .env .env.local
# Fill in AUTH_GITHUB_ID, AUTH_GITHUB_SECRET, OPENAI_API_KEY

# Initialize database
npx prisma db push
npm run db:seed

# Build Mathlib (first time only — downloads prebuilt oleans)
cd lean-verify && lake build && cd ..

# Start development server
npm run dev

Open http://localhost:3000.

Demo Flow

  1. Sign in with GitHub
  2. Browse seeded repositories (Prime Gaps, Spectral Graph Theory)
  3. Open a repo → click Lean files → hover over theorems to see LaTeX annotations
  4. Upload a .tex file → watch it parse, translate, and verify in real time
  5. Open repo chat → discuss proofs with collaborators

Project Structure

src/
├── app/                    # Next.js App Router pages + API routes
│   ├── api/
│   │   ├── upload/         # File upload endpoint
│   │   ├── channels/       # Chat channels + messages
│   │   ├── dm/             # Direct messaging
│   │   ├── annotations/    # Code annotations
│   │   └── repos/          # Repository CRUD
│   ├── dashboard/          # Repo listing
│   ├── repo/[slug]/        # Repo view + chat
│   ├── upload/             # Upload UI
│   └── messages/           # DM inbox
├── components/
│   ├── repo/               # CodeViewer, FileTree, RepoHeader
│   └── chat/               # MessageList, ChannelList, MessageInput
└── lib/
    ├── pipeline/
    │   ├── latex-parser.ts      # LaTeX environment extraction
    │   ├── lean-translator.ts   # LaTeX → Lean 4 translation
    │   ├── lean-verifier.ts     # Real Lean 4 verification
    │   ├── text-cleaner.ts      # LaTeX → Markdown
    │   ├── readme-generator.ts  # Auto-generate README
    │   └── pipeline.ts          # Orchestrator
    ├── prisma.ts
    └── auth.ts

lean-verify/                # Lake project with Mathlib dependency
prisma/
├── schema.prisma           # 10 models
└── seed.ts                 # Demo data

License

MIT

About

GitHub for Mathematical Sciences

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors