A fault-tolerant, collaborative whiteboard backed by a 3-replica RAFT consensus cluster — built with TypeScript (Gateway + Frontend) and Go (Replicas).
- Overview
- Architecture
- Tech Stack
- Project Structure
- RAFT Protocol Specification
- API Reference
- Getting Started
- Development Workflow
- Testing & Fault Injection
- Docker & Deployment
- Environment Variables
- Week-by-Week Milestones
- Submission Checklist
- Bonus Challenges
- Team
This project simulates a distributed system by implementing a collaborative drawing board where:
- Multiple browser clients draw simultaneously on a shared canvas
- A Gateway (TypeScript/Node.js) manages all WebSocket connections
- Three Replica nodes (Go) maintain a shared stroke log via a Mini-RAFT consensus protocol
- The system survives leader crashes, hot-reloads, and rolling replica restarts with zero downtime
This mirrors the architecture used in real-world systems:
| This Project | Real-World Equivalent |
|---|---|
| Replica cluster | etcd inside Kubernetes |
| Leader election | Kubernetes controller-manager |
| Gateway re-routing | AWS ALB with health-check failover |
| Hot-reload rolling restart | Blue-green / rolling deployment |
| Stroke log | Distributed append-only event log |
```text
Browser 1        Browser 2        Browser 3
    │                │                │
    └────────────────┴────────────────┘
                     │  WebSocket
                     ▼
           ┌──────────────────┐
           │     Gateway      │
           │   (Port 8080)    │
           └─────────┬────────┘
                     │  HTTP RPC
          ┌──────────┼─────────────┐
          ▼          ▼             ▼
    ┌──────────┐ ┌──────────┐ ┌──────────┐
    │Replica 1 │ │Replica 2 │ │Replica 3 │
    │  LEADER  │ │ Follower │ │ Follower │
    │  :9001   │ │  :9002   │ │  :9003   │
    └──────────┘ └──────────┘ └──────────┘
        Go · Fiber · air (hot-reload)
    ◄──────────────────────────────►
         Shared Docker network
```
Flow for a single stroke:
```text
Client draws → Gateway → Leader.AppendEntries
                              │
                     ┌────────┴────────┐
                     ▼                 ▼
              Replica 2 ACK     Replica 3 ACK
                     │
           Majority (2/3) reached
                     │
Leader commits → Gateway → Broadcast → All clients
```
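The commit decision in this flow is plain quorum counting: the leader's own log write plus follower ACKs must reach a majority. A minimal sketch (the function name is illustrative, not the project's actual code):

```go
package main

import "fmt"

// commitReached reports whether a stroke can be committed: the leader's
// own log write counts as one vote, and follower ACKs add the rest.
// The quorum for n nodes is n/2 + 1 (2 of 3 here).
func commitReached(followerAcks, clusterSize int) bool {
	return 1+followerAcks >= clusterSize/2+1
}

func main() {
	fmt.Println(commitReached(1, 3)) // true:  leader + 1 ACK = 2 of 3
	fmt.Println(commitReached(0, 3)) // false: leader alone is not a majority
}
```

Note that the leader does not wait for both followers: one ACK already makes two out of three, which is why the system stays live when any single replica is down.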
| Layer | Language | Framework / Libraries |
|---|---|---|
| Frontend | TypeScript | Vite · HTML5 Canvas API · native WebSocket |
| Gateway | TypeScript | Node.js · Express · ws (WebSocket) · axios |
| Replicas | Go | Fiber v2 · air (hot-reload) · standard library |
| Containerization | — | Docker · docker-compose v3.8 |
| Testing | — | k6 (load) · curl / shell scripts (fault injection) |
```text
.
├── docker-compose.yml
├── README.md
│
├── frontend/                  # TypeScript · Vite
│   ├── src/
│   │   ├── canvas.ts          # Drawing logic, stroke serialization
│   │   ├── socket.ts          # WebSocket client, reconnect logic
│   │   └── main.ts            # Entry point
│   ├── index.html
│   ├── package.json
│   └── tsconfig.json
│
├── gateway/                   # TypeScript · Node.js
│   ├── src/
│   │   ├── server.ts          # Express + ws WebSocket server
│   │   ├── leaderRouter.ts    # Tracks current leader, re-routes on failover
│   │   └── broadcaster.ts     # Pushes committed strokes to all clients
│   ├── package.json
│   └── tsconfig.json
│
├── replica1/                  # Go · Fiber
│   ├── main.go
│   ├── raft/
│   │   ├── node.go            # State machine: Follower / Candidate / Leader
│   │   ├── election.go        # RequestVote logic, election timeout
│   │   ├── log.go             # Append-only stroke log, commit index
│   │   └── replication.go     # AppendEntries, heartbeat, sync-log
│   ├── handlers/
│   │   └── rpc.go             # HTTP handlers for all RPC endpoints
│   ├── go.mod
│   └── .air.toml              # air hot-reload config
│
├── replica2/                  # Same structure as replica1
│   └── ...
│
├── replica3/                  # Same structure as replica1
│   └── ...
│
└── logs/                      # Captured failover event logs (for submission)
    ├── election-demo.log
    ├── failover-demo.log
    └── sync-log-demo.log
```
| State | Behavior |
|---|---|
| Follower | Waits for heartbeats. Votes for candidates. Appends log entries from leader. |
| Candidate | Increments term, votes for self, sends RequestVote to all peers. |
| Leader | Sends heartbeats every 150ms. Replicates log entries. Commits on majority ACK. |
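The table above can be sketched as a small state type with the two transitions that matter most: timing out into Candidate, and reverting to Follower on seeing a higher term. Names here are illustrative, not the project's actual identifiers:

```go
package main

import "fmt"

// State is the RAFT role a node is currently in.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

func (s State) String() string {
	return [...]string{"FOLLOWER", "CANDIDATE", "LEADER"}[s]
}

// Node holds the minimal per-node election state.
type Node struct {
	state       State
	currentTerm int
}

// onElectionTimeout fires when a follower hears no heartbeat:
// it increments its term and becomes a candidate.
func (n *Node) onElectionTimeout() {
	n.currentTerm++
	n.state = Candidate
}

// onHigherTerm fires when any RPC carries a newer term:
// the node immediately reverts to follower.
func (n *Node) onHigherTerm(term int) {
	if term > n.currentTerm {
		n.currentTerm = term
		n.state = Follower
	}
}

func main() {
	n := &Node{state: Follower, currentTerm: 2}
	n.onElectionTimeout()
	fmt.Println(n.state, n.currentTerm) // CANDIDATE 3
	n.onHigherTerm(4)
	fmt.Println(n.state, n.currentTerm) // FOLLOWER 4
}
```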
| Where | Why majority is needed |
|---|---|
| Election | A candidate needs votes from at least 2 of the 3 nodes to become leader. Two disjoint majorities cannot exist, so at most one leader can be elected per term. |
| Log commit | A leader needs ACKs from at least 2 of the 3 nodes (its own write counts as one) before marking a stroke as committed — so a committed stroke survives even if the leader crashes right after writing it. |
| Parameter | Value |
|---|---|
| Heartbeat interval | 150 ms |
| Election timeout | Random 500–800 ms (per node, re-randomized each election) |
| Majority quorum | ≥ 2 out of 3 nodes |
- Committed entries are never overwritten.
- A node seeing a higher term immediately reverts to Follower.
- Split votes trigger a new election after another random timeout.
- A restarted node always starts as Follower with an empty log and catches up via `/sync-log`.
```text
Restarted node (empty log, term=0)
        │
        │  receives AppendEntries from leader
        │  prevLogIndex check FAILS
        ▼
Node responds with { success: false, logLength: 0 }
        │
        ▼
Leader calls POST /sync-log on the follower,
sending all committed entries from index 0 onward
        │
        ▼
Follower appends all entries, updates commitIndex
        │
        ▼
Follower participates normally in future AppendEntries
```
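The prevLogIndex check that fails in this flow is a short comparison: the follower must already hold an entry at that index with the matching term. A sketch with illustrative names (not the project's actual code):

```go
package main

import "fmt"

// Entry is one replicated stroke record.
type Entry struct {
	Index int
	Term  int
}

// consistencyCheck mirrors the AppendEntries precondition: the follower
// must already hold an entry at prevLogIndex whose term is prevLogTerm.
// A freshly restarted node with an empty log fails this, which is what
// triggers the /sync-log catch-up above.
func consistencyCheck(entries []Entry, prevLogIndex, prevLogTerm int) bool {
	if prevLogIndex < 0 {
		return true // leader is sending from the very start of the log
	}
	if prevLogIndex >= len(entries) {
		return false // follower's log is too short
	}
	return entries[prevLogIndex].Term == prevLogTerm
}

func main() {
	var empty []Entry
	// Leader sends prevLogIndex=11 to a restarted node: check fails,
	// the node replies { success: false, logLength: 0 }.
	fmt.Println(consistencyCheck(empty, 11, 2)) // false
}
```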
All replica RPC endpoints are HTTP/JSON. The Gateway calls these internally.
Called by a Candidate during election.
Request:
```json
{
  "term": 3,
  "candidateId": "replica2",
  "lastLogIndex": 12,
  "lastLogTerm": 2
}
```

Response:
```json
{
  "term": 3,
  "voteGranted": true
}
```

Called by the Leader to replicate a log entry (or as a heartbeat when `entries` is empty).
Request:
```json
{
  "term": 3,
  "leaderId": "replica1",
  "prevLogIndex": 11,
  "prevLogTerm": 2,
  "entries": [
    {
      "index": 12,
      "term": 3,
      "stroke": {
        "x0": 100, "y0": 200,
        "x1": 150, "y1": 250,
        "color": "#e63946",
        "width": 3
      }
    }
  ],
  "leaderCommit": 11
}
```

Response:
```json
{
  "term": 3,
  "success": true,
  "logLength": 12
}
```

Lightweight keep-alive from Leader to Followers (no log entries).
Request:
```json
{
  "term": 3,
  "leaderId": "replica1"
}
```

Response:
```json
{
  "term": 3,
  "success": true
}
```

Called by the Leader on a rejoining Follower to send all missing committed entries.
Request:
```json
{
  "fromIndex": 0,
  "entries": [ "...all committed log entries from index 0 onward..." ]
}
```

Response:
```json
{
  "success": true,
  "syncedUpTo": 47
}
```

Returns current node state. Used by the Gateway to discover the active leader.
Response:
```json
{
  "replicaId": "replica1",
  "state": "leader",
  "term": 3,
  "commitIndex": 47,
  "logLength": 48
}
```

- Docker and docker-compose
- Node.js ≥ 18 (for local Gateway/Frontend development)
- Go ≥ 1.21 (for local Replica development)
```shell
# Clone the repo
git clone https://github.com/jeevan4476/MiniRaft
cd MiniRaft

# Start everything
docker-compose up --build

# Open the drawing board
open http://localhost:3000
```

```shell
# Frontend (Vite dev server)
cd frontend && npm install && npm run dev

# Gateway
cd gateway && npm install && npm run dev

# Replica 1
cd replica1 && go mod download && air
```

Each replica uses `air` for live-reload. Edit any `.go` file inside `replica1/`, `replica2/`, or `replica3/` and the container automatically:
- Detects the file change (via bind mount)
- Gracefully shuts down the old instance
- Recompiles and restarts
- New instance joins the cluster as a Follower
- RAFT election runs — system stays live
```shell
# Watch a replica's logs while hot-reloading
docker-compose logs -f replica1
```

The Gateway uses `ts-node` + `nodemon`. Same bind-mount behavior.
All replicas log key events to stdout in structured format:
```text
[replica1] term=3 state=LEADER event=heartbeat_sent peers=2
[replica1] term=3 state=LEADER event=entry_committed index=47
[replica2] term=3 state=FOLLOWER event=vote_granted for=replica1
[replica3] term=4 state=CANDIDATE event=election_started timeout=612ms
[replica3] term=4 state=LEADER event=election_won votes=2
```
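Lines in this format are easy to keep consistent with one tiny formatter; a sketch (the helper name is an assumption, not the project's actual code):

```go
package main

import "fmt"

// formatEvent renders one structured log line in the format shown above.
// Extra fields are passed as ready-made "key=value" strings.
func formatEvent(id string, term int, state, event string, fields ...string) string {
	line := fmt.Sprintf("[%s] term=%d state=%s event=%s", id, term, state, event)
	for _, f := range fields {
		line += " " + f
	}
	return line
}

func main() {
	fmt.Println(formatEvent("replica1", 3, "LEADER", "heartbeat_sent", "peers=2"))
	// [replica1] term=3 state=LEADER event=heartbeat_sent peers=2
}
```

Keeping the key=value shape machine-parseable also makes the captured `/logs` files easy to grep during the demo.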
```shell
docker-compose stop replica1

# Watch replica2 or replica3 win the election
docker-compose logs -f replica2 replica3

# Bring it back — it should catch up automatically
docker-compose start replica1
```

```shell
# Touch any Go file inside replica2 to trigger air reload
touch replica2/raft/node.go
docker-compose logs -f replica2
```

```shell
# Open 5 browser tabs at http://localhost:3000 and draw simultaneously
# Or use k6:
k6 run tests/load-test.js
```

- Open two browser tabs at http://localhost:3000
- Draw several strokes in tab 1
- Run `docker-compose stop replica1` to kill the leader
- Immediately draw more strokes in tab 2
- Confirm both tabs show identical canvas state after failover completes (~1–2 seconds)
```yaml
version: "3.8"

services:
  frontend:
    build: ./frontend
    ports: ["3000:3000"]
    volumes: ["./frontend/src:/app/src"]

  gateway:
    build: ./gateway
    ports: ["8080:8080"]
    volumes: ["./gateway/src:/app/src"]
    environment:
      - REPLICA_URLS=http://replica1:9001,http://replica2:9002,http://replica3:9003
    depends_on: [replica1, replica2, replica3]

  replica1:
    build: ./replica1
    ports: ["9001:9001"]
    volumes: ["./replica1:/app"]
    environment:
      - REPLICA_ID=replica1
      - PORT=9001
      - PEERS=http://replica2:9002,http://replica3:9003

  replica2:
    build: ./replica2
    ports: ["9002:9002"]
    volumes: ["./replica2:/app"]
    environment:
      - REPLICA_ID=replica2
      - PORT=9002
      - PEERS=http://replica1:9001,http://replica3:9003

  replica3:
    build: ./replica3
    ports: ["9003:9003"]
    volumes: ["./replica3:/app"]
    environment:
      - REPLICA_ID=replica3
      - PORT=9003
      - PEERS=http://replica1:9001,http://replica2:9002

networks:
  default:
    name: miniraft-network
```

| Variable | Service | Description |
|---|---|---|
| `REPLICA_ID` | Replica | Unique node identifier (`replica1`, `replica2`, `replica3`) |
| `PORT` | Replica | HTTP port for RPC endpoints |
| `PEERS` | Replica | Comma-separated URLs of all other replicas |
| `REPLICA_URLS` | Gateway | Comma-separated URLs of all replicas (for leader discovery) |
| `GATEWAY_PORT` | Gateway | WebSocket + HTTP port (default: 8080) |
| `ELECTION_TIMEOUT_MIN` | Replica | Minimum election timeout in ms (default: 500) |
| `ELECTION_TIMEOUT_MAX` | Replica | Maximum election timeout in ms (default: 800) |
| `HEARTBEAT_INTERVAL` | Replica | Leader heartbeat interval in ms (default: 150) |
- System diagram
- RAFT state transition diagram
- API spec for all 4 RPCs (VoteRequest, AppendEntries, Heartbeat, SyncLog)
- `docker-compose.yml` draft
- Failure scenario list with expected system behavior per scenario
- Go: RAFT state machine (Follower / Candidate / Leader)
- Go: `RequestVote` + `AppendEntries` + `Heartbeat` handlers
- TypeScript: Gateway WebSocket server with leader routing
- TypeScript: Frontend canvas with stroke serialization over WebSocket
- End-to-end: single client draws, stroke appears on all connected tabs
- Go: Catch-up sync (`/sync-log`) for restarted replicas
- TypeScript: Gateway graceful failover (no client disconnects on leader change)
- Docker: Hot-reload triggers RAFT election, system stays live throughout
- Demo: kill leader mid-draw, canvas remains consistent across all clients
- Logs: captured election, failover, and sync-log events saved in `/logs`
- Source code in `/gateway`, `/replica1`, `/replica2`, `/replica3`, `/frontend`
- `docker-compose.yml` — full cluster starts with `docker-compose up --build`
- `/logs` — at least 3 captured failover event logs
- Architecture document (2–3 pages): cluster diagram, state transitions, API definition, failure-handling design
- Demo video (8–10 min):
- Multiple clients drawing simultaneously
- Leader killed → automatic failover shown
- Hot-reload of a replica → system stays live
- Canvas state consistent after restarts
- System under chaotic conditions (multiple rapid failures)
- Simulate a network partition (split-brain scenario with iptables rules)
- Add a 4th replica dynamically without downtime
- Implement undo/redo using log compensation entries
- Build a live dashboard showing leader, term, and log size per replica
- Deploy to AWS EC2 or Google Cloud VM
- The RAFT Paper — Ongaro & Ousterhout (2014)
- RAFT Visualization
- Go Fiber Docs
- air — Go live-reload
- ws — Node.js WebSocket library
- Vite — Frontend tooling
| Member | Primary Responsibility |
|---|---|
| TBD | Go Replicas — RAFT state machine & RPC handlers |
| TBD | TypeScript Gateway — WebSocket server & leader routing |
| TBD | TypeScript Frontend — Canvas, stroke serialization |
| TBD | DevOps — Docker, docker-compose, hot-reload, integration |
