Context
Part of load balancer architecture work (see PR #228 for full analysis).
Risk: High | Reward: Very High | Code Changes: Extensive
This is the core architectural change that enables horizontal scaling.
Architecture Change
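Before: uses config service for guild-to-pod assignments. After: deterministic sharding, where each pod derives ownership from hash(guildId) % NUM_GATEWAY_PODS.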
Gateway StatefulSet (3-10 pods)
├── gateway-0 → handles guilds where hash(guildId) % 3 == 0
├── gateway-1 → handles guilds where hash(guildId) % 3 == 1
└── gateway-2 → handles guilds where hash(guildId) % 3 == 2
↓
Each pod: SQLite → Litestream → DO Spaces
Implementation
Shard Calculation
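// app/helpers/guildSharding.ts
import crypto from 'crypto';
// Ordinal of this pod within the StatefulSet (gateway-0 is 0, gateway-1 is 1, ...).
const POD_ORDINAL = parseInt(process.env.POD_ORDINAL || '0', 10);
// Total number of gateway pods; must be the same value on every pod.
const NUM_GATEWAY_PODS = parseInt(process.env.NUM_GATEWAY_PODS || '3', 10);
// Maps a guild ID to a shard index in [0, NUM_GATEWAY_PODS).
export function getShardForGuild(guildId: string): number {
  const hash = crypto.createHash('md5').update(guildId).digest();
  const num = hash.readUInt32BE(0); // first 4 bytes of the digest as an unsigned 32-bit integer
  return num % NUM_GATEWAY_PODS;
}
// True when this pod owns the given guild.
export function isGuildMine(guildId: string): boolean {
  return getShardForGuild(guildId) === POD_ORDINAL;
}
Guild Filtering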
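// app/discord/gateway.ts
import { Events } from 'discord.js';
// guildSharding.ts lives in app/helpers, one directory over from app/discord.
import { isGuildMine } from '../helpers/guildSharding';
client.on(Events.MessageCreate, async (msg) => {
  if (!msg.guildId) return; // Ignore DMs and other non-guild messages
  if (!isGuildMine(msg.guildId)) return; // Filter by shard
  // Handle message
});
Environment Variables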
env:
  - name: POD_ORDINAL
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['apps.kubernetes.io/pod-index']
  - name: NUM_GATEWAY_PODS
    value: "3" # Update when scaling
Scaling Procedure
When changing NUM_GATEWAY_PODS:
- Update the NUM_GATEWAY_PODS env var in all services
- Scale the StatefulSet: kubectl scale statefulset gateway --replicas=N
- Rolling restart HTTP pods to pick up the new env
- Guilds automatically redistribute based on the new modulo
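Note: Because ownership is hash(guildId) % NUM_GATEWAY_PODS, changing the pod count moves most guilds to a different pod, not just a few (going from 3 to 4 pods reassigns roughly three quarters of them). Their SQLite data needs migration or will be rebuilt.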
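To put a number on the note above before running a scale operation, something like the following could be used. It is only a sketch: `shardFor` and `guildsThatMove` are hypothetical names, the guild IDs are generated for illustration, and the hash mirrors `getShardForGuild()` with the pod count passed in as a parameter.

```typescript
import { createHash } from 'node:crypto';

// Same scheme as getShardForGuild() in this issue, but with the pod count
// passed in so two configurations can be compared side by side.
function shardFor(guildId: string, numPods: number): number {
  const hash = createHash('md5').update(guildId).digest();
  return hash.readUInt32BE(0) % numPods;
}

// Returns the guild IDs whose owning pod differs between the two pod counts.
export function guildsThatMove(
  guildIds: readonly string[],
  oldPods: number,
  newPods: number,
): string[] {
  return guildIds.filter(
    (id) => shardFor(id, oldPods) !== shardFor(id, newPods),
  );
}

// Hypothetical guild IDs, generated only for illustration.
const sampleIds = Array.from({ length: 10000 }, (_, i) => `guild-${i}`);
const moved = guildsThatMove(sampleIds, 3, 4);
console.log(`${moved.length} of ${sampleIds.length} guilds change pods going 3 -> 4`);
```

Feeding the real guild ID list into `guildsThatMove` gives the set of guilds whose SQLite data would need to be migrated or rebuilt for a given scale step.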
Tasks
- Implement getShardForGuild() / isGuildMine() functions
- Add POD_ORDINAL and NUM_GATEWAY_PODS env vars
- Filter all Discord event handlers by shard
- Configure Litestream sidecar with per-pod backup paths
- Update cluster/proposed/gateway-service.yaml (remove CONFIG_SERVICE_URL)
- Test multi-pod gateway in staging
- Test scaling up/down procedure
- Test pod failure recovery
- Document scaling runbook
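For the "filter all Discord event handlers by shard" task, one possible approach is a small wrapper so the two guard lines are not repeated in every handler. This is a sketch, not the planned implementation: `onlyMyGuilds` and `GuildScoped` are hypothetical names, and it assumes the event payload exposes a `guildId` field the way the message handler in this issue does. Events with a different payload shape would need their own guard.

```typescript
import { createHash } from 'node:crypto';

// Same env-driven constants and hash as guildSharding.ts in this issue.
const POD_ORDINAL = parseInt(process.env.POD_ORDINAL || '0', 10);
const NUM_GATEWAY_PODS = parseInt(process.env.NUM_GATEWAY_PODS || '3', 10);

function isGuildMine(guildId: string): boolean {
  const hash = createHash('md5').update(guildId).digest();
  return hash.readUInt32BE(0) % NUM_GATEWAY_PODS === POD_ORDINAL;
}

// Hypothetical minimal shape: any event payload that may carry a guild ID.
type GuildScoped = { guildId?: string | null };

// Hypothetical helper: wraps a handler so it only runs for guilds this pod owns.
// `isMine` is injectable so the filter can be tested without setting env vars.
export function onlyMyGuilds<T extends GuildScoped>(
  handler: (event: T) => void | Promise<void>,
  isMine: (guildId: string) => boolean = isGuildMine,
): (event: T) => void | Promise<void> {
  return (event) => {
    if (!event.guildId) return; // DMs and other non-guild events
    if (!isMine(event.guildId)) return; // another pod owns this guild
    return handler(event);
  };
}

// Intended use, mirroring the handler in this issue:
// client.on(Events.MessageCreate, onlyMyGuilds(async (msg) => { /* handle message */ }));
```

Keeping the check in one place also makes it harder to add a new handler later and forget the shard filter.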
Dependencies
- Add Litestream backup sidecar to existing StatefulSet #229 (Litestream backup - validates backup strategy)
- Split monolith into smaller gateway/http services #232 (SERVICE_MODE support)
Migration Strategy
- Deploy new gateway StatefulSet with 3 pods
- Guilds auto-distribute based on hash
- Each pod builds its SQLite database from Discord events (or restores it from backup)
- Switch ingress to new HTTP service
- Decommission old StatefulSet
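Before the first step, it may be worth checking how evenly the current guilds would spread across the 3 pods. A minimal sketch, assuming the guild IDs can be exported as strings; `countGuildsPerPod` is a hypothetical helper and the IDs below are placeholders:

```typescript
import { createHash } from 'node:crypto';

// Same hash as getShardForGuild() in this issue, with the pod count as a parameter.
function shardFor(guildId: string, numPods: number): number {
  const hash = createHash('md5').update(guildId).digest();
  return hash.readUInt32BE(0) % numPods;
}

// Counts how many guilds each pod ordinal would own.
export function countGuildsPerPod(
  guildIds: readonly string[],
  numPods: number,
): number[] {
  const counts = new Array<number>(numPods).fill(0);
  for (const id of guildIds) {
    counts[shardFor(id, numPods)] += 1;
  }
  return counts;
}

// Placeholder IDs; replace with the real guild ID list.
const placeholderIds = Array.from({ length: 3000 }, (_, i) => `guild-${i}`);
countGuildsPerPod(placeholderIds, 3).forEach((n, pod) => {
  console.log(`gateway-${pod}: ${n} guilds`);
});
```

Guild count is only a proxy for load: a few very active guilds landing on the same pod could still skew traffic even when the counts look balanced.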
References
- PR Architecture analysis: Load balancing with SQLite constraint #228 (architecture analysis)
- Closed: Deploy PostgreSQL StatefulSet for config service #231, Build config service for guild-to-pod assignments #233 (PostgreSQL/config service - no longer needed)