We need 2 microservices to handle the scenario in which around 40 million users try to redeem vouchers with the restrictions:
- Voucher redemption cannot exceed the voucher limit (i.e., voucher with 300 limit can only be used by 300 unique users)
- No downtime or disruption is expected.
- All operations must be atomic.
Service will listen through HTTP API for voucher redemption requests. Redeeming is done atomically to avoid inconsistency.
┌───SQL-Database──┐
│ Highly Available│
External Network Internal Network │ Replicated │
│ │ │
│ │ ┌────────┐ │
┌───────┐ │ ┌───Wallet──────────────────────────────┐ │ │ │ │
│Client │ │ │ │ │ ┌─┼──────┐ │ │
│ │ │ │ │ │ │ │ │ │ │
│ │ │ │ ┌────────────┬───Application────┐ │ │ ┌─┼─┼────┐ │ │ │
│ ◄──┼─REST─┼───► REST API │ │ │ ┌────────────► │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ └────┼─┼─┘ │
└───────┘ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ Monotiring ◄───┼──────TCP─────┘ │ │ └──────┼─┘ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ Repository │ │ │ └────────┘ │
│ │ │ │ │ │ ┌─────────► │
│ │ └─────▲──────┴──────────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ └─────────┼─────────────────────────────┘ ┌─TCP─────┘ └─────────────────┘
│ │ │
│ │ │ ┌─REDIS-Sentinel─┐
│ └─────REST─────────┐ │ │ │
│ │ │ │ ┌──────┐ │
│ ┌──Voucher───────────────────┼──────────┐ │ │ │ │ │
┌────────┐ │ │ │ │ │ │ ┌──┼───┐ │ │
│ Client │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ ┌─────────┬─────────────▼─────┐ │ │ │ ┌─┼──┼─┐ │ │ │
│ │ │ │ │ REST API│ Application ◄────┼───────┘ │ │ │ │ │ │ │ │
│ ◄──┼──────┼────► │ │ │ │ │ │ └─┼─┼──┘ │
│ │ │ │ │ │ ◄────┼────TCP──────────► │ │ │ │ │
│ │ │ │ │ │ │ │ │ │ └────┼─┘ │
└────────┘ │ │ │ │ │ │ │ │ │ │
│ │ └─────────┴───────────────────┘ │ │ └──────┘ │
│ │ │ │ │
│ └───────────────────────────────────────┘ └────────────────┘
│
│
│
│
Redis is used as an in memory lock mechanism to handle high load key contentions. Redis setup is considered to have :
- High availability guarantees via Redis-Sentinel
- Using sentinel we can't have load sharding and data partitioning schemes hence, we can't scale redis cluster out easily. One possible solution is to use Redis cluster but for sake of simplicity we consider Sentinel.
- Resident in private network with reserved bandwidth
- Periodic disk persistence
Redis is a key part of the system and any failure in connection or cluster (network failure) will bring the whole system down.
CockraochDB is used as SQL database setup. CockroachDB is a cloud native, distributed DBMS with postgres-like schematics. Cockroach provides High-Availability and tolerance to multi-node failures.
Setting up an Highly-Available cluster of Postgresql is tricky, and it's maintenance would require extra effort. But any HA setup of postgresql is fully supported in application.
Voucher <-> Wallet communication is done via HTTP REST APIs. any failure in the connection will cause the whole system to go down.
Following technologies are used in application:
- HTTP Server: gofiber
- Database ORM: GORM
- Metrics: Prometheus
Both Voucher and Wallet applications are designed and implemented to scale per load. But Redis/SQL connection limits must be considered and tweaked to support autoscaling completely.
A Postman collection of all possible API calls are available in challenge.json.
- In Memory Cache Check
- Cache is updated periodically. Any Update in vouchers will need
CacheDuration
time to take effect across all instances.
- Cache is updated periodically. Any Update in vouchers will need
- Redis Check (Atomic)
- Each voucher is represented with a key in redis with
value=voucher_limit
- Each request is followed by a
Decr
redis command to check voucher availability. - Any failure in Redemption will trigger a revert
Incr
command to the corresponding key.
- Each voucher is represented with a key in redis with
- Apply voucher transaction to wallet service (failure from this stage forward will cause Revert)
- Add voucher redemption to Database.
we could skip using Redis as a temporary storage altogether. But we need another distributed lock mechanism to keep atomicity of record updates. SQL Database Transactions could be possible but this would add to both system complexity and latency.
If we can tolerate approximation in voucher usage times. we can use load balancing with replica syncing in any distributed key-value store.
Intra-service communication could be done using CMQ services (i.e., NATS, Kafka). This approach can help total system availability but can break data consistency in some scenarios.