Distributed System Design Workshop
kimschles edited this page Mar 26, 2019
SREcon 2019
- Requirements and Scaling
  - Identify SLIs and SLOs
    - Example SLO: 99% of queries return a valid result within 100 ms
  - A Service Level Agreement (SLA) is the contract that contains all relevant SLOs and the penalties if they are violated
  - Scaling via microservices
    - Replace a monolith with distinct microservices
    - Putting a load balancer in front of your microservices makes it easier to scale horizontally (by adding more instances)
  - Other types of scaling:
    - Geographical (more physical locations)
    - Functional (different feature sets)
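The example SLO above can be checked against a window of measured queries. This is a minimal sketch, assuming each query is recorded with its latency and whether it returned a valid result; the `Query` record, the function names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    latency_ms: float
    valid: bool  # did the query return a valid result?


def sli_good_fraction(queries, threshold_ms=100.0):
    """SLI: fraction of queries that returned a valid result within threshold_ms."""
    if not queries:
        raise ValueError("need at least one query to compute the SLI")
    good = sum(1 for q in queries if q.valid and q.latency_ms <= threshold_ms)
    return good / len(queries)


def meets_slo(queries, target=0.99, threshold_ms=100.0):
    """SLO: the SLI must be at least `target` (99% by default)."""
    return sli_good_fraction(queries, threshold_ms) >= target


# Hypothetical window of 1,000 queries: 985 fast and valid, 10 slow, 5 invalid.
queries = ([Query(40.0, True)] * 985
           + [Query(250.0, True)] * 10
           + [Query(30.0, False)] * 5)
print(sli_good_fraction(queries))  # 0.985
print(meets_slo(queries))          # False
```

Counting "good" queries as the SLI (rather than tracking latency and validity separately) keeps the SLO a single number that an SLA can then reference.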
- Dealing with Loss and Failure
  - Failure is not an option; it's a certainty
  - Cloud providers offer some ready-made solutions that handle some failures for you
  - Decouple: spread responsibilities across multiple processes
  - Avoid global changes: use a multi-tiered canary
  - Spread risk: don't depend on a single backend
  - Degrade gracefully: keep serving if configs are corrupt or fail to push
  - Achieving reliability: run n + 2 deployments, geographically distributed
    - n = the number of deployments needed to handle standard load
    - Why +2? So that n remain even when two are down at once:
      - One for planned maintenance
      - One for unplanned maintenance (unexpected failures)
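The n + 2 rule is simple arithmetic. A minimal sketch, assuming load spreads evenly across identical deployments; the function names and the peak and per-deployment QPS figures are hypothetical.

```python
import math


def deployments_needed(peak_qps, qps_per_deployment, spare=2):
    """Return (n, n + spare): n deployments serve peak load; the spares cover
    one planned and one unplanned outage happening at the same time."""
    if qps_per_deployment <= 0:
        raise ValueError("qps_per_deployment must be positive")
    n = math.ceil(peak_qps / qps_per_deployment)
    return n, n + spare


def survives(total, down, peak_qps, qps_per_deployment):
    """True if the deployments still up can serve peak load."""
    return (total - down) * qps_per_deployment >= peak_qps


n, total = deployments_needed(peak_qps=50_000, qps_per_deployment=12_000)
print(n, total)  # 5 7
# One site in planned maintenance plus one unexpected failure: 5 remain.
print(survives(total, down=2, peak_qps=50_000, qps_per_deployment=12_000))  # True
# A third simultaneous loss drops below n.
print(survives(total, down=3, peak_qps=50_000, qps_per_deployment=12_000))  # False
```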
- Keeping State and Data
  - Useful for consistency, performance, and reliability
  - Regardless of the amount of data, it all comes down to global consensus
  - Use it to find authoritative instances of other services (leader election)
  - Always prefer stateless; it's easier
  - CAP: consistency, availability, and partition tolerance
    - Networks are not reliable, but partitions are rare
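Consensus systems commonly make progress only when a majority of replicas agree, and this is one concrete place the CAP trade-off shows up: during a partition, the minority side stops accepting writes, giving up availability to stay consistent. A minimal sketch of majority-quorum arithmetic, assuming a simple majority-based protocol (as Raft and Paxos use); the function names are hypothetical.

```python
def quorum_size(replicas):
    """Smallest majority of a replica group."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Replicas that can be lost while a majority still remains."""
    return (replicas - 1) // 2


def can_accept_writes(reachable, replicas):
    """During a partition, only the side holding a majority may accept writes."""
    return reachable >= quorum_size(replicas)


for n in (3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2

# A 5-replica group split 3 | 2 by a network partition:
print(can_accept_writes(3, 5), can_accept_writes(2, 5))  # True False
```

Odd group sizes are the usual choice: going from 3 to 4 replicas raises the quorum from 2 to 3 without tolerating any additional failure.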
- Hot data and hotspotting
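  - Some data is accessed more frequently than other data
  - Frequent access to the same data can overload the servers that hold it
  - Capacity cache vs. performance cache: a capacity cache (one the service relies on to handle its normal load) can be dangerous, because if it is emptied or fails, the backend suddenly receives the full load
    - The cache should not form part of the requests-per-second capacity guarantee for a service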
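The danger of a capacity cache is easiest to see with numbers. A minimal sketch, assuming the cache sits in front of a single backend pool and that a cold cache (after a restart or flush) passes every request through; the QPS figures and function names are hypothetical.

```python
def backend_qps(total_qps, hit_rate_pct):
    """QPS that misses the cache and reaches the backend (integer arithmetic)."""
    return total_qps * (100 - hit_rate_pct) // 100


def is_capacity_cache(total_qps, backend_capacity_qps):
    """The cache is a capacity cache if the backend cannot survive a cold cache."""
    return backend_qps(total_qps, hit_rate_pct=0) > backend_capacity_qps


TOTAL_QPS = 20_000
BACKEND_CAPACITY_QPS = 5_000  # what the backend can serve on its own

print(backend_qps(TOTAL_QPS, 90))  # 2000  -> fine while the cache is warm
print(backend_qps(TOTAL_QPS, 0))   # 20000 -> 4x overload after a flush or restart
print(is_capacity_cache(TOTAL_QPS, BACKEND_CAPACITY_QPS))  # True
```

Sizing the backend against the cold-cache figure is what keeps the cache a performance cache rather than part of the capacity guarantee.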
- Non-abstract design
  - For each microservice, consider:
    - Disk I/O
    - QPS (queries per second)
    - Network bandwidth
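Non-abstract design means putting real numbers on each of these resources. A minimal back-of-the-envelope sketch, assuming load spreads evenly and each machine is limited independently by network bandwidth, disk reads, and query handling; all figures, and the `ServiceLoad`/`MachineSpec` names, are hypothetical.

```python
from dataclasses import dataclass


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding)."""
    return -(-a // b)


@dataclass
class ServiceLoad:
    qps: int                   # peak queries per second
    response_bytes: int        # average response size in bytes
    disk_reads_per_query: int  # random disk reads each query needs


@dataclass
class MachineSpec:
    nic_bps: int    # usable network bandwidth per machine, bits/second
    disk_iops: int  # random reads per second one machine's disks sustain
    max_qps: int    # queries per second one machine can process


def machines_needed(load: ServiceLoad, m: MachineSpec) -> dict:
    """Machines required by each resource; n is the tightest (largest) limit."""
    per_resource = {
        "network": ceil_div(load.qps * load.response_bytes * 8, m.nic_bps),
        "disk": ceil_div(load.qps * load.disk_reads_per_query, m.disk_iops),
        "qps": ceil_div(load.qps, m.max_qps),
    }
    per_resource["n"] = max(per_resource.values())
    return per_resource


load = ServiceLoad(qps=50_000, response_bytes=20_000, disk_reads_per_query=3)
machine = MachineSpec(nic_bps=10_000_000_000, disk_iops=20_000, max_qps=10_000)
print(machines_needed(load, machine))
# {'network': 1, 'disk': 8, 'qps': 5, 'n': 8}
```

Here disk I/O is the binding constraint, so this hypothetical service needs 8 machines for n; following the n + 2 rule, it would be provisioned with 10.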