# System Design Summary

## Step-by-Step Guide

1. Clarify Functional Requirements (users / use cases / platform)
2. Clarify Non-functional Requirements (QPS, latency, availability, consistency, cost)
3. High-level Design (user, browser, load balancer, server, DBs)
4. Core Component Design
   1. Data Storage
       1. __What__ to store (aggregated data / raw data)
       2. __How__ to store (SQL vs. NoSQL)
   2. App Servers / Services (main services)
   3. Communication (RPC / REST / message queues / task queues)
   4. Load Balancing
   5. Caching (client cache, app cache, DB cache, CDN)
5. Scale & Secure (look for gaps, check back with requirements)

### 1. Clarify Functional Requirements

System design questions are usually not solvable within the time limit, so you must clarify which problem you are trying to solve.

- __Users__: Who is going to use the system? (E.g. all users, admins, ML system, monthly marketing reports, etc.)
- __Use cases__: How are the users going to use it? What is the input & output?
- __Platform__: Desktop, server-to-server, or mobile?

Next, take the use-case sentences and turn them into an API / services.

__Example__

_The system has to count video view events._

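```python
from typing import List, Tuple

# Each signature generalizes the previous one.
# Bodies are stubs: at this stage only the API shape matters.

def countViewEvent(videoId: str) -> None: ...

def countEvent(videoId: str, eventType: str) -> None: ...

# Aggregation could be sum, avg, etc. E.g. processEvent("123", "views", "sum")
def processEvent(videoId: str, eventType: str, aggregationMethod: str) -> None: ...

# Batch version; events assumed to be (videoId, eventType, aggregationMethod) tuples
def processEvents(listOfEvents: List[Tuple[str, str, str]]) -> None: ...
```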

Likewise, we could generalize the read side: `getViews`, then `getEventStats(view)`, then `getStats(view, sum)`.

### 2. Clarify Non-functional Requirements

How the system should behave in terms of speed, security, availability, and so on.

Normally the interviewer won't mention specific requirements. Instead they will say __high scale__ and __fast performance__. Because these two are hard to achieve at the same time, we will need to find tradeoffs.

- __Throughput__: How many requests/second (QPS)? What is the ratio of read/write requests? Can there be spikes?
- __Latency__: How fast must requests be served? Can we delay stat counting?
- __Availability__: What is acceptable downtime?
- __Consistency__: Since replicating data is slow, do you want to wait for data to become available, or would you rather show some data ASAP, even if it is out of date? What is the write-to-read data delay? E.g. for bank transactions we need things to be consistent, whereas for Twitter posts it might not be a problem if some friends see a post after a delay.
- __Cost__: What budget constraints do we have? (Can we build our own, do we have maintenance budget? If not we could consider some managed cloud solution.)

### 3. High-level Design

![High-level design](img/sd1.jpg)

Now the interviewer may jump into any of these components. As the interviewee, it is good if you can drive the conversation.

### 4. Core Component Design

#### 4.1 Data Storage

##### 4.1.1 __What__ to store

Will you store aggregate data, or individual (raw) data?

If some delay is acceptable, we could store raw data and process it in the background (batch data processing). If we need real-time results, we need a pipeline that continuously aggregates data (stream data processing). We can also combine both approaches, at the cost of money and complexity.
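A minimal sketch contrasting the two approaches for the video-view example, assuming events arrive as hypothetical `(video_id, event_type)` tuples. The batch counter stores raw events and aggregates them in a later job; the stream counter updates a running total on every event.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str]  # hypothetical event format: (video_id, event_type)


class BatchCounter:
    """Batch: store raw events now, aggregate later in a background job."""

    def __init__(self) -> None:
        self.raw_events: List[Event] = []

    def record(self, event: Event) -> None:
        self.raw_events.append(event)  # cheap write, no aggregation yet

    def run_batch_job(self) -> Counter:
        # Recompute view totals from all stored raw events (e.g. hourly).
        return Counter(vid for vid, kind in self.raw_events if kind == "view")


class StreamCounter:
    """Stream: update a running aggregate as each event arrives."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def record(self, event: Event) -> None:
        vid, kind = event
        if kind == "view":
            self.totals[vid] += 1  # totals are always up to date


events = [("v1", "view"), ("v2", "view"), ("v1", "view"), ("v1", "like")]
batch, stream = BatchCounter(), StreamCounter()
for e in events:
    batch.record(e)
    stream.record(e)

print(batch.run_batch_job())  # Counter({'v1': 2, 'v2': 1})
print(stream.totals)          # Counter({'v1': 2, 'v2': 1})
```

The batch version keeps the raw data, so stats can be recomputed later, but its totals are only as fresh as the last job. The streaming version is always fresh but keeps only the aggregate.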

You may also want to list the nouns in your service to decide which DBs / tables you need. Remember, though, that relational and NoSQL DBs store data differently (tables of related rows with a fixed schema vs. collections of semi-structured items).

##### 4.1.2 __How__ to store (SQL vs. NoSQL)

##### SQL DBs

_Examples: Cloud Spanner, BigQuery_

* A collection of items represented by tables and keys
* Supports strict schemas
* Indexing allows fast lookups
* Supports atomic transactions
* Denormalization can make reads faster, at the cost of slower writes
* Joins across sharded DBs or tables can be slow

__Scaling__

* __Federation__: Scaling by having a DB for each entity. E.g. User DB, Product DB, etc.
* __Sharding__: Distribute data across DBs based on a key. E.g. users A-C on one shard, users D-F on another, etc.

These approaches let us store more data in our DBs, but they add complexity, risk overloading individual shards, and make joins more expensive.
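A minimal sketch of the range-based sharding described above, assuming four hypothetical shards keyed on the first letter of the username. Each shard is just an in-memory dict standing in for a separate DB.

```python
from typing import Dict, List, Tuple

# Hypothetical shard layout: inclusive first-letter ranges, one per shard.
SHARD_RANGES: List[Tuple[str, str]] = [("a", "f"), ("g", "m"), ("n", "s"), ("t", "z")]
shards: List[Dict[str, dict]] = [{} for _ in SHARD_RANGES]


def shard_for(username: str) -> int:
    """Return the index of the shard responsible for this username."""
    first = username[0].lower()
    for i, (lo, hi) in enumerate(SHARD_RANGES):
        if lo <= first <= hi:
            return i
    raise ValueError(f"no shard for {username!r}")


def put_user(username: str, data: dict) -> None:
    shards[shard_for(username)][username] = data


def get_user(username: str) -> dict:
    # A single-user lookup touches exactly one shard...
    return shards[shard_for(username)][username]


def all_users() -> List[str]:
    # ...but a query across all users must fan out to every shard.
    return sorted(name for shard in shards for name in shard)


put_user("alovelace", {"first": "Ada"})
put_user("sride", {"first": "Sally"})
print(shard_for("alovelace"), shard_for("sride"))  # 0 2
```

Range keys are easy to reason about, but usernames are not evenly distributed across letters, so some shards can become hot. Hashing the key spreads load more evenly, at the cost of efficient range queries.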

##### NoSQL DBs

_Example: Bigtable_

* Semi-structured schemas, non-relational
* No need for joins, so easier to horizontally scale -- everything can be a JSON or value object
* Often eventually consistent: data is replicated, so replicas will eventually be up to date. This may be fine for Twitter posts, but not for bank transactions.

__Example__

```
Users
- alovelace
  - first: Ada
  - last: Lovelace
- sride
  - first: Sally
  - last: Ride
  
Rooms
- software
  - messages
    - message_1
      - from: alovelace
      - to: sride
      - message: ...
```

There are 4 main types of NoSQL DBs:

##### Key-value Stores

_Abstraction: Hash table_

Generally allows O(1) reads and writes. Some key-value stores also maintain keys in lexicographic order, allowing efficient retrieval of key ranges.
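A minimal sketch of a key-value store that also keeps its keys in lexicographic order. It uses a Python dict for O(1) average point reads and writes, plus the standard-library `bisect` module over a sorted key list for range scans. This illustrates the idea only: the sorted list makes inserting a new key O(n), and production stores typically use structures such as B-trees or LSM trees instead.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedKVStore:
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}  # hash table: O(1) average get/put
        self._keys: List[str] = []       # keys kept in lexicographic order

    def put(self, key: str, value: str) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)  # O(n) insert keeps the order
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def scan(self, start: str, end: str) -> List[Tuple[str, str]]:
        """Return all (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


store = SortedKVStore()
for k, v in [("user:3", "Cy"), ("user:1", "Ada"), ("video:9", "cats"), ("user:2", "Bo")]:
    store.put(k, v)

print(store.get("user:1"))           # Ada
print(store.scan("user:", "user;"))  # [('user:1', 'Ada'), ('user:2', 'Bo'), ('user:3', 'Cy')]
```

The scan bounds work because `;` sorts immediately after `:`, so every key with the prefix `user:` falls in `["user:", "user;")`.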

##### Document store

_Abstraction: Key-value store with documents as the values._

Stores documents as the values, where documents can be XML, JSON, binary, etc. Document stores provide an API or query language for querying documents based on their internal structure. Examples: MongoDB, CouchDB.
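A minimal sketch of the document-store idea, using plain Python dicts as JSON-like documents keyed by ID. The `find` helper is hypothetical: it filters on a field inside each document. Real document stores such as MongoDB expose this through their own query languages, which are not shown here.

```python
from typing import Any, Dict, List

# Key-value store whose values are JSON-like documents.
documents: Dict[str, Dict[str, Any]] = {
    "alovelace": {"first": "Ada", "last": "Lovelace", "tags": ["math"]},
    "sride": {"first": "Sally", "last": "Ride", "tags": ["space", "physics"]},
}


def find(field: str, value: Any) -> List[str]:
    """Hypothetical query helper: IDs of documents whose `field` equals
    `value`, or whose list-valued `field` contains `value`."""
    matches = []
    for doc_id, doc in documents.items():
        stored = doc.get(field)
        if stored == value or (isinstance(stored, list) and value in stored):
            matches.append(doc_id)
    return matches


print(find("last", "Ride"))  # ['sride']
print(find("tags", "math"))  # ['alovelace']
```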

##### Wide-column store

_Abstraction: nested map `ColumnFamily<RowKey, Columns<ColKey, Value>>`_

The basic unit of data is a column (a name/value pair). Columns are grouped into column families, and super column families group column families. You access each row by its row key, and then individual columns within that row.
Examples: Google Bigtable, Cassandra.

![Wide-column store](img/sd4.jpg)
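A minimal sketch of the nested-map abstraction, modelling one column family as `dict[row_key, dict[column_key, value]]`. The `video_stats` family and its column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional

# ColumnFamily<RowKey, Columns<ColKey, Value>> as nested dicts.
# Rows in the same family can have different sets of columns.
video_stats: Dict[str, Dict[str, int]] = defaultdict(dict)

video_stats["video:123"]["views"] = 1500
video_stats["video:123"]["likes"] = 42
video_stats["video:456"]["views"] = 7  # no "likes" column for this row


def get_cell(row_key: str, col_key: str) -> Optional[int]:
    """Access a row by its key, then a single column within that row."""
    return video_stats.get(row_key, {}).get(col_key)


print(get_cell("video:123", "likes"))  # 42
print(get_cell("video:456", "likes"))  # None
print(dict(video_stats["video:123"]))  # {'views': 1500, 'likes': 42}
```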

##### Graph DB

Each node is a record and each arc represents a relationship between two nodes. Optimized for data with complex (e.g. many-to-many) relationships, but less widely used than the other types.

##### Replication / Failover

* Master-slave
  * The master serves reads and writes, and replicates writes to the slaves
  * If the master goes down, a slave can be promoted to master
  * Slaves can also serve reads, since data is replicated to them

* Master-master
  * Both masters serve reads and writes and coordinate with each other
  * More write capacity, but synchronization can add write latency
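A minimal sketch of master-slave replication with failover, assuming in-process nodes and synchronous replication for simplicity (in practice replication is often asynchronous, which is where replication lag comes from). Writes go to the master, reads are spread across slaves round-robin, and `promote` turns the first slave into the new master.

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class MasterSlaveCluster:
    def __init__(self, master: Node, slaves: List[Node]) -> None:
        self.master = master
        self.slaves = slaves
        self._reads = 0

    def write(self, key: str, value: str) -> None:
        # All writes go to the master, which replicates them to every slave.
        self.master.data[key] = value
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key: str) -> Optional[str]:
        # Spread reads across slaves round-robin.
        slave = self.slaves[self._reads % len(self.slaves)]
        self._reads += 1
        return slave.data.get(key)

    def promote(self) -> None:
        """Master failed: promote the first slave to be the new master."""
        self.master = self.slaves.pop(0)


cluster = MasterSlaveCluster(Node("m"), [Node("s1"), Node("s2")])
cluster.write("user:1", "Ada")
print(cluster.read("user:1"))  # Ada
cluster.promote()
print(cluster.master.name)     # s1
cluster.write("user:2", "Bo")
print(cluster.read("user:2"))  # Bo
```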

#### 4.2 App Servers / Services

Now you have a set of servers serving your requests. You will likely want to replicate them, so that a single server failure doesn't take the service down.

#### 4.3 Communication

Decide how we will send our data:

* REST
* RPC calls

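Think about the tradeoff between human-readable formats (e.g. JSON over REST) and smaller, faster binary formats (e.g. Protocol Buffers over gRPC).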

__Blocking Systems__

In blocking systems, when a request comes in, we make a connection to the server via a socket, and that request is served by a single thread. This is easy to debug, but the thread is blocked while processing, so under load we can run out of threads or connections.

__Non-blocking Systems__

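Non-blocking systems don't block a thread while a request waits on I/O. The thread moves on to other work and resumes the request later, so a few threads can serve many connections. This can be harder to debug, since a single request's processing is split across multiple threads or callbacks.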
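A minimal sketch of the non-blocking style using Python's standard-library `asyncio`. A single thread handles ten simulated requests concurrently, because each handler yields control while it waits on (simulated) I/O instead of blocking the thread. `handle_request` and its delay are hypothetical stand-ins for real I/O.

```python
import asyncio
import time
from typing import List


async def handle_request(request_id: int, io_delay: float) -> str:
    # `await` yields to the event loop while "waiting on I/O",
    # so other requests make progress on the same thread.
    await asyncio.sleep(io_delay)
    return f"response {request_id}"


async def serve_all(n: int, io_delay: float) -> List[str]:
    return await asyncio.gather(*(handle_request(i, io_delay) for i in range(n)))


start = time.perf_counter()
responses = asyncio.run(serve_all(10, 0.1))
elapsed = time.perf_counter() - start
print(responses[0], f"{elapsed:.2f}s")  # roughly 0.1s total, not 10 x 0.1s = 1s
```

Because the work of one request is interleaved with others, stack traces and step-through debugging are less straightforward than in the one-thread-per-request model.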

__Buffering & Batching__

Instead of sending one event at a time, we can save events in a buffer and send them together in a batch. This can save costs and be more efficient, e.g. because events can be compressed together. However, it also adds complexity: for example, some events in a batch may fail while others succeed, and the failures must be retried.
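A minimal sketch of client-side buffering, assuming a hypothetical `send_batch` callable that ships a list of events in one request and returns the events that failed. The buffer flushes when it reaches `max_size`, and failed events are kept for the next flush, which shows where the partial-failure complexity comes from.

```python
from typing import Callable, List

Event = dict
SendBatch = Callable[[List[Event]], List[Event]]  # returns the failed events


class EventBuffer:
    def __init__(self, send_batch: SendBatch, max_size: int = 100) -> None:
        self.send_batch = send_batch
        self.max_size = max_size
        self.pending: List[Event] = []

    def add(self, event: Event) -> None:
        self.pending.append(event)
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        failed = self.send_batch(batch)       # one call for many events
        self.pending = failed + self.pending  # retry failures on next flush


sent: List[List[Event]] = []


def fake_send(batch: List[Event]) -> List[Event]:
    sent.append(batch)
    return [e for e in batch if e.get("bad")]  # pretend "bad" events fail


buf = EventBuffer(fake_send, max_size=3)
for i in range(7):
    buf.add({"id": i, "bad": i == 1})
print(len(sent), [e["id"] for e in buf.pending])  # 3 [1]
```

A real client would usually also flush on a timer and cap the number of retries per event.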

__Message queues__

If the job is too slow to process inline, you can use a message queue as follows:

* An application puts the job in the queue and notifies the user of the job status
* A worker picks up the job, processes it, and signals when it is complete

The user is not blocked and the job is processed in the background. The client might also do some work to make it seem like it is completed. For example, a tweet could be posted to your timeline, even though it hasn't actually been delivered to all of your followers.
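A minimal in-process sketch of this flow, using Python's standard-library `queue.Queue` and a worker thread as stand-ins for a real broker such as RabbitMQ. The integer job IDs and the `status` dict are hypothetical.

```python
import queue
import threading
from typing import Dict

jobs: "queue.Queue[int]" = queue.Queue()
status: Dict[int, str] = {}


def submit(job_id: int) -> str:
    """Application side: enqueue the job and return immediately."""
    status[job_id] = "queued"
    jobs.put(job_id)
    return "queued"  # the user is told the job status right away


def worker() -> None:
    """Worker side: pick up jobs, process them, signal completion."""
    while True:
        job_id = jobs.get()
        status[job_id] = "done"  # stand-in for slow processing
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
print([submit(i) for i in range(3)])  # ['queued', 'queued', 'queued']
jobs.join()                           # wait until every job is processed
print(status)                         # {0: 'done', 1: 'done', 2: 'done'}
```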

__Examples__

* RabbitMQ

Message queues can back up or fill up if workers can't keep pace, and the asynchrony adds extra complexity and processing.

#### 4.4 Load Balancing

__Software load balancing__: Can run on cloud or normal servers.

__Hardware load balancing__: Powerful and optimized for high throughput (millions of requests per second).

Different load balancers can handle different types of traffic (TCP, HTTP, etc).

Different algorithms can be used to determine which server to send the request to:

* Hash-based (IP etc)
* Round robin
* Current load
* etc.
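A minimal sketch of three of the selection strategies listed above. The server names and connection counts are hypothetical. A real load balancer would also skip servers that fail health checks.

```python
import hashlib
import itertools
from typing import Dict, List

servers: List[str] = ["app-1", "app-2", "app-3"]  # hypothetical backends

# Round robin: cycle through servers in order.
_rr = itertools.cycle(servers)


def round_robin() -> str:
    return next(_rr)


# Hash-based: the same client IP always maps to the same server.
# hashlib is used because Python's built-in hash() of str is randomized per process.
def hash_based(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]


# Current load: pick the server with the fewest active connections.
def least_loaded(active_connections: Dict[str, int]) -> str:
    return min(servers, key=lambda s: active_connections[s])


print([round_robin() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
print(hash_based("10.0.0.7") == hash_based("10.0.0.7"))   # True
print(least_loaded({"app-1": 12, "app-2": 3, "app-3": 8}))  # app-2
```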

Load balancers can periodically ping servers (health checks) to see whether they are healthy and how loaded they are.

__Replication__

* __Active-active__ replication: Two load balancers send heartbeats to each other
* If one stops receiving heartbeats, the other is assumed to be down and needs to be replaced

__DNS__

Translates domain names to IP addresses.
We register our partitioning service and associate it with the load balancer. The load balancer knows the IP addresses of the various servers running our service.

`Client (myservice.foo.com) -> DNS -> load balancer -> service servers`

__Sticky Sessions__

Using IP, cookie, etc., always send a user to the same server.

* (+) No need to pass session information between servers
* (+) Generally faster, since all of a user's data is on one server
* (-) Servers can become overloaded, e.g. if many users come from the same or similar IPs
* (-) If the server goes down, you have to shift all of its user session data to another server

__Distributed Sessions__

Set up a shared cache, like Redis, that all servers can access.

* (+) Each request can be load balanced, rather than tying users/sessions to one server
* (+) No problem if a server goes down, since session info is stored in the distributed cache
* (-) Slower, since we have to call over the network to the cache
* (-) More complicated architecture
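A minimal sketch contrasting the two approaches, assuming hypothetical server names and a plain dict standing in for a shared cache such as Redis. With sticky sessions, session data lives only on the server the session hashes to. With distributed sessions, any server can handle any request because they all use the shared store.

```python
import hashlib
from typing import Dict, List

servers: List[str] = ["app-1", "app-2", "app-3"]  # hypothetical backends

# Sticky sessions: each server keeps its own session data in memory.
local_sessions: Dict[str, Dict[str, dict]] = {s: {} for s in servers}


def sticky_server(session_id: str) -> str:
    """Hash the session cookie so the same session always hits the same server."""
    h = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return servers[h % len(servers)]


def sticky_login(session_id: str, user: str) -> None:
    local_sessions[sticky_server(session_id)][session_id] = {"user": user}


# Distributed sessions: one shared store (a dict standing in for Redis).
shared_sessions: Dict[str, dict] = {}


def distributed_login(session_id: str, user: str) -> None:
    shared_sessions[session_id] = {"user": user}  # can run on any server


def distributed_lookup(session_id: str) -> dict:
    return shared_sessions[session_id]  # can run on any server


sticky_login("abc", "ada")
print(local_sessions[sticky_server("abc")]["abc"])  # {'user': 'ada'}
distributed_login("abc", "ada")
print(distributed_lookup("abc"))                    # {'user': 'ada'}
```

If the sticky server goes down, the entries in its `local_sessions` dict are gone, whereas `shared_sessions` survives any single app server failure.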

#### 4.5 Caching

* __Client caching__: Cache on OS or browser
* __CDN caching__
* __Database caching__
* __Application caching__: For example, Memcached or Redis
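A minimal sketch of application caching in the cache-aside style. An in-process LRU cache built on `collections.OrderedDict` stands in for Memcached or Redis, and a hypothetical `slow_db_lookup` function stands in for the database.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


db_reads: Dict[str, int] = {}


def slow_db_lookup(key: str) -> str:  # hypothetical DB call
    db_reads[key] = db_reads.get(key, 0) + 1
    return f"value-for-{key}"


cache = LRUCache(capacity=2)


def get_with_cache(key: str) -> str:
    """Cache-aside: try the cache, fall back to the DB, then fill the cache."""
    value = cache.get(key)
    if value is None:
        value = slow_db_lookup(key)
        cache.put(key, value)
    return value


for key in ["a", "a", "b", "a", "c", "b"]:
    get_with_cache(key)
print(db_reads)  # {'a': 1, 'b': 2, 'c': 1}
```

`b` is read from the DB twice because loading `c` evicted it as the least recently used entry.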

### 5. Scale & Secure

* Check through the above:
  - Is there any single point of failure?
  - Does it match our original throughput / latency requirements?
  - What are the tradeoffs?
  
The interviewer will likely start to question the design. This is where you can discuss the tradeoffs of your choices.

You may want to talk about load testing to verify that the system scales and to identify any bottlenecks.

Stress testing could be used to identify where the system will break first -- for example, memory, DB, network, etc.

We could also add some health monitoring to make sure the system is always healthy.

We should also test that the system gives correct results, for example with an audit system. A weak audit system runs requests and checks that it gets the expected results. A strong audit system calculates the same stats using a completely different system (e.g. MapReduce) and then checks that the results match ours.
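A minimal sketch of the strong-audit idea, assuming the production system exposes its view totals as a dict and the raw events are still available. The audit recomputes the same stats with an independent method (here `itertools.groupby` over sorted events, a stand-in for a separate batch job such as MapReduce) and reports any mismatches. All names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def recompute_views(raw_video_ids: List[str]) -> Dict[str, int]:
    """Independent recomputation from raw events: sort, then count each group."""
    return {vid: len(list(grp)) for vid, grp in groupby(sorted(raw_video_ids))}


def audit(system_totals: Dict[str, int],
          raw_video_ids: List[str]) -> Dict[str, Tuple[int, int]]:
    """Return {video_id: (system_value, audited_value)} for every mismatch."""
    expected = recompute_views(raw_video_ids)
    return {
        vid: (system_totals.get(vid, 0), expected.get(vid, 0))
        for vid in sorted(set(system_totals) | set(expected))
        if system_totals.get(vid, 0) != expected.get(vid, 0)
    }


raw = ["v1", "v2", "v1", "v3", "v1"]
print(audit({"v1": 3, "v2": 1, "v3": 1}, raw))  # {} (totals match)
print(audit({"v1": 2, "v2": 1}, raw))           # {'v1': (2, 3), 'v3': (0, 1)}
```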

## Core Concepts

### Vertical Scaling

* Add more CPU, RAM, etc to a server
* Quickly hits hardware limits, and gets expensive

### Horizontal scaling

* Buy more, cheaper servers
* Route users to a particular server using a load balancer

### Latency vs. Throughput

__Latency__: Time to perform some action

__Throughput__: The number of actions per unit of time

Generally we should aim for _maximal throughput_ with _acceptable latency_.

### CAP Theorem

CAP:

* __Consistency__: Every read receives the most recent write or an error
* __Availability__: Every request receives a response, without the guarantee that it contains the most recent write
* __Partition Tolerance__: The system continues to operate despite arbitrary partitioning due to network failures

Since networks are by nature unreliable, we need to support partition tolerance. This leaves us with two options: consistency vs. availability.

#### CP: Consistency & Partition tolerance

Waiting for a response from a partitioned node might result in a timeout. But this is a good choice if you need atomic reads and writes.

#### AP: Availability & Partition tolerance

Responses return the most readily available version of the data, which might not be the latest. Writes might take some time to propagate. This is a good choice when the system can allow for eventual consistency.
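A minimal sketch of how the two choices differ during a network partition, assuming a hypothetical replica that can be cut off from the primary. In CP mode it refuses to answer rather than risk returning stale data. In AP mode it answers with whatever it has.

```python
from typing import Dict, Optional


class Replica:
    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data: Dict[str, str] = {}
        self.partitioned = False  # True when cut off from the primary

    def replicate(self, key: str, value: str) -> None:
        if not self.partitioned:  # updates can't arrive during a partition
            self.data[key] = value

    def read(self, key: str) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: fail instead of returning stale data.
            raise TimeoutError("cannot confirm latest value during partition")
        return self.data.get(key)  # AP: always answers, possibly stale


cp, ap = Replica("CP"), Replica("AP")
for r in (cp, ap):
    r.replicate("balance", "100")
    r.partitioned = True
    r.replicate("balance", "50")  # this write never reaches the replica

print(ap.read("balance"))  # 100 (stale)
try:
    cp.read("balance")
except TimeoutError as err:
    print("CP:", err)
```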

### Consistency Patterns

#### Weak Consistency

After a write, reads may or may not see it. A best effort approach.

Works well for things like VoIP, video chat, games. For example, if your connection lags during a call, you don't hear what was missed during the connection lag.

#### Eventual Consistency

After a write, reads will eventually see it. Data is replicated asynchronously. Seen in DNS and email. Works well in highly available systems.

#### Strong Consistency

After a write, reads see it. Data is replicated synchronously. Seen in file systems. Works well for systems that need transactions.
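A minimal sketch of the replication styles behind the last two patterns, assuming one primary and one replica held in memory. Synchronous replication updates the replica before the write returns (strong consistency). Asynchronous replication queues the update and applies it later (eventual consistency), so a read in between can miss the write. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ReplicatedStore:
    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self._backlog: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        if self.synchronous:
            self.replica[key] = value           # returns only after replica update
        else:
            self._backlog.append((key, value))  # replicated later

    def read_replica(self, key: str) -> Optional[str]:
        return self.replica.get(key)

    def sync(self) -> None:
        """Apply queued updates (a background process in a real system)."""
        while self._backlog:
            key, value = self._backlog.popleft()
            self.replica[key] = value


strong, eventual = ReplicatedStore(True), ReplicatedStore(False)
for store in (strong, eventual):
    store.write("post", "hello")

print(strong.read_replica("post"))    # hello
print(eventual.read_replica("post"))  # None (not replicated yet)
eventual.sync()
print(eventual.read_replica("post"))  # hello
```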


### Availability Patterns

#### Failover: 1. Active-passive

Heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over and resumes service.
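A minimal sketch of heartbeat-based takeover on the passive server, assuming a hypothetical timeout in seconds. Time is passed in explicitly so the example is deterministic.

```python
class PassiveMonitor:
    """Runs on the passive server; decides when to take over."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_heartbeat = 0.0
        self.active = False  # becomes True once this server takes over

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now  # the active server is alive

    def check(self, now: float) -> bool:
        """Call periodically; returns True if this server is now active."""
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            self.active = True  # heartbeat interrupted: take over service
        return self.active


monitor = PassiveMonitor(timeout_s=3.0)
monitor.on_heartbeat(now=1.0)
print(monitor.check(now=2.0))  # False: heartbeat is recent
monitor.on_heartbeat(now=3.0)
print(monitor.check(now=7.5))  # True: no heartbeat for 4.5 s > 3 s
```

Choosing the timeout is a tradeoff: too short and a brief network hiccup triggers an unnecessary failover; too long and real outages are detected slowly.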

#### Failover: 2. Active-active

Both servers manage traffic.

#### Failover: Disadvantages

* Adds more hardware and additional complexity
* Potential data loss if active system fails before new data can be replicated


Q: How is DNS used in system design? Just for routing external traffic from our load balancer to our servers?

### CDN Recap

A globally distributed network of proxy servers, for serving content closer to the user.

Usually they serve static content such as HTML, CSS, JS, media, etc., though some also serve dynamic content.

Using a CDN means assets can be served by servers closer to the user, and your backend servers don't have to fulfil these requests.

#### Push CDNs

Receive new content whenever changes occur on your server. Your server has responsibility to push the content to the CDN server and rewrite URLs to point to the CDN.

Sites with a small amount of traffic, or with content that changes infrequently, suit push CDNs.

#### Pull CDNs

Grab new content from your server when the first user requests it. A TTL (time to live) defines how long the content is cached before it is refreshed.

This results in a slow initial request, but is good for sites with heavy traffic and frequently changing content.
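A minimal sketch of a pull-CDN edge cache, assuming a hypothetical `fetch_from_origin` function and a TTL in seconds. The first request for a path, or the first after its TTL expires, goes to the origin; requests within the TTL are served from the edge, possibly stale. Time is passed in explicitly so the example is deterministic.

```python
from typing import Callable, Dict, List, Tuple


class PullCDNEdge:
    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_s: float) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl_s = ttl_s
        self._cache: Dict[str, Tuple[str, float]] = {}  # path -> (content, fetched_at)

    def get(self, path: str, now: float) -> str:
        entry = self._cache.get(path)
        if entry is None or now - entry[1] >= self.ttl_s:
            content = self.fetch_from_origin(path)  # slow: goes to origin
            self._cache[path] = (content, now)
            return content
        return entry[0]  # fast: served from edge


origin_hits: List[str] = []


def fetch_from_origin(path: str) -> str:  # hypothetical origin server
    origin_hits.append(path)
    return f"<contents of {path}>"


edge = PullCDNEdge(fetch_from_origin, ttl_s=60)
for t in (0, 10, 59, 61):
    edge.get("/logo.png", now=t)
print(len(origin_hits))  # 2: first request at t=0, refresh at t=61
```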

#### CDN Disadvantages

* Extra cost
* Content could become stale on the CDN server
* Static content URLs must be changed to point to the CDN

### Push vs. Pull

In distributed systems, there are two basic architectures, push and pull (and they can be mixed).

#### Push Architectures

In a "push" architecture, a client requests work by "pushing" it directly to the server.

This is probably the most common pattern. REST, RPC, gRPC, etc. are all examples of "push" architectures.

```

Client -> Request -> Server
  ^                     |
  |    <- Response    <-
```

* Uses persistent connection for request and response (HTTP, RPC)
* Need to know where to send your request
* Load balancing can be an issue

#### Pull Architectures

Often referred to as polling.

The server requests work, usually not directly from the client, but through some intermediary.

Usually clients enqueue messages on some kind of work queue, and the server pulls messages from that queue.

```

Client -> Work -> Message queue <- Work request <- Server
                        |                           ^
                        ----------------------------| Queue delivers work
```

* Can distribute work to those that can process it
* Servers only request work if they have capacity
* Complex routing

## Design a REST API

### 1. Consider the elements we have (nouns)

E.g., users, messages, etc.

### 2. Think of the relationships between them (verbs)

This will give you the verbs you need.

Think in terms of CRUD operations:

* __Create__: POST
* __Read__: GET
* __Update__: PUT
* __Delete__: DELETE

### 3. Create the requests

These should be in the form `https://api-domain/version/collection/resource`

E.g.

* `/users/ GET` -- READS all users
* `/users/ POST`, Body: user data -- CREATES a user
* `/users/{ID} GET` -- READS a single user
* `/users/{ID} PUT`, Body: user data -- UPDATES a user
* `/users/{ID} DELETE` -- DELETES a user

For filtering, add parameters:

* `/users?lastName=Smith GET`

For related resources:

* `/users/{ID}/profile GET`

### 4. Create the responses

For responses, use JSON and HTTP status code:

```json
{
    "data": {
        "id": "123",
        "name": "John"
    }
}
```

| Code  | Use  | Examples  |
|---    |---   |---        |
| 1xx   | Information  | 100 Continue  |
| 2xx   | Success  | 200 OK, 201 Created  |
| 3xx   | Redirect  | 301 Moved Permanently, 307 Temporary Redirect  |
| 4xx   | Client error  | 401 Unauthorized, 404 Not Found  |
| 5xx   | Server error  | 500 Internal Server Error, 503 Service Unavailable  |
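A minimal in-memory sketch of the `/users` endpoints above, written as a plain Python function rather than with a web framework. The `handle` function, its `(method, url, body)` signature and the `users` dict are hypothetical; it returns an HTTP status code and a JSON-style body in the `{"data": ...}` shape shown above. Related resources such as `/users/{ID}/profile` are not implemented.

```python
import itertools
from typing import Any, Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

users: Dict[str, Dict[str, Any]] = {}  # hypothetical in-memory "DB"
_ids = itertools.count(1)


def handle(method: str, url: str, body: Optional[dict] = None) -> Tuple[int, dict]:
    """Hypothetical handler: returns (HTTP status code, JSON-style body)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]  # "/users/1" -> ["users", "1"]
    if not parts or parts[0] != "users" or len(parts) > 2:
        return 404, {"error": "not found"}

    if len(parts) == 1:  # collection: /users
        if method == "GET":  # READ all, with optional filters
            filters = {k: v[0] for k, v in parse_qs(parsed.query).items()}
            matches = [u for u in users.values()
                       if all(u.get(k) == v for k, v in filters.items())]
            return 200, {"data": matches}
        if method == "POST":  # CREATE
            user_id = str(next(_ids))
            users[user_id] = {"id": user_id, **(body or {})}
            return 201, {"data": users[user_id]}
        return 405, {"error": "method not allowed"}

    user_id = parts[1]  # resource: /users/{ID}
    if user_id not in users:
        return 404, {"error": "not found"}
    if method == "GET":  # READ one
        return 200, {"data": users[user_id]}
    if method == "PUT":  # UPDATE (replace)
        users[user_id] = {"id": user_id, **(body or {})}
        return 200, {"data": users[user_id]}
    if method == "DELETE":  # DELETE
        del users[user_id]
        return 204, {}
    return 405, {"error": "method not allowed"}


print(handle("POST", "/users", {"name": "John", "lastName": "Smith"}))
# (201, {'data': {'id': '1', 'name': 'John', 'lastName': 'Smith'}})
print(handle("GET", "/users?lastName=Smith")[0])  # 200
print(handle("DELETE", "/users/1")[0])            # 204
print(handle("GET", "/users/1")[0])               # 404
```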

### Back of Envelope Calculations

[Useful ref](https://itsallbinary.com/system-design-back-of-envelop-calculations-for-storage-size-bandwidth-traffic-etc-estimates/)

* Use round numbers that will make it easy to calculate: 5 million, 3 billion, etc.


#### Sizes

| Type      | Size |
| ----------- | ----------- |
| Char      | 2 bytes       |
| Long/double   | 8 bytes        |
| Avg photo   | 200 KB        |
| Quality photo   | 2 MB       |
| Video in post   | 2 MB        |
| Streaming video   | 50 MB/min        |

__Examples__

* 140 chars * 2 bytes = 280 bytes per post
* 200 chars per URL * 2 bytes = 400 bytes per URL

#### Calculations

| Size (x)  | Users (y)   | Unit (xy)  | Example  |
|---        |---          |---         |---       |
| x Bytes       | y Million          | xy __MB__           | 1 byte * 2 M users = 2 MB          |
| x Bytes       | y Billion          | xy __GB__           | 1 byte * 2 B users = 2 GB          |
| x KB          | y Million          | xy __GB__           | 1 KB * 2 M users = 2 GB            |
| x KB          | y Billion          | xy __TB__           | 1 KB * 2 B users = 2 TB            |
| x MB          | y Million          | xy __TB__           | 1 MB * 2 M users = 2 TB            |
| x MB          | y Billion          | xy __PB__           | 1 MB * 2 B users = 2 PB            |
| x GB          | y Million          | xy __PB__           | 1 GB * 2 M users = 2 PB (2000 TB)  |
| x GB          | y Billion          | xy __EB__           | 1 GB * 2 B users = 2 EB (2000 PB)  |

Hint: To calculate, just keep dividing by 1000.

For example:

1 byte * 2 M users = 2,000,000 bytes / 1000 = 2000 KB / 1000 = 2 MB.
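A small helper that applies the divide-by-1000 hint, using decimal (SI) units as the table above does. Binary units (KiB = 1024 bytes) would divide by 1024 instead. The function name `human_size` is hypothetical.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_size(num_bytes: float) -> str:
    """Keep dividing by 1000 until the number is below 1000."""
    unit = 0
    while num_bytes >= 1000 and unit < len(UNITS) - 1:
        num_bytes /= 1000
        unit += 1
    return f"{num_bytes:g} {UNITS[unit]}"


print(human_size(1 * 2_000_000))          # 2 MB   (1 byte * 2 M users)
print(human_size(1_000 * 2_000_000_000))  # 2 TB   (1 KB * 2 B users)
print(human_size(100 * 3_000_000_000))    # 300 GB (100 bytes * 3 B users)
```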

#### Example

```
User:

  Name: 8 chars * 2 bytes = 16 bytes
  ID: 10 long * 8 = 80 bytes
  = 100 bytes (rounded up)
  
* 3B users = 300 GB

Check:

100 bytes * 3,000,000,000 = 300,000,000,000 bytes / 1000 = 300,000,000 KB / 1000 = 300,000 MB / 1000 = 300 GB

```

(Consider adding speed / latency calculation estimates to this section too if we want to focus more on system design.)