# Mastering Scalability in System Design: Consistency, Write-Heavy Workloads, Partitioning, and Real-World Applications

## Table of Contents
1. [Flowchart for Write Request Process](#flowchart-for-write-request-process)
2. [Consistency Problem](#consistency-problem)
3. [Acknowledgement Messages for Master-Slave Replication](#acknowledgement-messages-for-master-slave-replication)
4. [Replica Distribution in ML Models](#replica-distribution-in-ml-models)
5. [Multi-threaded Writing of Replica](#multi-threaded-writing-of-replica)
6. [Replica Distribution Among Slave Nodes](#replica-distribution-among-slave-nodes)
7. [Handling Write Heavy Cases](#handling-write-heavy-cases)
8. [Sharding and Hash Partitioning](#sharding-and-hash-partitioning)
9. [Handling Hot Spots](#handling-hot-spots)
10. [Benchmarking Server Capacity](#benchmarking-server-capacity)
11. [Like Counting Mechanism](#like-counting-mechanism)
12. [SQS vs Kafka](#sqs-vs-kafka)
13. [YouTube Video Upload Costs](#youtube-video-upload-costs)

---

## Flowchart for Write Request Process

```mermaid
flowchart TD
    A[User] --> B[Write Request]
    B --> C[API Gateway]
    C --> D[API Server]
    D --> E[Master DB]
    E --> F[Load Balancer]
    F --> G(Replica 1)
    F --> H(Replica 2)
    F --> I(Replica 3)
```
### Explanation:

- **User** sends a write request (e.g., editing a Wikipedia article).
- **API Gateway** receives incoming requests and routes them to the appropriate service.
- **API Server** processes the request and writes the data to the **Master Database (DB)**.
- The **Master DB** replicates each change to the **Slave Databases (DBs)**, and the **Load Balancer** spreads read traffic across those slaves.

For example, when you edit a Wikipedia article, the change is stored in the Master DB and then replicated to the slave machines, which serve read operations. This provides data redundancy and availability.
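
The write and read paths described above can be sketched in a few lines. This is a minimal in-memory illustration, not a real database client: `ReplicatedStore` and its methods are hypothetical names, dictionaries stand in for the databases, and replication is done synchronously only to keep the example short.

```python
import itertools


class ReplicatedStore:
    """Toy model: one master for writes, several replicas for reads."""

    def __init__(self, replica_count=3):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Round-robin counter standing in for the load balancer.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Every write goes to the master first...
        self.master[key] = value
        # ...and is then copied to each replica (synchronously, for simplicity).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads never touch the master; they rotate across the replicas.
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)


store = ReplicatedStore()
store.write("article:42", "first draft")
reads = [store.read("article:42") for _ in range(4)]
```

Each read is served by the next replica in turn, so read traffic is spread evenly while the master handles only writes.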

---

## Consistency Problem

### Scenario:

When a user writes an article on Wikipedia and shares its link immediately, the link may not work right away. The write has reached the Master, but the replicas that serve reads have not received it yet. This is a **Consistency Problem**.

**Definition:**
- **Consistency** in distributed systems means that all nodes see the same data at the same time. In this case, the inconsistency arises when the newly created data has not yet propagated to the replicas.

### Key Points:
- There is often a lag between the Master and Slave databases.
- This inconsistency creates a poor user experience when immediate feedback is expected, for example when friends cannot see a Facebook post shortly after it was written.
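
The lag can be reproduced with a small simulation. The sketch below is an assumption-laden toy: `Master`, `Replica`, and `apply_pending` are hypothetical names, and "asynchronous replication" is modeled as a pending list that the replica applies later.

```python
class Replica:
    """Read replica that applies changes only when apply_pending() is called."""

    def __init__(self):
        self.data = {}
        self.pending = []

    def apply_pending(self):
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class Master:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Asynchronous replication: the change is queued, not applied yet.
        for replica in self.replicas:
            replica.pending.append((key, value))


replica = Replica()
master = Master([replica])

master.write("article:42", "Hello, world")
stale_read = replica.data.get("article:42")  # replica is still lagging
replica.apply_pending()                      # replication catches up
fresh_read = replica.data.get("article:42")
```

Between the write and `apply_pending()`, a reader hitting the replica sees nothing, which is exactly the broken-link situation described in the scenario.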

---

## Acknowledgement Messages for Master-Slave Replication

### Question: What happens if the master crashes before sending the data to a slave?

This scenario is a **Data Integrity Issue**: the write exists only on the master, so it can be lost when the master fails.

### Solution:
To mitigate this, the master waits for an acknowledgment message confirming that the data has been written successfully to at least one Slave machine before it confirms the write and serves reads for it.

### Explanation:
- This mechanism ensures that clients only read data that is known to be safely replicated, thus maintaining consistency.
- It's a trade-off between latency (speed of responses) and safety (ensuring correctness even during failures).

### Example:
- For a particularly important article, the user sees a slight delay after saving while the system waits for a replica to confirm the write.
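
A minimal sketch of the acknowledgment rule, assuming in-memory replicas: `write_with_ack`, `min_acks`, and `ReplicaUnavailable` are hypothetical names introduced for illustration, and a real system would also need timeouts and retries.

```python
class ReplicaUnavailable(Exception):
    pass


class Replica:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def replicate(self, key, value):
        if not self.healthy:
            raise ReplicaUnavailable(self.name)
        self.data[key] = value
        return True  # acknowledgment sent back to the master


def write_with_ack(master_data, replicas, key, value, min_acks=1):
    """Write to the master, then report success only if enough replicas acknowledged."""
    master_data[key] = value
    acks = 0
    for replica in replicas:
        try:
            if replica.replicate(key, value):
                acks += 1
        except ReplicaUnavailable:
            continue  # an unreachable replica simply contributes no acknowledgment
    return acks >= min_acks


master_data = {}
replicas = [Replica("r1", healthy=False), Replica("r2")]
confirmed = write_with_ack(master_data, replicas, "article:42", "draft")
unconfirmed = write_with_ack({}, [Replica("r3", healthy=False)], "article:43", "draft")
```

Raising `min_acks` makes writes safer but slower, which is the latency versus safety trade-off noted above.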

---

## Replica Distribution in ML Models

### Explanation:
Machine learning models can also be replicated across multiple machines, in the same way as database replicas.

**Reasons:**
- **Load Balancing**: Distributes inference requests across multiple replicas, preventing any single machine from becoming a bottleneck.
- **Fault Tolerance**: If one replica fails, others can still serve requests.
- **Latency Reduction**: Allows faster responses by placing model replicas geographically closer to the users who call them.

### Example:
In a real-world scenario, if a company is using a recommendation model for their e-commerce platform, they may want to replicate the model across several regions to improve response times.
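
Load balancing and fault tolerance across model replicas can be sketched as a round-robin pool that skips failed replicas. Everything here is illustrative: `ModelReplica`, `ReplicaPool`, and the region names are assumptions, and `predict` returns a string instead of running real inference.

```python
import itertools


class ModelReplica:
    def __init__(self, region, healthy=True):
        self.region = region
        self.healthy = healthy

    def predict(self, user_id):
        if not self.healthy:
            raise ConnectionError(f"replica in {self.region} is down")
        # Stand-in for real model inference.
        return f"recommendations for {user_id} from {self.region}"


class ReplicaPool:
    def __init__(self, replicas):
        self.replicas = replicas
        self._cycle = itertools.cycle(replicas)

    def predict(self, user_id):
        # Try each replica at most once, starting from the next one in rotation.
        for _ in range(len(self.replicas)):
            replica = next(self._cycle)
            try:
                return replica.predict(user_id)
            except ConnectionError:
                continue  # fail over to the next replica
        raise RuntimeError("no healthy model replica available")


pool = ReplicaPool([
    ModelReplica("us-east"),
    ModelReplica("eu-west", healthy=False),
    ModelReplica("ap-south"),
])
results = [pool.predict("user-1") for _ in range(3)]
```

The second request lands on the failed `eu-west` replica and is transparently retried on `ap-south`, so the caller never sees the failure.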

---

## Multi-threaded Writing of Replica

### Question: Does writing the replica to a slave machine occur in a multi-threaded manner?

**Explanation:**
Often, yes. The master can send changes to each slave in parallel, typically over a separate connection or thread per slave, instead of updating them one after another.

### Details:
- Writing to multiple slave databases simultaneously increases throughput, because one slow slave does not hold up the others.
- In Python, this can be done with libraries like `threading` or `concurrent.futures`.

```python
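from concurrent.futures import ThreadPoolExecutor

def write_to_slave(slave_db, data):
    # Placeholder write: each slave is modeled as a list of received records
    slave_db.append(data)

# Hypothetical stand-ins for real slave connections and the change to replicate
slave_dbs = [[], [], []]
data = {"article_id": 42, "body": "updated text"}

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(write_to_slave, slave, data) for slave in slave_dbs]
    for future in futures:
        future.result()  # re-raises any exception from the worker thread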
```

### Benefits:
- Improved performance, as writes to multiple slaves proceed concurrently.

---

## Replica Distribution Among Slave Nodes

### Question: Does replication happen between slave to slave?

**Answer:**
Replication is typically **Master-to-Slave**, which keeps a single source of truth for consistency.

### Reasoning:
- Slave databases are primarily used for read operations and are updated from the Master. 

### Process:
1. The Master DB sends changes to the Slave DBs.
2. Some systems, however, support cascading replication, in which a slave forwards changes to other slaves, reducing the replication load on the Master.

### Conclusion:
- Copying data from Slave to Slave adds complexity and extra replication lag, and with it more room for inconsistency, but it takes replication work off the Master as the number of read replicas grows.

---

## Handling Write Heavy Cases

### Explanation:

In scenarios where there are frequent writing operations (like in order processing or high-frequency trading), handling the overwhelming number of write requests becomes critical.

**Flowchart for Write-Heavy Cases:**
```mermaid
flowchart TD
    A[User] --> B[Write Request]
    B --> C[API Gateway]
    C --> D[API Server]
    D --> E[Sharded DBs]
    E --> F[Load Balancer]
    F --> G(Replica 1)
    F --> H(Replica 2)
    F --> I(Replica 3)
```
### Description:
- Write requests pass through the API Gateway and API Servers and are then routed to Sharded Databases, where the data is split using horizontal partitioning (rows spread across machines) or vertical partitioning (columns or tables spread across machines).

### Sharding:
- **Definition**: Sharding is the process of splitting a dataset into smaller, more manageable parts called "shards".
- **Example**: You could shard a customer database alphabetically by name (A-C, D-M, etc.).

### Why Shard?
- To reduce the load on a single database and improve performance by distributing the data across multiple machines.
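
The alphabetical example can be sketched as a range lookup. The shard names and the third range (N-Z) are assumptions added to complete the example; `bisect` finds the first upper bound that is not smaller than the name's first letter.

```python
import bisect

# Hypothetical layout: each shard owns names whose first letter is <= its upper bound.
UPPER_BOUNDS = ["C", "M", "Z"]
SHARDS = ["shard-A-C", "shard-D-M", "shard-N-Z"]


def shard_for(customer_name):
    """Return the shard responsible for a customer, based on the first letter."""
    first_letter = customer_name[0].upper()
    index = bisect.bisect_left(UPPER_BOUNDS, first_letter)
    if index == len(SHARDS):
        raise ValueError(f"no shard covers names starting with {first_letter!r}")
    return SHARDS[index]


assignments = {name: shard_for(name) for name in ["Alice", "Carol", "dave", "Mallory", "Zoe"]}
```

Routing is cheap, but the data is only as evenly spread as the names themselves: if most customers fall in one letter range, that shard carries most of the load.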

---

## Sharding and Hash Partitioning

### Explanation:

**Hash Partitioning**: This technique distributes data evenly across partitions by using a hash function.

### Example:
- If you have user IDs, you can assign each user to a partition by hashing the user ID and taking the result modulo the number of partitions.

### Benefits:
- Requests are spread evenly and hot spots are less likely, which gives smoother performance under load.

### Range vs. Hash Partitioning:
- **Range Partitioning**: Data is partitioned based on specific ranges (e.g., A-C, D-M). Risk of hot spots if many records fall within certain ranges.
- **Hash Partitioning**: Provides more even distribution, reducing the risk of hot spots.
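
A minimal sketch of hash partitioning, assuming four partitions and integer user IDs. It uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and would send the same key to different partitions after a restart. The function name `partition_for` is hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count for the example


def partition_for(user_id, num_partitions=NUM_PARTITIONS):
    """Map a user ID to a partition using a stable hash and a modulo."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Count how 10,000 sequential user IDs spread across the partitions.
counts = Counter(partition_for(user_id) for user_id in range(10_000))
```

Each partition ends up with roughly a quarter of the users, even though the IDs are sequential. One caveat of the modulo approach is that changing the number of partitions remaps most keys, so resizing requires moving data.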

---

## Handling Hot Spots

### Explanation:

**Hot spots** occur when a few database partitions receive far more reads or writes than the others, which degrades performance for everything stored on those partitions.

### Strategy to Handle Hot Spots:
1. **Monitoring**: Use tools like Prometheus and Grafana to track request rates per user and per partition.
2. **Isolation**: Move the users causing the hot spot to dedicated machines so their load no longer affects other users.

### Flowchart for Handling Hot Spots:
```mermaid
flowchart TD
    A[User Requests] --> B[Hot Spot Detection]
    B --> C[Move to Isolated DB]
    C --> D[Scale Up Replicas]
```
This flow detects the users causing excessive load, isolates them on a dedicated database, and adds replicas to absorb their traffic.
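
Detection and isolation can be sketched as follows. The 20% threshold, the `dedicated-db` name, and the function names are assumptions chosen for illustration; in practice the request counts would come from the monitoring system rather than an in-memory list.

```python
from collections import Counter


def find_hot_users(request_user_ids, threshold_share=0.2):
    """Return the users responsible for more than threshold_share of all requests."""
    counts = Counter(request_user_ids)
    total = sum(counts.values())
    if total == 0:
        return set()
    return {user for user, n in counts.items() if n / total > threshold_share}


def route(user_id, hot_users, num_shards=4):
    """Send hot users to a dedicated database and everyone else to a regular shard."""
    if user_id in hot_users:
        return "dedicated-db"
    return f"shard-{user_id % num_shards}"


# User 7 generates 61 of the 100 requests in this sample window.
requests = [7] * 60 + list(range(40))
hot_users = find_hot_users(requests)
routes = {user: route(user, hot_users) for user in (5, 7)}
```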

---

## Benchmarking Server Capacity

**Question: How to benchmark a server’s capacity for workload?**

### Explanation:
To determine how many instances can run on a server, follow these steps:
1. **Direct Testing**: Start instances incrementally while monitoring CPU, memory, and disk usage.
2. **Load Testing Tools**: Use tools like Apache JMeter or Locust to simulate load and find the point at which the server starts to fail.

### Solution:
- Once a server reaches its measured capacity, adding cloud instances on demand is often a more efficient way to scale than provisioning new hardware.

### Trade-off:
- **Cost vs. Performance**: A more expensive setup is only justified if the same performance cannot be reached at lower cost or with a different configuration.
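
The incremental approach can be sketched as a ramp-up loop. The measurement function here is a made-up latency model so the example runs anywhere; in a real benchmark it would be replaced by an actual measurement (for example, a load-test run at each step). The names `find_capacity` and `simulated_p95_ms`, the 8-core assumption, and the 90 ms budget are all hypothetical.

```python
def find_capacity(measure_p95_ms, max_instances, latency_budget_ms):
    """Add instances one at a time and return the largest count that stays within budget."""
    capacity = 0
    for instances in range(1, max_instances + 1):
        if measure_p95_ms(instances) > latency_budget_ms:
            break  # the server is saturated at this point
        capacity = instances
    return capacity


def simulated_p95_ms(instances, cores=8):
    # Toy model: latency is flat until instances exceed cores, then grows linearly.
    base_ms = 40.0
    overload = max(0, instances - cores)
    return base_ms * (1 + 0.5 * overload)


capacity = find_capacity(simulated_p95_ms, max_instances=32, latency_budget_ms=90.0)
```

With this toy model the latency stays at 40 ms up to 8 instances, reaches 80 ms at 10, and exceeds the budget at 11, so the measured capacity is 10.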

---

## Like Counting Mechanism

### Question: Is it a good idea to count likes through API requests from the server?

### Explanation:
It's generally **not efficient** for the API server to update the like count in the database on every request, because likes arrive in short, high-volume bursts.

### Flowchart:
```mermaid
flowchart TD
    A[User Likes Video] --> B[API Gateway]
    B --> C[Rate Limiter]
    C --> D[Workers Queue]
    D --> E[Count Likes]
    E --> F[Update DB]
```

### Techniques Employed:
1. **Delegation**: Hand the counting off to worker nodes, which process likes asynchronously.
2. **Queue Systems**: Buffer likes in a queue such as Kafka so that bursts do not overwhelm the database.
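
The queue-and-worker pattern can be sketched with the standard library. This is an in-process stand-in only: `queue.Queue` plays the role of Kafka, a `Counter` plays the role of the database, and the names `handle_like`, `worker`, and the batch size of 100 are assumptions.

```python
import queue
import threading
from collections import Counter

like_queue = queue.Queue()    # stand-in for a Kafka topic
like_counts = Counter()       # stand-in for the likes table in the database
stats = {"db_writes": 0}


def handle_like(video_id):
    # The API handler only enqueues the event and returns immediately.
    like_queue.put(video_id)


def flush(batch):
    if batch:
        like_counts.update(batch)   # one "database write" per batch
        stats["db_writes"] += 1
        batch.clear()


def worker(batch_size=100):
    batch = Counter()
    while True:
        video_id = like_queue.get()
        if video_id is None:        # sentinel: flush what is left and stop
            flush(batch)
            return
        batch[video_id] += 1
        if sum(batch.values()) >= batch_size:
            flush(batch)


thread = threading.Thread(target=worker)
thread.start()
for _ in range(250):
    handle_like("video-1")
for _ in range(30):
    handle_like("video-2")
like_queue.put(None)
thread.join()
```

Here 280 likes result in only 3 database writes, because the worker aggregates them into batches before updating the counts.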

---

## SQS vs Kafka

### Comparison:

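| Feature                | SQS                                                              | Kafka                                                              |
|------------------------|------------------------------------------------------------------|--------------------------------------------------------------------|
| Consumption Model      | Queue; a message is deleted after a consumer processes it        | Log; multiple consumer groups can read the same messages           |
| Persistence            | Stored until deleted or until the retention period expires       | Stored for a configurable retention period and can be replayed     |
| Ordering               | Best-effort on standard queues; strict FIFO only on FIFO queues  | Ordered within a partition                                         |
| Throughput             | High on standard queues; lower limits on FIFO queues             | High; scales with the number of partitions                         |
| Price                  | Low at small scale (pay per request)                             | Moderate (a cluster to run or rent)                                |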

### Explanation:
- **SQS (Simple Queue Service)** is simpler and cheaper to start with and suits basic queuing needs, while **Kafka** is more complex to operate but supports replaying retained messages, multiple independent consumers, and real-time analytics.
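
One way to picture the difference in how messages are consumed is the toy model below. It is a heavily simplified illustration, not either product's API: `SimpleQueue` and `SimpleLog` are hypothetical classes, and details such as SQS visibility timeouts or Kafka partitions are left out.

```python
from collections import deque


class SimpleQueue:
    """Queue semantics: once a message has been received, it is gone."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class SimpleLog:
    """Log semantics: messages are retained; each consumer group tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def send(self, message):
        self._records.append(message)

    def poll(self, group):
        offset = self._offsets.get(group, 0)
        if offset >= len(self._records):
            return None
        self._offsets[group] = offset + 1
        return self._records[offset]

    def seek(self, group, offset):
        # Rewind (or skip ahead) to re-read retained messages.
        self._offsets[group] = offset


q = SimpleQueue()
q.send("like:video-1")
queue_reads = [q.receive(), q.receive()]

log = SimpleLog()
log.send("like:video-1")
log_reads = [log.poll("counter"), log.poll("analytics"), log.poll("analytics")]
log.seek("analytics", 0)
replayed = log.poll("analytics")
```

The queue hands the message to exactly one reader, while the log lets two independent groups read it and lets a group rewind and read it again.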

---

## YouTube Video Upload Costs

### Flowchart for Video Upload:
```mermaid
flowchart TD
    A[User Uploads Video] --> B[Video Service]
    B --> C[Post Service]
    C --> D[S3 Storage]
    D --> E[Database]
```

### Cost Breakdown:

| Cost Type          | Description                                   |
|--------------------|-----------------------------------------------|
| Incoming Costs      | Costs to upload data to S3                   |
| Storage Costs       | Ongoing costs of storing video files         |
| Outgoing Costs      | Costs associated with video streaming         |

### Explanation:
As a streaming service grows, all three costs grow with it: incoming costs with the number of uploads, storage costs with the size of the video catalog, and outgoing costs with the number of views. Keeping these costs under control is critical for profitability as user interaction increases.
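
The breakdown can be turned into a small calculation. The prices and volumes below are made-up placeholders chosen only to make the arithmetic easy to follow; they are not real provider prices, and `monthly_video_cost` is a hypothetical helper.

```python
def monthly_video_cost(uploads, stored_gb, streamed_gb, prices):
    """Split a month's bill into incoming, storage, and outgoing costs."""
    incoming = uploads / 1000 * prices["per_1k_uploads"]
    storage = stored_gb * prices["per_gb_stored"]
    outgoing = streamed_gb * prices["per_gb_streamed"]
    return {
        "incoming": incoming,
        "storage": storage,
        "outgoing": outgoing,
        "total": incoming + storage + outgoing,
    }


# Placeholder prices (NOT real provider pricing).
PLACEHOLDER_PRICES = {
    "per_1k_uploads": 0.005,
    "per_gb_stored": 0.02,
    "per_gb_streamed": 0.05,
}

costs = monthly_video_cost(
    uploads=100_000,
    stored_gb=10_000,
    streamed_gb=50_000,
    prices=PLACEHOLDER_PRICES,
)
```

With these placeholder numbers, outgoing (streaming) traffic dominates the bill, storage comes second, and incoming costs are negligible, which is why popular videos are the main cost driver in this sketch.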

---

This concludes the detailed notes for the scaling problems in system design.