# System Designs Overview


## Web Service System for Millions of Users
![Final Million User System](Resources/FinalWebSrvr.JPG)

(One Data Center Shown as Example)
* **Load Balancer**
    * Sends traffic to the correct data center based on GeoDNS routing
    * Distributes traffic evenly to web servers within a data center
    * Routes traffic only to healthy web servers or data centers
* **CDN - Content Delivery Network**
    * Geographically closest server serves content
    * Can cache and deliver based upon request path, cookies, query strings, and headers
* **Stateless Web Server Architecture**
    * Session state data is stored in a persistent data store
    * Allows the web tier to scale by adding or removing servers
    * Session state store is shared across all data centers so traffic can be re-routed during an outage
* **Data Tier follows a Master-Slave Replication design with Sharding**
    * Master database receives all writes, updates, and deletes
    * All data is copied to slaves, which serve all read requests
    * Provides high availability and reliability: if the master fails, a slave can be promoted to take its place
    * Data is sharded across multiple Master-Slave constellations
* **Message Queue Follows Pub-Sub Pattern for Task Processing**
* **Logging and Metrics Performed**
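
The data tier's routing rules (writes to the master, reads to the slaves, data split across shards) can be sketched as follows. This is only an illustration under stated assumptions: the notes do not say how a shard is chosen, so hashing the key modulo the shard count is assumed here, reads are assumed to rotate round-robin over the slaves, and the names `ShardCluster`, `ShardRouter`, and the node labels are hypothetical.

```python
import hashlib
import itertools


class ShardCluster:
    """One master-slave group: a single master plus at least one slave."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin reads (assumption)

    def node_for(self, is_write):
        # Writes, updates, and deletes go to the master; reads go to a slave.
        return self.master if is_write else next(self._slaves)


class ShardRouter:
    """Picks the shard for a key, then the node inside that shard."""

    def __init__(self, clusters):
        self.clusters = clusters

    def shard_index(self, key):
        # Hash-mod sharding is an assumption; a stable hash keeps routing
        # consistent across processes (unlike Python's built-in hash()).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.clusters)

    def route(self, key, is_write):
        return self.clusters[self.shard_index(key)].node_for(is_write)
```

A call such as `router.route("user:42", is_write=False)` would return one of the slaves of whichever shard owns that key, while the same call with `is_write=True` would return that shard's master.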

## Rate Limiter Design

![Rate Limiter](Resources/RateLimiter.JPG)

* **Rate Limiter Middleware**
    * In a microservice design, rate limiting could instead be handled by the API gateway
    * Uses the Leaking Bucket rate limiting algorithm (see below)
    * Rate-limited requests receive an HTTP 429 (Too Many Requests) response
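
As a rough sketch of where the middleware sits, the wrapper below checks a limiter before calling the real handler and answers with HTTP 429 when the request is refused. It is framework-free on purpose: the `(client_id, request)` handler signature, the `(status, body)` response tuple, and the `allow` callable are hypothetical stand-ins, and any of the algorithms below could supply `allow`.

```python
from http import HTTPStatus
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, body): stand-in for a real response type
Handler = Callable[[str, str], Response]


def rate_limit_middleware(handler: Handler, allow: Callable[[str], bool]) -> Handler:
    """Wrap `handler` so refused requests never reach it."""

    def wrapped(client_id: str, request: str) -> Response:
        if not allow(client_id):
            status = HTTPStatus.TOO_MANY_REQUESTS  # 429
            return (status.value, status.phrase)
        return handler(client_id, request)

    return wrapped
```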
    
### Rate Limiting Algorithms
* **Token Bucket**
    * Parameters: bucket capacity of *N* tokens and refill rate *R* (tokens added to the bucket per unit of time)
    * Each request needs a token for processing
    * If no tokens available, request is dropped
    * Can either have a global bucket or a bucket for each API endpoint
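
The token bucket can be sketched in a few lines of Python. This is a minimal, single-threaded sketch rather than a production limiter: the class name `TokenBucket`, the injectable `clock` parameter, refilling lazily on each call instead of with a timer, and starting with a full bucket are all assumptions made for illustration.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # N: maximum tokens the bucket holds
        self.refill_rate = refill_rate    # R: tokens added per second
        self.clock = clock
        self.tokens = float(capacity)     # assumption: bucket starts full
        self.last_refill = clock()

    def allow(self):
        """Consume one token if available; return False to drop the request."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For per-endpoint limiting, one option would be a dictionary mapping each endpoint to its own `TokenBucket` instance.
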
* **Leaking Bucket**
    * Parameters: FIFO queue size *S* and outflow rate *R* (number of requests to process in a fixed time unit)
    * Requests are added to the queue if there is room and are processed at the fixed rate *R*
    * If the queue is full, the request is dropped
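
A minimal sketch of the leaking bucket, assuming some external scheduler calls `leak()` once per time unit. The names `LeakingBucket`, `submit`, and `leak` are hypothetical, and the sketch leaves out threading and the scheduler itself.

```python
from collections import deque


class LeakingBucket:
    def __init__(self, queue_size, outflow_per_tick):
        self.queue_size = queue_size              # S: maximum queued requests
        self.outflow_per_tick = outflow_per_tick  # R: requests processed per time unit
        self.queue = deque()

    def submit(self, request):
        """Enqueue the request if there is room; return False if it is dropped."""
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        """Run once per time unit: dequeue up to R requests in FIFO order."""
        batch = []
        for _ in range(min(self.outflow_per_tick, len(self.queue))):
            batch.append(self.queue.popleft())
        return batch
```
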
* **Fixed Window Counter**
    * Parameters: time window size *S* and max counter of *C*
    * Timeline is divided into windows of size *S* and each window can process up to *C* requests, with each request incrementing a counter that starts at 0
    * When the counter reaches *C* for a window, further requests in that window are dropped
    * Issue: a burst that straddles a window boundary can let through up to 2 × *C* requests within a span of length *S*
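
The fixed window counter, including its boundary issue, can be illustrated with the sketch below. The class name `FixedWindowCounter` and the injectable `clock` are assumptions, and windows are assumed to be aligned to multiples of the window size.

```python
import time


class FixedWindowCounter:
    def __init__(self, window_size, max_count, clock=time.time):
        self.window_size = window_size  # S: window length in seconds
        self.max_count = max_count      # C: requests allowed per window
        self.clock = clock
        self.current_window = None
        self.count = 0

    def allow(self):
        window = int(self.clock() // self.window_size)
        if window != self.current_window:
            # A new window has started: reset the counter to 0.
            self.current_window = window
            self.count = 0
        if self.count < self.max_count:
            self.count += 1
            return True
        return False
```

With `window_size=60` and `max_count=2`, two requests at second 59 and two more at second 60 are all accepted, so four requests pass within about one second even though the limit is two per window.
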
* **Sliding Window Log**
    * Parameters: max request count *C* per time window *T* (e.g. a 60-second lookback)
    * Request timestamps are kept in a log (typically a cache); on each request, timestamps older than the current time minus *T* are removed
    * If the log then holds fewer than *C* timestamps, the request is processed and its timestamp is added to the log
    * Otherwise the request is dropped, but its timestamp is still added to the log
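
One way to sketch the sliding window log is with a deque of timestamps. The class name `SlidingWindowLog` and the injectable `clock` are assumptions, and the log is kept in process memory here rather than in a shared cache.

```python
import time
from collections import deque


class SlidingWindowLog:
    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests  # C: requests allowed per window
        self.window = window              # T: lookback length in seconds
        self.clock = clock
        self.log = deque()                # timestamps, oldest first

    def allow(self):
        now = self.clock()
        # Evict timestamps older than (now - T).
        while self.log and self.log[0] < now - self.window:
            self.log.popleft()
        allowed = len(self.log) < self.max_requests
        # Rejected requests are logged too, so they count against later ones.
        self.log.append(now)
        return allowed
```

Because rejected requests stay in the log, a client that keeps retrying while limited continues to be refused until its timestamps age out of the window.
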
* **Sliding Window Counter**
    * Hybrid between Fixed Window Counter and Sliding Window Log algorithms
        * Fixed window length *FW*
        * Rolling window length *RW*
        * Max requests *C*
        * Number of requests for a rolling window is estimated as:
        `REQUESTS_IN_CURR_WINDOW + REQUESTS_IN_PREV_WINDOW * OVERLAP_PERCENT_OF_ROLLING_WINDOW_AND_PREV_WINDOW`
        * Example: If there are 3 requests in the current window, 7 in the previous window, and the rolling window has a 70% overlap with the previous window, the estimate is $3 + 7 \times 0.70 = 7.9$
    * The estimated request count can be rounded up or down
    * The request is rejected if the estimate is greater than *C*; otherwise it is accepted
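
The estimate and the accept/reject decision can be sketched as below. Several choices here are assumptions rather than part of the notes: the rolling window is taken to have the same length as the fixed window (*RW* = *FW*), the incoming request is counted in the current window before comparing against *C*, no rounding is applied, and the names `rolling_estimate` and `SlidingWindowCounter` are hypothetical.

```python
import time


def rolling_estimate(curr_count, prev_count, overlap):
    """Current-window count plus the overlapping share of the previous window."""
    return curr_count + prev_count * overlap


class SlidingWindowCounter:
    def __init__(self, window, max_requests, clock=time.time):
        self.window = window              # FW (and RW, by assumption), in seconds
        self.max_requests = max_requests  # C
        self.clock = clock
        self.curr_window = None
        self.curr_count = 0
        self.prev_count = 0

    def _roll(self, now):
        window = int(now // self.window)
        if self.curr_window is not None and window == self.curr_window + 1:
            # Moved to the next window: current counts become previous counts.
            self.prev_count, self.curr_count = self.curr_count, 0
        elif window != self.curr_window:
            # First call, or at least one whole window was skipped.
            self.prev_count, self.curr_count = 0, 0
        self.curr_window = window

    def allow(self):
        now = self.clock()
        self._roll(now)
        # Share of the rolling window that still lies in the previous fixed window.
        overlap = 1.0 - (now % self.window) / self.window
        estimate = rolling_estimate(self.curr_count + 1, self.prev_count, overlap)
        if estimate > self.max_requests:
            return False
        self.curr_count += 1
        return True
```

With the numbers from the example, `rolling_estimate(3, 7, 0.70)` evaluates to approximately 7.9, which would be rejected for a limit of *C* = 7 whether or not it is rounded down.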