* Who is going to use it
* How are they going to use it?
* How many users?
* What does the system do?
* What are the inputs and outputs of the system?
* How much data do we expect to handle?
* How many requests per second?
* Expected 
* Communicate, ask questions.
* Go broad first
* Estimate when needed
* State assumptions explicitly
* Scope the problem
    - Will users be able to specify their own tiny URL?
    - Or is it auto-generated?
    - Do we need to keep track of clicks on the URL?
    - Is there any timeout (expiration)?
    - Do we manage user accounts and links?
* Assumptions should be reasonable
    - Assuming only 100 user inquiries per day, or infinite available memory, is not reasonable. A maximum of 1 million URLs per day is a good assumption, and it leads to the total data storage requirement.
    - Is data that is 10 minutes stale OK? Not for activating a tiny URL, but yes for its analytics.
* Some URLs are accessed rarely, some frequently. We cannot hit the database every time.
* Joins are slow, avoid them. Denormalizing the data means adding redundant info to the DB to speed up reads.
* Any part of the system can fail, plan for failure
* Availability vs Reliability
    - Availability is the % of time the system is operational. Reliability is the probability that the system stays operational over a given period of time.
* Breaking down a complex task into small chunks helps us handle the problem at a better pace and in a more actionable way
    - Pinterest is a highly scalable photo-sharing service:
        - features: user profile, upload photos, news feed, etc.
        - scaling out: horizontal scalability and micro services.
* Sketch Out High Level Design 
* Discuss individual components and how they interact in detail
* Back-of-the-envelope Calculation
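
A minimal back-of-the-envelope sketch for the tiny URL example, starting from the 1 million new URLs per day assumption above. The 500 bytes per record and the 5-year retention period are illustrative assumptions, not figures from these notes.

```python
# Back-of-the-envelope storage estimate for a URL shortener.
# ASSUMED (illustrative only): 500 bytes per record, 5 years of retention.
NEW_URLS_PER_DAY = 1_000_000
BYTES_PER_RECORD = 500      # assumed
RETENTION_YEARS = 5         # assumed
SECONDS_PER_DAY = 24 * 60 * 60

def estimate_storage_bytes(urls_per_day, bytes_per_record, years):
    """Total bytes needed to keep every record for the retention period."""
    return urls_per_day * 365 * years * bytes_per_record

def writes_per_second(urls_per_day):
    """Average write rate, ignoring traffic peaks."""
    return urls_per_day / SECONDS_PER_DAY

total_bytes = estimate_storage_bytes(NEW_URLS_PER_DAY, BYTES_PER_RECORD, RETENTION_YEARS)
print(f"Storage: {total_bytes / 1e12:.2f} TB")
print(f"Average writes/sec: {writes_per_second(NEW_URLS_PER_DAY):.1f}")
```

Changing any one assumption (record size, retention, daily volume) scales the result linearly, which is why stating assumptions explicitly matters.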

#### Performance vs Scalability
* If you have a performance problem, your system is slow for a single user.
* If you have a scalability problem, your system is fast for a single user but slow under heavy load. The architecture of the system limits its ability to grow.
* A service is said to be scalable if adding resources to the system increases performance in proportion to the resources added.

#### Latency vs Throughput
* We want maximum throughput with acceptable latency
* Latency is the time to perform some action or to produce some result.
* Throughput is the number of such actions or results per unit of time.
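
One way to relate the two is Little's Law (requests in flight = throughput × latency). The sketch below rearranges it to estimate throughput; the concurrency and latency figures are made-up inputs for illustration.

```python
def max_throughput(concurrency, latency_seconds):
    """Little's Law rearranged: throughput = requests in flight / latency."""
    return concurrency / latency_seconds

# Hypothetical server handling 100 requests at once with 50 ms average latency.
fast = max_throughput(100, 0.050)
# Same concurrency, but 200 ms latency: throughput drops to a quarter.
slow = max_throughput(100, 0.200)
print(f"{fast:.0f} req/s vs {slow:.0f} req/s")
```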




![](images/fail_over.PNG)

#### Availability vs Consistency
* A centralized system has no network partitions, so we get both consistency and availability (CA)
* A distributed system has network partitions, so we must choose between consistency (C) and availability (A)
    - The system can be CP or AP
    - CP: Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
    - AP: Responses return the most recent version of the data available on a node, which might not be the latest. Writes might take some time to propagate when the partition is resolved. AP is a good choice if the business needs allow for eventual consistency or when the system needs to continue working despite external errors.
#### BASE
* Basically Available
* Soft State
* Eventually consistent


* What are the different architectural pieces that can be used?
* How do these pieces work with each other?
* How can we best utilize these pieces: what are the right tradeoffs?

* Prepare things during non-peak hours. Have a backup for resiliency (master-slave)
* Microservice architecture: each service's use cases are well defined, and it handles nothing outside its responsibilities.
* Separation of responsibility (decoupling): a delivery agent only delivers food, regardless of which food it is. The pizza shop uses the delivery system.
* Log events/metrics
* Horizontal vs vertical scaling
    - Load balancer needed vs not needed
    - Resilient vs single point of failure
    - Network calls(RPC) vs inter process communication
    - Data consistency is issue vs consistent
    - Scales well vs hardware limit
* How do we know a server is dead?
    - A separate monitoring component periodically sends heartbeats to check that servers are alive. If it does not receive an ack, that server is considered dead.
    - To avoid duplicate work, we can use load balancing with consistent hashing.
    - A message queue is also useful here: it takes a task and assigns it to a server; if that server is dead, it assigns the task to another server. Ex. RabbitMQ, ZeroMQ, JMS
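
The heartbeat idea above can be sketched as follows. This is a simplified, single-process illustration: `HeartbeatMonitor`, `record_ack`, and the 5-second timeout are hypothetical names and values, and the clock is injectable only so the example is deterministic.

```python
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat ack per server and reports silent servers."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the sketch is testable
        self.last_ack = {}          # server id -> time of last ack

    def record_ack(self, server_id):
        """Call whenever a server acknowledges a heartbeat."""
        self.last_ack[server_id] = self.clock()

    def dead_servers(self):
        """Servers whose last ack is older than the timeout."""
        now = self.clock()
        return sorted(
            server for server, seen in self.last_ack.items()
            if now - seen > self.timeout_seconds
        )

# Usage with a fake clock instead of real time.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=5, clock=lambda: fake_now[0])
monitor.record_ack("server-a")
monitor.record_ack("server-b")
fake_now[0] = 4.0
monitor.record_ack("server-a")      # server-a keeps answering
fake_now[0] = 8.0
print(monitor.dead_servers())       # only server-b has been silent too long
```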

* Amazon.com has many IP addresses associated with it at the DNS server, each of which can connect us to one of many load balancers.

## Requirement Clarification
* Questions are open-ended, with one or more correct answers
* Define the scope of your focus. Focus on 2-3 big parts, not all
* For Twitter
    - Will users be able to tweet and follow other people?
    - Do we need a user's timeline?
    - Will tweets have photos and videos?
    - Backend only, or front end too?
    - Will users be able to search tweets?
    - Hot trending tweets?
    - Push notification feature?
    
## System interface definition
* Define the APIs we expect from the system. They establish the exact contract expected from the system

```
postTweet(user_id, tweet_data, tweet_location, user_location, timestamp, …) 
generateTimeline(user_id, current_time, user_location, …)  
markTweetFavorite(user_id, tweet_id, timestamp, …)  
```

### Estimate scale of the system
* Helps to focus on scaling, partitioning, load balancing, caching
* What scale is expected from the system? # of new tweets, # of tweet views, # of timeline generations per sec
* How much storage do we need?
* What network bandwidth do we expect?
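
A rough sketch of how these questions turn into numbers. All three inputs (100 million new tweets per day, 20 billion tweet views per day, 300 bytes per tweet) are assumed purely for illustration.

```python
SECONDS_PER_DAY = 86_400

def per_second(daily_count):
    """Average rate per second for a daily total."""
    return daily_count / SECONDS_PER_DAY

# ASSUMED inputs, for illustration only.
new_tweets_per_day = 100_000_000
tweet_views_per_day = 20_000_000_000
avg_tweet_bytes = 300

write_qps = per_second(new_tweets_per_day)
read_qps = per_second(tweet_views_per_day)
storage_per_day_gb = new_tweets_per_day * avg_tweet_bytes / 1e9
egress_mb_per_sec = read_qps * avg_tweet_bytes / 1e6

print(f"writes/sec: {write_qps:,.0f}, reads/sec: {read_qps:,.0f}")
print(f"new text storage per day: {storage_per_day_gb:.0f} GB")
print(f"outgoing bandwidth for text: {egress_mb_per_sec:.1f} MB/s")
```

The large read-to-write ratio in this made-up scenario is the kind of result that points toward caching and read replicas.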

### Define data model
* Clarify how data flows among the different components of the system. Later it will guide data partitioning and management
* Identify the entities of the system and the interactions between them, as well as storage, transportation, encryption, etc.

* User : UserId, Name, Email, DOB, CreationDate, LastLogin
* Tweet : tweetId, content, TweetLocation, NumberOfLikes, TimeStamp
* UserFollow : UserID1, UserID2
* FavoriteTweet: UserID, TweetID, TimeStamp
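
The entities above could be written down as plain data classes. The field types, and the reading of `UserFollow` as follower/followee, are assumptions; the notes only list field names.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class User:
    user_id: int
    name: str
    email: str
    dob: date
    creation_date: datetime
    last_login: datetime

@dataclass
class Tweet:
    tweet_id: int
    content: str
    tweet_location: str
    number_of_likes: int
    timestamp: datetime

@dataclass
class UserFollow:
    user_id1: int   # assumed: the follower
    user_id2: int   # assumed: the user being followed

@dataclass
class FavoriteTweet:
    user_id: int
    tweet_id: int
    timestamp: datetime

# Example rows with made-up values.
follow = UserFollow(user_id1=1, user_id2=2)
favorite = FavoriteTweet(user_id=1, tweet_id=42, timestamp=datetime(2024, 1, 1))
print(follow, favorite)
```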

### High Level Design
* Draw a block diagram with 5-6 boxes representing the core components of our system.
* Multiple application servers serve all read/write requests, with a load balancer sitting in front of them for traffic distribution. If we have far more read traffic than write traffic, use separate servers for each
* On the backend, have a database that can store all of the tweets and support a huge number of reads, plus distributed file storage for photos and videos
![](images/High_level_design.PNG)

### Detailed Design
* Take a deep dive into 2 or 3 components
* Present different approaches with their pros and cons.
* Since we are storing massive data, how should we partition our data to distribute it to multiple database? Should we try to store all the data of a user on the same database? What issue could it cause?
* How will we handle hot users who tweet a lot or follow lots of people?
* Since users’ timeline will contain the most recent (and relevant) tweets, should we try to store our data in such a way that is optimized for scanning the latest tweets?
* How much and at which layer should we introduce cache to speed things up?
* What components need better load balancing?

### Identifying and resolving bottlenecks
* Is there any single point of failure in our system? What are we doing to mitigate it?
* Do we have enough replicas of the data so that if we lose a few servers we can still serve our users?
* Similarly, do we have enough copies of different services running such that a few failures will not cause total system shutdown?
* How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades?

### Scalability
* The capability of a system, process, or network to grow and manage increased demand.
* Any distributed system that can continuously evolve in order to support the growing amount of work is considered to be scalable.
* Demand can grow as increased data volume or an increased amount of work. Ex. # of transactions.
* A scalable system should scale without performance loss.
* Some tasks are atomic and cannot be distributed, which can slow down performance. A scalable architecture avoids this situation.
* Horizontal scaling
    - Add more resources in pool of resources
    - Cassandra, MongoDB
    - Scaling horizontally introduces complexity and involves cloning servers
        - Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
        - Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache (Redis, Memcached)
    - Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out
* Vertical Scaling
    - Scale by adding more power (upgrade CPU, add more RAM, add HDD)
    - Comes with upper limit
    - Can involve downtime when we upgrade
    - MySQL

### Reliability
* The probability that the system will not fail in a given period of time. A reliable system keeps delivering its service even if some SW or HW components fail.
* At Amazon, a user transaction should not fail just because the machine handling it stops working.
* Redundancy of both SW and HW components helps achieve reliability. Data is replicated as well.
* If the server carrying the user’s shopping cart fails, another server that has the exact replica of the shopping cart should replace it.
* Eliminates single point of failure

### Availability
* It is a simple measure of the percentage of time that a system, service, or a machine remains operational under normal conditions. 
* Availability takes into account maintainability, repair time, spares availability, and other logistics considerations.
* If a system is reliable, it is available. However, if it is available, it is not necessarily reliable. High reliability contributes to high availability, but it is possible to achieve high availability even with an unreliable product by minimizing repair time and ensuring that spares are always available when they are needed.
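
A small sketch of what an availability percentage means in practice, plus the common steady-state formula availability = MTBF / (MTBF + MTTR), which shows how a short repair time can compensate for frequent failures. The MTBF and MTTR figures are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

def downtime_hours_per_year(availability_percent):
    """Hours per year the system may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

def availability_from_mtbf(mtbf_hours, mttr_hours):
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

for percent in (99.0, 99.9, 99.99):
    print(f"{percent}% available: {downtime_hours_per_year(percent):.2f} h down per year")

# Hypothetical unreliable product: fails every 100 h, but is repaired in 0.1 h.
quick_repair = availability_from_mtbf(100, 0.1)
print(f"availability with quick repair: {quick_repair:.4f}")
```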

### Efficiency
* Two standard measures of efficiency are the response time (or latency), which denotes the delay to obtain the first item, and the throughput (or bandwidth), which denotes the number of items delivered in a given time unit (e.g., a second).

### Serviceability or Manageability
* How easy the system is to operate and maintain.
* The simplicity and speed with which a system can be repaired or maintained; if the time to fix a failed system increases, then availability will decrease.
* The ease of diagnosing and understanding problems when they occur, the ease of making updates or modifications, and how simple the system is to operate

## Load Balancing
* Load balancing is the process of spreading requests across multiple resources according to some metric
* Helps to spread the traffic across a cluster of servers to improve responsiveness and availability of applications, websites or databases. The LB also keeps track of the status of all the resources while distributing requests. If a server is not available to take new requests, is not responding, or has an elevated error rate, the LB will stop sending traffic to it.
* A load balancer sits between the client and the server, accepting incoming network and application traffic.
* To utilize full scalability and redundancy, we can try to balance the load at each layer of the system.
![](images/load_balancer3.PNG)
* Every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disk or memory. Sessions need to be stored in a centralized data store which is accessible to all your application servers. What if the external hardware that stores the sessions fails?
    - RAID: Redundant Array of Independent Disks
    - RAID0 (striping: some data is written to disk 1 and some to disk 2, for more performance), RAID1 (mirroring: data is duplicated on both disks, so if one dies the other is still there), RAID2, RAID10 (4 disks: 2 for redundancy, 2 for striping), RAID5 (striping with parity: in a 5-disk array, the equivalent of 1 disk is used for redundancy and the other 4 store partial data), RAID6 (if 2 disks die, we are still good)
    - What if someone cuts power to the whole RAID array, and even the redundant power supply is gone?
        - Shared storage (File server, )
* Using an Amazon Machine Image (AMI) we can start a new instance/clone quickly.
* ELB (Elastic Load Balancer) by Amazon, HAProxy, and LVS (Linux Virtual Server) are software LBs.

* Users experience faster, uninterrupted service. Users won’t have to wait for a single struggling server to finish its previous tasks. Instead, their requests are immediately passed on to a more readily available resource.
* Service providers experience less downtime and higher throughput. Even a full server failure won’t affect the end user experience as the load balancer will simply route around it to a healthy server.
* Smart load balancers provide benefits like predictive analytics that determine traffic bottlenecks before they happen. As a result, the smart load balancer gives an organization actionable insights. These are key to automation and can help drive business decisions.

### Load balancing Algo
* The LB first checks which servers are up and healthy (it regularly attempts to connect to backend servers to ensure that they are listening). From the healthy nodes, it chooses a server according to the algorithm.
* Least connection method:
    - Directs traffic to server with fewest active connections.
    - This approach is quite useful when there are a large number of persistent client connections which are unevenly distributed between the servers.
* Least Response time method:
    - directs traffic to the server with the fewest active connections and the lowest average response time.
* Least Bandwidth method:
    - selects the server that is currently serving the least amount of traffic measured in megabits per second (Mbps).
* Round Robin method:
    -  cycles through a list of servers and sends each new request to the next server. When it reaches the end of the list, it starts over at the beginning. 
    - most useful when the servers are of equal specification and there are not many persistent connections.
* Weighted Round Robin:
    - designed to better handle servers with different processing capacities. Each server is assigned a weight (an integer value that indicates the processing capacity).
    - servers with higher weights get more connections than those with lower weights.
* IP Hash
    - hash of the IP address of the client is calculated to redirect the request to a server.
* The load balancer can be a single point of failure; to overcome this, a second load balancer can be connected to the first to form a cluster. Each LB monitors the health of the other and, since both of them are equally capable of serving traffic and failure detection, in the event the main load balancer fails, the second load balancer takes over.
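
Several of the selection methods above can be sketched in a few lines each. These are simplified illustrations that assume the server list already contains only healthy servers; the class and function names are hypothetical, and the weighted variant uses naive list expansion rather than the smoother schemes found in production load balancers.

```python
import itertools
import zlib

class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class WeightedRoundRobin:
    """Naive weighting: a server with weight w appears w times per rotation."""
    def __init__(self, weights):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1    # a new connection is opened
        return server

    def release(self, server):
        self.active[server] -= 1    # a connection is closed

def ip_hash(client_ip, servers):
    """The same client IP maps to the same server while the list is unchanged."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]

servers = ["s1", "s2", "s3"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])    # wraps around after the last server
print(ip_hash("203.0.113.9", servers))
```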

* Both horizontal scalability and redundancy are usually achieved via load balancing.
* A listener is a process that checks for connection requests. It is configured with a protocol and a port for front-end (client to load balancer) connections, and a protocol and a port for back-end (load balancer to back-end instance) connections.
* Elastic Load Balancing supports the following protocols:
    - HTTP
    - HTTPS (secure HTTP) (The HTTPS protocol uses the SSL protocol to establish secure connections over the HTTP layer)
    - TCP
    - SSL (secure TCP) (establish secure connections over the TCP layer.)
* If the front-end connection uses TCP or SSL, then your back-end connections can use either TCP or SSL. If the front-end connection uses HTTP or HTTPS, then your back-end connections can use either HTTP or HTTPS.
* The Secure Sockets Layer (SSL) protocol is primarily used to encrypt confidential data over insecure networks such as the Internet. The SSL protocol establishes a secure connection between a client and the back-end server, and ensures that all the data passed between your client and your server is private and integral.
* TCP/SSL protocol(Layer 4)
    - Because load balancers intercept traffic between clients and your back-end instances, the access logs for your back-end instance contain the IP address of the load balancer instead of the originating client. You can enable Proxy Protocol, which adds a header with the connection information of the client, such as the source IP address, destination IP address, and port numbers. The header is then sent to the back-end instance as a part of the request. You can parse the first line in the request to retrieve the connection information.
* HTTP/HTTPS protocol
    - When you use HTTP (layer 7) for both front-end and back-end connections, your load balancer parses the headers in the request and terminates the connection before sending the request to the back-end instances. For every registered and healthy instance behind an HTTP/HTTPS load balancer, Elastic Load Balancing opens and maintains one or more TCP connections. These connections ensure that there is always an established connection ready to receive HTTP/HTTPS requests. When you use HTTP/HTTPS, you can enable sticky sessions on your load balancer. A sticky session binds a user's session to a specific back-end instance. This ensures that all requests coming from the user during the session are sent to the same back-end instance
* SSL Server Certificates
    - If you use HTTPS or SSL for your front-end connections, you must deploy an X.509 certificate (SSL server certificate) on your load balancer. The load balancer decrypts requests from clients before sending them to the back-end instances (known as SSL termination). 
* Back-End Server Authentication
    - If you use HTTPS or SSL for your back-end connections, you can enable authentication of your registered instances. You can then use the authentication process to ensure that the instances accept only encrypted communication, and to ensure that each registered instance has the correct public key.
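
The Proxy Protocol bullet above mentions parsing the first line of the request to recover the client's connection information. A minimal sketch, assuming the human-readable version 1 header for TCP connections (`PROXY TCP4|TCP6 <source ip> <destination ip> <source port> <destination port>`); the function name is hypothetical and other header forms are simply rejected.

```python
def parse_proxy_v1(first_line):
    """Parse a Proxy Protocol version 1 TCP header line into a dict."""
    parts = first_line.rstrip("\r\n").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a Proxy Protocol v1 TCP header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    return {
        "family": family,
        "source_ip": src_ip,
        "destination_ip": dst_ip,
        "source_port": int(src_port),
        "destination_port": int(dst_port),
    }

# Example header using documentation-reserved IP addresses.
info = parse_proxy_v1("PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\n")
print(info["source_ip"], info["source_port"])
```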
#### Load balancer and SSL
* Secure Socket Layer (SSL) is the standard security technology for establishing an encrypted link between a web server and a browser.  When a load balancer decrypts traffic before passing the request on, it is called SSL termination. The load balancer saves the web servers from having to expend the extra CPU cycles required for decryption. This improves application performance. The traffic between the load balancers and the web servers is no longer encrypted. This can expose the application to possible attack.

#### Software LB Vs Hardware LB
* Flexibility to adjust changing needs
* Ability to scale beyond capacity by adding more sw instances
* Cheaper
* Provide LB in cloud.

#### Load balancer in Network layer
* Load balancing can happen at the transport, session, presentation, and application layers
* L4: Directs traffic based on data from network and transport layer protocols, e.g. source/destination IP address and the TCP port in the header, but does not use the content of the packet. Strictly speaking, it uses layers 3 and 4.
    - The load balancer’s IP address is the one advertised to clients for a web site or service via DNS. As a result, clients record the load balancer’s address as the destination IP address in their requests.
    - It performs Network Address Translation (NAT) on the request packet, changing the recorded destination IP address from its own to that of the content server it has chosen on the internal network. Similarly, before forwarding server responses to clients, the load balancer changes the source address recorded in the packet header from the server’s IP address to its own.
    - A Layer 4 load balancer is often a dedicated hardware device supplied by a vendor that runs proprietary load-balancing software, and the NAT operations might be performed by specialized chips rather than in software.
    - It was popular when commodity hardware was less powerful, as it requires less computation than an L7 LB.
    - Transmission Control Protocol (TCP) is the Layer 4 protocol for Hypertext Transfer Protocol (HTTP) traffic on the Internet. L4 load balancers can make limited routing decisions by inspecting the first few packets in the TCP stream.
* L7: Adds content switching to the LB. Routing decisions are based on the HTTP header, URI, SSL session ID, and HTML form data. Strictly speaking, it uses layers 5 to 7.
    - Layer 7 load balancers look at the application layer (HTTP is the dominant protocol at this layer) to decide how to distribute requests. This can involve the contents of the header and message, such as the URL, the type of data (text, video, graphics), and cookies.
    - A Layer 7 load balancer terminates the network traffic, reads the message, makes a load-balancing decision, then makes a new TCP connection to the selected upstream server (or reuses an existing one, by means of HTTP keepalives) and writes the request to the server.
    - A device that performs Layer 7 load balancing is often referred to as a reverse-proxy server. NGINX Plus and the open source NGINX software generally operate at Layer 7 and serve as full reverse proxies.
    - Layer 7 load balancing is more CPU-intensive than packet-based Layer 4 load balancing, but rarely causes degraded performance on a modern server.
    - Because a Layer 7 load balancer can determine what type of data (video, text, and so on) a client is requesting, you don’t have to duplicate the same data on all of the load-balanced servers.
    - Example: a user visits a high-traffic website. Over the course of the session, the user might request static content such as images or video, dynamic content such as a news feed, and even transactional information such as order status. Layer 7 load balancing routes each request based on information in the request itself: a request for an image or video can be routed to the servers that store it and are highly optimized to serve multimedia content, a request for transactional information such as a discounted price can be routed to the application server responsible for managing pricing, and sensitive user billing traffic can be directed to security-hardened servers.
* UDP load balancing is often used for live broadcasts and online games when speed is important and there is little need for error correction. UDP has low latency because it is connectionless and skips the handshakes, acknowledgements, and retransmissions that TCP performs.
*  TCP load balancing provides a reliable and error-checked stream of packets to IP addresses, which can otherwise easily be lost or corrupted.
* Geographic load balancing redistributes application traffic across data centers in different locations for maximum efficiency and security. While local load balancing happens within a single data center, geographic load balancing uses multiple data centers in many locations.
* What is a smart client? It is a client which takes a pool of service hosts and balances load across them, detects downed hosts, and avoids sending requests their way (it also has to detect recovered hosts and deal with adding new hosts).
* HAProxy is an example of SW LB.
* Load balancers can be scaled out by pointing DNS to multiple IP addresses and using DNS round robin for IP address lookup. Another option is to put another load balancer in front, which distributes load to the next level of load balancers. Adding multiple load balancers is rare, as a single box running nginx or HAProxy can handle more than 20K concurrent connections, compared to web application boxes which can handle a few thousand concurrent requests. So a single load balancer box can serve several web application boxes.
* The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.
* Introducing a load balancer to help eliminate single points of failure results in increased complexity.
* A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.

## Caching
* Caching is storing the result of an operation so it can be reused later instead of repeating the operation.
* Saves network load
* Avoids repeated computations
* Reduces DB load
* Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher will first look up whether the request has been made before and try to find the previous result to return, in order to save the actual execution.

* Databases often benefit from a uniform distribution of reads and writes across its partitions. Popular items can skew the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes in traffic.
* You need to maintain consistency between caches and the source of truth, such as the database, through cache invalidation.

#### Cache-aside
* The application is responsible for reading and writing from storage. The cache does not interact with storage directly. The application does the following:
    - Look for entry in cache; if it is not there, this is a cache miss
    - Load entry from the database
    - Add entry to cache
    - Return entry
```
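import json

def get_user(self, user_id):
    # cache and db are assumed to be client objects defined elsewhere
    key = "user.{0}".format(user_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: decode the stored JSON
    # Cache miss: load from the database (parameter style depends on the db client)
    user = db.query("SELECT * FROM users WHERE user_id = %s", (user_id,))
    if user is not None:
        cache.set(key, json.dumps(user))
    return user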
```
* Memcached is generally used in this manner.
* Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.
* Each cache miss results in three trips (cache lookup, database read, cache write), which can cause a noticeable delay.
* Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.
* When a node fails, it is replaced by a new, empty node, increasing latency.
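
A self-contained sketch of cache-aside with a TTL, using an in-memory dictionary as a stand-in for Memcached or Redis. `TTLCache`, `FAKE_DB`, and the 60-second TTL are hypothetical, and the clock is injectable only to make the example deterministic.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for a cache server with per-entry expiry."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}            # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]    # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

FAKE_DB = {1: {"name": "alice"}}    # hypothetical data
db_reads = []                        # records every trip to the "database"

def get_user(cache, user_id, ttl_seconds=60):
    key = f"user.{user_id}"
    user = cache.get(key)
    if user is None:                 # cache miss: go to the database
        db_reads.append(user_id)
        user = FAKE_DB.get(user_id)
        if user is not None:
            cache.set(key, user, ttl_seconds)
    return user

now = [0.0]
cache = TTLCache(clock=lambda: now[0])
get_user(cache, 1)      # miss: reads the database
get_user(cache, 1)      # hit: served from the cache
now[0] = 61.0
get_user(cache, 1)      # TTL expired: reads the database again
print(len(db_reads))
```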

#### Write through
* The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:
    * Application adds/updates entry in cache
    * Cache synchronously writes entry to data store
    * Return
    
* Application code
```
set_user(12345, {"foo":"bar"})
```
* Cache Code
```
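import json

def set_user(user_id, values):
    # db and cache are assumed client objects; "data" is a hypothetical JSON column
    db.query("UPDATE users SET data = %s WHERE id = %s", (json.dumps(values), user_id))
    cache.set("user.{0}".format(user_id), json.dumps(values))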
```
* Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast. Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.
* When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
* Most data written might never be read, which can be minimized with a TTL.
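
A runnable sketch of write-through combined with a cache-aside fallback on reads, which is the mitigation mentioned above for new, empty nodes. A plain dictionary stands in for the database, and `WriteThroughCache` is a hypothetical name.

```python
class WriteThroughCache:
    def __init__(self, data_store):
        self._data_store = data_store   # dict-like stand-in for a database
        self._cache = {}

    def set(self, key, value):
        self._data_store[key] = value   # synchronous write to the data store
        self._cache[key] = value        # cache is updated in the same call

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache-aside fallback, e.g. on a freshly started node.
        value = self._data_store.get(key)
        if value is not None:
            self._cache[key] = value
        return value

database = {}
node = WriteThroughCache(database)
node.set("user.12345", {"foo": "bar"})
print(database["user.12345"])           # the write is already in the data store

new_node = WriteThroughCache(database)  # empty cache after failure or scaling
print(new_node.get("user.12345"))       # served through the fallback
```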

#### Write-behind
* In write-behind, the application does the following:
    - Add/update entry in cache
    - Asynchronously write entry to the data store, improving write performance
* There could be data loss if the cache goes down prior to its contents hitting the data store.
* It is more complex to implement write-behind than it is to implement cache-aside or write-through.
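
A simplified sketch of write-behind, where writes are acknowledged once they are in the cache and persisted later. `WriteBehindCache` is a hypothetical name and `flush` is called by hand here; a real system would run it in the background.

```python
from collections import deque

class WriteBehindCache:
    def __init__(self, data_store):
        self._data_store = data_store   # dict-like stand-in for a database
        self._cache = {}
        self._pending = deque()         # writes not yet persisted

    def set(self, key, value):
        self._cache[key] = value
        self._pending.append((key, value))  # acknowledged before it is durable

    def get(self, key):
        return self._cache.get(key)

    def flush(self):
        """Persist queued writes in order and return how many were written."""
        flushed = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._data_store[key] = value
            flushed += 1
        return flushed

database = {}
cache = WriteBehindCache(database)
cache.set("user.1", {"name": "alice"})
print(database)             # still empty: a crash now would lose the write
print(cache.flush())        # number of writes persisted
print(database)
```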

#### Refresh ahead
* You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.
* Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.
* Not accurately predicting which items are likely to be needed in the future can result in worse performance than without refresh-ahead.

### Caching at the database query level
* Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:
    - Hard to delete a cached result with complex queries
    - If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell
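
A minimal sketch of hashing a query into a cache key. `query_cache_key`, `cached_query`, and the fake query runner are hypothetical; the sketch does not attempt invalidation, which is the hard part described above.

```python
import hashlib
import json

def query_cache_key(sql, params=()):
    """Derive a stable cache key from the query text and its parameters."""
    payload = json.dumps([sql, list(params)])
    return "query." + hashlib.sha256(payload.encode("utf-8")).hexdigest()

query_cache = {}

def cached_query(run_query, sql, params=()):
    """Return the cached result for this exact query, running it on a miss."""
    key = query_cache_key(sql, params)
    if key not in query_cache:
        query_cache[key] = run_query(sql, params)
    return query_cache[key]

calls = []

def fake_run_query(sql, params):
    """Stand-in for a real database call; records each execution."""
    calls.append((sql, params))
    return [("alice",)]

cached_query(fake_run_query, "SELECT name FROM users WHERE id = %s", (1,))
cached_query(fake_run_query, "SELECT name FROM users WHERE id = %s", (1,))
print(len(calls))   # the second call was served from the cache
```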
### Web server caching
* Reverse proxies and caches such as Varnish can serve static and dynamic content directly. Web servers can also cache requests, returning responses without having to contact application servers.

### Database Caching
* Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance.

### Application Server cache
* Placing a cache directly on a request layer node enables the local storage of response data.
* The node will quickly return locally cached data if it exists. If it is not in the cache, the requesting node will query the data from disk or network storage.
* If the request layer is expanded to multiple nodes, it’s still quite possible to have each node host its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request will go to different nodes, thus increasing cache misses. To overcome this, use a distributed cache or a global cache.
* In a distributed cache, each node owns part of the cached data. The cache is divided using consistent hashing, so the requesting node knows where to look.
* In a global cache, all nodes use the same single cache space.
* In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. 
* Redis has the following additional features:
    - Persistence option
    - Built-in data structures such as sorted sets and lists
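
The consistent hashing used by a distributed cache can be sketched as a hash ring with virtual nodes. `ConsistentHashRing`, the node names, and the choice of 100 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas   # virtual nodes per physical node
        self._ring = []             # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key):
        """Walk clockwise from the key's hash to the next virtual node."""
        if not self._ring:
            raise ValueError("ring is empty")
        index = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user.{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("cache-b")
after = {k: ring.get_node(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after removing one node")
```

Only keys that lived on the removed node are reassigned; keys on the other nodes stay where they were, which is the property that keeps cache misses low when the cluster changes.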
 
### Content Distribution Network (CDN)
* A CDN is a globally distributed network of proxy servers, serving content from locations closer to the user. The site's DNS resolution will tell clients which server to contact.
* Users receive content at data centers close to them
* Your servers do not have to serve requests that the CDN fulfills
* Useful when a large amount of static media (HTML/CSS/JS, photos, and videos) is served.
* Improves the static component of the website by distributing it across a number of servers around the world.
* A request will first ask the CDN for a piece of static media; the CDN will serve that content if it has it locally available. If it isn’t available, the CDN will query the back-end servers for the file, cache it locally, and serve it to the requesting user.
* CDNs take the burden of serving static media off of your application servers (which are typically optimized for serving dynamic pages rather than static media).
* Write-through cache: Under this scheme, data is written into the cache and the corresponding database at the same time. The cached data allows for fast retrieval and, since the same data gets written in the permanent storage, we will have complete data consistency between the cache and the storage. Also, this scheme ensures that nothing will get lost in case of a crash, power failure, or other system disruptions. Higher latency for write operations.
![](images/write_through.PNG)
* Write-around cache: This technique is similar to write-through cache, but data is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write operations that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will create a “cache miss” and must be read from slower back-end storage and experience higher latency.
![](images/write_around.PNG)
* Write-back cache: Under this scheme, data is written to cache alone and completion is immediately confirmed to the client. The write to the permanent storage is done after specified intervals or under certain conditions. This results in low latency and high throughput for write-intensive applications, however, this speed comes with the risk of data loss in case of a crash or other adverse event because the only copy of the written data is in the cache.
* Cache hit ratio is the number of cache hits divided by the total number of requests for an operation. For efficient utilisation of memory, the cache hit ratio should be high.
* If the data is not critical, wait until a batch of entries has been updated in the cache and then write them to the DB in bulk.
* CDN content might be stale if it is updated before the TTL expires.
* CDNs require changing URLs for static content to point to the CDN.
* Pull CDN
    - Suppose your hosting server is in Boston. You’ve just published your latest travel blog post and your biggest fan in Japan wants to read it. With a pull CDN, the very first time she does, the content isn’t on the CDN. During this first request, the CDN “pulls” the images and so forth to the CDN server nearest your Japanese fan. That could be Tokyo or Hong Kong; whichever it is, the very first time the CDN has to pull the post, so your server and reader won’t see any gain in speed. The second time, however, the CDN has the content loaded and it will be available to everyone who is closest to that Tokyo or Hong Kong CDN server.
    - Typically you don’t have control over how long the pull CDN cache lasts, so if you update a photo or theme, it might take up to 24 hours for all of your readers (and you) to see it. 
    - Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.
    - A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.
    - Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.
* Push CDN
    - Going along with the example above, instead of waiting around for the CDN to pull the content when it’s needed, you simply upload the entire content of your travel blog to the CDN beforehand. That way your pictures, theme files, videos, and the rest are always on the CDN servers around the world.
    - A push CDN can put added strain on your server if it’s underpowered for your traffic, or if you have lots of changing content in a given day. The reason is that pushing all of your data, and any changes as they happen, to the CDN takes work on your server’s part.
    - Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.

    - Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.
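
The three cache write policies described earlier in this list (write-through, write-around, write-back) differ mainly in where a write lands first. A minimal sketch, assuming plain dictionaries stand in for the cache and the database; the class `PolicyCache` and its method names are hypothetical, not taken from any library:

```python
class PolicyCache:
    """Toy cache in front of a dict-backed 'database' (both are stand-ins)."""

    POLICIES = ("write-through", "write-around", "write-back")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.cache = {}      # fast, volatile storage
        self.db = {}         # slow, permanent storage
        self.dirty = set()   # keys written to the cache but not yet to the db

    def write(self, key, value):
        if self.policy == "write-through":
            # Cache and database are updated together before acknowledging.
            self.cache[key] = value
            self.db[key] = value
        elif self.policy == "write-around":
            # Only the database is written; drop any stale cached copy.
            self.db[key] = value
            self.cache.pop(key, None)
        else:  # write-back
            # Only the cache is written now; the database catches up on flush().
            self.cache[key] = value
            self.dirty.add(key)

    def flush(self):
        """Write all pending write-back entries to the database in one batch."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.db[key]     # cache miss: go to the slower store
        self.cache[key] = value
        return value, "miss"
```

With `write-around`, the first read after a write is a miss. With `write-back`, anything written since the last `flush()` exists only in the cache, which is the data-loss risk noted above.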

### Cache eviction policies
* First In First Out (FIFO): The cache evicts the block that was added first, without any regard to how often or how many times it was accessed before.
* Last In First Out (LIFO): The cache evicts the block that was added most recently, without any regard to how often or how many times it was accessed before.
* Time to Live (TTL): Each entry is evicted once its configured lifetime has elapsed.
* Least Recently Used (LRU): Discards the least recently used items first.
* Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
* Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
* Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.

* Redis has features like persistence and built-in data structures such as lists and sets. But if you just need a cache, take Memcached, because it scales like a charm.

* In-memory caches like Memcached and Redis are even faster because the data is stored in RAM.
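
Of the eviction policies listed above, LRU is simple to sketch with the standard library. The example below is a minimal illustration, assuming an `OrderedDict` as the backing store; the `LRUCache` class and its `hit_ratio` helper are hypothetical names, and the ratio follows the definition given earlier (hits divided by total requests).

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # least recently used first, most recent last
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        if key not in self.items:
            return None              # cache miss
        self.hits += 1
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0
```

Swapping `popitem(last=False)` for `popitem(last=True)` would evict the most recently used entry instead, which is the MRU policy from the same list.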

## Concurrency
* Shared state:
    - Everyone can access anything anytime
    - Nondeterministic
    - Using locks, we can introduce determinism
* An asynchronous network is one in which there is no bound on how long messages may take to be delivered by the network or processed by a machine. The important consequence of this property is that there's no way to distinguish between a machine that has failed, and one whose messages are getting delayed.

* A data store is available if and only if all get and set requests eventually return a response that's part of their specification. This does not permit error responses, since a system could be trivially available by always returning an error.

* There is no requirement for a fixed time bound on the response, so the system can take as long as it likes to process a request. But the system must eventually respond.
* A partition is when the network fails to deliver some messages to one or more nodes by losing them (not by delaying them - eventual delivery is not a partition).

* The term is sometimes used to refer to a period during which no messages are delivered between two sets of nodes. This is a more restrictive failure model. We'll call these kinds of partitions total partitions.
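
The shared-state point at the top of this section says that a lock can restore determinism. A minimal sketch, assuming Python threads and a shared counter; the function name `run_counter` is hypothetical:

```python
import threading


def run_counter(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may read-modify-write the counter.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the result is always `n_threads * increments`. Without it, the read-modify-write steps of different threads may interleave, so the final value is not guaranteed.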

### Load Balancer:
* A load balancer is a dispatcher that determines which worker instance will handle a request, based on different policies. The application should ideally be "stateless" so that any worker instance can handle the request.
![](images/1.PNG)

* Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at
    - Preventing requests from going to unhealthy servers
    - Preventing overloading resources
    - Helping eliminate single points of failure
    - SSL termination: Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations. Removes the need to install X.509 certificates on each server.
    - Session persistence: Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions.
* To protect against failures, it's common to set up multiple load balancers, either in active-passive or active-active mode.
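
One of the simplest dispatch policies is round robin combined with health tracking, which covers the first two benefits listed above. This is a minimal sketch under stated assumptions: the `RoundRobinBalancer` class and the server names are hypothetical, and health is set manually rather than by real health checks.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation."""
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

For example, with servers `app1`, `app2`, `app3`, calling `mark_down("app2")` makes subsequent `pick()` calls alternate between `app1` and `app3` only.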

### Scatter and Gather
* The dispatcher multicasts the request to all workers in the pool. Each worker computes a local result and sends it back to the dispatcher, which consolidates them into a single response and sends it back to the client.
* This pattern is used in search engines like Yahoo and Google to handle a user's keyword search request.
![](images/2.PNG)
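
A minimal sketch of scatter and gather, assuming a thread pool plays the role of the worker pool and each worker holds a small in-memory shard of documents. The shard contents and the function names are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards: each worker only knows its own documents.
DOC_SHARDS = [
    {"doc1": "cache eviction policies", "doc2": "load balancer basics"},
    {"doc3": "consistent hashing and cache"},
    {"doc4": "map reduce word count"},
]


def search_shard(shard, keyword):
    """Worker: return ids of local documents containing the keyword."""
    return [doc_id for doc_id, text in shard.items() if keyword in text.split()]


def scatter_gather(keyword):
    """Dispatcher: send the query to every shard, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(DOC_SHARDS)) as pool:
        partials = pool.map(search_shard, DOC_SHARDS, [keyword] * len(DOC_SHARDS))
        return sorted(doc_id for part in partials for doc_id in part)
```

The dispatcher's latency is bounded by the slowest worker, which is why real deployments of this pattern usually add per-worker timeouts.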

### Result cache
* The dispatcher first looks up whether the request has been made before and, if so, returns the previous result in order to save the actual execution.
![](images/3.PNG)
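
The result cache can be sketched as a dictionary keyed by request. This is an illustration only: `CachingDispatcher` is a hypothetical name, requests are assumed to be hashable, and no expiry or eviction is modelled.

```python
class CachingDispatcher:
    def __init__(self, handler):
        self.handler = handler   # the expensive function that does the real work
        self.results = {}        # request -> previously computed result
        self.executions = 0      # how many times the handler actually ran

    def dispatch(self, request):
        if request not in self.results:
            # First time we see this request: execute and remember the result.
            self.executions += 1
            self.results[request] = self.handler(request)
        return self.results[request]
```

Repeated requests are answered from `results`, so `executions` only grows for requests not seen before.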

### Shared Space
* Also known as "Blackboard": all workers monitor information in the shared space and contribute partial knowledge back to the blackboard. The information is continuously enriched until a solution is reached.
* Used in JavaSpace, GigaSpace
![](images/4.PNG)

### Pipe and filter
![](images/5.PNG)

### Map Reduce
* The model targets batch jobs where disk I/O is the major bottleneck. It uses a distributed file system so that disk I/O can be done in parallel.
![](images/6.PNG)
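
The classic illustration of the model is a word count. The sketch below runs in a single process, so it only shows the map, shuffle, and reduce steps; in a real deployment the chunks would come from the distributed file system and the map and reduce tasks would run in parallel on many workers. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```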

### Bulk Synchronous Parallel
* Based on lock-step execution across all workers, coordinated by a master.
* Each worker reads data from its input queue.
* Each worker performs local processing based on the data it read.
* Each worker pushes its local result along its direct connections.
![](images/7.PNG)

### Execution Orchestrator
* This model is based on an intelligent scheduler / orchestrator that schedules ready-to-run tasks (based on a dependency graph) across a cluster of dumb workers.
![](images/8.PNG)
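
A minimal sketch of such an orchestrator, assuming Python 3.9+ and the standard library's `graphlib.TopologicalSorter` to track which tasks are ready. The task names and the `run_tasks` helper are hypothetical, and tasks run sequentially here where a real scheduler would hand each ready task to a worker.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_tasks(dependencies, actions):
    """Run tasks in dependency order.

    dependencies: task name -> set of task names it depends on
    actions:      task name -> zero-argument callable doing the work
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Tasks whose dependencies have all finished are ready to run.
        for task in sorted(sorter.get_ready()):
            actions[task]()
            order.append(task)
            sorter.done(task)   # unblocks tasks that were waiting on this one
    return order
```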



## Data Partitioning
![](images/sharding.PNG)
* Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.
* Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.
* Common ways to shard a table of users are by the user's last name initial or by the user's geographic location.
* You'll need to update your application logic to work with shards, which could result in complex SQL queries.
* Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
    - Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
* A parallel backend means you can do more work simultaneously. You can handle higher user loads, especially when writing data, because there are parallel paths through your system. You can load balance web servers, which access shards over different network paths, which are processed by separate CPUs, which use separate caches of RAM and separate disk IO paths to process work. Very few bottlenecks limit your work.
* Joining data from multiple shards is more complex.
* Sharding adds more hardware and additional complexity.
* We can scale RDBMS using sharding/partitioning
* Scaling reads on an RDBMS is hard; scaling writes is harder still and in practice requires sharding.
* Vertical splitting is partitioning and horizontal splitting is sharding.

* Data partitioning is the process of splitting up a DB/table across multiple machines to improve the manageability, performance, availability, and load balancing of an application.
* The justification for data partitioning is that, after a certain scale point, it is cheaper and more feasible to scale horizontally by adding more machines than to grow it vertically by adding beefier servers.
* Horizontal partitioning:
    - Put different rows into different tables. For example, if we are storing different places in a table, we can decide that locations with ZIP codes less than 10000 are stored in one table and places with ZIP codes of 10000 or greater are stored in a separate table.
    - Also called range-based partitioning, as we are storing different ranges of data in separate tables. Horizontal partitioning is also called data sharding. Example: in Tinder, store all records pertaining to a region in one shard.
    - If the value whose range is used for partitioning isn’t chosen carefully, then the partitioning scheme will lead to unbalanced servers.
   
* Vertical Partitioning
    - we divide our data to store tables related to a specific feature in their own server. For example, if we are building Instagram like application - where we need to store data related to users, photos they upload, and people they follow - we can decide to place user profile information on one DB server, friend lists on another, and photos on a third server.
    - if our application experiences additional growth, then it may be necessary to further partition a feature specific DB across various servers
   
* Directory Based Partitioning
    - Create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code. To find out where a particular data entity resides, we query the directory server that holds the mapping between each tuple key and its DB server. This loosely coupled approach means we can perform tasks like adding servers to the DB pool or changing our partitioning scheme without having an impact on the application.
    - Drawback: the lookup service is a single point of failure and takes on higher load.
* We can use hierarchical sharding (sharding inside a shard). We can also use an index inside a shard (e.g., find all people in San Diego within a given age range).
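
The ZIP code example from the horizontal partitioning notes above can be sketched as a range lookup. The boundaries and shard names below are assumptions chosen for illustration:

```python
import bisect
from collections import Counter

# Lower bounds at which a new range starts; ZIP codes below 10000 go to the first shard.
BOUNDARIES = [10000, 50000]
RANGE_SHARDS = ["places_0", "places_1", "places_2"]   # hypothetical shard names


def shard_for_zip(zip_code):
    """Route a ZIP code to the shard that owns its range."""
    return RANGE_SHARDS[bisect.bisect_right(BOUNDARIES, zip_code)]


def distribution(zip_codes):
    """Count rows per shard, to check how balanced the ranges are."""
    return Counter(shard_for_zip(z) for z in zip_codes)
```

Running `distribution` over real data is a quick way to spot the unbalanced-server problem mentioned above: if most rows share a few ZIP ranges, one shard ends up with most of the load.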

### Partitioning Criteria
* Key or Hash-based partitioning: we apply a hash function to some key attributes of the entity we are storing; that yields the partition number. The fundamental problem with this approach is that it effectively fixes the total number of DB servers, since adding new servers means changing the hash function which would require redistribution of data and downtime for the service. A workaround for this problem is to use Consistent Hashing.

* List Partitioning:  each partition is assigned a list of values, so whenever we want to insert a new record, we will see which partition contains our key and then store it there. For example, we can decide all users living in Iceland, Norway, Sweden, Finland, or Denmark will be stored in a partition for the Nordic countries.

* Round-robin partitioning: This is a very simple strategy that ensures uniform data distribution. With ‘n’ partitions, the ‘i’-th tuple is assigned to partition (i mod n).

* Operations across multiple tables or multiple rows in the same table will no longer run on the same server.
* Join and Denormalization
    - Joins will not be performant, since data has to be compiled from multiple servers. A common workaround for this problem is to denormalize the database so that queries that previously required joins can be performed from a single table. Of course, the service now has to deal with all the perils of denormalization such as data inconsistency.
* Referential Integrity:
    - Trying to enforce data integrity constraints such as foreign keys in a partitioned database can be extremely difficult.
    - Applications that require referential integrity on partitioned databases often have to enforce it in application code. Often in such cases, applications have to run regular SQL jobs to clean up dangling references.
* Rebalancing:
    - There could be many reasons we have to change our partitioning scheme:
    - The data distribution is not uniform, e.g., there are a lot of places for a particular ZIP code that cannot fit into one database partition.
    - There is a lot of load on a partition, e.g., there are too many requests being handled by the DB partition dedicated to user photos.
    - In such cases, either we have to create more DB partitions or have to rebalance existing partitions, which means the partitioning scheme changed and all existing data moved to new locations. Doing this without incurring downtime is extremely difficult. Using a scheme like directory based partitioning does make rebalancing a more palatable experience at the cost of increasing the complexity of the system and creating a new single point of failure (i.e. the lookup service/database).

* Always ask: do I need ACID? Would eventual consistency be a better fit?
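
The hash-based criterion above notes that `hash(key) mod n` effectively fixes the number of servers, and names consistent hashing as a workaround. The sketch below compares the two when a fifth server is added. It is an illustration under assumptions: SHA-256 is used only to get a hash that is stable across runs, 100 virtual points per server is an arbitrary choice, and `HashRing` is a hypothetical class, not a library API.

```python
import bisect
import hashlib


def stable_hash(value):
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


def mod_n_server(key, servers):
    """Plain hash partitioning: the partition number is hash mod n."""
    return servers[stable_hash(key) % len(servers)]


class HashRing:
    def __init__(self, servers, replicas=100):
        # Each server is placed on the ring at `replicas` pseudo-random points.
        self.ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self.points = [point for point, _ in self.ring]

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self.points, stable_hash(key)) % len(self.ring)
        return self.ring[idx][1]


def moved_fraction(assign_before, assign_after, keys):
    """Fraction of keys whose server changes between two assignments."""
    moved = sum(1 for k in keys if assign_before(k) != assign_after(k))
    return moved / len(keys)
```

Going from four to five servers, `mod_n_server` reassigns most keys, while the ring only moves the keys that now belong to the new server, so roughly one fifth of the data has to be transferred.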

## Indexing
* Indexes make it faster to search through the table and find the row or rows that we want. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
* Index is a data structure that can be perceived as a table of contents that points us to the location where actual data lives. So when we create an index on a column of a table, we store that column and a pointer to the whole row in the index.
![](images/indexing1.SVG)

* An index can dramatically speed up data retrieval but may itself be large due to the additional keys, which slow down data insertion & update.
* When adding rows or making updates to existing rows for a table with an active index, we not only have to write the data but also have to update the index. 
* Adding unnecessary indexes on tables should be avoided, and indexes that are no longer used should be removed.
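
The trade-off described above (faster lookups, slower writes) can be seen in a toy table. This is a sketch under assumptions: a Python dict stands in for the index, so it behaves like a hash index and does not support ordered or range access the way a B-tree index would; `IndexedTable` is a hypothetical name.

```python
class IndexedTable:
    def __init__(self, indexed_column):
        self.rows = []                     # the table itself, in insertion order
        self.indexed_column = indexed_column
        self.index = {}                    # column value -> list of row positions

    def insert(self, row):
        # Every write pays twice: once for the row, once for the index entry.
        position = len(self.rows)
        self.rows.append(row)
        self.index.setdefault(row[self.indexed_column], []).append(position)

    def find_scan(self, value):
        """Without the index: inspect every row."""
        return [r for r in self.rows if r[self.indexed_column] == value]

    def find_indexed(self, value):
        """With the index: jump straight to the matching row positions."""
        return [self.rows[p] for p in self.index.get(value, [])]
```

Both lookups return the same rows; the indexed one avoids touching rows that do not match, at the cost of extra memory and extra work on every `insert`.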

## Proxies
* A proxy server is an intermediate server between the client and the back-end server. Clients connect to proxy servers to make a request for a service like a web page, file, connection, etc. In short, a proxy server is a piece of software or hardware that acts as an intermediary for requests from clients seeking resources from other servers.
* Proxies are used to filter requests, log requests, or sometimes transform requests (by adding/removing headers, encrypting/decrypting, or compressing a resource). Another advantage of a proxy server is that its cache can serve a lot of requests.
* Proxies can reside on the client’s local server or anywhere between the client and the remote servers. 
* An open proxy is a proxy server that is accessible by any Internet user. 
* Anonymous Proxy - This proxy reveals its identity as a server but does not disclose the initial IP address. Though this proxy server can be discovered easily, it can be beneficial for some users as it hides their IP address.
* Transparent Proxy - This proxy server again identifies itself, and with the support of HTTP headers, the first IP address can be viewed. The main benefit of using this sort of server is its ability to cache the websites.
* A reverse proxy retrieves resources on behalf of a client from one or more servers. These resources are then returned to the client, appearing as if they originated from the proxy server itself.

## Redundancy and Replication
* Redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system.
* Redundancy plays a key role in removing the single points of failure in the system and provides backups if needed in a crisis.
* Replication means sharing information to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
* Replication often uses a master-slave relationship between the original and the copies. The master gets all the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent updates.
* Active replication: Push
* Passive replication: pull
    - Data not available, read from peer, then store locally
* Master slave replication
![](images/master_slave_Replication.PNG)

* Tree Replication
![](images/Tree_replication.PNG)

* Master-Master replication
![](images/master_master.PNG)

* Buddy Replication
![](images/buddy.PNG)
![](images/buddy1.PNG)


### Database
![](images/db.PNG)

#### RDBMS
* A relational database, typically queried with SQL, is a collection of data items organized in tables.
* ACID is a set of properties of relational database transactions.
    - Atomicity - Each transaction is all or nothing
    - Consistency - Any transaction will bring the database from one valid state to another
    - Isolation - Executing transactions concurrently has the same results as if the transactions were executed serially
    - Durability - Once a transaction has been committed, it will remain so
* There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.

##### Master-Slave Replication
* The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
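
A minimal in-memory sketch of this setup, assuming dictionaries stand in for the database nodes and replication is triggered by an explicit `replicate()` call instead of running continuously. `ReplicatedStore` is a hypothetical name, and for simplicity all reads go to the slaves:

```python
import itertools


class ReplicatedStore:
    def __init__(self, n_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]
        self.pending = []                          # writes not yet replicated
        self._next_slave = itertools.cycle(range(n_slaves))

    def write(self, key, value):
        self.master[key] = value                   # all writes go to the master
        self.pending.append((key, value))

    def replicate(self):
        """Replay pending writes on every slave (asynchronous in real systems)."""
        for key, value in self.pending:
            for slave in self.slaves:
                slave[key] = value
        self.pending.clear()

    def read(self, key):
        # Reads are spread across the slaves in round-robin order.
        slave = self.slaves[next(self._next_slave)]
        return slave.get(key)                      # may be stale until replicate() runs
```

A read issued between `write()` and `replicate()` returns the old value (or `None`), which is the replication lag that the disadvantages listed further down refer to.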

##### Master-master replication
* Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
* You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
* Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.
* Conflict resolution comes more into play as more write nodes are added and as latency increases.

##### Replication Disadvantage
* There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.
* Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.
* The more read slaves, the more you have to replicate, which leads to greater replication lag.
* On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.
* Replication adds more hardware and additional complexity.

#### Federation
![](images/federation.PNG)
* Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you can write in parallel, increasing throughput.
* Federation is not effective if your schema requires huge functions or tables.
* You'll need to update your application logic to determine which database to read and write.
* Joining data from two databases is more complex with a server link.
    - You'll need to use sp_addlinkedserver to create a server link. Once the server link is established, you'll construct the query as normal, just prefixing the database name with the other server's name.
    ```
    -- Join a local table on DB1 with a table on the linked server DB2
    SELECT *
    FROM [MyDatabaseOnDB1].[dbo].[MyTable] tab1
    INNER JOIN [DB2].[MyDatabaseOnDB2].[dbo].[MyOtherTable] tab2
        ON tab1.ID = tab2.ID
    ```
    - Once the link is established, you can also use OPENQUERY to execute a SQL statement on the remote server and transfer only the data back to you. This can be a bit faster, and it will let the remote server optimize your query. If you cache the data in a temporary (or in-memory) table on DB1 in the example above, then you'll be able to query it just like joining a standard table. 
    ```
    -- Fetch data from the other database server
    SELECT *
    INTO #myTempTable
    FROM OPENQUERY([DB2], 'SELECT * FROM [MyDatabaseOnDB2].[dbo].[MyOtherTable]')

    -- Now I can join my temp table to see the data
    SELECT * FROM [MyDatabaseOnDB1].[dbo].[MyTable] tab1
    INNER JOIN #myTempTable tab2 ON tab1.ID = tab2.ID
    ```
* Federation adds more hardware and additional complexity. 

### Denormalization
* Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMS such as PostgreSQL and Oracle support materialized views which handle the work of storing redundant information and keeping redundant copies consistent.
* Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.
* Data is duplicated.
* Constraints can help redundant copies of information stay in sync, which increases complexity of the database design.
* A denormalized database under heavy write load might perform worse than its normalized counterpart.

### SQL Tuning
* It's important to benchmark and profile to simulate and uncover bottlenecks.
* Benchmark - Simulate high-load situations with tools such as ab.
* Profile - Enable tools such as the slow query log to help track performance issues.
    - The slow query log consists of SQL statements that take more than long_query_time seconds to execute and require at least min_examined_row_limit rows to be examined. The slow query log can be used to find queries that take a long time to execute and are therefore candidates for optimization. However, examining a long slow query log can be a time-consuming task. 
* Use CHAR instead of VARCHAR for fixed-length fields.
    - CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving onto the next one.
* Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field results in storing a pointer on disk that is used to locate the text block.
* Use INT for larger numbers up to 2^32 or 4 billion.
* Use DECIMAL for currency to avoid floating point representation errors.
* Avoid storing large BLOBS, store the location of where to get the object instead.
* Set the NOT NULL constraint where applicable to improve search performance. Oracle, for example, does not include rows in a B-tree index when all of the indexed columns are NULL.
* Use good indexing
    - Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
    - Indices are usually represented as self-balancing B-tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.
    - Placing an index can keep the data in memory, requiring more space.
    - Writes could also be slower since the index also needs to be updated.
    - When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
* Avoid expensive joins
    - Denormalize where performance demands it.
* Partition tables
    - Break up a table by putting hot spots in a separate table to help keep it in memory.
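
The DECIMAL tip above is about binary floating point not being able to represent most decimal fractions exactly. A short illustration in Python, whose `decimal.Decimal` type plays the same role as a SQL DECIMAL column; the amounts are arbitrary:

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 are stored as approximations.
float_total = 0.1 + 0.2

# Decimal arithmetic: the values are stored exactly as written.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                 # False
print(decimal_total == Decimal("0.30"))   # True
```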

## SQL vs NoSQL
* Relational databases are structured and have predefined schemas. Non-relational databases are unstructured, distributed, and have a dynamic schema. 
* Relational databases store data in rows and columns. Examples: MySQL, Oracle, MS SQL Server, SQLite, Postgres, and MariaDB.
* Key-Value Stores: Data is stored in an array of key-value pairs. The ‘key’ is an attribute name which is linked to a ‘value’. 
    - Redis, Voldemort, and Dynamo.
    - A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can allow for storing of metadata with a value.
    - Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.
    - Disadvantage of Key-value over table
        - No way to make a column mandatory (equivalent of NOT NULL).
        - No way to use SQL data types to validate entries.
        - No way to ensure that attribute names are spelled consistently.
        - No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
        - Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
* Document Databases: In these databases, data is stored in documents and these documents are grouped together in collections. Each document can have an entirely different structure.
    - CouchDB and MongoDB
    - A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note, many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.
    - Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.
    - Some document stores like MongoDB and CouchDB also provide a SQL-like language to perform complex queries. DynamoDB supports both key-values and documents.
    - Document stores provide high flexibility and are often used for working with occasionally changing data.

* Wide-Column Databases: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows. Unlike relational databases, we don’t need to know all the columns up front and each row doesn’t have to have the same number of columns. Columnar databases are best suited for analyzing large datasets.
    - Cassandra, HBase
    - A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for versioning and for conflict resolution.
* Graph Databases: These databases are used to store data whose relations are best represented in a graph. Data is saved in graph structures with nodes (entities), properties (information about the entities), and lines (connections between the entities).
    - Neo4j, InfiniteGraph
    - In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.
* Reasons for SQL:
    - Structured data
    - Strict schema
    - Relational data
    - Need for complex joins
    - Transactions
    - Clear patterns for scaling
    - More established: developers, community, code, tools, etc
    - Lookups by index are very fast
* Reasons for NoSQL:
    - Semi-structured data
    - Dynamic or flexible schema
    - Non-relational data
    - No need for complex joins
    - Store many TB (or PB) of data
    - Very data intensive workload
    - Very high throughput for IOPS
* Sample data well-suited for NoSQL:
    - Rapid ingest of clickstream and log data
    - Leaderboard or scoring data
    - Temporary data, such as a shopping cart
    - Frequently accessed ('hot') tables
    - Metadata/lookup tables
![](images/nosql.PNG)

* SQL stores data in tables where each row represents an entity and each column represents a data point. NoSQL databases have different data storage models. The main ones are key-value, document, graph, and columnar.
* In SQL, each record conforms to a fixed schema, meaning the columns must be decided and chosen before data entry and each row must have data for each column. The schema can be altered later, but it involves modifying the whole database and going offline. In NoSQL, schemas are dynamic. Columns can be added on the fly and each ‘row’ (or equivalent) doesn’t have to contain data for each ‘column.’
* SQL databases are vertically scalable. It is possible to scale a relational database across multiple servers, but this is a challenging and time-consuming process. The usual approach is master-slave replication (read from slaves, write to master) plus upgrading the master server by adding RAM, RAM and more RAM. NoSQL databases are horizontally scalable, meaning we can easily add more servers to our NoSQL database infrastructure to handle a lot of traffic.
* The vast majority of relational databases are ACID compliant. So, when it comes to data reliability and safe guarantee of performing transactions, SQL databases are still the better bet. Most of the NoSQL solutions sacrifice ACID compliance for performance and scalability.
* We need to ensure ACID compliance. ACID compliance reduces anomalies and protects the integrity of your database by prescribing exactly how transactions interact with the database. E-commerce and financial application needs ACID compliance.
* Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers and if you’re only working with data that is consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.
* Storing large volumes of data that often have little to no structure. A NoSQL database sets no limits on the types of data we can store together and allows us to add new types as the need changes. With document-based databases, you can store data in one place without having to define what “types” of data those are in advance.
* Cloud-based storage is an excellent cost-saving solution but requires data to be easily spread across multiple servers to scale up. 
* NoSQL is extremely useful for rapid development as it doesn’t need to be prepped ahead of time. It suits quick iterations of a system that require frequent updates to the data structure without a lot of downtime between versions.
* In NoSQL, joins need to be done in your application code.
* In NoSQL, all relevant data is kept in the same place (denormalized), so retrieval is fast and joins are not needed.
* The schema is flexible in NoSQL.
* Horizontal partitioning is built in, which helps scaling.
* NoSQL is often a poor fit for update-heavy workloads, since denormalized data may need to be changed in several places, and ACID is generally not guaranteed.
* Relations are not implicit in NoSQL.
* NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores lack true ACID transactions and favor eventual consistency.
* BASE is often used to describe the properties of NoSQL databases. In terms of the CAP theorem, BASE chooses availability over consistency.
    - Basically available - the system guarantees availability.
    - Soft state - the state of the system may change over time, even without input.
    - Eventual consistency - the system will become consistent over a period of time, given that the system doesn't receive input during that period.

## CAP
* It is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance.
* Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.
* Availability: Every request gets a response, whether it reports success or failure. Availability is achieved by replicating the data across different servers.
* Partition tolerance: The system continues to work despite message loss or partial failure.
* To be consistent, all nodes should see the same set of updates in the same order. But if the network suffers a partition, updates in one partition might not make it to the other partitions before a client reads from the out-of-date partition after having read from the up-to-date one. The only thing that can be done to cope with this possibility is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available.
* Consistency/Partition Tolerance - Wait for a response from the partitioned node which could result in a timeout error. The system can also choose to return an error, depending on the scenario you desire. Choose Consistency over Availability when your business requirements dictate atomic reads and writes.

## Asynchronous processing and Queue
* Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. They can also help by doing time-consuming work in advance, such as periodic aggregation of data.
* Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous operations, as introducing queues can add delays and complexity.

#### Message Queue
* Asynchronous service to service communication used in serverless and microservice architecture.
* Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:

* An application publishes a job to the queue, then notifies the user of job status
* A worker picks up the job from the queue, processes it, then signals the job is complete
* The user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all of your followers.
* Redis is useful as a simple message broker but messages can be lost.
* RabbitMQ is popular but requires you to adapt to the 'AMQP' protocol and manage your own nodes.
* Amazon SQS is hosted but can have high latency and has the possibility of messages being delivered twice.
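
The publish/worker workflow above can be sketched in-process. This is only an illustration: `queue.Queue` stands in for a real broker such as RabbitMQ or SQS, and the names `publish`, `worker`, `status`, and `run_demo` are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for a real message broker (assumption)
status = {}            # hypothetical job status store: job_id -> "queued" | "done"

def publish(job_id, task):
    """Called on the request path: enqueue the job and return immediately."""
    status[job_id] = "queued"
    jobs.put((job_id, task))
    return "queued"    # the user is told the job was accepted, not that it finished

def worker():
    """Background consumer: process jobs until the None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        job_id, task = item
        task()                      # the slow work happens off the request path
        status[job_id] = "done"     # signal completion

def run_demo():
    thread = threading.Thread(target=worker)
    thread.start()
    reply = publish("job-1", lambda: sum(range(100_000)))
    jobs.put(None)                  # stop the worker after the queued job
    thread.join()
    return reply, status["job-1"]
```

A real broker adds durability and delivery guarantees that this in-memory queue does not have.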

#### Task Queue
* Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally-intensive jobs in the background.
* Celery has support for scheduling and primarily has Python support.

#### Back pressure
* If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a server busy or HTTP 503 status code to try again later.
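
A minimal sketch of back pressure, assuming a bounded in-memory queue: once the queue is full, new jobs are rejected with a 503-style status instead of being buffered. The class name `BoundedJobQueue` and the choice of 202 for accepted jobs are illustrative assumptions.

```python
import queue

class BoundedJobQueue:
    """Accept jobs until `max_size` is reached, then push back on clients."""

    def __init__(self, max_size):
        self._q = queue.Queue(maxsize=max_size)

    def submit(self, job):
        """Return an HTTP-style status: 202 if accepted, 503 if the queue is full."""
        try:
            self._q.put_nowait(job)
            return 202
        except queue.Full:
            return 503   # client should retry later, ideally with backoff

    def take(self):
        """Remove and return the oldest job (raises queue.Empty if none)."""
        return self._q.get_nowait()
```

Clients that receive the busy status are expected to retry later, which keeps the queue small enough to stay in memory.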



* Slow operations must be done asynchronously, otherwise the user is stuck waiting for the processing to complete.
* For processing you'd like to perform inline with a request but that is too slow, the easiest solution is to create a message queue (e.g. RabbitMQ). Message queues allow your web applications to quickly publish messages to the queue and have other consumer processes perform the processing outside the scope and timeline of the client request.
    * Perform almost no work in the consumer (merely scheduling a task) and inform your user that the task will occur offline.
    * Perform enough work in-line to make it appear to the user that the task has completed, and tie up hanging ends afterwards (posting a message on Twitter or Facebook likely follows this pattern by updating the tweet/message in your timeline but updating your followers' timelines out of band).
* Message queues also allow you to create a separate machine pool for performing off-line processing rather than burdening your web application servers.
* AP - Availability/Partition Tolerance - Return the most recent version of the data you have, which could be stale. This system state will also accept writes that can be processed later when the partition is resolved. Choose Availability over Consistency when your business requirements allow for some flexibility around when the data in the system synchronizes. Availability is also a compelling option when the system needs to continue to function in spite of external errors 
* Another asynchronous paradigm is doing the time-consuming work in advance and serving the finished work with a low request time. Very often this paradigm is used to turn dynamic content into static content. Pages of a website, maybe built with a massive framework or CMS, are pre-rendered and locally stored as static HTML files on every change. Imagine the scalability of your website if the script uploaded these pre-rendered HTML pages to AWS S3, CloudFront, or another content delivery network.
* A user comes to your website and starts a very computing-intensive task which would take several minutes to finish. The frontend of your website sends a job onto a job queue and immediately signals back to the user: your job is in progress, please continue to browse the page. The job queue is constantly checked by a bunch of workers for new jobs. If there is a new job, a worker does the job and after some minutes sends a signal that the job is done. The frontend, which constantly checks for new “job is done” signals, sees that the job is done and informs the user about it.

## Networking Metrics
* Bandwidth: the maximum amount of data that can be transferred per unit of time (bits per second).
* Throughput: the actual amount of data transferred per unit of time.
* Latency: how long data takes to go from one end to the other, i.e. the delay between the sender sending information and the receiver receiving it. (Data compression can help with latency.)

## MapReduce
* Map takes input data and emits key-value pairs.
* Reduce takes a key and the set of values associated with it, reduces them in some way, and emits a new key-value pair.
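
The classic word-count example illustrates the two phases. This is a single-process sketch; the function names are illustrative and the `shuffle` step stands in for the grouping a real MapReduce framework performs between map and reduce.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single (key, total) pair."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster, the map calls run in parallel on different machines and each reducer receives only the keys assigned to it.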

## Platform Server
* Decouple web server and db
* When we need to add a new API, we can add platform servers without adding unnecessary capacity to the web application tier. Platform servers can be specialized, with specialized hardware that boosts efficiency for that type of platform server.
* A platform layer is a way to reuse your infrastructure for multiple products or interfaces (a web application, an API, an iPhone app, etc.) without writing too much redundant boilerplate code for dealing with caches, databases, etc.

### Application Layer
* Separating out the web layer from the application layer (also known as platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. Small teams with small services can plan more aggressively for rapid growth.
![](images/application_server.PNG)
## Consistent Hashing
* Typical hash function is NOT horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken. It will be a pain point in maintenance if the caching system contains lots of data. Practically, it becomes difficult to schedule a downtime to update all caching mappings.

* Centralised component: A component in the application architecture that cannot be scaled out puts an upper limit on the number of requests that the entire architecture or request pipeline can handle.
* High latency component: A slow component in the request pipeline puts a lower limit on the response time of the application. The usual fix is to turn high latency components into background jobs or to execute them asynchronously with queuing.
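
The remapping problem described at the top of this section can be made concrete by comparing modulo hashing with a hash ring. The sketch below is an assumption-laden illustration, not a production design: MD5 is used only as a convenient stable hash, and `HashRing`, `mod_lookup`, `moved_fraction`, and the `vnodes` count are hypothetical choices.

```python
import bisect
import hashlib

def _hash(key):
    """Stable hash of a string (Python's built-in hash() varies between runs)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def mod_lookup(key, hosts):
    """Typical scheme: hash modulo the number of hosts."""
    return hosts[_hash(key) % len(hosts)]

class HashRing:
    def __init__(self, hosts, vnodes=100):
        # Place every host on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

def moved_fraction(lookup_before, lookup_after, keys):
    """Fraction of keys whose host changes between two lookup functions."""
    moved = sum(1 for key in keys if lookup_before(key) != lookup_after(key))
    return moved / len(keys)
```

When a fifth host joins four existing ones, modulo hashing remaps most keys, while the ring moves only the keys that now belong to the new host (about one fifth in expectation).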

## Thread pool size
* Thread pool size determines the number of concurrent requests that can be handled at any given time. If a web application gets more requests than the thread pool size, excess requests are either queued or rejected.
* Concurrent requests are the requests in progress at a given time, of which only a few may actually be running on CPUs at any instant. Parallel requests are the requests that are all running on CPUs at the same instant.
* In non-blocking IO applications such as NodeJS, a single thread (process) can handle multiple requests concurrently. On multi-core machines, parallel requests can be handled by increasing the number of threads or processes.
* In blocking IO applications such as Java Spring MVC, a single thread can handle only one request at a time. To handle more than one request concurrently, we have to increase the number of threads.
* In CPU-bound applications, thread pool size should equal the number of CPUs on the box. Adding more threads would interrupt request processing through thread context switching and increase response time.
* Little's Law:
    - The average number of threads in a system (Threads) equals the average web request arrival rate (WebRequests per sec) multiplied by the average response time (ResponseTime).
    - Threads = (request/sec) * ResponseTime
* In most web applications, a few types of web request take much longer to process than the others. The slower web requests might hog all threads and bring down the entire application.
    - A couple of ways to handle this issue are
        - to have separate box to handle slow web requests.
        - to allocate a separate thread pool for slow web requests within the same application.
* Application with blocking IO
    - The CPU processes, then waits for IO. If we have other instances of the process (or other threads), we can context switch to one of them while the previous one is waiting for IO.
* Application with non-blocking IO
    - If we treat processing bursts as events and queue the events to be executed on a single thread, we can keep the thread (CPU) occupied, maximising CPU usage and avoiding unnecessary context switching.
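
Little's Law from earlier in this section (Threads = (request/sec) * ResponseTime) translates directly into a sizing helper. The function names and the `headroom` multiplier are assumptions added for illustration; the law itself only gives the average.

```python
import math

def required_threads(requests_per_sec, avg_response_time_sec):
    """Little's Law: average concurrency = arrival rate * average time in system."""
    return requests_per_sec * avg_response_time_sec

def pool_size(requests_per_sec, avg_response_time_sec, headroom=1.0):
    """Round up to whole threads; `headroom` is an assumed safety multiplier."""
    return math.ceil(required_threads(requests_per_sec, avg_response_time_sec) * headroom)
```

For example, 200 requests per second with an average response time of 0.5 seconds keeps 100 threads busy on average, so a pool smaller than that would queue or reject requests.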

## Long polling vs WebSockets vs Server-sent events
* These are communication techniques between a client, such as a web browser, and a web server.

### AJAX polling
- The client repeatedly polls the server for data.
- The client makes a request and waits for the server to respond with data. If no data is available, an empty response is sent back.
- The client has to keep asking for updates, which creates HTTP overhead.
![](images/AJAX_polling.SVG)

### HTTP Long Polling
* A variation of the traditional polling technique that allows the server to push information to a client whenever the data is available.
* With Long-Polling, the client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. That’s why this technique is sometimes referred to as a “Hanging GET”.
* If the server does not have any data available for the client, instead of sending an empty response, the server holds the request and waits until some data becomes available.
* The client makes an initial request using regular HTTP and then waits for a response.
* The server delays its response until an update is available or a timeout has occurred.
* When an update is available, the server sends a full response to the client.
* The client typically sends a new long-poll request, either immediately upon receiving a response or after a pause to allow an acceptable latency period.
* Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed due to timeouts.
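
The hold-until-data-or-timeout behaviour can be sketched without any network code. In this illustration a blocking `queue.Queue.get` plays the role of the hanging GET; `LongPollChannel`, `handle_poll`, and `client_loop` are hypothetical names, and a real server would hold an HTTP request open instead.

```python
import queue

class LongPollChannel:
    """In-process stand-in for a server endpoint that holds requests open."""

    def __init__(self):
        self._updates = queue.Queue()

    def publish(self, update):
        self._updates.put(update)

    def handle_poll(self, timeout_sec):
        """Block until an update arrives or the timeout expires (returns None)."""
        try:
            return self._updates.get(timeout=timeout_sec)
        except queue.Empty:
            return None

def client_loop(channel, max_polls, timeout_sec):
    """Re-issue the poll after every response or timeout, collecting updates."""
    received = []
    for _ in range(max_polls):
        update = channel.handle_poll(timeout_sec)
        if update is not None:
            received.append(update)
    return received
```

Compared with plain polling, the client sends far fewer requests when updates are rare, because each request waits on the server instead of returning empty.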

### WebSockets
* WebSocket provides full-duplex communication channels over a single TCP connection. It provides a persistent connection between a client and a server that both parties can use to start sending data at any time.
* The client establishes a WebSocket connection through a process known as the WebSocket handshake.
* The WebSocket protocol enables communication between a client and a server with lower overheads, facilitating real-time data transfer from and to the server. This is made possible by providing a standardized way for the server to send content to the browser without being asked by the client and allowing for messages to be passed back and forth while keeping the connection open.

### Server sent events
* Under SSEs the client establishes a persistent and long-term connection with the server. The server uses this connection to send data to a client. If the client wants to send data to the server, it would require the use of another technology/protocol to do so.
* Client requests data from a server using regular HTTP.
* The requested webpage opens a connection to the server.
* The server sends the data to the client whenever there’s new information available.
* SSEs are best when we need real-time traffic from the server to the client or if the server is generating data in a loop and will be sending multiple events to the client.
![](images/Server_sent_Events.SVG)

## Microservices
* Microservices can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal. Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
* Microservices can add complexity in terms of deployments and operations.
* Monolithic
    - Any change in code requires redeploying and monitoring the entire application.
    - Faster: no RPC overhead between components.
    - Developers have to understand the whole system.
    - A failure can bring the entire application down.
    - Stack Overflow is monolithic
* Microservice
    - Easy to scale
    - Easier for new team members to get started.
    - Servers can be added only for the parts that are used more.
* An engineering approach focused on decomposing applications into single-function modules with well-defined interfaces, which are independently deployed and operated by small teams who own the entire life cycle of the service.
    - Decomposing: dividing one monolithic application into small services called microservices.
* Instances of a microservice are created when needed and thrown away when no longer needed.
* A service can be updated without affecting others.
* Components are presented as services. A component can be updated without affecting others, and each service can be written in any language.
* Data is distributed. In a monolith, a single database serves the application. In microservices, each service manages its own data, so a service can use the database system best suited to its task.
* An API gateway is used to communicate with microservices from the outside world.
* There is a registry that keeps track of which microservices are running and what their instances are. A microservice asks the registry when it wants to contact another microservice.

### Service Discovery
* Systems such as Consul, Etcd, and Zookeeper can help services find each other by keeping track of registered names, addresses, and ports. Health checks help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built in key-value store that can be useful for storing config values and other shared data.

## API design
* Naming is important
* Accurate parameter
* Make response concise and to the point
* Define error properly
* API should not have side effect (Doing everything in one place)
* Pagination and fragmentation is useful when the response of API is huge.
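
As a small illustration of the pagination point, the sketch below slices a large result into pages and returns the metadata a client needs to fetch the next one. The function name, the 1-based `page` parameter, and the response field names are assumptions, not a standard.

```python
def paginate(items, page, page_size):
    """Return one page of `items` plus metadata for continuing the listing."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    total_pages = (len(items) + page_size - 1) // page_size   # ceiling division
    return {
        "items": items[start:start + page_size],
        "page": page,
        "total_pages": total_pages,
        "has_next": page < total_pages,
    }
```

Offset-based paging like this is simple, but cursor-based paging is often preferred when the underlying data changes between requests.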

## Distributed Consensus
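* Multiple nodes agree on some value.
* Commonly cited protocols include 2PC and 3PC (atomic commit across nodes). MVCC is often listed alongside them, but it is a concurrency-control technique rather than a consensus protocol.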

## Hyperloglog
* Find all unique active connections 
```
SELECT DISTINCT userid from connections WHERE status = 'active'
```
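* This is an O(N) operation with O(N) space complexity. HyperLogLog instead estimates the number of distinct values in a small, fixed amount of memory, at the cost of a small error.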
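
The sketch below shows the idea behind HyperLogLog under simplifying assumptions: SHA-1 truncated to 64 bits as the hash, 2^10 registers, and only the small-range (linear counting) correction. It is meant to illustrate the technique, not to match any particular library's implementation; the class and parameter names are hypothetical.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                       # the first p hash bits select a register
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")  # 64-bit hash
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        """Estimate the number of distinct items added so far."""
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)   # linear counting
        return estimate
```

With 1024 registers the memory used is fixed regardless of how many user ids are added, and the typical relative error is a few percent. Adding the same id again never changes the estimate, which is what makes it suitable for counting unique connections.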

## Redis
* In-memory key-value store
* Also persistent
* Types: list, set, sorted set, hash tables, append-able buffers
* All data is eventually persisted (synchronized to disk); pick your risk level versus persistence cost.
* Most basic operations are O(1).
* Ideal for write-heavy workloads
* Supports atomic operations
* Supports transactions
* Has pub/sub functionality
* If the database is big, older Redis versions could handle swapping on their own (a virtual memory feature that has since been removed): keys remained in memory and the least used values were swapped to disk.
* List operations allow push and pop at both ends; random access and ranges are O(N).
* Clients can subscribe to channels and receive notifications when messages are sent to those channels. Subscribing is O(1) and posting a message is O(n). Ex. chats, comet applications, real time analytics, twitter

### Memcached
* No persistence
* No redundancy / no fail-over
* No replication
* No active cleanup; items are evicted (LRU) only when more space is needed


### Consistency

#### Weak Consistency
* After a write, reads may or may not see it.
* This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.
* Consistency when we use cache is weak consistency
* Real time multiplayer game

#### Eventual Consistency
* After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
* This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
* Examples: email, search engine indexing, SimpleDB, and Amazon S3 (historically; S3 has offered strong read-after-write consistency since late 2020)

#### Strong Consistency
* After a write, reads will see it. Data is replicated synchronously.
* This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
* Datastore, File system, RDBMS

### Availability
#### Fail-over
* Active-Passive: Heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service. The length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.
* Active-Active: In active-active, both servers are managing traffic, spreading the load between them. If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
* Fail-over adds more hardware and additional complexity.
* There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.

#### Replication

* Availability is counted in uptime, the % of time the service is available. It is measured in 9s: 99.99% means four-nines availability.
* Overall availability depends on how the failure-prone components are laid out: in sequence or in parallel.

* Availability in parallel:
    - Overall availability increases.
    - Availability(Total) = 1 - (1 - Availability(Foo)) * (1 - Availability(Bar))
* Availability in sequence:
    - Overall availability decreases when two components are in sequence.
    - Availability(Total) = Availability(Foo) * Availability(Bar)
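
The two formulas above generalize to any number of components. A minimal sketch, with illustrative function names:

```python
def availability_in_sequence(*components):
    """All components must be up: multiply their availabilities."""
    total = 1.0
    for availability in components:
        total *= availability
    return total

def availability_in_parallel(*components):
    """The system is down only if every component is down at once."""
    all_down = 1.0
    for availability in components:
        all_down *= (1.0 - availability)
    return 1.0 - all_down
```

For two components at 99.9% each, the sequence works out to about 99.8% and the parallel layout to about 99.9999%, assuming failures are independent.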

### DNS
![](images/DNS.JPG)

* DNS translates a domain name such as www.example.com to an IP address.
* Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. Lower level DNS servers cache mappings, which could become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a certain period of time, determined by the time to live (TTL).
* Some DNS services can route traffic through various methods:
    - Weighted Round robin
        - Prevent traffic from going to servers under maintenance
        - Balance between varying cluster sizes
        - A/B testing
    - Latency based
    - Geolocation based
* Accessing a DNS server introduces a slight delay, although mitigated by caching
* DNS services have recently come under DDoS attack, preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es).
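
Weighted round robin, listed among the routing methods above, can be sketched as a repeating schedule in which each server appears as many times as its weight. The function name and the integer-weight representation are assumptions; a weight of 0 models a server taken out for maintenance.

```python
import itertools

def weighted_round_robin(weights):
    """Yield server names in proportion to their integer weights, forever."""
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(schedule)
```

This naive schedule sends consecutive requests to the same server before moving on; real implementations usually interleave servers more smoothly.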

### OSI model
![](images/OCI.JPG)
* The OSI (Open Systems Interconnection) model separates network functions into seven abstracted layers.
* At each layer there are standards that define how data is packaged and transported.

#### Internet Protocol (IP)
* IP operates at the network (internetwork) layer (Layer 3). Its PDUs are called packets, and IP is responsible for delivering them from an origin host to a destination host, usually across the boundaries between the multiple smaller networks that make up the Internet. Each device that is directly connected to the Internet has a unique IP address, which is used to locate the device as the recipient of packets.

#### Transmission Control Protocol (TCP) 
* TCP operates at the transport layer (Layer 4). TCP effectively creates a virtual connection between the host where the browser is running and the host where a server application is running. Because of the unreliable nature of networks, IP packets can be lost, corrupted, or arrive out of order. TCP has mechanisms for correcting these errors, transforming the stream of IP packets into a reliable communication channel. Each application is assigned a unique TCP port number to enable delivery to the correct application on hosts where many applications are running.
* TCP is a connection-oriented protocol over an IP network. A connection is established and terminated using a handshake. All packets sent are guaranteed to reach the destination in the original order and without corruption through:
    - Sequence numbers and checksum fields for each packet
    - Acknowledgement packets and automatic retransmission
* If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements flow control and congestion control. These guarantees cause delays and generally result in less efficient transmission than UDP.
* To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and say, a memcached server. Connection pooling can help in addition to switching to UDP where applicable.
* A connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required.
* TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers, database info, SMTP, FTP, and SSH.
* Use TCP over UDP when:
    - You need all of the data to arrive intact
    - You want to automatically make a best estimate use of the network throughput
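
The connection pool mentioned above can be sketched as a fixed set of connections handed out and returned through a queue. This is a minimal illustration: `ConnectionPool` and `create_connection` are hypothetical names, and the pool works with any factory that returns a connection-like object.

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, create_connection, size):
        # Open all connections up front so requests never pay the setup cost.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(create_connection())

    @contextmanager
    def connection(self, timeout_sec=None):
        conn = self._idle.get(timeout=timeout_sec)   # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)                     # hand it back for reuse
```

Callers borrow a connection with `with pool.connection() as conn:`; because the pool size is fixed, it also caps the number of open TCP connections to the backend.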
    
#### User Datagram Protocol (UDP)
* UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP provides, UDP is generally more efficient.
* UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer games.
* Use UDP over TCP when:
    - You need the lowest latency
    - Late data is worse than loss of data
    - You want to implement your own error correction
#### Hypertext Transfer Protocol (HTTP) 
* HTTP operates at the application layer (Layer 7). It defines how data is encoded for communication between web browsers and web servers.
* HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.
* A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:
![](images/http_verb.PNG)
* Idempotent: Can be called many times without different outcomes.
* HTTP is an application layer protocol relying on lower-level protocols such as TCP and UDP.
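
The idempotency definition above can be illustrated with an in-memory resource store. This sketch only models the semantics usually associated with PUT and POST; `ResourceStore` and its methods are hypothetical names, not part of any HTTP library.

```python
class ResourceStore:
    def __init__(self):
        self._items = {}
        self._next_id = 1

    def put(self, item_id, value):
        """PUT-like: replace the resource at a client-chosen id (idempotent)."""
        self._items[item_id] = value
        return item_id

    def post(self, value):
        """POST-like: create a new resource on every call (not idempotent)."""
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = value
        return item_id

    def count(self):
        return len(self._items)
```

Repeating the same `put` leaves the store unchanged after the first call, so a client can safely retry it after a timeout, whereas repeating a `post` creates a duplicate each time.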


### Reverse Proxy
* A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill it before the reverse proxy returns the server's response to the client.
* Increased security - Hide information about backend servers, blacklist IPs, limit number of connections per client
* Increased scalability and flexibility - Clients only see the reverse proxy's IP, allowing you to scale servers or change their configuration
* SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations
    - Removes the need to install X.509 certificates on each server
* Compression - Compress server responses
* Caching - Return the response for cached requests
* Static content - Serve static content directly, HTML/CSS/JS, Photos, Videos
* A proxy server is a go‑between or intermediary server that forwards requests for content from multiple clients to different servers across the Internet. A reverse proxy server is a type of proxy server that typically sits behind the firewall in a private network and directs client requests to the appropriate backend server. A reverse proxy provides an additional level of abstraction and control to ensure the smooth flow of network traffic between clients and servers.
* Common use of reverse proxy server
    - Load balancing : A reverse proxy server can act as a “traffic cop,” sitting in front of your backend servers and distributing client requests across a group of servers in a manner that maximizes speed and capacity utilization while ensuring no one server is overloaded, which can degrade performance. If a server goes down, the load balancer redirects traffic to the remaining online servers.
    - Web acceleration : Reverse proxies can compress inbound and outbound data, as well as cache commonly requested content, both of which speed up the flow of traffic between clients and servers. They can also perform additional tasks such as SSL encryption to take load off of your web servers, thereby boosting their performance.
    - Security and anonymity : By intercepting requests headed for your backend servers, a reverse proxy server protects their identities and acts as an additional defense against security attacks. It also ensures that multiple servers can be accessed from a single record locator or URL regardless of the structure of your local area network.
    - Increased security – No information about your backend servers is visible outside your internal network, so malicious clients cannot access them directly to exploit any vulnerabilities. Many reverse proxy servers include features that help protect backend servers from distributed denial-of-service (DDoS) attacks, for example by rejecting traffic from particular client IP addresses (blacklisting), or limiting the number of connections accepted from each client.
    - Increased scalability and flexibility – Because clients see only the reverse proxy’s IP address, you are free to change the configuration of your backend infrastructure. This is particularly useful in a load-balanced environment, where you can scale the number of servers up and down to match fluctuations in traffic volume.
    - Compression – Compressing server responses before returning them to the client (for instance, with gzip) reduces the amount of bandwidth they require, which speeds their transit over the network.
    - SSL termination – Encrypting the traffic between clients and servers protects it as it crosses a public network like the Internet. But decryption and encryption can be computationally expensive. By decrypting incoming requests and encrypting server responses, the reverse proxy frees up resources on backend servers which they can then devote to their main purpose, serving content.
    - Caching – Before returning the backend server’s response to the client, the reverse proxy stores a copy of it locally. When the client (or any client) makes the same request, the reverse proxy can provide the response itself from the cache instead of forwarding the request to the backend server. This both decreases response time to the client and reduces the load on the backend server.
    
#### Load balancer vs reverse proxy
* A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server’s response to the client.
* A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client.
* Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function. Reverse proxies, by contrast, can be useful even with just one web server or application server.
* Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing
* Introducing a reverse proxy results in increased complexity.
* A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e. a failover) further increases complexity.
* Load balancers are most commonly deployed when a site needs multiple servers because the volume of requests is too much for a single server to handle efficiently. Deploying multiple servers also eliminates a single point of failure, making the website more reliable. Most commonly, the servers all host the same content, and the load balancer’s job is to distribute the workload in a way that makes the best use of each server’s capacity, prevents overload on any server, and results in the fastest possible response to the client.
* A load balancer can also enhance the user experience by reducing the number of error responses the client sees. It does this by detecting when servers go down, and diverting requests away from them to the other servers in the group. In the simplest implementation, the load balancer detects server health by intercepting error responses to regular requests. Application health checks are a more flexible and sophisticated method in which the load balancer sends separate health-check requests and requires a specified type of response to consider the server healthy.
* Another useful function provided by some load balancers is session persistence, which means sending all requests from a particular client to the same server. Even though HTTP is stateless in theory, many applications must store state information just to provide their core functionality – think of the shopping basket on an e-commerce site. Such applications underperform or can even fail in a load-balanced environment, if the load balancer distributes requests in a user session to different servers instead of directing them all to the server that responded to the initial request.
* Whereas deploying a load balancer makes sense only when you have multiple servers, it often makes sense to deploy a reverse proxy even with just one web server or application server. You can think of the reverse proxy as a website’s “public face.” Its address is the one advertised for the website, and it sits at the edge of the site’s network to accept requests from web browsers and mobile apps for the content hosted at the website.
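
The distribution and health-check behaviour described above can be sketched in a few lines. This is a minimal illustration, not how NGINX or HAProxy are implemented: the class name `RoundRobinBalancer`, its methods, and the round-robin policy are assumptions made for the example, and in a real deployment health state would be driven by error responses or dedicated health-check requests.

```python
from itertools import count


class RoundRobinBalancer:
    """Rotate over backend servers, skipping any that are marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()

    def mark_down(self, server):
        # Called when a health check (or an error response) flags a server.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy backends available")
        while True:
            candidate = self.servers[next(self._counter) % len(self.servers)]
            if candidate in self.healthy:
                return candidate


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.pick() for _ in range(3)])
balancer.mark_down("app-2")  # requests are now diverted away from app-2
print([balancer.pick() for _ in range(4)])
```

Session persistence would replace the rotation in `pick` with a lookup keyed on the client (for example a cookie value), so that one client always lands on the same server.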

### RPC (Remote Procedure Call)
![](images/RPC1.PNG)
* RPC is a mechanism that lets you call a procedure in another process and exchange data by message passing.
* A client causes a procedure to execute in a different address space, usually on a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls.
* Popular RPC frameworks include gRPC (built on Protobuf), Thrift, and Avro.
* It typically involves generating method stubs in the client process that make the call appear local, but behind the stub is logic to marshal the request and send it to the server process. The server process then unmarshals the request and invokes the desired method before repeating the process in reverse to get whatever the method returns back to the client.
* RPC is a request-response protocol:
    - Client program - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.
    - Client stub procedure - Marshals (packs) procedure id and arguments into a request message.
    - Client communication module - OS sends the message from the client to the server.
    - Server communication module - OS passes the incoming packets to the server stub procedure.
    - Server stub procedure - Unmarshals the request message, calls the server procedure matching the procedure id and passes the given arguments.
    - The server response repeats the steps above in reverse order.  
* Sample RPC

```
GET /someoperation?data=anId

POST /anotheroperation
{
  "data": "anId",
  "anotherdata": "another value"
}
```
* RPC clients become tightly coupled to the service implementation.
* A new API must be defined for every new operation or use case.
* Remote Method Invocation (RMI) is closely related to RPC, but it takes remote invocation a step further by making it object oriented and providing the capability to keep references to remote objects and invoke their methods.
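
The request-response steps listed earlier in this section (client stub, communication modules, server stub) can be sketched in-process. This is a simplified illustration under stated assumptions: JSON stands in for the wire format, the `transport` function stands in for the OS communication modules (a real one would use a socket), and all names such as `PROCEDURES` and `client_stub` are hypothetical.

```python
import json

# Server side: procedures registered by id (hypothetical registry).
PROCEDURES = {"add": lambda a, b: a + b}


def server_stub(request_bytes):
    """Unmarshal the request, call the matching procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES[request["procedure"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    """Stand-in for the client/server communication modules."""
    return server_stub(request_bytes)


def client_stub(procedure_id, *args):
    """Marshal the procedure id and arguments, send them, unmarshal the reply."""
    message = json.dumps({"procedure": procedure_id, "args": list(args)})
    reply = transport(message.encode("utf-8"))
    return json.loads(reply.decode("utf-8"))["result"]


def add(a, b):
    """Looks like a local call to the client program, but goes through the stub."""
    return client_stub("add", a, b)


print(add(2, 3))
```

The coupling discussed above is visible here: the client must know the procedure id `"add"` and the exact number and order of arguments.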

### Representational state transfer (REST)
* REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.
* There are four qualities of a RESTful interface:
    - Identify resources (URI in HTTP) - use the same URI regardless of any operation.
    - Change with representations (Verbs in HTTP) - use verbs, headers, and body.
    - Self-descriptive error message (status response in HTTP) - Use status codes, don't reinvent the wheel.
    - HATEOAS (HTML interface for HTTP) - your web service should be fully accessible in a browser.
* Sample REST calls

```
GET /someresources/anId

PUT /someresources/anId
{"anotherdata": "another value"}
```
* REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is great for horizontal scaling and partitioning.
* With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a combination of URI path, query parameters, and possibly the request body.
* REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn't fit your use case. For example, moving expired documents to the archive folder might not cleanly fit within these verbs.
* Fetching complicated resources with nested hierarchies requires multiple round trips between the client and server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
* Over time, more fields might be added to an API response and older clients will receive all new data fields, even those that they do not need, as a result, it bloats the payload size and leads to larger latencies.
![](images/REST_RPC.PNG)

* Hypertext Transfer Protocol (HTTP) is a RESTful protocol for exposing resources across distributed systems. In HTTP the server tells clients about resources by providing a representation of those resources in the body of an HTTP response. If the body is HTML, legal subsequent actions are given in A tags that let the client either get new representations via additional GET requests, or act on resources via POST/PUT or DELETE requests.

* Cases where I've found RPC to be a better fit:
    - Tight coupling. The (distributed) components of the system are designed to work together, and changing one will likely impact all of the others. It is unlikely that the components will have to be adapted to communicate with other systems in the future.
    - Reliable communication. The components will communicate with each other either entirely on the same host or on a network that is unlikely to experience latency issues, packet loss, etc. (You still need to design your system to handle these cases, however.)
    - Uniform language. All (or mostly all) components will be written in a single language. It is unlikely that additional components written in a different language will be added in the future.
    
* The fundamental problem with RPC is coupling. RPC clients become tightly coupled to service implementation in several ways and it becomes very hard to change service implementation without breaking clients:
    - Clients are required to know procedure names;
    - The order, types and count of procedure parameters matter. It is not easy to change procedure signatures (number of arguments, order of arguments, argument types, etc.) on the server side without breaking client implementations;
    - RPC style doesn't expose anything but procedure endpoints + procedure arguments. It's impossible for client to determine what can be done next.
* In REST style, by contrast, it is easy to guide clients by including control information in representations (HTTP headers + representation). For example:
    - It's possible (and actually mandatory) to embed links annotated with link relation types which convey the meanings of these URIs;
    - Client implementations do not need to depend on particular procedure names and arguments. Instead, clients depend on message formats. This makes it possible to reuse existing libraries for particular media formats (e.g. Atom, HTML, Collection+JSON, HAL, etc.);
    - It's possible to change URIs without breaking clients as long as they depend only on registered (or domain-specific) link relations;
    - It's possible to embed form-like structures in representations, giving clients the possibility to expose these descriptions as UI capabilities if the end user is human;
    - Support for caching is an additional advantage;
    - Status codes are standardised.

* REST is an architectural approach; a RESTful system has the following properties:
    - It is client/server: the business logic is decoupled from presentation, so you can change one without impacting the other. The con is that it adds negligible latency, but the web is the platform and everything is client/server anyway.
    - It is stateless: every message exchanged between client and server carries all the context needed to know what to do with it. This visibility has several benefits: you can route a message wherever you want depending on its contents, and any server can service a request. So you can scale your server just by creating several instances of it; you don't need to send all messages from the same client or user to the same server. And if you want to fine-tune the backend, you can route messages to different servers depending on the message, for example sending a CPU-intensive request to one server and a memory-intensive one to another. The con: the client sends redundant information with every message, which adds bandwidth and, again, negligible latency.
    - It is cacheable: if you are worried about latency, you can save bandwidth by caching responses from the server.
    - It has a uniform interface based on hypermedia (HATEOAS). The great thing about this is that you can greatly improve decoupling between client and server. If server responses contain hypermedia to all referenced resources and available actions within the context of the last request, the client does not need to know much about the server beyond an entry point and a few conventions about the hypermedia. Properly implemented, you could change many things on the server side without rewriting a single line in the client.
    - It is layered, like an onion. You can put several layers of components between client and server for routing, load balancing, caching or whatever you need. Of course this adds latency, but also a lot of flexibility. And if you change a layer, only the previous layer could be impacted, so the propagation of change effects is limited.
* In general the only cons are related to latency in request processing and bandwidth usage. But it is a great general-purpose architecture that provides:
    - great flexibility
    - lower maintenance costs
    - high scalability
    - simplicity

![](images/power_2.PNG)

![](images/latency.PNG)

# Push Notification

* This is called a publish/subscribe model. A client "subscribes" to various information "channels" provided by a server; whenever new content is available on one of those channels, the server pushes that information out to the client.
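
A minimal in-process sketch of the publish/subscribe model described above. The `Broker` class and its method names are assumptions made for illustration; a real push-notification system would deliver messages over the network rather than through local callbacks.

```python
from collections import defaultdict


class Broker:
    """Keeps a list of subscriber callbacks per channel and pushes messages to them."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Push `message` to every subscriber of `channel`; return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
inbox = []
broker.subscribe("news", inbox.append)  # the client subscribes to a channel
broker.publish("news", "new article available")  # the server pushes new content
print(inbox)
```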

## Ranking solution
### Snowflake
* Unlike MySQL, Cassandra has no built-in way of generating unique ids
* We needed something that could generate tens of thousands of ids per second in a highly available manner
* These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets
* One option is a MySQL-based ticket server, as Flickr uses:
    - Instead of storing all our data on one really big database, we have lots of databases, each with some of the data, and spread the load between them. Sometimes we need to migrate data between databases, so we need our primary keys to be globally unique. Additionally our MySQL shards are built as master-master replicant pairs for resiliency. This means we need to be able to guarantee uniqueness within a shard in order to avoid key collisions. We’d love to go on using MySQL auto-incrementing columns for primary keys like everyone else, but MySQL can’t guarantee uniqueness across physical and logical databases.
    - Why not use GUIDs? Mostly because GUIDs are big, and they index badly in MySQL. One of the ways we keep MySQL fast is we index everything we want to query on, and we only query on indexes. So index size is a key consideration. If you can’t keep your indexes in memory, you can’t keep your database fast.
    - Centralizing auto-increments. A Flickr ticket server is a dedicated database server, with a single database on it, and in that database there are tables like Tickets32 for 32-bit IDs, and Tickets64 for 64-bit IDs.
* To generate the roughly-sorted 64-bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number. Sequence numbers are per-thread and worker numbers are chosen at startup via ZooKeeper.
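
The timestamp + worker number + sequence number composition can be sketched as follows. The bit widths (41/10/12) follow the commonly described Snowflake layout but are not stated in the notes above, so treat them as an assumption; `EPOCH_MS` is an arbitrary custom epoch chosen for the example, and a lock replaces the per-thread sequence numbers for simplicity.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
WORKER_BITS = 10              # assumed layout: 41-bit timestamp, 10-bit worker,
SEQUENCE_BITS = 12            # 12-bit sequence
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates roughly time-sortable 64-bit ids without coordination per id."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeGenerator(worker_id=7)
print(generator.next_id(), generator.next_id())
```

Because the timestamp occupies the high bits, ids generated around the same time are numerically close, which is what makes them roughly sortable.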

* To preserve sessions across servers, we can use shared session storage.
* Another approach is to store in a cookie the id of the server the client was connected to, and have the load balancer direct subsequent requests to that server.

#### File based caching
* Craigslist stores job advertisements as static HTML pages
* So there is no need to produce the page dynamically from the DB.
* Can take advantage of caching
* Disadvantage: space (redundancy, since the data lives both in the DB and in the static page). Also, to change something like the page's background color, we have to change every precomputed HTML file.

#### MySQL query cache
* In `my.cnf`, add `query_cache_type = 1` and restart the server to enable caching. (Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.)

* DNS can act as a data-center-level load balancer

### Transaction
* Consistency across multiple operations
* Example: transfer money from A to B.
    * Subtract from A, add to B
* Gives correctness, consistency, ACID
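
The money-transfer example can be sketched with Python's built-in `sqlite3` module. The `accounts` table, its columns, and the `transfer` helper are hypothetical names chosen for illustration; the point is that both updates commit together or not at all.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

transfer(conn, "A", "B", 30)
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

If the check fails after the first `UPDATE`, the rollback undoes the subtraction, so money is never removed from A without being added to B.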

### Multihoming
* Multihoming is the practice of connecting a host or a computer network to more than one network. This can be done in order to increase reliability or performance

* Having multiple FS
    - Geographically nearby
    - Catastrophe: data is lost, e.g. the SVColo power outage.
* Having multiple DS, with the second one as a hot failover
    - Window of lost data
    - Failover data may be inconsistent
    - Geolocated for reads, not writes.
* True multihoming
    - Simultaneous writes in different DC
    - Geolocality
    - Low latency for user

### Master slave replication
* Usually asynchronous
* RDBMS
* Weak/eventual consistency: if replication is cut off halfway, the state of the replica is uncertain

### Multi master replication
* Merging concurrent writes
* Asynchronous, eventual consistency
* Needs a serialization protocol
    - Timestamp: a monotonically increasing timestamp
    - Either a SPOF with master election
    - Or a distributed consensus protocol
* No global transactions: each master does whatever it is told, and conflicts are figured out afterwards.

### 2 Phase commit
* Semi-distributed consensus protocol
* Propose, vote, commit/abort
* Heavyweight, synchronous, high latency
* Poor throughput

### Paxos
* Fully distributed consensus protocol
* Similar to 2PC/3PC
* Lighter, but still high latency

![](images/transaction.PNG)

### AWS
* Regions are spread around the globe, and each region has multiple Availability Zones, each isolated from the others.
* Services that are inherently highly scalable, available and fault tolerant:
    - CloudFront
    - Route 53
    - S3
    - DynamoDB
    - Elastic Load Balancing
    - Amazon EFS
    - AWS Lambda
    - SQS
    - SNS
    - SES
    - Step Functions

* Other services are highly scalable and available given the right architecture:
    - EC2
    - EBS
    - RDS
    - VPC

* Many EC2 instance types are available, such as large and micro, as well as CPU-intensive, GPU-intensive, I/O-intensive, high-memory and high-storage instances
![](images/1_user_aws.PNG)

* No failover
* Vertical scaling only, so there is an upper limit
* Too many eggs in one basket
* No redundancy

* The load on the web tier can differ from the load on the DB, so separate them out

![](images/aws2.PNG)

* Database options
    - Self-managed: run any DB we want on EC2
    - Fully managed: RDS, DynamoDB, Redshift
    - Amazon Aurora: MySQL- or Postgres-compatible DB with automatic storage scaling up to 64 TB
        - up to 15 read replica
        - Continuous backup to S3
        - 6-way replication across 3 zones

* Initially use an RDBMS
* Use NoSQL for:
    - Super-low-latency applications
    - Metadata-driven datasets
    - Highly non-relational datasets
    - Schema-less data constructs
    - Rapid data ingestion
    - Massive amounts of data

![](images/aws3.PNG)

![](images/aws4.PNG)

![](images/aws5.PNG)

* Serve static content from S3 through CloudFront

![](images/aws6.PNG)

* S3
    - Object-based storage
    - Highly durable
    - Great for static assets
    - Effectively infinitely scalable
    - Objects up to 5 TB in size
    - Encryption at rest and in transit
* CloudFront
    - Content delivery network
    - Cache content for faster delivery
    - Lower load on origin
    - Geographically closer
    - Reduce latency and response time.

![](images/aws7.PNG)

* ElastiCache
    - Managed Memcached or Redis
    - Scale from one to many nodes
    - Self-healing (replaces dead instances)
    - Single-digit millisecond speed
    - Local to a single AZ for Memcached
    - Multi-AZ with Redis



* Move session data to a NoSQL DB like DynamoDB

![](images/aws8.PNG)

* DynamoDB
    - Managed NoSQL db
    - Provisioned throughput for read and write (We can set it)
    - JSON support
    - TTL
* AWS Database Migration Service can be used to switch between different databases: just point it at a source and a destination. It migrates the data without causing performance issues on the source.

* DynamoDB Accelerator (DAX)
    - Caches DynamoDB requests
![](images/aws9.PNG)

### AutoScaling

* With only fixed, provisioned capacity, we underutilize resources

* Automatic resizing of compute clusters
* Define min/max pool sizes
![](images/aws10.PNG)

### Service oriented arch
* Each component has its own responsibility
* Each component is connected to its own database
* Different parts of the application communicate with each other via a clear contract

* Event-driven computing
* AWS Lambda
* Functions are triggered by events (such as a change in an S3 bucket or a change in a DynamoDB table)
* Functions can be written in Java, Python, etc.
* Lambda loads the code, runs it and unloads it.

![](images/aws11.PNG)

* Where is the UI?
    - The application runs in the user's browser (single-page application)

* Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications

* PUT = replace the ENTIRE RESOURCE with the new representation provided (no mention of related resources in the spec from what I can see)

* PATCH = replace parts of the source resource with the values provided AND|OR other parts of the resource are updated that you haven't provided (timestamps) AND|OR updating the resource affects other resources (relationships)
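
The difference can be illustrated with plain dictionaries standing in for stored resources. The helper names `apply_put` and `apply_patch` are hypothetical, and the PATCH shown here assumes a simple "merge the supplied fields" format; other patch formats exist.

```python
def apply_put(stored, new_representation):
    """PUT: the new representation replaces the stored resource entirely."""
    return dict(new_representation)


def apply_patch(stored, changes):
    """PATCH: only the supplied fields change; everything else is kept."""
    updated = dict(stored)
    updated.update(changes)
    return updated


article = {"id": 1, "title": "Draft", "body": "Hello"}

# PUT without "body" drops the field, because the whole resource is replaced.
print(apply_put(article, {"id": 1, "title": "Final"}))

# PATCH with only "title" keeps "body" untouched.
print(apply_patch(article, {"title": "Final"}))
```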

* Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It is designed to give developers and businesses an extremely reliable and cost effective way to route end users to Internet applications by translating names like www.example.com into the numeric IP addresses like 192.0.2.1 that computers use to connect to each other.

### Zookeeper
* Coordination, group membership, locking, publish/subscribe, leader election, synchronization
* Open source high performance coordination service for distributed applications
* Naming, configuration management, locks and synchronization, group services
    - Cluster member nodes can bootstrap configuration from a centralized source
* Distributed cluster management
    - Node join/leave
    - Node status in real time
* Distributed sync : locks, barriers, queue
* Leader election

## Asynchronous Process
* If a call to a service or API blocks the calling thread until it returns, the call is synchronous
* If, instead of waiting, the thread can move on to other processing, the call is asynchronous
![](images/async.PNG)

* We can use some callback function which will be called whenever we receive response
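
A small sketch of a non-blocking call with a callback, using Python's standard `asyncio` module. The `call_service` coroutine and its delay are stand-ins for a slow remote service; the names are hypothetical.

```python
import asyncio


async def call_service(name, delay):
    """Stand-in for a slow service call (hypothetical)."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    events = []
    # Start the call without blocking the caller.
    task = asyncio.create_task(call_service("video-conversion", 0.05))
    # The callback runs once the response is available.
    task.add_done_callback(lambda finished: events.append(finished.result()))
    events.append("caller keeps working")
    await task              # wait only when the result is finally needed
    await asyncio.sleep(0)  # give any pending callbacks a chance to run
    return events


print(asyncio.run(main()))
```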

* Typical use cases: video conversion, sending an SMS or an email

* In Python, a function can be decorated as a Celery task for asynchronous computation

![](images/async2.PNG)

* If the worker machines go down, our service stays up and the messages are persisted in the queue

### Distributed Transaction

* Reaching consensus on some value across services
* Raft, SAGA, 2-phase commit

![](images/monolithic_transaction.PNG)

#### 2-Phase commit

* In the prepare phase, every microservice locks the data it is acting on
* If a participant times out, the transaction is aborted

![](images/2_phase.PNG)

* Locks are held for a long time

* What if the coordinator fails?
* 3-phase commit was introduced to overcome this problem
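
The prepare and commit/abort phases can be sketched with in-memory participants. This is a simplified illustration under stated assumptions: `Participant`, `can_commit`, and `two_phase_commit` are hypothetical names, a timeout is modelled as a "no" vote, and coordinator failure (the weakness noted above) is not handled.

```python
class Participant:
    """A service taking part in the transaction (in-memory stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # False models a failure or timeout
        self.state = "idle"

    def prepare(self):
        # Phase 1: lock the resource and vote yes or no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes in the prepare phase."""
    votes = [participant.prepare() for participant in participants]
    if all(votes):
        for participant in participants:
            participant.commit()  # Phase 2: commit everywhere
        return True
    for participant in participants:
        participant.abort()       # Phase 2: abort everywhere, releasing locks
    return False


orders, payments = Participant("orders"), Participant("payments", can_commit=False)
print(two_phase_commit([orders, payments]), orders.state, payments.state)
```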

### SAGA
* 2-phase and 3-phase commit are synchronous

* SAGA is asynchronous

* Every microservice receives its messages sequentially, so isolation is not a problem

![](images/SAGA.PNG)

### Transaction
* A unit of work that is treated as a whole: either all of it happens or none of it does.

### Pessimistic and Optimistic lock

* Pessimistic lock
    - acquire lock, do something, release the lock
* Optimistic lock
    - Read record, take note of version number, timestamp or checksum and check that the version has not changed before you write back.
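
A minimal sketch of the optimistic approach using a version number. `VersionedStore` and `VersionConflict` are hypothetical names; in a database the same check is typically expressed as a conditional update on the version column.

```python
class VersionConflict(Exception):
    """Raised when the record changed between our read and our write."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (value, version)

    def put_new(self, key, value):
        self._rows[key] = (value, 1)

    def read(self, key):
        """Return (value, version); the caller remembers the version."""
        return self._rows[key]

    def write(self, key, value, expected_version):
        """Write only if nobody else has updated the record since it was read."""
        _, current_version = self._rows[key]
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._rows[key] = (value, current_version + 1)


store = VersionedStore()
store.put_new("stock", 10)
value, version = store.read("stock")
store.write("stock", value - 1, version)      # succeeds, version becomes 2
try:
    store.write("stock", value - 1, version)  # stale version: rejected
except VersionConflict as error:
    print("retry needed:", error)
```

On a conflict the caller typically re-reads the record and retries, which is cheap when conflicts are rare.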

### Hashing and Consistent Hashing


### Bloom filtering
* Tests whether a given element is in a set or not
* Example: a signup endpoint checking whether a username is unique
* Probabilistic data structure: false positives are possible, false negatives are not
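
A small Bloom filter sketch for the username check. The sizes, the number of hash functions, and the use of salted SHA-256 to derive bit positions are arbitrary choices for illustration, not tuned values.

```python
import hashlib


class BloomFilter:
    """Set membership test that may report false positives but never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function (assumption).
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position // 8] |= 1 << (position % 8)

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[position // 8] & (1 << (position % 8))
            for position in self._positions(item)
        )


taken_usernames = BloomFilter()
taken_usernames.add("alice")
print(taken_usernames.might_contain("alice"))
```

A signup endpoint could skip the database lookup whenever `might_contain` returns `False`, and fall back to the database only on `True`.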

### Monolithic
* Use when you have a small team
* Fewer moving parts, so less effort goes into maintaining servers and infrastructure
* Less duplication of code
* Faster (no RPC calls)
* More complexity for a new engineer to take in
* The whole codebase is deployed every time, and all services might crash if something breaks

### Microservices
* Easier to scale
* Different teams are responsible for different parts

### API design
* 

### Transaction Isolation Levels

* Dirty reads: A dirty read occurs when a transaction reads data that has not yet been committed. For example, suppose transaction 1 updates a row. Transaction 2 reads the updated row before transaction 1 commits the update. If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.

* Nonrepeatable reads: A nonrepeatable read occurs when a transaction reads the same row twice but gets different data each time. For example, suppose transaction 1 reads a row. Transaction 2 updates or deletes that row and commits the update or delete. If transaction 1 rereads the row, it retrieves different row values or discovers that the row has been deleted.

* Phantoms: A phantom is a row that matches the search criteria but is not initially seen. For example, suppose transaction 1 reads a set of rows that satisfy some search criteria. Transaction 2 generates a new row (through either an update or an insert) that matches the search criteria for transaction 1. If transaction 1 reexecutes the statement that reads the rows, it gets a different set of rows.

![](images/isolation_level.PNG)