# 5. Facebook Messenger

## Functional

- Supports 1-on-1 conversation between users.
- Tracks online/offline status of users.
- Persists chat history.

## Non-functional

- Minimal latency when chatting.
- Consistency: users see the same chat history on all devices.
- Availability may be sacrificed to preserve consistency.

## Extended

- Support group chats.
- Notify users of new messages when they are offline.

## Design

<img src="img/facebookmessenger1.png" style="width:500px;height:300px;">

### When user A sends a message to user B.
- Server receives the message and sends an ack back to A.
- Server stores the message in the DB and sends the message to B.
- B receives the message and sends ack back to the server.
- Server notifies A that the message has been delivered.
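
The four steps above can be sketched as a toy in-memory server. This is only an illustration under assumptions: `ChatServer`, `handle_send`, `handle_receiver_ack`, and the event tuples are hypothetical names, and a plain list stands in for the DB and the network.

```python
from dataclasses import dataclass, field


@dataclass
class ChatServer:
    """Toy stand-in for the chat server (hypothetical, in-memory only)."""
    db: list = field(default_factory=list)      # persisted messages
    events: list = field(default_factory=list)  # ordered log of what the server emits

    def handle_send(self, sender, receiver, text):
        # Step 1: acknowledge receipt to the sender.
        self.events.append(("ack", sender))
        # Step 2: persist the message, then forward it to the receiver.
        self.db.append((sender, receiver, text))
        self.events.append(("deliver", receiver, text))

    def handle_receiver_ack(self, sender, receiver):
        # Steps 3-4: the receiver acknowledged, so notify the sender.
        self.events.append(("delivered", sender))


server = ChatServer()
server.handle_send("A", "B", "hi")
server.handle_receiver_ack("A", "B")
```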

### How to handle message
- If users poll the servers for new messages, they must poll frequently to keep latency low, and most polls would return an empty response.
- If active users keep a connection open with the server, they can receive messages as soon as the servers receive them.

### How to keep connection
- HTTP long polling: servers hold a client request open until there is a response to send back. If the request times out, the client must send a new request.
- If a server fails, its users should reconnect.
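
A minimal sketch of the hold-until-response behavior, assuming a blocking queue stands in for the held HTTP request. `LongPollChannel`, `publish`, and `poll` are hypothetical names, not part of any real framework.

```python
import queue


class LongPollChannel:
    """Holds a poll open until a message arrives or the timeout expires."""

    def __init__(self):
        self._pending = queue.Queue()

    def publish(self, message):
        # Called on the server side when a message arrives for this client.
        self._pending.put(message)

    def poll(self, timeout_s):
        # Block until a message is available; return None on timeout,
        # in which case the client is expected to issue a new poll.
        try:
            return self._pending.get(timeout=timeout_s)
        except queue.Empty:
            return None


channel = LongPollChannel()
channel.publish("hello")
first = channel.poll(timeout_s=0.05)   # a message was already waiting
second = channel.poll(timeout_s=0.05)  # nothing pending, so this times out
```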

### How to track open connections
- Servers can use a hash table where the key is UserID and the value is the connection object.
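
A minimal sketch of such a table, assuming a plain dict per server. `ConnectionRegistry` and its method names are hypothetical, and a string stands in for the real connection object.

```python
class ConnectionRegistry:
    """Maps UserID to that user's open connection on this server."""

    def __init__(self):
        self._by_user = {}

    def register(self, user_id, connection):
        # A reconnect simply replaces the previous connection object.
        self._by_user[user_id] = connection

    def unregister(self, user_id):
        self._by_user.pop(user_id, None)

    def lookup(self, user_id):
        # Returns None when the user has no open connection here.
        return self._by_user.get(user_id)


registry = ConnectionRegistry()
registry.register("user-1", "conn-1")
registry.register("user-2", "conn-2")
registry.unregister("user-2")
```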

### How to handle offline user
- Servers can notify sender that delivery has failed.
- If receiver connects back, servers can ask sender to retry.
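
One way to picture this is a tracker that remembers failed deliveries per receiver. This is a sketch under assumptions: `DeliveryTracker` and the `"delivered"`/`"failed"` return values are hypothetical, and the retry request is modeled as a returned list rather than a network call.

```python
class DeliveryTracker:
    """Remembers failed deliveries so senders can be asked to retry."""

    def __init__(self):
        self.online = set()
        self.failed = {}  # receiver -> list of (sender, message_id)

    def deliver(self, sender, receiver, message_id):
        if receiver in self.online:
            return "delivered"
        # Receiver is offline: record the failure and report it to the sender.
        self.failed.setdefault(receiver, []).append((sender, message_id))
        return "failed"

    def on_connect(self, user):
        # When the receiver comes back, return the retries to request.
        self.online.add(user)
        return self.failed.pop(user, [])


tracker = DeliveryTracker()
status = tracker.deliver("A", "B", message_id=1)
retries = tracker.on_connect("B")
status_after = tracker.deliver("A", "B", message_id=1)
```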

### How many servers are needed
- Assume 500M connections and that one server can handle 50k concurrent connections. Then 10k servers are needed.
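
The estimate is a single ceiling division; the variable names below are illustrative only.

```python
total_connections = 500_000_000
connections_per_server = 50_000

# Ceiling division with integers, so a partial server still counts as one.
servers_needed = -(-total_connections // connections_per_server)
```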

### How to know which server has which user connection
- Use load balancers in front of servers that map UserID to server.
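
The notes do not say how the mapping is computed. One simple possibility, assumed here, is hashing the UserID modulo the number of servers; `server_for_user` is a hypothetical helper. A stable hash is used because Python's built-in `hash` for strings varies between processes.

```python
import hashlib


def server_for_user(user_id: str, num_servers: int) -> int:
    """Return the index of the chat server responsible for this user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a server index.
    return int.from_bytes(digest[:8], "big") % num_servers


index = server_for_user("user-42", 10_000)
```

Plain modulo hashing remaps most users when the server count changes, so this sketch suits a fixed fleet only.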

### How to maintain sequence of messages
- Attach a sequence number to every message, maintained separately for each client.
    - Different users may see a different sequence.
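
A small sketch of per-client numbering, assuming one independent counter per client. `PerClientSequencer` is a hypothetical name. The example models A and B sending at about the same time, so each sees their own message first.

```python
from collections import defaultdict
from itertools import count


class PerClientSequencer:
    """Hands out sequence numbers from a separate counter per client."""

    def __init__(self):
        self._counters = defaultdict(lambda: count(1))

    def next_seq(self, client_id):
        return next(self._counters[client_id])


seq = PerClientSequencer()
# Messages as delivered to A: A's own message first, then B's.
view_a = [(seq.next_seq("A"), "from A"), (seq.next_seq("A"), "from B")]
# Messages as delivered to B: B's own message first, then A's.
view_b = [(seq.next_seq("B"), "from B"), (seq.next_seq("B"), "from A")]
```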

### Which DB to use
- An RDBMS (MySQL) or a document store (MongoDB) is a poor fit because the workload is dominated by very frequent small reads and writes.
- Choose a wide-column DB like HBase. HBase buffers writes in memory and flushes them to disk when the buffer is full. This works well with variable-sized data.
- Store multiple copies of data in different servers.

### How to manage user status
- When users start the app, pull their friends' statuses from the servers.
- When users start chatting with another user, pull that user's status at that time.

<img src="img/facebookmessenger2.png" style="width:700px;height:500px;">

### Design summary
- Clients open a connection to the servers to send messages.
- Servers send messages to the requested users.
- All active clients keep a connection open to the servers.
- Messages are stored in HBase.
- Clients pull relevant user status from servers.

### Group chat
- Keep a mapping from GroupChatID to the list of users in the group chat.
- Chat servers can iterate over the list of users to deliver the message.
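
The fan-out can be sketched as a loop over the member list. `fan_out` and the `deliver` callback are hypothetical, and skipping the sender is an assumption the notes do not state.

```python
def fan_out(group_members, group_id, sender, text, deliver):
    """Deliver one group message to every member except the sender."""
    for user in group_members[group_id]:
        if user != sender:  # assumption: the sender does not get an echo
            deliver(user, text)


group_members = {"group-1": ["A", "B", "C"]}
delivered = []
fan_out(group_members, "group-1", "A", "hello all",
        deliver=lambda user, text: delivered.append((user, text)))
```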

## Capacity

Assume 
- 500M daily active users.
- Each user sends 40 messages per day.
- 100 bytes per message.

Storage
- 500M * 40 messages = 20B messages per day.
- 20B messages per day * 100 bytes = 2 TB per day.
- For 5 years, 2 TB * 365 days * 5 years = 3.65 PB.

Bandwidth
- 2 TB / 86400 s ≈ 23 MB/s.
- Each message goes out to another user, so the same bandwidth is needed for upload and download.
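
The storage and bandwidth estimates can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); the variable names are illustrative.

```python
daily_active_users = 500_000_000
messages_per_user_per_day = 40
bytes_per_message = 100

messages_per_day = daily_active_users * messages_per_user_per_day  # 20 billion
bytes_per_day = messages_per_day * bytes_per_message               # 2 TB
bytes_five_years = bytes_per_day * 365 * 5                         # 3.65 PB
bytes_per_second = bytes_per_day / 86_400                          # a little over 23 MB/s

print(f"{bytes_per_day / 1e12:.0f} TB/day, "
      f"{bytes_five_years / 1e15:.2f} PB over 5 years, "
      f"{bytes_per_second / 1e6:.1f} MB/s")
```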

## API

## DB

## Data partitioning

- Partition based on UserID hash.
- All messages of a user are stored in the same DB partition.
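
A toy illustration of why this helps: with UserID-hash partitioning, a user's whole history is read from one shard. `ShardedMessageStore` is hypothetical, in-memory dicts stand in for DB shards, and CRC32 is just one stable hash chosen for the sketch.

```python
import zlib


class ShardedMessageStore:
    """Stores each user's messages in the shard chosen by a UserID hash."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, user_id):
        return zlib.crc32(user_id.encode("utf-8")) % len(self.shards)

    def append(self, user_id, message):
        shard = self.shards[self.shard_for(user_id)]
        shard.setdefault(user_id, []).append(message)

    def history(self, user_id):
        # Only one shard is consulted, regardless of the total shard count.
        return self.shards[self.shard_for(user_id)].get(user_id, [])


store = ShardedMessageStore(num_shards=4)
store.append("user-1", "hi")
store.append("user-1", "how are you?")
shards_holding_user = sum(1 for shard in store.shards if "user-1" in shard)
```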

## Caching

- Cache recent messages in recent conversations.
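
One possible reading of this, sketched below, is an LRU of conversations where each entry keeps only its newest few messages. `RecentMessageCache` and both size limits are assumptions made for illustration.

```python
from collections import OrderedDict, deque


class RecentMessageCache:
    """Keeps the newest messages of the most recently used conversations."""

    def __init__(self, max_conversations, max_messages):
        self._max_conversations = max_conversations
        self._max_messages = max_messages
        self._convs = OrderedDict()  # conversation id -> deque of messages

    def add(self, conv_id, message):
        msgs = self._convs.get(conv_id)
        if msgs is None:
            # A bounded deque drops the oldest message automatically.
            msgs = deque(maxlen=self._max_messages)
            self._convs[conv_id] = msgs
        msgs.append(message)
        self._convs.move_to_end(conv_id)
        # Evict the least recently used conversations beyond the limit.
        while len(self._convs) > self._max_conversations:
            self._convs.popitem(last=False)

    def get(self, conv_id):
        msgs = self._convs.get(conv_id)
        if msgs is None:
            return None  # cache miss: the caller falls back to the DB
        self._convs.move_to_end(conv_id)
        return list(msgs)


cache = RecentMessageCache(max_conversations=2, max_messages=2)
for text in ("a", "b", "c"):
    cache.add("conv-1", text)
recent_conv1 = cache.get("conv-1")
cache.add("conv-2", "x")
cache.add("conv-3", "y")  # pushes out conv-1, the least recently used
```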

## Load balancing

- Place load balancers in front of chat servers and cache servers.

## DB cleanup