# Pastebin

- Store text and access it via URLs.

## Functional

- Users can upload data and access it via a URL.
- Users can provide a custom URL.
- Data and links expire after a certain time. (users can specify this)
- Only text uploads are allowed.

## Non-functional

- High reliability. (data cannot be lost)
- High availability. (users should still be able to access links when some servers are down)
    - Availability is implied by reliability.
- Minimal latency when accessing links.
- Links should not be guessable.
    
## Extended

- Analytics.
- Service should be accessible via REST API.

## Design

Considerations
- Size limit of uploaded text.
- Length limit of custom URLs.

Randomly generate a key for the paste (if the user did not provide a custom URL)
- The generated key could duplicate an existing one, so check for collisions and regenerate if needed.
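
A minimal sketch of random key generation with a duplicate check, assuming an 8-character base62 key and an in-memory set standing in for the key lookup in the metadata DB. The names `generate_key`, `KEY_LENGTH`, and `max_attempts` are hypothetical, not part of the original design.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 URL-safe characters
KEY_LENGTH = 8  # assumed length; the notes do not fix one


def generate_key(existing_keys, length=KEY_LENGTH, max_attempts=5):
    """Return a random key that is not in existing_keys, retrying on collision."""
    for _ in range(max_attempts):
        # secrets uses a CSPRNG, which helps keep links hard to guess.
        key = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if key not in existing_keys:
            existing_keys.add(key)  # reserve the key for this paste
            return key
    raise RuntimeError("could not find an unused key")
```

With 62^8 possible keys, collisions should be rare, so a small retry budget is usually enough. In a real deployment the membership check and the reservation would need to be one atomic DB operation.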

Key generation service
- Same idea as in the TinyURL design.
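
The notes do not spell out the TinyURL approach. The sketch below assumes it means a service that pre-generates unique keys and hands them out on demand, which avoids collision checks at write time. The class `KeyGenerationService` and its method names are hypothetical, and in-memory sets stand in for the key DB.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


class KeyGenerationService:
    """Pre-generates unique keys and tracks which ones are in use."""

    def __init__(self, pool_size=1000, key_length=8):
        self.unused = set()
        self.used = set()
        # A set drops duplicates, so keep generating until the pool is full.
        while len(self.unused) < pool_size:
            self.unused.add(
                "".join(secrets.choice(ALPHABET) for _ in range(key_length))
            )

    def take(self):
        """Hand out an unused key; raises KeyError when the pool is empty."""
        key = self.unused.pop()
        self.used.add(key)
        return key

    def release(self, key):
        """Return a key to the pool, e.g. after its paste expires."""
        if key in self.used:
            self.used.remove(key)
            self.unused.add(key)
```

A single service like this is a single point of failure, so a production setup would likely run a standby replica and move keys between the two pools atomically.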

<img src="img/pastebin1.png" style="width:600px;height:400px;">

Database
- Metadata DB: relational like MySQL or key-value like Dynamo/Cassandra.
- Object storage: S3

## Capacity

Traffic
- Assume
    - 5:1 read:write ratio.
    - 5M reads and 1M writes per day.
- New pastes per second: 1M / (24 hrs * 3600 seconds) ≈ 12 pastes/s
- Paste reads per second: 5M / (24 hrs * 3600 seconds) ≈ 58 pastes/s

Storage
- Assume 
    - Each paste is 10KB on average.
- We store 1M * 10KB = 10GB per day.
- Storing this data for 10 years requires about 36.5TB (10GB * 365 days * 10 years).
- With a 70% capacity model (we don't use more than 70% of capacity at any point), we need about 52TB (36.5TB / 0.7).

Bandwidth
- With 12 pastes/s written, we need 12/s * 10KB = 120KB/s ingress.
- With 58 pastes/s read, we need 58/s * 10KB = 580KB/s egress.

Memory
- Assume the 80/20 rule: 20% of pastes generate 80% of read traffic, so we cache that 20%.
- 0.2 * 5M * 10KB = 10GB of memory needed for the cache.
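
The estimates above can be reproduced with a short script. This is a sketch using decimal units (1GB = 1,000,000KB) and the assumptions stated in this section. The function name `estimate_capacity` and its parameter names are hypothetical. Results differ slightly from the hand-rounded figures because the script does not round intermediate values.

```python
SECONDS_PER_DAY = 24 * 3600


def estimate_capacity(writes_per_day=1_000_000, read_write_ratio=5,
                      paste_kb=10, years=10, max_utilization=0.7,
                      cache_fraction=0.2):
    """Back-of-the-envelope capacity numbers for the paste service."""
    reads_per_day = writes_per_day * read_write_ratio
    writes_per_s = writes_per_day / SECONDS_PER_DAY
    reads_per_s = reads_per_day / SECONDS_PER_DAY
    daily_gb = writes_per_day * paste_kb / 1e6   # KB -> GB
    total_tb = daily_gb * 365 * years / 1e3      # GB -> TB
    return {
        "writes_per_s": writes_per_s,
        "reads_per_s": reads_per_s,
        "storage_gb_per_day": daily_gb,
        "storage_tb_total": total_tb,
        # Keep disks at or below max_utilization of their capacity.
        "storage_tb_provisioned": total_tb / max_utilization,
        "ingress_kb_per_s": writes_per_s * paste_kb,
        "egress_kb_per_s": reads_per_s * paste_kb,
        # Cache the hot fraction of one day's reads.
        "cache_gb": cache_fraction * reads_per_day * paste_kb / 1e6,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:.1f}")
```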
    
## API

- addPaste(api_dev_key, paste_data, custom_url, expire_date)
    - api_dev_key: API developer key of registered account.
    - paste_data: text data to paste.
    - custom_url: user-specified custom URL to use. (optional)
    - expire_date: expiration date of the paste; defaults to some value if not specified. (optional)
    - Returns the URL used to access the paste.
- getPaste(api_dev_key, api_paste_key)
    - api_paste_key: string key of the paste to be retrieved.
    - Returns text data of paste.
- deletePaste(api_dev_key, api_paste_key)
    - Returns a boolean indicating whether the paste was deleted.
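
A minimal in-memory sketch of the three calls, to make the signatures concrete. The base URL `https://paste.example/`, the 30-day default expiry, and the rule that only the creator may delete a paste are all assumptions for illustration. Validation of `api_dev_key` is omitted.

```python
import secrets
from datetime import datetime, timedelta, timezone

BASE_URL = "https://paste.example/"  # hypothetical domain
DEFAULT_TTL = timedelta(days=30)     # assumed default expiry


class PasteService:
    def __init__(self):
        # paste key -> (text, expire_date, api_dev_key of the creator)
        self._pastes = {}

    def add_paste(self, api_dev_key, paste_data, custom_url=None, expire_date=None):
        if custom_url is not None:
            if custom_url in self._pastes:
                raise ValueError("custom URL already in use")
            key = custom_url
        else:
            key = secrets.token_urlsafe(6)  # 8 URL-safe characters
            while key in self._pastes:      # regenerate on collision
                key = secrets.token_urlsafe(6)
        if expire_date is None:
            expire_date = datetime.now(timezone.utc) + DEFAULT_TTL
        # expire_date is expected to be timezone-aware.
        self._pastes[key] = (paste_data, expire_date, api_dev_key)
        return BASE_URL + key

    def get_paste(self, api_dev_key, api_paste_key):
        text, expire_date, _ = self._pastes[api_paste_key]  # KeyError if unknown
        if expire_date <= datetime.now(timezone.utc):
            raise KeyError(api_paste_key)  # treat expired pastes as missing
        return text

    def delete_paste(self, api_dev_key, api_paste_key):
        entry = self._pastes.get(api_paste_key)
        if entry is None or entry[2] != api_dev_key:
            return False
        del self._pastes[api_paste_key]
        return True
```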

## DB

- Billions of records.
- Each object is medium sized (max 10MB)
- Read heavy
- No relations

Schema "Paste"
- URLHash (varchar 16, PK)
- ContentKey (varchar) - reference to external object storage
- CreationDate (datetime)
- ExpirationDate (datetime)
- UserID (int)

Schema "User"
- UserID (int, PK)
- Name (varchar)
- Email (varchar)
- CreationDate (datetime)
- LastLogin (datetime)
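
The two schemas can be written out as table definitions. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the relational option. The column lengths other than `URLHash` and the helper name `open_metadata_db` are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    UserID       INTEGER PRIMARY KEY,
    Name         VARCHAR(64),
    Email        VARCHAR(128),
    CreationDate DATETIME,
    LastLogin    DATETIME
);
CREATE TABLE Paste (
    URLHash        VARCHAR(16) PRIMARY KEY,
    ContentKey     VARCHAR(255),  -- key of the text in object storage
    CreationDate   DATETIME,
    ExpirationDate DATETIME,
    UserID         INTEGER        -- no foreign key: the notes assume no relations
);
"""


def open_metadata_db():
    """Create an in-memory metadata DB containing both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Only the metadata lives in these tables. The paste text itself would sit in object storage under `ContentKey`.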

## Data partitioning

- Consistent hashing on the paste key, so adding or removing a partition moves only a fraction of the keys.
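
A minimal sketch of a consistent hash ring, assuming SHA-256 as the hash function and 100 virtual nodes per partition. Both choices, the class name `ConsistentHashRing`, and the partition names in the test are illustrative only.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash position, node name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Several virtual positions per node even out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        """Return the node owning the first ring position at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring
```

Removing a partition reassigns only the keys that partition owned; every other key keeps its placement.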

## Caching

- LRU (least recently used) eviction.
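
A minimal sketch of an LRU cache built on `collections.OrderedDict`, with paste keys mapping to paste text. The class name `LRUCache` and the item-count capacity are assumptions. A real cache would more likely bound total bytes.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```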

## Load balancing

- Round robin
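
A minimal sketch of round-robin selection over a fixed server list. The class name `RoundRobinBalancer` and the server names in the usage are hypothetical. Health checks and server weights are left out.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the next server in rotation."""
        return next(self._cycle)
```

Plain round robin ignores server load, so a slow or dead server still receives its share of requests unless the balancer also tracks health.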

## DB cleanup

- Lazy cleanup
    - When a user accesses an expired link, delete the paste from the DB and return an error to the user.
    - Put the key back into the key DB so it can be reused.
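
The lazy cleanup steps can be sketched as a read path that checks expiry on access. Here a dict stands in for the paste DB and a set for the key DB. The names `read_paste` and `PasteExpired` are hypothetical.

```python
from datetime import datetime, timezone


class PasteExpired(Exception):
    """Raised when a requested paste has passed its expiration date."""


def read_paste(pastes, free_keys, key, now=None):
    """Return the paste text, deleting the paste first if it has expired.

    pastes maps key -> (text, expires_at); free_keys is the pool of reusable keys.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    text, expires_at = pastes[key]  # KeyError if the paste does not exist
    if expires_at <= now:
        del pastes[key]             # remove the expired paste from the DB
        free_keys.add(key)          # put the key back into the key DB
        raise PasteExpired(key)
    return text
```

Pastes that are never read again are never cleaned up this way, so lazy cleanup is often paired with a periodic background sweep.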