# TinyURL
- A short link is a compact alias for a long URL, which saves a lot of space when the link is shared. The short URL must not be guessable.
- When users click a short link, they are redirected to the original URL.

## Functional
- Given a URL, generate a short version.
- When users click a short link, redirect them to the original link.
- Links should expire after a certain time.

## Non-functional
- High availability.
- Redirection should have minimum latency.
- Short link must not be guessable.

## Extended
- Service should be accessible via REST API.

## Design

### Encode the URL
- Want to generate a short string of characters to put at the end of the short URL.
- Assume base64 encoding.
    - 6 letters: 64^6 = 68.7 billion possible strings.
    - 8 letters: 64^8 = 281 trillion possible strings.
- Duplication: if the short string is derived only from the original URL, multiple users who enter the same URL get the same shortened URL, which is not good.
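
The key-space figures above can be checked with a short sketch. It assumes the URL-safe base64 alphabet (`A-Z`, `a-z`, `0-9`, `-`, `_`), and the helper names `keyspace_size` and `random_key` are illustrative rather than part of the design. Drawing characters at random is one possible way to keep keys hard to guess and to avoid the duplication problem.

```python
import secrets
import string

# Assumed alphabet: the 64 URL-safe base64 symbols.
ALPHABET = string.ascii_letters + string.digits + "-_"


def keyspace_size(length: int) -> int:
    """Number of distinct keys of the given length."""
    return len(ALPHABET) ** length


def random_key(length: int = 6) -> str:
    """Pick each character with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(keyspace_size(6))  # 68719476736 (about 68.7 billion)
print(keyspace_size(8))  # 281474976710656 (about 281 trillion)
print(random_key())      # a random 6-character key, different on each run
```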
         
### Key generating service
- Generates random 6-letter strings ahead of time and stores them in the key DB.
- What if two or more servers try to use the same key?
    - Keep one table for keys not yet in use and one table for keys in use.
    - Load some keys into memory to hand out to the servers, and at the same time move them to the "Used" table.
        - If a server dies, the keys it was holding are lost. That is okay, given that we have enough keys to cover shortening requests.
- DB size: 6 (characters per key) * 68.7 billion (unique keys) = 412GB
- Single point of failure? Use replicas.
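
The "unused" and "used" tables can be sketched in memory as two sets. This is a minimal illustration, not the actual service: the class and method names are hypothetical, and a lock stands in for whatever concurrency control the real key DB provides.

```python
import threading


class KeyGenerationService:
    """In-memory stand-in for the key DB with its "unused" and "used" tables."""

    def __init__(self, keys):
        self._unused = set(keys)
        self._used = set()
        self._lock = threading.Lock()

    def reserve_batch(self, n):
        """Move up to n keys from "unused" to "used" and hand them to one server."""
        with self._lock:
            count = min(n, len(self._unused))
            batch = [self._unused.pop() for _ in range(count)]
            # Keys are marked used as soon as they are handed out, so a
            # crashed server loses its batch but never causes a duplicate.
            self._used.update(batch)
            return batch

    def unused_count(self):
        with self._lock:
            return len(self._unused)


kgs = KeyGenerationService(["aaaaaa", "bbbbbb", "cccccc"])
server_a = kgs.reserve_batch(2)
server_b = kgs.reserve_batch(2)  # only one key is left
print(set(server_a).isdisjoint(server_b))  # True
```

Because each batch is moved to the "used" set under the lock, two servers can never receive the same key.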

<img src="img/tinyurl1.png" style="width:500px;height:300px;">
<img src="img/tinyurl2.png" style="width:500px;height:300px;">
     
## Capacity

### Traffic
- Assume 
    - 100:1 read:write ratio.
    - 100M reads (redirect to original) and 1M writes (shorten the URL) per month.
- URL shortenings per second: 1M / (30 days * 24 hrs * 3600 seconds) = 0.386 URLs/s
- URL redirections per second: 100M / (30 days * 24 hrs * 3600 seconds) = 38.6 URLs/s
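
The per-second rates follow directly from the monthly assumptions. A quick check, with variable names chosen only for this sketch:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month

writes_per_month = 1_000_000
reads_per_month = 100 * writes_per_month  # 100:1 read:write ratio

writes_per_second = writes_per_month / SECONDS_PER_MONTH
reads_per_second = reads_per_month / SECONDS_PER_MONTH

print(f"{writes_per_second:.3f} writes/s")  # 0.386 writes/s
print(f"{reads_per_second:.1f} reads/s")    # 38.6 reads/s
```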

### Storage
- Assume 
    - Store URL shortening request for 5 years.
    - Each object is around 1kB.
- Total number of objects to store: 1M per month * 5 years * 12 months = 60M
- Total storage: 60M * 1kB = 60GB

### Bandwidth
- Write: 0.386 * 1kB = 0.386kB/s
- Read: 38.6 * 1kB = 38.6kB/s
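
The storage and bandwidth estimates can be reproduced from the same assumptions (1M writes per month, 100:1 read:write ratio, 1kB per object, 5 years of retention). The constant names are illustrative.

```python
WRITES_PER_MONTH = 1_000_000
OBJECT_SIZE_BYTES = 1_000           # "around 1kB" per object
RETENTION_MONTHS = 5 * 12
SECONDS_PER_MONTH = 30 * 24 * 3600

total_objects = WRITES_PER_MONTH * RETENTION_MONTHS
total_storage_gb = total_objects * OBJECT_SIZE_BYTES / 1e9

write_kb_per_s = WRITES_PER_MONTH / SECONDS_PER_MONTH * OBJECT_SIZE_BYTES / 1e3
read_kb_per_s = 100 * write_kb_per_s

print(f"{total_objects / 1e6:.0f}M objects, {total_storage_gb:.0f} GB")
# 60M objects, 60 GB
print(f"write {write_kb_per_s:.3f} kB/s, read {read_kb_per_s:.1f} kB/s")
# write 0.386 kB/s, read 38.6 kB/s
```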

### Memory
- Assume 
    - 20% of URLs generate 80% of traffic. (Hot URLs)
- Requests per day: 38.6 URLs/s * 3600 seconds * 24 hrs = 3.33M
- To cache 20% of these requests: 0.2 * 3.33M * 1kB = 666MB
    - There will be duplicate requests for the same URLs, so the actual required memory will be less.
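
A short check of the cache estimate, assuming 100M reads per 30-day month and 1kB per object. Without intermediate rounding the result is about 667 MB, which matches the figure above up to rounding.

```python
reads_per_second = 100_000_000 / (30 * 24 * 3600)
requests_per_day = reads_per_second * 24 * 3600
object_size_bytes = 1_000
hot_fraction = 0.2  # cache the 20% of daily requests that hit hot URLs

cache_bytes = hot_fraction * requests_per_day * object_size_bytes

print(f"{requests_per_day / 1e6:.2f}M requests/day")  # 3.33M requests/day
print(f"{cache_bytes / 1e6:.0f} MB")                  # 667 MB
```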
    
## API
- createURL(api_dev_key, original_url, expire_date)
    - api_dev_key: API developer key of a registered account.
    - original_url: URL to be shortened.
    - expire_date: optional. If not specified, defaults to some value.
    - Returns the shortened URL.
- deleteURL(api_dev_key, short_url)
    - short_url: shortened URL.
    - Returns a boolean.
- To prevent user abuse, limit each "api_dev_key" to a certain number of creations and redirections per time period.
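
A minimal in-memory sketch of the two calls, under several assumptions that the notes leave open: a one-year default expiry, a fixed creation quota per key standing in for a per-period limit, random 6-character keys, and dictionaries in place of the real DB. The snake_case function names and all constants are hypothetical.

```python
import secrets
import string
from datetime import datetime, timedelta, timezone

ALPHABET = string.ascii_letters + string.digits + "-_"
DEFAULT_TTL = timedelta(days=365)  # assumed default expiry
MAX_CREATIONS_PER_KEY = 3          # assumed quota, kept tiny for illustration

_links = {}      # short key -> (original_url, expire_date, api_dev_key)
_creations = {}  # api_dev_key -> number of links created so far


def create_url(api_dev_key, original_url, expire_date=None):
    """Return a new short key, or raise once the caller's quota is used up."""
    if _creations.get(api_dev_key, 0) >= MAX_CREATIONS_PER_KEY:
        raise PermissionError("creation quota exceeded")
    if expire_date is None:
        expire_date = datetime.now(timezone.utc) + DEFAULT_TTL
    key = "".join(secrets.choice(ALPHABET) for _ in range(6))
    while key in _links:  # retry on the unlikely collision
        key = "".join(secrets.choice(ALPHABET) for _ in range(6))
    _links[key] = (original_url, expire_date, api_dev_key)
    _creations[api_dev_key] = _creations.get(api_dev_key, 0) + 1
    return key


def delete_url(api_dev_key, short_url):
    """Delete the link if it exists and belongs to the caller; return success."""
    entry = _links.get(short_url)
    if entry is None or entry[2] != api_dev_key:
        return False
    del _links[short_url]
    return True
```

A caller would use it as `key = create_url("dev-key-1", "https://example.com/long/path")` followed by `delete_url("dev-key-1", key)`, which returns `True` the first time and `False` afterwards.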
   
## DB
- Billions of records.
- Each object is small. (Around 1kB, as assumed in the storage estimate.)
- Read heavy.
- Since there are no relationships between records, a NoSQL store should be chosen. (DynamoDB, Cassandra, etc.)

### URL
- ShortURLHash (varchar 16, PK)
- OriginalURL (varchar)
- CreationDate (datetime)
- ExpirationDate (datetime)
- UserID (int) 

### User
- UserID (int)
- Name (varchar)
- Email (varchar)
- CreationDate (datetime)
- LastLogin (datetime)

## Data partitioning
- Consistent hashing
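
A minimal sketch of a consistent-hash ring for routing short keys to DB partitions. The node names, the number of virtual nodes, and the use of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
print(ring.node_for("aZ3_x9"))  # one of db-1, db-2, db-3, stable across runs
```

When a node is added, a key either stays on its old node or moves to the new one, so only a fraction of the keys has to be migrated.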

## Caching
- Use LRU eviction policy.
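
LRU eviction can be sketched with `collections.OrderedDict`, mapping short keys to original URLs. The class name and the tiny capacity are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("abc123", "https://example.com/a")
cache.put("def456", "https://example.com/b")
cache.get("abc123")                           # "abc123" is now most recent
cache.put("ghi789", "https://example.com/c")  # evicts "def456"
print(cache.get("def456"))  # None
```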

## Load balancing
- Between
    - Client and app servers.
    - App servers and cache servers.
    - App servers and DB servers.
- Round robin
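
Round robin simply hands requests to the servers in turn. A minimal sketch with hypothetical server names; it does not account for server health or load.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def next_server(self):
        """Return the next server in rotation."""
        return next(self._cycle)


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.next_server() for _ in range(4)])
# ['app-1', 'app-2', 'app-3', 'app-1']
```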

## DB cleanup
- Lazy cleanup
    - When users access an expired link, delete it from the DB and return an error to the user.
    - Put the freed key back into the key DB so it can be reused.
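
Lazy cleanup can be sketched as a check inside the redirect lookup. Here a dict stands in for the URL table and a set for the pool of unused keys in the key DB; the function name `resolve` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def resolve(short_key, links, free_keys, now=None):
    """Return the original URL, or None if the link is missing or expired.

    An expired link is deleted on access and its key is recycled.
    """
    now = now or datetime.now(timezone.utc)
    entry = links.get(short_key)
    if entry is None:
        return None
    original_url, expiration = entry
    if expiration <= now:
        del links[short_key]      # remove the expired record
        free_keys.add(short_key)  # put the key back into the key pool
        return None               # caller turns this into an error response
    return original_url


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
links = {
    "live01": ("https://example.com/live", now + timedelta(days=1)),
    "dead01": ("https://example.com/dead", now - timedelta(days=1)),
}
free_keys = set()
print(resolve("live01", links, free_keys, now))  # https://example.com/live
print(resolve("dead01", links, free_keys, now))  # None
print(free_keys)                                 # {'dead01'}
```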