# TinyURL System Design

Notes on the [TinyURL](https://www.educative.io/courses/grokking-the-system-design-interview/m2ygV4E81AR) lesson from the Grokking the System Design Interview course. TinyURL is a service that provides short aliases redirecting to long URLs.

# Purpose

URL shortening creates short aliases for long URLs. Short links have two main benefits:

1. They save space wherever the link is displayed or shared.
2. They are less likely to be mistyped.

# Requirements and Goals

Before designing the system, we should clarify the requirements.

## Functional Requirements

1. Given a URL, the system should generate a shorter, unique alias for it. This link should be short enough to be easily copied and pasted into applications.
2. When users access a short link, our service should redirect them to the original link.
3. Users should optionally be able to pick a custom short link for their URL.
4. Links will expire after a standard default timespan. Users should also be able to specify the expiration time.
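
To make the four requirements concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not the design itself: the class `InMemoryShortener`, its method names, the 8-character random alias, and the `now` parameter (used to make expiry easy to demonstrate) are all assumptions introduced for this example.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class InMemoryShortener:
    """Toy single-process model of the functional requirements (hypothetical)."""

    def __init__(self, default_ttl: float) -> None:
        self._default_ttl = default_ttl  # default link lifetime in seconds
        # alias -> (long URL, expiry timestamp in seconds)
        self._links: Dict[str, Tuple[str, float]] = {}

    def _is_live(self, alias: str, now: float) -> bool:
        entry = self._links.get(alias)
        return entry is not None and entry[1] > now

    def shorten(self, long_url: str, custom_alias: Optional[str] = None,
                ttl: Optional[float] = None, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        ttl = self._default_ttl if ttl is None else ttl
        if custom_alias is not None:
            # Requirement 3: the user picks the alias, which must be unused.
            if self._is_live(custom_alias, now):
                raise ValueError(f"alias already in use: {custom_alias}")
            alias = custom_alias
        else:
            # Requirement 1: generate a short, unique alias (random here).
            alias = secrets.token_urlsafe(6)  # 6 random bytes -> 8 characters
            while self._is_live(alias, now):
                alias = secrets.token_urlsafe(6)
        # Requirement 4: every link carries an expiry time.
        self._links[alias] = (long_url, now + ttl)
        return alias

    def resolve(self, alias: str, now: Optional[float] = None) -> Optional[str]:
        # Requirement 2: map the alias back to the original URL for redirection.
        now = time.time() if now is None else now
        entry = self._links.get(alias)
        if entry is None:
            return None
        long_url, expires_at = entry
        if expires_at <= now:
            del self._links[alias]  # drop expired links lazily on access
            return None
        return long_url


if __name__ == "__main__":
    service = InMemoryShortener(default_ttl=60)
    alias = service.shorten("https://example.com/a/very/long/path", now=0)
    print(alias, service.resolve(alias, now=10), service.resolve(alias, now=61))
```

A real service would persist the mapping and answer `resolve` with an HTTP redirect. The sketch only shows how the four requirements relate to each other.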

## Non-Functional Requirements

1. Highly available
2. Real-time response
3. Shortened links should not be guessable

## Extended Requirements

1. Analytics
2. Open REST APIs

# Capacity Estimation and Constraints

The system will be read-heavy: there will be far more redirection requests than new URL shortenings.

As a starting point, we can assume **a 100:1 ratio** between reads and writes.

## Traffic Estimates

Assuming 500M new shortenings per month, we can expect 50B redirections during the same period:

$$
100 * 500M = 50B
$$

The rate of new URL shortenings, in queries per second (QPS), can then be estimated:

$$
500M / (30 days * 24 hours * 3600 seconds) \approx 200 URLs/s
$$

Based on the 100:1 read/write ratio, the number of URL redirections per second will be:

$$
100 * 200 URLs/s = 20K/s
$$
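
The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the 30-day month is the same simplification used in the estimate.

```python
# Back-of-the-envelope traffic estimate, using the assumptions from the text.
NEW_URLS_PER_MONTH = 500_000_000     # 500M new shortenings per month
READ_WRITE_RATIO = 100               # 100 redirections per new shortening
SECONDS_PER_MONTH = 30 * 24 * 3600   # 30-day month

redirections_per_month = READ_WRITE_RATIO * NEW_URLS_PER_MONTH
write_qps = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH
read_qps = READ_WRITE_RATIO * write_qps

print(f"Redirections per month: {redirections_per_month:,}")
print(f"New URLs per second:    {write_qps:.0f}")
print(f"Redirections per second: {read_qps:.0f}")
```

The unrounded results are about 193 new URLs/s and about 19.3K redirections/s, which the estimates round up to 200/s and 20K/s.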

## Storage Estimates

Assume every shortened URL is stored for 5 years. With 500M new URLs every month, the total number of stored objects will be 30B:

$$
500M * 5 years * 12 months = 30B
$$

If each stored object takes about 500 bytes, we will need 15TB of total storage:

$$
30B * 500 bytes = 15TB
$$
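
The same storage figures, reproduced as a small Python sketch. The constant names are illustrative, and terabytes are taken as decimal units (10^12 bytes), which is what the 15TB figure implies.

```python
# Storage estimate for 5 years of shortened URLs.
NEW_URLS_PER_MONTH = 500_000_000  # 500M new URLs per month
RETENTION_YEARS = 5               # assumed lifetime of each stored URL
BYTES_PER_OBJECT = 500            # assumed size of one stored object

total_objects = NEW_URLS_PER_MONTH * 12 * RETENTION_YEARS
total_bytes = total_objects * BYTES_PER_OBJECT
total_tb = total_bytes / 10**12   # decimal terabytes

print(f"Stored objects: {total_objects:,}")
print(f"Storage needed: {total_tb:.0f} TB")
```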

## Bandwidth Estimates

With 200 new URLs per second and 20K redirections per second, each carrying a 500-byte object, we can estimate the incoming (write) and outgoing (read) bandwidth:

- Write:

$$
200 * 500 bytes = 100 KB/s
$$

- Read:

$$
20K * 500 bytes \approx 10MB/s
$$
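
A quick check of both bandwidth figures in Python. This is a sketch that reuses the rounded rates from the traffic estimates and decimal units; the names are illustrative.

```python
# Bandwidth estimate from the rounded request rates.
BYTES_PER_OBJECT = 500  # assumed size of one stored object
WRITE_QPS = 200         # new URLs per second
READ_QPS = 20_000       # redirections per second

write_bytes_per_second = WRITE_QPS * BYTES_PER_OBJECT
read_bytes_per_second = READ_QPS * BYTES_PER_OBJECT

print(f"Incoming: {write_bytes_per_second / 10**3:.0f} KB/s")
print(f"Outgoing: {read_bytes_per_second / 10**6:.0f} MB/s")
```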

## Memory Estimates

Some URLs are far more popular than others. Following the Pareto principle (the 80/20 rule), we assume that 20% of the URLs generate 80% of the traffic, and we cache that hot 20%.

Assuming 20K requests per second, we expect about 1.7B requests per day:

$$
20K * 3600 * 24 \approx 1.7B
$$

To cache 20% of these 1.7B requests, we need about 170GB of memory for hot URLs:

$$
1.7B * 20\% * 500 bytes \approx 170GB
$$

In practice, many of these requests will be for the same URLs, so the actual memory usage should be less than 170GB.
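
The cache sizing can be reproduced the same way. This sketch uses integer arithmetic to avoid floating-point noise; the constant names are illustrative.

```python
# Cache memory estimate: cache the hottest 20% of one day's requests.
READ_QPS = 20_000             # redirections per second
SECONDS_PER_DAY = 24 * 3600
HOT_PERCENT = 20              # share of requests to cache (80/20 rule)
BYTES_PER_OBJECT = 500        # assumed size of one cached object

requests_per_day = READ_QPS * SECONDS_PER_DAY
cache_bytes = requests_per_day * HOT_PERCENT // 100 * BYTES_PER_OBJECT
cache_gb = cache_bytes / 10**9  # decimal gigabytes

print(f"Requests per day: {requests_per_day:,}")
print(f"Cache size: {cache_gb:.1f} GB")
```

Without rounding, the figures are 1.728B requests per day and 172.8GB, which the estimate rounds down to 1.7B and 170GB.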

## High-Level Estimates

To sum up the estimates above:

| Metric | Estimate |
| --- | --- |
| New URLs | 200/s |
| URL redirections | 20K/s |
| Incoming data | 100KB/s |
| Outgoing data | 10MB/s |
| Storage for 5 years | 15TB |
| Memory for cache | 170GB |

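```python
# Sanity check of the incoming data rate: 200 new URLs/s * 500 bytes per object
print(200 * 500)  # 100000 bytes/s, i.e. 100KB/s
```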