# YouTube

## Requirement

### Functional

- Users can upload, share, view, like/dislike, and search videos.
- Users can add and view comments.
- Users can view thumbnails and the number of views.

### Non-functional

- High availability. 
    - Consistency can be relaxed in exchange (eventual consistency is acceptable).
    - For example, subscribers need not be notified of an uploaded video immediately.
- High reliability.
- High scalability.
- Low latency.
    - Users should not notice lag while watching videos.

## Estimation

### Traffic
- Assume
    - 1000 videos are uploaded every minute.
    - Ratio of upload:view is 1:100.
        - On average, every uploaded video gets 100 views.
- Video uploads: 1000 per minute.
- Video views: 1000 * 100 = 100k per minute.

### Storage
- Assume
    - Each video is 10 minutes long and 1GB before encoding.
        - After encoding, the size drops to 100MB.
        - 1 minute of video takes 100MB of storage. (Uncompressed)
        - 1 minute of video takes 10MB of storage. (Compressed)
- 100MB * 1000 = 100GB per minute.
    
### Bandwidth
- Incoming (upload): 1GB * 1000 = 1TB per minute. (Video is uncompressed during upload)
- Outgoing (stream): 100MB * 100k = 10TB per minute. (Assumes every view streams the full encoded video)

### Server
- Assume
    - 1B active users and 100M daily active users.
    - A single server can handle 10k connections.
- 100M / 10k = 10,000 servers are needed, assuming the worst case where all daily active users are connected at the same time.
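
The estimates in this section can be recomputed from the stated assumptions with a few lines of arithmetic. The sketch below is only a sanity check: the constant names are made up for illustration, units are decimal (1 GB = 1000 MB), and every view is assumed to stream the full encoded video.

```python
# Back-of-envelope estimates derived from the assumptions in this section.
UPLOADS_PER_MIN = 1_000
VIEWS_PER_UPLOAD = 100
RAW_VIDEO_MB = 1_000        # 1 GB before encoding
ENCODED_VIDEO_MB = 100      # 100 MB after encoding
DAILY_ACTIVE_USERS = 100_000_000
CONNECTIONS_PER_SERVER = 10_000

views_per_min = UPLOADS_PER_MIN * VIEWS_PER_UPLOAD

# Storage grows by the encoded size of every upload.
storage_gb_per_min = UPLOADS_PER_MIN * ENCODED_VIDEO_MB / 1_000

# Uploads arrive unencoded; views stream the encoded file.
ingress_tb_per_min = UPLOADS_PER_MIN * RAW_VIDEO_MB / 1_000_000
egress_tb_per_min = views_per_min * ENCODED_VIDEO_MB / 1_000_000

# Worst case: every daily active user holds a connection at once.
servers = DAILY_ACTIVE_USERS // CONNECTIONS_PER_SERVER

print(f"views/min:   {views_per_min}")
print(f"storage/min: {storage_gb_per_min} GB")
print(f"ingress/min: {ingress_tb_per_min} TB")
print(f"egress/min:  {egress_tb_per_min} TB")
print(f"servers:     {servers}")
```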

## Design
- The user uploads a video.
- The server sends the video to the encoder.
- The server sends user data and video metadata to the DB.
- The encoder encodes the video and stores it in storage.

<img src="img/youtube1.png" style="width:800px;height:400px;"> 

## API

- upload_video(user_id, video_file, title, description, tags, default_language)
    - Return 202 (Accepted) on success; encoding continues asynchronously.
- search_video(user_id, query, user_location, length, max_videos_to_return)
    - Return JSON with the list of videos.
- stream_video(user_id, video_id, resolution)
    - Return stream.
- like_video(user_id, video_id)
- dislike_video(user_id, video_id)
- view_thumbnail(user_id, video_id)
- comment_video(user_id, video_id, text)
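
A minimal in-memory sketch of three of these calls may help show how they fit together. `VideoService`, `UploadResponse`, and `encode_queue` are hypothetical names, returning the new `video_id` with the 202 is an assumption, and search here is a plain substring match on titles rather than a real search index.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Sequence


@dataclass
class UploadResponse:
    status: int    # 202: accepted, encoding happens later
    video_id: int  # assumption: the caller gets an id to poll or share


class VideoService:
    """In-memory stand-in for the app server; all names are illustrative."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._titles: Dict[int, str] = {}
        self._likes: Dict[int, int] = {}
        self.encode_queue: List[int] = []

    def upload_video(self, user_id: int, video_file: bytes, title: str,
                     description: str = "", tags: Sequence[str] = (),
                     default_language: str = "en") -> UploadResponse:
        video_id = next(self._ids)
        self._titles[video_id] = title
        self._likes[video_id] = 0
        # Hand the video off for encoding instead of blocking the request.
        self.encode_queue.append(video_id)
        return UploadResponse(status=202, video_id=video_id)

    def like_video(self, user_id: int, video_id: int) -> int:
        self._likes[video_id] += 1
        return self._likes[video_id]

    def search_video(self, user_id: int, query: str,
                     max_videos_to_return: int = 10) -> List[int]:
        needle = query.lower()
        hits = [vid for vid, title in self._titles.items()
                if needle in title.lower()]
        return hits[:max_videos_to_return]


service = VideoService()
response = service.upload_video(user_id=1, video_file=b"...", title="Cat video")
print(response.status, service.search_video(user_id=2, query="cat"))
```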

## DB
- Store user, video, and comment metadata in a relational DB.

### User
- id (int, pk)
- name (varchar)
- email (varchar)
- password (varchar)

### Video
- id (int, pk)
- title (varchar)
- description (varchar)
- date (datetime)
- uploader (int)
- size (bigint)
- thumbnail (varchar)
- likes (int)
- dislikes (int)
- views (int)

### Comment
- id (int, pk)
- video_id (int)
- user_id (int)
- comment (varchar)
- date (datetime)
- likes (int)
- dislikes (int)
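
The three tables above could be declared roughly as follows. This is a sketch using SQLite through Python's standard `sqlite3` module purely because it runs anywhere; the `NOT NULL`, `UNIQUE`, `DEFAULT`, and foreign-key constraints are assumptions that the field lists do not state, and a production system would use a server database with its own types.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    email    TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL          -- store a salted hash, never plaintext
);
CREATE TABLE video (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    date        TEXT NOT NULL,
    uploader    INTEGER NOT NULL REFERENCES user(id),
    size        INTEGER,
    thumbnail   TEXT,               -- key or path of the thumbnail
    likes       INTEGER NOT NULL DEFAULT 0,
    dislikes    INTEGER NOT NULL DEFAULT 0,
    views       INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comment (
    id       INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES video(id),
    user_id  INTEGER NOT NULL REFERENCES user(id),
    comment  TEXT NOT NULL,
    date     TEXT NOT NULL,
    likes    INTEGER NOT NULL DEFAULT 0,
    dislikes INTEGER NOT NULL DEFAULT 0
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert one user and one video, then record a view.
conn.execute("INSERT INTO user (name, email, password) VALUES (?, ?, ?)",
             ("ann", "ann@example.com", "<hash>"))
conn.execute("INSERT INTO video (title, date, uploader) "
             "VALUES (?, datetime('now'), ?)", ("Cat video", 1))
conn.execute("UPDATE video SET views = views + 1 WHERE id = ?", (1,))
views = conn.execute("SELECT views FROM video WHERE id = 1").fetchone()[0]
print(views)
```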

## Detailed design
- Load balancer: distributes user load across servers.
- Web server: accepts user requests, decouples client handling from business logic, and caches frequently accessed pages.
- App server: runs the business logic.
- User DB: stores user-related data, kept separate from the metadata DB so each can scale independently.
- Metadata DB: stores video-related data.
- Bigtable: thumbnail storage.
    - Combines multiple files into one block for storage.
    - Very efficient for a large number of small files, with low retrieval latency.
- Encoder: compresses videos and generates thumbnails.
- Blob storage: stores videos.
- CDN: caches videos close to users to serve them faster.

<img src="img/youtube2.png" style="width:800px;height:500px;">

### How to manage read traffic
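- Separate read traffic from write traffic.
- Distribute read traffic across multiple servers.
- For video metadata, write to the primary and read from secondary replicas. Replicas may briefly lag behind the primary, which the relaxed consistency requirement allows.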
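
The primary/secondary split can be illustrated with a toy store. `ReplicatedStore` and its explicit `replicate()` step are hypothetical; real databases replicate continuously in the background. The point of the sketch is that a read served by a secondary can miss a write that has not been replicated yet.

```python
import itertools


class ReplicatedStore:
    """Toy primary/secondary store: writes hit the primary, reads rotate
    across secondaries, and replication is an explicit, delayed step."""

    def __init__(self, replica_count: int) -> None:
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value) -> None:
        self.primary[key] = value

    def replicate(self) -> None:
        # Copy the primary's state to every secondary.
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        # Round-robin over secondaries; the primary serves no reads.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore(replica_count=2)
store.write("video:1", {"title": "Cat video"})
stale = store.read("video:1")   # not replicated yet
store.replicate()
fresh = store.read("video:1")   # now visible on the secondaries
print(stale, fresh)
```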

### Metadata sharding
- Shard by UserID? It is hard to keep the distribution uniform because some users upload far more than others, which creates hot shards.
- Shard by VideoID? This avoids the problem above, since hashing VideoID spreads videos evenly across shards. Hot videos can be cached in front of the DB servers.
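
Sharding by VideoID can be sketched as a hash of the id taken modulo the shard count. `NUM_SHARDS`, `shard_for`, and the choice of SHA-256 are assumptions made for illustration; the loop at the end only checks that sequential ids land roughly evenly.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # assumed shard count, for illustration only


def shard_for(video_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a video id to a shard using a stable hash of the id."""
    digest = hashlib.sha256(str(video_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how many of 100k sequential ids land on each shard.
counts = Counter(shard_for(video_id) for video_id in range(100_000))
for shard in sorted(counts):
    print(shard, counts[shard])
```

Plain modulo sharding remaps most keys when `NUM_SHARDS` changes, so resharding needs a migration plan.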

### Deduplication
- When users upload videos, run a video matching algorithm to detect duplicates.
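
As a simplified stand-in for a video matching algorithm, the sketch below detects only byte-identical uploads by hashing file contents with SHA-256. It will not catch re-encoded, trimmed, or resized copies, which need perceptual matching. `DedupIndex` and `register` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


class DedupIndex:
    """Exact-duplicate index keyed by the SHA-256 of the uploaded bytes."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, int] = {}

    def register(self, video_id: int, data: bytes) -> Optional[int]:
        """Return the id of an identical earlier upload, or None if new."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing
        self._by_hash[digest] = video_id
        return None


index = DedupIndex()
print(index.register(1, b"same bytes"))   # first upload is new
print(index.register(2, b"same bytes"))   # duplicate of video 1
```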

### Load-balancing
- Use consistent hashing to distribute requests across cache servers, so that adding or removing a server remaps only a fraction of the keys.
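
A minimal consistent-hash ring with virtual nodes might look like this. The class name, the server names, and the 100 virtual nodes per server are illustrative assumptions, not values from the design.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: a key belongs to the first virtual node found
    clockwise from the key's hash."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        # Several points per node smooth out the key distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap past the end


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"video-{i}" for i in range(1000)]
before = {key: ring.get(key) for key in keys}
ring.remove("cache-c")
moved = sum(1 for key in keys if ring.get(key) != before[key])
print(f"{moved} of {len(keys)} keys moved after removing cache-c")
```

Only keys that lived on the removed server move; keys on the other servers stay where they were.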

### Caching
- Store hot videos in cache servers.
- Evict entries with an LRU (least recently used) policy.
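
LRU eviction can be sketched with `collections.OrderedDict`, which keeps keys in order and lets an entry be moved to the end on access. The `LRUCache` class and the capacity of 2 are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("video-1", "metadata-1")
cache.put("video-2", "metadata-2")
cache.get("video-1")                 # video-1 is now the most recent
cache.put("video-3", "metadata-3")   # evicts video-2
print(cache.get("video-2"))
```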