# YouTube

## Requirement

### Functional

- Users can upload, share, view, like/dislike, and search videos.
- Users can add and view comments.
- Users can view thumbnails and view counts.

### Non-functional

- High availability. 
    - Consistency can take a hit. 
    - For example, subscribers need not get notifications for uploaded videos immediately.
- High reliability.
- High scalability.
- Performance
    - Users should not experience lag while watching videos.

## Estimation

### Assume
- 1B active users.
- A single server can handle 10k requests per minute.
- Each video before encoding is 1GB and 10 minutes long. 
    - After encoding, size becomes 100MB.
    - A 1-minute video takes 10MB of storage (compressed).
    - A 1-minute video takes 100MB of storage (uncompressed).
- 1000 videos are uploaded every minute.
- Ratio of upload:view is 1:100. 
    - Every uploaded video gets 100 views.
    
### Traffic
- Video uploads: 1000 per minute.
- Video views: 1000 * 100 = 100k per minute.

### Storage
- 100MB * 1000 = 100GB per minute. (Each encoded video is 100MB.)
    
### Bandwidth
- Incoming (upload): 1GB * 1000 = 1TB per minute. (Videos are uncompressed during upload.)
- Outgoing (stream): 100MB * 100k = 10TB per minute. (Each view streams one encoded 100MB video.)
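
The arithmetic in this section can be checked with a short script. This is a sketch using decimal units (1GB = 1000MB). The constant names are illustrative, and the server count at the end is an extra derived figure that is not in the notes: it counts only upload and view requests, so it is a lower bound.

```python
# Assumptions copied from the "Assume" list.
UPLOADS_PER_MIN = 1000
VIEWS_PER_UPLOAD = 100
RAW_VIDEO_MB = 1000            # 1GB per video before encoding
ENCODED_VIDEO_MB = 100         # 100MB per video after encoding
SERVER_CAPACITY_RPM = 10_000   # requests per minute one server can handle

# Traffic
views_per_min = UPLOADS_PER_MIN * VIEWS_PER_UPLOAD

# Storage: only the encoded copy is counted.
storage_gb_per_min = UPLOADS_PER_MIN * ENCODED_VIDEO_MB / 1000

# Bandwidth: uploads arrive unencoded, streams go out encoded.
incoming_tb_per_min = UPLOADS_PER_MIN * RAW_VIDEO_MB / 1_000_000
outgoing_tb_per_min = views_per_min * ENCODED_VIDEO_MB / 1_000_000

# Derived figure (not in the notes): servers needed for upload + view requests.
total_requests = UPLOADS_PER_MIN + views_per_min
servers_needed = -(-total_requests // SERVER_CAPACITY_RPM)  # ceiling division

if __name__ == "__main__":
    print(f"views/min:    {views_per_min}")
    print(f"storage/min:  {storage_gb_per_min} GB")
    print(f"incoming/min: {incoming_tb_per_min} TB")
    print(f"outgoing/min: {outgoing_tb_per_min} TB")
    print(f"servers:      {servers_needed}")
```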

## Design

<img src="img/youtube1.png" style="width:800px;height:400px;"> 

- Processing queue: every uploaded video enters this queue first.
- Encoder: encodes each uploaded video into different formats.
- Thumbnail generator: generates a thumbnail for each video.
- Video and thumbnail storage: stores videos and thumbnails.
- User Database: store user information.
- Video metadata storage: store video metadata.

<img src="img/youtube2.png" style="width:800px;height:500px;">

## API

- upload_video(user_id, video_file, title, description, tags, default_language)
    - Return 202 on success.
- search_video(user_id, query, user_location, length, max_videos_to_return)
    - Return JSON with the list of videos.
- stream_video(user_id, video_id, resolution)
    - Return stream.
- like_video(user_id, video_id)
- dislike_video(user_id, video_id)
- view_thumbnail(user_id, video_id)
- comment_video(user_id, video_id, text)
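
As a rough illustration of why `upload_video` returns 202 rather than 200, the sketch below accepts the upload, hands it to the processing queue, and returns before any encoding happens. The in-memory `processing_queue`, the `UploadResponse` type, and the generated `video_id` are assumptions made for this example; a real service would use a durable queue and blob storage.

```python
import queue
import uuid
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the processing queue in the design.
processing_queue = queue.Queue()


@dataclass
class UploadResponse:
    status: int      # HTTP-style status code
    video_id: str    # assumed: the ID the client can use to poll for progress


def upload_video(user_id, video_file, title, description, tags, default_language):
    """Accept an upload and defer encoding to a background worker."""
    video_id = uuid.uuid4().hex
    processing_queue.put({
        "video_id": video_id,
        "user_id": user_id,
        "video_file": video_file,
        "title": title,
        "description": description,
        "tags": list(tags),
        "default_language": default_language,
    })
    # 202 Accepted: the request was received but processing is not finished.
    return UploadResponse(status=202, video_id=video_id)
```

An encoder worker would then pull jobs off `processing_queue` with `processing_queue.get()` and write the encoded formats and thumbnail to storage.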

## DB
- Store video metadata in relational DB.

### Video
- id
- title
- description
- date
- uploader
- size
- thumbnail
- likes
- dislikes
- views

### Comment
- id
- video_id
- user_id
- comment
- date
- likes
- dislikes

### User
- id
- name
- email
- password
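
The three entities above could be laid out as relational tables roughly as follows. This is a sketch using Python's built-in `sqlite3` module purely for illustration. The plural table names, column types, foreign keys, and defaults are assumptions, as is the idea that `password` holds a salted hash and `thumbnail` holds a key into thumbnail storage.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    email    TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL              -- assumed: salted hash, never plaintext
);
CREATE TABLE videos (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    date        TEXT NOT NULL,          -- assumed: ISO 8601 upload date
    uploader    INTEGER NOT NULL REFERENCES users(id),
    size        INTEGER NOT NULL,       -- assumed unit: bytes
    thumbnail   TEXT,                   -- assumed: key into thumbnail storage
    likes       INTEGER NOT NULL DEFAULT 0,
    dislikes    INTEGER NOT NULL DEFAULT 0,
    views       INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comments (
    id       INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    user_id  INTEGER NOT NULL REFERENCES users(id),
    comment  TEXT NOT NULL,
    date     TEXT NOT NULL,
    likes    INTEGER NOT NULL DEFAULT 0,
    dislikes INTEGER NOT NULL DEFAULT 0
);
"""


def create_schema(conn):
    """Create the users, videos, and comments tables on the given connection."""
    conn.executescript(SCHEMA)
```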

## Discussion

### Where to store videos
- A distributed file system such as HDFS.

### How to manage read traffic
- Segregate read traffic and write traffic.
- Distribute read traffic to different servers.
- For video metadata, write to primary and read from secondary.
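
The primary/secondary split can be sketched as a small router: writes always go to the primary, and reads rotate across the secondaries. The class name, method name, and round-robin policy are assumptions for illustration. Reads from a secondary may lag behind the primary, which fits the earlier requirement that consistency can take a hit.

```python
import itertools


class MetadataRouter:
    """Route metadata writes to the primary and reads to the secondaries."""

    def __init__(self, primary, secondaries):
        if not secondaries:
            raise ValueError("at least one secondary is required")
        self.primary = primary
        # Round-robin over secondaries (assumed policy; others are possible).
        self._secondaries = itertools.cycle(list(secondaries))

    def route(self, is_write):
        """Return the server that should handle this request."""
        if is_write:
            return self.primary
        return next(self._secondaries)
```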

### Where to store thumbnail
- Bigtable
    - Combines multiple small files into one block for storage.
    - Reads of small amounts of data are very efficient.

### Metadata sharding
- Shard by UserID? It is hard to maintain a uniform distribution because some users upload far more than others.
- Shard by VideoID? This solves the problem above. Hot videos can be cached in front of the DB servers.
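
A minimal sketch of sharding by VideoID: hash the ID and take the result modulo the number of shards, so videos spread evenly regardless of who uploaded them. The function name and the choice of MD5 (used here only for distribution, not for security) are assumptions.

```python
import hashlib


def shard_for_video(video_id, num_shards):
    """Map a video ID to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(video_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; a stable hash keeps the mapping
    # identical across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo sharding remaps most keys when `num_shards` changes, so resharding is expensive; that trade-off is accepted here to keep the sketch simple.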

### Deduplication
- When users upload videos, run a video matching algorithm to detect duplicates.
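
Real video matching works on perceptual features and is beyond a short example. As a much simpler stand-in, the sketch below catches only byte-for-byte duplicates by hashing the uploaded file. The class and method names are hypothetical, and this is not the matching algorithm itself, only a cheap first check that could run before it.

```python
import hashlib


class ExactDuplicateIndex:
    """Detect byte-identical uploads using a SHA-256 content hash."""

    def __init__(self):
        self._by_digest = {}  # content digest -> ID of the first video seen

    def register(self, video_id, data):
        """Return the ID of the canonical copy for this content.

        If the same bytes were uploaded before, the earlier video's ID is
        returned; otherwise this video becomes the canonical copy.
        """
        digest = hashlib.sha256(data).hexdigest()
        return self._by_digest.setdefault(digest, video_id)
```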

### Load-balancing
- Use consistent hashing to distribute requests across cache servers.
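
A minimal consistent-hash ring might look like the sketch below. Each server is placed on the ring at several virtual points, and a key is served by the first point clockwise from its hash. The class name, the `vnodes` parameter, and the use of MD5 for placement are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash used to place servers and keys on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` virtual points to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key):
        """Return the server owning the first ring point clockwise of the key."""
        if not self._ring:
            raise ValueError("ring has no servers")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]
```

The benefit over plain modulo hashing is that removing a cache server only remaps the keys that server owned; keys on the remaining servers stay where they are.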

### Caching
- Store hot videos in cache servers.
- Use an LRU eviction policy.
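
LRU eviction can be sketched with `collections.OrderedDict`: every access moves the entry to the end, and when the cache is full the entry at the front (the least recently used) is dropped. The class name and the choice to return `None` on a miss are assumptions for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Insert or update an entry, evicting the oldest one if over capacity."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
```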