# YouTube

## Requirement

### Functional

- Users can upload, share, view, like/dislike, and search videos.
- Users can add and view comments.
- Users can view thumbnails and view counts.

### Non-functional

- High availability. 
    - Consistency can take a hit. 
    - For example, subscribers need not get notifications for uploaded videos immediately.
- High reliability.
- High scalability.
- Performance
    - Users should not feel lag watching videos.

## Estimation

### Assume
- 2B total users and 1B daily active users.
- Each user watches 10 videos per day.
- Each video before encoding is 1GB and 10 minutes long.
    - After encoding, size becomes 100MB.
    - 1 minute of video takes 100MB / 10 = 10MB of storage.
    - 1 minute of video takes 10MB of bandwidth to stream.
- 10,000 minutes' worth of video are uploaded every minute (1,000 ten-minute videos).
- Ratio of upload:view is 1:100. 
    - Every uploaded video gets 100 views.
    
### Traffic
- Video views: 1B * 10 = 10B videos per day.
- Video uploads: 10B / 100 = 100M per day (about 1,157 per second).

### Storage
- 10,000 minutes / 10 minutes per video = 1,000 videos per minute.
- 100MB * 1,000 = 100GB per minute.
    
### Bandwidth
- Incoming (upload): 100GB per minute.
- Outgoing (stream): 100GB * 100 = 10TB per minute.
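
The storage and bandwidth figures can be re-derived in a few lines. This is a rough sketch that assumes decimal units (1GB = 1000MB) and derives everything from the upload-rate assumption; the constant and variable names are illustrative only.

```python
# Inputs taken from the "Assume" list (decimal units: 1 GB = 1000 MB).
MB_PER_ENCODED_VIDEO = 100
MINUTES_PER_VIDEO = 10
UPLOADED_MINUTES_PER_MINUTE = 10_000
VIEWS_PER_UPLOAD = 100

# Size of one minute of encoded video.
mb_per_video_minute = MB_PER_ENCODED_VIDEO / MINUTES_PER_VIDEO

# Number of ten-minute videos uploaded each minute.
videos_uploaded_per_minute = UPLOADED_MINUTES_PER_MINUTE // MINUTES_PER_VIDEO

# New storage needed per minute and per day.
storage_gb_per_minute = videos_uploaded_per_minute * MB_PER_ENCODED_VIDEO / 1000
storage_tb_per_day = storage_gb_per_minute * 60 * 24 / 1000

# Upload bandwidth equals the ingest rate; streaming is 100x that.
incoming_gb_per_minute = storage_gb_per_minute
outgoing_tb_per_minute = incoming_gb_per_minute * VIEWS_PER_UPLOAD / 1000

print(mb_per_video_minute, storage_gb_per_minute,
      storage_tb_per_day, outgoing_tb_per_minute)
```

At these rates the system adds roughly 144TB of encoded video per day, before replication or multiple encodings are counted.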

## Design

<img src="img/youtube1.png" style="width:800px;height:400px;"> 

- Processing queue: every uploaded video enters this queue first.
- Encoder: encodes each uploaded video into different formats.
- Thumbnails generator: generates thumbnails for each video.
- Video and Thumbnail storage: stores videos and thumbnails.
- User Database: store user information.
- Video metadata storage: store video metadata.

<img src="img/youtube2.png" style="width:800px;height:500px;">

## API

- upload_video(api_dev_key, video_title, video_description, tags[], category_id, default_language, recording_details, video_contents)
    - Return 202 on success.
- search_video(api_dev_key, search_query, user_location, max_videos_to_return)
    - Return JSON with the list of videos.
- stream_video(api_dev_key, video_id, offset, codec, resolution)
    - Return stream.
- view_thumbnail()
- comment_video()
- like_video()
- dislike_video()
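
As a minimal sketch of the upload call, the snippet below accepts a request, places it on an in-memory processing queue, and returns 202 because encoding happens later. The `UploadRequest` type, the generated video ID, and the 400/401 error codes are assumptions for illustration; the notes only specify the parameter list and the 202 response.

```python
import uuid
from dataclasses import dataclass


@dataclass
class UploadRequest:
    # Fields mirror the upload_video parameter list above.
    api_dev_key: str
    video_title: str
    video_description: str
    tags: list
    category_id: int
    default_language: str
    recording_details: str
    video_contents: bytes


# Stand-in for the processing queue in the design diagram.
processing_queue = []


def upload_video(request: UploadRequest):
    """Return (status_code, video_id); 202 means accepted for processing."""
    if not request.api_dev_key:
        return 401, ""  # assumed error code, not specified in the notes
    if not request.video_contents:
        return 400, ""  # assumed error code, not specified in the notes
    video_id = uuid.uuid4().hex
    processing_queue.append((video_id, request))
    return 202, video_id
```

Returning 202 rather than 200 signals that the video has been accepted but is not yet encoded or viewable.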

## DB
- Store video metadata in relational DB.

### Video
- VideoID
- Title
- Description
- Size
- Thumbnail
- Uploader
- Total number of likes
- Total number of dislikes
- Total number of views

### VideoComment
- CommentID
- VideoID
- UserID
- Comment
- TimeOfCreation

### User
- UserID
- Name
- Email
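
The three tables above could be expressed as a relational schema roughly as follows. This sketch uses Python's built-in `sqlite3` with an in-memory database purely so it is runnable; the column names, types, and constraints are assumptions, and a production system would use a server-based relational database.

```python
import sqlite3

# Column names and types are illustrative; the notes list only the fields.
SCHEMA = """
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE video (
    video_id      INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    description   TEXT,
    size_bytes    INTEGER,
    thumbnail     TEXT,
    uploader_id   INTEGER NOT NULL REFERENCES user(user_id),
    like_count    INTEGER NOT NULL DEFAULT 0,
    dislike_count INTEGER NOT NULL DEFAULT 0,
    view_count    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE video_comment (
    comment_id INTEGER PRIMARY KEY,
    video_id   INTEGER NOT NULL REFERENCES video(video_id),
    user_id    INTEGER NOT NULL REFERENCES user(user_id),
    comment    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert one user and one video, then record a single view.
conn.execute("INSERT INTO user (user_id, name, email) VALUES (1, 'alice', 'alice@example.com')")
conn.execute("INSERT INTO video (video_id, title, uploader_id) VALUES (10, 'demo', 1)")
conn.execute("UPDATE video SET view_count = view_count + 1 WHERE video_id = 10")
views = conn.execute("SELECT view_count FROM video WHERE video_id = 10").fetchone()[0]
```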

## Discussion

### Where to store videos
- Use a distributed file system such as HDFS.

### How to manage read traffic
- Segregate read traffic and write traffic.
- Distribute read traffic to different servers.
- For video metadata, write to primary and read from secondary.
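
The primary/secondary split can be illustrated with a toy in-memory store. `ReplicatedStore` and its explicit `replicate()` step are hypothetical stand-ins for a real database's asynchronous replication; the point is only that reads may briefly lag writes, which fits the relaxed consistency requirement.

```python
import itertools


class ReplicatedStore:
    """Toy model: writes go to the primary, reads rotate over replicas."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes land on the primary only.
        self.primary[key] = value

    def replicate(self):
        # Stand-in for asynchronous replication to the secondaries.
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        # Round-robin across replicas; may return stale data (None here).
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore(replica_count=2)
store.write("v1", {"title": "demo"})
stale = store.read("v1")   # replication has not run yet
store.replicate()
fresh = store.read("v1")   # now visible on the replicas
```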

### Where to store thumbnail
- Bigtable
    - Combines multiple files into one block for storage.
    - Very efficient at reading small amounts of data.

### Metadata sharding
- Shard by UserID? It is hard to maintain a uniform distribution because some users upload more than others.
- Shard by VideoID? Solves the above problem. Hot videos can be cached in front of the DB servers.
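
A minimal sketch of VideoID-based sharding, assuming string IDs and a fixed shard count: hashing the ID spreads videos roughly evenly regardless of who uploaded them. The function name `shard_for` and the use of MD5 as the hash are illustrative choices, not part of the original design.

```python
import hashlib
from collections import Counter


def shard_for(video_id: str, shard_count: int) -> int:
    """Map a video ID to a shard index using a stable hash."""
    digest = hashlib.md5(video_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Distribute 10,000 synthetic IDs over 4 shards and count per shard.
counts = Counter(shard_for(f"video-{i}", 4) for i in range(10_000))
print(sorted(counts.items()))
```

Plain modulo sharding like this reshuffles most keys when the shard count changes, so resizing needs a migration plan.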

### Deduplication
- When users upload videos, run a video matching algorithm to detect duplicates.
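
The notes do not say which matching algorithm is used. As a deliberately simplified stand-in, the sketch below catches only byte-identical uploads by fingerprinting the contents with SHA-256; detecting re-encoded or trimmed copies would need perceptual matching, which is outside this sketch. `DedupIndex` is a hypothetical name.

```python
import hashlib


class DedupIndex:
    """Exact-duplicate detection keyed by a content fingerprint."""

    def __init__(self):
        self._by_fingerprint = {}

    def register(self, video_id: str, contents: bytes) -> str:
        """Return the ID of the canonical copy for these contents."""
        fingerprint = hashlib.sha256(contents).hexdigest()
        existing = self._by_fingerprint.get(fingerprint)
        if existing is not None:
            return existing  # duplicate: point at the stored copy
        self._by_fingerprint[fingerprint] = video_id
        return video_id


index = DedupIndex()
first = index.register("a", b"same bytes")
second = index.register("b", b"same bytes")   # duplicate of "a"
third = index.register("c", b"other bytes")
```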

### Load-balancing
- Use consistent hashing to distribute requests across cache servers.
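
A compact sketch of a consistent hash ring, assuming string server names and 100 virtual nodes per server (both are illustrative choices). Each key maps to the first ring point at or after its hash, so adding a server only moves the keys that the new server takes over.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash derived from MD5.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Place several virtual nodes per server on the ring.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
bigger = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"video-{i}" for i in range(1000)]
moved = [k for k in keys if ring.server_for(k) != bigger.server_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a server")
```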

### Caching
- Store hot videos in cache servers.
- Use an LRU eviction policy.
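
LRU eviction can be sketched with `collections.OrderedDict`, which keeps keys in insertion order and lets an entry be moved to the end on each access. The `LRUCache` class and its capacity of 2 are illustrative; a real cache tier would typically be a dedicated cache service.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Mark as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (front of the dict).
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("video-1", "metadata-1")
cache.put("video-2", "metadata-2")
cache.get("video-1")                 # video-1 is now the most recent
cache.put("video-3", "metadata-3")   # evicts video-2
```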