# Twitter

## Functional
- Users can post new tweets.
- Users can follow other users.
- Users can mark tweets as favs.
- Display a user's timeline with the top Tweets.
- Tweets can contain photos and videos.

## Non-functional
- High availability.
- Timeline generation latency of at most 200 ms.
- Consistency can take a hit (it's acceptable if users don't see a new Tweet for a while).

## Extended
- Search for Tweets.
- Reply to Tweets.
- Display trending topics.
- Tag other users.
- Tweet notifications.
- Suggestions on who to follow.

## Capacity

Assume:
- 1B total users.
- 200M daily active users.
- 100M Tweets every day.
- Each user follows 200 people.
- Each user favs 5 Tweets per day.
- Each user visits their timeline 2 times per day.
- Each user visits the timelines of 5 other users per day.
- Each user sees 20 Tweets when visiting a timeline.
- Each Tweet takes up 300 bytes (including metadata).
- Every 5th Tweet has a photo of 200KB.
- Every 10th Tweet has a video of 2MB.

Favorites per day
- 200M * 5 favs = 1B favs per day.

Number of Tweet views per day
- 200M * (2+5) * 20 = 28B per day.

Storage
- Tweets: 100M * 300 bytes = 30GB per day.
- Media: (100M/5) * 200KB + (100M/10) * 2MB = 24TB per day.

Bandwidth
- Ingress: 24TB / 86400s ≈ 280MB per second.
- Egress: (28B * 300 bytes + (28B/5) * 200KB + (28B/10) * 2MB) / 86400s ≈ 78GB per second.
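As a sanity check, the estimates can be recomputed from the assumptions above with a short script. It uses decimal units (1KB = 1,000 bytes) throughout, so ingress comes out at about 278MB/s and egress at about 78GB/s; the constant names are only illustrative.

```python
# Back-of-the-envelope capacity numbers, in decimal units (1KB = 1,000 bytes).
DAU = 200_000_000
TWEETS_PER_DAY = 100_000_000
FAVS_PER_USER = 5
TIMELINE_VISITS = 2 + 5         # own timeline twice + 5 other users' timelines
TWEETS_PER_VISIT = 20
TWEET_BYTES = 300               # text plus metadata
PHOTO_BYTES = 200 * 1_000       # 200KB photo in every 5th Tweet
VIDEO_BYTES = 2 * 1_000_000     # 2MB video in every 10th Tweet
SECONDS_PER_DAY = 86_400

favs_per_day = DAU * FAVS_PER_USER
views_per_day = DAU * TIMELINE_VISITS * TWEETS_PER_VISIT

text_storage_per_day = TWEETS_PER_DAY * TWEET_BYTES
media_storage_per_day = ((TWEETS_PER_DAY // 5) * PHOTO_BYTES
                         + (TWEETS_PER_DAY // 10) * VIDEO_BYTES)

# Ingress counts only media, which dwarfs the 30GB/day of Tweet text.
ingress_per_sec = media_storage_per_day / SECONDS_PER_DAY
egress_per_sec = (views_per_day * TWEET_BYTES
                  + (views_per_day // 5) * PHOTO_BYTES
                  + (views_per_day // 10) * VIDEO_BYTES) / SECONDS_PER_DAY

print(f"favs/day:  {favs_per_day / 1e9:.0f}B")
print(f"views/day: {views_per_day / 1e9:.0f}B")
print(f"text/day:  {text_storage_per_day / 1e9:.0f}GB")
print(f"media/day: {media_storage_per_day / 1e12:.0f}TB")
print(f"ingress:   {ingress_per_sec / 1e6:.0f}MB/s")
print(f"egress:    {egress_per_sec / 1e9:.0f}GB/s")
```

Video dominates egress (roughly 65GB/s of the total), so any assumption about how many videos are actually played changes this figure the most.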

## API
- tweet(api_dev_key, tweet_data, tweet_location, user_location, media_ids)
    - Returns the URL to access the Tweet on success.

## Design
- Writes per day: 100M
- Reads per day: 28B
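Converting the daily totals into average per-second rates shows how read-heavy the system is. These are daily averages only; peak rates would presumably be higher.

```python
SECONDS_PER_DAY = 86_400
writes_per_day = 100_000_000
reads_per_day = 28_000_000_000

write_qps = writes_per_day / SECONDS_PER_DAY        # about 1,157 new Tweets per second
read_qps = reads_per_day / SECONDS_PER_DAY          # about 324,074 Tweet reads per second
read_write_ratio = reads_per_day / writes_per_day   # 280 reads for every write

print(round(write_qps), round(read_qps), read_write_ratio)
```

A ratio of 280:1 is what motivates the caching and read replicas below.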

<img src="img/twitter1.png" style="width:500px;height:300px;">

Data sharding
- Construct Tweet ID such that
    - [epoch_timestamp]-[auto-incrementing-sequence]
- Assign shard based on the second part. (auto-incrementing-sequence)
- Reset auto-incrementing-sequence every second.
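- Since the ID starts with the timestamp, an index on Tweet ID makes querying the latest Tweets fast.
- Timeline generation still has to query all the DB servers, since a user's Tweets are spread across shards.
- Aggregate the results from the DBs and return them to the user.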
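The scheme above can be sketched as follows, assuming the two ID parts are packed into one integer with the timestamp in the high bits, so sorting by ID also sorts by time. `SEQ_BITS`, `NUM_SHARDS`, and the single-process generator are illustrative assumptions; the timeline query is shown as a scatter-gather whose per-shard results are combined with a k-way merge (`heapq.merge`).

```python
import heapq
import itertools
import time

SEQ_BITS = 17    # assumption: up to 2^17 = 131,072 Tweets per second
NUM_SHARDS = 4   # hypothetical shard count


class TweetIdGenerator:
    """Builds [epoch_timestamp][sequence] IDs; the sequence resets every second.

    Single-process sketch that assumes the clock never goes backwards.
    """

    def __init__(self, clock=time.time):
        self._clock = clock
        self._second = None
        self._seq = 0

    def next_id(self) -> int:
        now = int(self._clock())
        if now != self._second:
            self._second, self._seq = now, 0  # new second: reset the sequence
        else:
            self._seq += 1
            if self._seq >= 1 << SEQ_BITS:
                raise RuntimeError("sequence exhausted for this second")
        return (now << SEQ_BITS) | self._seq


def shard_for(tweet_id: int) -> int:
    # Shard on the sequence part only, so each second's Tweets spread over all shards.
    return (tweet_id & ((1 << SEQ_BITS) - 1)) % NUM_SHARDS


def latest_tweets(shards: list, limit: int) -> list:
    """Scatter-gather: sort each shard's IDs newest-first, then merge and keep `limit`."""
    per_shard = [sorted(ids, reverse=True) for ids in shards]
    merged = heapq.merge(*per_shard, reverse=True)
    return list(itertools.islice(merged, limit))
```

In a real deployment each shard would return only its own top results for the followed users, and the app server would perform the merge.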

Caching
- App servers check the cache servers before going to the DB.
    - Use LRU eviction.
- Cache the latest data.
    - Hash table where the key is UserID and the value is a doubly linked list containing all Tweets from that user in the past 3 days.
    - Always insert new Tweets at the head of doubly linked list.
    - Evict Tweets from the tail.
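A minimal sketch of this cache follows, using `collections.deque` in place of a hand-written doubly linked list (both give O(1) insert at the head and removal at the tail). Applying LRU at the granularity of whole users, and the `max_users` capacity knob, are assumptions made for illustration.

```python
from collections import OrderedDict, deque
from typing import Deque, List, Optional, Tuple

RETENTION_SECONDS = 3 * 24 * 3600  # keep each user's Tweets from the past 3 days


class RecentTweetsCache:
    """UserID -> deque of (created_at, tweet_id), newest at the head (left end).

    Whole users are evicted in LRU order once more than `max_users` are cached.
    """

    def __init__(self, max_users: int) -> None:
        self.max_users = max_users
        self._users: "OrderedDict[int, Deque[Tuple[float, int]]]" = OrderedDict()

    def add_tweet(self, user_id: int, tweet_id: int, created_at: float) -> None:
        # Assumes Tweets arrive in creation order, so the head stays the newest.
        tweets = self._users.setdefault(user_id, deque())
        tweets.appendleft((created_at, tweet_id))
        self._touch(user_id)

    def get(self, user_id: int, now: float) -> Optional[List[int]]:
        tweets = self._users.get(user_id)
        if tweets is None:
            return None  # cache miss: the app server falls back to the DB
        self._expire(tweets, now)
        self._touch(user_id)
        return [tweet_id for _, tweet_id in tweets]  # newest first

    @staticmethod
    def _expire(tweets: Deque[Tuple[float, int]], now: float) -> None:
        # The oldest Tweets sit at the tail, so expiry only ever pops from the right.
        cutoff = now - RETENTION_SECONDS
        while tweets and tweets[-1][0] < cutoff:
            tweets.pop()

    def _touch(self, user_id: int) -> None:
        self._users.move_to_end(user_id)  # mark as most recently used
        while len(self._users) > self.max_users:
            self._users.popitem(last=False)  # drop the least recently used user
```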
 
Replication
- Each DB has multiple replicas.
- Writes go to the primary; reads are served by the secondaries.
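The routing rule can be sketched as a tiny router that sends writes to the primary and spreads reads round-robin over the secondaries. The node names and the round-robin policy are hypothetical; since replicas lag behind the primary, reads may be slightly stale, which the relaxed consistency requirement allows.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Picks a DB node per query: writes to the primary, reads to a secondary."""

    def __init__(self, primary: str, secondaries: Sequence[str]) -> None:
        if not secondaries:
            raise ValueError("need at least one secondary")
        self.primary = primary
        self._reads = itertools.cycle(secondaries)  # round-robin over replicas

    def route(self, is_write: bool) -> str:
        # Reads from a secondary may miss the very latest writes (replication lag).
        return self.primary if is_write else next(self._reads)
```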

<img src="img/twitter2.png" style="width:800px;height:600px;">

## DB

Tweet
- TweetID (int, pk)
- UserID (int)
- Content (varchar)
- CreationDate (datetime)
- NumFavs (int)

User
- UserID (int, pk)
- Name (varchar)
- Email (varchar)
- DateOfBirth (datetime)
- CreationDate (datetime)
- LastLogin (datetime)

UserFollow
- UserID1 (int, pk)
- UserID2 (int, pk)

Favorite
- TweetID (int, pk)
- UserID (int, pk)
- CreationDate (datetime)
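The tables above can be expressed as DDL and loaded into an in-memory SQLite database to check that they are well-formed. The column types are mapped loosely (SQLite uses dynamic typing and does not enforce the foreign keys here by default), and the inserted rows are made-up sample data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE User (
    UserID       INTEGER PRIMARY KEY,
    Name         VARCHAR,
    Email        VARCHAR,
    DateOfBirth  DATETIME,
    CreationDate DATETIME,
    LastLogin    DATETIME
);
CREATE TABLE Tweet (
    TweetID      INTEGER PRIMARY KEY,
    UserID       INTEGER NOT NULL REFERENCES User(UserID),
    Content      VARCHAR,
    CreationDate DATETIME,
    NumFavs      INTEGER DEFAULT 0
);
CREATE TABLE UserFollow (
    UserID1 INTEGER NOT NULL REFERENCES User(UserID),
    UserID2 INTEGER NOT NULL REFERENCES User(UserID),
    PRIMARY KEY (UserID1, UserID2)
);
CREATE TABLE Favorite (
    TweetID      INTEGER NOT NULL REFERENCES Tweet(TweetID),
    UserID       INTEGER NOT NULL REFERENCES User(UserID),
    CreationDate DATETIME,
    PRIMARY KEY (TweetID, UserID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The composite primary key stops a user from faving the same Tweet twice.
conn.execute("INSERT INTO User (UserID, Name) VALUES (1, 'alice')")
conn.execute("INSERT INTO Tweet (TweetID, UserID, Content) VALUES (100, 1, 'hello')")
conn.execute("INSERT INTO Favorite (TweetID, UserID) VALUES (100, 1)")
try:
    conn.execute("INSERT INTO Favorite (TweetID, UserID) VALUES (100, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```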