This editor is synced in real time with your peer.

Use it to share thoughts and resources, such as:

Features scope
API design
Pseudo code for specific components
Data model/schema
Back-of-the-envelope calculations
Reference links
Link to whiteboard or diagram such as https://sketchboard.me/new

Good luck!

40 minutes - each 5-10 mins feedback

Social Network (Instagram, Facebook, LinkedIn )

Functional Requirements:

A user should be able to search for another user
1. Every Users profile is searchable
2. Search with Filter (Out of scope)
  1. Search City
  2. Job Title
Search result will have
- Name of User
- User id
We want to the shortest number of connections between uers on the search result
The user should see the shortest path number only
Show Friends in common (out of scope)
Messaging, Timeline, User Pictures
Order of the search results People with lower distance first
No Fuzzy search. Only exact search. (later)

Users: 1B users

Number of search QPS: 2.5M QPMonth

Average user has 50 connections?

Non-Functional Requirments:

Scalability
Availability / Fault Tolerant
Performance / Latency ( 300MS search time P99 99% )

1B

Data Schema

Users -user_id (pk) -username_ -DOB ...

Connection -user_id (pk) (fk) -user_id (pk) (fk) -information

Data

NO back of envelope

500 Bytes user * 1B = 500GB

https://excalidraw.com/#room=d9e24dc848e8b746c386,VWqDwDO0J3bTtqTH4dHmRg

50* 1B users * 200 bytes = 1TB of storage

Connections:

1-> [2,4,5,6] 5-> [2,4] 6-> [2,4]

1 distance 2 distance 3 +

1000 connections max to limit

3+

"Barack Obama"

BFS Algorithm Start the user

add start user to the queue

distance = 0

Queue:

for i in size of the queue: pop from the queue

if you find the user:
    return the distance

add the connections to the queue

distance ++

if distance >3: return

Should I do back of enveloppe estimation?

50 connections

1st degree 50

2nd 50 * 50 = 2500

3rd dregree 50 * 50 * 50 = 125,000

OPtimizaion for Social Search:

1Caching

2Start with the friends with higher number of connections

3Parallel BFS

4Graph Storage would be faster

5Use the Location

Sharding SQL also possible

ES

Distributed Partitioned

Index Key-Value Storage

DOCUMENTS insert in elastic (document propert1 property2)

proerty1(name):"Billy Joe Obama"prperty2:locaiton("USA,Virginia") "Billy"

Key = Billy -> doc1, doc2

Joe -> doc1

Obamab -> doc1

Special optimization for fuzzy search

Tokenization is Customizable

Master slave

Index PK

Create your own index

20 mins on requirements:

Explain the Cache more

Like Template

Protocols

Communication Protocols

(HTTP / gRPC)

Split the Search service: User Finding Connection Finding

Why not SQL ?

Should I do back of enveloppe estimation?

Designt he schema later

Data Schema

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

readme.md

Files

readme.md

Latest commit

History

readme.md

File metadata and controls