Good morning! Here's your coding interview problem for today.
This problem was asked by Google.
Design a system to crawl and copy all of Wikipedia using a distributed network of machines.
More specifically, suppose your server has access to a set of client machines. These machines can execute code you have written to access Wikipedia pages, download and parse their data, and write the results to a database.
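To make that setup concrete, here is a minimal sketch of what a single client iteration might look like. It is only illustrative: it uses the Python standard library for fetching, a crude regex for link extraction, assumes English Wikipedia as the base site, and `save_page` is a hypothetical stand-in for the database write mentioned above.

```python
import re
import urllib.request

WIKI_BASE = "https://en.wikipedia.org"  # assumption: crawling English Wikipedia


def save_page(url, html):
    # Placeholder: persist the raw page (or parsed fields) keyed by URL.
    pass


def crawl_one(url):
    """Fetch one Wikipedia page, store it, and return the article links it contains."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    # Crude link extraction: internal article links look like href="/wiki/Title".
    # The character class skips anchors and namespaced pages such as /wiki/File:...;
    # a real crawler would use a proper HTML parser instead of a regex.
    links = {WIKI_BASE + path for path in re.findall(r'href="(/wiki/[^"#:]+)"', html)}

    save_page(url, html)  # hypothetical database write
    return links
```

Each client would run something like this in a loop, asking the server for the next URL and reporting back the links it found.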
Some questions you may want to consider as part of your solution are listed below, followed by a rough sketch of the server-side bookkeeping they point to:
- How will you reach as many pages as possible?
- How can you keep track of pages that have already been visited?
- How will you deal with your client machines being blacklisted?
- How can you update your database when Wikipedia pages are added or updated?
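The first, second, and last questions all suggest state the server must track: a frontier of URLs still to crawl, a set of URLs already seen, and timestamps for scheduling re-crawls. The sketch below is a single-process toy version of that bookkeeping; in a real distributed system the frontier and visited set would live in a shared store so every client machine sees the same state, and all names here are hypothetical.

```python
import collections
import time


class CrawlCoordinator:
    """Toy stand-in for the server-side bookkeeping (not a distributed implementation)."""

    def __init__(self, seed_urls, recrawl_interval_s=7 * 24 * 3600):
        self.frontier = collections.deque(seed_urls)  # URLs waiting to be crawled
        self.seen = set(seed_urls)                    # answers "already visited?"
        self.last_crawled = {}                        # url -> unix timestamp of last crawl
        self.recrawl_interval_s = recrawl_interval_s

    def next_url(self):
        """Hand one URL to a client machine, or None if the frontier is empty."""
        return self.frontier.popleft() if self.frontier else None

    def report_links(self, source_url, links):
        """Called by a client after parsing a page: record the crawl and grow the frontier."""
        self.last_crawled[source_url] = time.time()
        for url in links:
            if url not in self.seen:
                self.seen.add(url)
                self.frontier.append(url)

    def schedule_refreshes(self):
        """Re-enqueue pages whose last crawl is older than the recrawl interval."""
        now = time.time()
        for url, ts in self.last_crawled.items():
            if now - ts > self.recrawl_interval_s:
                self.frontier.append(url)
```

Blacklisting (the third question) is largely a client-side concern: it is typically mitigated by throttling request rates, respecting robots.txt, and spreading traffic across many machines rather than by the bookkeeping above.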