.terraform/plugins/darwin_amd64
env
src
statistics
tests
.eslintrc.json
Dockerfile
Pipfile
Pipfile.lock
README.md
crawler.key
example.js
manageInstances.py
notes.md
package.json
prepare-alexa-top-million.js
seed.json
test.js
yarn.lock

README.md

Purpose

The goal of this project is to build a distributed, decentralized crawler that scrapes 1 billion web pages using a couple hundred dollars of commodity AWS hardware. The trick is to use AWS spot instances, which provide on-demand compute from Amazon's idle capacity. When demand is low, a spot instance can cost anywhere from 10% to 60% of the regular on-demand price.
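
To get a feel for what that budget implies, here is a rough back-of-the-envelope sketch. Only the 1 billion page target and the 10-60% price range come from the text above; the $200 budget (one reading of "a couple hundred dollars"), the $0.10 on-demand hourly price, and the function name are assumptions made purely for illustration, not real AWS prices or part of this project's code.

```python
def required_pages_per_instance_hour(total_pages, budget_usd,
                                     on_demand_hourly_usd, spot_fraction):
    """Pages each instance-hour must crawl to stay within budget.

    spot_fraction is the spot price as a fraction of the on-demand
    price (0.10 to 0.60 in the range quoted above).
    """
    if not 0 < spot_fraction <= 1:
        raise ValueError("spot_fraction must be in (0, 1]")
    spot_hourly_usd = on_demand_hourly_usd * spot_fraction
    # Total instance-hours the budget can buy at the spot price.
    affordable_hours = budget_usd / spot_hourly_usd
    return total_pages / affordable_hours


if __name__ == "__main__":
    TOTAL_PAGES = 1_000_000_000   # from the project goal
    BUDGET_USD = 200.0            # assumption: "a couple hundred dollars"
    ON_DEMAND_HOURLY_USD = 0.10   # hypothetical price, not a real AWS quote

    for fraction in (0.10, 0.60):
        rate = required_pages_per_instance_hour(
            TOTAL_PAGES, BUDGET_USD, ON_DEMAND_HOURLY_USD, fraction)
        print(f"spot at {fraction:.0%} of on-demand: "
              f"about {rate:,.0f} pages per instance-hour")
```

Under these assumed numbers, the required throughput per instance-hour grows in direct proportion to the spot price, so the crawl only stays within budget if instances are bought near the low end of that range or each one crawls very quickly.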