Helper scripts to launch disposable Disco clusters for map/reduce tasks using EC2 spot instances
utils
.gitignore
README.md
create_cluster.py
create_config.py
disposabledisco.py
kill_cluster.py
master-init.sh
slave-init.sh


disposabledisco

Helper scripts to launch disposable Disco clusters for map/reduce tasks using EC2 spot instances

Project Scope

  1. Ability to launch/reconfigure Disco clusters on a whim.
  2. Use cheaper AWS spot instances.
  3. No permanent storage in DDFS. Instead we use S3 or something else.
  4. Ability to (re)configure the number of slaves and the instance types.
  5. Using only Official Ubuntu Cloud Guest Amazon Machine Images (AMIs) - http://cloud.ubuntu.com/ami/
  6. Probably will use Instance store, since Map/Reduce is supposed to be more CPU bound than IO bound.

Dependencies

  1. boto
  2. paramiko

Usage

  1. Create a config file with some defaults:

   $ python create_config.py > config.json

  2. Edit the config file based on the Appendix below.

  3. Run this script repeatedly; each run takes care of pending tasks:

   $ python create_cluster.py config.json

Appendix

| Name | Required? | Note |
|------|-----------|------|
| BASE_PACKAGES | Yes | Do not override. |
| ADDITIONAL_PACKAGES | Yes | Additional packages to be installed using apt-get. |
| PIP_REQUIREMENTS | Yes | Packages installed via pip. Runs after all apt-get tasks. |
| AWS_ACCESS | Yes | REQUIRED. Your AWS access key. |
| AWS_SECRET | Yes | REQUIRED. Your AWS secret key. |
| AMI | Yes | What AMI to use. Hint: http://cloud.ubuntu.com/ami/ |
| MAX_BID | Yes | How much to bid for each instance. |
| INSTANCE_TYPE | Yes | What instance type to use. |
| MASTER_INSTANCE_TYPE | No | Defaults to INSTANCE_TYPE. |
| SLAVE_INSTANCE_TYPE | No | Defaults to INSTANCE_TYPE. |
| KEY_NAME | Yes | The key name in the EC2 console. Lets you log in as the ubuntu user for debugging. |
| SECURITY_GROUPS | Yes | At least 1 security group needed. See the Security Group section below. |
| MGMT_KEY | Yes | Public key(s) added on instances. Set this to your public keys. Multiple keys are separated by newlines. |
| TAG_KEY | Yes | Key used for tagging. If you are making multiple clusters, make these unique. |
| NUM_SLAVES | Yes | Number of slaves. |
| MASTER_MULTIPLIER | No | Ratio of workers compared to cores in master. |
| SLAVE_MULTIPLIER | No | Ratio of workers compared to cores in each slave. |
| POST_INIT | No | Bash script, run as root after the rest of the initialization. |
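
Before launching anything, it can help to check an edited config against the table above. The following is a minimal sketch, not part of this repository: it assumes config.json is a flat JSON object keyed by the names in the table, and the helper names (check_config, resolve_instance_types, load_config) are hypothetical.

```python
import json

# Keys marked "Yes" in the Appendix table.
REQUIRED_KEYS = [
    "BASE_PACKAGES", "ADDITIONAL_PACKAGES", "PIP_REQUIREMENTS",
    "AWS_ACCESS", "AWS_SECRET", "AMI", "MAX_BID", "INSTANCE_TYPE",
    "KEY_NAME", "SECURITY_GROUPS", "MGMT_KEY", "TAG_KEY", "NUM_SLAVES",
]


def load_config(path):
    """Read the config file (assumed to be a flat JSON object)."""
    with open(path) as handle:
        return json.load(handle)


def check_config(config):
    """Return the required keys that are missing, in table order."""
    return [key for key in REQUIRED_KEYS if key not in config]


def resolve_instance_types(config):
    """Apply the documented fallback: both roles default to INSTANCE_TYPE."""
    default = config["INSTANCE_TYPE"]
    return {
        "master": config.get("MASTER_INSTANCE_TYPE", default),
        "slave": config.get("SLAVE_INSTANCE_TYPE", default),
    }
```

For example, check_config(load_config("config.json")) returns an empty list when every required key is present. The check only looks at key presence; it does not validate values such as the AMI id or the bid.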

Security Group

  1. Allow all udp, tcp traffic from within the group on all ports.
  2. Allow ssh from 0.0.0.0/0 ... or at least from your workstation.
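
The two rules above could be set up with boto roughly as follows. This is a sketch under assumptions, not something these scripts do for you: the function names, the rule dictionary layout, the group description, and the "us-east-1" region string are all illustrative, and boto is expected to find your AWS credentials in its usual places.

```python
def security_group_rules(group_name, ssh_cidr="0.0.0.0/0"):
    """Describe the two rules as plain dicts (hypothetical layout)."""
    return [
        # Rule 1: all TCP and UDP ports, only from members of the same group.
        {"protocol": "tcp", "from_port": 0, "to_port": 65535,
         "source_group": group_name},
        {"protocol": "udp", "from_port": 0, "to_port": 65535,
         "source_group": group_name},
        # Rule 2: ssh, from anywhere by default or from a narrower CIDR.
        {"protocol": "tcp", "from_port": 22, "to_port": 22, "cidr": ssh_cidr},
    ]


def create_security_group(group_name, ssh_cidr="0.0.0.0/0",
                          region="us-east-1"):
    """Create the group in EC2 and apply the rules (needs boto and credentials)."""
    import boto.ec2  # imported here so the rule helper works without boto

    conn = boto.ec2.connect_to_region(region)
    group = conn.create_security_group(group_name, "Disco cluster traffic")
    for rule in security_group_rules(group_name, ssh_cidr):
        if "source_group" in rule:
            # Intra-group traffic: the group authorizes itself as the source.
            group.authorize(ip_protocol=rule["protocol"],
                            from_port=rule["from_port"],
                            to_port=rule["to_port"],
                            src_group=group)
        else:
            group.authorize(ip_protocol=rule["protocol"],
                            from_port=rule["from_port"],
                            to_port=rule["to_port"],
                            cidr_ip=rule["cidr"])
    return group
```

Passing your workstation's address as ssh_cidr (for example "203.0.113.7/32", a documentation-range address used here only as a placeholder) gives the narrower variant of rule 2. The resulting group name is what would go into SECURITY_GROUPS.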

TODO

  1. Ability to select region. Currently it is hardcoded to default (us-east)
  2. Specify different instance types with counts for slaves.
  3. Ability to stop unresponsive slaves.
  4. Ability to scale down cluster based on config changes. It can already scale up.
  5. Make pip installable, so that we can install it globally, and keep config file inside relevant project directories.