This repository has been archived by the owner on Sep 23, 2020. It is now read-only.
EPU Management
nimbusproject/epumgmt
======================================
Preview Documentation
======================================

*WARNING* Use the preview documentation here instead:

    http://github.com/nimbusproject/epumgmt/wiki/Preview

There is some information below that is relevant for advanced users.

======================================
Dependencies & Installation
======================================

A Linux/OSX environment is currently required.

You may want to use a virtualenv:

    $ virtualenv --python=python2.6 --no-site-packages epumgmt-deps
    $ source epumgmt-deps/bin/activate

Sample setup:

    $ easy_install fabric boto simplejson
    $ git clone http://github.com/nimbusproject/epumgmt.git
    $ cd epumgmt
    $ ./sbin/check-dependencies.sh

======================================
Preparations
======================================

Copy "share/epumgmt/environment.sample" somewhere outside the repository.
Edit it to add your credentials, and always source it before using the
program, for example by adding something like this to your shell rc file:

    alias epumgmt='. ~/code/environment-epumgmt && cd ~/code/epumgmt'

Copy "share/epumgmt/variables.json.sample" somewhere outside the repository.
To give your launches a unique scope, change the "cei_hello1" variable in
that file to something unique. Also change the RabbitMQ broker IP address
to an appropriate value.

======================================
Usage
======================================

    $ ./bin/epumgmt.sh --action create --haservice provisioner --name run1 --jsonvars ~/myvars.json
    $ ./bin/epumgmt.sh --action create --haservice sleeper --name run1 --jsonvars ~/myvars.json

(or:

    $ ./bin/epumgmt.sh -a create -s provisioner -n run1 -j ~/myvars.json
    $ ./bin/epumgmt.sh -a create -s sleeper -n run1 -j ~/myvars.json

See -h or --help for shortcuts.)

Note how each invocation uses the same run name. This lets you run
coordinated actions on the whole run.
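The variables file described under Preparations above might look roughly
like the following sketch. The key names other than "cei_hello1" are
hypothetical; consult "share/epumgmt/variables.json.sample" for the real
layout.

```json
{
    "cei_hello1": "mylogin-test1",
    "rabbitmq_broker_ip": "192.0.2.10"
}
```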
For example, the 'killrun' action terminates all the involved instances via
IaaS, and the 'logfetch' action grabs the logs from any involved instances
that have not been terminated.

Fetching + gathering:

    $ ./bin/epumgmt.sh -a logfetch -n run1
    $ ./bin/epumgmt.sh -a update-events -n run1

The 'find-workers-once' action combines the 'logfetch' and 'update-events'
commands, seeking out provisioner events that record worker launches. It
adds any new VMs to the run, so all future run-based commands (like
'killrun', 'logfetch', and 'update-events') will include the worker VMs as
well (unless they are scoped by --haservice; at least this scoping is the
plan, it is not fully implemented). To fetch logs from the workers, this
command needs to detect hostnames first, which might not happen when a
worker is first detected.

    $ ./bin/epumgmt.sh -a find-workers-once -n run1

The 'find-workers' action runs that in a loop; it is intended more for
testing and development.

The 'fetchkill' action is a convenience for experiments or administrator
intervention: fetch logs from N workers and kill them. It runs
'find-workers-once' first in order to be up to date when picking the
workers to kill.

    $ ./bin/epumgmt.sh -a fetchkill -n run1 -k 2

The 'status' action prints information about the VMs that epumgmt knows
about. 'find-workers' will not be run first, but IaaS status queries on
each node will be.

    $ ./bin/epumgmt.sh -a status -n run1

======================================
Sleeper service
======================================

The work messages for the sleeper can be invoked via HTTP; this will be
built into a Python module eventually.

    SLEEPERHOST="address sleeper gets..."
    BATCHNAME="name of this batch of jobs"
    START_IDX="integer of the first job ID; the rest are incremented"
    NUMJOBS="number of jobs to kick off"
    SLEEPSECS="length in seconds each worker should sleep"

    wget http://$SLEEPERHOST:8000/$BATCHNAME/$START_IDX/$NUMJOBS/$SLEEPSECS

So for example:

    wget http://$SLEEPERHOST:8000/run34/0/200/30

That launches 200 jobs that sleep for 30 seconds, with batch ID "run34" and
job IDs 0, 1, 2, ..., 199.

======================================
Under active development
======================================

More functionality to come...
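The HTTP interface above can be sketched as a small Python helper of the
kind the README says is planned. The function names here are hypothetical;
only the host/port/path layout comes from the wget example.

```python
import urllib.request


def sleeper_url(host, batchname, start_idx, numjobs, sleepsecs, port=8000):
    """Build the sleeper work URL, mirroring the wget example:
    http://$SLEEPERHOST:8000/$BATCHNAME/$START_IDX/$NUMJOBS/$SLEEPSECS
    """
    return "http://%s:%d/%s/%d/%d/%d" % (
        host, port, batchname, start_idx, numjobs, sleepsecs)


def submit_jobs(host, batchname, start_idx, numjobs, sleepsecs):
    # Hypothetical helper: issues the same HTTP GET that wget would.
    return urllib.request.urlopen(
        sleeper_url(host, batchname, start_idx, numjobs, sleepsecs))
```

For example, sleeper_url("203.0.113.7", "run34", 0, 200, 30) builds
"http://203.0.113.7:8000/run34/0/200/30", matching the wget invocation
above.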