This repository has been archived by the owner on Sep 23, 2020. It is now read-only.

nimbusproject/epumgmt

======================================
Preview Documentation
======================================

*WARNING* Use the preview documentation here instead:

http://github.com/nimbusproject/epumgmt/wiki/Preview

There is some information below that is relevant for advanced users.


======================================
Dependencies & Installation
======================================

A Linux/OSX environment is currently required.

You may want to use a virtualenv:

$ virtualenv --python=python2.6 --no-site-packages epumgmt-deps
$ source epumgmt-deps/bin/activate

Sample setup:

$ easy_install fabric boto simplejson
$ git clone http://github.com/nimbusproject/epumgmt.git
$ cd epumgmt
$ ./sbin/check-dependencies.sh

======================================
Preparations
======================================

Copy "share/epumgmt/environment.sample" somewhere outside the repository.

Edit that, adding your credentials, and then always source it before using
the program.

For example, by adding something like this to your shell rc file:

    alias epumgmt='. ~/code/environment-epumgmt && cd ~/code/epumgmt'

Copy "share/epumgmt/variables.json.sample" somewhere outside the repository.
To give your launches a unique scope, change the "cei_hello1" variable in that
file to something unique.

Also change the RabbitMQ broker IP address to an appropriate value. 
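One way to pick "something unique" is to generate the value rather than invent
it by hand.  The following is only a sketch: the function name unique_scope is
hypothetical, and restricting the result to letters, digits and underscores is
a conservative assumption, not a documented requirement of epumgmt.  It does
not edit variables.json; paste the printed value into your own copy.

```python
import getpass
import re
import uuid


def unique_scope(prefix=None):
    """Return a scope name that is unlikely to collide with anyone else's.

    Hypothetical helper, not part of epumgmt.  The prefix defaults to the
    current login name; a short random suffix is appended to it.
    """
    if prefix is None:
        prefix = getpass.getuser()
    # Assumption: keep only characters that are safe in most naming schemes.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", prefix)
    return "%s_%s" % (safe, uuid.uuid4().hex[:8])


if __name__ == "__main__":
    print(unique_scope())
```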

======================================
Usage
======================================


$ ./bin/epumgmt.sh --action create --haservice provisioner --name run1 --jsonvars ~/myvars.json
$ ./bin/epumgmt.sh --action create --haservice sleeper --name run1 --jsonvars ~/myvars.json


(or: ./bin/epumgmt.sh -a create -s provisioner -n run1 -j ~/myvars.json
     ./bin/epumgmt.sh -a create -s sleeper -n run1 -j ~/myvars.json

     See -h or --help for shortcuts.)

Note how each invocation gets the same run name.  This will let you do
coordinated things with the whole run.

For example, the 'killrun' action terminates all the involved instances
via IaaS, and the 'logfetch' action grabs all the logs from the involved
instances that have not been terminated.
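The shared run name can also be enforced by a small wrapper instead of by
careful typing.  This is a minimal sketch, assuming it is run from the
repository root; the helper names create_cmd and launch_run are hypothetical,
and only the flags shown in the commands above are used.

```python
import os
import subprocess

EPUMGMT = "./bin/epumgmt.sh"  # path used in the examples above


def create_cmd(haservice, run_name, jsonvars):
    """Build the argument list for one 'create' invocation."""
    return [
        EPUMGMT,
        "--action", "create",
        "--haservice", haservice,
        "--name", run_name,
        # No shell is involved, so expand "~" ourselves.
        "--jsonvars", os.path.expanduser(jsonvars),
    ]


def launch_run(run_name, jsonvars, services=("provisioner", "sleeper"),
               runner=subprocess.check_call):
    """Create each service in order, all under the same run name.

    'runner' is injectable so the commands can be inspected without
    launching anything; by default a failing command raises an error.
    """
    for service in services:
        runner(create_cmd(service, run_name, jsonvars))
```

Calling launch_run("run1", "~/myvars.json") would issue the same two 'create'
commands as above, one after the other.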

Fetching + gathering:

$ ./bin/epumgmt.sh -a logfetch -n run1
$ ./bin/epumgmt.sh -a update-events -n run1

The 'find-workers-once' action combines the 'logfetch' and 'update-events'
commands, seeking out provisioner events that record worker launches.  It adds
the new VMs to the run, so all future run-based commands (like 'killrun',
'logfetch', and 'update-events') will include the worker VMs as well (unless
they are scoped by --haservice; that scoping is planned but not fully
implemented).  To fetch logs from the workers, this command needs to detect
their hostnames first, which might not happen when a worker is first detected.

$ ./bin/epumgmt.sh -a find-workers-once -n run1

The 'find-workers' action runs 'find-workers-once' in a loop, which is mainly
useful for testing and development.
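If you want control over the polling interval or the number of rounds, a
similar loop can be driven from outside.  The sketch below is not how
'find-workers' itself is implemented; the function name and the interval_secs
and max_rounds parameters are hypothetical, and it assumes it is run from the
repository root.

```python
import subprocess
import time


def find_workers_loop(run_name, interval_secs=30, max_rounds=None,
                      runner=subprocess.call, sleep=time.sleep):
    """Run 'find-workers-once' repeatedly; return the number of rounds run.

    With max_rounds=None the loop continues until interrupted.
    """
    cmd = ["./bin/epumgmt.sh", "-a", "find-workers-once", "-n", run_name]
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        runner(cmd)
        rounds += 1
        # Pause between rounds, but not after the final one.
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_secs)
    return rounds
```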

The 'fetchkill' action is a convenience for experiments or administrator
intervention: fetch logs from N workers and kill them.  It runs
'find-workers-once' first so that the list of workers is up to date when
picking the ones to kill.

$ ./bin/epumgmt.sh -a fetchkill -n run1 -k 2

The 'status' action prints information about the VMs that epumgmt knows
about.  It does not run 'find-workers' first, but it does run an IaaS status
query on each node.

$ ./bin/epumgmt.sh -a status -n run1


======================================
Sleeper service
======================================

The work messages for sleeper can be triggered by HTTP requests; this will
eventually be built into a Python module.

SLEEPERHOST="address sleeper gets..."
BATCHNAME="name of this batch of jobs"
START_IDX="integer of first job id, rest are incremented"
NUMJOBS="number of jobs to kick off"
SLEEPSECS="length in seconds worker should sleep"

  wget http://$SLEEPERHOST:8000/$BATCHNAME/$START_IDX/$NUMJOBS/$SLEEPSECS
  
So for example:

  wget http://$SLEEPERHOST:8000/run34/0/200/30
  
That launches 200 jobs that each sleep for 30 seconds, with batch name "run34"
and job IDs 0, 1, 2, ..., 199.
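The same request can be made from Python.  This is a rough sketch of what such
a helper might look like, not the planned module: the function names are
hypothetical, it assumes Python 3 (for urllib.request), and it assumes the
sleeper answers a plain HTTP GET, which is what wget sends.

```python
from urllib.request import urlopen

SLEEPER_PORT = 8000  # port used in the wget examples above


def sleeper_url(host, batch_name, start_idx, num_jobs, sleep_secs):
    """Build the URL that kicks off one batch of sleep jobs."""
    return "http://%s:%d/%s/%d/%d/%d" % (
        host, SLEEPER_PORT, batch_name, start_idx, num_jobs, sleep_secs)


def job_ids(start_idx, num_jobs):
    """Job IDs the batch covers: start_idx, start_idx + 1, ..."""
    return list(range(start_idx, start_idx + num_jobs))


def submit_batch(host, batch_name, start_idx, num_jobs, sleep_secs,
                 timeout=10):
    """Send the request and return the HTTP status code."""
    url = sleeper_url(host, batch_name, start_idx, num_jobs, sleep_secs)
    with urlopen(url, timeout=timeout) as response:
        return response.status
```

For the example above, sleeper_url(host, "run34", 0, 200, 30) yields the same
path as the wget command, and job_ids(0, 200) lists the IDs 0 through 199.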


======================================
Under active development
======================================

More functionality to come...